CHAPTER TWELVE

INFERENTIAL TECHNIQUES II

12.1 Introduction

This chapter discusses non-parametric tests. Non-parametric tests are tests that do not depend on knowledge of the population distribution or its parameters, and they do not test hypotheses about the parameters of the parent population. They rest on different assumptions from parametric tests and are sometimes referred to as distribution-free tests. They can also be used with data collected at the nominal and ordinal levels of measurement. Non-parametric statistics give a clear view of the general idea of statistical inference because the reasoning is not clouded by complicated mathematics. They were developed to deal with situations where the population distribution is non-normal or unknown. However, when their assumptions are met, parametric tests are preferred to non-parametric tests because they are more powerful, that is, more likely to reject a false null hypothesis. There are many non-parametric tests, but this chapter discusses only a few.

12.2 Objectives

At the end of this chapter, you should be able to:

define the chi-square test and state the conditions governing its application.

enumerate the steps for computing the chi-square test.

carry out chi-square analysis.

define the Wilcoxon signed-rank test.

outline the procedure for using the Wilcoxon signed-rank test.

demonstrate the computation of the Wilcoxon signed-rank test.

12.3 Chi-Square (χ²)

The chi-square test is a non-parametric test developed by Karl Pearson in 1900. It is used to determine whether or not a significant difference exists between observed and expected frequencies. Usually, the frequencies are associated with common categorizations such as boy or girl, true or false, agree or disagree, success or no success, etc.

The chi-square test is one of the most widely used statistical tests for comparing observed frequency distributions with theoretical or expected distributions.

12.3.1 Basic Conditions for Chi-square

There are four major conditions to be satisfied before using chi-square analysis:

The sample observations must be independent of each other.

The sample data must be randomly selected from the population.

The sample size should be fairly large, at least between 25 and 250, and not more than 20% of the expected frequencies should be less than 5.

The data should be nominal (categorical) in nature.

For the chi-square (χ²) test, the observed and expected frequencies are arranged in a table known as a contingency table, whose rows and columns are built up from the observed and expected frequencies. The formula for computing chi-square is:

\chi^{2} = \sum \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}}

Where Of is the observed frequency

Ef is the expected frequency.
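The formula can be sketched in a few lines of plain Python. This is only an illustration: the function name chi_square and the true/false counts are made up for the example and do not come from the chapter's data.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 100 answers to a true/false item,
# expected to split evenly under the null hypothesis.
observed = [60, 40]
expected = [50, 50]

statistic = chi_square(observed, expected)
print(statistic)  # (10^2 / 50) + (10^2 / 50) = 4.0
```

Each category contributes its squared deviation divided by its expected frequency, so categories with small expected frequencies weigh heavily in the total.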

12.3.2 Computation of Chi-square Test for Goodness-of-fit

Chi-square analysis is conducted for goodness-of-fit when one sample is measured on one variable. The researcher may be interested in finding out whether the observed frequencies agree with, or fit, some assumed theoretical distribution for the population from which the sample data are selected. The observed frequencies are obtained from empirical observations made during the research study, while the expected frequencies are derived from the hypothesis.

We shall now demonstrate how to calculate the chi-square test of goodness-of-fit with Example 12.1.

Example 12.1: A descriptive survey was conducted to determine opinion on political party membership. Use a null hypothesis to determine whether there is a statistically significant difference.

Table 12.1: Data Generated on Party Membership

| SEX | APC | PDP | SDP | YPP | TOTAL |
| --- | --- | --- | --- | --- | --- |
| MALE | 20 | 5 | 10 | 15 | 50 |
| FEMALE | 5 | 20 | 15 | 10 | 50 |
| TOTAL | 25 | 25 | 25 | 25 | 100 |

Test the political party membership data at the 5% level of significance.

The researcher adopts the steps for hypothesis testing in solving this problem.

Step 1: Statement of hypotheses. The null and alternative hypotheses are:

H0: There is no statistically significant difference between the views of males and females on party membership.

H1: There is a statistically significant difference between the views of males and females on party membership.

Step 2: Select the level of significance, α = .05

Step 3: Calculate the test statistic. We select the chi-square test of goodness-of-fit, since our sample size is 100 with category frequencies of 20, 5, 5, 20, 10, 15, 15 and 10.

The expected frequency of each category is calculated as:

E_{f} = \frac{25 \times 50}{100} = 12.5

\chi^{2} = \sum \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}}

Where χ² is the chi-square statistic (χ is the Greek letter chi)

Of is the observed frequency

Ef is the expected frequency

Table 12.2: Data for Calculation of the χ² Goodness-of-fit Statistic

| Of | Ef | Of - Ef | (Of - Ef)² | (Of - Ef)²/Ef |
| --- | --- | --- | --- | --- |
| 20 | 12.5 | 7.5 | 56.25 | 4.5 |
| 5 | 12.5 | -7.5 | 56.25 | 4.5 |
| 5 | 12.5 | -7.5 | 56.25 | 4.5 |
| 20 | 12.5 | 7.5 | 56.25 | 4.5 |
| 10 | 12.5 | -2.5 | 6.25 | 0.5 |
| 15 | 12.5 | 2.5 | 6.25 | 0.5 |
| 15 | 12.5 | 2.5 | 6.25 | 0.5 |
| 10 | 12.5 | -2.5 | 6.25 | 0.5 |
| TOTAL | | | | χ² calculated = 20.0 |

That is, \chi^{2} = \sum \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}} = 20.0

Degrees of freedom (df) = (4 - 1) = 3, which for this 2 x 4 table also equals (r - 1)(c - 1) = (2 - 1)(4 - 1) = 3.

Step 4: Determine the critical region. Look up the table of critical values of chi-square with df = 3 and α = 0.05; the critical value is \chi^{2}_{\mathrm{critical}} = 7.82.

Step 5: Decision. Since \chi^{2}_{\mathrm{cal}} = 20.00 is greater than \chi^{2}_{\mathrm{critical}} = 7.82, the null hypothesis is rejected.

Step 6: Conclusion. It is concluded that there is a statistically significant difference between the views of males and females on party membership.
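Example 12.1 can be checked with a short script. The sketch below, in plain Python with no external libraries, recomputes each expected frequency from the row and column totals of Table 12.1 and accumulates the chi-square statistic. The variable names are illustrative, and the critical value 7.82 is simply the table value quoted in Step 4.

```python
# Observed counts from Table 12.1
# (rows: male, female; columns: APC, PDP, SDP, YPP).
observed = [
    [20, 5, 10, 15],
    [5, 20, 15, 10],
]

row_totals = [sum(row) for row in observed]          # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]    # [25, 25, 25, 25]
grand_total = sum(row_totals)                        # 100

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected frequency: row total x column total / grand total.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (o - e) ** 2 / e

CRITICAL_VALUE = 7.82  # df = 3, alpha = 0.05, as quoted in Step 4
reject_null = chi_sq > CRITICAL_VALUE

print(chi_sq, reject_null)
```

Every expected frequency works out to 12.5 here because all four column totals are equal and both row totals are equal.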

Another example can be used to illustrate the same procedure for hypothesis testing.

Example 12.2: A survey was carried out to find out the preferences of some families in the choice of careers for their wards. The results are shown in the table below:

Table 12.3: Observed and Expected Frequencies of Career Choices in Some Families

| Frequency | Carpentry | Cloth Maker | Engineering | Plumbing | Total |
| --- | --- | --- | --- | --- | --- |
| Observed | 42 | 14 | 24 | 40 | 120 |
| Expected | 30 | 30 | 30 | 30 | 120 |

Analyse the data to determine whether the difference in career preference is significant or not.

Step 1: State the hypotheses.

H0: There is no statistically significant difference between the expected and observed career preferences of the families.

H1: There is a statistically significant difference between the expected and observed career preferences of the families.

Step 2: Apply the chi-square formula to each cell and add up the results.

Carpentry: \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}} = \frac{(42-30)^{2}}{30} = 4.8

Cloth Maker: \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}} = \frac{(14-30)^{2}}{30} = 8.5

Engineering: \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}} = \frac{(24-30)^{2}}{30} = 1.2

Plumbing: \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}} = \frac{(40-30)^{2}}{30} = 3.3

\therefore \chi^{2} = 4.8 + 8.5 + 1.2 + 3.3 = 17.8

Step 3: Decision and conclusion.

To decide on statistical significance for a single variable, the calculated value is compared with the critical value. The degrees of freedom are df = k - 1, i.e., 4 - 1 = 3. With α = 0.05, \chi^{2}_{\mathrm{cal}} (17.8) is greater than \chi^{2}_{\mathrm{critical}} (7.82); therefore, H0 is rejected.
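The arithmetic of Example 12.2 can be reproduced with a short sketch in plain Python. It uses the observed and expected frequencies from Table 12.3; the dictionary name contributions is illustrative.

```python
careers = ["Carpentry", "Cloth Maker", "Engineering", "Plumbing"]
observed = [42, 14, 24, 40]
expected = [30, 30, 30, 30]

# Contribution of each career to the chi-square total: (O - E)^2 / E.
contributions = {
    name: (o - e) ** 2 / e
    for name, o, e in zip(careers, observed, expected)
}

chi_sq = sum(contributions.values())
df = len(careers) - 1  # k - 1 categories

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

Carried at full precision the total is about 17.87; the figure of 17.8 in the worked solution comes from rounding each term to one decimal place before adding. Either value is well above the critical value of 7.82.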

Sometimes a researcher has to deal with a test of independence, where the observed and expected frequencies are presented in a number of rows and columns (a contingency table). The next section addresses that case.

12.3.3 Computation of Chi-square Test of Independence

The calculation of the chi-square test of independence is illustrated using the example below:

The Directorate of University Affiliated Programmes, ABU Zaria, released its admission list for the 2021/22 session according to Local Government Area, as provided in the table below. Compute the chi-square and test for statistical significance at α = 0.05.

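Table 12.4: Admitted Students by Local Gov’t and Degree Courses

| LOCAL GOV’T | MATHS | BIO | PHY | CHEM | ROW TOTALS |
| --- | --- | --- | --- | --- | --- |
| Bida | 35 | 40 | 45 | 40 | 160 |
| Edati | 30 | 20 | 35 | 35 | 120 |
| Lapai | 20 | 35 | 22 | 30 | 107 |
| Lavun | 25 | 25 | 25 | 20 | 95 |
| COLUMN TOTALS | 110 | 120 | 127 | 125 | 482 |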

The researcher observed these frequencies through a survey. Because the data are frequencies, the chi-square test is the appropriate statistical test. The formula for the chi-square test is:

\chi^{2} = \sum \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}}

Where Of is the observed frequency in each cell of the contingency table and Ef is the expected frequency in each cell of the table. The expected frequency is determined using the formula:

E_{f} = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Grand total}} = \frac{F_{r} \times F_{c}}{N}

Where Ef is the expected frequency

Fr is the total frequency of the rth row

Fc is the total frequency of the cth column

N is the grand total

Applying the formula to calculate the expected frequencies:

| LOCAL GOV’T | MATHS | BIO | PHY | CHEM |
| --- | --- | --- | --- | --- |
| Bida (1st row) | (110 × 160)/482 = 36.51 | (120 × 160)/482 = 39.83 | (127 × 160)/482 = 42.16 | (125 × 160)/482 = 41.49 |
| Edati (2nd row) | (110 × 120)/482 = 27.39 | (120 × 120)/482 = 29.88 | (127 × 120)/482 = 31.62 | (125 × 120)/482 = 31.12 |
| Lapai (3rd row) | (110 × 107)/482 = 24.42 | (120 × 107)/482 = 26.64 | (127 × 107)/482 = 28.19 | (125 × 107)/482 = 27.75 |
| Lavun (4th row) | (110 × 95)/482 = 21.68 | (120 × 95)/482 = 23.65 | (127 × 95)/482 = 25.03 | (125 × 95)/482 = 24.64 |

Since the expected frequencies have been calculated, they are entered in the contingency table, Table 12.5.

Table 12.5: A 4 x 4 Contingency Table of Expected Frequencies of Admitted Students by Local Gov’t and Academic Courses

| LOCAL GOV’T | MATHS | BIO | PHY | CHEM | ROW TOTALS |
| --- | --- | --- | --- | --- | --- |
| Bida | 36.51 | 39.83 | 42.16 | 41.49 | 160 |
| Edati | 27.39 | 29.88 | 31.62 | 31.12 | 120 |
| Lapai | 24.42 | 26.64 | 28.19 | 27.75 | 107 |
| Lavun | 21.68 | 23.65 | 25.03 | 24.64 | 95 |
| COLUMN TOTALS | 110 | 120 | 127 | 125 | 482 |

Computation of Degrees of Freedom

The degrees of freedom can be determined using the formula below:

df = (r - 1)(c - 1)

Where df is the degrees of freedom

r is the number of rows

c is the number of columns

For the 4 x 4 contingency table in the example above: df = (r - 1)(c - 1) = (4 - 1)(4 - 1) = 3 x 3 = 9
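The expected frequencies and the degrees of freedom depend only on the row and column totals, so they can be sketched directly from those totals. The snippet below is a plain-Python illustration; the dictionary names are assumptions made for the example.

```python
# Marginal totals of the admission table.
row_totals = {"Bida": 160, "Edati": 120, "Lapai": 107, "Lavun": 95}
col_totals = {"MATHS": 110, "BIO": 120, "PHY": 127, "CHEM": 125}
grand_total = sum(row_totals.values())

# Expected frequency of each cell: row total x column total / grand total.
expected = {
    (lga, course): r * c / grand_total
    for lga, r in row_totals.items()
    for course, c in col_totals.items()
}

# Degrees of freedom: (rows - 1) x (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

for lga in row_totals:
    print(lga, [round(expected[(lga, course)], 2) for course in col_totals])
print("df =", df)
```

A useful check is that the expected frequencies in any row add up to that row's total, and likewise for each column.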

Adopting the procedure of hypothesis testing for the problem above, we now carry out the chi-square test as follows:

Step 1: Formulation of hypotheses

H0: Students’ admission into academic courses is not dependent on the local government area, i.e., (χ² = 0)

H1: Students’ admission into academic courses is dependent on the local government area, i.e., (χ² ≠ 0)

Step 2: Select the significance level, α = .05

Step 3: Choose the appropriate statistical test. Since the data were collected in the form of frequencies and cross-tabulated in a contingency table, the chi-square test is the appropriate statistical test for the data in Table 12.4. The expected frequencies are calculated, and the chi-square test of independence is computed by applying the formula:

\chi^{2} = \sum \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}}

Where Of is the observed frequency

Ef is the expected frequency

Table 12.6 shows the calculation of the chi-square test of independence.

Table 12.6: Data for Calculating the χ² Test of Independence

| Cells | (1) Of | (2) Ef | (3) Of - Ef | (4) (Of - Ef)² | (5) (Of - Ef)²/Ef |
| --- | --- | --- | --- | --- | --- |
| 1 | 35 | 36.51 | -1.51 | 2.28 | 0.06 |
| 2 | 30 | 27.39 | 2.61 | 6.81 | 0.25 |
| 3 | 20 | 24.42 | -4.42 | 19.54 | 0.80 |
| 4 | 25 | 21.68 | 3.32 | 11.02 | 0.51 |
| 5 | 40 | 39.83 | 0.17 | 0.03 | 0.00 |
| 6 | 20 | 29.88 | -9.88 | 97.61 | 3.27 |
| 7 | 35 | 26.64 | 8.36 | 69.89 | 2.62 |
| 8 | 25 | 23.65 | 1.35 | 1.82 | 0.08 |
| 9 | 45 | 42.16 | 2.84 | 8.07 | 0.19 |
| 10 | 35 | 31.62 | 3.38 | 11.42 | 0.36 |
| 11 | 22 | 28.19 | -6.19 | 38.32 | 1.36 |
| 12 | 25 | 25.03 | -0.03 | 0.00 | 0.00 |
| 13 | 40 | 41.49 | -1.49 | 2.22 | 0.05 |
| 14 | 35 | 31.12 | 3.88 | 15.05 | 0.48 |
| 15 | 30 | 27.75 | 2.25 | 5.06 | 0.18 |
| 16 | 20 | 24.64 | -4.64 | 21.53 | 0.87 |

χ² = 11.08

df = (r - 1)(c - 1) = (4 - 1)(4 - 1) = 9

Step 4: Determine the critical value. From the table of critical values of chi-square with df = 9 at α = 0.05, the critical value is 16.92, while the calculated χ² is 11.08.

Step 5: Decision. Since the calculated χ² (11.08) is less than the critical value (16.92), the null hypothesis is retained.

Step 6: Conclusion. It is concluded that students’ admission into academic courses is not dependent on the local government area.

The steps for calculating the chi-square test of independence can be summarised as follows:

Step 1: Write each of the observed frequencies in the appropriate cell.

Step 2: Compute the row, column, and grand totals, and use them to find the expected frequency Ef for each cell.

Step 3: Find Of - Ef and \left(O_{f}-E_{f}\right)^{2} for each cell.

Step 4: Compute \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}} for each cell.

Step 5: Calculate χ² by adding all the \frac{\left(O_{f}-E_{f}\right)^{2}}{E_{f}} values.

Step 6: Find the df using the formula (r - 1)(c - 1).

Step 7: Check the significance of the computed value of χ² against the table of critical values of χ². If the calculated value of χ² is greater than or equal to the critical value, H0 is rejected in favor of H1.
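The seven steps above can be sketched as one small function in plain Python. This is an illustrative sketch, not a replacement for a statistics package: the function name chi_square_independence is hypothetical, the Lapai/PHY cell is taken as 22 (the value consistent with the row total of 107 and with Table 12.6), and the critical value 16.92 is assumed from a standard chi-square table for df = 9 at α = 0.05.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    # Step 2: row, column, and grand totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Steps 3 to 5: accumulate (O - E)^2 / E over every cell.
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    # Step 6: degrees of freedom.
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Step 1: observed frequencies (columns: MATHS, BIO, PHY, CHEM).
admissions = [
    [35, 40, 45, 40],  # Bida
    [30, 20, 35, 35],  # Edati
    [20, 35, 22, 30],  # Lapai (PHY taken as 22, matching the row total 107)
    [25, 25, 25, 20],  # Lavun
]

statistic, df = chi_square_independence(admissions)

# Step 7: compare with the critical value (assumed table value, df = 9, alpha = 0.05).
CRITICAL_VALUE = 16.92
reject_null = statistic >= CRITICAL_VALUE

print(round(statistic, 2), df, reject_null)
```

Working from unrounded expected frequencies gives a statistic of about 11.09, within rounding of the hand calculation.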

12.4 Wilcoxon’s Matched-Pairs Signed-Rank Test

This is another non-parametric test, known as Wilcoxon’s signed-rank test. It operates similarly to the t-test for dependent (correlated) samples. It is employed to test a null hypothesis about the difference between two samples consisting of matched pairs of subjects, and it makes use of the differences between the pairs of scores.

12.4.1 Computation of Wilcoxon’s Signed-rank Test

The procedure for performing Wilcoxon’s signed-rank test is as follows:

Determine the signed difference for each matched pair of observations.

Rank all the differences obtained without respect to sign, giving rank 1 to the smallest absolute difference.

Affix to every rank the sign, positive or negative, of the difference from which it arises.

Find T+ by adding all the ranks which have a positive sign.

Find T- by adding all the ranks which have a negative sign.

Find T, the smaller of T+ and T-, ignoring their signs.

Determine the significance of T in one of two ways, depending on the size of N.

If N is 15 or less, the significance is determined by comparing T with the tabulated value in the table of critical values for the Wilcoxon signed-rank test.

If N is more than 15, the significance of T is determined from the table of the normal curve after computing a z-value using the formula:

z = \frac{T-\frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}

T = the smaller of T+ and T- (usually the sum of the ranks with the less frequent sign), ignoring their signs

n = the number of non-zero differences.
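The large-sample z-value can be sketched in plain Python as below. The sketch uses the usual normal approximation, in which T has mean n(n + 1)/4 and standard deviation equal to the square root of n(n + 1)(2n + 1)/24. The function name wilcoxon_z and the values T = 50, n = 20 are hypothetical and serve only to show the arithmetic.

```python
import math


def wilcoxon_z(t, n):
    """Normal approximation for the Wilcoxon signed-rank statistic T."""
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean_t) / sd_t


# Hypothetical values: T = 50 from n = 20 non-zero differences.
z = wilcoxon_z(50, 20)
print(round(z, 2))  # mean = 105, sd = sqrt(717.5), so z is about -2.05
```

Because T is the smaller of the two rank sums, z is zero or negative; its absolute value is compared with the normal-curve value for the chosen level of significance.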

The case of N greater than 15 is not worked through in this edition; readers may consult other books for examples.

Example: Test the hypothesis that two distributions of scores are identical, using the matched-pairs data below:

| Pairs | X | Y |
| --- | --- | --- |
| 1 | 53 | 69 |
| 2 | 73 | 60 |
| 3 | 79 | 64 |
| 4 | 74 | 66 |
| 5 | 67 | 46 |
| 6 | 70 | 65 |
| 7 | 77 | 59 |
| 8 | 64 | 68 |

Solution

Step 1: Find the signed difference between the two scores (see column iv in Table 12.7).

Step 2: Rank the differences without respect to sign, giving rank 1 to the smallest absolute difference (see column v in Table 12.7).

Step 3: Affix to each rank the sign, positive or negative, of its difference, and note the ranks carrying the less frequent sign (see column vi in Table 12.7).

Table 12.7: Pairs of Scores with Differences and Ranks

| i Pairs | ii X | iii Y | iv Differences | v Ranks of differences | vi Ranks with less frequent sign |
| --- | --- | --- | --- | --- | --- |
| 1 | 53 | 69 | -16 | 6 | 6 |
| 2 | 73 | 60 | 13 | 4 | |
| 3 | 79 | 64 | 15 | 5 | |
| 4 | 74 | 66 | 8 | 3 | |
| 5 | 67 | 46 | 21 | 8 | |
| 6 | 70 | 65 | 5 | 2 | |
| 7 | 77 | 59 | 18 | 7 | |
| 8 | 64 | 68 | -4 | 1 | 1 |
| | | | | | T = 7 |

Step 4: Determine T by adding the ranks with the less frequent sign. Six differences are positive and two are negative, so the negative sign is the less frequent. Adding the ranks of the two negative differences gives T = 6 + 1 = 7.

Step 5: Determine the critical region. From the table of critical values for the Wilcoxon signed-rank test with N = 8 (less than 15) and α = 0.05, the tabulated value is 4.

Step 6: Decide whether to reject or retain H0. The decision rules are:

If the T value is less than or equal to the tabulated value, H0 is rejected.

If the T value is greater than the tabulated value, H0 is retained.

Results obtained are:

T value = 7

Tabulated value = 4. Since the T value (7) is greater than the tabulated value (4), H0 is retained.
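The computation of T for these eight pairs can be sketched in plain Python. The snippet assumes the usual convention of ranking the absolute differences from the smallest (rank 1) to the largest and averaging any tied ranks; the helper name rank_of is illustrative.

```python
x = [53, 73, 79, 74, 67, 70, 77, 64]
y = [69, 60, 64, 66, 46, 65, 59, 68]

# Signed differences X - Y, dropping any zero differences.
diffs = [a - b for a, b in zip(x, y) if a != b]

# Absolute differences in ascending order, used to look up ranks.
ordered = sorted(abs(d) for d in diffs)


def rank_of(value):
    """Rank of an absolute difference (1 = smallest); ties share the average rank."""
    positions = [i + 1 for i, v in enumerate(ordered) if v == value]
    return sum(positions) / len(positions)


ranks = [rank_of(abs(d)) for d in diffs]

# Sum the ranks separately for positive and negative differences.
t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
t = min(t_plus, t_minus)

print(diffs, ranks, t_plus, t_minus, t)
```

Under that ranking convention the two rank sums are T+ = 29 and T- = 7, which together equal n(n + 1)/2 = 36, a quick check that no rank has been lost.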

Student Activity

The table below shows the results of an opinion poll concerning the introduction of a History combination in the College of Education.

Table: Opinions of Lecturers and General Public on History Combination

| Scales | Lecturers | General Public | Row Total |
| --- | --- | --- | --- |
| Strongly Agree | 25 | 10 | 35 |
| Agree | 15 | 10 | 25 |
| Disagree | 05 | 10 | 25 |
| Strongly Disagree | 05 | 15 | 20 |
| Column Totals | 50 | 50 | 100 |

The stated null hypothesis is that there is no significant difference of opinion between lecturers and the general public concerning the introduction of a History combination in the College of Education curriculum. Do the data support the hypothesis?

The Provost claimed that his College employed 50% state indigene males, 25% state indigene females, 15% non-indigene males and 10% non-indigene females. To test this claim, a researcher randomly selected 120 employees and obtained the observed frequencies below.

| Category | Observed Frequencies |
| --- | --- |
| Indigene Males | 75 |
| Indigene Females | 30 |
| Non-Indigene Males | 18 |
| Non-Indigene Females | 12 |

Test the Provost’s claim at the 5% level of significance.

The data provided are two sets of scores: P1: 12, 11, 10, 9, 7, 6 and 5; P2: 10, 8, 8, 5, 3, 2. Use Wilcoxon’s signed-rank test to determine whether the difference is statistically significant.
