CHAPTER TWELVE
INFERENTIAL TECHNIQUES II
12.1 Introduction
This chapter discusses non-parametric tests. Non-parametric tests are tests that do not depend on knowledge of the population distribution or its parameters. They do not test hypotheses about the parameters of the parent population. They rest on different assumptions from parametric tests and are sometimes referred to as distribution-free tests. They can also be applied to data measured on nominal and ordinal scales. Non-parametric statistics give a clear view of the general idea of statistical inference because the ideas are not clouded by complicated mathematics. They were developed to deal with situations where the population distribution is non-normal or unknown. However, when their assumptions are met, parametric tests are more powerful than non-parametric tests, because they are more likely to reject a false null hypothesis. There are many non-parametric tests, but this chapter discusses only a few.
12.2 Objectives
At the end of this chapter, you should be able to:
define the chi-square test and state the conditions governing its application.
enumerate the steps for computing the chi-square test.
carry out a chi-square analysis.
define the Wilcoxon signed-rank test.
outline the procedure for using the Wilcoxon signed-rank test.
demonstrate the computation of the Wilcoxon signed-rank test.
12.3 Chi-Square (χ²)
The chi-square test is a non-parametric test developed by Karl Pearson in 1900. It is used to determine whether or not a significant difference exists between observed and expected frequencies. Usually, the frequencies are associated with common categorizations such as boy or girl, true or false, agree or disagree, success or no success, etc.
The chi-square test is one of the most widely used statistical tests for comparing observed frequency distributions with theoretical or expected distributions.
12.3.1 Basic Conditions for Chi-square
There are four major conditions to be satisfied before using chi-square analysis:
The sample observations must be independent of each other.
The sample data must be randomly selected from the population.
The sample size should be fairly large (roughly between 25 and 250), and not more than 20% of the expected frequencies should be less than 5.
The data should be measured on a nominal scale.
For the chi-square (χ²) test, the observed and expected frequencies are arranged in a table known as a contingency table. A contingency table consists of rows and columns whose cells hold the observed and expected frequencies. The formula for computing chi-square is:
$\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f}$
where $O_f$ is the observed frequency,
$E_f$ is the expected frequency, and the sum is taken over all categories (cells).
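The formula can be checked with a short computation. The sketch below is only an illustration in Python and is not part of the original procedure; the function name `chi_square_statistic` and the two-category counts are hypothetical, chosen to keep the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two categories (e.g. agree / disagree).
observed = [18, 22]
expected = [20, 20]
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 0.4
```

Each category contributes (2)²/20 = 0.2, so the statistic is 0.4.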
12.3.2 Computation of Chi-square Test for Goodness-of-fit
Chi-square analysis is conducted for goodness-of-fit when one sample is measured on one variable. A researcher may be interested in finding out whether the observed frequencies conform to, or fit, some assumed theoretical distribution for the population from which the sample data were selected. The observed frequencies are obtained from empirical observations made during the research study, while the expected frequencies are derived from the hypothesis.
We shall now demonstrate how to calculate the chi-square test of goodness-of-fit with Example 12.1.
Example 12.1: A descriptive survey was conducted to determine opinion on political party membership. Use a null hypothesis to determine whether there is a statistically significant difference between the views of males and females on party membership.
Table 12.1: Data Generated on Party Membership

| SEX | APC | PDP | SDP | YPP | TOTAL |
|---|---|---|---|---|---|
| MALE | 20 | 5 | 10 | 15 | 50 |
| FEMALE | 5 | 20 | 15 | 10 | 50 |
| TOTAL | 25 | 25 | 25 | 25 | 100 |
Test the political party membership at the 5% level of significance.
The researcher adopts the steps for hypothesis testing in solving this problem.
Step 1: Statement of hypotheses. The null and alternative hypotheses are:
H0: There is no statistically significant difference between the views of males and females on party membership.
H1: There is a statistically significant difference between the views of males and females on party membership.
Step 2: Select the level of significance: α = 0.05.
Step 3: Calculate the test statistic. We select the chi-square test of goodness-of-fit, since our sample size is 100 and the eight cells have observed frequencies of 20, 5, 5, 20, 10, 15, 15 and 10.
The expected frequency of each cell is calculated as (column total × row total) divided by the grand total:
$E_f = \frac{25 \times 50}{100} = 12.5$
$\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f}$
where χ² (the Greek letter chi, squared) is the chi-square statistic,
$O_f$ is the observed frequency, and
$E_f$ is the expected frequency.
Table 12.2: Data for Calculation of the χ² Goodness-of-fit Statistic

| $O_f$ | $E_f$ | $O_f - E_f$ | $(O_f - E_f)^2$ | $\frac{(O_f - E_f)^2}{E_f}$ |
|---|---|---|---|---|
| 20 | 12.5 | 7.5 | 56.25 | 4.5 |
| 5 | 12.5 | −7.5 | 56.25 | 4.5 |
| 5 | 12.5 | −7.5 | 56.25 | 4.5 |
| 20 | 12.5 | 7.5 | 56.25 | 4.5 |
| 10 | 12.5 | −2.5 | 6.25 | 0.5 |
| 15 | 12.5 | 2.5 | 6.25 | 0.5 |
| 15 | 12.5 | 2.5 | 6.25 | 0.5 |
| 10 | 12.5 | −2.5 | 6.25 | 0.5 |
| TOTAL | | | | $\chi^2_{cal} = 20.0$ |
That is, $\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f} = 20.0$
Degrees of freedom (df) = (4 − 1) = 3
Step 4: Determine the critical region. From the table of critical values of chi-square with df = 3 and α = 0.05, the critical value is $\chi^2_{critical} = 7.82$.
Step 5: Decision. Since $\chi^2_{cal} = 20.00$ is greater than $\chi^2_{critical} = 7.82$, the null hypothesis is rejected.
Step 6: Conclusion. It is concluded that there is a statistically significant difference between the views of males and females on party membership.
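The arithmetic of Example 12.1 can be verified with a short script. This is a minimal sketch in Python, offered as a check rather than as part of the procedure; the variable names are illustrative, and the critical value is simply the tabulated figure quoted in Step 4.

```python
# Observed frequencies from Table 12.1
# (rows: MALE, FEMALE; columns: APC, PDP, SDP, YPP).
observed = [
    [20, 5, 10, 15],
    [5, 20, 15, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected frequency: row total x column total / grand total = 12.5
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

CRITICAL_VALUE = 7.82  # tabulated chi-square value for df = 3, alpha = 0.05
print(f"chi-square = {chi2:.2f}")
print("reject H0" if chi2 > CRITICAL_VALUE else "retain H0")
```

Four cells contribute 4.5 each and four contribute 0.5 each, which gives the total of 20.0.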
Another example can be used to illustrate the same procedure for hypothesis testing.
Example 12.2: A survey was carried out to find out the career preferences of some families for their wards. The results are shown in the table below:
Table 12.3: Observed and Expected Frequencies of Career Choices in Some Families

| Frequency | Carpentry | Cloth Maker | Engineering | Plumbing | Total |
|---|---|---|---|---|---|
| Observed | 42 | 14 | 24 | 40 | 120 |
| Expected | 30 | 30 | 30 | 30 | 120 |
Analyse the data to determine whether the difference in career preference is significant or not.
Step 1: State the hypotheses.
H0: There is no statistically significant difference between the expected and observed career preferences of the families.
H1: There is a statistically significant difference between the expected and observed career preferences of the families.
Step 2: Apply the chi-square formula to each cell and add up the results.
Carpentry: $\frac{(O_f - E_f)^2}{E_f} = \frac{(42 - 30)^2}{30} = 4.8$
Cloth Maker: $\frac{(O_f - E_f)^2}{E_f} = \frac{(14 - 30)^2}{30} \approx 8.5$
Engineering: $\frac{(O_f - E_f)^2}{E_f} = \frac{(24 - 30)^2}{30} = 1.2$
Plumbing: $\frac{(O_f - E_f)^2}{E_f} = \frac{(40 - 30)^2}{30} \approx 3.3$
$\therefore \chi^2 = 4.8 + 8.5 + 1.2 + 3.3 = 17.8$
Step 3: Decision and conclusion
The calculated χ² for the single variable is compared with the critical value. The degrees of freedom are df = k − 1 = 4 − 1 = 3, where k is the number of categories, and at α = 0.05 the critical value is 7.82. Since $\chi^2_{calculated}$ (17.8) is greater than $\chi^2_{critical}$ (7.82), H0 is rejected.
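The figures of Example 12.2 can be checked in the same way. The following is a minimal sketch in Python with illustrative variable names. It keeps full precision for every term, so the sum comes out at about 17.87, whereas adding the terms after rounding each to one decimal place gives 17.8 as in the text.

```python
careers = ["Carpentry", "Cloth Maker", "Engineering", "Plumbing"]
observed = [42, 14, 24, 40]
expected = [30, 30, 30, 30]

# Contribution of each career to the chi-square statistic.
terms = {
    name: (o - e) ** 2 / e
    for name, o, e in zip(careers, observed, expected)
}
chi2 = sum(terms.values())
df = len(careers) - 1  # k - 1 categories

for name, value in terms.items():
    print(f"{name}: {value:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Either way the statistic is far above the critical value of 7.82, so the decision is unaffected by the rounding.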
Sometimes a researcher has to deal with the test of independence, where observed and expected frequencies are presented in a number of rows and columns (a contingency table). This is taken up in the next section.
12.3.3 Computation of Chi-square Test of Independence
The calculation of the chi-square test of independence is illustrated with the example below.
The Directorate of University Affiliated Programmes, ABU Zaria, released its admission list for the 2021/22 session according to Local Government Area, as provided in Table 12.4 below.
Compute the chi-square and test for statistical significance at α = 0.05.
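Table 12.4: Observed Frequencies of Admitted Students by Local Gov’t and Degree Course

| LOCAL GOV’T | MATHS | BIO | PHY | CHEM | ROW TOTALS |
|---|---|---|---|---|---|
| Bida | 35 | 40 | 45 | 40 | 160 |
| Edati | 30 | 20 | 35 | 35 | 120 |
| Lapai | 20 | 35 | 22 | 30 | 107 |
| Lavun | 25 | 25 | 25 | 20 | 95 |
| COLUMN TOTALS | 110 | 120 | 127 | 125 | 482 |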
The researcher observed the frequencies above through a survey. Because the data are frequencies, the chi-square test is the most appropriate statistical test. The formula for the chi-square test is:
$\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f}$
where $O_f$ is the observed frequency in each cell of the contingency table and $E_f$ is the expected frequency in each cell. The expected frequency is determined using the formula:
$E_f = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}} = \frac{F_r \times F_c}{N}$
where $E_f$ is the expected frequency,
$F_r$ is the total frequency of the rth row,
$F_c$ is the total frequency of the cth column, and N is the grand total.
Applying the formula to calculate the expected frequencies, row by row:

| Row | MATHS | BIO | PHY | CHEM |
|---|---|---|---|---|
| 1st row (Bida) | $\frac{110 \times 160}{482} = 36.51$ | $\frac{120 \times 160}{482} = 39.83$ | $\frac{127 \times 160}{482} = 42.16$ | $\frac{125 \times 160}{482} = 41.49$ |
| 2nd row (Edati) | $\frac{110 \times 120}{482} = 27.39$ | $\frac{120 \times 120}{482} = 29.88$ | $\frac{127 \times 120}{482} = 31.62$ | $\frac{125 \times 120}{482} = 31.12$ |
| 3rd row (Lapai) | $\frac{110 \times 107}{482} = 24.42$ | $\frac{120 \times 107}{482} = 26.64$ | $\frac{127 \times 107}{482} = 28.19$ | $\frac{125 \times 107}{482} = 27.75$ |
| 4th row (Lavun) | $\frac{110 \times 95}{482} = 21.68$ | $\frac{120 \times 95}{482} = 23.65$ | $\frac{127 \times 95}{482} = 25.03$ | $\frac{125 \times 95}{482} = 24.64$ |
Since the expected frequencies have been calculated, they are entered in contingency Table 12.5.
Table 12.5: A 4 × 4 Contingency Table of Expected Frequencies of Admitted Students by Local Gov’t and Academic Course

| LOCAL GOV’T | MATHS | BIO | PHY | CHEM | ROW TOTALS |
|---|---|---|---|---|---|
| Bida | 36.51 | 39.83 | 42.16 | 41.49 | 160 |
| Edati | 27.39 | 29.88 | 31.62 | 31.12 | 120 |
| Lapai | 24.42 | 26.64 | 28.19 | 27.75 | 107 |
| Lavun | 21.68 | 23.65 | 25.03 | 24.64 | 95 |
| COLUMN TOTALS | 110 | 120 | 127 | 125 | 482 |
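The expected frequencies can be reproduced from the observed counts with a few lines of code. The sketch below is an illustration in Python with assumed variable names. The observed count for Lapai in PHY is taken as 22, the value consistent with the row total of 107 and the column total of 127.

```python
# Observed admissions per Local Government Area
# (columns: MATHS, BIO, PHY, CHEM).
observed = {
    "Bida": [35, 40, 45, 40],
    "Edati": [30, 20, 35, 35],
    "Lapai": [20, 35, 22, 30],
    "Lavun": [25, 25, 25, 20],
}

row_totals = {lga: sum(counts) for lga, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected frequency of a cell = row total x column total / grand total.
expected = {
    lga: [round(row_totals[lga] * c / grand_total, 2) for c in col_totals]
    for lga in observed
}

for lga, values in expected.items():
    print(lga, values)
```

Within each row or column the expected frequencies add up to the corresponding observed total, apart from rounding, which is a quick way of checking the table.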
Computation of Degrees of Freedom
The degrees of freedom can be determined using the formula below:
df = (r − 1)(c − 1)
where df is the degrees of freedom,
r is the number of rows, and
c is the number of columns.
For the 4 × 4 contingency table in the above example: df = (r − 1)(c − 1) = (4 − 1)(4 − 1) = 3 × 3 = 9
Adopting the procedure of hypothesis testing for the problem above, we now carry out the chi-square test:
Step 1: Formulation of hypotheses
H0: Students’ admission into academic courses is not dependent on the local government area, i.e., χ² = 0.
H1: Students’ admission into academic courses is dependent on the local government area, i.e., χ² ≠ 0.
Step 2: Select the significance level: α = 0.05.
Step 3: Choose the appropriate statistical test. Since the data were collected in the form of frequencies and cross-tabulated in a contingency table, the chi-square test is the appropriate statistical test for the data in Table 12.4. The expected frequencies are calculated, and the chi-square test of independence is computed by applying the formula:
$\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f}$
where $O_f$ is the observed frequency and
$E_f$ is the expected frequency.
Table 12.6 shows the calculation of chi-square test of independence.
Table 12.6: Data for Calculating the χ² Test of Independence

| Cells | (1) $O_f$ | (2) $E_f$ | (3) $O_f - E_f$ | (4) $(O_f - E_f)^2$ | (5) $\frac{(O_f - E_f)^2}{E_f}$ |
|---|---|---|---|---|---|
| 1 | 35 | 36.51 | −1.51 | 2.28 | 0.06 |
| 2 | 30 | 27.39 | 2.61 | 6.81 | 0.25 |
| 3 | 20 | 24.42 | −4.42 | 19.54 | 0.80 |
| 4 | 25 | 21.68 | 3.32 | 11.02 | 0.51 |
| 5 | 40 | 39.83 | 0.17 | 0.03 | 0.00 |
| 6 | 20 | 29.88 | −9.88 | 97.61 | 3.27 |
| 7 | 35 | 26.64 | 8.36 | 69.89 | 2.62 |
| 8 | 25 | 23.65 | 1.35 | 1.82 | 0.08 |
| 9 | 45 | 42.16 | 2.84 | 8.07 | 0.19 |
| 10 | 35 | 31.62 | 3.38 | 11.42 | 0.36 |
| 11 | 22 | 28.19 | −6.19 | 38.32 | 1.36 |
| 12 | 25 | 25.03 | −0.03 | 0.00 | 0.00 |
| 13 | 40 | 41.49 | −1.49 | 2.22 | 0.05 |
| 14 | 35 | 31.12 | 3.88 | 15.05 | 0.48 |
| 15 | 30 | 27.75 | 2.25 | 5.06 | 0.18 |
| 16 | 20 | 24.64 | −4.64 | 21.53 | 0.87 |
$\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f} = 11.08$
df = (r − 1)(c − 1) = (4 − 1)(4 − 1) = 9
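The chi-square statistic and the degrees of freedom for the admission data can also be obtained directly from the observed counts. The sketch below is an illustration in Python with assumed variable names, and it takes the Lapai PHY count as 22, in agreement with the row and column totals. Because it does not round the expected frequencies, it gives about 11.09 rather than the 11.08 obtained from two-decimal table entries.

```python
# Observed admissions (columns: MATHS, BIO, PHY, CHEM).
observed = [
    [35, 40, 45, 40],  # Bida
    [30, 20, 35, 35],  # Edati
    [20, 35, 22, 30],  # Lapai
    [25, 25, 25, 20],  # Lavun
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
print(f"chi-square = {chi2:.2f}, df = {df}")
```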
Step 4: Determine the critical value. From the table of critical values of chi-square with df = 9 at α = 0.05, the critical value is 16.92, while the calculated χ² is 11.08.
Step 5: Decision. Since the calculated χ² (11.08) is less than the critical value (16.92), the null hypothesis is not rejected.
Step 6: Conclusion. It is concluded that students’ admission into academic courses is not dependent on the local government area.
Let us summarise the steps to guide you in calculating the chi-square test of independence:
Step 1: Write each of the observed frequencies in the appropriate cell.
Step 2: Compute the row, column and grand totals, and from them the expected frequency $E_f$ of each cell.
Step 3: Find $O_f - E_f$ and $(O_f - E_f)^2$ for each cell.
Step 4: Compute $\frac{(O_f - E_f)^2}{E_f}$ for each cell.
Step 5: Calculate χ² by adding all the $\frac{(O_f - E_f)^2}{E_f}$ values.
Step 6: Find the df using the formula (r − 1)(c − 1).
Step 7: Check the significance of the computed value of χ² against the table of critical values of χ². If the calculated value of χ² is greater than or equal to the critical value, H0 is rejected in favor of H1.
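The steps above can be sketched as a single reusable function. This is a minimal illustration in Python, not a definitive implementation: the function name `chi_square_independence` and the 2 × 2 table are hypothetical, and the critical value has to be read from a table of chi-square values and passed in by the user.

```python
def chi_square_independence(table, critical_value):
    """Chi-square test of independence for a table given as a list of rows."""
    # Step 2: row, column and grand totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Steps 3-5: accumulate (O - E)^2 / E over every cell.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    # Step 6: degrees of freedom.
    df = (len(table) - 1) * (len(table[0]) - 1)

    # Step 7: compare with the tabulated critical value.
    reject_h0 = chi2 >= critical_value
    return chi2, df, reject_h0


# Hypothetical 2 x 2 table; 3.84 is the tabulated value for df = 1, alpha = 0.05.
chi2, df, reject_h0 = chi_square_independence([[10, 20], [20, 10]], critical_value=3.84)
print(round(chi2, 2), df, reject_h0)  # 6.67 1 True
```

Passing the critical value as an argument keeps the sketch free of any statistical library; the user looks it up for the relevant df and α.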
12.4 Wilcoxon’s Matched-Pairs Signed-rank Test
This is another non-parametric test. It operates similarly to the t-test for dependent (correlated) samples and is known as Wilcoxon’s signed-rank test. It is employed to test the null hypothesis of no difference between two samples consisting of matched pairs of subjects. The Wilcoxon signed-rank test makes use of the differences between pairs of scores.
12.4.1 Computation of Wilcoxon’s Signed-rank Test
Wilcoxon’s signed-rank test is performed using the following procedure:
Determine the signed difference for each matched pair of observations.
Rank all the differences without respect to sign, giving rank 1 to the smallest absolute difference.
Affix to every rank the sign, positive or negative, of the difference it came from, so that it is clear which ranks arise from positive differences and which arise from negative differences.
Find T+ by adding all the ranks which have a positive sign.
Find T− by adding all the ranks which have a negative sign.
Find T, the smaller of T+ and T−, ignoring their signs.
Determine the significance of T in one of two ways, depending on the size of N.
If N is 15 or less, the significance is determined by comparing the value of T with the tabulated value in the table of critical values of the Wilcoxon signed-rank test.
If N is more than 15, the significance of T is determined through the table of the normal curve after computing a z-value using the formula:
$z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$
where T = the smaller of T+ and T− (usually the sum of the ranks with the less frequent sign), ignoring the sign, and
n = the number of non-zero differences.
A worked example for N greater than 15 is not given in this edition; readers may consult other texts.
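The large-sample z-value can be computed as shown in the sketch below. It is a minimal illustration in Python, using the mean n(n + 1)/4 and the standard deviation √(n(n + 1)(2n + 1)/24) of T; the function name `wilcoxon_z` and the values T = 52 and n = 20 are hypothetical and serve only to show the arithmetic.

```python
import math


def wilcoxon_z(t, n):
    """Normal approximation for the Wilcoxon signed-rank statistic T."""
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean_t) / sd_t


# Hypothetical values: T = 52 from n = 20 non-zero differences.
z = wilcoxon_z(52, 20)
print(round(z, 2))  # -1.98
```

The resulting z is then referred to the table of the normal curve in the usual way.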
Example: Test the hypothesis that two distributions of scores are identical, using the matched-pairs data below:
| Pairs | X | Y |
|---|---|---|
| 1 | 53 | 69 |
| 2 | 73 | 60 |
| 3 | 79 | 64 |
| 4 | 74 | 66 |
| 5 | 67 | 46 |
| 6 | 70 | 65 |
| 7 | 77 | 59 |
| 8 | 64 | 68 |
Solution
Step 1: Find the signed difference between the two scores (see column iv in Table 12.7).
Step 2: Rank the differences without respect to sign (see column v in Table 12.7).
Step 3: Affix to each rank its sign, i.e., positive or negative, and note the ranks that carry the less frequent sign (see column vi in Table 12.7).
Table 12.7: Paired Scores with Differences and Ranks

| (i) Pairs | (ii) X | (iii) Y | (iv) Differences (X − Y) | (v) Ranks of differences | (vi) Ranks with less frequent sign |
|---|---|---|---|---|---|
| 1 | 53 | 69 | −16 | 6 | 6 |
| 2 | 73 | 60 | 13 | 4 | |
| 3 | 79 | 64 | 15 | 5 | |
| 4 | 74 | 66 | 8 | 3 | |
| 5 | 67 | 46 | 21 | 8 | |
| 6 | 70 | 65 | 5 | 2 | |
| 7 | 77 | 59 | 18 | 7 | |
| 8 | 64 | 68 | −4 | 1 | 1 |
| | | | | | T = 7 |

Step 4: Determine T. This is done by adding the ranks with the less frequent sign. Six of the differences are positive and two are negative. Since 2 is less than 6, we add the two ranks that come from negative differences: T = 6 + 1 = 7.
Step 5: Determine the critical region. From the table of critical values of the Wilcoxon signed-rank test with N = 8 (less than 15) and α = 0.05 level of significance, the tabulated value is 4.
Step 6: Decision. The decision on whether to reject or retain H0 follows these rules:
If the T value is less than or equal to the tabulated value, H0 is rejected.
If the T value is greater than the tabulated value, H0 is retained.
Results obtained are:
T value = 7
Tabulated value = 4
Since the T value (7) is greater than the tabulated value (4), H0 is retained.
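The signed-rank computation for the matched pairs can be sketched in code. The following is a minimal Python illustration with assumed names, not a definitive implementation. It ranks the absolute differences in ascending order (rank 1 for the smallest absolute difference), gives tied values the average of the ranks they occupy, and drops zero differences.

```python
x = [53, 73, 79, 74, 67, 70, 77, 64]
y = [69, 60, 64, 66, 46, 65, 59, 68]

# Signed differences X - Y, dropping pairs with no difference.
diffs = [a - b for a, b in zip(x, y) if a != b]

# Absolute differences in ascending order, used to assign ranks.
ordered = sorted(abs(d) for d in diffs)


def average_rank(value):
    """Rank of an absolute difference; tied values share the average rank."""
    first = ordered.index(value) + 1  # 1-based position of first occurrence
    count = ordered.count(value)
    return first + (count - 1) / 2


ranks = [average_rank(abs(d)) for d in diffs]

t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
t = min(t_plus, t_minus)
n = len(diffs)

print(f"T+ = {t_plus}, T- = {t_minus}, T = {t}, n = {n}")
```

A useful check is that T+ and T− always add up to n(n + 1)/2, the sum of the ranks 1 to n.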
Student Activity
The table below shows the results of an opinion poll concerning the introduction of a History combination in the College of Education.
Table: Opinions of Lecturers and General Public on History Combination

| Scales | Lecturers | General Public | Row Total |
|---|---|---|---|
| Strongly Agree | 25 | 10 | 35 |
| Agree | 15 | 10 | 25 |
| Disagree | 5 | 10 | 25 |
| Strongly Disagree | 5 | 15 | 20 |
| Column Totals | 50 | 50 | 100 |
The stated null hypothesis is that there is no significant difference of opinion between Lecturers and the General Public concerning the introduction of a History combination in the College of Education curriculum. Do the data support the hypothesis?
The Provost claimed that his College employed 50% state indigene males, 25% state indigene females, 15% non-indigene males and 10% non-indigene females. To test this claim, a researcher randomly selected 120 employees and obtained the observed frequencies below.
| Category | Observed Frequencies |
|---|---|
| Indigene Males | 75 |
| Indigene Females | 30 |
| Non-Indigene Males | 18 |
| Non-Indigene Females | 12 |
Test the Provost’s claim with a 5% level of significance.
The data provided are two sets of scores: P1: 12, 11, 10, 9, 7, 6 and 5; P2: 10, 8, 8, 5, 3, 2. Use Wilcoxon’s signed-rank test to determine whether there is a statistically significant difference between them.