CHAPTER ELEVEN
INFERENTIAL STATISTICS I
11.1 Introduction
Inferential statistics describes a population on the basis of a sample's behaviour. It also involves determining whether the outcomes obtained from one or more samples match those that would have been obtained from the complete population. Inferential statistics are used to predict population features from a randomly chosen sample and to estimate population parameters from sample data. They are also used to test formulated hypotheses so that valid conclusions can be drawn from research studies. This chapter covers the types of inferential statistics and the computation of the Z-test, the t-test, correlation analysis, and ANOVA (the F-ratio).
11.2 Objectives
At the end of this chapter, you should be able to:
learn the types of inferential statistics
describe situations where the Z-test, t-test and F-test are applicable
compute z-test with relevant examples
compute t-test with relevant examples
compute f-test with relevant examples
use t-test for hypothesis testing for difference between population and sample means
use t-test for hypothesis testing for difference between correlation coefficients.
11.3 Types of Inferential Statistics
Inferential statistics fall into two categories: parametric statistics and non-parametric statistics. Both are useful when testing hypotheses in research. However, parametric statistics are more powerful, meaning that they are more likely to detect a real difference when one exists, and are generally preferred. That power comes at a price: certain assumptions must be satisfied for the decision to be valid. The following are three very important assumptions made when applying parametric statistics to test a formulated hypothesis:
The variable measured is normally distributed, or at least the form of its distribution is known.
The data collected must be on an interval or ratio scale of measurement.
Each observation must be selected independently, so that selecting one does not affect the selection of any other.
The researcher must note that if one or more of these assumptions is violated, a non-parametric inferential test should be employed. This chapter discusses only the t-test, the Z-test and ANOVA.
11.4 The Z-Test
The Z-statistic is applied to investigate whether two means are significantly different. It is used for testing hypotheses when the sample size is equal to or greater than 30 (n ≥ 30) and when the population parameters \\mu and \\sigma are known. The Z-test formula is given as:
Z = \\frac{\\overline{\\mathbf{x}}_{1}-\\overline{\\mathbf{x}}_{2}}{\\mathbf{SD}_{\\overline{\\mathbf{x}}}}
where \\mathbf{SD}_{\\overline{\\mathbf{x}}}, the standard error of the difference between means, is
\\mathbf{SD}_{\\overline{\\mathbf{x}}} = \\sqrt{\\frac{\\mathbf{S}_{1}^{2}}{\\mathbf{n}_{1}}+\\frac{\\mathbf{S}_{2}^{2}}{\\mathbf{n}_{2}}}
\\therefore Z = \\frac{\\overline{\\mathbf{x}}_{1}-\\overline{\\mathbf{x}}_{2}}{\\sqrt{\\frac{\\mathbf{S}_{1}^{2}}{\\mathbf{n}_{1}}+\\frac{\\mathbf{S}_{2}^{2}}{\\mathbf{n}_{2}}}}
where \\overline{\\mathbf{x}}_{1} = mean of group 1
\\overline{\\mathbf{x}}_{2} = mean of group 2
\\mathbf{S}_{1}^{2}, \\mathbf{S}_{2}^{2} = variances of groups 1 and 2
\\mathbf{n}_{1}, \\mathbf{n}_{2} = sizes of groups 1 and 2
Now, let us demonstrate the use of Z-test in hypothesis testing with an example given below:
Analyze the given data representing set of scores obtained by five students from Mathematics and Chemistry test.
Table 11.1: Sets of Scores obtained in Mathematics and Chemistry Test
Students | 1 | 2 | 3 | 4 | 5
Maths Scores (X1) | 4 | 5 | 6 | 7 | 8
Chem Scores (X2) | 4 | 4 | 5 | 3 | 4
Based on the data provided in Table 11.1, we shall solve this problem by going through the process of hypothesis testing.
Step 1: Statement of Hypotheses
H0: \\mathbf{\\mu}_{1} = \\mathbf{\\mu}_{2} (there is no significant difference between the mean Mathematics and Chemistry scores)
H1: \\mathbf{\\mu}_{1} \\neq \\mathbf{\\mu}_{2}
Step 2: Determine the level of significance
Assume \\alpha = 0.05 is selected.
Step 3: Calculate the test statistic by applying the formula:
Z = \\frac{\\overline{\\mathbf{x}}_{1}-\\overline{\\mathbf{x}}_{2}}{\\sqrt{\\frac{\\mathbf{S}_{1}^{2}}{\\mathbf{n}_{1}}+\\frac{\\mathbf{S}_{2}^{2}}{\\mathbf{n}_{2}}}}
X1 | X1 − X̄1 | (X1 − X̄1)² | X2 | X2 − X̄2 | (X2 − X̄2)²
4 | -2 | 4 | 4 | 0 | 0
5 | -1 | 1 | 4 | 0 | 0
6 | 0 | 0 | 5 | 1 | 1
7 | 1 | 1 | 3 | -1 | 1
8 | 2 | 4 | 4 | 0 | 0
∑X1 = 30, ∑(X1 − X̄1)² = 10, ∑X2 = 20, ∑(X2 − X̄2)² = 2
\\overline{\\mathbf{x}}_{1} = \\frac{30}{5} = 6, \\overline{\\mathbf{x}}_{2} = \\frac{20}{5} = 4
\\mathbf{S}_{1} = \\sqrt{\\frac{10}{5}} = \\sqrt{2} = 1.41, \\mathbf{S}_{2} = \\sqrt{\\frac{2}{5}} = \\sqrt{0.4} = 0.63
Now substitute in Z = \\frac{\\overline{\\mathbf{x}}_{1}-\\overline{\\mathbf{x}}_{2}}{\\sqrt{\\frac{\\mathbf{S}_{1}^{2}}{\\mathbf{n}_{1}}+\\frac{\\mathbf{S}_{2}^{2}}{\\mathbf{n}_{2}}}}
Z = \\frac{6-4}{\\sqrt{\\frac{(1.41)^{2}}{5}+\\frac{(0.63)^{2}}{5}}} = \\frac{2}{\\sqrt{\\frac{1.99}{5}+\\frac{0.40}{5}}} = \\frac{2}{\\sqrt{0.40+0.08}} = \\frac{2}{\\sqrt{0.48}} = \\frac{2}{0.69} = 2.90
Step 4: Determine the critical region. At the \\alpha = 0.05 level of significance (two-tailed), the critical or table value of Z is ±1.96, and the calculated Z-value is 2.90.
Step 5: Decision. The calculated Z-value (2.90) is greater than the table value of Z (1.96). Therefore, the null hypothesis is rejected.
Step 6: Conclusion. From the result obtained, it is concluded that there is a significant difference between the two mean scores.
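The worked example above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function name two_sample_z is hypothetical, the variances are divided by n (as in the hand computation, not by n − 1), and the critical value is taken from the standard normal distribution in Python's standard library. Because no intermediate values are rounded, the script gives Z of about 2.89 rather than the hand-rounded 2.90.

```python
import math
from statistics import NormalDist, mean, pvariance


def two_sample_z(x1, x2):
    """Return Z = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2)."""
    # pvariance divides by n, which matches the hand computation in the text
    standard_error = math.sqrt(pvariance(x1) / len(x1) + pvariance(x2) / len(x2))
    return (mean(x1) - mean(x2)) / standard_error


# Scores from Table 11.1
maths = [4, 5, 6, 7, 8]
chem = [4, 4, 5, 3, 4]

z = two_sample_z(maths, chem)
# Two-tailed critical value at alpha = 0.05 from the standard normal distribution
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Retain H0")
```

The decision printed by the script agrees with Step 5: the calculated Z exceeds 1.96, so the null hypothesis is rejected.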
11.5 The T-Test
The t-test is also known as Student's t-test. William Sealy Gosset introduced it in 1908, publishing under the pseudonym "Student". The t-test offers a number of methods for testing hypotheses; however, we will cover only the following in this unit:
The t-test for a significant difference between the mean scores of two independent samples.
The t-test for a significant difference between the mean scores of two non-independent samples.
Before turning to the computations, note that certain conditions must be satisfied before the t-test is used. These are as follows:
Two groups are being compared.
The samples are drawn from a normally distributed population.
The population variances are homogeneous.
The samples are chosen independently and at random from the population.
The variable is measured on at least an interval scale.
The t-test can be used with both large and small samples, although the sample size should not be fewer than ten.
11.5.1 Computation of T-Test for Difference Between Two Independent Samples
When the mean scores of two independent samples are given, it is possible to assess whether there is a significant difference between them using:
t = \\frac{\\overline{\\mathbf{x}}_{1}-\\overline{\\mathbf{x}}_{2}}{\\sqrt{\\frac{\\mathbf{S}_{1}^{2}}{\\mathbf{n}_{1}}+\\frac{\\mathbf{S}_{2}^{2}}{\\mathbf{n}_{2}}}}
where \\overline{\\mathbf{x}}_{1} = mean score of sample group 1
\\overline{\\mathbf{x}}_{2} = mean score of sample group 2
\\mathbf{S}_{1}^{2}, \\mathbf{S}_{2}^{2} = variances of sample groups 1 and 2
\\mathbf{n}_{1}, \\mathbf{n}_{2} = sample sizes of groups 1 and 2.
Let us now demonstrate the calculation of the t-test with the data provided in Table 11.2, on the assumption that the following conditions for using the t-test are satisfied:
The values in both samples are normally distributed.
The data are on an interval scale of measurement.
The samples are randomly selected.
The sample variances are homogeneous.
Table 11.2: Post-test scores of two randomly selected samples of pre-service teachers
S/N | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Group A | 4 | 16 | 16 | 15 | 14 | 15 | 10 | 17 | 17 | 18 | 18 | 20
Group B | 10 | 12 | 18 | 13 | 6 | 16 | 10 | 15 | 5 | 19 | 9 | 11
Are the results significantly different or not?
Since the researcher is interested in determining whether a significant difference exists between the mean values of Groups A and B, let us illustrate the computation of the t-test for unrelated (independent) samples. We shall adopt the steps for hypothesis testing in solving the problem in Table 11.2.
Step 1: Statement of Hypotheses
H0: \\mathbf{\\mu }_{\\mathbf{A}}= \\mathbf{\\mu }_{\\mathbf{B}}
H1: \\mathbf{\\mu }_{\\mathbf{A}}\\neq \\mathbf{\\mu }_{\\mathbf{B}}
Step 2: Level of significance is at 0.05
Step 3: Calculate the t-test by applying the formula:
t = \\frac{\\overline{\\mathbf{x}}_{1}-\\overline{\\mathbf{x}}_{2}}{\\sqrt[]{\\frac{{\\mathbf{S}_{1}}^{2}}{\\mathbf{n}_{1}}+\\frac{{\\mathbf{S}_{2}}^{2}}{\\mathbf{n}_{2}}}}
First, let us label the groups as Group A = X1 and Group B = X2; we then calculate the mean scores and the squared deviations from each mean.
X1 | X1 − X̄1 | (X1 − X̄1)² | X2 | X2 − X̄2 | (X2 − X̄2)²
4 | -11 | 121 | 10 | -2 | 4
16 | 1 | 1 | 12 | 0 | 0
16 | 1 | 1 | 18 | 6 | 36
15 | 0 | 0 | 13 | 1 | 1
14 | -1 | 1 | 6 | -6 | 36
15 | 0 | 0 | 16 | 4 | 16
10 | -5 | 25 | 10 | -2 | 4
17 | 2 | 4 | 15 | 3 | 9
17 | 2 | 4 | 5 | -7 | 49
18 | 3 | 9 | 19 | 7 | 49
18 | 3 | 9 | 9 | -3 | 9
20 | 5 | 25 | 11 | -1 | 1
∑X1 = 180, ∑(X1 − X̄1)² = 200, ∑X2 = 144, ∑(X2 − X̄2)² = 214
\\overline{\\mathbf{x}}_{1} = \\frac{180}{12} = 15, \\mathbf{S}_{1}^{2} = \\frac{200}{12} = 16.67, \\mathbf{S}_{1} = \\sqrt{16.67} = 4.08
\\overline{\\mathbf{x}}_{2} = \\frac{144}{12} = 12, \\mathbf{S}_{2}^{2} = \\frac{214}{12} = 17.83, \\mathbf{S}_{2} = \\sqrt{17.83} = 4.22
The formula calls for the variances \\mathbf{S}_{1}^{2} and \\mathbf{S}_{2}^{2}, not the standard deviations. Therefore,
t = \\frac{15.00-12.00}{\\sqrt{\\frac{16.67}{12}+\\frac{17.83}{12}}} = \\frac{3}{\\sqrt{\\frac{34.50}{12}}} = \\frac{3}{\\sqrt{2.875}} = \\frac{3}{1.696} = 1.77
Step 4: Determine the critical region. At the \\alpha = 0.05 level of significance (two-tailed) with degrees of freedom (df) = 12 + 12 - 2 = 22, the t-table gives a critical value of 2.074.
Step 5: Decision. The calculated t is 1.77 and the critical value is 2.074. Since the calculated t is less than the critical value, the null hypothesis is retained.
Step 6: Conclusion. Based on the result obtained, we conclude that there is no significant difference between the mean scores of the two groups.
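As a check on the arithmetic, the following Python sketch applies the independent-samples formula exactly as stated, with the variances (not the standard deviations) under the square root. The function name independent_t is hypothetical, the variances are divided by n as in the hand computation, and the critical value 2.074 is simply the table value quoted in Step 4, not something the script derives.

```python
import math
from statistics import mean, pvariance


def independent_t(x1, x2):
    """Return (t, df) for t = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2)."""
    # Variances are computed with divisor n, as in the worked example
    standard_error = math.sqrt(pvariance(x1) / len(x1) + pvariance(x2) / len(x2))
    t_value = (mean(x1) - mean(x2)) / standard_error
    return t_value, len(x1) + len(x2) - 2


# Scores from Table 11.2
group_a = [4, 16, 16, 15, 14, 15, 10, 17, 17, 18, 18, 20]
group_b = [10, 12, 18, 13, 6, 16, 10, 15, 5, 19, 9, 11]

t, df = independent_t(group_a, group_b)
T_CRITICAL = 2.074  # table value for df = 22, alpha = 0.05 (two-tailed)

print(f"t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > T_CRITICAL else "Retain H0")
```

With these data the script reports t of about 1.77 on 22 degrees of freedom, which is below the table value of 2.074.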
11.5.2 Computation of T-Test for Non-Independent Samples
Researchers occasionally need to compare the performance of the same students, or of matched students, in two related courses. When this occurs, the t-test for non-independent samples is used to determine whether the mean scores of the two matched or non-independent samples differ significantly from one another. The calculation formula is as follows:
\\mathbf{t}=\\frac{\\sum \\mathbf{d}}{\\sqrt{\\frac{\\mathbf{N}\\sum \\mathbf{d}^{2}-(\\sum \\mathbf{d})^{2}}{\\mathbf{N}-1}}}
where d = difference between the scores of each matched pair
∑d = sum of the differences between the matched pairs
d² = square of the difference for each matched pair
N = total number of matched pairs
N − 1 = number of degrees of freedom
For example, a researcher administered tests in both Mathematics and Chemistry to the same students, with scores as follows:
Table 11.3: Data Obtained from Two Subjects
S/N | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Mathematics | 10 | 25 | 50 | 37 | 80 | 23 | 48 | 63 | 40 | 35
Chemistry | 29 | 48 | 46 | 17 | 30 | 45 | 19 | 48 | 42 | 50
Are the results significantly different?
Let us solve the problem in Table 11.3 using the procedure for testing hypothesis.
Step 1: Statement of hypotheses
H0: \\mathbf{\\mu }_{1} = \\mathbf{\\mu }_{2}
H1\\colon \\mathbf{\\mu }_{1} \\neq \\mathbf{\\mu }_{2}
Step 2: Selection of level of significance. \\alpha = 0.05 two-tailed test
Step 3: Calculate the t-test by going through the data provided in table 11.3
Students | Mathematics (X1) | Chemistry (X2) | D = X1 − X2 | D²
1 | 10 | 29 | -19 | 361
2 | 25 | 48 | -23 | 529
3 | 50 | 46 | 4 | 16
4 | 37 | 17 | 20 | 400
5 | 80 | 30 | 50 | 2500
6 | 23 | 45 | -22 | 484
7 | 48 | 19 | 29 | 841
8 | 63 | 48 | 15 | 225
9 | 40 | 42 | -2 | 4
10 | 35 | 50 | -15 | 225
Taking the signs of the differences into account, ∑D = 37 and ∑D² = 5,585.
Substitute in the given formula, with N = 10, ∑d = 37, ∑d² = 5,585 and (∑d)² = (37)² = 1,369:
\\mathbf{t}=\\frac{\\sum \\mathbf{d}}{\\sqrt{\\frac{\\mathbf{N}\\sum \\mathbf{d}^{2}-(\\sum \\mathbf{d})^{2}}{\\mathbf{N}-1}}}=\\frac{37}{\\sqrt{\\frac{10\\times 5585-(37)^{2}}{10-1}}}=\\frac{37}{\\sqrt{\\frac{55850-1369}{9}}}=\\frac{37}{\\sqrt{\\frac{54481}{9}}}=\\frac{37}{\\sqrt{6053.44}}=\\frac{37}{77.80}=0.48
Step 4: Determine the critical region. At \\alpha = 0.05 (two-tailed) with df = N - 1 = 10 - 1 = 9, the t-table gives a critical value of 2.262.
Step 5: Decision. Since the calculated t (0.48) is less than the critical t (2.262), H0 is retained.
Step 6: Conclusion. Since t-cal < t-tab, we conclude that there is no significant difference between the Mathematics and Chemistry scores.
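The matched-samples formula can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name paired_t is hypothetical, and the differences are taken as Mathematics minus Chemistry, so several of them are negative and must keep their signs when summed.

```python
import math


def paired_t(x1, x2):
    """Return (t, df) for t = sum(d) / sqrt((N*sum(d^2) - (sum d)^2) / (N - 1))."""
    differences = [a - b for a, b in zip(x1, x2)]  # signed differences d
    n = len(differences)
    sum_d = sum(differences)
    sum_d_squared = sum(d * d for d in differences)
    t_value = sum_d / math.sqrt((n * sum_d_squared - sum_d ** 2) / (n - 1))
    return t_value, n - 1


# Scores from Table 11.3
mathematics = [10, 25, 50, 37, 80, 23, 48, 63, 40, 35]
chemistry = [29, 48, 46, 17, 30, 45, 19, 48, 42, 50]

t, df = paired_t(mathematics, chemistry)
print(f"t = {t:.2f}, df = {df}")
```

For these data the signed differences sum to 37 and their squares to 5,585, so the statistic is small (about 0.48) on 9 degrees of freedom.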
11.5.3 T-test for Difference Between Population and Sample Means
When a researcher wants to compare a population mean with a sample mean, the following formula is used:
t=\\frac{\\overline{x}-\\mu}{\\frac{S}{\\sqrt{n-1}}}
where \\overline{x} = sample mean
\\mu = population mean
S = standard deviation of the sample
n = number of cases in the sample.
For example, a researcher conducted a study and obtained 25.50% as the mean post-test achievement score of all SS II students in senior secondary schools in Bida Local Government Area. Another researcher carried out a study to verify that result, using a sample of 15 SS II students from the same area. He gave the students treatment on four areas of mathematics for six weeks. At the end of the treatment, the researcher administered the Mathematics Achievement Test (MAT) and obtained a mean of 30.10 with a standard deviation of 7.5.
Based on the data obtained, we shall solve the problem by going through the process of hypothesis testing.
Step 1: Statement of Hypotheses
H0\\colon \\mathrm{\\mu}=\\overline{x}
H1\\colon \\mathrm{\\mu}\\neq \\overline{x}
Step 2: Determine the level of significance
Assuming \\alpha =0.05 is selected.
Step 3: Compute the test statistic by using the formula:
t=\\frac{\\overline{x}-\\mu}{\\frac{S}{\\sqrt{n-1}}}
where \\overline{x}=30.10, \\mu=25.50, S=7.5, n=15.
\\therefore t=\\frac{30.10-25.50}{\\frac{7.5}{\\sqrt{15-1}}}=\\frac{4.6}{\\frac{7.5}{\\sqrt{14}}}=\\frac{4.6}{\\frac{7.5}{3.742}}=\\frac{4.6}{2.004}=2.30
Step 4: Determine the critical region. At the \\alpha = 0.05 level of significance (two-tailed) with df = 15 - 1 = 14, the critical value of t is 2.145.
Step 5: Decision rule. If the calculated value is greater than the critical value, the null hypothesis is rejected; if the calculated value is less than the critical value, the null hypothesis is retained.
From the result obtained, the calculated t is greater than the critical t, i.e. 2.30 > 2.145. We therefore reject the null hypothesis.
Step 6: Conclusion. We conclude that there is a significant difference between the population mean and the sample mean.
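The same computation can be sketched in Python from the summary figures alone. The function name one_sample_t is hypothetical, and the sketch follows this chapter's formula with n − 1 under the square root. Because the script does not round intermediate values, it gives a statistic of roughly 2.3; hand rounding of the square root can shift the second decimal place slightly.

```python
import math


def one_sample_t(sample_mean, population_mean, s, n):
    """Return t = (sample_mean - population_mean) / (s / sqrt(n - 1))."""
    return (sample_mean - population_mean) / (s / math.sqrt(n - 1))


# Summary figures from the worked example
t = one_sample_t(sample_mean=30.10, population_mean=25.50, s=7.5, n=15)
df = 15 - 1

print(f"t = {t:.3f}, df = {df}")
```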
11.5.4 Computation for Difference between Correlation Coefficients
There are two approaches to testing hypotheses about correlations. The first, with which you are familiar, is to consult the table of critical values and find out whether the correlation coefficient is significant. The second is to take the computed correlation coefficient and subject it directly to a t-test, using the formula below:
t=r\\sqrt{\\frac{n-2}{1-r^{2}}} or, equivalently, t=\\frac{r\\sqrt{n-2}}{\\sqrt{1-r^{2}}}
For instance, a lecturer wants to investigate whether students' scores in MAT 201 have any significant relationship with their scores in MAT 301. He applied the Pearson Product Moment Correlation and obtained r = 0.70 with N = 40. Find out whether the relationship is significant.
Let us use the process of hypothesis testing to solve the above problem.
Step 1: Propose a null hypothesis
There is no significant relationship between the students' scores in MAT 201 and MAT 301.
Step 2: Select the level of significance. The \\alpha = 0.05 level of significance is assumed.
Step 3: Calculate the t-test using the formula:
t=\\frac{r\\sqrt{n-2}}{\\sqrt{1-r^{2}}}
Given that r = 0.70 and n = 40, substitute in the formula:
t=\\frac{0.70\\sqrt{40-2}}{\\sqrt{1-0.70^{2}}}=\\frac{0.70\\sqrt{38}}{\\sqrt{1-0.49}}=\\frac{0.70\\times 6.16}{\\sqrt{0.51}}=\\frac{4.312}{0.714}=6.04
Step 4: Determine the critical region. At the \\alpha = 0.05 level of significance (two-tailed) with df = n - 2 = 40 - 2 = 38, the critical value of t is approximately 2.02.
Step 5: Decision. Since the calculated t is greater than the critical t, i.e. 6.04 > 2.02, the null hypothesis is rejected.
Step 6: Conclusion. Based on the results obtained, we conclude that there is a significant relationship between students' scores in MAT 201 and MAT 301.
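A brief Python sketch of this test is given below. It assumes the form of the statistic used in the substitution step, t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom; the function name correlation_t is hypothetical.

```python
import math


def correlation_t(r, n):
    """Return (t, df) for testing a Pearson r against zero."""
    t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_value, n - 2


# Values from the worked example: r = 0.70, n = 40
t, df = correlation_t(0.70, 40)
print(f"t = {t:.2f}, df = {df}")
```

The result, about 6.04, is well above any tabled critical value near 2.02, so the script supports the decision to reject the null hypothesis.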
Student Activity
Differentiate between parametric and non-parametric test
State three conditions for using parametric test.
What is t-test?
Differentiate between t-test and z-test
What is z-test?
What are those conditions to be looked into, before choosing t-test?
Analyse the given data representing the sets of scores from day and boarding schools. Use the t-test to determine whether or not there is a significant difference.
Day (D) | 26 | 15 | 8 | 44 | 26 | 13 | 38 | 24 | 13 | 29
Boarding (B) | 20 | 4 | 9 | 36 | 20 | 3 | 25 | 10 | 6 | 14
The researcher obtained the following scores for the experimental and control groups.
Experimental Group | 30 | 64 | 47 | 38 | 59 | 81 | 44
Control Group | 20 | 24 | 31 | 18 | 57 | 26 | 10
Find out whether or not these sets of scores are significantly different, using the t-test for non-independent samples.
Use the t-test for independent samples with the data provided below:
Group 1 | 10 | 11 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Group 2 | 9 | 10 | 12 | 13 | 13 | 13 | 14 | 14 | 15 | 16
Are the results significantly different or not?
Suppose the researcher obtains the following sets of scores:
Score (x1) | 3 | 4 | 5 | 6 | 2 | 7 | 8 | 9 | 10 | 11
Score (x2) | 2 | 3 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6
Using the Z-test, find out whether the sets of scores are significantly different.
Three researchers conducted studies and obtained the data provided below:
Researcher | Population Mean | Sample Mean | Sample Size | Standard Deviation
1st Researcher | 55% | 59.85 | 25 | 8.50
2nd Researcher | 65% | 70.15 | 45 | 11.50
3rd Researcher | 58% | 65.01 | 40 | 14.50
Find out whether the performances are significantly different, using the \\alpha = 0.05 level of significance.
In a research study, it was found that the correlation coefficient of two variables was 0.85 and the number of respondents was 50. Propose a null hypothesis and test it at the \\alpha = 0.05 level.
11.6 Analysis of Variance (F-test)
R. A. Fisher introduced the Analysis of Variance (ANOVA) in 1923. Since then, researchers have used it frequently and widely. It is a parametric test that assesses whether the mean scores of three or more groups differ significantly. Whenever a researcher wants to find out whether two or more independent samples, taken from populations assumed to have similar mean scores, differ significantly in their mean scores, the F-test is the most suitable statistical test. ANOVA not only detects such differences but also helps to identify the source or sources of a significant difference.
The basic rule of ANOVA is to compare the amount of variance 'between the samples' with the amount of variance 'within the samples'. This comparison is carried out by dividing the 'between samples' variance by the 'within samples' variance to obtain a ratio known as the F-ratio. There are two major types of ANOVA: one-way ANOVA and two-way ANOVA.
11.6.1 Computation of ANOVA (F-test)
The researcher must note that some basic assumptions have to be considered before applying the F-test in any research study. These assumptions are:
The samples selected from the population should be independent random samples.
The populations from which the samples are drawn should be normally distributed.
The data generated should be on an interval scale.
The variances should be homogeneous.
In carrying out the F-test, the following steps should be considered:
Step 1: Calculate the sum of scores (∑X) and the sum of squared scores (∑X²) for each group in the data provided.
Step 2: Combine the scores of all groups into one composite group and calculate the total sum of squares, known as the total group variance (Vt):
SS_{total} (V_{t}) = \\sum X^{2} - \\frac{(\\sum X)^{2}}{n}
Step 3: Calculate the between-groups sum of squares, known as the between-groups variance (Vb). It equals the difference between the total group variance and the within-groups variance (Vt − Vw = Vb). The formula is given as:
SS_{between} (V_{b}) = \\sum \\frac{(\\sum X_{g})^{2}}{n_{g}} - \\frac{(\\sum X)^{2}}{n}
where SSbetween = between-groups sum of squares
ng = the number of individual scores in each group
n = the total number of individual scores in all the groups
\\frac{(\\sum X_{g})^{2}}{n_{g}} = the sum of each group's raw scores, squared and divided by ng
\\frac{(\\sum X)^{2}}{n} = the sum of all raw scores, squared and divided by n
Step 4: Calculate the within-groups sum of squares, known as the within-groups variance (Vw), which pools the variation of each group calculated separately. It can be obtained as SStotal − SSbetween = SSwithin, or directly from the formula:
SS_{within} (V_{w}) = \\sum \\left[\\sum X_{g}^{2} - \\frac{(\\sum X_{g})^{2}}{n_{g}}\\right]
Step 5: Determine the degrees of freedom, which are given as:
Total degrees of freedom (dftotal) = n − 1
Between-groups degrees of freedom (dfbetween) = k − 1
Within-groups degrees of freedom (dfwithin) = n − k
where n = total number of cases and k = number of groups.
Step 6: Compute the mean squares. The between-groups mean square is the between-groups sum of squares divided by the between-groups df:
MS_{between} = \\frac{SS_{between}}{df_{between}}
Similarly, the within-groups mean square is the within-groups sum of squares divided by the within-groups df:
MS_{within} = \\frac{SS_{within}}{df_{within}}
Step 7: Compute the F-ratio by dividing the between-groups variance by the within-groups variance:
F = \\frac{MS_{between}}{MS_{within}} = \\frac{V_{b}}{V_{w}}
Let us demonstrate how to calculate the F-test using the example below:
Develop an ANOVA table for the following data on the scores of three groups of students, each taught with a different teaching strategy. Do the strategies differ significantly in their effect on the performance of the students at the \\alpha = 5% level of significance?
Table 11.4: Data on Set of Scores for the Three Groups
Discovery | 2 | 3 | 5 | 7 | 6 | 5 | 4 | 3 | 5
Scaffolding | 4 | 3 | 2 | 5 | 7 | 9 | 5 | 2 | 1 | 3
Laboratory | 5 | 7 | 6 | 7 | 5 | 3 | 2 | 4 | 6 | 2 | 1
Let us demonstrate the calculation of the F-test with the data in Table 11.4 by using the steps for hypothesis testing.
Step 1: Statement of hypotheses
H0: \\mathbf{\\mu}_{1}=\\mathbf{\\mu}_{2}=\\mathbf{\\mu}_{3}
H1: not all the group means are equal
Step 2: Set the level of significance at \\alpha = 0.05
Step 3: Select the appropriate test statistic for the research study. The F-test is the appropriate test statistic:
F = \\frac{\\text{between-samples variance}}{\\text{within-samples variance}} = \\frac{V_{b}}{V_{w}}
with k − 1 degrees of freedom for the numerator and n − k degrees of freedom for the denominator, where k stands for the number of treatments and n for the total number of observations.
Now draw up a table of values to calculate ∑X and ∑X² for each group.
Table 11.4: Data of F-Statistic for Three Groups
Group 1: X1 | X1² | Group 2: X2 | X2² | Group 3: X3 | X3²
2 | 4 | 4 | 16 | 5 | 25
3 | 9 | 3 | 9 | 7 | 49
5 | 25 | 2 | 4 | 6 | 36
7 | 49 | 5 | 25 | 7 | 49
6 | 36 | 7 | 49 | 5 | 25
5 | 25 | 9 | 81 | 3 | 9
4 | 16 | 5 | 25 | 2 | 4
3 | 9 | 2 | 4 | 4 | 16
5 | 25 | 1 | 1 | 6 | 36
  |   | 3 | 9 | 2 | 4
  |   |   |   | 1 | 1
n1 = 9, n2 = 10, n3 = 11
∑X1 = 40, ∑X2 = 41, ∑X3 = 48
∑X1² = 198, ∑X2² = 223, ∑X3² = 254
∑X = ∑X1 + ∑X2 + ∑X3 = 40 + 41 + 48 = 129
Calculate the total sum of squares using the formula:
SS_{total} = \\sum X^{2} - \\frac{(\\sum X)^{2}}{n}
Since ∑X1² = 198, ∑X2² = 223, ∑X3² = 254 and ∑X = 40 + 41 + 48 = 129, substituting in the formula above gives:
SS_{total} = 198 + 223 + 254 - \\frac{(129)^{2}}{30} = 675 - 554.7 = 120.3
Next, calculate the between-groups sum of squares using the formula:
SS_{between} = \\sum \\frac{(\\sum X_{g})^{2}}{n_{g}} - \\frac{(\\sum X)^{2}}{n}
= \\frac{40^{2}}{9}+\\frac{41^{2}}{10}+\\frac{48^{2}}{11}-\\frac{129^{2}}{30}
= \\frac{1600}{9}+\\frac{1681}{10}+\\frac{2304}{11}-\\frac{16641}{30}
= 177.78 + 168.10 + 209.46 - 554.70
= 555.34 - 554.70 = 0.64
The within-groups sum of squares is obtained from the formula:
SS_{within} = \\sum \\left[\\sum X_{g}^{2} - \\frac{(\\sum X_{g})^{2}}{n_{g}}\\right]
Substituting in the formula, we have:
Group 1: \\sum X_{1}^{2} - \\frac{(\\sum X_{1})^{2}}{n_{1}} = 198 - \\frac{40^{2}}{9} = 20.22
Group 2: \\sum X_{2}^{2} - \\frac{(\\sum X_{2})^{2}}{n_{2}} = 223 - \\frac{41^{2}}{10} = 54.90
Group 3: \\sum X_{3}^{2} - \\frac{(\\sum X_{3})^{2}}{n_{3}} = 254 - \\frac{48^{2}}{11} = 44.54
SS_{within} = 20.22 + 54.90 + 44.54 = 119.66
Alternatively, the within-groups sum of squares can be obtained from SSwithin = SStotal − SSbetween:
SS_{within} = 120.3 - 0.64 = 119.66
The degrees of freedom for the different sources of variation are obtained as:
Total degrees of freedom (dftotal) = n − 1 = 30 − 1 = 29
Between-groups degrees of freedom (dfB) = k − 1 = 3 − 1 = 2
Within-groups degrees of freedom (dfW) = n − k = 30 − 3 = 27
The variance estimates (mean squares) between groups and within groups are obtained by dividing SSbetween by dfbetween and SSwithin by dfwithin:
MS_{B} = \\frac{SS_{B}}{df_{B}} = \\frac{0.64}{2} = 0.32
MS_{W} = \\frac{SS_{W}}{df_{W}} = \\frac{119.66}{27} = 4.43
The F statistic is calculated as F = \\frac{MS_{B}}{MS_{W}}. Substituting into the formula, we obtain:
F = \\frac{0.32}{4.43} = 0.07
Now we search for the critical value in the F-table, with the between-groups degrees of freedom read across the top of the table and the within-groups degrees of freedom read down the left side. Here df (between) = 2 and df (within) = 27 at \\alpha = 0.05.
Step 4: Decision. Since the calculated F value (0.07) is less than the critical F value of 3.35 for df = (2, 27), H0 is retained.
Step 5: Conclusion. It is concluded that there is no significant difference among the means of the three groups. The small differences observed may be due to chance or sampling error.
The summary of our analysis can be presented in a table as:
Table 11.5: One-way ANOVA Summary Table
Source of Variance | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Groups | 0.64 | 2 | 0.32 | 0.07
Within Groups | 119.66 | 27 | 4.43 |
Total | 120.3 | 29 | |
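The whole one-way ANOVA computation can be reproduced with the Python sketch below, which follows the raw-score formulas of this section. The function name one_way_anova and the dictionary keys are hypothetical choices made for illustration. Because the script keeps full precision, its sums of squares may differ from the hand-rounded figures in the second decimal place, while the F-ratio still rounds to 0.07.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares and F."""
    n = sum(len(g) for g in groups)   # total number of scores
    k = len(groups)                   # number of groups
    grand_sum = sum(sum(g) for g in groups)
    correction = grand_sum ** 2 / n   # (sum of all X)^2 / n

    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Scores from Table 11.4
discovery = [2, 3, 5, 7, 6, 5, 4, 3, 5]
scaffolding = [4, 3, 2, 5, 7, 9, 5, 2, 1, 3]
laboratory = [5, 7, 6, 7, 5, 3, 2, 4, 6, 2, 1]

result = one_way_anova([discovery, scaffolding, laboratory])
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The calculated F is far below any tabled critical value for (2, 27) degrees of freedom, in line with the decision to retain H0.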
Now we have illustrated how to compute t-test, z-test, and ANOVA in this book. However, you will meet other tests such as ANCOVA, MANOVA and MANCOVA in other books.
Students Activity
The following are sets of test scores for three sample groups:
S/N | 1 | 2 | 3 | 4 | 5
X1 | 3 | 5 | 4 | 5 | 4
X2 | 3 | 4 | 5 | 6 | 7
X3 | 4 | 4 | 4 | 6 | 8
Determine whether or not the sets of scores are significantly different.
Construct an ANOVA table for the following data:
Method | 1 | 2 | 3 | 4
Conventional | 7 | 8 | 4 | 9
Discussion | 6 | 6 | 4 | 8
Experimental | 6 | 5 | 4 | 5
Use the data above to test a null hypothesis at α = 0.05 and state whether the methods differ significantly.