CHAPTER ELEVEN

INFERENTIAL STATISTICS I

11.1 Introduction

Inferential statistics describe a population on the basis of a sample’s behaviour. They involve determining whether the outcomes observed in one or more samples match the outcomes that would have been obtained for the complete population. Inferential statistics are used to predict population features from a randomly chosen sample and to estimate population parameters from sample data. They are also used to test formulated hypotheses so that valid conclusions can be drawn from research studies. This chapter deals with the types of inferential statistics and with the computation of the Z-test, the t-test (including its use with correlation coefficients), and ANOVA or the F-ratio.

11.2 Objectives

At the end of this chapter, you should be able to:

learn the types of inferential statistics

describe situations where the z-test, t-test and F-test are applicable

compute z-test with relevant examples

compute t-test with relevant examples

compute f-test with relevant examples

use t-test for hypothesis testing for difference between population and sample means

use t-test for hypothesis testing for difference between correlation coefficients.

11.3 Types of Inferential Statistics

Inferential statistics are categorized into two types: parametric statistics and non-parametric statistics. Both are useful for testing hypotheses in research. However, parametric statistics are more powerful and generally preferred. "More powerful" means that, when their assumptions are satisfied, parametric tests are more likely to detect a difference or relationship that really exists; in return, certain assumptions must be checked in order to make a valid decision. The following three assumptions are very important when applying parametric statistics to test a formulated hypothesis:

The variable measured is normally distributed, or at least the form of its distribution is known.

The data collected must be from interval or ratio scale of measurement

The variable must be independently selected without affecting the selection of any other one.

The researcher must note that if any one or more of these assumptions is not met, a non-parametric inferential test should be employed. This chapter discusses only the t-test, the Z-test and ANOVA.

11.4 The Z-Test

The Z-statistic is applied to investigate whether two means are significantly different. It is used for testing hypotheses when the sample size is equal to or greater than 30 (n ≥ 30) and the population parameters \mu and \sigma are known. The Z-test formula is given as:

Z = \frac{\overline{x}_{1}-\overline{x}_{2}}{SD_{\overline{x}}}

Where SD_{\overline{x}} = standard error of the difference between means, given by

SD_{\overline{x}} = \sqrt{\frac{S_{1}^{2}}{n_{1}}+\frac{S_{2}^{2}}{n_{2}}}

\therefore Z = \frac{\overline{x}_{1}-\overline{x}_{2}}{\sqrt{\frac{S_{1}^{2}}{n_{1}}+\frac{S_{2}^{2}}{n_{2}}}}

Where \overline{x}_{1} = mean of group 1

\overline{x}_{2} = mean of group 2

S_{1}^{2}, S_{2}^{2} = variances of groups 1 and 2

n_{1}, n_{2} = sizes of groups 1 and 2

Now, let us demonstrate the use of Z-test in hypothesis testing with an example given below:

Analyze the data below, which represent the scores obtained by five students in Mathematics and Chemistry tests. The sample is kept small so that the arithmetic is easy to follow; in practice the Z-test is applied when n ≥ 30.

Table 11.1: Sets of Scores obtained in Mathematics and Chemistry Test

Students           1    2    3    4    5
Maths Scores X1    4    5    6    7    8
Chem Scores X2     4    4    5    3    4

Based on the data provided in Table 11.1, we shall solve this problem by going through the process of hypothesis testing.

Step 1: Statement of Hypotheses

H0: \mu_{1} = \mu_{2}

H1: \mu_{1} \neq \mu_{2}

Step 2: Determine the level of significance

Assuming \\alpha =0.05 is selected

Step 3: Calculate the test statistic by applying the formula:

Z = \frac{\overline{x}_{1}-\overline{x}_{2}}{\sqrt{\frac{S_{1}^{2}}{n_{1}}+\frac{S_{2}^{2}}{n_{2}}}}

x1    x1 - x̄1    (x1 - x̄1)²    x2    x2 - x̄2    (x2 - x̄2)²
4     -2          4              4     0           0
5     -1          1              4     0           0
6     0           0              5     1           1
7     1           1              3     -1          1
8     2           4              4     0           0

\sum x_{1} = 30, \sum (x_{1}-\overline{x}_{1})^{2} = 10, \sum x_{2} = 20, \sum (x_{2}-\overline{x}_{2})^{2} = 2

\overline{x}_{1} = \frac{30}{5} = 6, \overline{x}_{2} = \frac{20}{5} = 4

S_{1} = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.41, S_{2} = \sqrt{\frac{2}{5}} = \sqrt{0.4} = 0.63

Now substitute in Z = \frac{\overline{x}_{1}-\overline{x}_{2}}{\sqrt{\frac{S_{1}^{2}}{n_{1}}+\frac{S_{2}^{2}}{n_{2}}}}

Z = \frac{6-4}{\sqrt{\frac{(1.41)^{2}}{5}+\frac{(0.63)^{2}}{5}}} = \frac{2}{\sqrt{\frac{1.99}{5}+\frac{0.40}{5}}} = \frac{2}{\sqrt{0.40+0.08}} = \frac{2}{\sqrt{0.48}} = \frac{2}{0.69} = 2.90

Step 4: Determine the critical region. At the \alpha = 0.05 level of significance (two-tailed), the critical or table value of Z is ±1.96, and the calculated Z-value is 2.90.

Step 5: Decision. The calculated Z-value (2.90) is greater than the table value (1.96). Therefore, the null hypothesis is rejected.

Step 6: Conclusion. From the result obtained, it is concluded that there is a significant difference between the two mean scores.
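The worked example above can be checked with a short script. This is only a sketch using the Python standard library; the function name `z_two_means` is illustrative and not part of the chapter. It follows the chapter's convention of dividing by n when computing each variance. Without intermediate rounding it gives Z of about 2.89 rather than 2.90, which does not change the decision.

```python
from math import sqrt
from statistics import NormalDist, fmean, pvariance

maths = [4, 5, 6, 7, 8]   # X1 from Table 11.1
chem = [4, 4, 5, 3, 4]    # X2 from Table 11.1


def z_two_means(x1, x2):
    """Z = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2), variances divided by n."""
    standard_error = sqrt(pvariance(x1) / len(x1) + pvariance(x2) / len(x2))
    return (fmean(x1) - fmean(x2)) / standard_error


alpha = 0.05
z = z_two_means(maths, chem)
# Two-tailed critical value of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Retain H0")
```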

11.5 The T-Test

The t-test is also known as Student’s t-test. William Sealy Gosset introduced it as an inferential statistic in 1908, publishing under the pseudonym "Student". The t-test offers a number of methods for testing hypotheses; however, we will cover only the following in this unit:

Use the t-test to determine if two independent samples’ mean scores differ significantly.

Use the t-test to determine whether the mean scores of two non-independent samples differ significantly.

Before computing the t-test, some conditions must be satisfied. These are as follows:

A comparison of two groups is required.

The samples must be drawn from a normally distributed population.

The population variances are homogeneous.

The samples are chosen independently and at random from the population.

The variable must be measured on an interval or ratio scale.

Both large and small samples can be used for the t-test, although the sample size should not be fewer than ten.

11.5.1 Computation of T-Test for Difference Between Two Independent Samples

When the mean scores of two independent samples are given, whether there is a significant difference between them can be assessed with:

t = \frac{\overline{x}_{1}-\overline{x}_{2}}{\sqrt{\frac{S_{1}^{2}}{n_{1}}+\frac{S_{2}^{2}}{n_{2}}}}

where \overline{x}_{1} = mean score of sample group 1

\overline{x}_{2} = mean score of sample group 2

S_{1}^{2}, S_{2}^{2} = variances of sample groups 1 and 2

n_{1}, n_{2} = sample sizes.

Let us now demonstrate the calculation of the t-test with the data provided in Table 11.2, on the assumption that the following conditions for using the t-test are satisfied:

The distribution of the values in both samples is normal.

The data collected are on an interval scale of measurement.

The samples are randomly selected.

The sample variances are homogeneous.

Table 11.2: Post-test scores of two randomly selected samples of pre-service teachers

S/N        1    2    3    4    5    6    7    8    9    10   11   12
Group A    4    16   16   15   14   15   10   17   17   18   18   20
Group B    10   12   18   13   6    16   10   15   5    19   9    11

Are the results significantly different or not?

The researcher is interested in determining whether a significant difference exists between the mean scores of Group A and Group B, so the t-test for unrelated (independent) samples applies. We shall adopt the steps for hypothesis testing in solving the problem in Table 11.2.

Step 1: Statement of Hypotheses

H0: \\mathbf{\\mu }_{\\mathbf{A}}= \\mathbf{\\mu }_{\\mathbf{B}}

H1: \\mathbf{\\mu }_{\\mathbf{A}}\\neq \\mathbf{\\mu }_{\\mathbf{B}}

Step 2: Level of significance is at 0.05

Step 3: Calculate the t-test by applying the formula:

t = \\frac{\\overline{\\mathbf{x}}_{1}-\\overline{\\mathbf{x}}_{2}}{\\sqrt[]{\\frac{{\\mathbf{S}_{1}}^{2}}{\\mathbf{n}_{1}}+\\frac{{\\mathbf{S}_{2}}^{2}}{\\mathbf{n}_{2}}}}

First, let us label the groups as Group A = X1 and Group B = X2, and then calculate the mean score and the squared deviations for each group.

x1    x1 - x̄1    (x1 - x̄1)²    x2    x2 - x̄2    (x2 - x̄2)²
4     -11         121            10    -2          4
16    1           1              12    0           0
16    1           1              18    6           36
15    0           0              13    1           1
14    -1          1              6     -6          36
15    0           0              16    4           16
10    -5          25             10    -2          4
17    2           4              15    3           9
17    2           4              5     -7          49
18    3           9              19    7           49
18    3           9              9     -3          9
20    5           25             11    -1          1

\sum x_{1} = 180, \sum (x_{1}-\overline{x}_{1})^{2} = 200, \sum x_{2} = 144, \sum (x_{2}-\overline{x}_{2})^{2} = 214

\overline{x}_{1} = \frac{180}{12} = 15, S_{1}^{2} = \frac{200}{12} = 16.67, S_{1} = \sqrt{16.67} = 4.08

\overline{x}_{2} = \frac{144}{12} = 12, S_{2}^{2} = \frac{214}{12} = 17.83, S_{2} = \sqrt{17.83} = 4.22

Therefore, t = \frac{15.00-12.00}{\sqrt{\frac{16.67}{12}+\frac{17.83}{12}}} = \frac{3}{\sqrt{1.389+1.486}} = \frac{3}{\sqrt{2.875}} = \frac{3}{1.696} = 1.77

Note that the formula uses the variances (S^{2} = 16.67 and 17.83), not the standard deviations (4.08 and 4.22).

Step 4: The critical region is determined at the \alpha = 0.05 level of significance with degrees of freedom (df) = 12 + 12 - 2 = 22. The t-table gives a critical value of 2.074.

Step 5: Decision. The calculated t is 1.77 and the critical value is 2.074. Since the calculated t is less than the critical value, the null hypothesis is retained.

Step 6: Conclusion. Based on the result obtained, we conclude that there is no significant difference between the mean scores of the two groups.
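The computation for Table 11.2 can be reproduced with the sketch below (standard-library Python; `t_independent` is an illustrative name, not from the chapter). It substitutes the variances 16.67 and 17.83 into the formula exactly as written, with each variance divided by n, and takes the critical value 2.074 from the t-table rather than computing it.

```python
from math import sqrt
from statistics import fmean, pvariance

group_a = [4, 16, 16, 15, 14, 15, 10, 17, 17, 18, 18, 20]   # X1, Table 11.2
group_b = [10, 12, 18, 13, 6, 16, 10, 15, 5, 19, 9, 11]     # X2, Table 11.2


def t_independent(x1, x2):
    """t = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2), variances divided by n."""
    standard_error = sqrt(pvariance(x1) / len(x1) + pvariance(x2) / len(x2))
    return (fmean(x1) - fmean(x2)) / standard_error


t = t_independent(group_a, group_b)
df = len(group_a) + len(group_b) - 2
t_critical = 2.074   # table value for df = 22 at alpha = 0.05 (two-tailed)

print(f"sums: {sum(group_a)}, {sum(group_b)}")
print(f"t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > t_critical else "Retain H0")
```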

11.5.2 Computation of T-Test for Non-Independent Samples

Researchers occasionally need to compare the performance of the same (or matched) students in two related courses. When this occurs, the t-test for non-independent samples is used to determine whether the mean scores of the two matched samples differ significantly from one another. The calculation formula is as follows:

t = \frac{\sum d}{\sqrt{\frac{N\sum d^{2}-(\sum d)^{2}}{N-1}}}

Where d = difference between each pair of matched scores

\sum d = sum of the differences between the matched scores

d^{2} = square of the difference between each pair of matched scores

N = number of matched pairs

N - 1 = number of degrees of freedom

For example, a researcher administered tests in both Mathematics and Chemistry to the same ten students and obtained the following scores:

Table 11.3: Data Obtained from Two Subjects

S/N            1    2    3    4    5    6    7    8    9    10
Mathematics    10   25   50   37   80   23   48   63   40   35
Chemistry      29   48   46   17   30   45   19   48   42   50

Is the result significantly different?

Let us solve the problem in Table 11.3 using the procedure for testing hypothesis.

Step 1: Statement of hypotheses

H0: \\mathbf{\\mu }_{1} = \\mathbf{\\mu }_{2}

H1\\colon \\mathbf{\\mu }_{1} \\neq \\mathbf{\\mu }_{2}

Step 2: Selection of level of significance. \\alpha = 0.05 two-tailed test

Step 3: Calculate the t-test by going through the data provided in table 11.3

Students    Mathematics (x1)    Chemistry (x2)    d = x1 - x2    d²
1           10                  29                -19            361
2           25                  48                -23            529
3           50                  46                4              16
4           37                  17                20             400
5           80                  30                50             2500
6           23                  45                -22            484
7           48                  19                29             841
8           63                  48                15             225
9           40                  42                -2             4
10          35                  50                -15            225
Total                                             37             5585

Substitute in the given formula: t = \frac{\sum d}{\sqrt{\frac{N\sum d^{2}-(\sum d)^{2}}{N-1}}}

\sum d = 37, \sum d^{2} = 5,585, (\sum d)^{2} = (37)^{2} = 1,369

t = \frac{37}{\sqrt{\frac{10\times 5585-(37)^{2}}{10-1}}} = \frac{37}{\sqrt{\frac{55850-1369}{9}}} = \frac{37}{\sqrt{\frac{54481}{9}}} = \frac{37}{\sqrt{6053.44}} = \frac{37}{77.80} = 0.48

Step 4: The critical region is determined at \alpha = 0.05 (two-tailed) with df = N - 1 = 10 - 1 = 9. The t-table gives a critical value of 2.262 for df = 9.

Step 5: Decision. Since the calculated t (0.48) is less than the critical t (2.262), H0 is retained.

Step 6: Conclusion. Since t-calculated < t-table, we conclude that there is no significant difference between the Mathematics and Chemistry results.
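The matched-samples formula can be traced step by step with the sketch below (standard-library Python; the variable and function names are illustrative). It forms each difference d = x1 - x2 with its sign, which matters for the sum of the differences, and takes the two-tailed critical value for df = 9 from a t-table.

```python
from math import sqrt

maths = [10, 25, 50, 37, 80, 23, 48, 63, 40, 35]   # x1, Table 11.3
chem = [29, 48, 46, 17, 30, 45, 19, 48, 42, 50]    # x2, Table 11.3

# Signed difference for each matched pair.
d = [a - b for a, b in zip(maths, chem)]
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)


def t_paired(sum_d, sum_d2, n):
    """t = sum(d) / sqrt((N * sum(d^2) - (sum d)^2) / (N - 1))."""
    return sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))


n = len(d)
t = t_paired(sum_d, sum_d2, n)
t_critical = 2.262   # table value for df = 9 at alpha = 0.05 (two-tailed)

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}, t = {t:.2f}")
print("Reject H0" if abs(t) > t_critical else "Retain H0")
```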

11.5.3 T-test for Difference Between Population and Sample Means

When a researcher wants to compare a sample mean with a population mean, the following formula is used:

t = \frac{\overline{x}-\mu}{S/\sqrt{n-1}}

Where \overline{x} = sample mean

\mu = population mean

S = standard deviation of the sample

n = sample size.

For example, a researcher conducted a study and obtained 25.50% as the mean post-test achievement score of all SS II students in senior secondary schools in Bida Local Government Area. Another researcher carried out a study to verify that result, using a sample of 15 SS II students from the same study area. He gave a treatment on four areas of mathematics for six weeks. At the end of the treatment, he administered the Mathematics Achievement Test (MAT) and obtained a mean of 30.10 with a standard deviation of 7.5.

Based on the data obtained, we shall solve the problem by going through the process of hypothesis testing.

Step 1: Statement of Hypotheses

H0\\colon \\mathrm{\\mu}=\\overline{x}

H1\\colon \\mathrm{\\mu}\\neq \\overline{x}

Step 2: Determine the level of significance

Assuming \\alpha =0.05 is selected.

Step 3: Compute the test statistic by using the formula:

t = \frac{\overline{x}-\mu}{S/\sqrt{n-1}}

Where \overline{x} = 30.10, \mu = 25.50, S = 7.5, n = 15.

\therefore t = \frac{30.10-25.50}{7.5/\sqrt{15-1}} = \frac{4.6}{7.5/\sqrt{14}} = \frac{4.6}{7.5/3.7417} = \frac{4.6}{2.0044} = 2.29

Step 4: Determine the critical region at the \alpha = 0.05 level of significance. With df = 15 - 1 = 14 and a two-tailed test, the critical value from the t-table is 2.145.

Step 5: Decision rule: if the calculated value is greater than the critical value, the null hypothesis is rejected; if the calculated value is less than the critical value, the null hypothesis is retained.

From the result obtained, the calculated t is greater than the critical t, i.e. 2.29 > 2.145. We therefore reject the null hypothesis.

Step 6: Conclusion: we conclude that there is a significant difference between the sample mean and the population mean.
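A brief sketch of the same calculation, assuming the chapter's formula with n - 1 under the square root (the function name `t_one_sample` is illustrative; the critical value is read from a t-table for df = 14, two-tailed):

```python
from math import sqrt


def t_one_sample(sample_mean, population_mean, sd, n):
    """t = (sample mean - population mean) / (S / sqrt(n - 1))."""
    return (sample_mean - population_mean) / (sd / sqrt(n - 1))


t = t_one_sample(sample_mean=30.10, population_mean=25.50, sd=7.5, n=15)
t_critical = 2.145   # table value for df = 14 at alpha = 0.05 (two-tailed)

print(f"t = {t:.2f}")
print("Reject H0" if abs(t) > t_critical else "Retain H0")
```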

11.5.4 Computation for Difference between Correlation Coefficients

There are two approaches to testing hypotheses about correlations. The first, with which you are familiar, is to use the table of critical values to find out whether the correlation coefficient is significant. The second is to subject the computed correlation coefficient directly to a t-test, using the formula below:

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}

For instance, a lecturer wants to investigate whether students’ scores in MAT 201 have any significant relationship with their scores in MAT 301. He applied the Pearson Product Moment Correlation and obtained r = 0.70 with N = 40. Find out whether the relationship is significant.

Let us use the process of hypothesis testing to solve the above problem.

Step 1: Propose a null hypothesis

There is no significant relationship between the students’ scores in MAT 201 and MAT 301.

Step 2: Select the level of significance. At \\alpha =0.05 level of significance is assumed.

Step 3: Calculate the t-test using the formula:

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}

Given that r = 0.70 and n = 40, substituting in the formula gives

t = \frac{0.70\sqrt{40-2}}{\sqrt{1-0.70^{2}}} = \frac{0.70\sqrt{38}}{\sqrt{1-0.49}} = \frac{0.70\times 6.16}{\sqrt{0.51}} = \frac{4.312}{0.714} = 6.04

Step 4: Determine the critical region. At the \alpha = 0.05 level of significance with df = n - 2 = 40 - 2 = 38, the critical value is approximately 2.02 (the t-table value for df = 40 is 2.021). The calculated t is 6.04.

Step 5: Decision: since the calculated t is greater than the critical t, i.e. 6.04 > 2.02, the null hypothesis is rejected.

Step 6: Conclusion: based on the results obtained, we conclude that there is a significant relationship between students’ scores in MAT 201 and MAT 301.
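The conversion of a correlation coefficient to a t-value can be sketched in a few lines of standard-library Python (`t_from_r` is an illustrative name; the critical value is the approximate table value for df = 38):

```python
from math import sqrt


def t_from_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r, n = 0.70, 40
t = t_from_r(r, n)
df = n - 2
t_critical = 2.02   # approximate table value for df = 38 at alpha = 0.05 (two-tailed)

print(f"t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > t_critical else "Retain H0")
```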

Student Activity

Differentiate between parametric and non-parametric test

State three conditions for using parametric test.

What is t-test?

Differentiate between t-test and z-test

What is z-test?

What are those conditions to be looked into, before choosing t-test?

Analyse the given data representing the sets of scores from day and boarding schools. Use the t-test to determine whether or not there is a significant difference.

Day (D)         26   15   8    44   26   13   38   24   13   29
Boarding (B)    20   4    9    36   20   3    25   10   6    14

The researcher obtained the following scores for the experimental and control groups.

Experimental Group    30   64   47   38   59   81   44
Control Group         20   24   31   18   57   26   10

Find out whether these sets of scores are significantly different or not, using the t-test for non-independent samples.

Use the t-test for independent samples with the data provided below.

Group 1    10   11   13   14   15   16   17   18   19   20
Group 2    9    10   12   13   13   13   14   14   15   16

Are the results significantly different or not?

Suppose the researcher obtains the following sets of scores:

Score (x1)    3    4    5    6    2    7    8    9    10   11
Score (x2)    2    3    3    3    4    4    5    5    6    6

Using the z-test, find out whether the sets of scores are significantly different.

The researchers conducted studies and obtained the data provided below:

                  Population Mean    Sample Mean    Sample Size    Standard Deviation
1st Researcher    55%                59.85          25             8.50
2nd Researcher    65%                70.15          45             11.50
3rd Researcher    58%                65.01          40             14.50

Find out whether the performances are significantly different, using the \alpha = 0.05 level of significance.

In a research study, it was found that the correlation coefficient of two variables was 0.85 and the number of respondents was 50. Propose a null hypothesis and test it at the \alpha = 0.05 level.

11.6 Analysis of Variance (F-test)

R. A. Fisher introduced the Analysis of Variance (ANOVA) in 1923. Since then, researchers have used it frequently and broadly. It is a parametric test that assesses whether the mean scores of three or more groups differ significantly. Whenever a researcher wants to find out whether two or more independent samples have significantly different mean scores, the F-test is the appropriate statistical test. ANOVA not only detects whether a difference exists but also partitions the total variation into its sources: variation between the groups and variation within the groups.

The basic rule of ANOVA is to compare the amount of variance between the samples with the amount of variance within the samples. This comparison is carried out by dividing the between-samples variance by the within-samples variance to obtain a ratio known as the F-ratio. There are two major types of ANOVA: one-way ANOVA and two-way ANOVA.

11.6.1 Computation of ANOVA (F-test)

The researcher must note that some basic assumptions are considered before applying F-test in any research studies. These assumptions are:

The samples selected from the population should be independent random samples.

The populations from which the samples are drawn should be normally distributed.

The data generated should be interval in nature

Homogeneity of variances is necessary

In carrying out the F-test, the following steps should be considered:

Step 1: Calculate the sum of scores (\sum X) and the sum of squared scores (\sum X^{2}) for each group in the data provided.

Step 2: Combine the scores of all the groups into one composite group and calculate the total sum of squares, which measures the total group variance (Vt):

SStotal (Vt) = \sum X^{2} - \frac{(\sum X)^{2}}{n}

Step 3: Calculate the between-groups sum of squares. The between-groups variance is the difference between the total group variance and the within-groups variance (Vt - Vw = Vb).

The formula is given as:

SSbetween (Vb) = \sum \frac{(\sum X_{g})^{2}}{n_{g}} - \frac{(\sum X)^{2}}{n}

Where SSbetween = between-groups sum of squares

n_{g} = the number of individual scores in each group

n = the total number of individual scores in all the groups

\frac{(\sum X_{g})^{2}}{n_{g}} = the sum of each group’s raw scores, squared and divided by n_{g}

\frac{(\sum X)^{2}}{n} = the sum of all raw scores, squared and divided by n

Step 4: Calculate the within-groups sum of squares (Vw), which pools the variation of each group calculated separately. It can be obtained as SStotal - SSbetween = SSwithin, or directly from the formula below:

SSwithin = \sum \left[\sum X_{g}^{2} - \frac{(\sum X_{g})^{2}}{n_{g}}\right]

Step 5: Determine all the degrees of freedom. The degrees of freedom are given as:

Total degrees of freedom (dftotal) = n - 1

Between-groups degrees of freedom (dfbetween) = k - 1

Within-groups degrees of freedom (dfwithin) = n - k

Where n = total number of cases and k = number of groups.

Step 6: Compute the mean squares. The between-groups mean square is the between-groups sum of squares divided by the between-groups degrees of freedom:

MSbetween = \frac{SS_{between}}{df_{between}}

Similarly, the within-groups mean square is the within-groups sum of squares divided by the within-groups degrees of freedom:

MSwithin = \frac{SS_{within}}{df_{within}}

Step 7: Compute the F-ratio:

F = \frac{MS_{between}}{MS_{within}} = \frac{V_{b}}{V_{w}} = \frac{\text{between-groups variance}}{\text{within-groups variance}}

Let us demonstrate how to calculate F-test using the example below:

Develop an ANOVA table for the following data on the scores of three groups of students after treatment with three teaching strategies. Do the strategies differ significantly in their effect on the performance of the students at the \alpha = 5% level of significance?

Table 11.4: Data on Set of Scores for the Three Groups

Discovery      2    3    5    7    6    5    4    3    5
Scaffolding    4    3    2    5    7    9    5    2    1    3
Laboratory     5    7    6    7    5    3    2    4    6    2    1

Let us demonstrate the calculation in F-test with the data in Table 11.4 by using the steps for hypothesis testing.

Step 1: Statement of hypotheses

H0: \mu_{1} = \mu_{2} = \mu_{3}

H1: not all the mean scores are equal

Step 2: Set the level of significance at \alpha = .05

Step 3: Select the appropriate test statistic for the research study. The F-test is the appropriate test statistic:

F = \frac{\text{between-sample variance}}{\text{within-sample variance}} = \frac{V_{b}}{V_{w}}

with k - 1 degrees of freedom for the numerator and n - k degrees of freedom for the denominator, where k stands for the number of treatments and n for the total number of observations.

Now construct a table of values to calculate \sum X and \sum X^{2} for each group.

Table 11.4: Data of F-Statistic for Three Groups

Group 1          Group 2          Group 3
x1     x1²       x2     x2²       x3     x3²
2      4         4      16        5      25
3      9         3      9         7      49
5      25        2      4         6      36
7      49        5      25        7      49
6      36        7      49        5      25
5      25        9      81        3      9
4      16        5      25        2      4
3      9         2      4         4      16
5      25        1      1         6      36
                 3      9         2      4
                                  1      1

n_{1} = 9, n_{2} = 10, n_{3} = 11

\sum X_{1} = 40, \sum X_{2} = 41, \sum X_{3} = 48

\sum X_{1}^{2} = 198, \sum X_{2}^{2} = 223, \sum X_{3}^{2} = 254

\sum X = \sum X_{1} + \sum X_{2} + \sum X_{3} = 40 + 41 + 48 = 129

Calculate the total sum of squares using the formula:

SStotal = \sum X^{2} - \frac{(\sum X)^{2}}{n}

Since \sum X_{1}^{2} = 198, \sum X_{2}^{2} = 223, \sum X_{3}^{2} = 254 and \sum X = 40 + 41 + 48 = 129,

substituting in the formula above gives:

SStotal = 198 + 223 + 254 - \frac{(129)^{2}}{30} = 675 - 554.7 = 120.3

Next, calculate the between-groups sum of squares using the formula: SSbetween = \sum \frac{(\sum X_{g})^{2}}{n_{g}} - \frac{(\sum X)^{2}}{n}

= \frac{40^{2}}{9} + \frac{41^{2}}{10} + \frac{48^{2}}{11} - \frac{129^{2}}{30}

= \frac{1600}{9} + \frac{1681}{10} + \frac{2304}{11} - \frac{16641}{30}

= 177.78 + 168.1 + 209.46 - 554.7

= 555.34 - 554.7 = 0.64

The within-groups sum of squares is obtained with the formula:

SSwithin = \sum \left[\sum X_{g}^{2} - \frac{(\sum X_{g})^{2}}{n_{g}}\right]

By substituting in the formula, we have:

Group 1: \sum X_{1}^{2} - \frac{(\sum X_{1})^{2}}{n_{1}} = 198 - \frac{40^{2}}{9} = 20.22

Group 2: \sum X_{2}^{2} - \frac{(\sum X_{2})^{2}}{n_{2}} = 223 - \frac{41^{2}}{10} = 54.9

Group 3: \sum X_{3}^{2} - \frac{(\sum X_{3})^{2}}{n_{3}} = 254 - \frac{48^{2}}{11} = 44.54

SSwithin = 20.22 + 54.9 + 44.54 = 119.66

Alternatively, the within-groups sum of squares can be obtained from the formula

SSwithin = SStotal - SSbetween. Substituting, we have:

SSwithin = 120.3 - 0.64 = 119.66

The degrees of freedom for the different sources of variation are obtained as follows:

Total degrees of freedom (dftotal) = n - 1 = 30 - 1 = 29

Between-groups degrees of freedom (dfB) = k - 1 = 3 - 1 = 2

Within-groups degrees of freedom (dfW) = n - k = 30 - 3 = 27

The variance estimates (mean squares) between groups and within groups are obtained by dividing SSbetween by dfbetween and SSwithin by dfwithin:

MSB = \frac{SS_{B}}{df_{B}} = \frac{0.64}{2} = 0.32

MSW = \frac{SS_{W}}{df_{W}} = \frac{119.66}{27} = 4.43

The F statistic is calculated using

F = \frac{MS_{B}}{MS_{W}}. Substituting into the formula, we obtain:

F = \frac{0.32}{4.43} = 0.07

Next, look up the critical value in the F-table: the between-groups degrees of freedom are read across the top of the table, and the within-groups degrees of freedom down the left side. Here df (between) = 2 and df (within) = 27 at \alpha = .05.

Step 4: Decision: the calculated F value (0.07) is less than the critical F value of 3.35 for df = (2, 27). Hence H0 is retained; that is, there is no significant difference among the means of the groups.

Step 5: Conclusion: it is concluded that there is no significant difference among the groups.

The small differences observed among the group means may be due to chance or sampling error.

The summary of our analysis can be presented in a table as:

Table 11.5: One-way ANOVA Summary Table

Source of Variance    Sum of Squares    Degree of Freedom    Mean Sum of Square    F
Between Groups        0.64              2                    0.32                  0.07
Within Groups         119.66            27                   4.43
Total                 120.3             29
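The entries of the summary table can be reproduced with the sketch below (standard-library Python; `one_way_anova` and the dictionary keys are illustrative names). It applies the raw-score formulas for the total, between-groups and within-groups sums of squares. Because it carries no intermediate rounding, it returns a between-groups sum of squares of about 0.63 and a within-groups sum of squares of about 119.67, which agree with the hand calculation to within rounding; the F-ratio is about 0.07 either way.

```python
groups = {
    "Discovery": [2, 3, 5, 7, 6, 5, 4, 3, 5],
    "Scaffolding": [4, 3, 2, 5, 7, 9, 5, 2, 1, 3],
    "Laboratory": [5, 7, 6, 7, 5, 3, 2, 4, 6, 2, 1],
}


def one_way_anova(samples):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    all_scores = [x for group in samples for x in group]
    n, k = len(all_scores), len(samples)
    correction = sum(all_scores) ** 2 / n                  # (sum X)^2 / n
    ss_total = sum(x * x for x in all_scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in samples) - correction
    ss_within = ss_total - ss_between
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total, "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(list(groups.values()))
f_critical = 3.35   # F-table value for df = (2, 27) at alpha = 0.05

for name, value in result.items():
    print(f"{name}: {value:.2f}")
print("Reject H0" if result["F"] > f_critical else "Retain H0")
```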

Now we have illustrated how to compute t-test, z-test, and ANOVA in this book. However, you will meet other tests such as ANCOVA, MANOVA and MANCOVA in other books.

Students Activity

The following are sets of test scores for three sample groups:

S/N    1    2    3    4    5
X1     3    5    4    5    4
X2     3    4    5    6    7
X3     4    4    4    6    8

Determine whether the sets of scores are significantly different or not.

Construct an ANOVA table for the following data:

Method          1    2    3    4
Conventional    7    8    4    9
Discussion      6    6    4    8
Experimental    6    5    4    5

Use the data above to test a null hypothesis at α = 0.05 and state whether the methods differ significantly.

References

Awotunde, P. O. & Ugodulunwa (2002). An Introduction to Statistical Methods in Education. Printed and published in Nigeria by Fab Anieh (Nig) Ltd.

National Teachers’ Institute, Kaduna & National Open University of Nigeria (2016). Basic Research Methods in Education.