CHAPTER SEVEN

MEASURE OF VARIABILITY

7.1 Introduction

In the previous chapter, you learnt about Measures of Central Tendency. Measures of Central Tendency describe a distribution only in terms of a typical score, such as the mean score or the score with the highest frequency, so they do not provide enough information to describe the data fully. Two sets of data may have the same mean and median without this telling us how widely their scores are dispersed or spread. For instance,

The scores 5, 6, 7, 8, 9 have a mean of 7, and the scores 1, 3, 5, 8, 18 also have a mean of 7. The two sets have the same mean, but the scores in the first set are clustered much more tightly around it than those in the second set. Therefore, we need a measure that tells us how the scores disperse or spread about the mean, that is, an index that describes the variability of the scores in a distribution.

Measures of Variability are also known as Measures of Spread or Dispersion. They describe how far the different points in a distribution lie from the mean. The range, quartiles, variance, and standard deviation are among the measures of dispersion.

7.2 Objectives

At the end of this chapter, you should be able to:

  1. describe the measures of variability: range, quartiles, variance, and standard deviation.

  2. calculate the range, quartiles, variance, and standard deviation.

7.3 Range

The easiest way to measure dispersion is to look at a distribution's range, that is, the difference between the highest and lowest scores in the data. The range of a distribution may be stated as either exclusive or inclusive. The exclusive range is the difference between the highest and lowest scores in a distribution. The inclusive range is the difference between the upper real limit of the highest score and the lower real limit of the lowest score, so it is one unit larger than the exclusive range.

The range is not a stable indicator of the spread of the measures around the central value. It is a crude method of determining spread because it takes account of only the two extreme scores in a distribution.

7.3.1 Range for Ungrouped Data

Example 7.1 shows how to determine the range of ungrouped data.

Example 7.1

Determine the range of the following scores in a mathematics achievement test: 76, 90, 71, 95, 65, 60, 71, 75

The highest score = 95

The lowest score = 60

Range = 95 - 60 = 35
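As a minimal sketch, the same calculation can be checked in Python. The function name `exclusive_range` is an illustrative choice, not part of the chapter.

```python
def exclusive_range(scores):
    """Return the exclusive range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Scores from the mathematics achievement test in Example 7.1
maths_scores = [76, 90, 71, 95, 65, 60, 71, 75]
print(exclusive_range(maths_scores))  # 35
```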

7.3.2 Range for Grouped Data

For grouped data, the individual measurements are not known. The range is therefore taken to be the difference between the upper boundary of the highest class interval and the lower boundary of the lowest class interval. The same idea can be illustrated using the distribution of scores:

50, 25, 27, 30, 50, 45, 22, and 34. Here, the highest boundary limit is 50.5 and the lowest boundary limit is 21.5.

Range = 50.5 - 21.5 = 29. This approach to calculating the range is equivalent to

Range = H - L + 1, where H is the highest score and L is the lowest score

= 50 - 22 + 1 = 29
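The short sketch below checks that the two routes agree for these scores. It assumes whole-number scores, so that each real limit lies 0.5 above or below the score; the `unit` parameter and the function name are illustrative.

```python
def inclusive_range(scores, unit=1):
    """Return H - L + unit, i.e. upper real limit minus lower real limit."""
    return max(scores) - min(scores) + unit

scores = [50, 25, 27, 30, 50, 45, 22, 34]

# Route 1: subtract the real limits directly
upper_limit = max(scores) + 0.5   # 50.5
lower_limit = min(scores) - 0.5   # 21.5
print(upper_limit - lower_limit)  # 29.0

# Route 2: the shortcut H - L + 1
print(inclusive_range(scores))    # 29
```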

7.4 Percentile

Percentiles provide another way of describing dispersion. When a set of n measurements is arranged in order of magnitude, the Pth percentile is the value that has at most P% of the measurements below it and at most (100 - P)% of the measurements above it, where P is a value between 0 and 100.

Percentiles are used to describe the results of achievement tests and to rank a person in comparison with all the other students who took a particular test.
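To make the ranking idea concrete, here is a small sketch that reports the percentage of scores falling below a given score. The scores are hypothetical, and the "percentage strictly below" convention is only one of several definitions of percentile rank in use.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the group that fall strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores for ten students (not taken from the chapter)
class_scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(class_scores, 75))  # 70.0
```

A student who scored 75 did better than 70% of this group, so the score sits at about the 70th percentile under this convention.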

7.5 Quartiles

Quartiles are the values that divide a given data set into four equal parts, so that each part contains one quarter ($\frac{1}{4}$) of the population or sample. That is to say, the number of scores in any one of the four parts equals the number of scores in each of the other three parts. The dividing values are the first quartile Q1, the second quartile Q2 (the median), and the third quartile Q3.

7.5.1 Computation of the Quartiles

To compute quartiles for grouped data, the formula below can be applied:

$$Q_i = L + \frac{\left(i\left(\frac{N}{4}\right) - \mathrm{Cfb}\right)C}{\mathrm{fw}}$$

Where i = 1, 2, 3, i.e., the quartile required

N = $\Sigma f$ is the sample size

L = Lower class boundary of the quartile class

Cfb = Cumulative frequency below the quartile class

fw = Frequency of the quartile class

C = Class interval size

Let us now illustrate the calculation of the first quartile (Q1), the third quartile (Q3) and the semi-interquartile range.

Example 7.2

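Determine Q1 and Q3 in the given distribution.

| Class Interval | F | Cf |
|---|---|---|
| 60-64 | 1 | 34 |
| 55-59 | 2 | 33 |
| 50-54 | 2 | 31 |
| 45-49 | 5 | 29* |
| 40-44 | 8 | 24 |
| 35-39 | 6 | 16 |
| 30-34 | 4 | 10* |
| 25-29 | 3 | 6 |
| 20-24 | 2 | 3 |
| 15-19 | 1 | 1 |
| Total | 34 | |

The asterisks mark the classes that contain Q3 and Q1.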

Now let us adopt the following procedure:

Step 1: Determine the cumulative frequency (Cf).

Step 2: Divide N by 4: $34 \div 4 = 8.5$

Step 3: Use the formula $Q_i = L + \frac{\left(i\left(\frac{N}{4}\right) - \mathrm{Cfb}\right)C}{\mathrm{fw}}$

For the first quartile (Q1), N/4 = 8.5 falls in the class 30-34.

L = 29.5, fw = 4, Cfb = 6, C = 5

$$Q_1 = 29.5 + \frac{(8.5 - 6)5}{4} = 29.5 + \frac{(2.5)5}{4}$$

= 29.5 + 3.13

= 32.63

For the third quartile (Q3), 3(N/4) = 3 × 8.5 = 25.5 falls in the class 45-49.

L = 44.5, fw = 5, Cfb = 24, C = 5

$$Q_3 = 44.5 + \frac{(3 \times 8.5 - 24)5}{5} = 44.5 + \frac{(1.5)5}{5}$$

= 44.5 + 1.50

= 46.00

The interquartile range of the distribution is the difference between the third quartile Q3 and the first quartile Q1. From the worked example above:

Interquartile range = Q3 - Q1 = 46.00 - 32.63 = 13.37

The semi-interquartile range is half of the interquartile range. It is also known as the quartile deviation, and it is computed by the formula:

$$\frac{Q_3 - Q_1}{2}$$

Semi-interquartile range = $\frac{46.00 - 32.63}{2}$ = 6.69
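The grouped-data quartile formula can also be sketched as a small Python function. This is an illustration under stated assumptions rather than a standard library routine: the function name and the `(lower boundary, frequency)` layout are hypothetical, the classes are listed from lowest to highest, and every class has width C = 5.

```python
def grouped_quartile(i, classes, width):
    """Q_i = L + (i*N/4 - Cfb) * C / fw for grouped data.

    classes: list of (lower_boundary, frequency), lowest class first.
    width:   class interval size C.
    """
    n = sum(freq for _, freq in classes)
    target = i * n / 4            # position of the i-th quartile
    cum_below = 0                 # Cfb: cumulative frequency below the class
    for lower, freq in classes:
        if cum_below + freq >= target:
            return lower + (target - cum_below) * width / freq
        cum_below += freq
    raise ValueError("quartile position not found")

# Lower class boundaries and frequencies from Example 7.2 (15-19 up to 60-64)
classes = [(14.5, 1), (19.5, 2), (24.5, 3), (29.5, 4), (34.5, 6),
           (39.5, 8), (44.5, 5), (49.5, 2), (54.5, 2), (59.5, 1)]

q1 = grouped_quartile(1, classes, 5)
q3 = grouped_quartile(3, classes, 5)
print(q1, q3)         # 32.625 46.0
print(q3 - q1)        # 13.375  (interquartile range)
print((q3 - q1) / 2)  # 6.6875  (semi-interquartile range)
```

The unrounded results are Q1 = 32.625 and Q3 = 46.0, which gives a semi-interquartile range of about 6.69.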

7.6 Variance

Another measure of variability is the variance. The variance is calculated by taking the mean of the squared deviations of the individual scores from their average. It is also known as the mean square or mean squared deviation. The sample variance is denoted by $S^2$, and the population variance by the square of the lower-case Greek letter sigma, $\sigma^2$. The sample variance is defined by the formula:

$$S^2 = \frac{\Sigma f(x - \overline{x})^2}{n - 1}$$

For a population, the divisor is N instead of n - 1:

$$\sigma^2 = \frac{\Sigma f(x - \overline{x})^2}{N}$$

The formula given above is known as the definitional formula, and it is convenient for computing the variance when the sample size is small. It can also be used for a large sample size, but it is tedious. For computational purposes, the formula below is usually preferred:

$$S^2 = \frac{\Sigma fx^2 - \frac{(\Sigma fx)^2}{n}}{n - 1}$$
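The definitional and computational forms share the same numerator. A short derivation, using only $\overline{x} = \Sigma fx / n$ and $\Sigma f = n$, shows why:

```latex
\begin{aligned}
\Sigma f(x - \overline{x})^2
  &= \Sigma fx^2 - 2\overline{x}\,\Sigma fx + \overline{x}^2\,\Sigma f \\
  &= \Sigma fx^2 - 2\,\frac{(\Sigma fx)^2}{n} + \frac{(\Sigma fx)^2}{n} \\
  &= \Sigma fx^2 - \frac{(\Sigma fx)^2}{n}
\end{aligned}
```

Dividing either side by n - 1 gives the sample variance, so the two formulas differ only in how much arithmetic they require.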

7.6.1 Computation of Variance for Ungrouped Data

To calculate the variance of ungrouped data with a finite set of scores, you can apply the following procedure.

For instance, 9 students have the following scores in a test:

9, 4, 5, 7, 2, 6, 8, 7, 5.

Step 1: Compute the mean score of the distribution.

$$\overline{x} = \frac{9+4+5+7+2+6+8+7+5}{9} = \frac{53}{9} = 5.9$$

Step 2: Compute the deviation of each score from the mean, square each result, and add the squares.

$(9 - 5.9)^2 + (4 - 5.9)^2 + (5 - 5.9)^2 + (7 - 5.9)^2 + (2 - 5.9)^2 + (6 - 5.9)^2 + (8 - 5.9)^2 + (7 - 5.9)^2 + (5 - 5.9)^2$

$(3.1)^2 + (-1.9)^2 + (-0.9)^2 + (1.1)^2 + (-3.9)^2 + (0.1)^2 + (2.1)^2 + (1.1)^2 + (-0.9)^2$

9.61 + 3.61 + 0.81 + 1.21 + 15.21 + 0.01 + 4.41 + 1.21 + 0.81 = 36.89

Step 3: Find the mean of these squared deviations; that gives the variance. Here the nine scores are treated as the complete set of interest, so the sum is divided by N = 9:

$$\frac{36.89}{9} = 4.10$$

If the nine scores were instead a sample drawn from a larger group, the divisor would be n - 1 = 8, giving $\frac{36.89}{8} = 4.61$.
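The three steps above can be reproduced in a few lines of Python. This sketch computes the sum of squared deviations by hand, then cross-checks against the standard library's `statistics.pvariance` (divisor N) and `statistics.variance` (divisor n - 1); the variable names are illustrative.

```python
import statistics

scores = [9, 4, 5, 7, 2, 6, 8, 7, 5]

# Step 1: mean; Step 2: sum of squared deviations from the mean
mean = sum(scores) / len(scores)
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

# Step 3: divide by N (population form) or by n - 1 (sample form)
population_variance = sum_sq_dev / len(scores)
sample_variance = sum_sq_dev / (len(scores) - 1)

print(round(sum_sq_dev, 2))           # 36.89
print(round(population_variance, 2))  # 4.1
print(round(sample_variance, 2))      # 4.61

# Cross-check with the standard library
print(round(statistics.pvariance(scores), 2))  # 4.1
print(round(statistics.variance(scores), 2))   # 4.61
```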

7.6.2 Computation of Variance for Grouped Data

To compute the variance of grouped data, a researcher may use either the definitional formula or the working formula, but the definitional formula is tedious and may be noticeably affected by rounding errors. Both follow the same procedure as for ungrouped data; the only difference is that the class mark of each class is used as the score (x).

Let us illustrate the computation of the sample variance of test scores using the definitional formula. The data below are the test scores of 50 students in a Geography test.

Table 7.1 Distribution of Scores for 50 Students in Geography Test.

| A: Class Mark (X) | B: Frequency (F) | C: Fx | D: $x - \overline{x}$ | E: $(x - \overline{x})^2$ | F: $F(x - \overline{x})^2$ |
|---|---|---|---|---|---|
| 10 | 3 | 30 | 4.58 | 20.98 | 62.94 |
| 9 | 4 | 36 | 3.58 | 12.82 | 51.28 |
| 8 | 3 | 24 | 2.58 | 6.66 | 19.98 |
| 7 | 8 | 56 | 1.58 | 2.50 | 20.00 |
| 6 | 10 | 60 | 0.58 | 0.34 | 3.40 |
| 5 | 5 | 25 | -0.42 | 0.18 | 0.90 |
| 4 | 5 | 20 | -1.42 | 2.02 | 10.10 |
| 3 | 2 | 6 | -2.42 | 5.86 | 11.72 |
| 2 | 4 | 8 | -3.42 | 11.70 | 46.80 |
| 1 | 6 | 6 | -4.42 | 19.54 | 117.24 |

n = 50, $\Sigma fx = 271$, $\Sigma F(x - \overline{x})^2 = 344.36$

$\overline{x} = 5.42$

To determine the sample variance using the definitional formula, the following steps should be adopted:

Step 1: Determine the mean, $\overline{x} = \frac{\Sigma fx}{n}$, which is $\frac{271}{50} = 5.42$

Step 2: Find the deviation scores $(x - \overline{x})$ and write the results in Column D.

Step 3: Square the deviation scores and record them in Column E, i.e., $(x - \overline{x})^2$.

Step 4: Multiply each squared deviation by its frequency, $F(x - \overline{x})^2$, and write the results in Column F.

Step 5: Sum the products in Column F, which gives $\Sigma F(x - \overline{x})^2 = 344.36$

Step 6: Compute the variance, given as $\frac{\Sigma f(x - \overline{x})^2}{n - 1} = \frac{344.36}{50 - 1} = \frac{344.36}{49} = 7.03 \approx 7.0$
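The definitional steps can be sketched in Python using the class marks and frequencies from Table 7.1. The variable names are illustrative. Because the code does not round the deviations before squaring them, its sum of squares differs slightly from the hand-worked column total.

```python
# Class marks (x) and frequencies (f) from Table 7.1
class_marks = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
frequencies = [3, 4, 3, 8, 10, 5, 5, 2, 4, 6]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(class_marks, frequencies)) / n

# Sum of f * (x - mean)^2, with no intermediate rounding
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, frequencies))
variance = sum_sq / (n - 1)

print(n, round(mean, 2))   # 50 5.42
print(round(sum_sq, 2))    # 344.18
print(round(variance, 2))  # 7.02
```

The unrounded sum of squares is 344.18 rather than 344.36; the gap comes entirely from rounding each squared deviation to two decimal places in the table.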

The working formula, which is much easier, is illustrated using the data in Table 7.2.

Table 7.2 Distribution of Scores for 50 Students in Geography Test.

| A: Class Mark (X) | B: Frequency (F) | C: FX | D: $X^2$ | E: $FX^2$ |
|---|---|---|---|---|
| 10 | 3 | 30 | 100 | 300 |
| 9 | 4 | 36 | 81 | 324 |
| 8 | 3 | 24 | 64 | 192 |
| 7 | 8 | 56 | 49 | 392 |
| 6 | 10 | 60 | 36 | 360 |
| 5 | 5 | 25 | 25 | 125 |
| 4 | 5 | 20 | 16 | 80 |
| 3 | 2 | 6 | 9 | 18 |
| 2 | 4 | 8 | 4 | 16 |
| 1 | 6 | 6 | 1 | 6 |

$\Sigma F = 50$, $\Sigma FX = 271$, $\Sigma FX^2 = 1813$

$\frac{(\Sigma FX)^2}{n} = \frac{(271)^2}{50} = 1468.82$

To work out the variance using the working formula, the following procedure should be adopted:

Step 1: Draw a frequency distribution of the scores.

Step 2: Find the product of each score and its frequency and write the result in Column C (FX).

Step 3: Add the values of FX, square the sum, and divide by n, that is $\frac{(\Sigma FX)^2}{n} = \frac{(271)^2}{50} = 1468.82$

Step 4: Find the square of each score (X) and record it in Column D.

Step 5: Find the product of each $X^2$ and its frequency and write the results in Column E.

We have $\Sigma FX^2 = 1813$

Step 6: Substitute the values obtained, that is $(\Sigma FX)^2$ and $\Sigma FX^2$, in the formula

$$S^2 = \frac{\Sigma fx^2 - \frac{(\Sigma fx)^2}{n}}{n - 1}$$

$$S^2 = \frac{1813 - 1468.82}{50 - 1} = \frac{344.18}{49} = 7.02 \approx 7.0$$

The two formulas, the definitional formula and the working formula, give essentially the same value of the variance. The small difference (7.03 against 7.02) arises because the deviations in the definitional formula were rounded before squaring. The working formula is simpler and yields a more accurate result than the definitional formula, which is tedious and affected by rounding errors.
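As a rough illustration of the working formula, the function below needs only the three column totals n, ΣFX and ΣFX². The function name is hypothetical, and the data are the class marks and frequencies from the Geography test table.

```python
def grouped_sample_variance(class_marks, frequencies):
    """Working formula: (sum(FX^2) - (sum(FX))^2 / n) / (n - 1)."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(class_marks, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, frequencies))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

class_marks = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
frequencies = [3, 4, 3, 8, 10, 5, 5, 2, 4, 6]

# Column totals used by the formula
print(sum(f * x for x, f in zip(class_marks, frequencies)))      # 271
print(sum(f * x * x for x, f in zip(class_marks, frequencies)))  # 1813
print(round(grouped_sample_variance(class_marks, frequencies), 2))  # 7.02
```

Because the formula works from whole-number totals, the only rounding happens at the very end.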

7.7 Mean Deviation

A deviation is the amount by which an individual score in a distribution differs from the mean. However, the sum of the deviations of all the scores from the mean is always zero. For example, suppose the scores are 95, 90, 85, 80, and 75.

$$\text{Average} = \frac{95+90+85+80+75}{5} = \frac{425}{5} = 85$$

If the average is 85, then the deviations will be

95 - 85 = 10

90 - 85 = 5

85 - 85 = 0

80 - 85 = -5

75 - 85 = -10

It is good to note that the sum of these deviations, and therefore their mean, is always zero, because the negative deviations cancel out the positive ones. For this reason, the mean deviation is computed from the absolute values of the deviations.
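A brief sketch with the same five scores shows the cancellation. It also shows the usual remedy, which is to average the absolute deviations; the chapter does not work that step, so it is included here only as an illustration.

```python
scores = [95, 90, 85, 80, 75]
mean = sum(scores) / len(scores)

# Signed deviations from the mean cancel out
deviations = [x - mean for x in scores]
print(deviations)       # [10.0, 5.0, 0.0, -5.0, -10.0]
print(sum(deviations))  # 0.0

# Mean of the absolute deviations (the mean deviation)
mean_abs_dev = sum(abs(d) for d in deviations) / len(scores)
print(mean_abs_dev)     # 6.0
```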

7.8 Standard Deviation

The standard deviation is denoted by the letter S for a sample of data, while that of the population is represented by the lower-case Greek letter $\sigma$ (sigma). The standard deviation is simply the square root of the variance. It is a generally reliable measure of variability and is widely regarded as the best indicator of dispersion. A low standard deviation suggests that the data points tend to be very close to the average, whereas a high standard deviation indicates that the points are dispersed over a wide range of values. It gives an idea of how close the entire set of data is to the mean value.

Therefore, since the variance of the ungrouped data in Section 7.6.1 is $\frac{36.89}{9} = 4.10$, the standard deviation is

$$\sqrt{\frac{36.89}{9}} = \sqrt{4.10} = 2.02$$
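For the nine ungrouped test scores 9, 4, 5, 7, 2, 6, 8, 7, 5, the square-root step can be checked with the standard library. This sketch takes the root of `statistics.pvariance` (divisor N) and, for comparison, prints the sample standard deviation from `statistics.stdev` (divisor n - 1).

```python
import math
import statistics

scores = [9, 4, 5, 7, 2, 6, 8, 7, 5]

# Population form: square root of the variance with divisor N
pop_sd = math.sqrt(statistics.pvariance(scores))
print(round(pop_sd, 2))  # 2.02

# Sample form: divisor n - 1
sample_sd = statistics.stdev(scores)
print(round(sample_sd, 2))  # 2.15
```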

7.8.1 Standard Deviation for Grouped Data

When the data are grouped in class intervals, the researcher should use a modified formula, either the definitional or the working formula. The definitional formula for a sample is given as $S = \sqrt{\frac{\Sigma f(x - \overline{x})^2}{n - 1}}$ and that of the population is defined as $\sigma = \sqrt{\frac{\Sigma f(x - \overline{x})^2}{N}}$

Step 1: Write down the scores and their frequencies in Columns A and B.

Step 2: Compute fx in Column C and find the mean of the distribution, $\overline{x} = \frac{\Sigma fx}{n}$.

Step 3: Find each deviation $x - \overline{x}$ and note it down in Column D.

Step 4: Square each deviation and record it in Column E, i.e., $(x - \overline{x})^2$.

Step 5: Find the product $f(x - \overline{x})^2$ and enter it in Column F.

Step 6: Find the sum $\Sigma f(x - \overline{x})^2$.

Step 7: Divide $\Sigma f(x - \overline{x})^2$ by n - 1 to obtain the variance.

Step 8: Compute the square root of the variance to obtain the standard deviation.

Let us use the data in Table 7.1 to calculate the standard deviation using the definitional formula.

For n = 50 and $\Sigma f(x - \overline{x})^2 = 344.36$, substitute in the formula as

$$S = \sqrt{\frac{\Sigma f(x - \overline{x})^2}{n - 1}} = \sqrt{\frac{344.36}{50 - 1}} = \sqrt{\frac{344.36}{49}} = \sqrt{7.03} = 2.65$$

Another modified formula for computing the standard deviation is the working formula, stated as: $S = \sqrt{\frac{\Sigma FX^2 - \frac{(\Sigma FX)^2}{n}}{n - 1}}$

where all the terms are as defined for the variance. When a researcher wishes to compute the standard deviation, the following procedure should be applied.

Step 1: Develop a frequency distribution table.

Step 2: Find the product FX, i.e., multiply each score by its frequency.

Step 3: Add up the FX values to obtain $\Sigma FX$, square it and divide by n. That is $\frac{(\Sigma FX)^2}{n}$

Step 4: Square each score. That is $X^2$

Step 5: Multiply each $X^2$ by its frequency and sum all the values. That is $\Sigma FX^2$

Step 6: Calculate the variance by substituting in the working formula: $S^2 = \frac{\Sigma FX^2 - \frac{(\Sigma FX)^2}{n}}{n - 1}$

Step 7: Obtain the standard deviation by finding the square root of the variance: $S = \sqrt{\frac{\Sigma FX^2 - \frac{(\Sigma FX)^2}{n}}{n - 1}}$

Let us use the data in Table 7.2 to compute the standard deviation using the working formula.

For n = 50, $\Sigma FX = 271$, $\Sigma FX^2 = 1813$

Substitute in the above formula to obtain the standard deviation.

$$S = \sqrt{\frac{\Sigma FX^2 - \frac{(\Sigma FX)^2}{n}}{n - 1}}$$

$$S = \sqrt{\frac{1813 - \frac{(271)^2}{50}}{50 - 1}} = \sqrt{\frac{1813 - 1468.82}{50 - 1}} = \sqrt{\frac{344.18}{49}} = \sqrt{7.02} = 2.65$$
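The working-formula procedure for the standard deviation can be sketched as below. The function name is illustrative, and the data are the class marks and frequencies of the 50 Geography scores. As a cross-check, the sketch also expands the frequency table back into 50 individual scores and passes them to `statistics.stdev`.

```python
import math
import statistics

def grouped_sample_sd(class_marks, frequencies):
    """Square root of (sum(FX^2) - (sum(FX))^2 / n) / (n - 1)."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(class_marks, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, frequencies))
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return math.sqrt(variance)

class_marks = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
frequencies = [3, 4, 3, 8, 10, 5, 5, 2, 4, 6]

sd = grouped_sample_sd(class_marks, frequencies)
print(round(sd, 2))  # 2.65

# Cross-check: repeat each class mark f times and use the library routine
expanded = [x for x, f in zip(class_marks, frequencies) for _ in range(f)]
print(round(statistics.stdev(expanded), 2))  # 2.65
```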

The variance and standard deviation have several merits over other measures of variability. One is that the sample standard deviation is a more accurate estimate of the corresponding population parameter than the other measures of variability are of theirs.

The variance and standard deviation are also used in the calculation of many other statistics and are widely used as measures of error.

Student Activity

  1. Given the measurements 8, 24, 18, 14, 12, and 4, compute the range, the variance, and the standard deviation.

  2. Consider a set of seven scores, say 1, 2, 3, 4, 5, 6, 7. Calculate:

     (a) $\Sigma x$

     (b) $\Sigma x^2$

     (c) $\Sigma f(x - \overline{x})^2$

  3. The scores for an achievement test are as follows:

64 69 53 61 60 51 74 63 93 80

55 59 57 55 30 40 70 32 43 4

48 35 44 67 52 53 47 42 50 37

32 49 58 43 44 12 52 39 56 65

68 75 67 86 21 21 73 62 66 50

     (a) Construct a frequency distribution table with class intervals 0-9, 10-19, ...

     (b) Compute the variance and standard deviation.
