CHAPTER EIGHTEEN
STATISTICAL ANALYSIS IN RESEARCH
18.1 Objectives
At the end of this chapter, you should be able to:
i. define the word statistics;
ii. explain the types of statistics;
iii. organize a set of data;
iv. represent data graphically;
v. calculate simple percentages, measures of central tendency, correlation, the z-test and the t-test.
18.2 Introduction
Statistics is concerned with the collection, organization, and analysis of data; it plays a crucial part in educational decision-making and forecasting. Practically all fields, including the physical, applied, and social sciences, as well as commerce, the humanities, government, industry, and education, employ statistics.
In this chapter, we will discuss the definition of statistics in research, the types of statistics, the significance of statistics in education, data organization, and statistical tools.
18.3 Research Concept of Statistics
Statistics, in the context of research, may be defined as the application of scientific methods for transforming information gathered through a systematic process into quantifiable data, which are then subjected to quantitative analysis in an effort to draw conclusions about populations based on empirical studies. In scientific, business, and mathematics education research, statistics can be grouped into two major categories: descriptive and inferential.
18.4 Statistical Categories
There are two primary forms of statistics: descriptive statistics and inferential statistics.
The objective of descriptive statistics is to organize and describe the features of educational variables without reaching conclusions. Mean, mode, median, standard deviation, range, percentage, and proportions are the topics covered by descriptive statistics.
Inferential statistics use the features computed by descriptive statistics from samples to test hypotheses and draw conclusions about populations. They include procedures such as the t-test, the F-test, and analysis of variance (ANOVA).
18.5 Value of Statistics in Education
In the following ways, statistical analyses are extremely beneficial to mathematics education researchers:
§ Statistical analyses assist researchers in quantifying the properties of variables in order to draw relevant findings on educational factors.
§ Statistical analyses help researchers compare variables statistically in order to reach appropriate conclusions about the educational variables investigated.
§ Statistical analyses help researchers draw accurate conclusions about a population based on empirical studies.
§ Knowledge of statistical analyses benefits mathematics education researchers who need to read, evaluate, and interpret study results for publication in textbooks and academic journals.
18.6 Organization of Data
To make collected scores more meaningful and manageable for computation, they should be arranged in order and rounded to the appropriate number of significant figures or decimal places.
18.6.1 Sequencing
The data may be arranged in ascending or descending order by magnitude.
For example, seven students have the following mathematics test scores: 9, 7, 11, 6, 10, 8, 12
In ascending order: 6, 7, 8, 9, 10, 11, 12
In descending order: 12, 11, 10, 9, 8, 7, 6
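Sorting by magnitude is built into most programming languages. As a minimal Python sketch (the variable names are illustrative only), the seven scores above can be arranged in both orders as follows:

```python
# Mathematics test scores of seven students, from the example above
scores = [9, 7, 11, 6, 10, 8, 12]

ascending = sorted(scores)                 # smallest to largest
descending = sorted(scores, reverse=True)  # largest to smallest

print("Ascending: ", ascending)   # list is [6, 7, 8, 9, 10, 11, 12]
print("Descending:", descending)  # list is [12, 11, 10, 9, 8, 7, 6]
```

The same `sorted` call also arranges names alphabetically when the list contains strings.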
Note that when collected data consist of names, they might be organized alphabetically.
However, if the data consists of things, animals, events, etc., you may organize it according to its categories or groupings.
18.6.2 Significant Figures
Significant figures are counted from the first non-zero digit, starting from the left. When the required number of significant figures has been counted, the remaining digits are dropped according to the following rule:
If the first digit to be dropped is 5 or more, the last significant digit is increased by 1; otherwise, it is left unchanged.
For example:
1. Approximate 6.2543 to two and three significant figures.
6.2543 = 6.3 to 2 sf
6.2543 = 6.25 to 3 sf
2. Approximate 0.000856 to one and two significant figures.
i. The zeros before the first non-zero digit are not significant, so counting starts from the digit 8:
0.000856 = 0.0009 to 1 sf
ii. 0.000856 = 0.00086 to 2 sf
Note that a zero is significant when it lies between non-zero digits, e.g., the zero in 5409, or when it comes after the last non-zero digit to the right of the decimal point, e.g., the final zero in 17.60. Zeros before the first non-zero digit, as in 0.058 and 0.000067, are not significant.
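Rounding to significant figures can be checked with a short program. The Python sketch below is only an illustration: the helper name `round_sig` is hypothetical, and numbers are passed as strings and handled with the standard `decimal` module so that the "5 or more rounds up" rule is applied exactly. Python's built-in `round` uses binary floating point and rounds exact ties to the even digit, so it can disagree with the rule above.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(value: str, sig: int) -> Decimal:
    """Round `value` to `sig` significant figures; a dropped digit of 5 or more rounds up."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the position of the first non-zero digit, e.g. 0.000856 -> -4
    exponent = d.adjusted() - sig + 1
    step = Decimal(1).scaleb(exponent)  # e.g. 0.0001 when rounding 0.000856 to 1 sf
    return d.quantize(step, rounding=ROUND_HALF_UP)

for number, sig in [("6.2543", 2), ("6.2543", 3), ("0.000856", 1), ("0.000856", 2)]:
    print(number, "to", sig, "sf =", round_sig(number, sig))
```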
18.6.3 Decimal Places
Decimal places (abbreviated dp) are counted to the right of the decimal point, and the same rounding rule used for significant figures applies. For example,
Round off each of the numbers to
i. one decimal place
ii. two decimal places
a) 0.006
b) 7.6020
a) 0.006 = 0.0 to 1 dp
= 0.01 to 2 dp
b) 7.6020 = 7.6 to 1 dp
= 7.60 to 2 dp
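The same idea applies to decimal places. In this minimal sketch the helper name `round_dp` is hypothetical; it fixes the number of digits after the decimal point and keeps trailing zeros, so 7.6020 to 2 dp is shown as 7.60, as in the worked answer.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_dp(value: str, places: int) -> Decimal:
    """Round `value` to `places` decimal places; a dropped digit of 5 or more rounds up."""
    step = Decimal(1).scaleb(-places)  # 0.1 for 1 dp, 0.01 for 2 dp, ...
    return Decimal(value).quantize(step, rounding=ROUND_HALF_UP)

for number in ["0.006", "7.6020"]:
    print(number, "->", round_dp(number, 1), "(1 dp),", round_dp(number, 2), "(2 dp)")
```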
18.6.4 Frequency Distribution Table
Before carrying out the analysis, the researcher needs to summarize and organize the data in a frequency table, which makes the collected data easier to analyze. A frequency table shows how many times each score in a distribution occurred. It basically consists of three columns: a score column, a tally column, and a frequency column. More columns can be added depending on what other information is required.
To construct a frequency distribution table, there are three basic steps.
(i) Listing the different scores in the distribution under the score column beginning with the highest scores on top.
(ii) Tallying the scores. Tallying involves placing a stroke against a score each time the score occurs in the distribution; the strokes are arranged in bundles of five (written here as IIIII) for easy counting. For example, a score that occurs 3 times is tallied III, 4 times IIII, 5 times IIIII, and 6 times IIIII I.
(iii) Determining the number of times each score occurred. The number of strokes at each score is counted and written down as frequency of that particular score.
We now illustrate these steps with data collected from 18 students on their interest in mathematics: 9, 8, 6, 6, 8, 8, 9, 9, 10, 12, 12, 12, 6, 6, 10, 6, 9, 5
The highest score in the distribution is 12, while the lowest score is 5.
Table 1: Frequency Distribution of Data on Mathematics Interest.
Score (X) | Tally | Frequency |
12 | III | 3 |
10 | II | 2 |
9 | IIII | 4 |
8 | III | 3 |
6 | IIIII | 5 |
5 | I | 1 |
Total | | 18 |
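The three steps (list the scores from highest to lowest, tally, count) can be sketched in Python with the standard `collections.Counter`. The function name `frequency_table` is hypothetical, and the tally string simply writes a bundle of five as `IIIII`.

```python
from collections import Counter

def frequency_table(scores):
    """Return (score, tally, frequency) rows, highest score first.
    Only scores that actually occur are listed, as in step (i)."""
    counts = Counter(scores)
    rows = []
    for score in sorted(counts, reverse=True):
        f = counts[score]
        # one 'IIIII' per complete bundle of five, then the leftover strokes
        bundles = ["IIIII"] * (f // 5)
        if f % 5:
            bundles.append("I" * (f % 5))
        rows.append((score, " ".join(bundles), f))
    return rows

interest = [9, 8, 6, 6, 8, 8, 9, 9, 10, 12, 12, 12, 6, 6, 10, 6, 9, 5]
for score, tally, f in frequency_table(interest):
    print(f"{score:>5} | {tally:<10} | {f}")
```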
18.6.5 Grouped Frequency Distribution
Sometimes there are so many different scores that it is necessary to group them. A class interval is a group made up of a range of score values.
Example
Present the scores below in a grouped frequency table.
55, 62, 60, 50, 52, 58, 55, 9, 59, 53, 52, 33, 48, 65, 60, 36, 68, 45, 62, 59, 60, 33, 40, 61, 38
60, 51, 55, 68, 55, 47, 39, 58, 52, 47, 42, 48, 55, 48, 46, 55, 51, 58, 65, 52, 35, 54, 55, 52, 56, 46, 65, 53, 34, 48, 50, 3
Before grouping the scores, the researcher has to determine the class size to use. To do this, the following procedure should be adopted.
(i) Find the range of the scores (highest score minus lowest score): 68 – 33 = 35
(ii) Determine the number of groups. It is customary to have between 10 and 15 groups.
(iii) Divide the range by the number of groups, e.g. 35 ÷ 10 = 3.5, and approximate the result to the nearest odd number, 3. An odd class size is preferred because the mid-point of each class interval is then a whole number.
(iv) Draw a table and tally the scores according to groups
Table: Grouped Frequency Distribution Table
S/N | Class Interval | Tally | Frequency |
1 | 66 – 68 | II | 2 |
2 | 63 – 65 | III | 3 |
3 | 60 – 62 | IIIII II | 7 |
4 | 57 – 59 | IIIII | 5 |
5 | 54 – 56 | IIIII IIII | 9 |
6 | 51 – 53 | IIIII III | 8 |
7 | 48 – 50 | IIIII | 5 |
8 | 45 – 47 | IIIII | 5 |
9 | 42 – 44 | II | 2 |
10 | 39 – 41 | III | 3 |
11 | 36 – 38 | II | 2 |
12 | 33 – 35 | IIII | 4 |
| Total | | 55 |
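The procedure for choosing a class size and counting scores into intervals can be sketched as follows. Both helper names are hypothetical, scores are assumed to be whole numbers, and the usage lines apply the helpers to the first six scores of the list above purely to show the output format.

```python
def class_size(scores, n_groups=10):
    """Range divided by the number of groups, rounded to the nearest odd whole number.
    Exact ties (e.g. a quotient of 4.0) are settled by Python's round-half-to-even."""
    raw = (max(scores) - min(scores)) / n_groups
    return max(1, 2 * round((raw - 1) / 2) + 1)

def grouped_frequency(scores, lowest, width, n_classes):
    """Return (lower limit, upper limit, frequency) rows, highest interval first."""
    rows = []
    for k in range(n_classes - 1, -1, -1):
        lower = lowest + k * width
        upper = lower + width - 1
        rows.append((lower, upper, sum(lower <= s <= upper for s in scores)))
    return rows

print(class_size([33, 68]))  # range 35 / 10 = 3.5, nearest odd number is 3

sample = [55, 62, 60, 50, 52, 58]  # the first six scores in the list above
for lower, upper, f in grouped_frequency(sample, lowest=33, width=3, n_classes=12):
    print(f"{lower} - {upper}: {f}")
```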
To arrange the scores into class intervals, the class size must be determined. When deciding the class size, ensure that the number of class intervals is between 10 and 20. Class sizes of 2, 3, 5, and 7 are commonly used. It is fundamental to understand the terms class interval, class size, class limits, and class boundaries.
A class interval is a group of consecutive scores, and every class interval in a table has the same size. It is described by the lowest and highest scores in the group. Class size refers to the number of score values contained within a class interval. The lowest and highest scores in a class interval are its class limits: the lowest score is the lower class limit, while the highest score is the upper class limit. For instance, the class interval 66–68 contains the scores 66, 67, and 68, so its class size is 3; its lower class limit is 66, while its upper class limit is 68.
Successive class intervals are separated by a gap (for example, between 65 and 66). Class boundaries close this gap: subtract 0.5 from the lower class limit and add 0.5 to the upper class limit. For example, the interval 66–68 has class boundaries 65.5–68.5.
Example of class limits and class boundaries
Class Limit | Class Boundaries |
2 – 4 | 1.5 – 4.5 |
5 – 7 | 4.5 – 7.5 |
8 – 10 | 7.5 – 10.5 |
11 – 13 | 10.5 – 13.5 |
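For whole-number scores, the boundary rule is a simple subtraction and addition. The sketch below (hypothetical helper names) also computes the class mark, the mid-point of an interval, which is used later for the frequency polygon.

```python
def class_boundaries(lower_limit, upper_limit):
    """Subtract 0.5 from the lower limit and add 0.5 to the upper limit (whole-number scores)."""
    return lower_limit - 0.5, upper_limit + 0.5

def class_mark(lower_limit, upper_limit):
    """Mid-point of the class interval."""
    return (lower_limit + upper_limit) / 2

for lo, hi in [(2, 4), (5, 7), (8, 10), (11, 13)]:
    print(f"{lo} - {hi}: boundaries {class_boundaries(lo, hi)}, class mark {class_mark(lo, hi)}")
```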
18.7 Graphical Representation of Data
So far, the researcher has used frequency distribution tables to organize the data. Data can also be presented in graphs; this is referred to as graphical representation. Common varieties include bar charts, histograms, frequency polygons, pie charts, and ogives. The bar chart and the histogram are the two types most frequently used in studies on mathematics education.
Bar Chart
A bar chart, also known as a bar diagram or a bar graph, consists of bars separated by gaps. The gaps show that the categories being compared are discrete rather than continuous. Bar graphs depict the frequency of cases within each group. A bar chart has a vertical axis, termed the ordinate, and a horizontal axis, known as the abscissa. The bars may be vertical or horizontal; all bars have the same width, while their heights (or lengths) vary in proportion to the frequencies they represent.
A researcher interested in comparing the performance of students in different departments, for instance, might display the data in a bar chart. Suppose the examination officer of the school of science reports the performance of students in several departments as follows: Integrated Science = 60, Biology = 55, Chemistry = 30, Computer Science = 40, Mathematics = 25, Physics = 20.
The data can be represented in a bar chart by following these steps:
Step 1: Choose a convenient scale and draw the two axes (vertical and horizontal).
Step 2: Mark out the height of each bar according to the chosen scale.
Step 3: Draw a bar for each department up to its marked height.
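The role of the scale in Steps 1 to 3 can be shown without any plotting library. This minimal sketch draws horizontal text bars, assuming a hypothetical scale of one `#` for every 5 students; a plotting package would draw the same bars graphically.

```python
performance = {
    "Integrated Science": 60, "Biology": 55, "Chemistry": 30,
    "Computer Science": 40, "Mathematics": 25, "Physics": 20,
}

def text_bar_chart(data, units_per_mark=5):
    """Step 1: choose a scale (one '#' = units_per_mark students).
    Steps 2-3: draw each bar with a length proportional to its value."""
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / units_per_mark)
        lines.append(f"{label:<20}{bar} {value}")
    return lines

print("\n".join(text_bar_chart(performance)))
```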
Histogram
A histogram, like a bar graph, is a graphical depiction of data grouped into ranges (class intervals). It consists of a collection of rectangles whose bases extend between the class boundaries of each interval and whose heights represent the class frequencies. Unlike the bars of a bar chart, each rectangle touches its neighbors. It is created by plotting the frequencies against the class boundaries of the corresponding class intervals, and it is mostly used to represent continuous data.
To create a histogram, the following steps must be taken.
Step 1: Compose a frequency distribution table showing the class intervals, the adjusted class boundaries, and the frequencies.
Step 2: Choose suitable scales for both axes and draw the vertical and horizontal axes.
Step 3: Label the axes based on the chosen scales.
Step 4: Draw a rectangular bar over each pair of class boundaries, with height equal to the frequency of the class.
Step 5: Draw arrows to indicate what is on the vertical and horizontal axes.
To draw the histogram, follow Steps 1–5 above.
To illustrate, the scores of 80 students in a MAT 224 test at the end of the second-semester examinations in a College of Education are as follows:
Scores | Number of Students |
50 – 52 | 5 |
53 – 55 | 11 |
56 – 58 | 14 |
59 – 61 | 10 |
62 – 64 | 8 |
65 – 67 | 7 |
68 – 70 | 6 |
71 – 73 | 9 |
74 – 76 | 5 |
77 – 79 | 5 |
Table: MAT 224 Test Scores for 2nd Semester Exams
S/N | Class Interval | Class Boundary | Frequency |
1. | 50 – 52 | 49.5 – 52.5 | 5 |
2. | 53 – 55 | 52.5 – 55.5 | 11 |
3. | 56 – 58 | 55.5 – 58.5 | 14 |
4. | 59 – 61 | 58.5 – 61.5 | 10 |
5. | 62 – 64 | 61.5 – 64.5 | 8 |
6. | 65 – 67 | 64.5 – 67.5 | 7 |
7. | 68 – 70 | 67.5 – 70.5 | 6 |
8. | 71 – 73 | 70.5 – 73.5 | 9 |
9. | 74 – 76 | 73.5 – 76.5 | 5 |
10. | 77 – 79 | 76.5 – 79.5 | 5 |
| TOTAL | 80 |
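The rectangles of the histogram follow directly from the table. In this sketch (the helper name `histogram_bars` is hypothetical) each rectangle runs from the lower to the upper class boundary, and the check confirms that neighboring rectangles touch, as a histogram requires. The text bars are only a stand-in for a drawn graph.

```python
# (lower class limit, upper class limit, frequency) for the MAT 224 scores above
mat224 = [(50, 52, 5), (53, 55, 11), (56, 58, 14), (59, 61, 10), (62, 64, 8),
          (65, 67, 7), (68, 70, 6), (71, 73, 9), (74, 76, 5), (77, 79, 5)]

def histogram_bars(table):
    """Return (left boundary, right boundary, height) for each rectangle."""
    return [(lo - 0.5, hi + 0.5, f) for lo, hi, f in table]

bars = histogram_bars(mat224)
# Adjacent rectangles must touch: each right boundary equals the next left boundary.
assert all(bars[i][1] == bars[i + 1][0] for i in range(len(bars) - 1))
for left, right, height in bars:
    print(f"{left} - {right}: {'#' * height} ({height})")
```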
Frequency Polygon
This is a graph of a frequency distribution obtained by plotting the class frequencies against the class marks (the mid-points of the class intervals). It is called a polygon because it joins the mid-points of the tops of the rectangles in the histogram with straight lines.
A frequency polygon is constructed by following the procedure below:
Step 1: Draw both axes (i.e., vertical and horizontal).
Step 2: Mark out the frequencies along the vertical axis and the mid-points of the class intervals along the horizontal axis.
Step 3: Plot the frequency of each class interval as a point at the appropriate height above the mid-point of the interval.
Step 4: Join these points with straight lines.
Step 5: Connect the first and last points to the horizontal axis at the mid-point of the class interval just before the first class and the one just after the last class.
Use the data above to draw a frequency polygon by following Steps 1–5.
Figure: Frequency polygon of the data in the table above
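The points of the frequency polygon can be listed before drawing. This minimal sketch (hypothetical helper name) computes each class mark with its frequency and adds the two zero-frequency points of Step 5, one class width before the first class mark and one after the last.

```python
# (lower class limit, upper class limit, frequency) for the MAT 224 scores
mat224 = [(50, 52, 5), (53, 55, 11), (56, 58, 14), (59, 61, 10), (62, 64, 8),
          (65, 67, 7), (68, 70, 6), (71, 73, 9), (74, 76, 5), (77, 79, 5)]

def polygon_points(table):
    """(class mark, frequency) points, closed with zero-frequency points
    one class width before the first class and after the last class."""
    width = table[0][1] - table[0][0] + 1          # class size, here 3
    marks = [(lo + hi) / 2 for lo, hi, _ in table]
    points = [(marks[0] - width, 0)]
    points += [(m, f) for m, (_, _, f) in zip(marks, table)]
    points.append((marks[-1] + width, 0))
    return points

for x, y in polygon_points(mat224):
    print(x, y)
```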
Students’ Activity
1. A newly admitted mathematics student of ABU Zaria spent a total of N60,000.00 as follows:
· Tuition fees = N10,000.00
· Game fees = N5,000.00
· Clinic fee = N2,000.00
· Course materials = N15,000.00
· Accommodation = N10,000.00
· Stationery = N5,000.00
· Feeding = N10,000.00
· Notebooks = N3,000.00
Represent this expenditure in a pie chart.
2. Consider the following scores obtained by a researcher after administering a post-test to 40 students.
30 25 54 50 12 5 18 40
21 25 55 13 40 3 46 3
23 21 34 49 18 8 48 18
39 27 37 30 15 5 23 46
21 21 33 35 5 8 45 38
a) Prepare a frequency distribution table for the data using class size of 5.
b) Draw a bar chart.
c) Draw a Histogram.
d) Draw a frequency polygon.
18.8 Simple Percentage
This is the simplest of all the statistical methods used in the analysis of data. Frequency counts are translated into percentages. It is useful for making general statements about a given situation and for comparing different parts of a whole. This is because the percentage expresses every group as if it were out of 100, so groups of different sizes can be compared directly. The formula for calculating the percentage is
$$\text{Percentage} = \frac{f}{N} \times 100$$
where f is the frequency of the category of interest and N is the total frequency.
To illustrate the calculation of percentages, suppose that a researcher is interested in comparing the performance of students in different science subjects in the SSCE.
The numbers of students who enrolled and passed in the science subjects in the SSCE are given below:
Subject | No. Enrolled | No. Passed |
Biology | 8,500 | 4,500 |
Chemistry | 2,320 | 1,800 |
Physics | 1,800 | 1,500 |
Agric. Sci. | 9,000 | 3,000 |
The percentage of students who passed in each subject is calculated as follows:
% passed in Biology = $\frac{4500}{8500} \times 100$ = 52.94%
% passed in Chemistry = $\frac{1800}{2320} \times 100$ = 77.59%
% passed in Physics = $\frac{1500}{1800} \times 100$ = 83.33%
% passed in Agric. Sci. = $\frac{3000}{9000} \times 100$ = 33.33%
If you use the number or frequency of students who passed Biology (4500) and the number who passed Chemistry (1800) in comparing performance in the subjects, you may be tempted to conclude that performance in Biology is better than performance in Chemistry. However, if you use the percentage of students passing in the two subjects (52.94% for Biology and 77.59% in Chemistry), you will see that performance in Chemistry appears to be better than performance in Biology.
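The percentage formula is easy to apply to every row of a table at once. In this minimal sketch, the helper name `percentage` and the dictionary layout are illustrative choices; the figures are the SSCE numbers above.

```python
def percentage(part, whole):
    """Express a frequency as a percentage of its total: part / whole x 100."""
    return part / whole * 100

ssce = {  # subject: (number enrolled, number passed)
    "Biology": (8500, 4500), "Chemistry": (2320, 1800),
    "Physics": (1800, 1500), "Agric. Sci.": (9000, 3000),
}
for subject, (enrolled, passed) in ssce.items():
    print(f"{subject:<12} {percentage(passed, enrolled):.2f}%")
```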
Student Activity
1. In a study comparing drop-out rates in junior secondary schools in three towns, a researcher collected the following data.
Town | No. Enrolled | No. Dropped Out |
Bida | 6,800 | 400 |
Edozigi | 10,200 | 2,000 |
Pati | 5,000 | 900 |
(a) What is the percentage drop-out in each of the three towns?
(b) What percentage of the total number of students in the three towns dropped out?
18.9 Measure of Central Tendency
The mean, median, and mode are measures of central tendency, or measures of location. They are very useful statistics for reporting research findings because they give information about the characteristics of an average or typical member of a group. The characteristics may be performance, interest, attitude, etc.
18.9.1 The Mean
This is also known as the arithmetic average. It is the sum of the scores in a distribution divided by the total number of scores.
The formula is
$$\bar{X} = \frac{\sum X}{N}$$
where $\sum X$ is the sum of the scores, N is the total number of scores, and $\bar{X}$ is the mean.
Example 1:
The scores of ten students in a test are as follows: 40, 55, 60, 30, 50, 48, 70, 85, 72, and 65. Find the mean.
$\sum X$ = 40 + 55 + 60 + 30 + 50 + 48 + 70 + 85 + 72 + 65 = 575
N = 10 items (scores)
$\bar{X} = \frac{\sum X}{N} = \frac{575}{10}$ = 57.5
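The same calculation can be done by hand-coding the formula or with Python's standard `statistics` module, which gives the same value:

```python
from statistics import mean

scores = [40, 55, 60, 30, 50, 48, 70, 85, 72, 65]
total = sum(scores)   # sum of X = 575
n = len(scores)       # N = 10

print(total / n)      # 57.5, the formula applied directly
print(mean(scores))   # 57.5, from the standard library
```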
18.9.2 The Median
The median is the middle score: it divides the distribution into two equal halves. To obtain the median, the scores must first be arranged in ascending or descending order.
Example
Find the median of the sets of scores
(a) 9, 7, 15, 10, 11, 8, 2, 4, 3
(b) 5, 9, 8, 7, 3, 2, 4, 6, 5, 8
In example (a), arranging the scores in ascending order gives 2, 3, 4, 7, 8, 9, 10, 11, 15. The middle (fifth) score is 8, so the median is 8.
In example (b), the number of scores (10) is even, so there is no single middle score. After arranging the scores in order, the two middle scores are added and divided by two.
We have: 2, 3, 4, 5, 5, 6, 7, 8, 8, 9. The median is $\frac{5 + 6}{2} = \frac{11}{2}$ = 5.5
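The two cases (odd and even number of scores) can be written out explicitly. The helper `median_by_hand` below is a hypothetical name that mirrors the steps in the text, and `statistics.median` from Python's standard library is used as a cross-check.

```python
from statistics import median

a = [9, 7, 15, 10, 11, 8, 2, 4, 3]      # odd number of scores
b = [5, 9, 8, 7, 3, 2, 4, 6, 5, 8]      # even number of scores

def median_by_hand(scores):
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of the two middle scores

print(median_by_hand(a), median(a))  # 8 8
print(median_by_hand(b), median(b))  # 5.5 5.5
```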
18.9.3 The Mode
This is the score (or scores) that occurs most frequently in a distribution, and it is easily found by inspection. Some distributions have two modes; such a distribution is called bimodal, while a distribution with more than two modes is called multimodal.
Example
Find the mode in the distribution: 20, 30, 21, 45, 30, 25, 33, 35, 30, 22, 29, 30.
By inspection, you will see that 30 appeared 4 times. It is the mode.
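Counting occurrences is what "inspection" does. In Python, `collections.Counter` shows how often each score occurs, and `statistics.multimode` (available from Python 3.8) returns every most frequent value, so a bimodal distribution gives a list of two values.

```python
from collections import Counter
from statistics import multimode

data = [20, 30, 21, 45, 30, 25, 33, 35, 30, 22, 29, 30]

counts = Counter(data)
print(counts.most_common(1))  # [(30, 4)]: 30 occurs four times
print(multimode(data))        # [30]
```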
Student Activity
1. What is the meaning of each of the following terms: mean, median, and mode?
2. Compute the mean, median and mode of the scores below:
10, 7, 8, 9, 6, 9, 3, 2, 9, 5, 1, 2, 5, 0, 5, 7, 8, 5, 6, 4, 5
3. Present the scores below in a grouped frequency table
The researcher obtained scores after testing 65 students in Mathematics:
33 26 29 41 36 36 26 41 37 19 37 29
44 37 30 31 29 41 28 23 44 29 46 51
49 55 60 29 37 36 34 26 30 28 23 25
47 50 48 65 38 39 26 41 34 25 40 30
24 50 48 47 56 33 38 37 37 30 41 30
41 24 51 49 50
4. Distinguish clearly between the following pairs of concepts giving a suitable example in each case.
(a) Descriptive and Inferential Statistics
(b) Class interval and class size
(c) Frequency polygon and histogram
(d) Bar graph and pie chart
(e) Class mark and class boundaries
(f) Multi-modal and Bimodal
5. Consider the following scores obtained by a researcher by testing 40 students in geometry concepts in Mathematics
38 5 45 40 13 33 21 30
46 8 23 12 50 37 27 39
40 8 48 15 49 34 25 21
18 3 46 18 35 55 21 21
3 5 18 5 30 54 25 23
(a) Using a class size of 5, prepare a frequency distribution table for the data
(b) Using your frequency table (i) Draw bar graph (ii) Draw a histogram
18.10 The Concept of Correlation
Correlation refers to the degree of relationship between two variables. The correlation coefficient is a measure that expresses how strongly two variables are related, and its value ranges from -1 to +1. A correlation coefficient of -1 denotes a perfect negative relationship, +1 denotes a perfect positive relationship, and 0 denotes the absence of a relationship. Correlation enables a researcher to establish whether variations in one set of scores are associated with variations in another set of scores, and how large this association is. When two sets of scores, X and Y, are plotted on the Cartesian coordinate plane, the result is a scatter diagram, which shows a relationship that is positive, negative, or zero.
Positive relationship: those who perform well on one variable also perform well on the second, and those who perform poorly on one variable also perform poorly on the other.
Negative relationship: the reverse of a positive relationship; those who perform well on one variable perform poorly on the other.
Zero relationship: there is no link between the two variables.
18.10.1 Correlation Coefficient Computation
There are several techniques for calculating a correlation coefficient; two of them are the Pearson Product Moment Correlation (PPMC) method and the Spearman Rank-order Correlation method. The PPMC is the most widely used and is named after its originator, Karl Pearson. There are two main methods for computing the PPMC (r).
The first is the deviations from the mean method, while the second is the raw scores method. Let us examine each of them individually.
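The deviation-from-the-mean formula is
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
or, equivalently,
$$r = \frac{\sum xy}{N S_x S_y}$$
where $x = X - \bar{X}$, $y = Y - \bar{Y}$, N is the number of pairs of scores, and $S_x = \sqrt{\sum x^2 / N}$ and $S_y = \sqrt{\sum y^2 / N}$ are the standard deviations of X and Y.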
Example 5: The researcher obtained the following data after administering attitude questionnaire to the students and the scores on academic achievement test. Using the data below calculate the Pearson r.
X | 10 | 11 | 12 | 12 | 13 | 14 | 15 | 15 | 16 | 17 | 17 | 18 | 18 |
Y | 5 | 8 | 9 | 4 | 7 | 6 | 8 | 9 | 10 | 10 | 12 | 14 | 13 |
Solution
Steps:
(i) Find the mean of X and of Y: $\bar{X} = \frac{188}{13} \approx 14.5$ and $\bar{Y} = \frac{115}{13} \approx 8.8$
(ii) Complete the composite table
(iii) From the table, $\sum xy$ = 80.90, $\sum x^2$ = 87.25 and $\sum y^2$ = 107.72
S/N | X | Y | x = X − X̄ | y = Y − Ȳ | xy | x² | y² |
1 | 10 | 5 | -4.5 | -3.8 | 17.10 | 20.25 | 14.44 |
2 | 11 | 8 | -3.5 | -0.8 | 2.80 | 12.25 | 0.64 |
3 | 12 | 9 | -2.5 | 0.2 | -0.50 | 6.25 | 0.04 |
4 | 12 | 4 | -2.5 | -4.8 | 12.00 | 6.25 | 23.04 |
5 | 13 | 7 | -1.5 | -1.8 | 2.70 | 2.25 | 3.24 |
6 | 14 | 6 | -0.5 | -2.8 | 1.40 | 0.25 | 7.84 |
7 | 15 | 8 | 0.5 | -0.8 | -0.40 | 0.25 | 0.64 |
8 | 15 | 9 | 0.5 | 0.2 | 0.10 | 0.25 | 0.04 |
9 | 16 | 10 | 1.5 | 1.2 | 1.80 | 2.25 | 1.44 |
10 | 17 | 10 | 2.5 | 1.2 | 3.00 | 6.25 | 1.44 |
11 | 17 | 12 | 2.5 | 3.2 | 8.00 | 6.25 | 10.24 |
12 | 18 | 14 | 3.5 | 5.2 | 18.20 | 12.25 | 27.04 |
13 | 18 | 13 | 3.5 | 4.2 | 14.70 | 12.25 | 17.64 |
Σ | 188 | 115 | | | 80.90 | 87.25 | 107.72 |
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{80.90}{\sqrt{87.25 \times 107.72}} = \frac{80.90}{\sqrt{9398.57}} = \frac{80.90}{96.95} = 0.83$$
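The deviation method translates directly into code. This minimal sketch (hypothetical function name) uses the unrounded means 188/13 and 115/13, so its result, about 0.835, differs from the hand calculation only beyond the second decimal place.

```python
from math import sqrt

X = [10, 11, 12, 12, 13, 14, 15, 15, 16, 17, 17, 18, 18]
Y = [5, 8, 9, 4, 7, 6, 8, 9, 10, 10, 12, 14, 13]

def pearson_deviation(x, y):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), where x and y are deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / sqrt(sxx * syy)

print(round(pearson_deviation(X, Y), 2))  # 0.83
```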
Calculating Pearson “r” using the Raw Score Method
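The formula is given by
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$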
Let us use the same data in example above
Steps
i. Complete the composite table
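ii. From the table, N = 13, $\sum X$ = 188, $\sum Y$ = 115, $\sum XY$ = 1744, $\sum X^2$ = 2806 and $\sum Y^2$ = 1125; substitute these values into the formula.
S/N | X | Y | XY | X² | Y² |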
1 | 10 | 5 | 50 | 100 | 25 |
2 | 11 | 8 | 88 | 121 | 64 |
3 | 12 | 9 | 108 | 144 | 81 |
4 | 12 | 4 | 48 | 144 | 16 |
5 | 13 | 7 | 91 | 169 | 49 |
6 | 14 | 6 | 84 | 196 | 36 |
7 | 15 | 8 | 120 | 225 | 64 |
8 | 15 | 9 | 135 | 225 | 81 |
9 | 16 | 10 | 160 | 256 | 100 |
10 | 17 | 10 | 170 | 289 | 100 |
11 | 17 | 12 | 204 | 289 | 144 |
12 | 18 | 14 | 252 | 324 | 196 |
13 | 18 | 13 | 234 | 324 | 169 |
Σ | 188 | 115 | 1744 | 2806 | 1125 |
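$$r = \frac{13(1744) - (188)(115)}{\sqrt{\left[13(2806) - 188^2\right]\left[13(1125) - 115^2\right]}} = \frac{22672 - 21620}{\sqrt{(36478 - 35344)(14625 - 13225)}} = \frac{1052}{\sqrt{1134 \times 1400}} = \frac{1052}{1260} = 0.83$$
You can see that the two approaches give the same result. This is because the raw score formula is derived algebraically from the deviation formula; any small difference comes only from rounding the means in the deviation method.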
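The raw score method needs only the five sums, which a program can accumulate in one pass. In this minimal sketch (hypothetical function name) the comments show the intermediate values from the worked example; with integer data the numerator and denominator come out exactly as 1052 and 1260.

```python
from math import sqrt

X = [10, 11, 12, 12, 13, 14, 15, 15, 16, 17, 17, 18, 18]
Y = [5, 8, 9, 4, 7, 6, 8, 9, 10, 10, 12, 14, 13]

def pearson_raw(x, y):
    """Raw score formula for Pearson r, using only sums of the original scores."""
    n = len(x)
    sx, sy = sum(x), sum(y)                      # 188, 115
    sxy = sum(a * b for a, b in zip(x, y))       # 1744
    sxx = sum(a * a for a in x)                  # 2806
    syy = sum(b * b for b in y)                  # 1125
    numerator = n * sxy - sx * sy                # 1052
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))  # sqrt(1134 * 1400) = 1260
    return numerator / denominator

print(round(pearson_raw(X, Y), 2))  # 0.83
```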
What statistical decision can be made about this correlation coefficient (0.83)?
r = 0.83 indicates a high relationship.
18.10.2 Statistical Decision on Coefficient of Correlation
When a correlation coefficient is tested for significance at α = 0.05, its magnitude may be interpreted as follows:
(i) If 0.05 ≤ r ≤ 0.20, there is a negligible relationship
(ii) If 0.21 ≤ r ≤ 0.40, there is a low relationship
(iii) If 0.41 ≤ r ≤ 0.60, there is a moderate relationship
(iv) If 0.61 ≤ r ≤ 0.80, there is a substantial relationship
(v) If 0.81 ≤ r ≤ 1.00, there is a high relationship
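The interpretation scale can be written as a small lookup. This is a sketch under stated assumptions, not a standard rule: the hypothetical `describe_r` applies the scale to the absolute value of r (so -0.83 is also described as high), treats values below 0.05 as negligible, and closes the small gaps between the listed ranges by using upper limits only.

```python
def describe_r(r):
    """Verbal label for a correlation coefficient using the scale above.
    Assumptions: |r| is used, and values below 0.05 are grouped with 'negligible'."""
    size = abs(r)
    if size <= 0.20:
        return "negligible"
    if size <= 0.40:
        return "low"
    if size <= 0.60:
        return "moderate"
    if size <= 0.80:
        return "substantial"
    return "high"

print(describe_r(0.83))  # high
```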
Spearman Rank Order Correlation Coefficient – rho
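This coefficient was developed by Charles Spearman. It is computed from the ranks of the scores rather than from the scores themselves.
The formula is given as
$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where D is the difference between the two ranks of each subject and N is the number of pairs of ranks.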
Example
Ten students are ranked on their achievement under instructional scaffolding and under traditional teaching in a mathematics course. Calculate the coefficient of correlation between the scores under the two modes of teaching mathematics.
S/N | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
X | 51 | 44 | 70 | 32 | 65 | 67 | 19 | 71 | 45 | 80 |
Y | 49 | 41 | 45 | 31 | 50 | 61 | 11 | 64 | 21 | 75 |
Solution
Steps
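(i) Complete the composite table by ranking each set of scores (the highest score is ranked 1) and finding the difference D between the ranks
(ii) Apply the formula: $\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$
(iii) $\rho = 1 - \frac{6(18)}{10(10^2 - 1)} = 1 - \frac{108}{990} = 1 - 0.109 = 0.891$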
X - for Instructional scaffolding
Y - for Traditional Teaching
S/N | X | Y | Rx | Ry | D | D² |
1 | 51 | 49 | 6 | 5 | 1 | 1 |
2 | 44 | 41 | 8 | 7 | 1 | 1 |
3 | 70 | 45 | 3 | 6 | -3 | 9 |
4 | 32 | 31 | 9 | 8 | 1 | 1 |
5 | 65 | 50 | 5 | 4 | 1 | 1 |
6 | 67 | 61 | 4 | 3 | 1 | 1 |
7 | 19 | 11 | 10 | 10 | 0 | 0 |
8 | 71 | 64 | 2 | 2 | 0 | 0 |
9 | 45 | 21 | 7 | 9 | -2 | 4 |
10 | 80 | 75 | 1 | 1 | 0 | 0 |
Σ | | | | | 0 | 18 |
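Ranking and applying the formula can be automated. In this minimal sketch the helper names are hypothetical; the highest score receives rank 1, as in the table, and tied scores (none occur here) share the average of their positions, which is a common convention rather than something the formula itself prescribes.

```python
def ranks_descending(values):
    """Rank 1 = highest value; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1        # first position of v in the ordering
        count = order.count(v)            # how many times v is tied
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), with D the difference between ranks."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

X = [51, 44, 70, 32, 65, 67, 19, 71, 45, 80]   # instructional scaffolding
Y = [49, 41, 45, 31, 50, 61, 11, 64, 21, 75]   # traditional teaching
print(round(spearman_rho(X, Y), 3))            # 0.891
```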
Student Activity
1. Using any convenient correlation method, calculate the correlation coefficient of the data below:
S/N | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
X | 31 | 24 | 50 | 12 | 45 | 47 | 09 | 51 | 25 | 60 | 15 | 10 |
Y | 29 | 21 | 25 | 11 | 30 | 41 | 01 | 44 | 11 | 55 | 05 | 03 |
2. Explain what is meant by “coefficient of correlation”. What does the sign (+ or -) tell you about the set of variables?
3. Two attributes observed in a random sample of size 10 yielded the following data:
X | 15 | 17 | 18 | 25 | 19 | 32 | 21 | 27 | 28 | 30 |
Y | 9 | 10 | 11 | 16 | 12 | 20 | 15 | 17 | 32 | 24 |
Compute:
(a) The Pearson Product Moment Correlation Coefficient rxy
(b) The Spearman’s rho
18.11 Inferential Techniques
Statistical tests are categorized into two groups, namely parametric and non-parametric tests. Parametric tests (parametric statistics) require certain assumptions to be met in order for them to be valid. The following are three very important assumptions made when using parametric statistics to test hypotheses.
1. The variable measured is normally distributed in the population.
2. The populations being compared have the same standard deviation.
3. The data collected are measured on an interval or ratio scale.
Parametric tests are therefore statistical procedures that make inferences about population parameters such as the mean. In contrast, non-parametric tests (non-parametric statistics) are statistical procedures that do not make inferences about population parameters and make no assumptions about them. The parametric tests commonly used in mathematics education and educational research are the z-test, the t-test, and the F-test. This chapter treats only the z-test and the t-test.
18.12 Z-Test
The z-test is used in testing hypotheses about means, for example to determine whether two means are significantly different. It is usually adopted when the sample size is large, i.e., 30 or more. The formula for the z-test of the difference between two means is
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{SD_{\bar{X}}}$$
where $\bar{X}_1$ is the mean of group I, $\bar{X}_2$ is the mean of group II, and $SD_{\bar{X}}$ is the standard error of the difference between the means, given by
$$SD_{\bar{X}} = \sqrt{\frac{SD_1^2}{N_1} + \frac{SD_2^2}{N_2}}$$
where $SD_1$ and $SD_2$ are the standard deviations, and $N_1$ and $N_2$ the numbers of scores, of the two groups.
Example 7: Suppose a mathematics teacher has the following sets of scores in SSCE over the year.
X1 | 3 | 4 | 5 | 6 | 7 |
X2 | 2 | 3 | 3 | 3 | 4 |
Now, let us illustrate the use of Z-test in hypothesis testing with above example
X1 | X1 − X̄1 | (X1 − X̄1)² | X2 | X2 − X̄2 | (X2 − X̄2)² |
3 | -2 | 4 | 2 | -1 | 1 |
4 | -1 | 1 | 3 | 0 | 0 |
5 | 0 | 0 | 3 | 0 | 0 |
6 | 1 | 1 | 3 | 0 | 0 |
7 | 2 | 4 | 4 | 1 | 1 |
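$\sum X_1$ = 25, $\sum (X_1 - \bar{X}_1)^2$ = 10, $\sum X_2$ = 15, $\sum (X_2 - \bar{X}_2)^2$ = 2
$\bar{X}_1 = \frac{25}{5} = 5$, $\bar{X}_2 = \frac{15}{5} = 3$
$SD_1 = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.41$, $SD_2 = \sqrt{\frac{2}{5}} = \sqrt{0.4} = 0.63$
Now we have everything we need, and all we have to do is substitute the correct number for each symbol:
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{SD_1^2}{N_1} + \frac{SD_2^2}{N_2}}} = \frac{5 - 3}{\sqrt{\frac{2}{5} + \frac{0.4}{5}}} = \frac{2}{\sqrt{0.48}} = \frac{2}{0.693} = 2.89$$
Assuming we selected α = 0.05, we refer to the table of the Z-distribution. At the 0.05 level of significance, the critical (table) value of Z is 1.96. Since the calculated Z-value (2.89) is greater than the critical value, we reject the null hypothesis that there is no significant difference between the two means; had the calculated value been less than 1.96, we would not reject it.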
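The z computation can be checked with a short script. This minimal sketch uses `statistics.pstdev` (standard deviation dividing by N) because the worked example divides by N; the function name `z_two_means` is hypothetical, and 1.96 is the two-tailed critical value at α = 0.05 quoted above.

```python
from math import sqrt
from statistics import mean, pstdev

x1 = [3, 4, 5, 6, 7]
x2 = [2, 3, 3, 3, 4]

def z_two_means(a, b):
    """Z = (mean1 - mean2) / sqrt(SD1^2/N1 + SD2^2/N2), with SD computed using N."""
    se = sqrt(pstdev(a) ** 2 / len(a) + pstdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

z = z_two_means(x1, x2)
critical = 1.96  # two-tailed critical value at alpha = 0.05
print(round(z, 2), "reject H0" if abs(z) > critical else "do not reject H0")
```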
18.13 The T-Test
The t-test, otherwise called Student's t-test, is an inferential technique. It was developed by William Gosset, who published it in 1908 under the pen name "Student". It is used to determine whether two means are significantly different when the sample size is small (that is, n < 30). There are two types of t-test: the t-test for independent samples and the t-test for non-independent samples.
18.13.1 T-Test for Independent Samples
Independent samples are formed so that members of one group are not related to members of the other group, although both groups are selected from the same population. The t-test for independent samples is used to test whether there is a significant difference between the means of two independent samples. The formula is given as
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
where $\bar{X}_1$, $\bar{X}_2$ = means of group I and group II
$S_1^2$, $S_2^2$ = variances of the two groups, each computed as $\sum (X - \bar{X})^2 / n$
$n_1$, $n_2$ = number of subjects in each group
Example: Using the following sets of scores for two groups,
Group A | 9 | 17 | 16 | 15 | 14 | 15 | 10 | 18 | 18 | 20 | 26 | 11 | 12 |
Group B | 6 | 10 | 12 | 18 | 13 | 1 | 11 | 9 | 19 | 5 | 15 | 10 |
(a) Formulate a testable hypothesis
(b) Test the hypothesis at α = 0.05
(c) Use the t-test for independent samples to test the stated hypothesis
Solution
First, label Group A as X1 and Group B as X2, and then calculate the mean of each group.
X1 | X1 − X̄1 | (X1 − X̄1)² | X2 | X2 − X̄2 | (X2 − X̄2)² |
9 | -6.5 | 42.25 | 6 | -4.8 | 23.04 |
17 | 1.5 | 2.25 | 10 | -0.8 | 0.64 |
16 | 0.5 | 0.25 | 12 | 1.2 | 1.44 |
15 | -0.5 | 0.25 | 18 | 7.2 | 51.84 |
14 | -1.5 | 2.25 | 13 | 2.2 | 4.84 |
15 | -0.5 | 0.25 | 1 | -9.8 | 96.04 |
10 | -5.5 | 30.25 | 11 | 0.2 | 0.04 |
18 | 2.5 | 6.25 | 9 | -1.8 | 3.24 |
18 | 2.5 | 6.25 | 19 | 8.2 | 67.24 |
20 | 4.5 | 20.25 | 5 | -5.8 | 33.64 |
26 | 10.5 | 110.25 | 15 | 4.2 | 17.64 |
11 | -4.5 | 20.25 | 10 | -0.8 | 0.64 |
12 | -3.5 | 12.25 | | | |
ΣX1 = 201 | | 253.25 | ΣX2 = 129 | | 300.28 |
$\bar{X}_1 = \frac{201}{13}$ = 15.5, $S_1^2 = \frac{253.25}{13}$ = 19.48, $S_1 = \sqrt{19.48}$ = 4.41
$\bar{X}_2 = \frac{129}{12}$ = 10.8, $S_2^2 = \frac{300.28}{12}$ = 25.02, $S_2 = \sqrt{25.02}$ = 5.00
Therefore
$$t = \frac{15.5 - 10.8}{\sqrt{\frac{19.48}{13} + \frac{25.02}{12}}} = \frac{4.7}{\sqrt{1.498 + 2.085}} = \frac{4.7}{\sqrt{3.583}} = \frac{4.7}{1.893} = 2.48$$
Since we selected α = 0.05, we need to go to the t-table with the appropriate degrees of freedom. For the t-test for independent samples, the degrees of freedom are n1 + n2 − 2. Using this formula:
df = 13 + 12 − 2 = 23
tcal = 2.48
α = 0.05
df = 23
tcrit = 2.069
The calculated t-value is compared with the critical t-value to decide whether to reject or retain the null hypothesis. Since the t-calculated is greater than the t-critical (2.069), we reject the null hypothesis.
a. H0: There is no significant difference between the mean scores of male and female students taught using the scaffolding learning strategy.
b. tcal (2.48) > tcrit (2.069)
Decision: H0 is rejected since tcal > tcrit. The mean scores differ significantly, so the strategy cannot be described as gender friendly.
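The same test can be scripted. This sketch follows the chapter's convention of dividing the sum of squared deviations by n; many textbooks and statistical packages instead use n − 1 or a pooled variance, so results from other tools may differ slightly. The function name is hypothetical, 2.069 is the two-tailed table value for df = 23 at α = 0.05, and because the code uses unrounded means its result (about 2.49) differs slightly from the hand calculation.

```python
from math import sqrt

group_a = [9, 17, 16, 15, 14, 15, 10, 18, 18, 20, 26, 11, 12]
group_b = [6, 10, 12, 18, 13, 1, 11, 9, 19, 5, 15, 10]

def t_independent(a, b):
    """t = (mean1 - mean2) / sqrt(S1^2/n1 + S2^2/n2), where
    S^2 = sum of squared deviations / n, following the worked example."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    var1 = sum((v - m1) ** 2 for v in a) / n1
    var2 = sum((v - m2) ** 2 for v in b) / n2
    t = (m1 - m2) / sqrt(var1 / n1 + var2 / n2)
    return t, n1 + n2 - 2  # t and degrees of freedom

t, df = t_independent(group_a, group_b)
t_critical = 2.069  # two-tailed table value for df = 23 at alpha = 0.05
print(round(t, 2), df, "reject H0" if abs(t) > t_critical else "retain H0")
```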
18.13.2 T-test for Non-independent Samples
Non-independent samples are those that have undergone some sort of matching: the members of one group are systematically connected to the members of the other group. The t-test for non-independent samples is used to evaluate whether there is a significant difference between the means of two matched (non-independent) samples, or between the means of one sample measured at two different times. The formula for the non-independent samples t-test is
$$t = \frac{\sum D}{\sqrt{\dfrac{N\sum D^2 - \left(\sum D\right)^2}{N - 1}}}$$
where D = difference between the scores of each matched pair
$\sum D$ = sum of the differences between the matched pairs
$\sum D^2$ = sum of the squared differences
N = number of matched pairs (the degrees of freedom are N − 1)
Example 9: Suppose a set of NCE II students took tests in both mathematics and statistics. Their results are as follows.
S/N | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Mathematics | 19 | 23 | 30 | 17 | 50 | 25 | 10 |
Statistics | 29 | 63 | 46 | 37 | 58 | 80 | 43 |
Are the results significantly different?
Solution
X1 | X2 | D | D² |
19 | 29 | -10 | 100 |
23 | 63 | -40 | 1600 |
30 | 46 | -16 | 256 |
17 | 37 | -20 | 400 |
50 | 58 | -8 | 64 |
25 | 80 | -55 | 3025 |
10 | 43 | -33 | 1089 |
Σ | | -182 | 6534 |
Now substitute into t-formula.
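$$t = \frac{-182}{\sqrt{\dfrac{7(6534) - (-182)^2}{7 - 1}}} = \frac{-182}{\sqrt{\dfrac{45738 - 33124}{6}}} = \frac{-182}{\sqrt{2102.33}} = \frac{-182}{45.85} = -3.97$$
Thus, t = -3.97. At α = 0.05, we consult the t-table for non-independent samples with N − 1 = 7 − 1 = 6 degrees of freedom; the critical (table) value of t is 2.447. Since the absolute value of the calculated t (3.97) is greater than the critical value (2.447), H0 is rejected: the mathematics and statistics results are significantly different.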
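The matched-pairs formula needs only N, the sum of the differences, and the sum of their squares. In this minimal sketch the function name is hypothetical, and 2.447 is the two-tailed table value for df = 6 at α = 0.05; the sign of t only shows which test had the higher mean, so its absolute value is compared with the critical value.

```python
from math import sqrt

mathematics = [19, 23, 30, 17, 50, 25, 10]
statistics_scores = [29, 63, 46, 37, 58, 80, 43]

def t_paired(x1, x2):
    """t = sum(D) / sqrt((N*sum(D^2) - (sum D)^2) / (N - 1)), N = number of pairs."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    sum_d = sum(d)                  # -182
    sum_d2 = sum(v * v for v in d)  # 6534
    t = sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1

t, df = t_paired(mathematics, statistics_scores)
t_critical = 2.447  # two-tailed table value for df = 6 at alpha = 0.05
print(round(t, 2), df, "reject H0" if abs(t) > t_critical else "retain H0")
```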
Student Activity
1. (a) What is a z-test? (b) Distinguish between the z-test and the t-test.
2. Suppose we have the following sets of post-tests scores for two groups
Group I: 9, 8, 6, 7, 2, 1, 4, 6, 4, 7, 8, 3, 6, 2, 6, 7, 3, 7, 8, 10, 9, 13, 16, 17, 6, 10
Group II: 8, 7, 5, 6, 3, 2, 3, 2, 3, 7, 3, 8, 7, 2, 2, 1, 3, 6, 2, 4, 0, 9, 9, 8, 12, 10, 25, 8, 5, 2, 5, 10, 25, 8
Determine whether these two sets of scores are significantly different.
1. Briefly discuss any four sampling techniques.
2. Describe any four tools used in data collection.
3. With an appropriate research question for each, mention any four test statistics that can be used in data analysis.
4. Differentiate between: (a) Sample and population (b) t-test and z-test
5. For the topic “The Study on Attitude and Academic Performance of In-Service Mathematics Teachers in Introduction to Probability”
a. What are the instruments that will be used for data collection?
b. What statistical tool will be used for data analysis?
18.14 Writing of Research Report
One of the important aspects of any research study is communicating the results to other researchers. This is accomplished through a well-written research report. Research reports are of two types: a book-length report, called a project, thesis or dissertation, and a journal article. Both begin with the title, which is the label given to the report and should not be too long. The title is followed by the abstract, a brief summary of the whole study.
A journal report or article, unlike a project or thesis, is not presented in pursuit of a degree or other academic qualification, and its format is different. It contains an introduction (which includes the research problem and review of literature) and a methodology section covering the research design, sample, instrumentation, and methods of data collection and analysis. The other sections are the results, a summary and discussion of the findings with brief remarks on how the results fit previous studies, the conclusion, recommendations and references.
The format for writing a report, thesis or dissertation varies from institution to institution and from discipline to discipline. A research report is usually written in the past tense. Essentially, a research project report comprises three major parts, namely:
I. The preliminary section.
II. The main body.
III. The appendix.
The Preliminary Section
I. The title Page
II. Certification Page
III. Acknowledgement
IV. Table of Contents
V. List of Tables
VI. List of Figures
VII. List of Appendices
VIII. The Abstract
Chapter 1: Introduction
I. Background to the Study
II. Statement of the Problem
III. Purpose of the Study
IV. Research Questions and/or Hypotheses
V. Scope and Delimitation of the Study
VI. Definition of Operational Terms
Chapter 2: Review of Literature
I. Conceptual Framework
II. Theoretical Framework
III. Overview of Related Studies
IV. Implications of Literature Reviewed
Chapter 3: Research Methodology
I. Research Design
II. Area of Study
III. Population
IV. Sample and Sampling Procedure
V. Instrument for Data Collection
VI. Validation of the Instrument
VII. Reliability of the Instrument
VIII. Method of Data Analysis
Chapter 4: Data Presentation, Analysis and Discussion
I. Data Presentation
II. Summary of Findings
III. Discussion of Findings
Chapter 5: Summary, Conclusion and Recommendations
I. Summary
II. Conclusion
III. Implications of the Findings
IV. Recommendations arising from the study
V. Limitations of the Study
VI. Contributions to Knowledge
VII. Suggestions for Further Studies
References and Appendices
I. References
II. Appendix
References are the final stage of reporting. Only the books and studies that the researcher cited in the report are included as references; sources that were read but not cited are excluded. If the researcher wishes to add a list of books or papers that were read but not cited, that list is a bibliography rather than a reference list.
Most project, dissertation and thesis reports in educational research are organized in five chapters: Chapter One, the problem; Chapter Two, literature review; Chapter Three, research methodology; Chapter Four, data presentation and analysis; and Chapter Five, summary, conclusion and recommendations.
18.15 Summary
In this chapter, you have learnt the concept of statistics in research. Statistics involves the collection, organization, representation, analysis and interpretation of data, and decision making. You also learnt the types of statistics and the benefits of statistical analyses in educational research.
The chapter also discussed sequencing, that is, arranging scores in either ascending or descending order, and rounding numbers to the required number of figures. You learnt about the graphical representation of data and how to find measures of central tendency. Lastly, simple percentage and inferential statistical tools were discussed.
References
Awotunde, P. O., & Ugodulunwa (2002). An Introduction to Statistical Methods in Education. Nigeria: Fab Anieh (Nig) Ltd.
National Teachers' Institute (2000). Nigeria Certificate in Education (NCE/DLS) Course Book on Mathematics. Kaduna: National Teachers' Institute.