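Lesson 14 covers the chi-square statistic: the one-variable (goodness-of-fit) test with equal or predetermined expected frequencies, and the two-variable test of independence.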
For lesson 14, read one of the following:
pages 153-160 in Practical Statistics for Educators, Third Edition, by Ruth Ravid (2005, University Press of America);
or pages 447-465 in Basic Statistics for Behavioral Science Research, 2nd Edition, by Mary B. Harris (1998, Allyn and Bacon);
or pages 229-246 in Practical Statistics for Educators, 2nd Edition, by Ruth Ravid (2000, University Press of America);
or pages 221-238 in Practical Statistics for Educators by Ruth Ravid (1994, University Press of America).
All of the inferential statistics we have covered in past lessons are what are called parametric statistics. To use these statistics, we make certain assumptions about the distributions the data come from, for example that they are normally distributed. With parametric statistics, the data for the dependent variable are also at the interval or ratio level of measurement (e.g., test scores or physical measurements).
The parametric statistics we have discussed so far in this course are:
We will now consider a widely used non-parametric test, chi-square, which we can use with data at the nominal level, that is, data that classify observations into categories. For example, suppose we know how often entering freshmen who are required to purchase a computer for college use select a Macintosh computer, an IBM computer, or some other brand of computer. We want to know whether the three brands are selected with different frequencies or whether students choose about equally among them. The chi-square statistic is suited to this kind of problem.
The chi-square statistic is used to compare the observed frequency of some observation (such as frequency of buying different brands of computers) with an expected frequency (such as buying equal numbers of each brand of computer). The comparison of observed and expected frequencies is used to calculate the value of the chi-square statistic, which in turn can be compared with the distribution of chi-square to make an inference about a statistical problem.
The symbol for chi-square is $\chi^2$, and the formula is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the one-variable chi-square statistic are:
df = C - 1
where C is the number of categories or levels of the independent variable.
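For readers who want to check the arithmetic by computer, here is a minimal Python sketch of the formula and the degrees of freedom above. The function names `chi_square` and `one_variable_df` are hypothetical illustrations, not anything from the textbooks; the code simply sums (O - E)²/E over the categories and subtracts 1 from the number of categories.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def one_variable_df(num_categories):
    """Degrees of freedom for the one-variable test: df = C - 1."""
    return num_categories - 1


# Usage with the computer-buying data and equal expected frequencies (100/3 each).
print(round(chi_square([47, 36, 17], [100 / 3] * 3), 3))  # → 13.82
print(one_variable_df(3))  # → 2
```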
The first use of the one-variable chi-square test is to determine whether observations are distributed equally across the levels of a variable. The one-variable test is also referred to as the goodness-of-fit test.
Return to the example already mentioned: entering freshmen who are required to purchase a computer for college use select a Macintosh computer, an IBM computer, or some other brand. We want to know whether there is a significant difference among the frequencies with which the three brands are selected or whether students choose equally among them.
The data for 100 students are recorded in the table below as observed frequencies. We have also indicated the expected frequency for each category: since there are 100 observations and three categories (Macintosh, IBM, and Other), the expected frequency for each category is 100/3, or 33.333. In the last column of the table we have calculated (O - E)²/E, the squared difference between the observed and expected frequencies divided by the expected frequency. The sum of that column is the value of the chi-square statistic.
| Computer | Observed Frequency | Expected Frequency | (O-E)²/E |
|---|---|---|---|
| IBM | 47 | 33.333 | 5.604 |
| Macintosh | 36 | 33.333 | 0.213 |
| Other | 17 | 33.333 | 8.003 |
| Total | 100 | 100 | 13.820 (chi-square) |
From the table we can see that $\chi^2 = 13.820$ and df = C - 1 = 3 - 1 = 2.
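The calculations in the table above can be reproduced with a short Python sketch, assuming (as the lesson does) that the expected frequency for each category is N divided by the number of categories. Because the code uses E = 100/3 without rounding, the IBM entry comes out as 5.603 rather than the table's 5.604 (which reflects E rounded to 33.333); the total of 13.820 is unaffected.

```python
# Observed purchases for the 100 entering freshmen in the lesson's example.
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}

n = sum(observed.values())      # 100 students
expected = n / len(observed)    # equal expected frequency: 100/3 per brand

# (O - E)^2 / E for each brand; their sum is chi-square.
contributions = {brand: (o - expected) ** 2 / expected for brand, o in observed.items()}
chi_sq = sum(contributions.values())
df = len(observed) - 1          # df = C - 1

for brand, value in contributions.items():
    print(f"{brand:<10} {value:.3f}")
print(f"chi-square = {chi_sq:.3f}, df = {df}")
```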
We can compare the obtained value of chi-square with the critical value for the .05 level with 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value of chi-square is 5.991.
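The lesson reads the critical value from a printed table. As a cross-check that relies on a standard fact not covered in the text: when df = 2 (and only then), the upper-tail probability of chi-square has the closed form P(χ² ≥ x) = e^(-x/2), so the critical value for significance level α is -2 ln α. The Python sketch below uses that identity; the function name `critical_value_df2` is hypothetical, and for other degrees of freedom you still need Appendix Table F or a statistics package.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 only.

    With 2 degrees of freedom, P(chi-square >= x) = exp(-x / 2),
    so solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2 * math.log(alpha)


print(round(critical_value_df2(0.05), 3))  # → 5.991
```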
We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.
df = C - 1 = 2
Reject H0 if $\chi^2 \geq 5.991$.
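The decision rule amounts to a single comparison. In this sketch (Python; the name `decide` is hypothetical), the obtained value 13.820 is at least as large as the critical value 5.991, so H0, the hypothesis that freshmen choose equally among the three brands, is rejected at the .05 level. The p-value line is an addition to the textbook procedure and uses the df = 2 identity P(χ² ≥ x) = e^(-x/2), which does not hold for other degrees of freedom.

```python
import math

CRITICAL_05_DF2 = 5.991  # Appendix Table F: alpha = .05, df = 2


def decide(chi_sq, critical=CRITICAL_05_DF2):
    """Apply the rule 'reject H0 if chi-square >= critical value'."""
    return "reject H0" if chi_sq >= critical else "fail to reject H0"


chi_sq = 13.820
print(decide(chi_sq))  # → reject H0
# Exact p-value for df = 2 only: P(chi-square >= x) = exp(-x / 2).
print(f"p = {math.exp(-chi_sq / 2):.4f}")  # → p = 0.0010
```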
Let's look at the problem we just solved in a way that illustrates the other use of the one-variable chi-square test: testing against predetermined expected frequencies rather than equal frequencies. We could formulate the revised problem as follows:
In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of the 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know whether these frequencies of computer-buying behavior are similar to or different from the national study data.
The data for the 100 students are recorded in the table below as observed frequencies. In this case the expected frequencies come from the national study: to get each expected frequency, we multiply the national percentage by the total number of subjects in the current study (for example, 50% × 100 = 50 for IBM).
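Converting the national percentages into expected frequencies takes one multiplication per category. A minimal Python sketch, assuming the percentages are entered as decimal proportions that sum to 1:

```python
national_proportions = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}
n = 100  # entering freshmen surveyed in the current study

# The proportions must account for every student, or the expected
# frequencies will not add up to n.
if abs(sum(national_proportions.values()) - 1.0) > 1e-9:
    raise ValueError("proportions must sum to 1")

# Expected frequency = national proportion x number of subjects.
expected = {brand: p * n for brand, p in national_proportions.items()}
print(expected)  # → {'IBM': 50.0, 'Macintosh': 25.0, 'Other': 25.0}
```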
| Computer | Observed Frequency | Expected Frequency | (O-E)²/E |
|---|---|---|---|
| IBM | 47 | 50 | 0.18 |
| Macintosh | 36 | 25 | 4.84 |
| Other | 17 | 25 | 2.56 |
| Total | 100 | 100 | 7.58 (chi-square) |
From the table we can see that $\chi^2 = 7.58$ and df = C - 1 = 3 - 1 = 2.
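The same arithmetic in Python, with the national-study expected frequencies in place of equal ones (a sketch; the variable names are illustrative only):

```python
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
expected = {"IBM": 50, "Macintosh": 25, "Other": 25}  # national percentages x 100

# (O - E)^2 / E for each brand; their sum is chi-square.
contributions = {b: (observed[b] - expected[b]) ** 2 / expected[b] for b in observed}
chi_sq = sum(contributions.values())
df = len(observed) - 1  # df = C - 1

for brand, value in contributions.items():
    print(f"{brand:<10} {value:.2f}")
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```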
As before, we compare the obtained value of chi-square with the critical value for the .05 level with 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value of chi-square is 5.991.
We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.
df = C - 1 = 2
Reject H0 if $\chi^2 \geq 5.991$.
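Applying the decision rule to both versions of the computer-buying problem shows how the choice of expected frequencies changes the statistic even though the observed counts are identical. Both obtained values exceed 5.991, so H0 is rejected in each case: the freshmen neither choose equally among the brands nor match the national pattern. The p-values are an addition to the textbook procedure and use the df = 2 identity P(χ² ≥ x) = e^(-x/2).

```python
import math

CRITICAL = 5.991  # Appendix Table F: alpha = .05, df = 2

# Chi-square values obtained in the two one-variable examples of this lesson.
results = {
    "equal expected frequencies": 13.820,
    "national-study expected frequencies": 7.58,
}

decisions = {}
for label, chi_sq in results.items():
    decision = "reject H0" if chi_sq >= CRITICAL else "fail to reject H0"
    p_value = math.exp(-chi_sq / 2)  # valid for df = 2 only
    decisions[label] = decision
    print(f"{label}: chi-square = {chi_sq:.3f}, p = {p_value:.4f}, {decision}")
```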
Now let us consider the case of the two-variable chi-square test, also known as the test of independence.
For example, we may wish to know whether the frequencies with which males come from small, medium, or large cities differ significantly from those for females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is gender independent of hometown size? The data for 30 females and 6 males are shown in the following table.
| | Small | Medium | Large | Totals |
|---|---|---|---|---|
| Female | 10 | 14 | 6 | 30 |
| Male | 4 | 1 | 1 | 6 |
| Totals | 14 | 15 | 7 | 36 |
The formula for chi-square is the same as before:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O is the observed frequency, and
E is the expected frequency.
The degrees of freedom for the two-variable chi-square statistic are:
df = (C - 1)(R - 1)
where C is the number of columns, or levels of the first variable, and R is the number of rows, or levels of the second variable.
In the table above we have the observed frequencies (six of them). Now we must calculate the expected frequency for each of the six cells. For two-variable chi-square we find the expected frequencies with the formula:
Expected Frequency for a Cell = (Column Total × Row Total) / Grand Total
In the table above we can see that the Column Totals are 14 (small), 15 (medium), and 7 (large), while the Row Totals are 30 (female) and 6 (male). The grand total is 36.
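Using the formula, we can find the expected frequency for each cell. For example, for females from small cities, E = (14 × 30)/36 = 11.667, and for males from small cities, E = (14 × 6)/36 = 2.333.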
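The expected-frequency formula can be applied to every cell at once. This Python sketch (the nested-dictionary layout is just one illustrative choice) computes the row, column, and grand totals from the observed counts and then all six expected frequencies; rounded to three decimals they match the values in the next table.

```python
observed = {
    "Female": {"Small": 10, "Medium": 14, "Large": 6},
    "Male": {"Small": 4, "Medium": 1, "Large": 1},
}

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
column_totals = {
    col: sum(observed[row][col] for row in observed) for col in observed["Female"]
}
grand_total = sum(row_totals.values())

# Expected = (column total x row total) / grand total, for every cell.
expected = {
    row: {col: column_totals[col] * row_totals[row] / grand_total for col in column_totals}
    for row in observed
}

for row, cells in expected.items():
    print(row, {col: round(e, 3) for col, e in cells.items()})
```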
We can put these expected frequencies in our table and also include the value of (O - E)²/E for each cell. The sum of these six values is the value of chi-square.
| | Small: Observed | Small: Expected | Small: (O-E)²/E | Medium: Observed | Medium: Expected | Medium: (O-E)²/E | Large: Observed | Large: Expected | Large: (O-E)²/E | Totals |
|---|---|---|---|---|---|---|---|---|---|---|
| Female | 10 | 11.667 | 0.238 | 14 | 12.500 | 0.180 | 6 | 5.833 | 0.005 | 30 |
| Male | 4 | 2.333 | 1.191 | 1 | 2.500 | 0.900 | 1 | 1.167 | 0.024 | 6 |
| Totals | 14 | | | 15 | | | 7 | | | 36 |
From the table we can see that
$\chi^2 = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538$
and df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2
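Putting the pieces together, the sketch below computes chi-square and df for the hometown-size table directly from the observed counts. Because it keeps full precision for each expected frequency, the result is 2.537; summing the rounded table entries gives 2.538, and the difference is only rounding. The list-of-lists layout is an illustrative assumption.

```python
observed = [
    [10, 14, 6],  # Female: small, medium, large
    [4, 1, 1],    # Male:   small, medium, large
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = col_totals[j] * row_totals[i] / grand_total  # expected frequency
        chi_sq += (o - e) ** 2 / e

df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (C - 1)(R - 1)
print(f"chi-square = {chi_sq:.3f}, df = {df}")  # → chi-square = 2.537, df = 2
```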
We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.
df = (C - 1)(R - 1) = (2)(1) = 2
Reject H0 if $\chi^2 \geq 5.991$.
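Here the obtained value, about 2.54, is less than the critical value 5.991, so we fail to reject H0: these data do not show a significant relationship between gender and hometown size. A minimal Python check of that decision follows; the p-value is an addition to the textbook procedure and again relies on the df = 2 identity P(χ² ≥ x) = e^(-x/2).

```python
import math

CRITICAL = 5.991  # Appendix Table F: alpha = .05, df = 2
chi_sq = 2.537    # two-variable chi-square for the hometown-size table

decision = "reject H0" if chi_sq >= CRITICAL else "fail to reject H0"
p_value = math.exp(-chi_sq / 2)  # valid for df = 2 only

print(decision)              # → fail to reject H0
print(f"p = {p_value:.3f}")  # → p = 0.281
```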
Chi-square is a useful non-parametric statistic for evaluating statistical hypotheses about the frequencies with which observations fall into various categories (nominal data).
Please send electronic mail to the course instructor if you have any questions about this lesson or other concerns.