Ed 602 - Lesson 14 - Chi-Square

Lesson 14 will consist of the following topics

Text Assignment for Lesson 14

For lesson 14, read pages 153-160 in Practical Statistics for Educators, Third Edition by Ruth Ravid (2005, University Press of America)
or read pages 447-465 in Basic Statistics for Behavioral Science Research 2nd ed by Mary B. Harris (1998, Allyn and Bacon)
or read pages 229-246 in Practical Statistics for Educators, 2nd Edition by Ruth Ravid (2000, University Press of America)
or read pages 221-238 in Practical Statistics for Educators by Ruth Ravid (1994, University Press of America).

Introduction to Chi-Square

All of the inferential statistics we have covered in past lessons are what are called parametric statistics. To use these statistics we make some assumptions about the distributions the data come from, such as that they are normally distributed. With parametric statistics we also deal with data for the dependent variable that are at the interval or ratio level of measurement, e.g. test scores or physical measurements.

The parametric statistics we have discussed so far in this course are:

  1. the Z-score test
  2. the Z-test
  3. the single-sample t-test
  4. the independent t-test
  5. the dependent t-test
  6. one-way analysis of variance (ANOVA)

We will now consider a widely used non-parametric test, chi-square, which we can use with data at the nominal level, that is, data that are classificatory. For example, suppose we know the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know whether the three brands are selected with different frequencies or whether students choose about equally among them. This is a problem for which we can use the chi-square statistic.

The chi-square statistic is used to compare the observed frequency of some observation (such as frequency of buying different brands of computers) with an expected frequency (such as buying equal numbers of each brand of computer). The comparison of observed and expected frequencies is used to calculate the value of the chi-square statistic, which in turn can be compared with the distribution of chi-square to make an inference about a statistical problem.

The symbol for chi-square is χ², and the formula is as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where

O is the observed frequency, and

E is the expected frequency.

The degrees of freedom for the one-dimensional chi-square statistic are:

df = C - 1

where C is the number of categories or levels of the independent variable.
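The formula and the degrees of freedom can be sketched in a few lines of Python. This is only an illustration: the function names `chi_square` and `degrees_of_freedom` are made up for this sketch, and the counts in the usage lines are hypothetical, not data from the lesson.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_categories):
    """df = C - 1 for the one-variable chi-square test."""
    return num_categories - 1


# Hypothetical counts for illustration only: two categories,
# 60 and 40 observed, 50 expected in each.
print(chi_square([60, 40], [50, 50]))   # 100/50 + 100/50 = 4.0
print(degrees_of_freedom(2))            # 1
```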

One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies

We can use the chi-square statistic to test the distribution of measures over levels of a variable to indicate if the distribution of measures is the same for all levels. This is the first use of the one-variable chi-square test. This test is also referred to as the goodness-of-fit test.

We will use the example already mentioned: the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a significant difference among the frequencies with which these three brands of computers are selected or if the students select equally among the three brands.

The data for 100 students is recorded in the table below (the observed frequencies). We have also indicated the expected frequency for each category. Since there are 100 measures or observations and there are three categories (Macintosh, IBM, and Other) we would indicate the expected frequency for each category to be 100/3 or 33.333. In the third column of the table we have calculated the square of the observed frequency minus the expected frequency divided by the expected frequency. The sum of the third column would be the value of the chi-square statistic.

Frequency with which students select computer brand

Computer     Observed     Expected     (O-E)²/E
             Frequency    Frequency
IBM          47           33.333       5.604
Macintosh    36           33.333       0.213
Other        17           33.333       8.003
Total (chi-square)                     13.820

From the table we can see that:

χ² = 5.604 + 0.213 + 8.003 = 13.820

df = C - 1 = 3 - 1 = 2
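As a check on the arithmetic, the same calculation can be sketched in Python using the observed counts from the table. The variable names are illustrative. Because the sketch uses the unrounded expected frequency 100/3 instead of 33.333, an individual cell can differ from the table in the third decimal, but the total still rounds to 13.820.

```python
# Observed counts from the lesson's table.
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}

n = sum(observed.values())        # 100 students
expected = n / len(observed)      # 33.333... per brand if students choose equally

# Per-category contribution (O - E)^2 / E.
contributions = {
    brand: (o - expected) ** 2 / expected for brand, o in observed.items()
}

chi_sq = sum(contributions.values())
df = len(observed) - 1

for brand, value in contributions.items():
    print(f"{brand:10s} {value:.3f}")
print(f"chi-square = {chi_sq:.3f}, df = {df}")   # chi-square = 13.820, df = 2
```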

We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.
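For the special case of 2 degrees of freedom, the chi-square distribution has a simple closed form: the probability of a value at least as large as x is exp(-x/2). That makes it possible to check the tabled critical value without a table. The sketch below relies on that fact and applies only when df = 2; for other degrees of freedom you would still use Appendix Table F or statistical software. The function names are illustrative.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 only: solves exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)


def p_value_df2(chi_sq):
    """Upper-tail probability for df = 2 only."""
    return math.exp(-chi_sq / 2.0)


print(round(critical_value_df2(0.05), 3))   # 5.991, matching the tabled value
print(p_value_df2(13.820) < 0.05)           # True
```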

We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.

  1. State the null hypothesis and the alternative hypothesis based on your research question.


    Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.
  2. Set the alpha level.

    Note: As usual we will set our alpha level at .05; that is, we accept 5 chances in 100 of making a Type I error when the null hypothesis is true.
  3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.

    χ² = 13.820
    df = C - 1 = 2

  4. Write the decision rule for rejecting the null hypothesis.

    Reject H0 if χ² >= 5.991.

    Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
  5. Write a summary statement based on the decision.
    Reject H0, p < .05
    Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
  6. Write a statement of results in standard English.
    There is a significant difference among the frequencies with which students purchased three different brands of computers.
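Steps 4 and 5 of the process above amount to a single comparison. A minimal sketch follows, assuming the critical value has already been looked up; the function name `decide` is hypothetical.

```python
def decide(chi_sq, critical_value):
    """Apply the decision rule: reject H0 when chi-square >= the critical value."""
    return "Reject H0" if chi_sq >= critical_value else "Fail to reject H0"


# Values from the computer-brand example: chi-square = 13.820, critical value = 5.991.
print(decide(13.820, 5.991))   # Reject H0
```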

One-Variable Chi-Square (goodness-of-fit test) with predetermined expected frequencies

Let's look at the problem we just solved in a way that illustrates the other use of one-variable chi-square, that is, with predetermined expected frequencies rather than equal expected frequencies. We could formulate our revised problem as follows:

In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know if these frequencies of computer buying behavior are similar to or different from the national study data.

The data for 100 students are recorded in the table below (the observed frequencies). In this case the expected frequencies are those from the national study. To get each expected frequency we multiply the percentage from the national study by the total number of subjects in the current study. For IBM, for example, 50% of 100 students gives an expected frequency of 50.

The expected frequencies are recorded in the second column of the table. As before we have calculated the square of the observed frequency minus the expected frequency divided by the expected frequency and recorded this result in the third column of the table. The sum of the third column would be the value of the chi-square statistic.

Frequency with which students select computer brand

Computer     Observed     Expected     (O-E)²/E
             Frequency    Frequency
IBM          47           50           0.18
Macintosh    36           25           4.84
Other        17           25           2.56
Total (chi-square)                     7.58

From the table we can see that:

χ² = 0.18 + 4.84 + 2.56 = 7.58

df = C - 1 = 3 - 1 = 2
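The only change from the equal-frequency case is how the expected frequencies are obtained. A short sketch, using the national proportions and observed counts given above (variable names are illustrative):

```python
# Observed counts and national-study proportions from the lesson.
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
national_proportions = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}

n = sum(observed.values())   # 100 students

# Expected frequency = national proportion x number of students surveyed.
expected = {brand: p * n for brand, p in national_proportions.items()}

chi_sq = sum((observed[b] - expected[b]) ** 2 / expected[b] for b in observed)
df = len(observed) - 1

print(expected)
print(f"chi-square = {chi_sq:.2f}, df = {df}")   # chi-square = 7.58, df = 2
```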

We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.

We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.

  1. State the null hypothesis and the alternative hypothesis based on your research question.


    Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.
  2. Set the alpha level.

    Note: As usual we will set our alpha level at .05; that is, we accept 5 chances in 100 of making a Type I error when the null hypothesis is true.
  3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.

    χ² = 7.58
    df = C - 1 = 2

  4. Write the decision rule for rejecting the null hypothesis.

    Reject H0 if χ² >= 5.991.

    Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
  5. Write a summary statement based on the decision.
    Reject H0, p < .05
    Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
  6. Write a statement of results in standard English.
    There is a significant difference between the frequencies with which students purchased three different brands of computers and the frequencies expected from the proportions found in a national study.

Two-Variable Chi-Square (test of independence)

Now let us consider the case of the two-variable chi-square test, also known as the test of independence. For example, we may wish to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is gender independent of size of hometown?

The data for 30 females and 6 males are in the following table.

Frequency with which males and females come from small, medium, and large cities

          Small    Medium    Large    Totals
Female    10       14        6        30
Male      4        1         1        6
Totals    14       15        7        36

The formula for chi-square is the same as before:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where

O is the observed frequency, and

E is the expected frequency.

The degrees of freedom for the two-dimensional chi-square statistic are:

df = (C - 1)(R - 1)

where C is the number of columns or levels of the first variable and R is the number of rows or levels of the second variable.

In the table above we have the observed frequencies (six of them). Now we must calculate the expected frequency for each of the six cells. For two-variable chi-square we find the expected frequencies with the formula:

Expected Frequency for a Cell = (Column Total X Row Total)/Grand Total

In the table above we can see that the Column Totals are 14 (small), 15 (medium), and 7 (large), while the Row Totals are 30 (female) and 6 (male). The grand total is 36.

Using the formula we can thus find the expected frequency for each cell.

  1. The expected frequency for the small female cell is 14X30/36 = 11.667
  2. The expected frequency for the medium female cell is 15X30/36 = 12.500
  3. The expected frequency for the large female cell is 7X30/36 = 5.833
  4. The expected frequency for the small male cell is 14X6/36 = 2.333
  5. The expected frequency for the medium male cell is 15X6/36 = 2.500
  6. The expected frequency for the large male cell is 7X6/36 = 1.167
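The six calculations above all follow the same pattern, so they can be sketched as a loop over the row and column totals. The variable names are illustrative; the counts are the observed frequencies from the lesson's table.

```python
# Observed frequencies from the lesson's table.
observed = [
    [10, 14, 6],   # Female: small, medium, large
    [4, 1, 1],     # Male:   small, medium, large
]

row_totals = [sum(row) for row in observed]         # [30, 6]
col_totals = [sum(col) for col in zip(*observed)]   # [14, 15, 7]
grand_total = sum(row_totals)                       # 36

# Expected frequency for a cell = (column total x row total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(["Female", "Male"], expected):
    print(label, [round(e, 3) for e in row])
```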

We can put these expected frequencies in our table and also include the values for (O - E)²/E. The sum of all these will of course be the value of chi-square.

Observed frequencies, expected frequencies, and (O - E)²/E for males and females from small, medium, and large cities

          Small                           Medium                          Large                           Totals
          Observed  Expected  (O-E)²/E    Observed  Expected  (O-E)²/E    Observed  Expected  (O-E)²/E
Female    10        11.667    0.238       14        12.500    0.180       6         5.833     0.005       30
Male      4         2.333     1.191       1         2.500     0.900       1         1.167     0.024       6
Totals    14                              15                              7                               36

From the table we can see that:

χ² = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538

and df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2
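The whole two-variable calculation can be sketched as one function. The name `chi_square_independence` is hypothetical. Because the sketch keeps the expected frequencies unrounded, it gives about 2.537, whereas summing the table's rounded cells gives 2.538; the difference is rounding only and does not affect the decision.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a two-way table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected frequency = (row total x column total) / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (o - e) ** 2 / e

    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi_sq, df


# Observed frequencies from the lesson: rows are female and male,
# columns are small, medium, and large hometowns.
observed = [[10, 14, 6], [4, 1, 1]]
chi_sq, df = chi_square_independence(observed)
print(f"chi-square = {chi_sq:.3f}, df = {df}")
```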

We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.

  1. State the null hypothesis and the alternative hypothesis based on your research question.


  2. Set the alpha level.

  3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.

    χ² = 2.538
    df = (C - 1)(R - 1) = (2)(1) = 2

  4. Write the decision rule for rejecting the null hypothesis.

    Reject H0 if χ² >= 5.991.

    Note: To write the decision rule we had to know the critical value for chi-square, with an alpha level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and noting the tabled value for the column for the .05 level and the row for 2 df.
  5. Write a summary statement based on the decision.
    Fail to reject H0
    Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail to reject the null hypothesis and are unable to accept the alternative hypothesis.
  6. Write a statement of results in standard English.
    There is not a significant difference in the frequencies with which males come from small, medium, or large towns as compared with females.
    We have no evidence that hometown size and gender are related; the data are consistent with hometown size being independent of gender.

Chi-square is a useful non-parametric statistic for evaluating statistical hypotheses involving the frequencies with which observations fall in various categories (nominal data).

Lesson 14 Assignment

Lesson 14 Quiz

Please send electronic mail to the course instructor if you have any questions about this lesson or other concerns.
