Lesson 8 will consist of the following topics
For lesson 8, read pages 97-109 in Practical Statistics for Educators, Third Edition by Ruth Ravid (2005, University Press of America)
or read pages 180-195 & 204-207 in Basic Statistics for Behavioral Science Research, 2nd ed. by Mary B. Harris (1998, Allyn and Bacon)
or read pages 143-166 in Practical Statistics for Educators, 2nd Edition by Ruth Ravid (2000, University Press of America)
or read pages 127-151 in Practical Statistics for Educators by Ruth Ravid (1994, University Press of America).
In this lesson we will conclude our discussion of descriptive statistics by discussing correlation, the degree to which two variables are related. For example, if we have scores on a reading test and on a spelling test for a group of students, what is the relationship between reading performance and spelling performance? Do the students who score high on reading also score high on spelling, and conversely, do those who score low on reading also score low on spelling? Such a relationship is referred to as a high positive correlation between reading and spelling. This kind of relationship is shown in Table 1, which shows scores on two tests (reading and spelling) for 10 students.
| Student | Reading | Spelling |
|---|---|---|
| 1 | 9 | 19 |
| 2 | 10 | 19 |
| 3 | 7 | 16 |
| 4 | 8 | 15 |
| 5 | 6 | 11 |
| 6 | 4 | 9 |
| 7 | 5 | 9 |
| 8 | 1 | 3 |
| 9 | 2 | 4 |
| 10 | 3 | 7 |
In Table 1 we can see that Student 1 has the second-highest score on reading (9) and the highest score on spelling (19, tied with Student 2). Students 5 and 6 have intermediate scores on both measures. Student 8 has the lowest score on both measures, with a 1 for reading and a 3 for spelling.
We could also have a negative relationship between two variables in which persons who scored high on one variable scored low on the other variable and those who scored low on the first variable scored high on the second variable. This is the situation depicted with scores on reading and spelling for the 10 students shown in Table 2.
| Student | Reading | Spelling |
|---|---|---|
| 1 | 2 | 17 |
| 2 | 1 | 19 |
| 3 | 3 | 15 |
| 4 | 4 | 15 |
| 5 | 6 | 11 |
| 6 | 5 | 12 |
| 7 | 8 | 9 |
| 8 | 7 | 5 |
| 9 | 10 | 3 |
| 10 | 9 | 3 |
In Table 2 we can see that Student 2 has the lowest score in reading (1), but the highest score in spelling (19). Student 9, on the other hand, has the highest score on reading (10), but the lowest score on spelling (3 - tied with Student 10).
We could also have a situation in which there is no relationship between the scores on the two variables. We could refer to this situation as a low degree of correlation or zero correlation. This situation is shown in Table 3, where there does not seem to be any relationship between the 10 students' scores in reading and their scores in spelling.
| Student | Reading | Spelling |
|---|---|---|
| 1 | 3 | 11 |
| 2 | 7 | 1 |
| 3 | 2 | 19 |
| 4 | 9 | 5 |
| 5 | 8 | 17 |
| 6 | 4 | 3 |
| 7 | 1 | 15 |
| 8 | 10 | 9 |
| 9 | 6 | 15 |
| 10 | 5 | 8 |
In our earlier discussion of descriptive statistics, we found that we can represent our data in a table (tabular representation of data), represent our data with a diagram (graphical representation of data), or use a number to represent the descriptive statistic (numerical representation of data). The three tables we have looked at, showing scores for 10 students on a reading test and on a spelling test, represent the tabular representation of data. Although it is possible to get some idea of the degree of association that exists between two variables by looking at tables of the data, we can see the relationship much more clearly if we use a graphical or numerical representation of the data.
The graphic that is used to show the relationship between two variables is the scattergram, also called a scatterplot. A scatterplot is a diagram (or graph) with two axes, one for each variable. The scatterplot is set up so that the X and Y axes are of approximately equal length, so that the plot appears square in shape. In the body of the scatterplot each subject's pair of scores on the two variables is represented by a single dot. For example, the data in Table 1 above would yield the following scatterplot.
In this scatterplot we can see that the dot in the lower left hand corner of the diagram represents Student 8 in Table 1 (reading = 1, spelling = 3). The dot in the upper right hand corner of the diagram represents Student 2 with a reading score of 10 and a spelling score of 19. In this representation of a high positive correlation between the two variables, we can see that the dots tend to align themselves along a line stretching from the lower left hand corner to the upper right hand corner of the scatterplot. If this were a perfect positive relationship, the dots would all be lined up in a straight line extending from the lower left hand corner to the upper right hand corner of the scatterplot.
The data in Table 2 above, we suggested, represented a negative relationship between the two variables. If we were to create a scatterplot for this data it would look like the following figure.
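Because a scatterplot is simply each student's pair of scores drawn as one point, the plots for Table 1 and Table 2 can be approximated in plain text. The following is a minimal sketch in Python, not a reproduction of the lesson's figures; the function name `text_scatter` and the axis limits (reading 0-10, spelling 0-20) are assumptions chosen for illustration.

```python
# Scores from Table 1 and Table 2 as (reading, spelling) pairs, one per student.
TABLE_1 = [(9, 19), (10, 19), (7, 16), (8, 15), (6, 11),
           (4, 9), (5, 9), (1, 3), (2, 4), (3, 7)]
TABLE_2 = [(2, 17), (1, 19), (3, 15), (4, 15), (6, 11),
           (5, 12), (8, 9), (7, 5), (10, 3), (9, 3)]


def text_scatter(pairs, x_max=10, y_max=20):
    """Return a plain-text scatterplot (hypothetical helper).

    The horizontal axis is reading and the vertical axis is spelling.
    The axis limits x_max and y_max are illustrative assumptions.
    """
    # One row per spelling score (top row is y_max), one column per reading score.
    grid = [[" "] * (x_max + 1) for _ in range(y_max + 1)]
    for x, y in pairs:
        grid[y_max - y][x] = "*"  # one dot per student
    lines = [f"{y_max - i:2d} |" + " ".join(row) for i, row in enumerate(grid)]
    # Horizontal axis and reading labels (last digit only, so 10 prints as 0).
    lines.append("   +" + "-" * (2 * (x_max + 1) - 1))
    lines.append("    " + " ".join(str(x % 10) for x in range(x_max + 1)))
    return "\n".join(lines)


print("Table 1 (reading across, spelling up)")
print(text_scatter(TABLE_1))
print()
print("Table 2 (reading across, spelling up)")
print(text_scatter(TABLE_2))
```

In the first plot the asterisks run from the lower left to the upper right, and in the second they run from the upper left to the lower right, matching the positive and negative relationships described above.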
These scatterplots or scattergrams can be created with the Excel spreadsheet program.
Creating a Scattergram with the Excel Spreadsheet Program
We have considered the tabular representation of data showing the degree of association that exists between two variables for a number of subjects (Table 1, Table 2, and Table 3). We have also seen a graphical representation of this association as we constructed scatterplots for the data of Tables 1, 2, and 3. We will now proceed to look at numerical measures of association, which are the most frequently used indicators of the relationship between two variables.
We will consider two major indices of association - the Pearson Product Moment Correlation Coefficient for use with data at the interval or ratio level of measurement, and the Spearman Rank Order Correlation Coefficient for use with data at the ordinal level of measurement.
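As a preview of how these two indices behave on the data in Tables 1, 2, and 3, here is a minimal sketch in Python. It uses the standard textbook definitions: the Pearson coefficient as the sum of cross-products of deviations divided by the square root of the product of the two sums of squares, and the Spearman coefficient as the Pearson coefficient computed on ranks, with tied scores sharing the average of their ranks. The function names (`pearson_r`, `average_ranks`, `spearman_rho`) are assumptions made for illustration, and the assigned readings may present the computation in a different form.

```python
from math import sqrt

# (reading scores, spelling scores) for the 10 students in each table.
TABLES = {
    "Table 1": ([9, 10, 7, 8, 6, 4, 5, 1, 2, 3],
                [19, 19, 16, 15, 11, 9, 9, 3, 4, 7]),
    "Table 2": ([2, 1, 3, 4, 6, 5, 8, 7, 10, 9],
                [17, 19, 15, 15, 11, 12, 9, 5, 3, 3]),
    "Table 3": ([3, 7, 2, 9, 8, 4, 1, 10, 6, 5],
                [11, 1, 19, 5, 17, 3, 15, 9, 15, 8]),
}


def pearson_r(xs, ys):
    """Pearson product moment correlation (deviation-score form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and the two sums of squared deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank from 1 (lowest score); tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank order correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


for name, (reading, spelling) in TABLES.items():
    r = pearson_r(reading, spelling)
    rho = spearman_rho(reading, spelling)
    print(f"{name}: Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

With these data the Pearson coefficient comes out close to +1 for Table 1 and close to -1 for Table 2. For Table 3 it is much nearer zero, though not exactly zero, which fits the description of that table as showing a low degree of correlation.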
Please send electronic mail to the course instructor if you have any questions about this lesson or other concerns.