The Pearson Product Moment Correlation Coefficient is the most widely used measure of correlation or association. It is named after Karl Pearson, who developed the correlational method to do agricultural research. The "product moment" part of the name comes from the way in which it is calculated: by summing the products of the deviations of the scores from their means.
The symbol for the correlation coefficient is lowercase r, and it is described in our textbook as the sum of the products of the Z-scores for the two variables divided by the number of scores:

$$r = \frac{\sum z_X z_Y}{N}$$
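If we substitute the formulas for the Z-scores into this formula, we get the following formula for the Pearson Product Moment Correlation Coefficient, which we will use as a definitional formula:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \sigma_X \sigma_Y}$$

Here $\bar{X}$ and $\bar{Y}$ are the means, and $\sigma_X$ and $\sigma_Y$ are the population standard deviations, of the two variables.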
The numerator of this formula says that, for each subject, we multiply the deviation of the subject's X score from the mean of the Xs by the deviation of the subject's Y score from the mean of the Ys, and then sum these products across all subjects. This sum of products is then divided by the number of subjects times the standard deviation of the X variable times the standard deviation of the Y variable.
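The steps just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pearson_r_definitional` and the toy data are assumptions made for the example and do not come from the lesson, and the standard deviations are computed as population values (dividing by N).

```python
import math

def pearson_r_definitional(xs, ys):
    """Pearson r from deviation scores and population standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviation of each score from the mean of its own variable.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Population standard deviations (divide by N, not N - 1).
    sd_x = math.sqrt(sum(d * d for d in dev_x) / n)
    sd_y = math.sqrt(sum(d * d for d in dev_y) / n)
    # Numerator: sum of the products of paired deviation scores.
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return sum_products / (n * sd_x * sd_y)

# Toy data chosen for illustration only (not taken from the lesson's tables).
example_r = pearson_r_definitional([1, 2, 3, 4], [2, 1, 4, 3])
print(round(example_r, 2))
```

For this toy data the sum of products is 3.0 and N times the two standard deviations is 5.0, so the function returns a value of about 0.6.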
Let's calculate the correlation between Reading (X) and Spelling (Y) for the 10 students whose scores appeared in Table 3. There is a fair amount of calculation required as you can see from the table below. First we have to sum up the X values (55) and then divide this number by the number of subjects (10) to find the mean for the X values (5.5). Then we have to do the same thing with the Y values to find their mean (10.3).
| Student | Reading (X) | Spelling (Y) | $X - \bar{X}$ | $Y - \bar{Y}$ | $(X - \bar{X})(Y - \bar{Y})$ |
|---|---|---|---|---|---|
| 1 | 3 | 11 | -2.5 | 0.7 | -1.75 |
| 2 | 7 | 1 | 1.5 | -9.3 | -13.95 |
| 3 | 2 | 19 | -3.5 | 8.7 | -30.45 |
| 4 | 9 | 5 | 3.5 | -5.3 | -18.55 |
| 5 | 8 | 17 | 2.5 | 6.7 | 16.75 |
| 6 | 4 | 3 | -1.5 | -7.3 | 10.95 |
| 7 | 1 | 15 | -4.5 | 4.7 | -21.15 |
| 8 | 10 | 9 | 4.5 | -1.3 | -5.85 |
| 9 | 6 | 15 | 0.5 | 4.7 | 2.35 |
| 10 | 5 | 8 | -0.5 | -2.3 | 1.15 |
| Sum | 55 | 103 | 0.0 | 0.0 | -60.5 |
| Mean | 5.5 | 10.3 | | | |
| Standard Deviation | 2.872 | 5.832 | | | |
Then we have to take each X score and subtract the mean from it to find the X deviation score. We can see that subject 1's X deviation score is -2.5, subject 2's X deviation score is 1.5, and so on. We could make another column of the squares of the X deviation scores and sum that column to calculate the standard deviation of X, using the definitional formula for the standard deviation of a population as we did in Lesson 6.
We can then find each subject's Y-deviation score. Subject 1's Y deviation score is 0.7 (11 - 10.3) and subject 2's Y deviation score is -9.3 (1 - 10.3). We could then add another column to square the Y-deviation scores and use the sum of this column to find the standard deviation for the Y scores.
We can then fill in the last column, in which we multiply each subject's X deviation score by the same subject's Y deviation score. For subject 1 this is -1.75 (-2.5 times 0.7) and for subject 2 this is -13.95 (1.5 times -9.3). Finally, if we sum the last column (X deviation score times Y deviation score), we can use that quantity (-60.5), along with the standard deviations of the two variables and N, the number of subjects, to calculate the correlation coefficient.
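As a worked check, the values from the table (sum of products -60.5, N = 10, and standard deviations 2.872 and 5.832) can be substituted into the definitional formula. The symbols $\sigma_X$ and $\sigma_Y$ for the population standard deviations are a notational assumption here, and the intermediate product is rounded:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \sigma_X \sigma_Y} = \frac{-60.5}{(10)(2.872)(5.832)} \approx \frac{-60.5}{167.50} \approx -0.36$$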


We have calculated the Pearson Product Moment Correlation Coefficient for the association between Reading and Spelling for the 10 subjects in Table 3. The correlation we obtained was -.36, showing us that there is a small negative correlation between reading and spelling. The correlation coefficient is a number that can range from -1 (perfect negative correlation) through 0 (no correlation) to 1 (perfect positive correlation).
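To make the range of the coefficient concrete, the following sketch computes r for three small made-up data sets: one where Y rises in a straight line with X, one where it falls in a straight line, and one with no linear relationship. The helper name `pearson_r` and all of the data are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from sums of deviation products and squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Same value as dividing by N * SD_x * SD_y, because N cancels out.
    return sum_products / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2 * x + 1 for x in xs])   # Y rises with X
r_negative = pearson_r(xs, [10 - 3 * x for x in xs])  # Y falls as X rises
r_none = pearson_r(xs, [3, 1, 2, 1, 3])               # no linear trend

print(r_positive, r_negative, r_none)
```

The first two data sets lie exactly on a straight line, so they give correlations of 1 and -1, while the third gives a correlation of 0.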
Shown in the figure below are the scattergrams we prepared earlier for Tables 1, 2, and 3 with the numerical correlation coefficient indicated for each one.

You can see that it is fairly laborious to calculate the correlation coefficient using the definitional formula. In practice we use another formula that is mathematically equivalent but much easier to use. This is the computational or raw score formula for the correlation coefficient. The computational formula for the Pearsonian r is

$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - \left(\sum X\right)^2\right]\left[N \sum Y^2 - \left(\sum Y\right)^2\right]}}$$
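By looking at the formula we can see that we need the following items to calculate r using the raw score formula: the number of subjects ($N$), the sum of the X scores ($\sum X$), the sum of the Y scores ($\sum Y$), the sum of the squared X scores ($\sum X^2$), the sum of the squared Y scores ($\sum Y^2$), and the sum of the products of each subject's X and Y scores ($\sum XY$).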
Each of these quantities can be found as shown in the computation table below:
| Student | Reading (X) | Spelling (Y) | $X^2$ | $Y^2$ | $XY$ |
|---|---|---|---|---|---|
| 1 | 3 | 11 | 9 | 121 | 33 |
| 2 | 7 | 1 | 49 | 1 | 7 |
| 3 | 2 | 19 | 4 | 361 | 38 |
| 4 | 9 | 5 | 81 | 25 | 45 |
| 5 | 8 | 17 | 64 | 289 | 136 |
| 6 | 4 | 3 | 16 | 9 | 12 |
| 7 | 1 | 15 | 1 | 225 | 15 |
| 8 | 10 | 9 | 100 | 81 | 90 |
| 9 | 6 | 15 | 36 | 225 | 90 |
| 10 | 5 | 8 | 25 | 64 | 40 |
| Sum | 55 | 103 | 385 | 1401 | 506 |
If we plug each of these sums into the raw score formula we can calculate the correlation coefficient.
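The substitution can be written out step by step as follows. This worked version assumes the common form of the raw score formula, with N = 10 and the column sums taken from the computation table; the square root is rounded to two decimal places.

$$
\begin{aligned}
r &= \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - \left(\sum X\right)^2\right]\left[N \sum Y^2 - \left(\sum Y\right)^2\right]}} \\
  &= \frac{(10)(506) - (55)(103)}{\sqrt{\left[(10)(385) - 55^2\right]\left[(10)(1401) - 103^2\right]}} \\
  &= \frac{5060 - 5665}{\sqrt{(3850 - 3025)(14010 - 10609)}} \\
  &= \frac{-605}{\sqrt{(825)(3401)}} \approx \frac{-605}{1675.06} \approx -0.36
\end{aligned}
$$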




We can see that we got the same answer for the correlation coefficient (-.36) with the raw score formula as we did with the definitional formula.
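The agreement between the two formulas can also be checked with a short Python sketch using the Reading and Spelling scores from the tables. The function names `r_definitional` and `r_raw_score` are assumptions for illustration, and the raw score function assumes the common form r = (NΣXY - ΣXΣY) / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²]).

```python
import math

reading = [3, 7, 2, 9, 8, 4, 1, 10, 6, 5]       # X scores from the tables
spelling = [11, 1, 19, 5, 17, 3, 15, 9, 15, 8]  # Y scores from the tables

def r_definitional(xs, ys):
    """Sum of deviation products divided by N * SD_x * SD_y (population SDs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum_products / (n * sd_x * sd_y)

def r_raw_score(xs, ys):
    """Raw score (computational) form, using only sums of the raw scores."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r_def = r_definitional(reading, spelling)
r_raw = r_raw_score(reading, spelling)
print(round(r_def, 2), round(r_raw, 2))  # both round to -0.36
```

The two results differ only by floating-point rounding, which is what we would expect from two algebraically equivalent formulas.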
It is still computationally difficult to find the correlation coefficient, especially if we are dealing with a large number of subjects. In practice we would probably use a computer to calculate the correlation coefficient. We will consider just that (Using the Excel Spreadsheet Program to Calculate the Correlation Coefficient) after we have considered the Spearman Rank Order Correlation Coefficient.