Scatter plot showing a strong correlation
Scatter plot. A graph of the relationship between two variables (e.g., education and income).
Linear Relationship. The dots may form a loose 'cloud,' fall exactly along a line, or anything in between. Importantly, the pattern of dots does not curve (e.g., into the shape of a rainbow or a smile).
The "best fitting line" is the line that minimizes the sum of the squared vertical distances between itself and the points on the graph (the least-squares line).
Correlation. The closer the dots conform to the best fitting line, the stronger the relationship between the two variables.
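The best fitting line can be sketched in Python, assuming the standard ordinary least-squares definition (the line that minimizes the sum of squared vertical distances to the points). The function names `best_fitting_line` and `sum_squared_residuals`, and the height and weight values, are hypothetical and only for illustration.

```python
from statistics import mean


def best_fitting_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The least-squares line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Total squared vertical distance between the points and a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical heights (inches) and weights (pounds).
heights = [62, 65, 67, 69, 72]
weights = [120, 140, 150, 155, 180]
slope, intercept = best_fitting_line(heights, weights)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
# Any other line, e.g. one with a slightly steeper slope, has a larger total.
print(sum_squared_residuals(heights, weights, slope, intercept))
print(sum_squared_residuals(heights, weights, slope + 0.5, intercept))
```

The second printed total is larger than the first: no other line gets closer to the points, in the squared-distance sense, than the least-squares line.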
(x, y) Coordinate
(x, y) Coordinate. The scatter plot has an x-axis along the bottom of the graph and a y-axis along the left side. Each dot in a scatter plot has an (x, y) coordinate.
For example, with height on the x-axis and weight on the y-axis, a person who is 69 inches tall and weighs 155 pounds would be represented by a dot at the (x, y) coordinate (69, 155).
Pearson's r is a number that summarizes the strength and direction of the linear relationship between two variables. Pearson's r ranges in value from -1.00 to +1.00.
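One standard way to compute Pearson's r divides the sum of cross-products of deviations from the mean by the square root of the product of the sums of squared deviations: r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The Python sketch below implements that formula directly; the function name `pearson_r` and the data are hypothetical. Python 3.10 and later also ship `statistics.correlation`, which computes the same quantity.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    dx = [x - x_bar for x in xs]  # deviations from the mean of x
    dy = [y - y_bar for y in ys]  # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


heights = [62, 65, 67, 69, 72]       # hypothetical, inches
weights = [120, 140, 150, 155, 180]  # hypothetical, pounds
r = pearson_r(heights, weights)
print(f"r = {r:+.2f}")  # always falls between -1.00 and +1.00
```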
Strength of Correlation. When judging the strength of a correlation, ignore the sign (positive or negative) and look at how far Pearson's r is from zero. The weaker the correlation, the closer Pearson's r will be to zero. A Pearson's r of zero indicates no linear relationship between the two variables (e.g., number of letters in your name and your height). The stronger the correlation, the closer Pearson's r will be to either +1.00 or -1.00. An example of a strong negative correlation is spending and savings: the more a person spends (i.e., spending above average), the smaller their savings tend to be (i.e., savings below average).
Direction of Correlation. A positive Pearson's r value (e.g., r = .60) means that the values of the two variables tend to vary together: when one value is above average, the other value also tends to be above average (e.g., exercise and fitness). A negative Pearson's r value (e.g., r = -.60) means that the values tend to move in opposite directions: when one value is above average, the other value is likely to be below average (e.g., miles driven and amount of gas in the tank).
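Because strength ignores the sign, comparing two correlations comes down to comparing absolute values. A minimal sketch (the function name `stronger_correlation` is hypothetical); the value pairs are the ones this article compares in its examples:

```python
def stronger_correlation(r1, r2):
    """Return whichever Pearson's r is farther from zero (the sign is ignored)."""
    for r in (r1, r2):
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"Pearson's r must lie between -1 and +1, got {r}")
    # abs() drops the sign, so only the distance from zero matters.
    return r1 if abs(r1) >= abs(r2) else r2


print(stronger_correlation(-1.00, 0.85))  # -1.0
print(stronger_correlation(0.70, -0.96))  # -0.96
print(stronger_correlation(0.18, -0.30))  # -0.3
print(stronger_correlation(0.36, -0.60))  # -0.6
```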
|r| = 1.00
The points in the scatter plot fall perfectly along a line. This is the strongest possible correlation. Pearson's r is +1.00 for a perfect positive correlation and -1.00 for a perfect negative correlation. Note that the sign (+1.00 or -1.00) tells you the direction of the line's slope: a positive correlation slopes upward (from left to right), and a negative correlation slopes downward (from left to right). When determining the strength of the correlation, ignore the sign. The farther the value is from zero, the stronger the correlation. A value of -1.00 is farther from zero than +.85, so the value of -1.00 represents a stronger correlation.
Perfect Positive Correlation. Pearson's r = +1.00
Perfect Negative Correlation. Pearson's r = -1.00
|r| ≥ .70
The points of the scatter plot line up fairly closely along an imaginary line. The Pearson r value, ignoring the sign, will be .70 or greater. That is to say, a strong correlation could be either a Pearson's r that is .70 or more, or a Pearson's r that is -.70 or less. The further the value is away from zero, the stronger the correlation. A value of -.96 is further from zero than +.70, so the value of -.96 represents a stronger correlation.
Strong Positive Correlation, Pearson's r = +.70
Strong Negative Correlation, Pearson's r = -.96
|r| ≤ .30
The points of the scatter plot appear as a cloud of dots. The cloud might have a slight upward tilt from left to right (consistent with a positive correlation) or a slight downward tilt (consistent with a negative correlation). The Pearson r value, ignoring the sign, will be .30 or less. That is to say, a weak correlation is a Pearson's r between -.30 and +.30. Even within this range, the farther the value is from zero, the stronger the correlation. A value of -.30 is farther from zero than +.18, so the value of -.30 represents a stronger correlation.
Weak Positive Correlation, Pearson's r = +.18
Weak Negative Correlation, Pearson's r = -.30
.30 < |r| < .70
You have seen the definition and examples of both strong and weak correlations. A moderate correlation falls in the middle.
For a moderate correlation, the points of the scatter plot can appear anywhere from a cloud of dots with a distinct tilt to a cloud of dots that appears somewhat compressed. The Pearson r value, ignoring the sign, will be more than .30 and less than .70. The further the value is away from zero, the stronger the correlation. A value of -.60 is further from zero than +.36, so the value of -.60 represents a stronger correlation.
Moderate Positive Correlation, Pearson's r = +.36
Moderate Negative Correlation, Pearson's r = -.60
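The ranges described above can be collected into one small classifier. This is a sketch: the function name `describe_correlation` is hypothetical, and the cutoffs (.30 and .70) are the ones this article uses; they are conventions, and other sources draw the lines elsewhere.

```python
def describe_correlation(r):
    """Label r using this article's cutoffs (a convention, not a fixed rule)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"Pearson's r must lie between -1 and +1, got {r}")
    size = abs(r)  # strength ignores the sign
    if size == 0.0:
        return "zero correlation"
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.70:
        strength = "strong"
    elif size > 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    sign = "positive" if r > 0 else "negative"
    return f"{strength} {sign}"


# The example values from this article's figures.
for r in [1.00, 0.70, -0.96, 0.36, -0.60, 0.18, -0.30, 0.00]:
    print(f"{r:+.2f}: {describe_correlation(r)}")
```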
r = 0.00
A zero correlation indicates that no linear relationship exists between the two variables. The scatter plot simply looks like a cloud of dots. There is no advantage to using the scatter plot to make predictions, because the two variables are unrelated to one another (e.g., shoe size and final exam score).
Zero Correlation, Pearson's r = 0.00 (e.g., shoe size and exam performance)
Pearson's r is appropriate when:
The scatter plot is not curvilinear
Both variables are scale (i.e., they fall along a number line)
Both variables are fairly normally distributed
Pearson's r indicates how closely the points on the scatter plot fit a line. Results may range from a zero correlation (i.e., a cloud of dots) to a strong correlation (the dots appear along a line).
When there is an obvious curve in the points (for example, curved like a rainbow or a smile), Pearson's r would be misleading.
When a scatter plot reveals a curvilinear relationship, presenting the scatter plot on its own, without Pearson's r, is sufficient.
The two variables being evaluated (e.g., exercise and fitness) must each fall along a number line.
On a number line, adjacent values are an equal distance apart.
For example, when measuring amount of exercise, it would be appropriate to record the number of hours spent working out. The difference between 2 and 3 hours is the same as the difference between 3 and 4 hours: a single hour.
However, recording exercise as (1) daily, (2) weekly, (3) monthly, or (4) none at all would not meet the requirements of Pearson's r. Why not? The distances between the categories are not equal. The change in amount of exercise from (2) to (3) is from weekly to monthly, whereas the change from (3) to (4) is from monthly to none at all.
Both variables (e.g., exercise and fitness) must be fairly normally distributed. That is to say, the distribution of scores should be roughly bell-shaped: the highest frequency occurs in the middle, and frequencies decrease in either direction. The distribution should be fairly symmetrical (the left side mirroring the right side), with frequencies tapering off gradually and leveling out at either end. If the distribution is heavily skewed or has some other clearly non-normal shape, Pearson's r would not be an appropriate measure.
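Looking at the shape of the distribution is the main check, but its asymmetry can also be summarized with a number. The sketch below computes the moment coefficient of skewness, which is 0 for a perfectly symmetrical distribution, positive when a long tail stretches to the right, and negative when the tail stretches to the left. The function name and the data are hypothetical, and this article does not prescribe a numeric cutoff.

```python
from statistics import fmean


def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population form)."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])  # average squared deviation
    m3 = fmean([(v - m) ** 3 for v in values])  # average cubed deviation
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]         # mirrors around 4
right_skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 15]  # long tail to the right
print(f"symmetric:    {skewness(symmetric):+.2f}")
print(f"right-skewed: {skewness(right_skewed):+.2f}")
```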
A correlation represents a pattern in the data.
A pattern helps with making predictions, but it doesn't tell us much about cause-and-effect.
For example, the rise in our national debt is correlated with the recent rise in obesity. This does not mean that a national campaign for US citizens to slim down will reduce the national debt. There is a correlation between ice cream sales and amount of crime. Likewise, shutting down ice cream shops is not the right approach for reducing crime.
When we see a correlation, it tells us that the two variables are related, so given the value of one variable we can predict the value of the other. The stronger the correlation, the more accurate those predictions tend to be.
Correlation allows us to make predictions, but not to infer causation.