Correlation & Regression

Scatter diagrams, correlation coefficients, regression lines, interpolation and extrapolation.

Statistics AS 45 min

Learning Objectives

Draw and interpret scatter diagrams for bivariate data
Understand and interpret the product moment correlation coefficient (PMCC)
Calculate and interpret the equation of a least squares regression line
Distinguish between interpolation and extrapolation, and understand their reliability
Use coding to simplify regression and correlation calculations
Understand the limitations of correlation and regression analysis

Key Formulae

r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}

S_{xy} = \sum xy - \frac{\sum x \sum y}{n}

S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}

S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}

b = \frac{S_{xy}}{S_{xx}}, \quad a = \bar{y} - b\bar{x}

\text{Regression line: } y = a + bx
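
As a rough illustration of how these formulae fit together, here is a minimal Python sketch. The function names and the five-point data set are hypothetical, chosen only so the arithmetic is easy to check by hand ($S_{xx} = 10$, $S_{yy} = 6$, $S_{xy} = 6$).

```python
from math import sqrt


def summary_stats(xs, ys):
    """Return (S_xx, S_yy, S_xy) using the summation formulae."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    s_xx, s_yy, s_xy = summary_stats(xs, ys)
    return s_xy / sqrt(s_xx * s_yy)


def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    s_xx, _, s_xy = summary_stats(xs, ys)
    b = s_xy / s_xx
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)  # a = y-bar - b * x-bar
    return a, b


# Hypothetical data: x = hours of revision, y = test mark
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pmcc(xs, ys)
a, b = regression_y_on_x(xs, ys)
print(round(r, 4))               # 0.7746
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

By hand, $r = 6/\sqrt{60} \approx 0.775$ and the regression line is $y = 2.2 + 0.6x$, which matches the printed values.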

Prior Knowledge Check

Q1. What does it mean if two variables have positive correlation?

Q2. What is interpolation?

Q3. If a scatter diagram shows points closely following a line sloping downwards from left to right, what type of correlation does it show?

Why This Matters

When we collect data on two variables — such as hours of revision and exam scores, or temperature and ice cream sales — we want to know: is there a relationship, and can we use it to make predictions?

Correlation measures the strength and direction of a linear relationship. Regression gives us an equation to predict one variable from the other. Together, they are among the most widely used tools in data analysis, from medical research to economics.

Scatter Diagrams and Correlation

Value of $r$	Interpretation
$r = 1$	Perfect positive linear correlation
$0.7 < r < 1$	Strong positive linear correlation
$0.3 < r < 0.7$	Moderate positive linear correlation
$0 < r < 0.3$	Weak positive linear correlation
$r = 0$	No linear correlation
$-1 < r < 0$	Negative linear correlation (strong, moderate or weak, using the same bands as for positive $r$ with the sign reversed)
$r = -1$	Perfect negative linear correlation
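
The table can be read as a simple lookup on $|r|$ and the sign of $r$. The Python sketch below is one way to encode it; the function name is hypothetical, and because the table's strict inequalities leave $r = \pm 0.3$ and $r = \pm 0.7$ unassigned, it assumes a boundary value belongs to the stronger band. These bands are rules of thumb rather than fixed definitions.

```python
def describe_pmcc(r):
    """Return a verbal description of r using the bands in the table.

    Assumption: values exactly on a boundary (|r| = 0.3 or 0.7) are placed
    in the stronger band, since the table's strict inequalities exclude them.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No linear correlation"
    direction = "positive" if r > 0 else "negative"  # negative bands mirror positive
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size >= 0.7:
        strength = "Strong"
    elif size >= 0.3:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} linear correlation"


print(describe_pmcc(0.7746))  # Strong positive linear correlation
print(describe_pmcc(-0.45))   # Moderate negative linear correlation
```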

Least Squares Regression
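
The least squares line of $y$ on $x$ is the line $y = a + bx$ that makes the sum of the squared vertical distances (residuals) from the points to the line as small as possible, and the formulae $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$ give that minimising line. The Python sketch below checks this numerically on a small hypothetical data set by nudging $a$ and $b$ and confirming that the sum of squared residuals only goes up. It writes $S_{xx}$ and $S_{xy}$ in the equivalent forms $\sum (x - \bar{x})^2$ and $\sum (x - \bar{x})(y - \bar{y})$; the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least squares line of y on x: returns (a, b) for y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def sum_sq_residuals(a, b, xs, ys):
    """Sum of squared vertical distances from each point to y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_sq_residuals(a, b, xs, ys)
print(round(best, 4))  # 2.4

# Every nearby line has a larger sum of squared residuals
for da in (-0.1, 0.0, 0.1):
    for db in (-0.05, 0.0, 0.05):
        if da or db:
            assert sum_sq_residuals(a + da, b + db, xs, ys) > best
```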

Exam Practice

Concept	What to remember
PMCC $r$	Always between $-1$ and $+1$; measures linear correlation only
Gradient $b$	Interpret in context with units
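Intercept $a$	Often has no practical meaning, especially when $x = 0$ lies outside the data range (e.g. “zero hours of sunshine”)
Interpolation	Predicting within the range of the data; usually reliable if the correlation is strong
Extrapolation	Predicting outside the range of the data; unreliable, as the linear trend may not continue
Causation	Correlation $\neq$ causation; look for lurking variables
Coded data	PMCC $r$ is unchanged by linear coding (with positive scale factors); for the regression line, substitute the codes back in, as the gradient and intercept generally both change
Regression direction	$y$ on $x$ predicts $y$; $x$ on $y$ predicts $x$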
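
The effect of coding can be checked numerically. In this minimal Python sketch the raw data and the codes $p = \frac{x - 100}{10}$ and $q = y - 50$ are hypothetical. It shows that $r$ is the same for the raw and coded data, while the regression line found from the coded data has to be converted back by substituting for $p$ and $q$. A negative scale factor in the coding would reverse the sign of $r$.

```python
from math import sqrt


def pmcc_and_line(xs, ys):
    """Return (r, a, b), where y = a + bx is the least squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return s_xy / sqrt(s_xx * s_yy), y_bar - b * x_bar, b


# Hypothetical raw data, coded with p = (x - 100) / 10 and q = y - 50
xs = [110, 120, 130, 140, 150]
ys = [52, 54, 55, 54, 55]
ps = [(x - 100) / 10 for x in xs]
qs = [y - 50 for y in ys]

r_raw, a_raw, b_raw = pmcc_and_line(xs, ys)
r_coded, a_coded, b_coded = pmcc_and_line(ps, qs)
print(round(r_raw, 4), round(r_coded, 4))  # 0.7746 0.7746

# Coded line: q = a_coded + b_coded * p. Substituting q = y - 50 and
# p = (x - 100) / 10 gives y = (50 + a_coded - 10 * b_coded) + (b_coded / 10) x
a_back = 50 + a_coded - 10 * b_coded
b_back = b_coded / 10
print(round(a_back, 4), round(b_back, 4))  # 46.2 0.06
print(round(a_raw, 4), round(b_raw, 4))    # 46.2 0.06
```

Here the coded gradient is $0.6$ but the gradient in the original units is $0.06$, so both coefficients must be converted back before the line is interpreted.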

Exam Tips

The regression line of y on x is used to predict y from x — not the other way round
When interpreting the gradient, give the context: 'For each one-unit increase in x, y changes by b on average'
Always state whether a prediction involves interpolation (usually reliable) or extrapolation (unreliable)
Correlation does not imply causation — always consider lurking variables
The PMCC r is always between −1 and +1; values close to ±1 indicate strong linear correlation
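
The tips on regression direction and on interpolation versus extrapolation can be combined in a small Python helper. In this sketch (hypothetical names and data), `least_squares(us, vs)` fits $v$ on $u$, so the same function gives the line of $y$ on $x$ or of $x$ on $y$ depending on the argument order, and `predict` labels each estimate by comparing the input value with the observed range. The two lines are different unless $|r| = 1$, so rearranging the $y$-on-$x$ line to estimate $x$ generally gives a different answer from the $x$-on-$y$ line.

```python
def least_squares(us, vs):
    """Return (a, b) for the least squares line v = a + b*u (v regressed on u)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    b = s_uv / s_uu
    return v_bar - b * u_bar, b


def predict(a, b, value, observed):
    """Evaluate a + b*value and label it against the range of observed inputs."""
    inside = min(observed) <= value <= max(observed)
    return a + b * value, "interpolation" if inside else "extrapolation"


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares(xs, ys)  # y on x: y = 2.2 + 0.6x
a_xy, b_xy = least_squares(ys, xs)  # x on y: x = -1 + 1.0y

print(predict(a_yx, b_yx, 3.5, xs)[1])  # interpolation
print(predict(a_yx, b_yx, 12, xs)[1])   # extrapolation

# Estimating x when y = 5: use the x-on-y line...
x_est, kind = predict(a_xy, b_xy, 5, ys)
print(round(x_est, 2), kind)  # 4.0 interpolation
# ...rather than rearranging the y-on-x line, which gives a different value
print(round((5 - a_yx) / b_yx, 2))  # 4.67
```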