Definition: The Regression Line is the line that best fits the data, such that the overall distance from the line to the points (variable values) plotted on a graph is the smallest. In other words, a line used to minimize the squared deviations of predictions is called as the regression line.
There are as many numbers of regression lines as variables. Suppose we take two variables, say X and Y, then there will be two regression lines:
- Regression line of Y on X: This gives the most probable values of Y from the given values of X.
- Regression line of X on Y: This gives the most probable values of X from the given values of Y.
The algebraic expression of these regression lines is called as Regression Equations. There will be two regression equations for the two regression lines.
The correlation between the variables depend on the distance between these two regression lines, such as the nearer the regression lines to each other the higher is the degree of correlation, and the farther the regression lines to each other the lesser is the degree of correlation.
The correlation is said to be either perfect positive or perfect negative when the two regression lines coincide, i.e. only one line exists. In case, the variables are independent; then the correlation will be zero, and the lines of regression will be at right angles, i.e. parallel to the X axis and Y axis.
Note: The regression lines cut each other at the point of average of X and Y. This means, from the point where the lines intersect each other the perpendicular is drawn on the X axis we will get the mean value of X. Similarly, if the horizontal line is drawn on the Y axis we will get the mean value of Y.