Simple linear regression is a statistical method for describing (x,y) data that appears to have a linear relationship, allowing for some prediction of y if x is known. This data is often plotted on a scatterplot, and the linear regression formula produces the line that best fits the points, provided they truly have a linear correlation. The line won’t pass through every point exactly, but it is the line for which the sum of the squared differences between the actual and predicted y values (the residuals) is as small as possible. This is why it is often called the least squares line or line of best fit. The equations of the line for sample data and population data are, respectively: ŷ = b0 + b1x and Y = B0 + B1x.
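As a rough illustration of what "least squares" means in practice, the sketch below computes b0 and b1 for a small made-up data set using the standard textbook formulas (slope = sum of cross-deviations divided by sum of squared x-deviations, intercept = mean of y minus slope times mean of x). The function names and the sample data are assumptions chosen for this example, not part of any particular package.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared differences between actual y and predicted y_hat."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
ssr = sum_squared_residuals(xs, ys, b0, b1)
print(f"y_hat = {b0:.2f} + {b1:.2f}x, sum of squared residuals = {ssr:.2f}")
```

Any other choice of intercept and slope for the same data gives a larger sum of squared residuals, which is the sense in which this is the line of best fit.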
Anyone familiar with algebra may note the similarity of this line to y = mx + b. The two are in fact equivalent, except that the two terms on the right side of the equation are switched, so that b1 is the slope (m) and b0 is the y-intercept (b). The reason for this rearrangement is that it becomes easy to add further terms, such as ones with exponents, that might describe nonlinear forms of relationship.
The formulas for getting a simple linear regression line are relatively cumbersome, and most people do not work them out by hand because they take a long time to complete. Instead, various programs, such as Excel®, and many types of scientific calculators can easily compute a least squares line. The line is only appropriate for prediction if there is clear evidence of a strong correlation between the x and y values. A calculator will generate a line regardless of whether it makes any sense to use it.
At the same time a simple linear regression line equation is generated, people must look at the level of correlation. This means comparing the absolute value of r, the correlation coefficient, against a table of critical values to determine whether a linear correlation exists. Additionally, plotting the data as a scatterplot is a good way of getting a sense of whether the data has a linear relationship.
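The check against a table can be sketched in a few lines. The example below computes the correlation coefficient r for a small made-up data set and compares its absolute value with a critical value supplied by the caller. The function names are hypothetical, and the critical value 0.878 is the figure commonly tabulated for n = 5 at the 0.05 significance level; it should be confirmed against whichever table is actually in use.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / math.sqrt(sxx * syy)


def has_linear_correlation(r, critical_value):
    """True when |r| exceeds the critical value looked up for the sample size."""
    return abs(r) > critical_value


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_coefficient(xs, ys)

# Assumed critical value for n = 5 at the 0.05 level; check your own table.
significant = has_linear_correlation(r, critical_value=0.878)
print(f"r = {r:.3f}, linear correlation supported: {significant}")
```

With this data r is positive but falls short of the assumed critical value, so a regression line could still be computed but would not be a sound basis for prediction.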
Provided the data has a linear correlation, values can be substituted for x in the regression equation to get a predicted value ŷ. This prediction has its limits. The data present, particularly if it’s just a sample, may show a linear correlation now but might not once additional sample data is added.
Alternatively, a sample can show a correlation while the whole population does not. Prediction is therefore limited; predicting far beyond the range of the available data values is called extrapolation and is not encouraged. Moreover, people should know that if no linear correlation exists, the best estimate of y is the mean of all the y data.
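These rules of thumb can be collected into one small prediction routine. The sketch below is only one possible arrangement: the function name, the `correlated` flag and the coefficients are illustrative assumptions, and refusing outright to predict outside the observed x range is a deliberately strict reading of the advice against extrapolation.

```python
def predict_y(x, xs, ys, b0, b1, correlated):
    """Predict y for a given x, following the cautions described above.

    xs, ys     -- the observed data the line was fitted to
    b0, b1     -- intercept and slope of the fitted line
    correlated -- whether a linear correlation was judged to exist
    """
    if not correlated:
        # With no linear correlation, the best estimate is the mean of y.
        return sum(ys) / len(ys)
    if x < min(xs) or x > max(xs):
        # Outside the observed range the line may not hold (extrapolation).
        raise ValueError("x is outside the observed data range")
    return b0 + b1 * x


# Made-up data and illustrative coefficients, assumed for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

inside = predict_y(3, xs, ys, b0, b1, correlated=True)    # uses the line
fallback = predict_y(3, xs, ys, b0, b1, correlated=False)  # uses mean of y
print(inside, fallback)
```

In less strict settings a warning might be preferred to an exception for values only slightly outside the observed range; that choice is left to the reader.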
Essentially, simple linear regression is a useful statistical tool that can, with discretion, be used to predict ŷ values from an x value. It is almost always taught alongside linear correlation, since determining the usefulness of a regression line requires analysis of r. Fortunately, with many modern programs, people can graph scatterplots, add regression lines and determine the correlation coefficient r with a couple of entries.