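The coefficient of determination, written R^{2}, measures how well a statistical model explains the variation in a set of data. For a straight line fitted by least squares, it equals the square of the correlation coefficient r, which measures the strength and direction of the linear relationship between two variables. Both quantities are used in statistical analysis to summarize how closely a model matches the data.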
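For a straight line fitted by ordinary least squares with an intercept, R^{2} equals the square of the Pearson correlation r between x and y. The pure-Python sketch below checks this on a handful of made-up data points chosen only for illustration; the function names are hypothetical helpers, not part of any library. The equality is specific to this simple one-variable case.

```python
def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def r_squared_simple_fit(xs, ys):
    """R^2 of the least-squares line y = a + b*x, computed as 1 - SSE/SST."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # squared errors
    sst = sum((y - my) ** 2 for y in ys)                       # total variation
    return 1 - sse / sst

# Hypothetical illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
print(r ** 2, r_squared_simple_fit(xs, ys))  # the two numbers agree
```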
In statistics, an analyst's job is to look at the data collected from a specific scenario or event and build a mathematical model that explains the data. Building this model requires taking certain facts into consideration.
Every data collection and every calculation carries some possibility of error. Because this error is always present, it must be built into the model. Once the error is accounted for explicitly, it no longer gets in the way of judging whether the proposed model gives a solid explanation of the data.
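One common way to make the error explicit, assuming a simple straight-line model of a variable y against a variable x (the symbols below are chosen for illustration), is to give every observation its own error term \varepsilon_i. After a line has been fitted, the residual e_i is the part of each observation that the fitted line does not reproduce:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)
```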
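The coefficient of determination is calculated as
R^{2} = \frac{SSR}{SSR + SSE} = 1 - \frac{SSE}{SST}
where SSR is the regression sum of squares (how far the model's predicted values lie from the mean of the data), SSE is the sum of the squared errors (how far the observed values lie from the predicted values), and SST = SSR + SSE is the total sum of squares (how far the observed values lie from their mean). The identity SSR + SSE = SST holds for a least-squares fit that includes an intercept.
R^{2} therefore measures the share of the variation in the data that the model accounts for.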
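As a minimal worked sketch (made-up data, hypothetical helper name), the code below fits a least-squares line, computes the regression sum of squares (SSR), the sum of squared errors (SSE) and the total sum of squares (SST), and checks that SSR / (SSR + SSE) and 1 - SSE / SST give the same R^{2}.

```python
def sums_of_squares(xs, ys):
    """Fit y = a + b*x by least squares and return (SSR, SSE, SST)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    fitted = [a + b * x for x in xs]
    ssr = sum((f - my) ** 2 for f in fitted)             # regression sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # sum of squared errors
    sst = sum((y - my) ** 2 for y in ys)                 # total sum of squares
    return ssr, sse, sst

# Hypothetical illustrative data.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 4, 8, 9, 12]

ssr, sse, sst = sums_of_squares(xs, ys)
print(ssr / (ssr + sse), 1 - sse / sst)  # two forms of the same R^2
```

Because the line is a least-squares fit with an intercept, SSR + SSE equals SST (up to rounding), which is why the two formulas agree.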
In statistical analysis, this value describes the "goodness of fit" of a statistical model to the data. For a least-squares fit with an intercept, the coefficient lies between 0 and 1: a value of 1 means the model explains all of the variation in the data, and 0 means it explains none of it.
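The two ends of the scale can be illustrated with a small sketch that computes R^{2} as 1 - SSE/SST from observed and predicted values. The data and the `r_squared` helper are hypothetical: a "model" that reproduces every observation scores 1, and one that simply predicts the mean of the data for everyone scores 0.

```python
def r_squared(observed, predicted):
    """Coefficient of determination computed as 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained variation
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total variation
    return 1 - sse / sst

observed = [2.0, 4.0, 6.0, 8.0]
perfect = list(observed)                                      # reproduces every observation
mean_only = [sum(observed) / len(observed)] * len(observed)   # ignores the data's pattern

print(r_squared(observed, perfect), r_squared(observed, mean_only))
```

Running it prints `1.0 0.0`.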
The coefficient of determination is built from the errors (residuals), which are the differences between the observed values and the model's predictions, and from the regression sum of squares. It has no units, because it is a ratio of sums of squares measured in the same units, so rescaling the data (for example, switching from metres to centimetres) leaves it unchanged. The closer the value is to 1, the more of the variation the model explains.
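A quick way to see that R^{2} has no units is to fit the same data twice, once in metric units and once converted to other units, and compare the results. The sketch below uses hypothetical height and weight measurements and a hypothetical `fitted_r_squared` helper; the two printed values match up to floating-point rounding.

```python
def fitted_r_squared(xs, ys):
    """R^2 of the least-squares line y = a + b*x, computed as 1 - SSE/SST."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical measurements: heights in metres, weights in kilograms.
heights_m = [1.50, 1.62, 1.70, 1.78, 1.85]
weights_kg = [52.0, 60.5, 66.0, 71.5, 80.0]

# The same data expressed in centimetres and pounds.
heights_cm = [h * 100 for h in heights_m]
weights_lb = [w * 2.20462 for w in weights_kg]

print(fitted_r_squared(heights_m, weights_kg))
print(fitted_r_squared(heights_cm, weights_lb))
```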
A simple way to visualize this concept is to graph data collected about a particular event. For example, set out three trays of cookies (chocolate, almond and peanut) in a lunch room. As people come in, write down how many cookies each person takes, which kinds, and in what order. Plot this data on a graph.
Next, write a formula for the behavior you expect. For example, you might predict that every person who takes 1 chocolate cookie also takes 2 almond cookies but no peanut cookies. This assumption can be written as a simple linear equation (number of almond cookies = 2 × number of chocolate cookies) and graphed.
Plot the line that represents this prediction and compare it with the data you actually collected. Then calculate the coefficient of determination to measure how accurately the predicted behavior matches the observed behavior.
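The cookie comparison can be sketched in a few lines of Python. The tallies below are invented purely for illustration, and only the almond-versus-chocolate part of the prediction (2 almond cookies per chocolate cookie) is checked; `predict_almond` and `r_squared` are hypothetical helper names.

```python
# Hypothetical tallies from the lunch-room observation, one entry per person.
chocolate_taken = [1, 2, 0, 3, 1, 2]
almond_taken = [2, 3, 1, 6, 2, 5]

def predict_almond(chocolate):
    """The predicted rule: 2 almond cookies for every chocolate cookie."""
    return 2 * chocolate

def r_squared(observed, predicted):
    """1 - SSE/SST: share of the variation in `observed` matched by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst

predicted = [predict_almond(c) for c in chocolate_taken]
r2 = r_squared(almond_taken, predicted)
print(f"R^2 of the 'two almond per chocolate' rule: {r2:.3f}")
```

For these made-up numbers the rule accounts for about 84% of the variation in almond cookies taken. Because the line is a fixed prediction rather than one fitted by least squares, this value is not guaranteed to stay between 0 and 1: a rule that does worse than simply predicting the average would give a negative number.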
The coefficient of determination reflects how tightly the data cluster around the line: the less the observations scatter around the predictions, the higher the value. It therefore shows how good or bad the prediction was compared with the actual values, giving users a "reality check" on the predictions made by a statistical model. The calculation compares two sets of values: the observed (actual) values and the modeled (predicted) values.
This type of statistical analysis is very common in science and in business. Many business decisions rest on predictions of future behavior, so it is important to compare the actual results with those predictions. Doing so improves the next model and, with it, the accuracy of future predictions.