Performance Metrics of a Regression Problem
R-squared (R2) is a statistical measure that represents the proportion of variation in the dependent variable that is explained by the independent variables in a regression model. It ranges from 0 to 1, where 0 indicates that none of the variation in the dependent variable is explained by the independent variables, and 1 indicates that all of the variation in the dependent variable is explained by the independent variables.
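To make the definition concrete, here is a minimal sketch of how R2 is computed as one minus the ratio of the residual sum of squares to the total sum of squares. It assumes NumPy is available, and the arrays y and y_pred are hypothetical observed values and model predictions, not data from the source.

```python
import numpy as np

# Hypothetical observed values and predictions from some regression model
y = np.array([24.0, 30.0, 28.0, 35.0, 40.0])        # dependent variable
y_pred = np.array([25.5, 29.0, 27.0, 36.5, 38.0])   # model predictions

ss_res = np.sum((y - y_pred) ** 2)    # residual (unexplained) sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot              # proportion of variation explained

print(f"R2 = {r2:.3f}")
```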
Adjusted R-squared (adjusted R2) is a modified version of R2 that takes into account the number of independent variables in the model. Unlike R2, which never decreases when independent variables are added, adjusted R2 can fall when a new variable adds little explanatory power. The formula for adjusted R2 is:
adjusted R2 = 1 - [(1 - R2) * (n - 1) / (n - k - 1)]
where n is the sample size and k is the number of independent variables.
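Expressed in code, the adjustment is a one-line function. The sketch below simply applies the formula above; the function name adjusted_r2 and the example values are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 given plain R2, sample size n, and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: the penalty shrinks as the sample grows relative to k
print(adjusted_r2(0.90, n=30, k=5))   # ~0.879
print(adjusted_r2(0.90, n=300, k=5))  # ~0.898
```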
The difference between R2 and adjusted R2 is that R2 measures the overall fit of the model, while adjusted R2 measures that fit relative to the number of independent variables: adjusted R2 penalizes the addition of variables that do not meaningfully improve the model, whereas R2 does not.
For example, suppose you have a regression model that predicts a person's income based on their education level, work experience, and gender. The R2 of the model is 0.80, indicating that 80% of the variation in income is explained by the independent variables in the model. However, the model may have an adjusted R2 of 0.75, indicating that, once the three independent variables are accounted for, the fit is not as strong as the raw R2 suggests.
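The gap between the two figures also says something about the sample, although the example does not state a sample size. With k = 3 predictors, reproducing an adjusted R2 of exactly 0.75 from an R2 of 0.80 requires a sample of about n = 16, since 1 - 0.20 * 15 / 12 = 0.75; with hundreds of observations the two values would be nearly identical, so a gap this large signals a small sample relative to the number of predictors.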
In summary, while R2 measures the overall fit of the model, adjusted R2 provides a more accurate measure of the fit of the model relative to the number of independent variables, which is useful in determining whether the addition of more independent variables is justified.