6  Formally Comparing Models

Often we wish to compare regression models that are “nested”, meaning that the IVs in one model are a superset of the IV(s) found in another model. One example, with the simple two-variable system covered so far, would be comparing fit3 (or fit4), the two-IV models, to fit1 (or fit2), the single-IV models. This can be done descriptively, inferentially, or with information-criteria comparisons.

6.1 Descriptively comparing models

The most common indices for comparing fit of two models are R-squared, adjusted R-squared, and RMSE (root mean squared error). The latter is simply the square root of MSresidual, sometimes called the standard error of the estimate, or the residual standard error in R. Each is provided in the summary output of lm fits.

For illustration, we can compare the simple regression model with only publications as an IV (fit1) with the two-IV model that also includes citations (fit3). As expected for any model whose IVs are a superset of those in a smaller model, fit3 has higher R-squared and adjusted R-squared values and a smaller RMSE. The RMSE is in the scale of the DV (salary) and so can be interpreted directly - the fit3 standard error is over $900 smaller. The difference between the two R-squared values likewise reflects a better fit, accounting for about 16% more of the variation.

cat("Fit1 Rsquared=",summary(fit1)$r.squared, "\n")
Fit1 Rsquared= 0.2561846 
cat("Fit1 Adjusted Rsquared=",summary(fit1)$adj.r.squared, "\n")
Fit1 Adjusted Rsquared= 0.2437876 
cat("Fit1 RMSE=",summary(fit1)$sigma, "\n")
Fit1 RMSE= 8440.403 
cat("Fit3 Rsquared=",summary(fit3)$r.squared, "\n")
Fit3 Rsquared= 0.4195157 
cat("Fit3 Adjusted Rsquared=",summary(fit3)$adj.r.squared, "\n")
Fit3 Adjusted Rsquared= 0.3998383 
cat("Fit3 RMSE=",summary(fit3)$sigma, "\n")
Fit3 RMSE= 7519.266 
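The relationships among these indices can be sketched with simulated data (the cohen1 data set itself is not reproduced here, so the salary, pubs, and cits variables below are hypothetical stand-ins):

```r
# Simulated stand-in for the cohen1 example (assumed values, not the real data)
set.seed(1)
n      <- 62
pubs   <- rpois(n, 15)
cits   <- rpois(n, 40)
salary <- 50000 + 900 * pubs + 200 * cits + rnorm(n, sd = 8000)

m1 <- lm(salary ~ pubs)         # single-IV model, analogous to fit1
m3 <- lm(salary ~ pubs + cits)  # two-IV model, analogous to fit3

# R-squared, adjusted R-squared, and RMSE extracted from summary()
s1 <- summary(m1); s3 <- summary(m3)
c(R2 = s1$r.squared, adjR2 = s1$adj.r.squared, RMSE = s1$sigma)
c(R2 = s3$r.squared, adjR2 = s3$adj.r.squared, RMSE = s3$sigma)

# RMSE is literally the square root of MS-residual:
sqrt(sum(residuals(m3)^2) / df.residual(m3))  # matches s3$sigma
```

As the text notes, the larger model's R-squared can never be smaller than the nested model's, so the interesting comparisons are adjusted R-squared and RMSE, which penalize the extra parameter.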

Two additional indices are more commonly used in econometrics, but can be applied to these types of OLS models as well.

MAE (mean absolute error) is the average absolute difference between observed dependent-variable scores and the yhats. Smaller values, of course, indicate a better fit. The index is in the scale of the dependent variable, so it is simple to interpret - dollars here.

Metrics::mae(cohen1$salary, predict(fit1))
[1] 6804.567
Metrics::mae(cohen1$salary, predict(fit3))
[1] 6056.539
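The Metrics package is a convenience; MAE is simple enough to compute in base R. A minimal sketch with simulated data (not the cohen1 values):

```r
# Base-R equivalent of Metrics::mae(): mean absolute difference between
# observed DV values and the fitted yhats (simulated stand-in data)
set.seed(1)
y   <- rnorm(50, mean = 60000, sd = 9000)
x   <- rnorm(50)
fit <- lm(y ~ x)

mae_by_hand <- mean(abs(y - predict(fit)))
mae_by_hand
```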

MAPE (mean absolute percentage error) is another commonly used index. Like MAE, it compares observed DV values and yhats. For each case, the absolute difference between the yhat and the observed DV score is calculated (essentially an unsigned residual), and that difference is then expressed as a percentage of the DV score. The mean across all cases is the MAPE index.

MLmetrics::MAPE(predict(fit1), cohen1$salary)
[1] 0.1290218
MLmetrics::MAPE(predict(fit3), cohen1$salary)
[1] 0.1139376
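The verbal definition above translates directly into base R; a sketch with simulated stand-in data:

```r
# Base-R sketch of MAPE: each unsigned residual is expressed as a
# proportion of the observed DV value, then averaged across cases
set.seed(2)
y   <- rnorm(50, mean = 60000, sd = 9000)
x   <- rnorm(50)
fit <- lm(y ~ x)

mape_by_hand <- mean(abs((y - predict(fit)) / y))
mape_by_hand  # multiply by 100 to express as a percentage
```

Note that MLmetrics::MAPE returns a proportion, as here; the output of 0.1139 for fit3 above means the typical prediction misses by about 11% of salary.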

6.2 Inferentially comparing models

To explore how to think about tests of each IV, vis-à-vis the ideas implied just above in chapters 4-5, let’s use the anova function in a different way.

We can pass two linear models to ‘anova’ and ask it to compare them. The second model in each specification has to contain a superset of the IVs in the first (i.e., the IV(s) used in the first plus at least one more). It is somewhat like the “stepping” idea we introduced in SPSS, and ‘anova’ essentially tests the improvement in fit of the second model over the first. Look carefully at the F tests for these differences in model fit, and compare them to what you just examined above for the ‘anova’ vs ‘Anova’ approaches in the two different orders of the two-IV fit models. The F’s that compare the two models are the same as the tests of the R-squared “increment” that we covered in the SPSS work earlier.

We begin with fit1 (only pubs was an IV in that simple regression) and compare fit3 (which also included cits as an IV) to test the increment in R-squared produced by the inclusion of cits. The F value matches the Type III SS F test for cits and is the square of the t value that tested the regression coefficient of cits.

# compare model 3 to model 1 - stepping approach, evaluating a new variable (cits)
anova(fit1,fit3)# note this is anova, not Anova
Analysis of Variance Table

Model 1: salary ~ pubs
Model 2: salary ~ pubs + cits
  Res.Df        RSS Df Sum of Sq      F    Pr(>F)    
1     60 4274424497                                  
2     59 3335822387  1 938602110 16.601 0.0001396 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
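The F = t-squared identity claimed above holds whenever exactly one IV is added, and is easy to verify. A sketch with simulated data (generic x1/x2 variables, not the cohen1 data):

```r
# Simulated demonstration: the anova() model-comparison F for adding one
# IV equals the squared t statistic for that IV in the larger model
set.seed(3)
n  <- 62
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

m_small <- lm(y ~ x1)
m_full  <- lm(y ~ x1 + x2)

F_cmp <- anova(m_small, m_full)$F[2]                       # model-comparison F
t_x2  <- summary(m_full)$coefficients["x2", "t value"]     # t for the added IV
all.equal(F_cmp, t_x2^2)  # TRUE
```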

The next illustration compares the full model to the simple (restricted) regression that only contained cits (fit2). The test is therefore a test of the hypothesis that the increment in SS accounted for by pubs is zero. And this F also matches the F test of the Type III SS seen above. You may have encountered these types of F tests in other software, where they are termed tests of R-squared change.

# compare model 3 to model 2 - stepping approach, evaluating a new variable (pubs)
anova(fit2,fit3)# note this is anova, not Anova
Analysis of Variance Table

Model 1: salary ~ cits
Model 2: salary ~ pubs + cits
  Res.Df        RSS Df Sum of Sq      F   Pr(>F)   
1     60 4009743405                                
2     59 3335822387  1 673921018 11.919 0.001034 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This approach can also be used when more than one IV is added in a full model compared to a restricted model. See the “Extensions” chapter.
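A minimal sketch of that multi-IV comparison, using simulated data with generic variable names (the "Extensions" chapter treats this in full):

```r
# Simulated demonstration: anova() can test the joint increment of
# several IVs at once; here the comparison F has 2 numerator df
set.seed(4)
n  <- 62
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + 0.5 * x3 + rnorm(n)

restricted <- lm(y ~ x1)            # baseline model
full       <- lm(y ~ x1 + x2 + x3)  # adds two IVs at once

anova(restricted, full)  # the Df column shows 2 for the added pair
```

The resulting F tests the joint hypothesis that both added coefficients are zero, rather than either one individually.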

6.3 Information criteria

It has become standard to compare models with information criteria. In a later chapter we consider using this approach in variable-selection strategies. Here, a simple application permits an introduction to the AIC and BIC approaches by comparing each single-IV model with the two-IV model just tested above. Recall that fit3 contained both IVs, fit1 used only “pubs”, and fit2 used only “cits”.

For both criteria, the model with the smaller value is preferred; in each of the comparisons below, fit3 has the smaller AIC and BIC.

AIC(fit1,fit3)
     df      AIC
fit1  3 1300.973
fit3  4 1287.601
AIC(fit2,fit3)
     df      AIC
fit2  3 1297.010
fit3  4 1287.601
BIC(fit1,fit3)
     df      BIC
fit1  3 1307.354
fit3  4 1296.110
BIC(fit2,fit3)
     df      BIC
fit2  3 1303.391
fit3  4 1296.110
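For lm fits, both criteria are simple functions of the log-likelihood: AIC penalizes each estimated parameter by 2, and BIC by log(n). A sketch with simulated data (the df of 3 for a single-IV model counts the intercept, the slope, and the residual variance):

```r
# Simulated demonstration: AIC and BIC reconstructed from logLik()
set.seed(5)
n <- 62
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

ll <- logLik(fit)
k  <- attr(ll, "df")  # parameters counted: intercept, slope, sigma

aic_by_hand <- -2 * as.numeric(ll) + 2 * k
bic_by_hand <- -2 * as.numeric(ll) + log(n) * k

c(aic_by_hand, AIC(fit))  # the two values match
c(bic_by_hand, BIC(fit))  # likewise
```

Because the BIC penalty log(62) is larger than 2, BIC is somewhat more conservative about adding IVs than AIC, though in the comparisons above both favor fit3.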