Chapter 6 Formally Comparing Models

As one more way to think about the test of each IV, in light of the issues raised just above, let's use the ‘anova’ function in a different way.

We can pass two linear models to ‘anova’ and ask it to compare them. The two models should be nested: the second has to contain a superset of the IVs in the first (i.e., the IVs used in the first plus at least one more). This is somewhat like the “stepping” idea we introduced in SPSS, and ‘anova’ essentially tests the improvement in fit of the second model over the first. Look carefully at the F tests for these differences in model fit, and compare them to what you just examined above for the ‘anova’ vs ‘Anova’ approaches in the two different orders of the two-IV models. The F tests that compare the two models are the same as the tests of the R-squared “increment” that we covered in the earlier SPSS work.
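
For reference, the F statistic for a nested comparison of this kind can be written in two equivalent forms. The notation is introduced here only for illustration and does not appear in the R output: subscript 1 refers to the smaller model and subscript 2 to the larger one, RSS is the residual sum of squares, df is the residual degrees of freedom, and R² is each model's squared multiple correlation.

```latex
F \;=\; \frac{(RSS_1 - RSS_2)\,/\,(df_1 - df_2)}{RSS_2\,/\,df_2}
  \;=\; \frac{(R^2_2 - R^2_1)\,/\,(df_1 - df_2)}{(1 - R^2_2)\,/\,df_2}
```

The second form follows from the first because both models are fit to the same DV and the same cases, so they share the same total SS and each RSS equals that total multiplied by one minus the model's R². The statistic is referred to an F distribution with (df_1 − df_2) and df_2 degrees of freedom.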

We begin with fit1 (the simple regression in which pubs was the only IV) and compare it to fit3 (which also included cits as an IV) to test the increment in R-squared produced by the inclusion of cits. The F value matches the Type III SS F test for cits and is the square of the t value that tested the regression coefficient of cits.

# compare model 3 to model 1 - stepping approach, evaluating a new variable (cits)
anova(fit1, fit3)  # note this is anova, not Anova
## Analysis of Variance Table
## 
## Model 1: salary ~ pubs
## Model 2: salary ~ pubs + cits
##   Res.Df        RSS Df Sum of Sq      F    Pr(>F)    
## 1     60 4274424497                                  
## 2     59 3335822387  1 938602110 16.601 0.0001396 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
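
As a rough check on where this F comes from, the short sketch below recomputes it by hand from the numbers printed in the table above. The object names (`rss_reduced`, `rss_full`, and so on) are made up for this illustration, and the RSS values are copied from the rounded printout, so agreement with the table is only expected up to rounding.

```r
# Values copied from the printed anova(fit1, fit3) table (rounded as displayed)
rss_reduced <- 4274424497  # RSS for Model 1: salary ~ pubs
rss_full    <- 3335822387  # RSS for Model 2: salary ~ pubs + cits
df_reduced  <- 60          # residual df for Model 1
df_full     <- 59          # residual df for Model 2

# Increment in SS attributable to adding cits, and its df
ss_diff <- rss_reduced - rss_full
df_diff <- df_reduced - df_full

# F = (SS increment / its df) / (full-model RSS / full-model residual df)
f_by_hand <- (ss_diff / df_diff) / (rss_full / df_full)

# Upper-tail p value from the F distribution
p_by_hand <- pf(f_by_hand, df1 = df_diff, df2 = df_full, lower.tail = FALSE)

round(f_by_hand, 3)
signif(p_by_hand, 4)
```

The denominator is the mean square residual of the larger model, which is why the error df for the test is 59 rather than 60.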

The next illustration compares the full model to the simple regression that contained only cits (fit2). The test is therefore a test of the hypothesis that the increment in SS accounted for by pubs is zero, and this F also matches the F test of the Type III SS seen above.

# compare model 3 to model 2 - stepping approach, evaluating a new variable (pubs)
anova(fit2, fit3)  # note this is anova, not Anova
## Analysis of Variance Table
## 
## Model 1: salary ~ cits
## Model 2: salary ~ pubs + cits
##   Res.Df        RSS Df Sum of Sq      F   Pr(>F)   
## 1     60 4009743405                                
## 2     59 3335822387  1 673921018 11.919 0.001034 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
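
The equivalences described in this chapter can also be checked numerically. The sketch below is only an illustration under stated assumptions: it uses simulated data rather than the salary data set, borrows the variable names pubs, cits, and salary for readability, and uses arbitrary generating coefficients. The model objects (`m_pubs`, `m_cits`, `m_both`) are hypothetical stand-ins for fit1, fit2, and fit3. It relies only on base R; `drop1()` with `test = "F"` reports the F test for removing each term from the full model, which for a model without interactions is the same marginal test as the model comparisons.

```r
# Simulated stand-in data: NOT the salary data used in this chapter
set.seed(123)
n      <- 62
pubs   <- rpois(n, lambda = 18)
cits   <- round(20 + 0.8 * pubs + rnorm(n, sd = 12))
salary <- 40000 + 250 * pubs + 200 * cits + rnorm(n, sd = 7500)
sim    <- data.frame(salary, pubs, cits)

# Hypothetical analogues of fit1, fit2, and fit3
m_pubs <- lm(salary ~ pubs, data = sim)
m_cits <- lm(salary ~ cits, data = sim)
m_both <- lm(salary ~ pubs + cits, data = sim)

# F tests from the two nested model comparisons (row 2 holds the test)
f_cits <- anova(m_pubs, m_both)$F[2]  # increment due to cits
f_pubs <- anova(m_cits, m_both)$F[2]  # increment due to pubs

# t tests of the coefficients in the full model
t_vals <- summary(m_both)$coefficients[, "t value"]

# Each comparison F should equal the squared t for the added IV
all.equal(f_cits, unname(t_vals["cits"])^2)
all.equal(f_pubs, unname(t_vals["pubs"])^2)

# Marginal F test for each term, each adjusted for the other
marginal <- drop1(m_both, test = "F")
marginal
```

Because the data are simulated, the particular F and t values will not match those reported for the salary data; the point is only that the two comparison F values, the squared t values, and the marginal F tests agree with one another.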