Chapter 14 Efficiency in OLS analysis using the olsrr package

The olsrr package has a collection of functions that streamline many of the piecemeal approaches outlined in earlier chapters. Its user-friendly functions make it simple to obtain detailed information in a single step. Not all of the topics covered in earlier chapters are included (e.g., semi-partial correlations), but the package adds extensive capabilities for model criticism.

14.1 A basic analysis

A single call to the ols_regress function can replace separate calls to the summary, anova, and confint functions. The model fit here is the three-IV model examined in the “Extensions” chapter.

library(olsrr)
fit6 <- lm(salary~pubs+cits+degree_yrs, data=cohen1)
ols_regress(fit6)
##                            Model Summary
## --------------------------------------------------------------------
## R                       0.708       RMSE                   7030.542
## R-Squared               0.501       Coef. Var                12.826
## Adj. R-Squared          0.475       MSE                49428516.393
## Pred R-Squared          0.420       MAE                    5317.619
## --------------------------------------------------------------------
##  RMSE: Root Mean Square Error
##  MSE: Mean Square Error
##  MAE: Mean Absolute Error
##
##                                    ANOVA
## ----------------------------------------------------------------------------
##                       Sum of
##                      Squares        DF      Mean Square      F         Sig.
## ----------------------------------------------------------------------------
## Regression    2879765872.603         3    959921957.534     19.42    0.0000
## Residual      2866853950.768        58     49428516.393
## Total         5746619823.371        61
## ----------------------------------------------------------------------------
##
##                                        Parameter Estimates
## -------------------------------------------------------------------------------------------------
##       model         Beta    Std. Error    Std. Beta      t        Sig         lower        upper
## -------------------------------------------------------------------------------------------------
## (Intercept)    38967.847      2394.308                 16.275    0.000    34175.118    43760.575
##        pubs       93.608        85.348        0.135     1.097    0.277      -77.235      264.451
##        cits      204.060        56.972        0.361     3.582    0.001       90.019      318.102
##  degree_yrs      874.461       283.895        0.385     3.080    0.003      306.184     1442.739
## -------------------------------------------------------------------------------------------------
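
The pieces that ols_regress consolidates can be reproduced with base R. Doing so is a useful check on what each part of its report means. The sketch below is illustrative and is not olsrr's internal code. It uses the built-in mtcars data as a stand-in for cohen1, which does not ship with R. It sums the term rows of the sequential anova() table to recover the overall regression sum of squares. It then computes standardized coefficients (the "Std. Beta" column) as b * sd(x) / sd(y).

```r
# Stand-in model: mtcars ships with base R; cohen1 does not
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# The three base-R calls that ols_regress() rolls into one report
s   <- summary(fit)   # coefficients, t tests, R-squared, adjusted R-squared
tab <- anova(fit)     # sequential (Type I) sums of squares, one row per term
ci  <- confint(fit)   # 95% confidence limits for each coefficient

# Overall regression SS = sum of the term rows; the last row is the residual
ss_reg <- sum(tab[["Sum Sq"]][-nrow(tab)])
ss_res <- tab[["Sum Sq"]][nrow(tab)]
r2     <- ss_reg / (ss_reg + ss_res)

# Standardized slopes: b_j * sd(x_j) / sd(y)
ivs      <- c("wt", "hp", "qsec")
b        <- coef(fit)[ivs]
std_beta <- b * sapply(mtcars[ivs], sd) / sd(mtcars$mpg)

# Equivalent route: refit on z-scored variables
z_fit <- lm(mpg ~ wt + hp + qsec,
            data = as.data.frame(scale(mtcars[c("mpg", ivs)])))

print(round(cbind(b, std_beta, ci[ivs, ]), 3))
```

The refit on z-scored variables is included only to confirm that the rescaling shortcut gives the same standardized slopes.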

14.2 Collinearity diagnostics

The ols_coll_diag function provides collinearity diagnostics:

ols_coll_diag(fit6)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
##    Variables Tolerance      VIF
## 1       pubs 0.5672118 1.763010
## 2       cits 0.8466483 1.181128
## 3 degree_yrs 0.5494035 1.820156
##
##
## Eigenvalue and Condition Index
## ------------------------------
##   Eigenvalue Condition Index   intercept        pubs        cits   degree_yrs
## 1 3.55809347        1.000000 0.009810827 0.014093645 0.009215181 0.0110334455
## 2 0.25588579        3.728942 0.146119575 0.358711061 0.097741038 0.0590237875
## 3 0.10745243        5.754407 0.015671115 0.619788632 0.026076066 0.9299270069
## 4 0.07856832        6.729533 0.828398483 0.007406662 0.866967715 0.0000157601
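
Computing the tolerance, VIF, and condition-index columns directly makes clear what they measure. The sketch below assumes a Belsley-style approach: every column of the model matrix, intercept included, is scaled to unit length before the eigenvalues are taken. That assumption is consistent with the output above, where the four eigenvalues sum to 4, the number of columns. The mtcars model is a stand-in for fit6.

```r
# Stand-in model on built-in data (cohen1 is not bundled with R)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
X   <- model.matrix(fit)              # includes the intercept column
ivs <- colnames(X)[-1]

# Tolerance = 1 - R^2 from regressing each IV on the other IVs; VIF = 1 / tolerance
tol <- sapply(ivs, function(v) {
  others <- setdiff(ivs, v)
  1 - summary(lm(X[, v] ~ X[, others]))$r.squared
})
vif <- 1 / tol

# Condition indices (assumed Belsley-style scaling): unit-length columns,
# eigenvalues of the cross-product matrix, CI = sqrt(max eigenvalue / eigenvalue)
Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")
eg <- eigen(crossprod(Xs), symmetric = TRUE)
cond_index <- sqrt(max(eg$values) / eg$values)

# Variance-decomposition proportions: phi[j, k] = v_jk^2 / lambda_k,
# normalized so each term's proportions sum to 1 (rows = eigenvalues after t())
phi  <- sweep(eg$vectors^2, 2, eg$values, "/")
prop <- t(phi / rowSums(phi))
colnames(prop) <- colnames(X)

print(round(cbind(Tolerance = tol, VIF = vif), 4))
print(round(cbind(Eigenvalue = eg$values, `Condition Index` = cond_index, prop), 4))
```

The VIF values can also be read off the diagonal of the inverted correlation matrix of the IVs, which is a quick way to confirm them.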

We can also obtain added-variable plots. Each panel plots the residuals of the DV against the residuals of one IV, with both adjusted for the other IVs. The correlation in each panel is therefore the partial correlation of that IV with the DV. Here, pubs shows a weaker partial relationship, which is consistent with its non-significant test in the 3-IV model. Recall that the partial correlation can be obtained with the mrinfo function, as shown below. Unfortunately, partial and semi-partial correlations are not provided with the otherwise extensive information coming from ols_regress.

ols_plot_added_variable(fit6)
## geom_smooth() using formula 'y ~ x'
## geom_smooth() using formula 'y ~ x'
## geom_smooth() using formula 'y ~ x'

mrinfo(fit6)
## [[1]]
## NULL
##
## $supplemental information
##            beta wt structure r partial r semipartial r tolerances  unique
## pubs       0.13506     0.71500   0.14254       0.10172    0.56721 0.01035
## cits       0.36102     0.77662   0.42559       0.33219    0.84665 0.11035
## degree_yrs 0.38541     0.85873   0.37495       0.28567    0.54940 0.08161
##             common   total
## pubs       0.24584 0.25618
## cits       0.19190 0.30224
## degree_yrs 0.28793 0.36954
##
## $var infl factors from HH:vif
##       pubs       cits degree_yrs
##   1.763010   1.181128   1.820156
##
## [[4]]
## NULL
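
An added-variable plot can be built by hand from two regressions, which also shows why it carries the partial correlation. The sketch below uses mtcars as a stand-in for cohen1. The DV and one IV (wt) are each regressed on the remaining IVs. The slope through the two sets of residuals equals that IV's coefficient in the full model, and their correlation is the partial correlation. Correlating the unadjusted DV with the residualized IV gives the semipartial correlation. This offers a route to both quantities that ols_regress does not report.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Residualize both the DV and wt on the other IVs
e_y <- resid(lm(mpg ~ hp + qsec, data = mtcars))
e_x <- resid(lm(wt  ~ hp + qsec, data = mtcars))

av_slope  <- coef(lm(e_y ~ e_x))[["e_x"]]  # equals the wt slope in the full model
partial_r <- cor(e_y, e_x)                 # partial correlation: both sides adjusted
semi_r    <- cor(mtcars$mpg, e_x)          # semipartial: only wt is adjusted

# The added-variable plot itself
plot(e_x, e_y, xlab = "wt | hp, qsec", ylab = "mpg | hp, qsec")
abline(lm(e_y ~ e_x))
```

The partial correlation also follows from the full-model t statistic as t^2 / (t^2 + df), which gives a further cross-check on mrinfo's output.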

14.4 Residual Assumptions

A normal QQ plot of the residuals is available.

ols_plot_resid_qq(fit6)

Tests of the residual normality assumption are easily obtained.

ols_test_normality(fit6)
## -----------------------------------------------
##        Test             Statistic       pvalue
## -----------------------------------------------
## Shapiro-Wilk              0.9782         0.3374
## Kolmogorov-Smirnov        0.0759         0.8404
## Cramer-von Mises          5.2312         0.0000
## Anderson-Darling          0.4327         0.2946
## -----------------------------------------------
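
The Cramer-von Mises line stands out. It rejects normality decisively (reported p = 0.0000), while the Shapiro-Wilk, Kolmogorov-Smirnov, and Anderson-Darling tests do not. When tests disagree this sharply, it is worth recomputing the statistic independently before trusting it. The sketch below does this for an mtcars stand-in model. It uses base R's shapiro.test and computes the Cramer-von Mises W^2 by hand from residuals standardized by their own mean and SD. Stephens' tables give a 5% critical value of roughly 0.126 for the modified statistic W^2(1 + 0.5/n) in this estimated-parameters case. The nortest package's cvm.test reports a p-value based on the same modified statistic, if a packaged alternative is preferred.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
e   <- resid(fit)
n   <- length(e)

# Shapiro-Wilk from base R's stats package
sw <- shapiro.test(e)

# Cramer-von Mises W^2 by hand, with mean and SD estimated from the residuals:
# W^2 = 1/(12n) + sum((F(x_(i)) - (2i - 1)/(2n))^2)
u  <- pnorm(sort((e - mean(e)) / sd(e)))
w2 <- 1 / (12 * n) + sum((u - (2 * seq_len(n) - 1) / (2 * n))^2)

# Stephens' modified statistic for the estimated-parameters case
w2_mod <- w2 * (1 + 0.5 / n)

print(c(SW = unname(sw$statistic), SW_p = sw$p.value, CvM = w2, CvM_mod = w2_mod))
```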

A histogram of the residuals with a normal curve overlaid for comparison is also available.

ols_plot_resid_hist(fit6)

And the standard plot of residuals against yhats is also available.

ols_plot_resid_fit(fit6)
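
For comparison, the three residual plots above have base R equivalents, sketched here for an mtcars stand-in model. One property is worth knowing when reading the residuals-versus-fitted plot. With an intercept in the model, OLS residuals sum to zero and are uncorrelated with the fitted values by construction. The plot is therefore read for curvature or changing spread rather than for a linear trend.

```r
# Stand-in model on built-in data
fit  <- lm(mpg ~ wt + hp + qsec, data = mtcars)
e    <- resid(fit)
yhat <- fitted(fit)

op <- par(mfrow = c(1, 3))
qqnorm(e); qqline(e)                                # normal QQ plot of residuals
hist(e, freq = FALSE, main = "Residuals")           # histogram on the density scale
curve(dnorm(x, mean(e), sd(e)), add = TRUE)         # overlaid normal curve
plot(yhat, e, xlab = "Fitted", ylab = "Residual")   # residuals against yhats
abline(h = 0, lty = 2)
par(op)
```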

Studentized and standardized residuals can be plotted against the case number to look for sequential ordering effects in the data set.

ols_plot_resid_stud(fit6)

ols_plot_resid_stand(fit6)
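
In R's terminology, rstandard() returns internally studentized ("standardized") residuals, and rstudent() returns externally studentized (deleted) residuals. The sketch below draws index plots of both for an mtcars stand-in model. It is an assumption that these correspond to the quantities olsrr plots, so check the help pages for ols_plot_resid_stud and ols_plot_resid_stand if the distinction matters. The two versions are linked by the identity t_i = r_i * sqrt((n - p - 1) / (n - p - r_i^2)), where p counts the coefficients including the intercept. They therefore rank cases identically and diverge mainly for large residuals.

```r
# Stand-in model on built-in data
fit    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
r_std  <- rstandard(fit)   # internally studentized ("standardized") residuals
r_stud <- rstudent(fit)    # externally studentized (deleted) residuals
n <- nobs(fit)
p <- length(coef(fit))     # number of coefficients, intercept included

# Index plots: look for runs or trends across the case order
op <- par(mfrow = c(1, 2))
plot(r_std,  type = "h", xlab = "Case", ylab = "Standardized residual")
plot(r_stud, type = "h", xlab = "Case", ylab = "Studentized residual")
par(op)
```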

Several of the above plots, plus other model diagnostic plots, can be obtained in a single call with the ols_plot_diagnostics function. Only the code is shown here; the plots are omitted to save space.

ols_plot_diagnostics(fit6)

14.5 Plots for examination of Influence

Several plots are available for visualizing the influence statistics for a model.

The first plot displays Cook’s D values. It provides a visual indication of which cases exceed a threshold for large influence, and those cases are labeled by case number. I have not yet sorted out how this threshold is determined for this function and the following ones.

ols_plot_cooksd_chart(fit6)
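
Cook's D can also be computed and thresholded directly. The sketch below uses 4/n, a widely used rule-of-thumb cutoff. It is an assumption, not something established here, that ols_plot_cooksd_chart uses the same cutoff; its help page is the place to confirm. The data are an mtcars stand-in.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
d   <- cooks.distance(fit)
n   <- nobs(fit)
p   <- length(coef(fit))

cutoff  <- 4 / n   # rule-of-thumb cutoff (assumed; olsrr's own threshold may differ)
flagged <- which(d > cutoff)
print(round(d[flagged], 3))

plot(d, type = "h", xlab = "Case", ylab = "Cook's D")
abline(h = cutoff, lty = 2)
```

Cook's D combines residual size and leverage: D_i = r_i^2 / p * h_i / (1 - h_i), with r_i the standardized residual and h_i the hat value.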

The DFBETAS index (the standardized DFBeta) is visualized with a panel of graphs, one for the intercept and one for each IV, permitting identification of influential cases for each IV separately.

ols_plot_dfbetas(fit6)
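
The same check can be sketched for DFBETAS with base R's dfbetas(), which returns one column per coefficient. The cutoff 2/sqrt(n) is the size-adjusted rule of thumb associated with Belsley, Kuh, and Welsch. Whether olsrr's panels use exactly this value is an assumption to verify. The model is an mtcars stand-in.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
dfb <- dfbetas(fit)          # n x p matrix, one column per coefficient
n   <- nobs(fit)
p   <- length(coef(fit))

cutoff <- 2 / sqrt(n)        # size-adjusted rule of thumb (assumed to match olsrr)

# For each coefficient, list the cases whose |DFBETAS| exceed the cutoff
terms   <- setNames(colnames(dfb), colnames(dfb))
flagged <- lapply(terms, function(j) rownames(dfb)[abs(dfb[, j]) > cutoff])
print(flagged)
```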

A comparable plot for DFFITS is also available.

ols_plot_dffits(fit6)
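
DFFITS can be sketched the same way with base R's dffits(). The cutoff 2 * sqrt(p/n), with p the number of coefficients, is another common size-adjusted rule of thumb; treat its match to olsrr's plot as an assumption. The mtcars model is a stand-in.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
dff <- dffits(fit)
n   <- nobs(fit)
p   <- length(coef(fit))

cutoff <- 2 * sqrt(p / n)    # size-adjusted rule of thumb (assumed to match olsrr)
print(round(dff[abs(dff) > cutoff], 3))
```

DFFITS is the deleted studentized residual scaled by sqrt(h / (1 - h)). A case therefore needs both a sizable residual and some leverage to score high.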

Finally, two additional plots are common in model diagnostics. They examine studentized residuals and deleted studentized residuals against leverage and yhats, respectively.

ols_plot_resid_lev(fit6)

ols_plot_resid_stud_fit(fit6)
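
The leverage plot can be sketched with hatvalues(). The hat values always sum to the number of coefficients p, so the average leverage is p/n. The sketch flags cases above 2p/n, a common rule of thumb. The specific cutoffs olsrr draws, and whether its leverage plot uses internally or externally studentized residuals, are assumptions to check against its documentation. Following the wording above, the leverage panel uses rstandard() and the fitted-values panel uses rstudent(). The model is an mtcars stand-in.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
h   <- hatvalues(fit)
n   <- nobs(fit)
p   <- length(coef(fit))

lev_cut <- 2 * p / n   # twice the average leverage (rule of thumb, assumed)

op <- par(mfrow = c(1, 2))
plot(h, rstandard(fit), xlab = "Leverage", ylab = "Studentized residual")
abline(v = lev_cut, lty = 2)
plot(fitted(fit), rstudent(fit), xlab = "Fitted", ylab = "Deleted studentized residual")
par(op)

print(round(h[h > lev_cut], 3))
```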

The olsrr package has numerous other tools and is worth exploring. The reference manual and vignettes on the CRAN site are very helpful.

https://cran.r-project.org/web/packages/olsrr/index.html