Chapter 1 Introduction and Goals

This document lays out basic strategies for linear modeling in R. It is organized around what the social, behavioral, and life sciences call multiple regression: linear modeling in which one outcome (dependent) variable is predicted from multiple independent variables, each of which is a quantitative measurement. The document presumes a background in the basics of multiple regression from both equational and conceptual perspectives, including the roles of partial and semi-partial correlation. It is intended for students in the APSY510/511 course sequence at the University at Albany, but may be more generally applicable.

For the most part, the approach here uses a multiple regression problem with only two independent variables. This permits comparison with a companion presentation on implementing multiple regression in SPSS, which uses the same data set. The data set is the chapter 3 example on faculty salaries, publications, and citations from the Cohen, Cohen, West, and Aiken textbook (Cohen, Cohen, West, & Aiken, 2003).

The flow of the document moves from univariate description, to bivariate description, to all aspects of linear modeling. It extends to topics of evaluation of assumptions for inferential tests, influence analysis, and model criticism.

The emphasis is placed narrowly on implementation of the standard methods in R. Conceptual rationales for the approach, alternative methods (e.g., Bayesian), variable selection strategies, model building, and the role of regression in causality assessment are treated elsewhere, although a brief introduction to Bayes factors is included here. The document does not address nonparametric or semi-parametric regression, such as quantile regression, nor does it address curvilinear regression.

1.1 A note on the R Programming environment

All of the analyses and graphical displays found in this document were produced in R. Usually, the document shows the relevant R code for each topic. The primary focus of the document is the how-to in R, but it also emphasizes the conceptual progression involved in understanding linear modeling in the simplest of multiple regression applications, the two-IV model. The document can serve as an extensive template for R usage in these types of analyses. Some extension to models with larger numbers of IVs is also included. To that end, all the code is available both in this document and in one other source. In the spirit of reproducible and open source research, this document was created in rmarkdown and bookdown. The Rmd file contains ALL of the R code required to reproduce the analyses and figures contained in the document.

Graphs are drawn with ggplot2, base system graphics, and a convenient 3D surface plotting capability from the plot3D package. Analyses are extensively reliant on the base system lm function. Additional analyses use other packages and BCD-created functions introduced as the document progresses.
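As a minimal sketch of the kind of two-IV model that the base lm function fits, the following uses simulated stand-in data. The variable names (salary, pubs, cits), the sample size, and the generating coefficients are all assumptions made for illustration; they are not the Cohen, Cohen, West, and Aiken values and are not taken from this document's analyses.

```r
# Simulated stand-in data; NOT the Cohen et al. (2003) faculty salary data.
# All names and generating values below are hypothetical.
set.seed(1)
n <- 50
pubs <- rpois(n, lambda = 15)   # hypothetical IV 1: publication count
cits <- rpois(n, lambda = 40)   # hypothetical IV 2: citation count
salary <- 40000 + 300 * pubs + 200 * cits + rnorm(n, sd = 5000)
sim_data <- data.frame(salary, pubs, cits)

# Two-IV linear model fit with the base lm function
fit <- lm(salary ~ pubs + cits, data = sim_data)

# Coefficient table, residual standard error, and R-squared
summary(fit)
```

The formula interface (outcome ~ IV1 + IV2) extends to additional IVs by adding further terms on the right-hand side.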

Several packages are required for the work in this document.

library(BayesFactor)
library(bcdstats)
library(boot)
library(broom)
library(car)
library(GGally)
library(ggExtra)
library(ggplot2)
library(ggthemes)
library(grid)
library(gvlma)
library(gt)
library(HH)
library(knitr)
library(lattice)
library(lmtest)
library(MASS)
library(moments)
library(nortest)
library(olsrr)
library(psych)
library(plot3D)
library(plot3Drgl)
library(plyr)
library(rcompanion)
library(rmarkdown)
library(sandwich)
library(tseries)
library(UsingR)
library(yhat)
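
Before running the library calls above, it can help to check which of the packages are already installed. The sketch below is one way to do that; the object names (pkgs, missing_pkgs, cran_missing) are illustrative, and it assumes that every missing package other than bcdstats can be obtained with install.packages, which may not hold for every package or R version.

```r
# Packages named in the library() calls above
pkgs <- c("BayesFactor", "bcdstats", "boot", "broom", "car", "GGally",
          "ggExtra", "ggplot2", "ggthemes", "grid", "gvlma", "gt", "HH",
          "knitr", "lattice", "lmtest", "MASS", "moments", "nortest",
          "olsrr", "psych", "plot3D", "plot3Drgl", "plyr", "rcompanion",
          "rmarkdown", "sandwich", "tseries", "UsingR", "yhat")

# Which of these are not found in the current library paths?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
missing_pkgs

# bcdstats is installed from its GitHub repository rather than via
# install.packages, so it is set aside here (assumption: the rest are on CRAN).
cran_missing <- setdiff(missing_pkgs, "bcdstats")

# Uncomment to install whatever is missing:
# install.packages(cran_missing)
```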

Package citations for packages loaded here (in the above order): BayesFactor (Morey & Rouder, 2018), bcdstats (Dudek, 2020), boot (Canty & Ripley, 2019), broom (Robinson & Hayes, 2020), car (Fox, Weisberg, & Price, 2020), GGally (Schloerke et al., 2020), ggExtra (Attali & Baker, 2019), ggplot2 (Wickham et al., 2020), ggthemes (Arnold, 2019), grid (Auguie, 2017), gvlma (Pena & Slate, 2019), gt (Iannone, Cheng, & Schloerke, 2019), HH (Heiberger, 2020), knitr (Xie, 2020b), lattice (Sarkar, 2020), lmtest (Hothorn, Zeileis, Farebrother, & Cummins, 2019), MASS (Ripley, 2019), moments (Komsta & Novomestky, 2015), nortest (Gross & Ligges, 2015), olsrr (Hebbali, 2020), psych (Revelle, 2020), plot3D (Soetaert, 2019), plot3Drgl (Soetaert, 2016), plyr (Wickham, 2020), rcompanion (Mangiafico, 2020), rmarkdown (Allaire et al., 2020), sandwich (Zeileis & Lumley, 2019), tseries (Trapletti & Hornik, 2019), UsingR (Trapletti & Hornik, 2019), yhat (Nimon, Oswald, & Roberts, 2013).

Package citations for packages loaded elsewhere in this document: bookdown (Xie, 2020a).

The bcdstats package can be installed by following the instructions found at its GitHub repository:

https://github.com/bcdudek/bcdstats