1 Introduction

This document provides information for new R users on installing, updating, and maintaining an R installation. It can be broadly useful to a wide audience, but is intended for students in B. Dudek’s statistics classes, and serves as the first of several R tutorial documents. Topics covered include installation/updates/maintenance of R, RStudio, and R packages. Some general background is provided along with instructions and links to web postings that go into more detail if needed.

2 What is R? What is RStudio?

The R programming language (https://www.r-project.org/) has major strengths in data management, statistical analysis and data visualization. It has fast become a primary tool for data scientists, statisticians and researchers. It can be used on multiple operating systems and its installation is largely straightforward on all platforms. As an open source and free software ecosystem it provides a rich array of tools for a diverse audience of users.

This link to a page on the R project web site gives a more detailed overview: https://www.r-project.org/about.html

Downloads of R and more information are found on CRAN (Comprehensive R Archive Network): https://cran.r-project.org/

R is installed as a relatively minimal configuration where the user passes R code to a command line environment called the R console (inside the R GUI). A majority of R users employ an additional tool, RStudio (https://rstudio.com/), which is an Integrated Development Environment. Embedded in the RStudio configuration are the R console, file management and code writing capabilities, and several other useful components for displaying figures, managing add-on packages, etc. It is a very powerful way to use R, especially with its support for markdown, package building, and many add-ins. It is strongly recommended that new users become familiar with using RStudio very early in the R learning curve.

3 Why R?

The question of why R? is often posed in the context of comparing it to other programming languages that can handle data science needs, or in comparison to commercially available software packages that have long and established histories (e.g., SAS, Stata, SPSS). There is no need to dwell on these comparisons. The major utility of R is its rich array of add-on contributions in the form of packages created by statisticians around the world. New methods can appear very quickly in the R ecosystem of add-on packages. There is also an extensive support community online in the form of forums, blogs, and help sites. For the scientific researcher in many disciplines, skills in R are becoming an expected part of training in degree programs. It is an important tool to add to an arsenal of data analytic capabilities.

These URLs expand on the points made above:

https://www.burns-stat.com/documents/tutorials/why-use-the-r-language/

http://www.econometricsbysimulation.com/2014/03/why-use-r-five-reasons.html

4 Why not just spreadsheets?

Spreadsheets such as Excel have major positive features and are often very helpful for data entry and initial data management. But R can do these things well too, and there is a reproducibility element with R that is important, since code can be saved, repeated, and documented. At its core, R is a complete data management and analytic system, with far more capabilities than spreadsheets. See the following commentary for additional perspectives:

https://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/

5 What are “packages”?

The R language is very broadly capable, even in its simple original installation, but additional capabilities are found in packages.

The core R installation already has a set of add-on packages with capabilities suggested by their names. These are found in the library/ subfolder of the R program installation (they live under src/library/ in the R source code).

The R Project also “recommends” a set of additional packages. The standard CRAN installers for Windows and macOS normally include them; otherwise the user will have to install them separately (see a later section in this document):

https://cran.r-project.org/src/contrib/4.0.2/Recommended/

These are all useful, but most users will quickly compile a longer list of additional packages that they use for specific purposes. For example, the psych package contains a large suite of functions useful to researchers in the psychological sciences. Methods for installing these packages and a recommended set are described in a later section.
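To see which of these packages are already present on your own machine, something like the following can be typed at the R command prompt. This is a minimal sketch: it relies on the installed.packages() function and its priority argument, and the object names base_pkgs and rec_pkgs are simply my own labels.

```r
# Names of the packages that are part of the core R installation.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Names of the "recommended" packages that are present in this installation
# (this may be empty if none have been installed).
rec_pkgs <- rownames(installed.packages(priority = "recommended"))
print(rec_pkgs)

# The folder(s) where R looks for installed packages.
print(.libPaths())
```

The last line shows where the package library sits on your system, which can be helpful later when packages are updated or reinstalled.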

6 Installing R

For purposes of B. Dudek’s classes, it is useful to install the most recent version of R so that we are all using the same version. As of this writing that version is 4.01, but it will likely be a later version by the time you read this. Also as of version 4.0, all previous packages you may have used on an earlier version need to be reinstalled (see the section below).
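If R is already installed and you are not sure which version you have, the running session can report it. The lines below are a small sketch using R.version.string and getRversion(); the name is_r4 is just an illustrative label.

```r
# Full version string of the R session that is currently running.
print(R.version.string)

# getRversion() can be compared against a version written as text.
# is_r4 is TRUE when this session is R 4.0.0 or later.
is_r4 <- getRversion() >= "4.0.0"
print(is_r4)
```

If is_r4 is FALSE, the installed version predates 4.0 and the package reinstallation mentioned above will apply after you upgrade.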

6.1 Major installation methods

On each of the major platforms, the base R installation occurs with a minimum of effort. The steps are:

  1. Go to the R-project.org site, https://www.r-project.org/, and in the download section follow the CRAN link. This takes you to a page with a set of CRAN mirrors. Choose one.
  2. Choose your operating system and then on the next page choose “base” which leads to a download.
  3. Complete the download to your device and run the installer that was downloaded. Choose the default options at each point.

I used to provide a detailed document with screen captures showing each step but it is so simple that it is no longer necessary to generate that level of detail. If you want more detail, go to one of these sites that do provide additional guidance:

https://www.andrewheiss.com/blog/2012/04/17/install-r-rstudio-r-commander-windows-osx/

https://techvidvan.com/tutorials/install-r/

https://rstudio-education.github.io/hopr/starting.html

https://www.datacamp.com/community/tutorials/installing-R-windows-mac-ubuntu

6.2 64 bit vs 32 bit installations

On Windows platforms, both the 32 bit and 64 bit versions of R are installed by the procedures outlined above if you are using an x64 version of Windows (which, by now, almost all users are). The 32 bit version was created originally because older PC OS installations were only 32 bit. It carries a limit on usable memory, and that limit is much higher in the 64 bit version. The 64 bit version is the one you will want to use.

On macOS and Linux, the base R installation is 64 bit.
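Once R is running, you can confirm which build you opened. This sketch assumes the usual convention that a 64 bit build uses 8-byte pointers; pointer_bytes and is_64bit are names I have chosen for illustration.

```r
# Size of a memory pointer in bytes: 8 on a 64 bit build, 4 on a 32 bit build.
pointer_bytes <- .Machine$sizeof.pointer
is_64bit <- pointer_bytes == 8
print(is_64bit)

# The architecture R was built for (for example, a string containing "64").
print(R.version$arch)
```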

7 Installing RStudio

After you have installed R, install RStudio. Go to the RStudio web site https://rstudio.com/ and choose the download button (or go directly to the download page, https://rstudio.com/products/rstudio/download/). Then download the free RStudio Desktop version for your operating system. Execute the installer that you downloaded.

8 Testing your R and RStudio installations

In order to begin learning to use R and RStudio and test the installations, I suggest the following.

  1. Open the 64 bit version of R that you installed (not RStudio at this point). It should look like this in Microsoft Windows and something similar in macOS:

R Console inside the R GUI

  2. At the command prompt in R Console, type the following to obtain the square root of 25:

25^.5
## [1] 5

R has returned an “object” that contains one element and that first element is the numeric value of the answer.

  3. Next type:
hist(faithful$eruptions)

This function has drawn a frequency histogram of the Old Faithful Geyser eruption durations using a data set that comes with R (called faithful).

  4. Now close R and open RStudio. It should look something like this in Windows and something similar in macOS:

R Console inside the R GUI

Notice that the same R Console and command prompt appear in the left pane (I think that is still the default).

  5. Type the same two lines of R code that you did above to see that they work the same way in RStudio. However, the histogram will appear in the pane at the bottom right (the Plots tab).

With these quick tests, you know that you have a functioning installation and are ready to add other packages and begin learning/using R.
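If you would like a slightly more explicit check than looking at the output by eye, the same two tests can be written so that R confirms the results itself. This is only a sketch; the names root and h are my own, and hist() is called with plot = FALSE so that it returns the bin counts instead of drawing the figure.

```r
# The square root of 25 computed two ways; all.equal() confirms they agree.
root <- 25^0.5
print(isTRUE(all.equal(root, sqrt(25))))

# A compact description of the built-in faithful data set.
str(faithful)

# Compute the histogram bins without drawing, then show the counts per bin.
h <- hist(faithful$eruptions, plot = FALSE)
print(h$counts)

# Every eruption should fall into exactly one bin.
print(sum(h$counts) == nrow(faithful))
```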

9 Adding packages to your R installation

At the time of this writing, CRAN hosts 15806 add-on packages. Obviously there is no need (or time!) to install all of them. Typically one adds packages when a need arises or when you learn about a new package that is worth exploring.

One of the great things about R (and sometimes an infuriating thing) is that there are multiple ways of doing things. This is true with package installations as well. Add-on packages can be installed from the RGui menus, from code submitted to the command prompt, or from the Packages tab in RStudio. I still prefer generating the code to install packages so that I can keep a record of what I did.

The primary function to install packages is install.packages(). You can look up the help page for this function by typing the following at the command prompt:

?install.packages

If you are working in the RGui/Rconsole, this help page will pop up as a second window. If you are in RStudio, the help page appears in the help tab in the bottom right pane.

This help page will probably be overly challenging for the user who is new to R, so let’s go slowly.
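One gentler starting point is to look only at the argument list of install.packages() rather than the whole help page. The sketch below uses args() and formals(), both part of base R; dep_default is an illustrative name.

```r
# Print the argument list of install.packages() and the default values.
print(args(install.packages))

# Pull out the default of the "dependencies" argument on its own.
# NA here is the setting under which required dependencies are installed too.
dep_default <- formals(install.packages)$dependencies
print(dep_default)
```

Most of the arguments can be ignored at first; for everyday use, only the package name(s) need to be supplied.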

9.1 Installing packages from the command line

This process can be done either from within the R GUI/R Console, or from the console within RStudio. I suggest that you do this first test from the basic R installation (RGui/RConsole). Let’s begin by installing a single package from the command line, using the install.packages() function. This line will install the psych package. Each add-on package may have dependencies on other packages in order to function. Those dependencies are respected by install.packages() and will also be installed. If you look at the help page for install.packages() you will see that a “dependencies” argument is set so that dependencies are also installed by default.

install.packages("psych")

If you do this in RGui/RConsole, a dialog box will appear asking you to choose a CRAN mirror. I typically choose one in Michigan, assuming that closer ones might be quicker. If you do this in RStudio, a default mirror is already set. You can change that default in the global options section of RStudio.
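The mirror can also be chosen in code, and the installation can be skipped when the package is already present. The following is a sketch under two assumptions of mine: the mirror URL https://cloud.r-project.org (any CRAN mirror could be substituted), and a small helper called install_if_missing(), which is hypothetical and not part of R.

```r
# Choose a CRAN mirror in code so that no mirror dialog box is needed.
# The URL below is an assumption of this sketch; any CRAN mirror would do.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Hypothetical helper (not part of R): install a package only if it is
# not already available, then report whether it can be found.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

With this in place, typing install_if_missing("psych") at the prompt would install psych the first time and do nothing on later calls.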

9.2 Install more than one package at a time

It is possible to install multiple packages with a single install.packages() command. Let’s assume that we want to install yhat and rcompanion. This code uses the c() function (short for combine or concatenate), which creates a character vector of the two package names that is passed to the install.packages() function.

install.packages(c("yhat","rcompanion"))
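It can help to look at the vector that c() creates before handing it to install.packages(). The sketch below also narrows the list to packages that are not yet installed. The names wanted and to_install are illustrative, and the interactive() guard is my own addition so that nothing is downloaded if these lines are run as part of a non-interactive script.

```r
# c() builds a character vector holding the two package names.
wanted <- c("yhat", "rcompanion")
print(wanted)
print(length(wanted))

# Keep only the names that are not already in the installed library.
to_install <- setdiff(wanted, rownames(installed.packages()))
print(to_install)

# Install whatever is still missing. The length check avoids calling
# install.packages() with an empty vector.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```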

9.3 Install packages from the menus in RGui/Rconsole

In RGui, choose the “Packages” menu and select “Install package(s)”. You can choose from a list of all packages on CRAN. This is a menu/mouse way of executing the install.packages() function.

9.4 Install packages from the “Packages” tab in the RStudio pane (usually bottom right)

By choosing that “Packages” tab, you have asked RStudio to show all packages that you have installed. To install new packages, choose the “Install” button there. The dialog box permits you to install one or more packages by typing their names in the available box. By default it goes to CRAN to look for those packages, but if you have a local file that contains a package, there is an option to specify the location of that package file.

9.5 Installing packages from GitHub

Many package developers use GitHub repositories for their packages. Sometimes you may want/need to install a development version of a package that is not yet on CRAN, or you may have identified a useful package that the developer hasn’t submitted to CRAN. That is the case for a package used by B. Dudek in his courses, a package called bcdstats.

In order to make this work, it is necessary to install another package first, devtools. A function in devtools called install_github() is used.

install.packages("devtools")

In order to install a GitHub package, you need to know its location, which can be found on the developer’s GitHub page. Information on the package page on GitHub probably repeats this sequence of commands. First load the devtools package and then execute the install function.

library(devtools)
install_github('bcdudek/bcdstats')
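The same steps can be written so that devtools is installed only when needed, and so that the result is checked afterward. This is a sketch, not a required procedure: the interactive() guard and the name has_bcdstats are my own additions, and the devtools:: prefix is simply an alternative to calling library(devtools) first.

```r
# Run the installation only in an interactive session (a choice made for
# this sketch, so that scripted runs do not start downloads).
if (interactive()) {
  # Install devtools first if it is not already available.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # The devtools:: prefix calls install_github() without loading devtools.
  devtools::install_github("bcdudek/bcdstats")
}

# TRUE if bcdstats can now be found in the library, FALSE otherwise.
has_bcdstats <- requireNamespace("bcdstats", quietly = TRUE)
print(has_bcdstats)
```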

9.6 Install many packages for use in B. Dudek’s course

For requirements in B. Dudek’s courses, an additional way of installing packages is available. A long list of packages is recommended for these courses (between one and two hundred). A separate file is provided, called something like “bcd_recommend.R” or “bcd_recommend2.R”. That file is a text file of R code: a long series of “install.packages()” commands, each installing an individual package and its dependencies.

To use this file, we can employ the source function in R. At the command prompt, the source function (given the file name) asks R to read the R code in that text file and execute all of the install.packages() lines in it. To employ this approach, you have to know where the “bcd_recommend.R” file resides and navigate there. The source function is set up here to use another function called file.choose(), which creates a dialog box where you can navigate to the correct folder and identify the “bcd_recommend.R” file to be passed to source.

Once you do this, the process will take a while to install all of the packages. Some may not install. If any installations fail, R will indicate this at the end, and you can see the details by using the warnings() function. You should copy/paste this list of uninstalled packages from the warnings() output, but don’t worry about them at this point. Just share the list with B. Dudek.

# pass the "bcd_recommend.R" file to the source function.
source(file.choose())
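To get a feel for what source() does before running the real file, it can be tried on a tiny stand-in script. Everything in this sketch is illustrative: the temporary file, the names demo_value, wanted and not_installed, and the short list of package names are my own, and the actual contents of “bcd_recommend.R” are not reproduced here.

```r
# Write a one-line stand-in script to a temporary file. The real file
# would contain install.packages() lines instead of this assignment.
script_path <- tempfile(fileext = ".R")
writeLines("demo_value <- 25^0.5", script_path)

# source() reads the file and runs its code as if it had been typed
# at the command prompt.
source(script_path)
print(demo_value)

# After a real run, one way to list packages that did not install is to
# compare the names you wanted against what is now in the library.
# 'wanted' is a hypothetical list used only for illustration.
wanted <- c("psych", "yhat", "rcompanion")
not_installed <- setdiff(wanted, rownames(installed.packages()))
print(not_installed)
```

A comparison like the last one gives a list of names that can be copied and shared, alongside whatever warnings() reports.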

9.7 Installing the tidyverse

A suite of packages motivated by the RStudio folks and Hadley Wickham, in particular, have gained much favor recently in the R user world. They provide a nice set