This document provides information for new R users on installing, updating, and maintaining an R installation. It can be broadly useful to a wide audience, but it is intended for students in B. Dudek’s statistics classes and serves as the first of several R tutorial documents. Topics covered include installation, updates, and maintenance of R, RStudio, and R packages. Some general background is provided, along with instructions and links to web postings that go into more detail if needed.
The R programming language (https://www.r-project.org/) has major strengths in data management, statistical analysis, and data visualization. It has quickly become a primary tool for data scientists, statisticians, and researchers. It runs on multiple operating systems, and its installation is largely straightforward on all platforms. As a free, open-source software ecosystem, it provides a rich array of tools for a diverse audience of users.
This link to a page on the R project web site gives a more detailed overview: https://www.r-project.org/about.html
Downloads of R and more information are found on CRAN (the Comprehensive R Archive Network): https://cran.r-project.org/
R is installed as a relatively minimal configuration in which the user passes R code to a command-line environment called the R console (inside the R GUI). A majority of R users employ an additional tool, RStudio (https://rstudio.com/), which is an Integrated Development Environment (IDE). RStudio embeds the R console alongside file management and code-writing capabilities, plus several other useful components for displaying figures, managing add-on packages, etc. It is a very powerful way to use R, especially given its support for markdown, package building, and many add-ins. New users are strongly encouraged to become familiar with RStudio very early in the R learning curve.
The question “Why R?” often arises when comparing it with other programming languages that can handle data science needs, or with commercially available software that has a long, established history (e.g., SAS, Stata, SPSS). There is no need to dwell on these comparisons. The major utility of R is its rich array of add-on contributions in the form of packages created by statisticians around the world. New methods can appear very quickly in the R ecosystem of add-on packages. There is also an extensive support community online in the form of forums, blogs, and help sites. For the scientific researcher in many disciplines, skills in R are becoming an expected part of training in degree programs. It is an important tool to add to an arsenal of data analytic capabilities.
These URLs expand on the points made above:
Spreadsheets such as Excel have major positive features and are often very helpful for data entry and initial data management. But R can do these things well too, and it offers an important advantage in reproducibility, since code can be saved, rerun, and documented. At its core, R is a complete data management and analytic system, with far more capabilities than spreadsheets. See the following commentary for additional perspectives:
The R language is very broadly capable, even in its simple original installation, but additional capabilities are found in packages.
The core R installation already includes a set of base packages whose capabilities are suggested by their names. In the R source code these live in the src/library/ folder; in an installed copy of R they are in the library subfolder of the R installation:
The R Project also designates a set of “recommended” packages. These are bundled with the standard binary installers for Windows and macOS, but they are maintained separately from base R and are updated like any other add-on package (see a later section in this document):
These are all useful, but most users will quickly compile a longer list of additional packages that they use for specific purposes. For example, the psych package contains a large suite of functions useful to researchers in the psychological sciences. Methods for installing these packages and a recommended set are described in a later section.
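To see which of these packages are already on your own machine, installed.packages() can filter on each package’s priority field (“base” or “recommended”). This is a minimal sketch that only lists them; the exact set you see depends on your R version and platform.

```r
# Base packages ship with every R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# "Recommended" packages that are currently installed on this machine
rec_pkgs <- rownames(installed.packages(priority = "recommended"))
print(rec_pkgs)

# Folder inside the R installation that holds the base packages
R.home("library")

# All library folders R searches for packages (may include a personal one)
.libPaths()
```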
For purposes of B. Dudek’s classes, it is useful to install the most recent version of R so that we are all using the same version. As of this writing that version is 4.0.1, but it will likely be a later version by the time you read this. Also, as of version 4.0, all packages you installed under an earlier version of R need to be reinstalled (see the section below).
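You can confirm which version you are running from the console. The second half is a hedged sketch of one common way to carry your package list across a major upgrade: save the names of your add-on packages before upgrading, then reinstall them in the new version. The file name my_packages.rds is a hypothetical example.

```r
# Full version string of the running R session
R.version.string

# getRversion() returns a version object that compares correctly with "4.0.0"
if (getRversion() >= "4.0.0") {
  message("Running R 4.0.0 or later")
}

# Before upgrading (run in the OLD version): save the names of your add-on
# packages, i.e. those with no "base" or "recommended" priority
ip <- installed.packages()
old_pkgs <- unique(rownames(ip)[is.na(ip[, "Priority"])])
saveRDS(old_pkgs, file = "my_packages.rds")   # hypothetical file name

# After upgrading (run in the NEW version; needs internet access):
# install.packages(readRDS("my_packages.rds"))
```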
On each of the major platforms, the base R installation occurs with a minimum of effort. The steps are:
I used to provide a detailed document with screen captures showing each step, but the process is so simple that this level of detail is no longer necessary. If you want more detail, go to one of these sites that provide additional guidance:
On Windows platforms, both 32-bit and 64-bit R are installed by the procedures outlined above if you are using an x64 version of Windows (as almost all users now are). The 32-bit version exists because older PC operating systems were only 32-bit, and 32-bit programs are limited to a few gigabytes of memory; the 64-bit version can use far more. The 64-bit version is the one you will want to use.
On macOS and Linux, the base R installation is 64-bit.
After you have installed R, install RStudio. Go to the RStudio web site (https://rstudio.com/) and choose the download button, or go directly to the download page (https://rstudio.com/products/rstudio/download/). Then download the free RStudio Desktop version for your operating system and run the installer that you downloaded.
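If you are unsure which build you are running, R can tell you. This sketch relies on the size of a memory pointer, which is 8 bytes in a 64-bit build and 4 bytes in a 32-bit build.

```r
# Pointer size in bytes: 8 means a 64-bit build of R, 4 means 32-bit
ptr_bytes <- .Machine$sizeof.pointer
if (ptr_bytes == 8) {
  message("You are running 64-bit R")
} else {
  message("You are running 32-bit R")
}

# CPU architecture R was built for, such as "x86_64" or "aarch64"
R.version$arch
```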
In order to begin learning to use R and RStudio and test the installations, I suggest the following.
2. At the command prompt in R Console, type the following to obtain the square root of 25:
sqrt(25)
## [1] 5
R has returned an “object” (here a numeric vector) containing a single element, and that first element is the numeric value of the answer.
This function has drawn a frequency histogram of the Old Faithful Geyser eruption durations using a data set that comes with R (called faithful).
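To look a little more closely at that returned object, you can store it under a name and inspect it. The name x is an arbitrary choice.

```r
x <- sqrt(25)   # store the result in an object named x
x               # prints [1] 5; "[1]" labels the first element
length(x)       # 1: the object holds a single element
x[1]            # extract the first element by position
is.numeric(x)   # TRUE: the element is a number
```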
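The command itself is not shown above; one way to draw such a histogram is with hist(), sketched here. The title and axis label are illustrative additions.

```r
# faithful is a built-in data frame (datasets package): 272 rows,
# columns "eruptions" (duration, minutes) and "waiting" (minutes to next eruption)
str(faithful)

# Frequency histogram of eruption durations
h <- hist(faithful$eruptions,
          main = "Old Faithful eruption durations",
          xlab = "Eruption duration (minutes)")
```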
Notice that the same R Console and command prompt appear in the left pane (I think that is still the default).
With these quick tests, you know that you have a functioning installation and are ready to add other packages and begin learning/using R.
At the time of writing, CRAN hosts 15,806 add-on packages. Obviously there is no need (or time!) to install all of them. Typically you add packages when a need arises or when you learn about a new package that is worth exploring.
One of the great things about R (and sometimes an infuriating thing) is that there are multiple ways of doing things. This is true with package installations as well. Add-on packages can be installed from the RGui menus, from code submitted at the command prompt, or with the Packages tab in RStudio. I still prefer writing code to install packages so that I can keep a record of what I did.
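The CRAN count changes daily, so you may want to check it yourself. This sketch queries CRAN’s cloud mirror (an assumed choice of mirror) and needs an internet connection; offline it only issues a warning and reports zero.

```r
# Ask a CRAN mirror for its current package index (requires internet access)
cran_pkgs <- available.packages(repos = "https://cloud.r-project.org")
nrow(cran_pkgs)   # packages available for your platform today

# Number of packages already installed on this machine
nrow(installed.packages())
```

The count for Windows and macOS can differ slightly from the total on the CRAN web site, because available.packages() lists what is available for your platform’s package type.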
The primary function to install packages is install.packages(). You can look up the help page for this function by typing the following at the command prompt:
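Either of the two standard help lookups below works; args() is an optional extra that prints only the argument list with its defaults.

```r
# Two equivalent ways to open the help page
?install.packages
help("install.packages")

# Print just the function's arguments and their default values
args(install.packages)
```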
If you are working in RGui/R Console, this help page will pop up as a second window. If you are in RStudio, the help page appears in the Help tab in the bottom-right pane.
This help page will probably be overly challenging for a user who is new to R, so let’s go slowly.
This process can be done either from within the R GUI/R Console or from the console within RStudio. For this first test, I suggest that you use the basic R installation (RGui/R Console). Let’s begin by installing a single package from the command line, using the install.packages() function. This line will install the psych package. Each add-on package may depend on other packages in order to function. Those dependencies are respected by install.packages() and will also be installed. If you look at the help page for install.packages(), you will see that the “dependencies” argument defaults to NA, which installs the packages listed under Depends, Imports, and LinkingTo (but not Suggests), so required dependencies are installed by default.
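A minimal version of the install command follows. The repos argument is an addition here, pointing at CRAN’s cloud mirror, so that the line also works when run as a script; in RGui or RStudio you can leave it out. The last lines show the default value of the dependencies argument.

```r
# Install psych from CRAN; the packages it depends on are installed too
install.packages("psych", repos = "https://cloud.r-project.org")

# TRUE if psych is now installed
"psych" %in% rownames(installed.packages())

# Default of 'dependencies' is NA: "Depends", "Imports", and "LinkingTo"
# packages are installed, but "Suggests" packages are not
formals(install.packages)$dependencies

# To also install suggested packages (a much larger download):
# install.packages("psych", dependencies = TRUE)
```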
If you do this in RGui/R Console, a dialog box will appear asking you to choose a CRAN mirror. I typically choose one in Michigan, assuming that closer ones might be quicker. If you do this in RStudio, a default mirror is already set; you can change it in RStudio’s Global Options (Tools > Global Options > Packages).
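The mirror can also be inspected and set from code. This sketch uses CRAN’s cloud mirror as an example choice; any mirror from the CRAN list works the same way.

```r
# Repository currently used by install.packages()
getOption("repos")

# Choose a mirror from a list (only possible in an interactive session)
if (interactive()) {
  chooseCRANmirror()
}

# Or set one directly for the current session; "cloud" redirects to a nearby server
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")
```

The options() setting lasts only for the current session; putting the same line in your .Rprofile file applies it every time R starts.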
It is possible to install multiple packages with a single install.packages() command. Let’s assume that we want to install rcompanion. This code uses the c function (short for “combine”, often described as concatenate), which combines two package names into a character vector that is passed to the install.packages() function.
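A sketch of the idea, assuming the second package is psych (the text names only rcompanion). It also skips packages that are already installed, a common convenience that is not part of the original instructions, and sets repos explicitly (an assumed mirror) so it runs outside RStudio too.

```r
# c() combines the names into one character vector
pkgs <- c("psych", "rcompanion")

# Keep only the packages that are not yet installed
to_install <- setdiff(pkgs, rownames(installed.packages()))

# One install.packages() call handles the whole vector
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```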
Many package developers use GitHub repositories for their packages. Sometimes you may want/need to install a development version of a package that is not yet on CRAN, or you may have identified a useful package that the developer hasn’t submitted to CRAN. That is the case for a package used by B. Dudek in his courses, a package called bcdstats.
To make this work, you first need to install another package, devtools, whose install_github() function does the installation.
To install a package from GitHub, you need to know its location (in the form owner/repository), which is given on the developer’s GitHub page. The package page on GitHub probably repeats this sequence of commands. First load the devtools package and then execute the install function.
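A sketch of that sequence. The owner part of the repository string is a hypothetical placeholder, not the real location of bcdstats; copy the actual value from the developer’s GitHub page before running the last two lines.

```r
# Install devtools from CRAN if it is not already present (needed only once)
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools", repos = "https://cloud.r-project.org")
}

# HYPOTHETICAL placeholder: replace OWNER with the account name shown on
# the developer's GitHub page for bcdstats
gh_repo <- "OWNER/bcdstats"

# Then load devtools and install from GitHub (uncomment after editing gh_repo):
# library(devtools)
# install_github(gh_repo)
```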
For requirements in B. Dudek’s courses, an additional way of installing packages is available. A long list of packages (between one and two hundred) is recommended for these courses. A separate file is provided, called something like “bcd_recommend.R” or “bcd_recommend2.R”. That file is a text file of R code: a long series of “install.packages()” commands, each installing an individual package and its dependencies. To use this file, we can employ the source function in R. At the command prompt, calling the source function with the file name asks R to read the R code in that text file and then execute all of the install.packages() lines it contains.
To use this approach, you have to know where the “bcd_recommend.R” file resides and navigate there. Here the source function is combined with another function, file.choose(), which opens a dialog box where you can navigate to the correct folder and select the “bcd_recommend.R” file to be passed to source. Once you do this, installing all of the packages will take a while. Some may fail to install; if so, R will suggest at the end that you run the warnings() function to see which installations failed. Copy/paste this list of uninstalled packages from the warnings() output, but don’t worry about them at this point. Just share the list with B. Dudek.
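A sketch of the steps just described. Because file.choose() only works in an interactive session, it is wrapped in interactive() here. The second half demonstrates what source() does, using a small temporary file as a hypothetical stand-in for bcd_recommend.R.

```r
# Pick bcd_recommend.R in a file dialog, then run every line in it
if (interactive()) {
  source(file.choose())
}

# If R reports that there were warnings, list them (e.g. failed installs)
warnings()

# Demonstration: a tiny throwaway script standing in for bcd_recommend.R
demo_file <- tempfile(fileext = ".R")
writeLines(c("x <- sqrt(25)", "y <- x * 2"), demo_file)
source(demo_file)   # runs both lines, so x and y now exist
y                   # [1] 10
```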
A suite of packages developed by the RStudio team, and by Hadley Wickham in particular, has gained much favor recently in the R user world. They provide a nice set