This document provides information for new R users on installing, updating, and maintaining an R installation. It can be broadly useful to a wide audience, but is intended for students in B. Dudek’s statistics classes, and serves as the first of several R tutorial documents. Topics covered include installation/updates/maintenance of R, RStudio, and R packages. Some general background is provided, along with instructions and links to web postings that go into more detail if needed.
The R programming language (https://www.r-project.org/) has major strengths in data management, statistical analysis and data visualization. It has fast become a primary tool for data scientists, statisticians and researchers. It can be used on multiple operating systems and its installation is largely straightforward on all platforms. As an open source and free software ecosystem, it provides a rich array of tools for a diverse audience of users.
This link to a page on the R project web site gives a more detailed overview: https://www.r-project.org/about.html
Downloads of R and more information are found on CRAN (Comprehensive R Archive Network): https://cran.r-project.org/
R is installed as a relatively minimal configuration where the user passes R code to a command line environment called the R console (inside the R GUI). A majority of R users employ an additional tool, RStudio Desktop (https://posit.co/), which is an Integrated Development Environment (IDE). Embedded in the RStudio configuration are the R console, file management and code writing capability, and several other useful components for displaying figures, managing add-on packages, etc. It is a very powerful way to use R, especially with its capabilities to use markdown, to build packages, and many other add-in capabilities. It is strongly recommended that new users become familiar with using RStudio very early in the R learning curve.
The question of why R? is often posed in the context of comparing it to other programming languages that can handle data science needs, or in comparison to commercially available software packages that have long-established histories (e.g., SAS, Stata, SPSS). There is no need to dwell on these comparisons. The major utility of R is its rich array of add-on contributions in the form of packages created by statisticians around the world. New methods can appear very quickly in the R ecosystem of add-on packages. There is also an extensive support community online in the form of forums, blogs, and help sites. For the scientific researcher in many disciplines, skills in R are becoming an expected part of training in degree programs. It is an important tool to add to an arsenal of data analytic capabilities.
These URLs expand on the points made above:
https://www.burns-stat.com/documents/tutorials/why-use-the-r-language/
http://www.econometricsbysimulation.com/2014/03/why-use-r-five-reasons.html
Spreadsheets such as Excel have major positive features and are often very helpful for data entry and initial data management. But R can do these things well too, and there is a reproducibility element with R that is important, since code can be saved, repeated, and documented. At its core, R is a complete data management and analytic system, with far more capabilities than spreadsheets. See the following commentary for additional perspectives:
https://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/
The R language is very broadly capable, even in its simple original installation, but additional capabilities are found in add-on packages. These packages are created by statisticians/programmers world-wide and this is a major strength of the R ecosystem.
The core R installation already has a set of add-on packages with capabilities suggested by their names. These are found in a sub-folder of the R program installation called src/library/:
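As a minimal sketch, these built-in packages can also be listed from inside R. The installed.packages() function accepts a priority argument, and priority = "base" restricts the result to the packages that ship with the core installation. The object name base_pkgs is only illustrative.

```r
# List the packages that ship with the core R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)
```

The list should include familiar names such as stats, graphics, and utils.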
The R Project also “recommends” a set of additional packages that the user will have to install separately (see a later section in this document):
https://cran.r-project.org/src/contrib/4.2.0/Recommended/
These are all useful, but most users will quickly compile a longer list of additional packages that they use for specific purposes. For example, the psych package contains a large suite of functions useful to researchers in the psychological sciences. Methods for installing these packages and a recommended set are described in a later section.
For purposes of B. Dudek’s classes, it is useful to install the most recent version of R so that we are all using the same version (including in the classroom). As of this writing that version is 4.4.1. So if you, by chance, have an older installation of R, then please update it. Also as of version 4.0, all previous packages you may have used on an earlier version need to be re-installed (see the section below).
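If you are unsure which version you currently have, a quick check from the R prompt can help. This is a small sketch; the object names class_version and up_to_date are illustrative, and the target version is the 4.4.1 mentioned above.

```r
# Report the version string of the running R installation
print(R.version.string)

# Compare against the version used in class (4.4.1 at the time of writing)
class_version <- "4.4.1"
up_to_date <- getRversion() >= class_version

if (up_to_date) {
  message("R is at version ", class_version, " or newer.")
} else {
  message("R is older than ", class_version, "; please update.")
}
```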
On each of the major platforms, the base R installation occurs with a minimum of effort. The steps are:
I used to provide a detailed document with screen captures showing each step, but the process is now so simple that this level of detail is no longer necessary. If you want more detail, go to one of these sites that do provide additional guidance:
https://www.andrewheiss.com/blog/2012/04/17/install-r-rstudio-r-commander-windows-osx/
https://techvidvan.com/tutorials/install-r/
https://rstudio-education.github.io/hopr/starting.html
https://www.datacamp.com/community/tutorials/installing-R-windows-mac-ubuntu
https://www.dataquest.io/blog/tutorial-getting-started-with-r-and-rstudio/
CRAN offers several flavors of the R installation package. One major distinction is between Apple computers using Intel chipsets and newer ones using Apple silicon, the M1/M2/M3 architecture, also labeled “ARM”. Make sure that you choose the correct installer for your Mac. It is also important to have the most recent update of your specific operating system, Xcode, and XQuartz, as outlined on the R installation page for Apple products.
On Windows platforms, versions of R before 4.2.0 installed both 32-bit and 64-bit builds when run on an x64 version of Windows (which, by now, nearly all users have). The 32-bit build existed because older PC operating systems were 32-bit only, and it carries a memory limitation that is far less restrictive in the 64-bit build. Beginning with R 4.2.0, only the 64-bit build is provided for Windows, so a current installation gives you the version you will want to use.
On macOS and Linux, the base R installation is 64-bit.
After you have installed R, install RStudio. RStudio Desktop is a product of a company called Posit (confusingly, the company was formerly known as RStudio). Go to the Posit web site (https://posit.co/) and choose the download button, or go directly to the download page (https://posit.co/downloads/). Then download the free RStudio Desktop version for your operating system and run the installer that you downloaded.
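If you want to confirm which kind of build you are running, the following sketch reports it from within R. It relies on two built-in objects, .Machine and R.version; the object name bits is illustrative.

```r
# Pointer size in bytes times 8 gives the bit width of the running R build
bits <- .Machine$sizeof.pointer * 8
cat("This R session is a", bits, "bit build\n")

# The architecture string, e.g. x86_64 (Intel/AMD) or aarch64 (Apple silicon)
print(R.version$arch)
```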
In order to begin learning to use R and RStudio and test the installations, I suggest the following.
## [1] 5
R has returned an “object” that contains one element and that first element is the numeric value of the answer.
This function has drawn a frequency histogram of the Old Faithful Geyser eruption durations using a data set that comes with R (called faithful).
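The exact commands for these two quick tests are not reproduced here, so the following is only a sketch of what they might look like. It assumes a simple sum for the first test and a call to hist() on the eruptions column of the built-in faithful data set for the second.

```r
# A simple calculation typed at the prompt; R prints [1] 5
2 + 3

# Store the result in an object and confirm it holds a single element
answer <- 2 + 3
length(answer)

# Frequency histogram of eruption durations from the built-in faithful data
hist(faithful$eruptions)
```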
Notice that the same R Console and command prompt appear in the left pane (I think that is still the default).
With these quick tests, you know that you have a functioning installation and are ready to add other packages and begin learning/using R.
Students in B. Dudek’s courses should see the section below on package needs and installation methods specific for the courses. This section provides general information on package installation.
At the time of this writing, CRAN hosts 21118 add-on packages (many more useful ones are shared via individual GitHub accounts). Obviously there is no need (or time!) to install all of them. Typically one adds packages when a need arises or when one learns about a new package that is worth exploring.
Which packages to start with? One set of recommended packages comes from the folks at Posit: https://support.posit.co/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages CRAN also has interesting resources called “Task Views”, which provide suggestions for package installation by discipline: https://cran.r-project.org/web/views/ Again, students in B. Dudek’s classes will want to use the method described in a section below.
One of the great things about R (and sometimes an infuriating thing) is that there are multiple ways of doing things. This is true with package installations as well. Add-on packages can be installed from the RGui menus, or from code submitted to the command prompt, and with the packages tab in RStudio. I still prefer generating the code to install packages so that I can keep a record of what I did.
The primary function for installing packages is install.packages(). You can look up the help page for this function by typing ?install.packages at the command prompt.
If you are working in the RGui/Rconsole, this help page will pop up as a second window. If you are in RStudio, the help page appears in the help tab in the bottom right pane.
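As a small sketch, the same help page can also be requested with the help() function, and args() gives a compact view of the function's arguments without leaving the console.

```r
# Open the help page (equivalent to ?install.packages)
help("install.packages")

# Print the argument list, including the default for `dependencies`
args(install.packages)
```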
This help page will probably be overly challenging for the user who is new to R, so let’s go slowly.
This process can be done either from within the R GUI/R Console or from the console within RStudio. Initially, I suggest that you do this first test from the basic R installation (RGui/RConsole). Let’s begin by installing a single package from the command line, using the install.packages() function. A single call to install.packages(), with the package name in quotes, will install the psych package. Each add-on package may have dependencies on other packages in order to function. Those dependencies are respected by install.packages() and will also be installed. If you look at the help page for install.packages() you will see that a “dependencies” argument is set so that dependencies are also installed by default.
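A minimal sketch of that single-package installation follows. It assumes you are working at an interactive prompt with internet access; the library() call is an added step to confirm that the package can be loaded afterwards.

```r
# Install psych from CRAN; dependencies are handled by the default setting
install.packages("psych")

# Load the package to confirm that the installation worked
library(psych)
```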
If you do this in RGui/RConsole, a dialog box will appear asking you to choose a CRAN mirror. I typically choose one in Michigan, assuming that closer ones might be quicker. If you do this in RStudio, a default mirror is already set. You can change that default in the global options section of RStudio.
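The mirror can also be set in code for the current session, which avoids the dialog box. This sketch uses https://cloud.r-project.org, a CRAN address that redirects to a nearby server; that choice is an assumption for illustration, and any mirror from the CRAN list can be substituted.

```r
# Set the CRAN mirror for the current session
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the setting
getOption("repos")
```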
It is possible to install multiple packages with a single install.packages() command. Let’s assume that we want to install yhat and rcompanion. The call uses the c() function (combine, often described as concatenate), which creates a character vector of the two package names that is passed to the install.packages() function.
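A sketch of the multiple-package installation is shown here. The intermediate object pkgs is illustrative; the vector could equally be written directly inside the install.packages() call.

```r
# Build a character vector of package names with c()
pkgs <- c("yhat", "rcompanion")

# Pass the vector to install.packages() to install both in one call
install.packages(pkgs)
```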
By choosing that “packages” tab, you have forced RStudio to show all packages that you have installed. But to install new packages, choose the “install” tab there. The dialog box permits you to install one or more packages by typing their names in the available box. By default it goes to CRAN to look for those packages, but you may have a local file that contains a package, and the option there is to specify the local location of that package file.
Many package developers use GitHub repositories for their packages. Sometimes you may want/need to install a development version of a package that is not yet on CRAN, or you may have identified a useful package that the developer hasn’t submitted to CRAN. That is the case for a package used by B. Dudek in his courses, a package called bcdstats.
In order to make this work, it is necessary to install another package first, devtools. A function in devtools called install_github() is used.
In order to install a GitHub package you need to know its location, which can be found on the developer’s GitHub page. Information on the package page on GitHub probably repeats this sequence of commands. First load the devtools package and then execute the install function.
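The sequence might look like the following sketch. The repository path "bcdudek/bcdstats" is an assumption made for illustration and is not confirmed by this document; check the developer's GitHub page for the actual account/repository string before running it.

```r
# devtools must be installed once before install_github() is available
install.packages("devtools")

# Load devtools, then install from GitHub using "account/repository"
library(devtools)

# ASSUMPTION: this repository path is illustrative; confirm it on the
# developer's GitHub page
install_github("bcdudek/bcdstats")
```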
For requirements in B. Dudek’s courses, an additional way of installing packages is available. A long list of packages is recommended for these courses (between one and two hundred). A separate file is provided, called something like “bcd_recommend.R” or “bcd_recommend2.R”. That file is a text file of R code: a long series of install.packages() commands, each installing an individual package and its dependencies. To use this file, we can employ the source() function in R. At the command prompt, calling source() with the file name asks R to read the R code in that text file and execute all of the install.packages() lines in it. To employ this approach, you have to know where the “bcd_recommend.R” file resides and navigate there. The source() function is set up here to use another function, file.choose(), which creates a dialog box where you can navigate to the correct folder and identify the “bcd_recommend.R” file to be passed to source(). Once you do this, the process will take a while to install all of the packages. Some may not install; R will indicate this at the end, and you can see the details with the warnings() function. You should copy/paste the list of uninstalled packages from the warnings() output, but don’t worry about them at this point. Just share the list with B. Dudek.
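A sketch of this approach is shown below. The interactive() guard is an addition for illustration, since file.choose() needs a session where a dialog box can be displayed.

```r
# file.choose() opens a dialog, so this only makes sense interactively
if (interactive()) {
  # Pick the bcd_recommend.R file; source() then runs every line in it
  source(file.choose())
}

# After the script finishes, list any warnings from failed installations
warnings()
```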
A suite of packages known as the tidyverse, motivated by the RStudio folks and Hadley Wickham in particular, has gained much favor recently in the R user world. These packages provide a nice set of tools for data handling, data visualization and many other things. For example, the ggplot2 package has become a heavily used approach to graphing. The current number of tidyverse packages is over 20 and you can read about them here: https://www.tidyverse.org/
The tidyverse packages can be installed simultaneously with this code. But note that if you used the “bcd_recommend.R” script above, one of the first things it did was to install the tidyverse suite of packages.
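A minimal sketch of the tidyverse installation, assuming an interactive session with internet access. The tidyverse package on CRAN is a meta-package, so installing it pulls in the member packages.

```r
# Install the tidyverse meta-package and its member packages
install.packages("tidyverse")

# Loading tidyverse attaches the core packages (ggplot2, dplyr, and others)
library(tidyverse)
```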
The syntax of the install.packages() function works on all OS platforms. And despite some superficial differences in appearance of the RGui/RConsole on Linux/Mac/Windows, the R program largely runs the same way. But some differences with regard to package installation are less apparent.
If you visit the home page of the package (e.g., psych at CRAN), you will find that there are actually three different formats of the package:
psych_1.9.12.31.tar.gz, the package source code
psych_1.9.12.31.zip, the Windows binary containing the package
psych_1.9.12.31.tgz, the Mac binary containing the package
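Which of these formats install.packages() requests is governed by the pkgType option. As a small sketch, the current settings can be inspected on your own machine; the object name pkg_type is illustrative.

```r
# Package type install.packages() uses by default in this session
getOption("pkgType")

# Preferred package type for this build of R; "source" when no binaries exist
pkg_type <- .Platform$pkgType
print(pkg_type)
```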
What actually happens when you run install.packages(“psych”) is that, on Mac or Windows, R downloads these compressed pre-compiled binaries and just unpacks t