1 Background and R Setup – One Way ANOVA with R

The goal of this document is to provide a template for using R to analyze data from a single-factor design, typically called a one-way ANOVA problem. The completely randomized design used for the initial illustration is a 3-group design; the data come from an exercise in the classic Hays textbook. Later chapters use other data sets that have more treatment conditions.

The standard R axiom that there are always multiple ways of performing any task is never more accurate than with ANOVA models. The document begins with graphical depiction and proceeds through standard NHST inference, contrast analysis and post hoc tests, and evaluation of assumptions. It also includes some rudimentary Bayesian approaches to inference.

This document

Is intended for use in the APSY511 course at UAlbany, but can be used more broadly by data analysts.
Is a fairly full one-way ANOVA exposition for a 3-group design, with a second illustration using a 5-group design.
Implements graphical summaries and numerical descriptions.
Approaches ANOVA as linear modeling, supplemented with analytical contrasts and multiple comparison tests.
Implements trend analysis for quantitative IVs.
Includes graphical and inferential evaluation of assumptions.
Includes sections on Bayesian inference, robust methods, and resampling methods.
Includes a section on sample size planning with power analysis.

The document is constantly under development:

Additional work on effect size computations.
Implementation of some newer multiple comparison methods.
Additional work on robust and resampling methods.

One of the primary goals is to reproduce all the work we have accomplished with the SPSS REGRESSION, GLM, MANOVA and ONEWAY procedures (and then some).

Several R packages are required:


#if (!requireNamespace("BiocManager", quietly = TRUE))
#    install.packages("BiocManager")
#BiocManager::install("Biobase", version = "3.8")


# load packages
library(afex)
library(asbio)
library(BayesFactor)
library(beeswarm)
library(car)
library(coin)
library(dunn.test)
library(effectsize)
library(emmeans)
library(ez)
library(DTK)
library(ggdist)
library(gghalves)
library(ggplot2)
library(ggrain)
library(ggthemes)
library(ggstatsplot)
library(granova)
library(gridExtra)
library(gt)
library(KScorrect)
library(knitr)
library(lattice)
library(lawstat)
library(lmboot)
library(lmPerm)
library(lsr)
library(multcomp)
library(multtest)
library(mutoss)
library(nortest)
library(outliers)
library(pgirmess)
library(plotrix)
library(plyr)
library(psych)
library(pwr)
library(rcompanion)
library(Rmisc)
library(sciplot)
library(sdamr)
library(sjstats)
library(userfriendlyscience)
library(WRS2)
library(dplyr)

Package citations for packages loaded here (in the above order): afex (Singmann, Bolker, Westfall, & Aust, 2018), asbio (Aho, 2019), BayesFactor (Morey & Rouder, 2018), beeswarm (Eklund, 2016), car (Fox, Weisberg, & Price, 2018), coin (Hothorn, Hornik, van de Wiel, Winell, & Zeileis, 2017), dunn.test (Dinno, 2017), effectsize (Ben-Shachar, Makowski, & Lüdecke, 2021), emmeans (Lenth, 2019), ez (Lawrence, 2016), DTK (Lau, 2013), ggdist (Kay, 2024), gghalves (Tiedemann, 2022), ggplot2 (Wickham et al., 2018), ggrain (Judd, van Langen, & Kievit, 2024), ggthemes (Arnold, 2018), ggstatsplot (Patil, 2021), granova (Pruzek & Helmreich, 2014), gridExtra (Auguie, 2017), gt (Iannone, Cheng, & Schloerke, 2019), KScorrect (Novack-Gottshall & Wang, 2018), knitr (Xie, 2018), lattice (Sarkar, 2018), lawstat (Gastwirth et al., 2017), lmPerm (Wheeler & Torchiano, 2016), lsr (Navarro, 2015), multcomp (Hothorn, Bretz, & Westfall, 2017), multtest (Pollard, Gilbert, Ge, Taylor, & Dudoit, 2018), mutoss (Team et al., 2017), nortest (Gross & Ligges, 2015), outliers (Komsta, 2011), pgirmess (Giraudoux, 2018), plotrix (Lemon et al., 2018), plyr (Wickham, 2016), psych (Revelle, 2019), pwr (Champely, 2018), rcompanion (Mangiafico, 2019), Rmisc (Hope, 2013), sciplot (Morales et al., 2017), sdamr (Speekenbrink, 2022), sjstats (Lüdecke, 2019), userfriendlyscience (Peters, 2017), WRS2 (Mair & Wilcox, 2018), dplyr (Wickham, François, Henry, & Müller, 2019).
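Because the package list is long, it can help to check which packages are missing before running the library() calls. The following is a minimal sketch, not part of the original workflow: the helper name is_installed and the shortened required vector are illustrative assumptions, and the vector should be extended to match the full list of packages.

```r
# Subset of the packages loaded in this chapter; extend to the full list as needed
required <- c("afex", "car", "emmeans", "psych", "WRS2")

# Hypothetical helper: TRUE if a package is installed.
# system.file() returns "" when the package cannot be found,
# and it does not load or attach the package.
is_installed <- function(pkg) nzchar(system.file(package = pkg))

installed_flags <- vapply(required, is_installed, logical(1))
missing_pkgs <- required[!installed_flags]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All listed packages are installed.")
}
```

Checking with system.file() rather than library() keeps the check fast and avoids attaching dozens of packages just to see whether they exist.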

1.1 A note on R version and package installations.

R packages undergo constant revision, and some code here may be deprecated or behave slightly differently in more recent versions of some packages. RStudio makes it simple to update packages. Users can also install the most recent versions of R packages (or archived versions, if the packages are no longer maintained on CRAN) from source files rather than binaries, when they are available. The general process is to download the appropriate source file from the repository (ending in “tar.gz”) and then install it with this function:


#install.packages(file.choose(), repos=NULL, type="source")

Note that Windows users will need to install the Rtools suite of tools before source package installation is attempted.

https://cran.r-project.org/bin/windows/Rtools/

RStudio may also permit direct installation from source.

Two packages that are required for permutation tests and bootstrapping, lmPerm and lmboot, may have been archived on CRAN. Their source files can still be obtained by searching CRAN for the package name.

Three packages come from the Bioconductor suite of R packages, and the core Bioconductor installer should also be installed.
https://www.bioconductor.org/

Search for the pages of each of these four to download and install the latest package source files. By the time you read this, however, the normal process of installing the binary files may work (see the BiocManager page).

BiocManager

Biobase

BiocGenerics

multtest

1.2 Resources

The following list will provide a good start for those needing a broader background in ANOVA techniques and more detailed sources for the primary packages employed in this document.

Salvatore S. Mangiafico’s R Companion: [https://rcompanion.org/rcompanion/d_05.html]
Martin Schweinberger’s Blog: [http://www.martinschweinberger.de/blog/one-way-anova/]
cwoods on RPubs: [https://rpubs.com/cwoods/anova]
Daniel Wollschläger’s R Examples Repository [http://dwoll.de/rexrepos/posts/anovaCRp.html]

1.3 A note on R coding style

In this document, a great many functions from a great many packages are used. Sometimes a package uses the same name for a function that is used in another package. To reduce ambiguity, I have attempted to be consistent in the way functions are called in the code.

Normally, if a package is loaded, we can write code that just calls the function. For example, here is how one can call the describe function to analyze an object/dataframe/variable:


describe(variable_name)

But a describe function exists in multiple packages. Readers would not necessarily know which package the describe function used here came from unless the text or a comment in the code chunk identified it. When several loaded packages contain a function with the same name, the package loaded last takes priority and masks the others.
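The masking behavior can be seen without installing anything. The sketch below is an illustration under stated assumptions: pkg_a and pkg_b are hypothetical stand-ins for two add-on packages, simulated by attaching plain lists to the search path, and it assumes no other describe function is already attached in the session.

```r
# Two hypothetical "packages", each defining its own describe()
pkg_a <- list(describe = function(x) "describe() from pkg_a")
pkg_b <- list(describe = function(x) "describe() from pkg_b")

# Attach them in order, much as successive library() calls would
attach(pkg_a, name = "pkg_a", warn.conflicts = FALSE)
attach(pkg_b, name = "pkg_b", warn.conflicts = FALSE)  # attached last, searched first

# find() lists every search-path entry that contains the name, in lookup order
where_found <- find("describe")
where_found

# An unqualified call resolves to the most recently attached definition
result <- describe(1:10)
result

# Clean up the search path
detach("pkg_b")
detach("pkg_a")
```

In a real session, find("describe") after loading the packages in this chapter shows which packages supply a describe function and which one an unqualified call would use.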

So, to add clarity, I have tried to use the pkgname::functionname convention. Preceding the function name with the package name and a double colon executes the function from that package. This works even if the package has not previously been loaded with the library function. For example:


psych::describe(variable_name)
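That a :: call works without a prior library() call can be checked with a package that ships with R but is not attached in a default session. This is a minimal sketch: the file name and the scores vector are made up for illustration, and the psych call runs only if that package happens to be installed.

```r
# tools ships with every R installation but is not attached by default,
# so file_ext() is reached here only through the tools:: prefix
file_type <- tools::file_ext("hays_data.csv")  # hypothetical file name
file_type

# The :: call does not attach the package to the search path
tools_attached <- "package:tools" %in% search()
tools_attached

# The same pattern with an add-on package, run only if psych is installed
scores <- c(4, 7, 5, 9, 6)  # made-up values for illustration
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(scores))
}
```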

The exception is when a function is found in base R packages, but not add-ons. In that case I don’t use the :: approach.

Aho, K. (2019). Asbio: A collection of statistical tools for biologists. Retrieved from https://CRAN.R-project.org/package=asbio

Allaire, J., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., … Iannone, R. (2018). Rmarkdown: Dynamic documents for r. Retrieved from https://CRAN.R-project.org/package=rmarkdown

Arnold, J. B. (2018). Ggthemes: Extra themes, scales and geoms for ’ggplot2’. Retrieved from https://CRAN.R-project.org/package=ggthemes

Auguie, B. (2017). gridExtra: Miscellaneous functions for "grid" graphics. Retrieved from https://CRAN.R-project.org/package=gridExtra

Ben-Shachar, M. S., Makowski, D., & Lüdecke, D. (2021). Effectsize: Indices of effect size and standardized parameters. Retrieved from https://easystats.github.io/effectsize/

Champely, S. (2018). Pwr: Basic functions for power analysis. Retrieved from https://CRAN.R-project.org/package=pwr

Dinno, A. (2017). Dunn.test: Dunn’s test of multiple comparisons using rank sums. Retrieved from https://CRAN.R-project.org/package=dunn.test

Eklund, A. (2016). Beeswarm: The bee swarm plot, an alternative to stripchart. Retrieved from https://CRAN.R-project.org/package=beeswarm

Fox, J., Weisberg, S., & Price, B. (2018). Car: Companion to applied regression. Retrieved from https://CRAN.R-project.org/package=car

Gastwirth, J. L., Gel, Y. R., Hui, W. L. W., Lyubchich, V., Miao, W., & Noguchi, K. (2017). Lawstat: Tools for biostatistics, public policy, and law. Retrieved from https://CRAN.R-project.org/package=lawstat

Giraudoux, P. (2018). Pgirmess: Spatial analysis and data mining for field ecologists. Retrieved from https://CRAN.R-project.org/package=pgirmess

Gross, J., & Ligges, U. (2015). Nortest: Tests for normality. Retrieved from https://CRAN.R-project.org/package=nortest

Hope, R. M. (2013). Rmisc: Ryan miscellaneous. Retrieved from https://CRAN.R-project.org/package=Rmisc

Hothorn, T., Bretz, F., & Westfall, P. (2017). Multcomp: Simultaneous inference in general parametric models. Retrieved from https://CRAN.R-project.org/package=multcomp

Hothorn, T., Hornik, K., van de Wiel, M. A., Winell, H., & Zeileis, A. (2017). Coin: Conditional inference procedures in a permutation test framework. Retrieved from https://CRAN.R-project.org/package=coin

Howell, D. C. (2013). Statistical methods for psychology (8th ed.). Belmont, CA: Wadsworth Cengage Learning.

Iannone, R., Cheng, J., & Schloerke, B. (2019). Gt: Easily create presentation-ready display tables. Retrieved from https://github.com/rstudio/gt

Judd, N., van Langen, J., & Kievit, R. (2024). Ggrain: A rainclouds geom for ggplot2. Retrieved from https://github.com/njudd/ggrain

Kay, M. (2024). Ggdist: Visualizations of distributions and uncertainty. Retrieved from https://mjskay.github.io/ggdist/

Komsta, L. (2011). Outliers: Tests for outliers. Retrieved from https://CRAN.R-project.org/package=outliers

Lau, M. K. (2013). DTK: Dunnett-tukey-kramer pairwise multiple comparison test adjusted for unequal variances and unequal sample sizes. Retrieved from https://CRAN.R-project.org/package=DTK

Lawrence, M. A. (2016). Ez: Easy analysis and visualization of factorial experiments. Retrieved from https://CRAN.R-project.org/package=ez

Lemon, J., Bolker, B., Oom, S., Klein, E., Rowlingson, B., Wickham, H., … Groemping, U. (2018). Plotrix: Various plotting functions. Retrieved from https://CRAN.R-project.org/package=plotrix

Lenth, R. (2019). Emmeans: Estimated marginal means, aka least-squares means. Retrieved from https://CRAN.R-project.org/package=emmeans

Lüdecke, D. (2019). Sjstats: Collection of convenient functions for common statistical computations. Retrieved from https://CRAN.R-project.org/package=sjstats

Mair, P., & Wilcox, R. (2018). WRS2: A collection of robust statistical methods. Retrieved from https://CRAN.R-project.org/package=WRS2

Mangiafico, S. (2019). Rcompanion: Functions to support extension education program evaluation. Retrieved from https://CRAN.R-project.org/package=rcompanion

Morales, M., with code developed by the R Development Core Team, with general advice from the R-help listserv community and especially Duncan Murdoch. (2017). Sciplot: Scientific graphing functions for factorial designs. Retrieved from https://CRAN.R-project.org/package=sciplot

Morey, R. D., & Rouder, J. N. (2018). BayesFactor: Computation of bayes factors for common designs. Retrieved from https://CRAN.R-project.org/package=BayesFactor

Navarro, D. (2015). Lsr: Companion to "learning statistics with r". Retrieved from https://CRAN.R-project.org/package=lsr

Novack-Gottshall, P., & Wang, S. C. (2018). KScorrect: Lilliefors-corrected kolmogorov-smirnov goodness-of-fit tests. Retrieved from https://CRAN.R-project.org/package=KScorrect

Patil, I. (2021). Ggstatsplot: ggplot2 based plots with statistical details. Retrieved from https://CRAN.R-project.org/package=ggstatsplot

Peters, G.-J. Y. (2017). Diamond plots: A tutorial to introduce a visualisation tool that facilitates interpretation and comparison of multiple sample estimates while respecting their inaccuracy. PsyArXiv. Retrieved from https://psyarxiv.com/fzh6c

Pollard, K. S., Gilbert, H. N., Ge, Y., Taylor, S., & Dudoit, S. (2018). Multtest: Resampling-based multiple hypothesis testing.

Pruzek, R. M., & Helmreich, J. E. (2014). Granova: Graphical analysis of variance. Retrieved from https://CRAN.R-project.org/package=granova

Revelle, W. (2019). Psych: Procedures for psychological, psychometric, and personality research. Retrieved from https://CRAN.R-project.org/package=psych

Sarkar, D. (2018). Lattice: Trellis graphics for r. Retrieved from https://CRAN.R-project.org/package=lattice

Singmann, H., Bolker, B., Westfall, J., & Aust, F. (2018). Afex: Analysis of factorial experiments. Retrieved from https://CRAN.R-project.org/package=afex

Speekenbrink, M. (2022). Sdamr: Statistics: Data analysis and modelling. Retrieved from https://mspeekenbrink.github.io/sdam-r/

Team, M. C., Blanchard, G., Dickhaus, T., Hack, N., Konietschke, F., Rohmeyer, K., … Werft, W. (2017). Mutoss: Unified multiple testing procedures. Retrieved from https://CRAN.R-project.org/package=mutoss

Tiedemann, F. (2022). Gghalves: Compose half-half plots using your favourite geoms. Retrieved from https://github.com/erocoar/gghalves

Wheeler, B., & Torchiano, M. (2016). lmPerm: Permutation tests for linear models. Retrieved from https://CRAN.R-project.org/package=lmPerm

Wickham, H. (2016). Plyr: Tools for splitting, applying and combining data. Retrieved from https://CRAN.R-project.org/package=plyr

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., & Woo, K. (2018). ggplot2: Create elegant data visualisations using the grammar of graphics. Retrieved from https://CRAN.R-project.org/package=ggplot2

Wickham, H., François, R., Henry, L., & Müller, K. (2019). Dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr

Xie, Y. (2018). Knitr: A general-purpose package for dynamic report generation in r. Retrieved from https://CRAN.R-project.org/package=knitr