This document provides a template for using R to evaluate data from a 1-factor design, typically called a 1-way ANOVA problem. The initial illustration uses a completely randomized 3-group design, with data from an exercise in the classic Hays textbook. Later chapters use other data sets that have more treatment conditions.
The standard R axiom that there are always multiple ways of performing any task is never more accurate than with ANOVA models. The document begins with graphical depiction and proceeds through standard NHST inference, contrast analysis, post hoc tests, and evaluation of assumptions; it also includes some rudimentary Bayesian approaches to inference.
This document:
Is intended for use in the APSY511 course at UAlbany, but can be used more broadly by data analysts.
Is a fairly full one-way ANOVA exposition for a 3-group design, with a second illustration using a 5-group design.
Approaches ANOVA as linear modeling, supplemented with analytical contrasts and multiple comparison tests.
Implements trend analysis for quantitative IVs.
Includes graphical and inferential evaluation of assumptions.
Includes sections on Bayesian inference, robust methods, and resampling methods.
Includes a section on sample size planning with power analysis.
The document is constantly under development. Areas still in progress include:
Additional work on effect size computations.
Implementation of some newer multiple comparison methods.
Additional work on robust and resampling methods.
One of the primary goals is to reproduce all the work we have accomplished with the SPSS REGRESSION, GLM, MANOVA and ONEWAY procedures (and then some).
Several R packages are required:
# if (!requireNamespace("BiocManager", quietly = TRUE))
#   install.packages("BiocManager")
# BiocManager::install("Biobase", version = "3.8")
1.1 A note on R version and package installations
R packages are under constant revision, so some code here may be deprecated or may behave slightly differently in more recent versions of some packages. RStudio makes it simple to update packages. Users can also install the most recent version of a package (or an archived version, if the package is no longer maintained on CRAN) from source files rather than binaries, when they are available. The general process is to download the appropriate source file from the repository (ending in “tar.gz”) and then use this function to install the package:
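A minimal sketch of such an install, assuming the function meant is base R's install.packages pointed at a local file. The file name and folder below are hypothetical placeholders, not a real package:

```r
# Hypothetical location of a downloaded source archive; replace with a real path.
src_file <- path.expand(file.path("~", "Downloads", "somepackage_1.0.0.tar.gz"))

# repos = NULL tells R to install from the local file instead of a repository;
# type = "source" marks the file as a source (tar.gz) package.
if (file.exists(src_file)) {
  install.packages(src_file, repos = NULL, type = "source")
} else {
  message("Source archive not found: ", src_file)
}
```

Source packages that contain compiled code generally need a build toolchain on the machine (for example Rtools on Windows), so a binary install is usually the easier route when one exists.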
RStudio may also permit direct installation from a source file.
Two packages required for permutation tests and bootstrapping, lmPerm and lmboot, may have been archived on CRAN. If so, they can still be obtained by searching CRAN for the package name.
Three packages come from the Bioconductor suite of R packages, and the core Bioconductor installer should also be installed. https://www.bioconductor.org/
Search for the page of each of these four packages to download and install the latest source files. By the time you read this, however, the normal process of installing the binary files may work (see the BiocManager page).
BiocManager
Biobase
BiocGenerics
multtest
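If the normal installation route does work, the four packages listed above could be installed roughly as follows. This is a sketch, not the author's exact procedure: it assumes an internet connection, leaves the Bioconductor version at its default, and only attempts the install in an interactive session.

```r
# The three Bioconductor packages named above (BiocManager is the installer).
bioc_pkgs <- c("Biobase", "BiocGenerics", "multtest")

# Check which of them are not yet installed.
is_installed <- vapply(bioc_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- bioc_pkgs[!is_installed]

# Only attempt installation when run interactively and something is missing.
if (interactive() && length(missing_pkgs) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(missing_pkgs)
}
```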
1.2 Resources
The following list will provide a good start for those needing a broader background in ANOVA techniques and more detailed sources for the primary packages employed in this document.
Salvatore S. Mangiafico’s R Companion: [https://rcompanion.org/rcompanion/d_05.html]
Martin Schweinberger’s Blog: [http://www.martinschweinberger.de/blog/one-way-anova/]
cwoods on RPubs: [https://rpubs.com/cwoods/anova]
Daniel Wollschläger’s R Examples Repository: [http://dwoll.de/rexrepos/posts/anovaCRp.html]
1.3 A note on R coding style
In this document, a great many functions from a great many packages are used. Sometimes a package uses the same name for a function that is used in another package. To reduce ambiguity, I have tried to call functions in a consistent way throughout the code.
Normally, if a package is loaded, we can write code that just calls the function. For example, here is how one can call the describe function to analyze an object/dataframe/variable:
describe(variablename)
But a describe function exists in multiple packages. Readers would not necessarily know which package the describe function used here came from unless the text or a comment in the code chunk identified it. When several loaded packages contain a function with the same name, the version from the most recently loaded package takes priority and masks the others.
So, to add clarity, I have tried to use the pkgname::functionname convention. Preceding the function name with the package name and a double colon executes the function from that package. This works even if the package has not previously been loaded with the library function, provided the package is installed. For example:
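The masking behavior can be illustrated without installing anything. The sketch below uses two attached environments as stand-ins for packages; pkg_a and pkg_b are hypothetical names, and attach is used here only to mimic what library does to the search path.

```r
# Two stand-in "packages", each defining its own describe() function.
pkg_a <- new.env()
pkg_a$describe <- function(x) "describe() from pkg_a"
pkg_b <- new.env()
pkg_b$describe <- function(x) "describe() from pkg_b"

# Attach them in order; the one attached last sits earlier on the search path.
attach(pkg_a, name = "pkg_a")
attach(pkg_b, name = "pkg_b")

# An unqualified call finds the most recently attached version.
result <- describe(1:10)
print(result)

# find() lists every place on the search path where the name is defined.
print(find("describe"))

# Clean up the search path.
detach("pkg_b", character.only = TRUE)
detach("pkg_a", character.only = TRUE)
```

With real packages the same thing happens when, for example, both psych and Hmisc are loaded, since each provides a describe function.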
psych::describe(variable_name)
The exception is a function that comes from the packages that ship with base R rather than from an add-on package. In that case I don’t use the :: approach.
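The double-colon call can be checked in a small sketch. It uses the tools package only because tools ships with R but is normally not attached, so the example runs without extra installs; in the document itself the convention is applied to add-on packages such as psych.

```r
# Is tools on the search path before the call?
attached_before <- "package:tools" %in% search()

# Call a function from tools without library(tools).
ext <- tools::file_ext("somepackage_1.0.0.tar.gz")

# The :: call loads the namespace but does not attach the package.
attached_after <- "package:tools" %in% search()

print(ext)
print(c(before = attached_before, after = attached_after))
```

Because the package is never attached, none of its function names are added to the search path, which is what keeps the pkgname::functionname style free of masking surprises.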