1 Three way factorial design. A completely randomized, “between subjects” design
This document is designed for students in the APSY510/511 introductory statistics sequence at the University at Albany. However, it should have much more general utility for other students and researchers learning to use R. A major strength of the document is presentation of extensive methods/logic of employing analytical/orthogonal contrasts both as an approach to obtaining the omnibus analysis and for direct evaluation.
Designs called “Completely Randomized Factorial Designs” presume that cases are assigned to groups independently; they are sometimes called “between subjects” designs in the psychological sciences. For the design covered in this document, three independent variables are factorially arranged so that each level of each factor is found in combination with each level of the other factors.
The challenge of analyzing a 3 way factorial design is not in the specification and analysis of the omnibus model (aov, lm, afex, granova); it is in the follow-up analyses.
With the use of the phia and emmeans packages we can obtain Simple 2-way interactions, Simple Simple Main Effects, Simple Main Effects, and orthogonally partitioned contrasts (and non-orthogonal ones) on all sources of variance, e.g., 3-, and 2-way interaction contrasts, main effect contrasts, simple main effect contrasts.
The bulk of the presentation here is within a traditional NHST framework.
This document still needs:
easier ways to compute effect sizes, especially for contrasts
bayesian and robust methods
more on post hoc tests
In order to use this tutorial effectively, the reader should be alert to terminology specific to the style of ANOVA analysis that is traditional in textbooks such as Kirk, Winer, Keppel, and Maxwell/Delaney/Kelley. For some of the terminology, block diagram depictions of a 3-way factorial are included for clarity. The relevant terminology/notation includes:
A fixed effects AxBxC design: Three independent variables, perhaps called A, B, and C. Fully “between groups” design where participants/cases are measured only once on the outcome variable under the combined conditions of factors A, B, and C. All independent variables are fixed rather than random.
Levels: The groups defined by each variable are called levels. E.g., in the primary example used in this document the “Grade” factor (factor C) has two “levels”: fifth grade and twelfth grade.
Factorial: The design is called a factorial since each level of each factor is found in all possible combinations with all levels of all other factors. So, for example, a 3x3x2 factorial design (AxBxC) has 18 total combinations, or “cells”.
Cell means vs Marginal Means, Omnibus Effects: Cells are defined by the three way combination of factors. The 3x3x2 design used in this document has 18 cells. But we can imagine “collapsing” on one or more IVs to produce marginal sets of means. For example, collapsing on factor C produces an AxB matrix of means (9 means), and that would lead to assessment of the omnibus AxB interaction term. The omnibus terms in a 3 way factorial are the three main effects, the three two way interactions, and the three way interaction, plus the pooled within group source (residual).
Simple Effects
When effects of one or more factors are examined at only specific levels of one or more other factors, then those effects are “simple”. I find that this term is not generically used outside of the psychology experimental design textbook world or in the R ecosystem. But there is no alternative terminology, so the “simple” phraseology strikes me as having a useful role, even though a phrase like “simple, simple” may seem a bit silly. These can include:
Simple interactions: It is possible to ask a two way interaction question (e.g., AxB), not in an omnibus sense, but at a level of a third IV. For example - AxB at C1
Here is a block diagram depiction of a 3-way design that is a 4x2x2, thus 24 total cells. The red rectangle outlines the part of the study where only participants treated with C1 are. We could imagine a 2-way interaction of A and B there. Thus it would be the simple 2-way interaction of A*B @ C1
Simple Simple Main effects: The term “main” implies the effect of one factor, but the word “simple” implies that the effect is examined at some level of one or more other factors. The textbook language of “simple-simple” implies that the effect of the one factor of interest is examined at combined levels of the two other factors. For example - B at A3,C2
Here is a depiction of a simple simple main effect of B at A3,C2. The effect involves the variation of the means whose cells are indicated by the black dots - it is a one-dimensional (1-way) question about factor B, located in a specific section of the full block diagram.
Simple Main effects: effects of one IV at levels of another factor, but collapsed on the third. E.g., A at B1 or C at A3. These are 1-way questions about the effect of a single IV but the location of the questions is specified by a level of a second variable. The word “collapse” indicates that the grid of means is averaged over a third factor. Depicted with the block diagram, we can see the simple main effect of factor A is examined collapsed over B (the marginals), but only in the C1-treated participants. Thus A @ C1 implies collapsing over factor B
An important part of the grasp of the logic is the understanding of these following concepts:
A three way interaction is a test of whether simple 2 way interactions differ across levels of a 3rd variable, e.g., AxB at C1 is not the same as AxB at C2
A simple 2 way interaction is a test of whether simple simple main effects differ.
An omnibus 2 way interaction tests whether simple main effects differ.
A 3 way interaction contrast tests whether simple 2way interaction contrasts differ across levels of a third factor.
2 way interaction contrasts test whether simple main effect contrasts differ.
When designs such as this are fully factorial, between-groups designs, the error term for all of the sources above would be MSwithin.
In addition, many sections of this tutorial rely on the idea of creating orthogonal sets of contrasts. This philosophy derives from the alignment of that approach with theory-driven a priori hypotheses and from the recognized value of full rank models in ANOVA work.
2 R setup
Quite a few packages are used in this document. In many code chunks, where the package origin of a function is not described or obvious, I use the pkgname::functionname syntax when a function is called. For example, psych::describeBy calls the describeBy function from the psych package. This format is not used when functions come from base system packages. One can always ask for help on a function (?functionname) to establish the package of its origin if I’ve been inconsistent in this syntactical phrasing.
library(afex)
library(bcdstats)   # available from bcdudek on Github
library(car)
library(effectsize)
library(emmeans)
library(ggplot2)
library(ggthemes)
library(ggrain)
library(gt)
library(knitr)
library(lmtest)
library(nortest)
library(phia)
library(plyr)
library(psych)
library(Rmisc)
library(sciplot)
library(sjstats)
library(tibble)
3 Data Definition
The data are from the Keppel and Wickens textbook (pg 466), a 3x3x2 design; the file is “keppel_3way_pg466.csv”. The dependent variable is the number of words recalled from a memorized list. Three independent variables were involved and factorially arranged. Participants were either fifth or twelfth graders. They were assigned to one of three Feedback conditions: control (none), praise, or negative. The words were one of three types: “LF_LE”, “HF_LE”, or “HF_HE”. The factorial arrangement of these three factors thus produced a total of eighteen cells in the design.
The data set was explicitly chosen because the effects in the model are subtle and a simple story does not emerge. But it does serve as a useful illustration of how to extract a very wide range of analyses from this 3way factorial design.
In most illustrations in this document, the emphasis is more on how to perform the myriad analyses rather than on interpretation of the outcome of this textbook data set.
Next, I reorder the levels of two of the factors so that they are not in the default alphabetical order that R uses. The new orders match how we worked with these variables in other software. These orders are useful in graph labeling and in the construction of contrast vectors.
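A minimal sketch of this data-definition step, assuming the csv file sits in the working directory and uses the variable names seen throughout this document (the read options themselves are assumptions):
# read the data and make sure the IVs are factors
bg.3way <- read.csv("keppel_3way_pg466.csv", stringsAsFactors = TRUE)
# reorder two factors away from R's default alphabetical ordering
bg.3way$feedback <- factor(bg.3way$feedback, levels = c("None", "Positive", "Negative"))
bg.3way$wordtype <- factor(bg.3way$wordtype, levels = c("LF_LE", "HF_LE", "HF_HE"))
str(bg.3way)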
Numeric and graphical summaries are obtained in this section.
4.1 Numerical summaries of the dependent variable, by group.
First, use the psych package to look at descriptives. In order to obtain nicer looking tables, some relabeling was required, along with splitting the object produced by describeBy into two separate tables - with nicer formatting by using gt.
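A sketch of this descriptives step (the relabeling and table-splitting details in the original chunk are not reproduced here; the column selection is an assumption):
# cell-by-cell descriptives as a data frame, then a formatted table with gt
descr <- psych::describeBy(bg.3way$numrecall,
                           group = list(bg.3way$feedback, bg.3way$wordtype, bg.3way$grade),
                           mat = TRUE, digits = 2)
gt::gt(descr[, c("group1", "group2", "group3", "n", "mean", "sd", "se")])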
Use the standard boxplot function from base R. Notice that the labels are not always fully visible. It is possible to use other functions to angle the labels so they can be seen - not worked out here. With the small-N data set from a textbook, some of the plots are not particularly helpful, but I show them anyway since the primary goal here is a template for use with more realistic and larger data sets. Consequently, the boxplots have odd features in some groups.
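A minimal sketch of the base R clustered boxplot described above (the label-size tweaks are assumptions):
# one box per cell of the 3x3x2 design; las=2 turns the labels sideways
boxplot(numrecall ~ feedback * wordtype * grade, data = bg.3way,
        las = 2, cex.axis = .6, ylab = "Number of words recalled")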
A different approach to clustered boxplots can be implemented with ggplot2.
ggplot requires a summary of the data that is efficiently provided by the summarySE function. The column in the table labeled with the DV name is actually a column of cell means.
And now ggplot can create the clustered boxplot graph from that summary object. Since we have three IVs, we need to use the facet_wrap function to split the plot into sections for the two levels of the third IV, grade.
Several approaches to the standard bar graph are available. First, use the bargraph.CI function from the sciplot package. The key to producing side by side panels is to use the screen function.
# bar graphs +/- std error bars
split.screen(figs=c(1,2))
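A sketch of how the two screens might then be filled, one grade per panel; the bargraph.CI argument choices here are assumptions rather than the document's exact code:
screen(1)
sciplot::bargraph.CI(x.factor = feedback, response = numrecall, group = wordtype,
                     data = subset(bg.3way, grade == "fifth"),
                     legend = TRUE, main = "fifth grade")
screen(2)
sciplot::bargraph.CI(x.factor = feedback, response = numrecall, group = wordtype,
                     data = subset(bg.3way, grade == "twelfth"),
                     legend = TRUE, main = "twelfth grade")
close.screen(all = TRUE)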
With ggplot2 we have to produce an object that summarizes the data frame first - extracting means, std errors, and CI’s. I did this with an Rmisc function.
# first, summarize the data set to produce a new data frame that ggplot can use
# just repeating what was done above
myData <- Rmisc::summarySE(data=bg.3way, measurevar="numrecall",
                           groupvars=c("feedback", "wordtype", "grade"))
# look at the new data frame that contains the summary statistics
myData
Now, use the myData object to draw the base plot (p1), and then add style control elements.
#library(ggplot2)
#library(ggthemes)
# Now create the Default bar plot
p1 <- ggplot(myData, aes(x=feedback, y=numrecall, fill=wordtype)) +
  geom_bar(stat="identity", color="black", position=position_dodge()) +
  geom_errorbar(aes(ymin=numrecall-se, ymax=numrecall+se), width=.2,
                position=position_dodge(.9)) +
  facet_wrap(~grade, ncol=1)
p2 <- p1 +
  labs(title="Words Recalled +/- SEM", x="Feedback Group", y="Mean Number Recalled") +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        axis.line.y = element_line(colour="black", linewidth=.7),
        axis.line.x = element_line(colour="black", linewidth=.7),
        plot.title = element_text(hjust=.5)) +
  scale_fill_manual(values=c('honeydew3','honeydew2', 'honeydew1'))
print(p2)
The textbook-derived data set used for the initial examples in this document has woefully inadequate sample sizes to be a good data set for illustrating several of these graphical approaches, including the boxplots and kernel density plots. The same is true of the favored “raincloud” plot, but the code is provided as a template anyway.
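A sketch of what such a raincloud template could look like with ggrain (the layout choices are assumptions, not the document's code):
# raincloud of recall by feedback, one panel per wordtype x grade combination
ggplot(bg.3way, aes(x = feedback, y = numrecall)) +
  geom_rain() +
  facet_grid(grade ~ wordtype) +
  theme_bw()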
The model syntax shown above, naming each effect in the model, is inefficient when we want a full factorial analysis. It is more efficient to use the asterisk operator, which tells aov to create all higher order terms from the main effects listed.
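A sketch of that shorthand fit, using the object name that the diagnostics later in this document rely on:
# full factorial: all main effects, two-way interactions, and the three-way interaction
fit_base.aov <- aov(numrecall ~ feedback*wordtype*grade, data = bg.3way)
summary(fit_base.aov)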
5.1.1 a different way to obtain tables of means and std errors
model.tables is an efficient function, but care must be taken in unbalanced designs to be certain of whether it produces weighted or unweighted marginal means (I believe that it gives weighted marginal means). In our example this doesn’t matter since sample sizes are balanced.
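A minimal sketch of that usage (the exact arguments in the original chunk may have differed):
model.tables(fit_base.aov, type = "means")               # cell and marginal means
model.tables(fit_base.aov, type = "effects", se = TRUE)  # effects, with standard errors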
It is worth recalling that since aov is a wrapper for lm, we can refit the model using the lm function. This produces the same ANOVA summary table as when we used aov above, so the residuals are the same. This approach will become useful when we examine analysis of contrasts in unequal-N designs.
# do the 3 way anova with the lm function
fit.1lm <- lm(numrecall ~ feedback*wordtype*grade, data=bg.3way)
#summary(fit.1lm)
anova(fit.1lm)
An alternative that provides some advantages is the suite of ANOVA tools from afex. aov_car gives type III SS and tests (by default), plus the ges effect size statistic.
Note that the “grade” factor is specified as an “observed” variable. This impacts the method of calculating the generalized effect sizes.
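A sketch of what the afex call might look like; the subject identifier column name (“snum”) is an assumption and should be replaced with whatever case ID variable is in the data file:
# Type III tests by default; "observed" affects the generalized eta squared (ges)
fit_base.afex <- afex::aov_car(numrecall ~ feedback*wordtype*grade + Error(snum),
                               data = bg.3way, observed = "grade")
fit_base.afex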
The anova_stats function is a convenient way to obtain a multiplicity of effect sizes on the omnibus effects. However, the downside of using it is that it will not work on an afex object. I show it here with the original aov fit. The issue (downside) is that the aov fit is likely based on Type I SS. Below this section, the effectsize package is used on the afex fit, which employed Type III SS. In the equal-N situation with the current data set there is no distinction, but for unbalanced designs, the effectsize/afex approach may be better.
# the gt function permits nicer formatting of the table
# anova_stats(fit_base.aov)
gt(anova_stats(fit_base.aov)[1:7,1:6], rownames_to_stub = T)
                         etasq  partial.etasq  omegasq  partial.omegasq  epsilonsq  cohens.f
feedback                 0.092          0.165    0.079            0.120      0.079     0.444
wordtype                 0.222          0.323    0.208            0.264      0.209     0.691
grade                    0.049          0.096    0.043            0.069      0.043     0.325
feedback:wordtype        0.050          0.097    0.024            0.040      0.024     0.328
feedback:grade           0.029          0.059    0.016            0.028      0.017     0.251
wordtype:grade           0.056          0.106    0.042            0.068      0.043     0.345
feedback:wordtype:grade  0.034          0.068    0.008            0.014      0.008     0.271
Using the effectsize package, we can compute eta squareds (and partial eta squareds) using the afex fit with type III SS. Another nice feature of the effectsize functions is that confidence intervals are provided for the effect size estimates.
# or other effect sizes
#gt::gt(effectsize::cohens_f(fit_base.afex, partial=TRUE, ci=.95, alternative="two"))
#gt::gt(effectsize::omega_squared(fit_base.afex, partial=TRUE, ci=.95, alternative="two"))
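A sketch of the corresponding active call for (partial) eta squared with confidence intervals, mirroring the commented alternatives above:
gt::gt(effectsize::eta_squared(fit_base.afex, partial = TRUE, ci = .95,
                               alternative = "two.sided"))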
7 Evaluate Assumptions
The normality and homoscedasticity/homogeneity assumptions are evaluated in the standard manner we covered for other linear models and ANOVA objects.
7.1 First, produce the diagnostic plots for linear models
The standard two plots for evaluating homoscedasticity and normality of the residuals are available in the same manner as for other linear model objects that we have seen previously.
plot(fit_base.aov,which=1)
plot(fit_base.aov,which=2)
The qqPlot function from the car package produces a nice normal QQ plot.
car::qqPlot(fit_base.aov$residuals)
[1] 66 55
There is clearly some non-normality in the distribution of these residuals so let’s examine a frequency histogram with a kernel density overlaid.
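A sketch of such a histogram with a kernel density overlay (binning choice is an assumption):
resids <- residuals(fit_base.aov)
hist(resids, breaks = 12, freq = FALSE,
     main = "Residuals from the omnibus model", xlab = "Residual")
lines(density(resids), lwd = 2)   # kernel density overlay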
Or use the explore function from the bcdstats package on the aov fit object from above.
bcdstats::explore(residuals(fit_base.aov),varname="Residuals from Omnibus Model")
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 90 0 1.24 0.1 0.01 1.63 -2.2 2.2 4.4 -0.12 -1.2 0.13
7.2 Inferential tests of the Normality and Homogeneity of Variance assumptions
We can see some non-normality of the residuals from the graphical displays, so inferential tests might be in order. The Anderson-Darling test rejects the null hypothesis that residuals are normally distributed.
nortest::ad.test(residuals(fit_base.aov))
Anderson-Darling normality test
data: residuals(fit_base.aov)
A = 1.5133, p-value = 0.0006257
A Shapiro-Wilk test reaches the same conclusion.
shapiro.test(residuals(fit_base.aov))
Shapiro-Wilk normality test
data: residuals(fit_base.aov)
W = 0.94663, p-value = 0.001042
The Levene test for homogeneity of variance is obtained using the function from the car package. Very little variation in cell variances/stdevs was seen and this test was not significant.
car::leveneTest(fit_base.aov)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 17 0.0897 1
72
The Breusch Pagan test of homoscedasticity leads to a similar outcome, as does the non-constant variance test.
lmtest::bptest(fit_base.aov)
studentized Breusch-Pagan test
data: fit_base.aov
BP = 4.1174, df = 17, p-value = 0.9994
car::ncvTest(fit.1lm)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 0.008655078, Df = 1, p = 0.92588
Since the normality assumption appears to have been violated, this data set should be submitted to alternative approaches. DV scale transformations, bootstrapping, other robust analyses or Bayesian inference might be considered. These alternatives are not yet incorporated into this tutorial.
8 Orthogonal Contrasts on the Omnibus Effects
We will obtain partitioning of omnibus effects into their single df contrasts here and later use the emmeans and phia packages both for contrasts and simple effects.
8.1 Work with the split argument and the summary.lm function to obtain contrasts on omnibus effects.
The Feedback and Wordtype factors both have three levels and are now each decomposed into a pair of orthogonal contrasts. For Feedback, the first contrast compares the negative condition to the average of the other two, and the second contrast compares none and positive. For realistic applications of these methods, the choice of explicit contrasts should be made on a priori grounds. For this tutorial, with a textbook example, the choice was based on treating the third feedback group as the reference condition, so a (-1, -1, 2) contrast seemed appropriate, and the second contrast is the one orthogonal to it.
For the Wordtype variable, the first contrast compares LF_LE to the other two, and the second compares HF_LE to HF_HE. I chose this set somewhat arbitrarily since we don’t have explicit information on theory that would drive contrasts for this factor in this textbook illustration.
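A sketch of assigning those orthogonal sets to the two factors (the integer scaling is an assumption; any rescaling of the same vectors gives equivalent tests):
contrasts(bg.3way$feedback) <- cbind(c(-1, -1, 2),   # negative vs. average of none/positive
                                     c(-1,  1, 0))   # none vs. positive
contrasts(bg.3way$wordtype) <- cbind(c(2, -1, -1),   # LF_LE vs. the two HF word types
                                     c(0, -1,  1))   # HF_LE vs. HF_HE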
It is worth recalling that the two-level factor, “grade” can already be conceptualized as a contrast since it has only 1 df. However, the default coding for “grade” is still in place and that is dummy coding.
contrasts(bg.3way$grade)
twelfth
fifth 0
twelfth 1
It is best to change that to “sum to zero” coding as well, so that interpretation of regression coefficients and effects is straightforward.
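A minimal sketch of that change (the sign/order of the coding is an assumption; it does not affect the tests):
# replace the default dummy coding for the two-level grade factor
contrasts(bg.3way$grade) <- contr.sum(2)
contrasts(bg.3way$grade)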
The initial application of this approach includes partitioning of both the Feedback and Wordtype factors. Grade has only two levels, so it requires no further decomposition. Notice that summary now produces single df contrasts for every source involving Feedback and Wordtype, breaking down the main effects and the interactions.
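A sketch of how this could look in code, assuming the model was refit (here called fit.1aov, the object used in the phia section below) after the contrast coding above was put in place:
# refit with the orthogonal contrast coding in place
fit.1aov <- aov(numrecall ~ feedback*wordtype*grade, data = bg.3way)
# partition both three-level factors into their single df contrasts
summary(fit.1aov,
        split = list(feedback = list(fbc1 = 1, fbc2 = 2),
                     wordtype = list(wt1 = 1, wt2 = 2)))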
If we only pass one variable at a time to the split argument, the resulting ANOVA table yields interaction comparisons instead of interaction contrasts.
Please recall from work in the 2-way design and the oneway tutorial that use of this split function, while fairly simple, produces Type I SS decompositions when sample sizes are unequal. This may not be desirable, so other approaches are necessary (such as summary.lm). Others are outlined below using the emmeans or phia packages.
When we apply the summary.lm function to an aov object, the resultant table is a table of the regression coefficients for the model. The t-tests here are tantamount to tests of Type III SS and may be more desirable when sample sizes are unequal. In our example, the squares of these t values equal the F tests from the summary tables above where split was used. But that is ONLY because sample sizes were equal in this example.
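A minimal sketch of that call, using the refit object assumed above:
# coefficient-level t-tests of the same single df contrasts (Type III-like)
summary.lm(fit.1aov)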
9 Follow up analyses to evaluate contrasts and simple effects
Analysts have a pair of major choices to make about following up the omnibus effects outlined above. One choice is whether to follow up the omnibus analysis with pairwise comparisons using a function such as pairwise.t.test (or pairs in emmeans) along with alpha rate adjustment or post hoc multiple comparison tests, versus a follow-up method that employs contrasts (perhaps orthogonal). A second choice is whether to obtain simple effects and contrast effects with glht, emmeans, or phia. This document demonstrates the contrast approach using the latter two packages (glht may be added later). Some multiple comparison approaches are integrated into the demonstration of the use of emmeans.
Arbitrarily, the next section demonstrates use of phia first and then emmeans is used in the following section. “phia” stands for “Post Hoc Interaction Analysis”, a label that I find unfortunate. I have argued that a strength of employing contrast analysis is that it should emanate from a priori hypotheses which then inform the values of the contrast coefficients. Nonetheless, phia, and its testInteractions function is a very powerful tool for evaluating simple effects and their contrasts, main effect contrasts, and interaction contrasts. That is the primary usage here.
For uses of both phia and emmeans (in the later section) it is required to have already performed the omnibus full factorial ANOVA. For these omnibus analyses, it is best to have had in place either “effect/deviation” coding or orthogonal contrast coding. But the use of these two packages can be seen as adding contrast and simple effect analyses on to the omnibus analysis. Full orthogonal sets are not even necessary since the contrasts obtained are not being used to create the full omnibus model which might have already been generated using deviation coding schemes.
Two different motivations exist for using either phia or emmeans in order to evaluate contrasts and simple effects. The first is that they produce inferential tests based on Type III SS decompositions. The second is that they are the only way I know to obtain simple effects, aside from manual computations.
In the particular data set we are working with here, the only higher order effect that was significant was the Wordtype by Grade interaction. This means that we would typically only follow up that effect by looking at simple main effects of one of those factors at levels of the other, as well as their interaction contrasts. We would also follow up with contrasts on the Feedback factor since its main effect was significant and no interactions with Feedback were significant. Of course this logic is a prime example of traditional thinking about NHST methods and use of the word significance which are both under scrutiny now.
However, the approach taken here is designed to demonstrate how the whole suite of effects can be obtained.
10 Use of the phia package for simple effects and contrasts
Initially this document uses the phia package for these evaluations and makes extensive use of the testInteractions function from that package. Methods such as “holm” and “hochberg” are available for p value adjustments with contrasts. emmeans is demonstrated in the next section.
It is worth repeating display of the full set of means in a bar graph so that some of these effects can be visualized.
p2
10.1 Simple two way interactions
Simple two way interactions can also be obtained (three sets). We don’t expect to be interested in any of these since the three way interaction was not significant, but we might have had an a-priori hypothesis about one or more of them, so illustration is included for that possibility and for completeness of this template document.
The information in the tables produced by the testInteractions function is not organized/labeled in a user-friendly manner. Each row of the tables below represents a test of the simple two-way interaction at the indicated level of the “fixed” factor (e.g., feedback by wordtype at each level of grade in the first illustration).
The general strategy in the testInteractions function is to use an argument called “fixed” to specify the variable at which effects of other factors are to be examined in simple effects. The “across” argument specifies a variable for which the effect is being requested.
Note that the combined effects of two variables are requested by putting both of them in the “across” argument - the variable @ which these effects are examined is designated by the “fixed” argument.
Reading the table requires some understanding of the idiosyncrasies of testInteractions formatting. There is one large table (three rows, including a row for residuals) and it may have wrapped into three sections in some Quarto renderings. The columns labeled (for example) feedback1:wordtype1 are indeed interactions, as the colon implies. The numeric appendage (e.g., feedback1) indicates a contrast. The numeric values (e.g., 6, 3.2, etc.) are “estimates” of the underlying interaction contrasts. But since we are not examining contrasts here, we can ignore these columns. They are used to compile the F test of each of the simple two way interactions.
The F tests are tests of the two different simple two way interactions at the two levels of grade. At least the question of level of grade is clear in the table - rows. Note that the numerators of the two F tests both have 4 df, as they should, since the effects are interactions of wordtype and feedback, both 3 level factors. Also note that the df and SS for the Residual term match those for the omnibus analysis - each of these simple two-way interactions is tested against the omnibus MSwg term, as expected for an appropriate analysis, rather than fully separating the analysis into two parts.
To be clear, the F value of 3.0882 tests the simple 2 way interaction of feedback and wordtype in fifth graders, and the F value of .1765 tests the simple two way interaction of feedback and wordtype in twelfth graders.
# feedback*wordtype at levels of grade
testInteractions(fit.1aov, fixed=c("grade"), across=c("feedback", "wordtype"), adjust="none")
Looking back at the graph, one can see that in fifth graders, the two lower means in the Positive and Negative HF_HE groups are what is driving the simple two-way there. That pattern also exists in the twelfth graders, but it is much smaller in magnitude.
The analyst would likely never do all three possible sets of simple two-way interactions. One should suffice, and it would probably be the one where the two primary IVs are interacted. In our example, the first one fits the way the ggplot bar graph is drawn, so that is probably what the choice would be. The reader can see the patterns that generate the two simple two-way interactions in the upper and lower facets of that ggplot bar graph. Note that the test of the simple two-way of wordtype and feedback at fifth grade is significant by traditional NHST methodologies (p < .05), but the other simple two-way is not. This distinction cannot be taken as evidence that the three-way interaction is present - in fact that was tested and found N.S. This apparently illogical outcome is not uncommon with NHST in factorial designs.
The other two possible simple two way interactions are shown here for the sake of illustration of testInteractions usage.
A second possible set of simple two way interactions is examined next, wordtype by grade at levels of feedback. The three F tests are the tests of the three different simple two way interactions of wordtype and grade at the three levels of feedback (rows). Also note that their df of 2 is what is expected.
# wordtype*grade at levels of feedback
testInteractions(fit.1aov, fixed=c("feedback"), across=c("wordtype","grade"), adjust="none")
10.2 Simple simple main effects
These are effects of one factor at combined levels of two other factors.
First examine the so-called simple, simple main effects (three different sets). These effects would be assessed after finding significant simple two way interactions. Notice that the testInteractions function accepts an argument for p value adjustment (e.g., bonferroni/holm/fdr), but that I’ve set it to “none” in these illustrations.
The general strategy in the testInteractions function is to use an argument called “fixed” to specify the variable at which effects of other factors are to be examined in simple effects. The “across” argument specifies a variable for which the effect is being requested.
The first set here examines effects of wordtype at combinations of the other two factor levels and is one of the sets of simple simple main effects most visible from the ggplot bar graph above.
Here, and for later tables, “values” of effects are differentiated by the contrasts associated with the factors although those are aggregated to obtain the simple effects - this is confirmed since many of them have multiple df. We are ignoring those “effect” values here since we have not addressed the contrasts in place at the time the aov fit was done (default is dummy coding but we changed it to orthogonal contrast coding at the beginning of this section). The exact choice of contrast set doesn’t influence the 2 df simple simple main effects here.
Notice that each of these simple simple main effects in this first set have 2 df (three means compared, so 2 df). Later we will break the effects down into contrasts. Compare the significance test results to the ggplot bar graph. The only significant simple simple main effects in this table are the wordtype comparisons in the upper panel (fifth grade) for the negative and positive feedback conditions. This pattern of significance fits what the eye sees in the graph.
# 1, effects of wordtype @ combinations of feedback and grade
testInteractions(fit.1aov, fixed=c("feedback","grade"), across="wordtype", adjust="none")
We could also look at other sets of simple simple main effects but would probably only look at one (such as the immediately preceding one), followed up by contrasts (later in this doc).
# 2, effects of grade @ combinations of feedback and wordtype
testInteractions(fit.1aov, fixed=c("feedback","wordtype"), across="grade", adjust="none")
10.3 Simple main effects
In studies where the 3-way interaction is not significant, the traditional approach would advise the analyst to progress with examination of the omnibus 2-way interactions. If any are significant, then they would partly be characterized by examining simple main effects (there are six possible sets here, two each for each of the three possible 2-way interactions).
It is not possible to interpret such simple main effects without reference to the tables of collapsed/marginal means, such as those produced above following the initial aov fit, OR by examining a redrawn graph that depicts the 2-way layouts, collapsed on one of the three factors.
Such a redrawn graph is presented here to depict the two-way layout of wordtype and feedback (a 3x3 structure), since the first of the six sets of SME shown here examines the effect of wordtype at levels of feedback. If this were an actual analysis of a real data set (rather than a textbook/artificial one), then we would only look to follow up the wordtype by grade two-way interaction, since it was the only one of the three that reached traditional significance levels. But this first graphical illustration permits seeing the more complex 3x3 arrangement, even though the wordtype by feedback interaction was NS. The reader should recognize that the effects of wordtype (seen in the plot below) appear to depend on the level of feedback.
From this depiction, we might expect to find an effect of wordtype only in the negative and positive feedback conditions. In reality, the 2-way interaction tests whether the wordtype effect differs across the three levels of feedback. The general impression is that the HF_HE group is lower only in the positive and negative feedback levels. Nonetheless, while this impression is clear, the difference was not significant when the wordtype by feedback interaction was tested. In this instance the F test result does not match up with the eyeball test - but that is why a test is done. We won’t dwell on this seeming paradox since the point of this document is more oriented to providing a template than to fully evaluate a textbook data set (remember, low n means low power).
# first, summarize the data set to produce a new data frame that ggplot can use
myData2.wg <- Rmisc::summarySE(data=bg.3way, measurevar="numrecall",
                               groupvars=c("feedback", "wordtype"))
# look at the new data frame that contains the summary statistics
myData2.wg
#library(ggplot2)
#library(ggthemes)
# Now create the Default bar plot
p3 <- ggplot(myData2.wg, aes(x=feedback, y=numrecall, fill=wordtype)) +
  geom_bar(stat="identity", color="black", position=position_dodge()) +
  geom_errorbar(aes(ymin=numrecall-se, ymax=numrecall+se), width=.2,
                position=position_dodge(.9))
p4 <- p3 +
  labs(title="Words Recalled +/- SEM", x="Feedback", y="Mean Number Recalled") +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        axis.line.y = element_line(colour="black", linewidth=.7),
        axis.line.x = element_line(colour="black", linewidth=.7),
        plot.title = element_text(hjust=.5)) +
  scale_fill_manual(values=c('honeydew3','honeydew2', 'honeydew1'))
p4
The following sequence of tests of six sets of simple main effects is presented only to demonstrate how to generate each of them. We would probably only examine number 3 or 4.
In the first analysis we find that in both the negative and positive feedback conditions the effect of wordtype is significant, as the graph seems to suggest. But note that most of the difference is due to the position of the HF_HE group. We would pursue that pattern with examination of simple main effect contrasts.
# 1-2 effects of wordtype at levels of feedback, and then feedback at levels of wordtype
# note these are collapsed on grade
testInteractions(fit.1aov, fixed="feedback", across="wordtype", adjust="none")
Next we will begin to examine the origins of the omnibus wordtype by grade interaction, which was significant. In order to interpret the sets of simple effects possible in this 3x2 grid of means, we should visualize the outcome.
# first, summarize the data set to produce a new data frame that ggplot can use
myData3.wg <- Rmisc::summarySE(data=bg.3way, measurevar="numrecall",
                               groupvars=c("wordtype", "grade"))
# look at the new data frame that contains the summary statistics
myData3.wg
#library(ggplot2)
#library(ggthemes)
# Now create the Default bar plot
p5 <- ggplot(myData3.wg, aes(x=wordtype, y=numrecall, fill=grade)) +
  geom_bar(stat="identity", color="black", position=position_dodge()) +
  geom_errorbar(aes(ymin=numrecall-se, ymax=numrecall+se), width=.2,
                position=position_dodge(.9))
p6 <- p5 +
  labs(title="Words Recalled +/- SEM", x="WordType", y="Mean Number Recalled") +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank(),
        axis.line.y = element_line(colour="black", linewidth=.7),
        axis.line.x = element_line(colour="black", linewidth=.7),
        plot.title = element_text(hjust=.5)) +
  scale_fill_manual(values=c('honeydew3','honeydew2', 'honeydew1'))
p6
In the first of the testInteractions analyses in this next code chunk (#3), the set of simple main effects of wordtype at levels of grade is produced. From the graph we may expect wordtype to be significant in the 5th graders, but perhaps not in the 12th graders. This is exactly what happened, as seen in the first table. In the second testInteractions output here (#4), the effect of grade is significant only in the HF_HE condition, as the visual inspection would have suggested.
# 3-4 effects of wordtype at levels of grade, and then grade at levels of wordtype
# note these are collapsed on feedback, respectively
testInteractions(fit.1aov, fixed="grade", across="wordtype", adjust="none")
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
LF_LE -0.2 0.502 1 0.3 0.1588 0.6914214
HF_LE -0.2 0.502 1 0.3 0.1588 0.6914214
HF_HE -2.0 0.502 1 30.0 15.8824 0.0001597 ***
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The third omnibus 2-way can also be followed up with either of its sets of simple main effects. This is presented just for the sake of completeness. A redrawn plot of feedback by grade is not provided, and the two-way interaction of feedback by grade that would have prompted this analysis was not significant in the omnibus analysis.
# 5-6 effects of feedback at levels of grade, and then grade at levels of feedback
# note these are collapsed on wordtype
testInteractions(fit.1aov, fixed="grade", across="feedback", adjust="none")
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
None 0.06667 0.502 1 0.0333 0.0176 0.894689
Positive -1.13333 0.502 1 9.6333 5.1000 0.026957 *
Negative -1.33333 0.502 1 13.3333 7.0588 0.009708 **
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
10.4 Contrast Analyses in factorial designs using phia
Initially, we will re-create the custom contrasts for the Wordtype and Feedback factors since they each have more than two levels. Note that these are the same contrasts employed above in the omnibus analyses, but phia requires them to be created in a slightly different manner for usage in the testInteractions function. The orthogonal sets employed here were arbitrarily chosen. I am not certain why phia needs these contrasts established as lists.
# first for the feedback factor:
fbc1 <- list(feedback=c(-.5, -.5, 1))  # same as defined above, except using fractions
fbc2 <- list(feedback=c(-1, 1, 0))     # same as defined above
fbc1
$feedback
[1] -0.5 -0.5 1.0
fbc2
$feedback
[1] -1 1 0
# now for the wordtype factor
wtypec1 <- list(wordtype=c(1, -.5, -.5))  # same as defined above, except using fractions
wtypec2 <- list(wordtype=c(0, -1, 1))     # same as defined above
wtypec1
$wordtype
[1] 1.0 -0.5 -0.5
wtypec2
$wordtype
[1] 0 -1 1
10.4.1 Simple Effect Contrasts and Simple Interaction Contrasts
We already obtained 2 and 3-way interaction contrasts above with the summary/split approach and by using summary.lm to obtain Type III SS equivalents. Here, we obtain simple interaction contrasts and simple effect contrasts.
In this code, notice that testInteractions takes a “custom” argument and that is where we can specify the contrast to be used.
10.4.1.1 Simple 2 way interaction contrasts
First we will partition the simple two-way interactions into contrasts. This is shown here largely as a template, since the simple 2-way interactions and their contrasts would typically not be evaluated in this data set given that the 3-way interaction was not significant by NHST standards. An exception would be if some of the contrasts were specified a priori.
Note that the labeling in the table is odd. The two testInteractions calls produce the two tables. The first examines the effect involving the first contrast on feedback (called fbc1 in the custom argument), and the second examines the second contrast on feedback (called fbc2). Examining the first of the two tables, each row represents evaluation of a simple two way interaction contrast of Feedback (first contrast) and Grade at each of the three levels of Wordtype.
The use of the colon symbol does NOT imply an interaction between those two effects as is more typical in R ANOVA/lm notation. In addition, the labeling of the Feedback effect as “feedback1” doesn’t refer to the exact contrast that we called fbc1. It is just a generic label for the fact that a contrast on feedback was specified. So in the first table, the first F value of .0794 represents a test of fbc1 by grade AT LF_LE. The second, is the same effect, but AT HF_LE and so forth. In the second table, the first row evaluates the fbc2 by grade interaction AT LF_LE, with the F value of .0265 EVEN THOUGH the table still uses the “feedback1” label. The confusion can arise because the tables do not employ the fbc1 and fbc2 labels that we created. In each table, there are three tests because there are three levels of Wordtype and the effect is either fbc1 by grade (table 1) or fbc2 by grade (table 2). This pattern plays out in subsequent implementations as well. So, while testInteractions is very powerful, the organization of the output and its labels leaves something to be desired.
# first, an example of simple 2-way interaction contrasts of feedback with grade at levels of wordtype
testInteractions(fit.1aov, fixed="wordtype", custom=fbc1, adjustment="none", across="grade")
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
LF_LE : feedback1 -0.3 1.065 1 0.15 0.0794 0.7789
HF_LE : feedback1 -0.3 1.065 1 0.15 0.0794 0.7789
HF_HE : feedback1 -1.8 1.065 1 5.40 2.8588 0.0952 .
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
LF_LE : feedback1 -0.2 1.229 1 0.05 0.0265 0.87121
HF_LE : feedback1 -0.2 1.229 1 0.05 0.0265 0.87121
HF_HE : feedback1 -3.2 1.229 1 12.80 6.7765 0.01121 *
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
When both factors in a simple 2-way interaction have contrasts, the code specification is slightly more complex. With the interaction of wordtype and feedback at levels of grade, grade is specified as “fixed”. The “across” argument is not used. Rather, both contrasts are specified simultaneously in the “custom” argument, and testInteractions understands that the interaction of those two contrasts is to be examined.
Note that the output uses the colon symbol again, but it doesn’t mean interaction. The interaction here is between fbc1 and wtypec1 (thus an interaction contrast) at fifth grade.
# simple 2-way interaction contrast of feedback and wordtype at levels of grade
# one requested at a time here; both of the first contrasts (fbc1 and wtypec1) are requested.
testInteractions(fit.1aov, fixed="grade", custom=c(fbc1,wtypec1), adjustment="none")
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
fifth : feedback1 : wordtype1 1.50 0.922 1 5.00 2.6471 0.1081
twelfth : feedback1 : wordtype1 0.75 0.922 1 1.25 0.6618 0.4186
Residuals 72 136.000
Code for the other six contrasts available in this set is shown here, but results suppressed to save space. If one were to run these additional chunks, the same “feedback1” and “wordtype1” labels would appear, but meaning either fbc1/fbc2 or wtypec1/wtypec2, respectively.
Sometimes we prefer to examine simple interaction comparisons rather than simple interaction contrasts. That kind of effect is exemplified with this type of code, where a contrast is specified for one factor (wordtype here) and the other factor is not broken into contrasts (feedback here, specified in the “across” argument). These will be 2 df interaction comparisons: either wordtype contrast 1 by feedback at levels of grade, or wordtype contrast 2 by feedback at levels of grade.
Once again the fifth: and twelfth: notation just specify the levels of grade AT which the simple interaction comparisons are located.
# now simple 2-way interaction comparisons of wordtype contrasts and feedback at levels of grade
# note that these effects have 2 df
testInteractions(fit.1aov, fixed="grade", custom=wtypec1, adjustment="none", across="feedback")
10.4.1.2 Simple and simple-simple main effect contrasts
Contrasts breaking down the simple-simple or simple main effects are also available.
Initially, we can examine simple simple main effect contrasts of feedback at the combined levels of wordtype and grade.
It is helpful to reexamine the plot of the data of all cell means to visualize these simple simple main effects.
As we saw above, care is required in reading the table, which does not label the different contrasts. The first table is for the fbc1 contrast and the second table for the fbc2 contrast, each at the six combined levels of Grade and Wordtype in this first illustration.
Once again, the “feedback1” label in the tables is misleading and not necessary. We have to discern which contrast on feedback is being tested by knowing which code line produced which table.
# Now, how about simple SME contrasts for the feedback factor at
# combined levels of grade and wordtype
testInteractions(fit.1aov, fixed=c("wordtype", "grade"), custom=fbc1, adjustment="none")
Next, we evaluate Simple Main effect contrasts of Feedback at levels of Wordtype, collapsed on Grade.
It would be helpful to repeat the plot of the nine means, collapsed on grade.
These next three code chunks produce tests of the two feedback contrasts, with and without a “holm” adjustment.
The first code chunk is thus the test of feedback contrast 1 at levels of wordtype, with no p value adjustment. Once again, the “feedback1” label can be ignored. As an interpretation example, consider the third F test, with a value of 7.94. This effect is feedback contrast 1 in the HF_HE group (collapsed on grade).
# or SME contrasts at levels of wordtype (collapsed on grade)
testInteractions(fit.1aov, fixed=c("wordtype"), custom=fbc1, adjustment="none")
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
LF_LE : feedback1 0.05 0.532 1 0.0167 0.0088 0.925423
HF_LE : feedback1 -0.65 0.532 1 2.8167 1.4912 0.226018
HF_HE : feedback1 -1.50 0.532 1 15.0000 7.9412 0.006234 **
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The next chunk does the same analysis with a “holm” adjustment.
# note that adjustment for Type I error inflation can be done. The "adjustment"
# argument uses the methods available in p.adjust
testInteractions(fit.1aov, fixed=c("wordtype"), custom=fbc1, adjustment="holm")
F Test:
P-value adjustment method: holm
Value SE Df Sum of Sq F Pr(>F)
LF_LE : feedback1 0.05 0.532 1 0.0167 0.0088 0.9254
HF_LE : feedback1 -0.65 0.532 1 2.8167 1.4912 0.4520
HF_HE : feedback1 -1.50 0.532 1 15.0000 7.9412 0.0187 *
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And the same approach is used for the second feedback contrast.
F Test:
P-value adjustment method: holm
Value SE Df Sum of Sq F Pr(>F)
LF_LE : feedback1 -0.5 0.615 1 1.25 0.6618 0.517053
HF_LE : feedback1 -0.7 0.615 1 2.45 1.2971 0.517053
HF_HE : feedback1 -2.0 0.615 1 20.00 10.5882 0.005204 **
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There are many other simple main effect contrasts that could be examined (following up on the six sets of possible simple main effects described above). The most important ones to examine in the context of this data set would be contrasts on wordtype at levels of grade, since the omnibus wordtype by grade interaction was significant. In order to shorten this document, only one of those SME contrasts is shown here: wordtype contrast 2 at the two levels of grade.
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
fifth : feedback1 -1.66667 0.502 1 20.8333 11.0294 0.001411 **
twelfth : feedback1 -0.46667 0.502 1 1.6333 0.8647 0.355532
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
10.4.2 3way interaction contrasts with phia
Next, we will examine 3-way interaction contrasts. This shows that phia can recreate what we did above with the summary/split approach and with the summary.lm function. Those two match because of the equal sample sizes in this example data set. If the data set were unbalanced, the tests here from testInteractions would match the squared t-values from the tests obtained in the summary.lm output.
Here, two of the three factors have contrasts available, and those specifications are put in the “custom” argument. The third factor has only two levels (it is already a contrast), and can just be specified in the “across” argument.
The first of these is thus fbc1 x wtypec1 x grade, etc….
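A sketch of the phia call that would produce the table below, following the argument pattern used in the earlier chunks (the exact original chunk is an assumption):
# fbc1 x wtypec1 x grade interaction contrast, tested against the omnibus error term
testInteractions(fit.1aov, custom=c(fbc1, wtypec1), across="grade", adjustment="none")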
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
feedback1 : wordtype1 -3 1.738 1 5.625 2.9779 0.0887 .
Residuals 72 136.000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
10.4.3 Two-way interaction contrasts
By leaving the “across=grade” argument out of the above code chunk, we can obtain the two-way interaction contrasts of Feedback and Wordtype (collapsed on Grade).
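A sketch of that call (an assumption, following the same pattern as above with the across argument omitted):
# fbc1 x wtypec1 two-way interaction contrast, collapsed on grade
testInteractions(fit.1aov, custom=c(fbc1, wtypec1), adjustment="none")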
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
feedback1 : wordtype1 -1.3 0.869 1 4.225 2.2368 0.1391
Residuals 72 136.000
Finally, in this data set, the most interesting interaction effect to follow up would have been the Wordtype by Grade interaction, which was the only significant one. Simple main effects of Wordtype at levels of Grade (and the contrasts of those SME) were obtained above. Here, interaction contrasts are obtained to decompose this two way interaction.
Notice that the sum of the SS for the two effects equals the SS for the Wordtype by Grade interaction seen in the initial ANOVA summary table, as it should for an orthogonal partitioning.
In this pair of analyses, only the second of the two contrasts is significant, and this should not be surprising. Most of what was varying in the wordtype by grade array of means was the position of the HF_HE group. So, it makes sense that the second contrast, which compares HF_LE to HF_HE, interacts with grade, since that difference is larger in the 5th graders.
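A sketch of the pair of calls that would produce these wordtype by grade interaction contrasts (the exact original chunk is an assumption):
# wordtype contrast 1 by grade, then wordtype contrast 2 by grade
testInteractions(fit.1aov, custom=wtypec1, across="grade", adjustment="none")
testInteractions(fit.1aov, custom=wtypec2, across="grade", adjustment="none")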
F Test:
P-value adjustment method: none
Value SE Df Sum of Sq F Pr(>F)
wordtype1 -1.8 0.71 1 12.15 6.4324 0.01338 *
Residuals 72 136.00
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
10.4.4 A note on orthogonal contrasts
The reader should understand that use of the testInteractions function does not require that contrasts submitted to it be orthogonal. Each contrast was submitted one at a time and not used in calculating any other full model. The orthogonality approach taken here was simply recognition that there are some advantages to thinking about contrast sets as orthogonal - from the a priori point of view.
10.5 Conclusion about use of phia and testInteractions
The testInteractions function is very powerful, permitting all simple and contrast effects to be specified (we did not examine main effect contrasts here, but they can be done as was outlined in the 2 factor ANOVA tutorial document). I am very positive about the relative ease of use of the arguments in testInteractions. However, the layout and labeling in the tables produced by the function are difficult. It was only with careful comparison to other known analyses that I was able to be certain, at times, what testInteractions was testing. With more regular use, the style of the table layout becomes easier to understand, but for the novice user the labeling (e.g., “feedback1”) can be misleading. If it were not for this output formatting difficulty, I would strongly recommend use of the phia functions for this type of analysis. But emmeans may produce output that is more easily interpreted for most users.
Finally, I am unaware of a direct way of producing effect sizes on effects or contrasts produced by testInteractions. However, since testInteractions does provide SS for effects, manual computation of eta or partial eta squared can begin with those SS values.
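As a sketch of that manual computation, using values from one of the tables above (the fbc1 contrast at HF_HE had SS = 15.0, and the residual SS was 136.0), a partial eta squared could be found as:
# partial eta squared = SS_effect / (SS_effect + SS_error)
15.0 / (15.0 + 136.0)   # approximately .099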
11 Use of emmeans for contrasts and simple effects
The tools of the emmeans package can produce all of the same analyses as those just accomplished above with phia. Some types of effects are slightly easier to obtain from emmeans, but others are more difficult (e.g., interaction contrasts). Either package provides strong tools for these follow up analyses.
This emmeans section does not perform the detailed examination of nearly all possible contrasts as was done in the phia section. Instead a few examples are used in order to provide templates and the choice of which effects to examine is partly driven by patterns in the data set. Once again, the goal here is the provision of templates for use in other analyses, not the “best” possible evaluation of this textbook data set.
11.1 Simple two-way interactions.
The logic and introductory wording here matches that in the phia section above:
Simple two way interactions are available (three sets). We don’t expect to be interested in any of these since the three way interaction was not significant, but we might have had an a-priori hypothesis about one or more of them, so illustration is included for completeness.
wordtype by grade @ levels of feedback
wordtype by feedback @ levels of grade
feedback by grade @ levels of wordtype
Even if there were a significant 3-way interaction, we would probably only have been interested in one of these three sets of simple two-ways. It would probably have been the second one, since wordtype and feedback are the two primary manipulated IVs.
Either the aov or the afex (aov_car) fit objects will work with emmeans. We will use the afex object here. First, extract the grid of means and other descriptive info for all of the cells.
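A sketch of how that grid could be built, assuming the afex fit object defined earlier and the object name used in the calls below:
# estimated marginal means for all 18 cells of the design
wgf.emm <- emmeans(fit_base.afex, ~ feedback * wordtype * grade)
wgf.emm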
In order to test the simple two way interactions, a direct and isolated approach is not possible. Instead, we request what amounts to examination of three kinds of effects at the levels of feedback. This code provides the simple main effects of both wordtype and grade as well as their interaction (and here the interaction term uses the colon symbol), all AT levels of feedback, using the omnibus error term. Thus the simple two-way interactions that we set out to obtain are the third effect tested in each of these three sets of tables/analyses. Note that the second and third of them have identical F and p values - this is correct output, but suggests that the data set may have been fabricated for use in the Keppel textbook.
# first, examine wordtype*grade at levels of feedback
joint_tests(wgf.emm, by="feedback")
Now the other two sets of simple two-ways are obtained.
# second, wordtype*feedback at levels of grade
joint_tests(wgf.emm, by="grade")
# third, grade*feedback at levels of wordtype
joint_tests(wgf.emm, by="wordtype")
One thing that I do not like about using emmeans functions for these types of follow-up analyses is that the tabled results do not provide SS and MS for the effects. It is useful to double-check software for accuracy by verifying that things work out as they should. For example, the SS for the set of grade*feedback simple two-ways at levels of wordtype should sum to the pooled SS of the omnibus two-way of grade by feedback and the three-way of grade by feedback by wordtype. We could work backwards from the F values (using the MSerror from above) to find these values, but that added work is not completed here. Initial code showing how to manually find a SS from an F value is here:
# e.g. for grade by feedback at HF_HE
# the F value was 4.818
# df for this simple two way is 2
# MSerror of 1.889 from the omnibus analyses above
df_effect <- 2
MSeffect <- 1.889*4.818
SSeffect <- MSeffect*df_effect
SSeffect
[1] 18.2024
Doing this with the other two would provide the value that when summed should equal the pooled SS described above.
11.2 Simple Simple main effects
Since the 3way interaction was not significant, the analyst would probably not go on to examine simple simple main effects. Nonetheless one set is outlined here for the sake of demonstrating the emmeans capability.
Working from the full grid of means from all cells (the wgf.emm grid), we can specify the oneway type of question that simple simple main effects ask. The variable to be examined is wordtype - at the different levels of feedback and grade combined.
A repetition of the cell means graph here can help visualize where the question is asked. With this plot it is reinforced that the 2 df effect of wordtype can be examined in six places (the combinations of feedback and grade).
The briefest way to request the simple simple main effects is to use the joint_tests function on the 3-way grid of means. Note that the focal variable is not directly named in an argument. Rather, it is implied because the wgf.emm object contains means from combinations of all three IVs. When the feedback and grade variables are set as the variables AT which the questions are to be asked (via “by”), that leaves only wordtype as the IV of the effect.
Not surprisingly, the only places that wordtype is a significant effect is in the negative and positive fifth grade conditions.
joint_tests(wgf.emm, by=c("feedback","grade"))
feedback = None, grade = fifth:
model term df1 df2 F.ratio p.value
wordtype 2 72 0.424 0.6564
feedback = Positive, grade = fifth:
model term df1 df2 F.ratio p.value
wordtype 2 72 10.835 0.0001
feedback = Negative, grade = fifth:
model term df1 df2 F.ratio p.value
wordtype 2 72 14.224 <.0001
feedback = None, grade = twelfth:
model term df1 df2 F.ratio p.value
wordtype 2 72 0.671 0.5146
feedback = Positive, grade = twelfth:
model term df1 df2 F.ratio p.value
wordtype 2 72 0.459 0.6339
feedback = Negative, grade = twelfth:
model term df1 df2 F.ratio p.value
wordtype 2 72 1.376 0.2590
11.3 Simple Main Effects.
In this particular data set the 3way interaction was not significant and the only two way interaction that was significant was the wordtype by grade effect. The graph of these collapsed/marginal means is redrawn here to provide a reference point.
Now extract descriptive stats into an emmeans grid.
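A sketch of such a call, assuming the fit_base.afex object used elsewhere in this document; the grid is named w.g.emm to match the test code below:
# a sketch, assuming the afex fit object used above; the grid name w.g.emm
# matches the object used in the tests that follow
w.g.emm <- emmeans(fit_base.afex, "wordtype", by = "grade")
w.g.emm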
Now perform the tests of wordtype at levels of grade:
test(contrast(w.g.emm), joint=TRUE, adjust="none")
grade df1 df2 F.ratio p.value note
fifth 2 72 19.306 <.0001 d
twelfth 2 72 2.153 0.1236 d
d: df1 reduced due to linear dependence
It is not clear what the warning about df and linear dependence is referring to here. Each of these simple main effects has the 2 numerator df expected for a comparison involving three means. I suspect that I may have used the “joint” argument in a way that is unexpected by the contrast function, even though it has produced correct results.
11.4 Contrasts with emmeans
There are five places where contrasts will be demonstrated here. These are on the 3-way and one 2-way interaction, on the set of simple simple main effects examined above, the simple main effects, and the main effects. We will not do all possible ones but be guided (partly) by the outcome of the omnibus tests.
It is easier to begin with the lower order effects.
11.4.1 Main Effect contrasts and pairwise main effect comparisons
In this data set, feedback did not interact with other factors, so we could examine contrasts on that main effect.
f.emm <- emmeans(fit_base.afex, "feedback")
NOTE: Results may be misleading due to involvement in interactions
f.emm
feedback emmean SE df lower.CL upper.CL
None 8.37 0.251 72 7.87 8.87
Positive 7.30 0.251 72 6.80 7.80
Negative 7.13 0.251 72 6.63 7.63
Results are averaged over the levels of: wordtype, grade
Confidence level used: 0.95
The orthogonal contrast set requested here for this factor is different from the one employed earlier in this document. Here, the Helmert set is specified since the levels are a control (first level) and two treatment conditions. The change was also made to remind the reader that a priori choices should drive contrast selection. In examining the means from the grid just produced, the expectation would be that the first of these two contrasts accounts for the bulk of the feedback main effect variation, and it does. This change is also why the tests of these two contrasts do not match what was produced with the summary.lm analysis on the omnibus model above.
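A sketch of how the contrast object (named lincombs_f to match the later test call) might be built; the coefficient values are an assumption inferred from the Helmert description and the estimates shown below:
# a sketch: Helmert-style coefficients for the three feedback levels
# (None vs the average of Positive and Negative, then Positive vs Negative);
# values are inferred from the description above, not copied from the source code
lincombs_f <- contrast(f.emm, list(ac1 = c(2, -1, -1),
                                   ac2 = c(0, 1, -1)))
test(lincombs_f)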
contrast estimate SE df t.ratio p.value
ac1 2.300 0.615 72 3.742 0.0004
ac2 0.167 0.355 72 0.470 0.6400
Results are averaged over the levels of: wordtype, grade
It is worth noting at this point that there is no direct way of obtaining effect sizes with these types of emmeans analyses. In addition, since the output simply provides t-tests, there is no table of SS from which eta squared or partial eta squared could be calculated manually. This is a downside of using emmeans (or phia). I have seen ways of obtaining effect sizes for pairwise comparisons using emmeans, but not for contrasts - still searching. The testInteractions function from phia does provide SS for contrast effects, as seen above, so manual computation of effect sizes can be accomplished with those quantities.
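As a sketch of that kind of manual computation, a partial eta squared can be formed from an effect SS and the pooled error SS; the quantities below are taken from the analyses above:
# a sketch: partial eta squared from an effect SS and the pooled error SS
# (MSerror of 1.889 and 72 error df come from the omnibus analysis above;
#  the SS of 18.2024 is the simple two-way SS computed earlier)
SS_effect <- 18.2024
SS_error <- 1.889 * 72
SS_effect / (SS_effect + SS_error)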
It is also worth remembering that the emmeans approach can produce adjusted p values. The main effect contrast test is repeated here with a request for a Sidak correction of the p values (other adjustment methods are possible).
test(lincombs_f, adjust="sidak")
contrast estimate SE df t.ratio p.value
ac1 2.300 0.615 72 3.742 0.0007
ac2 0.167 0.355 72 0.470 0.8704
Results are averaged over the levels of: wordtype, grade
P value adjustment: sidak method for 2 tests
Pairwise comparisons can also be performed, with Tukey-adjusted p values. Note that since the error df is specified as 72, the pooled within-cell error term is used, which is appropriate if there is homogeneity of variance.
pairs(f.emm, adjust="tukey")
contrast estimate SE df t.ratio p.value
None - Positive 1.067 0.355 72 3.006 0.0101
None - Negative 1.233 0.355 72 3.476 0.0025
Positive - Negative 0.167 0.355 72 0.470 0.8857
Results are averaged over the levels of: wordtype, grade
P value adjustment: tukey method for comparing a family of 3 estimates
11.4.2 Simple Main Effect Contrasts
This section focuses on simple main effect contrasts following up on the wordtype by grade interaction, since that was the only two-way interaction that was significant. It can be visualized again with this graph.
The contrast set on the wordtype variable is also the Helmert set. Based on the visual pattern of the means, we might expect both contrasts on wordtype to be significant in fifth graders and neither in twelfth graders. That is precisely the outcome.
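A sketch of the request, assuming the w.g.emm grid from above; the coefficient signs are inferred from the estimates shown below, and the list name wt_helmert is only illustrative:
# a sketch, assuming the wordtype-by-grade grid w.g.emm
wt_helmert <- list(ac1 = c(2, -1, -1),   # LF_LE vs the two HF conditions
                   ac2 = c(0, 1, -1))    # HF_LE vs HF_HE
contrast(w.g.emm, wt_helmert, by = "grade")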
grade = fifth:
contrast estimate SE df t.ratio p.value
ac1 2.600 0.869 72 2.991 0.0038
ac2 2.733 0.502 72 5.447 <.0001
grade = twelfth:
contrast estimate SE df t.ratio p.value
ac1 0.800 0.869 72 0.920 0.3605
ac2 0.933 0.502 72 1.860 0.0670
Results are averaged over the levels of: feedback
11.4.3 Simple Simple Main effect contrasts
Since we examined the Simple Simple main effect of wordtype at combined levels of feedback and grade above, we will now decompose those effects into their simple simple main effect contrasts. It is worth seeing the graph of the cell means again in order to visualize where those simple simple main effects and their contrasts originate.
The contrast set will match what was used for wordtype earlier in the document, the Helmert set.
Using emmeans for interaction contrasts is a bit more complex. It requires passing a full interaction contrast vector rather than just the main effect contrasts - in other words, emmeans cannot combine single-IV contrasts to obtain the coefficients for an interaction contrast, so we have to build them first. And the order of the coefficients has to match the order of the cells in the grid.
One approach is to take the contrast of interest for wordtype (this initial example uses the first of the two contrasts of interest in the Helmert set) and, using matrix operations, multiply it by the single contrast for grade. The result is then “unlisted” and flattened into a vector of six coefficients, ordered to match the six cells of the 3x2 layout.
These operations produce ic1 and ic2, which are the coefficient vectors for the two interaction contrasts of the wordtype by grade two-way interaction.
# first, make sure that the wordtype coefficients are helmert
# follows from changing to that specification in a section above
contrasts(bg.3way$wordtype)
[,1] [,2]
LF_LE 2 0
HF_LE -1 -1
HF_HE -1 1
wc <- contrasts(bg.3way$wordtype)
wc
[,1] [,2]
LF_LE 2 0
HF_LE -1 -1
HF_HE -1 1
# now make sure the contrast for grade is 1, -1
contrasts(bg.3way$grade) <- contr.sum
gc <- contrasts(bg.3way$grade)
gc
[,1]
fifth 1
twelfth -1
# now create the 6-coefficient interaction contrast vectors (two of them)
ic1 <- wc[,1] %*% t(gc)
ic1 <- matrix(unlist(ic1), nrow=1)
ic2 <- wc[,2] %*% t(gc)
ic2 <- matrix(unlist(ic2), nrow=1)
ic1
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 2 -1 -1 -2 1 1
ic2
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 -1 1 0 1 -1
Now that the contrast vectors are known, they can be passed to the contrast function and evaluated with test.
If we look at the graph, we can see that the HF_LE group differs from HF_HE in fifth graders more than it does in 12th graders. This is exactly what the second interaction contrast tests and it is significant by traditional NHST criteria.
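A sketch of that call, assuming the wordtype by grade grid is the w.g.emm object and that the order of the coefficients in ic1 and ic2 matches the order of cells in that grid:
# a sketch; the coefficient order in ic1/ic2 must match the grid's cell order
lincombs_wg <- contrast(w.g.emm, list(intc1 = as.vector(ic1),
                                      intc2 = as.vector(ic2)),
                        by = NULL)
test(lincombs_wg)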
contrast estimate SE df t.ratio p.value
intc1 1.8 1.23 72 1.464 0.1475
intc2 1.8 0.71 72 2.536 0.0134
Results are averaged over the levels of: feedback
11.4.5 Simple two way interaction contrasts
The wordtype by feedback interaction at levels of grade is a good illustration for simple two-way interaction contrasts because of the way that the base graph was set up.
For this analysis we change the contrasts to better fit the pattern seen in the graph - this is blatantly post hoc, but we can “see” the possibility of a two-way interaction in the grade 5 data but not the grade 12 data. Most of that pattern is in the way that the third wordtype level (HF_HE) differs from the other two, and how that, in turn, depends on whether one is examining feedback condition 1 (none) vs. conditions 2 and 3 (positive and negative). This sets up one primary interaction contrast that will be pursued as a template for how to do any or all of the simple interaction contrasts.
A guiding principle in construction of these contrasts is that the interaction contrast vector has to match the pattern of the order of the 9 cells in the grid of means shown just above.
# change to reverse helmert
reverse <- matrix(c(-1,-1,2,1,-1,0), ncol=2)
reverse
Again, note that the order of the coefficients is important - it matches the cell order from the grid of means shown above. A block diagram/table can help here.
Wordtype_c1 by Feedback_c1 Coefficients (feedback as rows, wordtype as columns)

 feedback    LF_LE  HF_LE  HF_HE
 none          -2     -2      4
 positive       1      1     -2
 negative       1      1     -2
In R, there should be a way to pass the nine coefficients in the wc1byfb1 vector to the emmeans::contrast function programmatically, but there is a gap in my R understanding, so I’ve just typed them in manually.
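For reference, a sketch of that manual approach; it assumes the 18-cell wgf.emm grid with cells ordered wordtype-fastest within feedback, and applies the nine coefficients from the table above separately at each grade:
# a sketch; wc1byfb1 holds the nine coefficients from the table above, and the
# assumed cell order (wordtype within feedback) must be checked against the grid
wc1byfb1 <- c(-2, -2, 4, 1, 1, -2, 1, 1, -2)
test(contrast(wgf.emm, list(wc1_by_fb1 = wc1byfb1), by = "grade"))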
The analysis shows what was expected. The simple interaction of the first contrast of wordtype with the first contrast of feedback was significant in fifth graders but not in twelfth graders, as the graph implies visually. It is worth recalling that even though the pattern appears to differ between fifth and twelfth graders, the three-way interaction was not significant. We can’t simply use the presence and absence of significance in simple effects to infer the presence of a higher order effect - it needs to be tested. On the other hand, if these simple interaction contrasts were a priori hypotheses, then there is a rationale for examining them even without a significant omnibus 3-way interaction.
11.4.6 Three way interaction contrasts
The simplest way of obtaining the 3-way interaction contrasts is to return to the use of summary.lm on the basic aov fit object, as outlined in section 5.1.1 above. emmeans can also produce the three-way interaction contrasts, but with considerably more labor (see the later section).
For the data set under analysis here, the 3-way interaction was NS, so we would probably not undertake to evaluate these interaction contrasts unless there were a priori reasons to do so. However, they are shown here anyway (next section) in order to provide a template for situations where their evaluation would be important.
11.4.6.1 Using summary.lm
Here is a repetition of how it was done in the sections on 2-way interaction contrasts and simple 2-way interaction contrasts above (with emmeans). First, change the contrast sets to the desired ones. The sets for wordtype and feedback are the same ones used in the previous emmeans sections where the two-way interaction contrasts and the simple two-way interaction of wordtype and feedback were partitioned into contrasts. Note that these are different contrast sets than those used in the earlier section, so the results will not match. I changed them just to provide another, different opportunity to match what the analysis says to what the eye sees on the plot.
The grade factor only has two levels so in a sense, it is already a contrast, and recall that we changed it earlier to a 1, -1 vector.
Here, we refit the basic aov model using the factors with these redefined orthogonal contrast sets, just to refresh memory and to have the fit on hand in this section (same results as appeared earlier when the “fit.1aov” aov fit was produced).
Recall that summary.lm provides tests of the regression coefficients, and these correspond to the main effect, two-way interaction, and three-way interaction contrast vectors - evaluated with t-tests. They are Type III calculations, so these interaction contrasts suit our purposes.
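A sketch of the refit and the coefficient tests; the outcome variable name dv is a placeholder (the actual name is not shown in this section), and the object name fit.1aov follows the parenthetical note above:
# a sketch; "dv" is a placeholder for the outcome variable in bg.3way
fit.1aov <- aov(dv ~ wordtype * feedback * grade, data = bg.3way)
summary.lm(fit.1aov)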
There are four interaction contrasts. We might label them:
Wc1 by FBc1 by grade
Wc2 by FBc1 by grade
Wc1 by FBc2 by grade
Wc2 by FBc2 by grade
And they are in that order in the output.
Notice that only the first is significant. In fact, the t values for the second, third, and fourth are all zero, which means that all of the 3-way interaction variation is associated with that first contrast. This essentially never happens with real data, so we can assume that the data set is a fabricated textbook data set.
Nonetheless, this provides tests of the 3 way interaction contrasts. Unfortunately, it does not provide SS, only test statistics.
By using summary.lm, R has combined the main effect coding vectors in the appropriate ways to create the full-rank X matrix that contains all two- and three-way interaction contrasts as well as the main effect contrasts. This simply comes out as part of the omnibus analysis with the split argument, or here with summary.lm.
11.4.7 Using emmeans for 3 way interaction contrasts.
emmeans is a very inefficient way of obtaining higher order interaction contrasts. We cannot simply provide coefficient sets for each IV separately and have emmeans combine them. We have to create the vectors “manually”, perhaps using matrix algebra tools. The logic is that emmeans can only evaluate a contrast if its full set of coefficients is provided (my approach was to create the “lincombs” objects). In other words, emmeans will not do the combinatorial work to produce interaction coefficients from knowledge of the main effect vectors; we have to create those vectors ourselves.
We have established the contrasts for each IV above (the “main effect” coding vectors). They are:
first, for wordtype:
wc <- contrasts(bg.3way$wordtype)
wc
[,1] [,2]
LF_LE -1 1
HF_LE -1 -1
HF_HE 2 0
then for feedback:
fb <- contrasts(bg.3way$feedback)
fb
[,1] [,2]
None 2 0
Positive -1 1
Negative -1 -1
and then for grade:
gc <- contrasts(bg.3way$grade)
gc
[,1]
fifth 1
twelfth -1
There are four interaction contrasts formed with the 3way product of the three sets of component vectors. We can create them using the same matrix logic that we used above.
This is tricky because we have to make certain that the order of the coefficients matches what will be in the grid of the emmeans output.
First I show the vector for the 2way combination of Wc1 and fb1, then the three way with grade.
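A sketch of that construction using outer products; it assumes cells ordered with wordtype varying fastest, then feedback, then grade:
# a sketch; outer() builds the products and as.vector() flattens them
wc1fb1 <- as.vector(outer(wc[, 1], fb[, 1]))     # 9 coefficients: wordtype within feedback
wc1fb1gc1 <- as.vector(outer(wc1fb1, gc[, 1]))   # 18 coefficients: adding grade
wc1fb1
wc1fb1gc1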
Now we can pass those 18 coefficients, in the proper order, to the contrast function and use test, as before, to evaluate that three-way interaction contrast. Notice that the t value for this first of the four 3-way interaction contrasts matches what was produced by summary.lm.
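A sketch of that step, again assuming the 18-cell wgf.emm grid with a cell order matching the coefficient vector built above:
# a sketch; verify the grid's cell order before trusting the result
lincombs_3way <- contrast(wgf.emm, list(w1f1g1 = wc1fb1gc1))
test(lincombs_3way)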
The other three 3way interaction contrasts could be tested in the same manner. That is not done here since the 3way was not significant and the purpose of demonstration was accomplished with the first of the four.
11.4.7.1 Conclusions on 3 way interaction contrasts.
Obtaining the test of the 3way interaction contrasts with emmeans is considerable work. This illustration only outlined the way to obtain one of the four possible ones. This inefficiency is not desirable. It is simpler to use summary.lm. Neither approach produces SS and F tests or effect sizes. I will continue to prefer SPSS MANOVA.
11.5 Alpha rate adjustments with contrasts.
Recall that each use of the emmeans::test function had the capability of p value adjustments using the bonferroni/sidak/holm/fdr family of adjustments. Any of the sets of contrasts tested above might have used that argument, especially if the orthogonal sets were not a priori. The exception would be in the final contrast, the simple 2way interaction contrast where only one contrast was tested - thus nothing to be adjusted. However, if the full orthogonal set of four interaction contrasts were employed, then the adjustment might have made sense.
11.6 Pairwise follow up comparisons with emmeans
If the preferred follow-up analyses are pairwise comparisons rather than contrasts, then emmeans permits that approach as well.
Consider the set of simple main effects of wordtype at levels of grade that was examined above. The grid of means is reproduced here as a reminder.
The plot of those marginal means in the wordtype by grade layout is also reproduced here for visual reference.
If we wanted to examine pairwise differences among the levels of wordtype, separately at levels of grade, we could use the pairs function. In its default form, this produces all three pairwise t-tests separately at each level of grade (six tests in all), all using the pooled within-cell error term and its 72 df rather than error terms specific to the two cells involved.
Unsurprisingly the HF_HE group differs from the other two levels of wordtype in fifth graders but no differences were found in 12th graders.
If these were explicitly post hoc pairwise comparisons, then it would be appropriate to do p value adjustments for error rate inflation. One nice thing about the pairs.emmGrid function is its capability for a tukey adjustment method in addition to the bonferroni/sidak/fdr family. Note that the adjustment is done within families of pairwise comparisons - here that means adjusting for three tests at each level of grade separately, not for a total of six tests.
pairs(w.g.emm, adjust="tukey")
grade = fifth:
contrast estimate SE df t.ratio p.value
LF_LE - HF_LE -0.0667 0.502 72 -0.133 0.9903
LF_LE - HF_HE 2.6667 0.502 72 5.314 <.0001
HF_LE - HF_HE 2.7333 0.502 72 5.447 <.0001
grade = twelfth:
contrast estimate SE df t.ratio p.value
LF_LE - HF_LE -0.0667 0.502 72 -0.133 0.9903
LF_LE - HF_HE 0.8667 0.502 72 1.727 0.2022
HF_LE - HF_HE 0.9333 0.502 72 1.860 0.1579
Results are averaged over the levels of: feedback
P value adjustment: tukey method for comparing a family of 3 estimates
There is a way to produce effect size statistics (Cohen’s d) for these pairwise comparisons. See the help documentation for the eff_size function in emmeans.
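A sketch of such a request, using the pooled error quantities noted above:
# a sketch, assuming the pooled MSerror of 1.889 and its 72 df from above
eff_size(w.g.emm, sigma = sqrt(1.889), edf = 72)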
12 Additional Post Hoc Tests
Tukey’s HSD is easily accomplished on an aov fit via a function in the base stats package that is loaded at startup. Note that this compares all pairs of cell means. It would severely over-correct for alpha inflation since not all of the pairs of cell means are interesting comparisons (comparing apples and oranges in many instances). I cannot recommend using this method for a factorial.
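For reference, a sketch of the call, assuming the aov fit object fit.1aov from the refit above:
# a sketch; TukeyHSD() compares all pairs of cell means for the named term
TukeyHSD(fit.1aov, which = "wordtype:feedback:grade")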
The only satisfactory solutions I have seen for applying post hoc tests to marginal tables of means (collapsed from the full factorial table) or to arrays of means in simple-effect configurations are those provided in the emmeans suite of tools. We saw the availability of the adjust argument in both the contrast approach and the pairwise comparison approach. Analogous methods may be found in the phia approach.
13 Reproducibility
Ver 1.3 March 31, 2025
Converted document to Quarto
Cleaned up some language and added needed exposition
Corrected a few typos
Ver 1.2 April 17, 2023
major revision of much language and sequencing of approach
added major sections on follow up analyses with the emmeans and phia packages