# Chapter 3 Perform the standard parametric 1-way ANOVA

Note: In this chapter we can specify the data frame to be used inside the `aov`

and `lm`

functions, so we will detach the hays data frame to avoid confusion when contrasts for factors are re-specified. We had attached in in the earlier chapter to make writing code for graphing functions more efficient.

`detach(hays)`

The R user is faced with many choices of packages and functions to implement 1-way ANOVAs. We will begin with a very simple and limited function `oneway.test`

. The standard method for doing ANOVA is to use the `aov`

function from the base package, although `lm`

can also be used. They are the most common approaches for obtaining the traditional SS partitioning and F test. We also do a bit more work with them to obtain analytical/orthogonal contrasts here, as well Three other approaches are also outlined. **ezanova**, **granova**, and **afex** are all designed to provide easier and/or additional strategies.

## 3.1 The `oneway.test`

function

A very basic way to do a one factor ANOVA is with the `oneway.test()`

function. The “var.equal” argument, when set to TRUE yields the standard textbook analysis where the error term (MSerror) is the pooled within-group variances and homogeneity of population variances is assumed.

```
# the var.equal argument permits using the pooled within-group error
# as the F test error term.
oneway.test(dv~factora,var.equal=TRUE, data=hays)
```

```
##
## One-way analysis of means
##
## data: dv and factora
## F = 5.8947, num df = 2, denom df = 27, p-value = 0.007513
```

Changing the var.equal argument is possible. This is desireable when the homogeneity of variance assumption is violated. The logic and desireability of this form of the 1-way ANOVA F test is parallel to the Welch test in the independent samples “t-test”. More on alternative approaches when assumptions are violated is provided later in this document.

This Welch test is strongly recommended.

```
# setting var.equal=F results in a generalization of the
# Welch (Fisher-Behrens) form of the test.
# see your textbook or see the help pages for oneway.test() for more information
# notice the fractional df for the denominator term
oneway.test(dv~factora,var.equal=FALSE, data=hays)
```

```
##
## One-way analysis of means (not assuming equal variances)
##
## data: dv and factora
## F = 5.0716, num df = 2.000, denom df = 15.914, p-value = 0.01977
```

## 3.2 The `aov`

and `lm`

functions

The `aov`

function in the base package is a “wrapper” for `lm`

. It uses the ‘lm’ linear modeling engine, but permits model specification in a similar way to how we have seen `lm`

used for regression. It expects IVs to be factors. The output with use of the `summary`

function is the more traditional ANOVA summary table with SS and df partitioning. The simplicity of its usage is its strength. It’s output is, however, limited. We need to use other functions for analytical/orthogonal/singleDF contrasts, multiple comparisons, etc. That follows below.

An extended CAVEAT BEFORE USING AOV:

Even though this script addresses only 1-way designs, there are some issues that are bigger, once we get to factorial designs and that require care NOT to simply apply a generalization of these 1-way approaches to factorials with unequal sample sizes per group. I’ll elaborate on this a bit here, but in the 1-way situation this discussion is largely irrelevant except for evaluation of individual contrasts and there is a later chapter in this document that addresses the questions associated with them. The `anova`

and `summary`

functions applied to `aov`

objects produce Type I SS. When used in factorial designs, the `aov`

and `anova`

functions will not test hypotheses about unweighted marginal means. The **car** package provides access to an `Anova`

function (upper case first letter A) that will produce Type III SS. Obtaining Type III SS for contrasts is not completely direct. You will see that use of the `summary`

function on the `aov`

model does produce Type I SS for the contrasts. I obtain tests of the contrasts with the `lm`

function by applying the summary function to the `lm`

model or by use of the `summary.lm`

function on an `aov`

object. This approach does provide a test of type III SS, but the SS are not listed. See the later chapter on Unequal Sample Sizes for details on this topic. The **emmeans** package is a strong alternative when working with contrasts - chapter 6 here.

There is a very large debate on the value of Type I, II vs III SS. In our class, we will address that at a later point in time.

Conclusion: The direct extension, to a factorial design, of the methods outlined here will not always produce exact duplication of the SPSS/SAS methods we cover using TYPE III or UNIQUE SS methods, in factorial designs with unequal N. Since the initial/primary example in this document is balanced (equal N) and not a factorial, these issues are not relevant for this initial Hays data set. The treatment of the Type I and III SS issues here set the stage for the importance of that distinction in factorial designs where marginal means such as main effects might be either weighted or unweighted. We will better be able to grasp the issues there with the foundation put in place here and in the preceding tutorial on linear modeling in R

The “R ecosystem” is not terribly well oriented to the experimental design perspective, as it has emerged in the Psychological Sciences. Duplication of the exact approaches we have learned with SPSS/SAS can be done, but more complex designs such as factorial between-groups designs and repeated measures present unique challenges for some of the straightforward methods used in SPSS/SAS. This is especially the case for contrasts, simple effects, and repeated measures. This further work is enabled by a solid understanding for the applications of `aov`

and `lm`

in one factor designs.

### 3.2.1 Using the `aov`

function

Fortunately, obtaining the results from ‘aov’ is simpler than the preceding four paragraphs. `aov`

utilizes the `lm`

functionality to perform its work. Notice that, like `lm`

, the data frame can be named as an argument precluding the need for attachment of the dataframe as we did earlier in this document.

Note that the independent variable named in the model specification is expected to have a class type of “factor”, which it does in this example (called factora here). This is because of how the variable was specified as a string variable and imported as a factor by `read.csv`

.

```
.1 <- aov(dv~factora,data=hays)
fitsummary(fit.1)
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## factora 2 233.9 116.93 5.895 0.00751 **
## Residuals 27 535.6 19.84
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Notice that this result from `aov`

was produced by the `summary`

function. We have previously seen that applying `summary`

on a `lm`

regression object produces a coefficients table rather than the SS partitioning and F test. This distinction arises because the ANOVA summary table is the typical summary reported for ANOVAs. One way around this, if we want to see the regression coefficients produced by the analysis is to use the `summary.lm`

function on the `aov`

object. We see presentation of two coding vectors, expected since there are three categories in the “factora” IV. Later we consider which coding scheme is employed by default to produce the vectors for which coefficients are found here.

`summary.lm(fit.1)`

```
##
## Call:
## aov(formula = dv ~ factora, data = hays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4 -2.7 0.4 2.1 8.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.400 1.408 19.454 < 2e-16 ***
## factorafast -6.200 1.992 -3.113 0.00435 **
## factoraslow -5.600 1.992 -2.811 0.00907 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 27 degrees of freedom
## Multiple R-squared: 0.3039, Adjusted R-squared: 0.2524
## F-statistic: 5.895 on 2 and 27 DF, p-value: 0.007513
```

The `anova`

function can also be applied to the `aov`

object and gives slightly different pieces of information than did application of the `summary`

function above (number of significant places for F and p values).

`anova(fit.1)`

```
## Analysis of Variance Table
##
## Response: dv
## Df Sum Sq Mean Sq F value Pr(>F)
## factora 2 233.87 116.933 5.8947 0.007513 **
## Residuals 27 535.60 19.837
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

### 3.2.2 Using the `lm`

function for an ANOVA

We can also obtain the same analysis using `lm`

directly. Here, `summary`

on the `lm`

object gives the expected coefficients table, but `anova`

on that object gives an ANOVA summary table with slightly different characteristics.

```
# use the linear models function (lm) to do the analysis
# from this, can you determine whether lm uses dummy or effect coding?
.2 <- lm(dv~factora, data=hays)
fitsummary(fit.2)
```

```
##
## Call:
## lm(formula = dv ~ factora, data = hays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4 -2.7 0.4 2.1 8.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.400 1.408 19.454 < 2e-16 ***
## factorafast -6.200 1.992 -3.113 0.00435 **
## factoraslow -5.600 1.992 -2.811 0.00907 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 27 degrees of freedom
## Multiple R-squared: 0.3039, Adjusted R-squared: 0.2524
## F-statistic: 5.895 on 2 and 27 DF, p-value: 0.007513
```

`anova(fit.2)`

```
## Analysis of Variance Table
##
## Response: dv
## Df Sum Sq Mean Sq F value Pr(>F)
## factora 2 233.87 116.933 5.8947 0.007513 **
## Residuals 27 535.60 19.837
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

### 3.2.3 What is the default coding scheme for the factor?

If you couldn’t determine whether dummy (indicator) coding or effect coding was used (based on regression coefficient values and the cell means as we have previously covered) then the following code may help. Recall the earlier tutorial document where we examined how to control/change contrast choices and that `lm`

uses dummy coding by default and calls it `contr.treatment`

. Note which category is the reference.

`contrasts(hays$factora)`

```
## fast slow
## control 0 0
## fast 1 0
## slow 0 1
```

## 3.3 Analytical and orthogonal contrasts for one factor ANOVA models

If we want to employ our own contrasts we can create them and reassign those to factora. The goal is to create an orthogonal set. It is possible to request the helmert set directly, but notice that it gives what we have called “reverse helmert” and not the set that we want to use based on our prior work with this data set.

```
contrasts(hays$factora) <- "contr.helmert"
contrasts(hays$factora)
```

```
## [,1] [,2]
## control -1 -1
## fast 1 -1
## slow 0 2
```

There is not a method for automatically rearranging the pattern for the built-in helmert set so we need to create our own vectors. This will be the preferred, and more generalizable, approach that we can use in many different designs, and with many different kinds of orthogonal sets. This method was reviewed in the earlier tutorial on coding vectors in R.

```
# now apply orthogonal contrasts of our choosing.
<- matrix(c(2,-1,-1, 0,1,-1),ncol=2)
contrasts.factora contrasts(hays$factora) <- contrasts.factora
contrasts(hays$factora)
```

```
## [,1] [,2]
## control 2 0
## fast -1 1
## slow -1 -1
```

The first method for requesting analysis of these contrasts employs the ‘split’ argument in the ‘summary’ function. I find it to be somewhat cumbersome, especially for factorial designs. Nonetheless, it works well, and the summary table is easily readable. Note that this “split” approach yields F tests that are based on Type I SS. See the unequal sample size chapter below for detailed discussion. In this current data set with equal N, there is no distinction between Type I and Type III SS.

```
.3 <- aov(dv~factora, data=hays)
fitsummary(fit.3,
split=list(factora=list(ac1=1, ac2=2)))
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## factora 2 233.9 116.93 5.895 0.00751 **
## factora: ac1 1 232.1 232.07 11.699 0.00200 **
## factora: ac2 1 1.8 1.80 0.091 0.76555
## Residuals 27 535.6 19.84
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The summary table from the `lm`

model object produces t-tests rather than F-tests and SS listings. However, those t’s are the square roots of the F’s just seen, so both approaches yield the same outcome. But this is only true when sample sizes are equal. If they are unequal, the summary table from the `lm`

model fit produces tests based on an equivalence to Type III SS.

```
.3lm <- lm(dv~factora, data=hays)
fitsummary(fit.3lm)
```

```
##
## Call:
## lm(formula = dv ~ factora, data = hays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4 -2.7 0.4 2.1 8.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.4667 0.8132 28.858 <2e-16 ***
## factora1 1.9667 0.5750 3.420 0.002 **
## factora2 -0.3000 0.9959 -0.301 0.766
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 27 degrees of freedom
## Multiple R-squared: 0.3039, Adjusted R-squared: 0.2524
## F-statistic: 5.895 on 2 and 27 DF, p-value: 0.007513
```

If we apply this `split`

approach to the `lm`

object, an identical table of results is produced, as expected. This request is the method for obtaining the parameter estimates. Note the equivalence to what we found using the “manual” regression procedure in SPSS. It is also equivalent to the first summary table seen just above for the fit.3lm model object.

```
.4 <- lm(dv~factora, data=hays)
fitsummary(fit.4,
split=list(factora=list(ac1=1, ac2=2)))
```

```
##
## Call:
## lm(formula = dv ~ factora, data = hays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4 -2.7 0.4 2.1 8.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.4667 0.8132 28.858 <2e-16 ***
## factora1 1.9667 0.5750 3.420 0.002 **
## factora2 -0.3000 0.9959 -0.301 0.766
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 27 degrees of freedom
## Multiple R-squared: 0.3039, Adjusted R-squared: 0.2524
## F-statistic: 5.895 on 2 and 27 DF, p-value: 0.007513
```

An alternative, and perhaps simpler, way to obtain the contrasts is to apply a version of the `summary`

function to an `aov`

object. We saw above that simply applying `summary`

to an `aov`

object produces tests of the contrasts, only when the `split`

argument was used (code repeated here).

```
summary(fit.3,
split=list(factora=list(ac1=1, ac2=2)))
```

However, since `aov`

is a wrapper to `lm`

we can use the `summary`

function in a different way by calling `summary.lm`

. This will produce the same t-tests of the specified contrasts that we saw above when `summary`

was applied to an `lm`

fit object. This is a good way to do analyses only using `aov`

, and the t-tests in the table are equivalent to those shown above when starting with the `lm`

object, and thus equivalent to Type III SS decompositions. For the oneway design, these t’s are the square roots of the F tests for the contrasts produced when using the `split`

function on the `aov`

object.

`summary.lm(fit.3)`

```
##
## Call:
## aov(formula = dv ~ factora, data = hays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4 -2.7 0.4 2.1 8.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.4667 0.8132 28.858 <2e-16 ***
## factora1 1.9667 0.5750 3.420 0.002 **
## factora2 -0.3000 0.9959 -0.301 0.766
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 27 degrees of freedom
## Multiple R-squared: 0.3039, Adjusted R-squared: 0.2524
## F-statistic: 5.895 on 2 and 27 DF, p-value: 0.007513
```

The reader is urged to compare this table to the output from SPSS REGRESSION where we first utilized this orthogonal contrast set with the hays data set.

### 3.3.1 Manually computing contrast SS for verification

To be thorough, and to connect these analyses to the basic formulaic concepts put in place in initial lectures, let’s see if we can find a way to manually compute these contrast SS using some matrix/vector manipulation in R. First we need to find the linear combination “psi” value - only done here for the first vector, the 2 -1 -1 contrast.

```
# First lets obtain the group means and place them in a vector.
# We can do this two ways.
# a simple way is to use the 'aggregate' function
# which produces a data frame of the means
<- aggregate(dv~factora, hays, mean)
xbars xbars
```

```
## factora dv
## 1 control 27.4
## 2 fast 21.2
## 3 slow 21.8
```

```
#str(xbars)
# a second way to extract the means by group is using 'dplyr',
# pipes, and a summarizing function.
# not run here
#@xbars <-
#hays %>%
# group_by(factora) %>%
# summarise (mean_dv = mean(dv))
```

```
# now extract the means from the dataframe and create a vector
<- xbars$dv
v1 v1
```

`## [1] 27.4 21.2 21.8`

```
#extract the first contrast from the contrasts matrix
<- contrasts(hays$factora)[,1]
v2 v2
```

```
## control fast slow
## 2 -1 -1
```

```
# For our purposes it doesn't matter if these vectors
# are row or column vectors.
# we just need the dot product
# take the dot product of these vectors.
<- v1%*%v2
vectorprod vectorprod
```

```
## [,1]
## [1,] 11.8
```

Once we have this “psi” quantity, we can use our generic SS formula to compute SS. We have seen this same 11.8 value in our earlier work with this data set, several times actually.

\(\text{S}{{\text{S}}_{AnalyticalContrast}}=\frac{{{n}_{j}}*{{(\Psi )}^{2}}}{\sum\limits_{j-1}^{k}{c_{j}^{2}}}\)

For our data set n=10 and the sum of the squared coefficients is 6. So,

```
<- (10*(11.8**2))/6
SS_ac1 SS_ac1
```

`## [1] 232.0667`

This matches the SS for the first contrast produced with the ‘split’ function, as applied to the `aov`

fit.3 object above.

### 3.3.2 The Bonferroni family and other alpha-inflation control methods for analytical contrasts

If we need to control of post hoc error rate inflation with the tests of these contrasts, R has a useful function called `p.adjust`

. The types of adjustments are in the ‘bonferroni’ style of control. `p.adjust permits choice of a large number of methods. Output from only one of them is shown here, but the logic for changing to the others is obvious from the commented code. The reader should look at the`

p.adjust` help page.

For our orthogonal set, any of the methods listed here are appropriate. However, since there are only two contrasts, we don’t find any difference among the approaches.

So, the laying out of the different approaches can serve as a template for designs where you might have more than two contrasts. Note that I had to submit the p values for the each of the contrasts examined in the preceding summary(fit.4)specification.

The result here is almost too trivial to need the software. With two tests, the bonferroni adjust is just to double the observed p-value. And since only the first was significant above at the nominal alpha level of .05 it is the only one we really had to adjust and we could have multiplied it by two in our heads. Nonetheless, this puts in place a template for usage in more complex situation.

Understand that the logic is to submit p-values of ALL followup tests so that the number of tests can be determined by ‘p.adjust’ in its computation.

With the Bonferroni adjustment, our first contrast remains significant since .004 < .05.

Later in this document several other multiple comparison and post hoc methods are reviewed.

```
# now do a bonferroni adjustment on the p values from these two contrasts.
# we can use the p.adjust function for this. it has many other
# options for modified bonferroni adjustments - read the help: ?p.adjust
# For our orthogonal/independent contrasts, the bonferroni, holm, hochberg, and
# Benjamini/Hochberg (fdr) method are appropriate. However, since there are only two
# contrasts, we don't find any difference among the approaches.
# So, the laying out of the different approaches can serve as a template for
# designs where you might have more than two contrasts.
# note that I had to submit the p values for the each of the contrasts
# examined in the preceding summary(fit.4)specification:
p.adjust(c(.00200,.07655), method="bonf")
```

`## [1] 0.0040 0.1531`

```
#p.adjust(c(.00200,.07655), method="holm")
#p.adjust(c(.00200,.07655), method="hoch")
#p.adjust(c(.00200,.07655), method="hommel")
#p.adjust(c(.00200,.07655), method="fdr")
```

## 3.4 A note about testing analytical contrasts in R

While the above method using the ‘split’ function is workable for the 1-factor ANOVA model, it can become quite challenging to extend its usage to complex designs with multiple factors, and especially with repeated measures designs.

The **phia** package will provide some capability for those broader needs.

The **granova** package has a function that permits use of contrasts, in a 1-way, and is exemplified below.

The most flexible approach is probably using the **emmeans** package. It’s usage for our 1-way Hays design is found in its own chapter below and will be revisited with later designs.

## 3.5 Use of the ‘Anova’ function from the **car** package

Above, we discussed the fact that use of the base package ‘anova’ function on an ‘aov’ object produced Type I SS. The issue of Type I vs Type III SS is not relevant for the current data set since there is equal n in the groups. Nonetheless, it is useful to put in place an introduction to the use of the ‘Anova’ function here and the specification required to obtain Type III SS.

An `Anova`

argument called “type” permits request of the various “types” of SS. Once again, we see that with equal N, the TYPE III SS equals the Type I SS (and F/p values) produced by `anova`

in various illustrations above.

```
#require(car)
Anova(fit.1,type="III")
```

```
## Anova Table (Type III tests)
##
## Response: dv
## Sum Sq Df F value Pr(>F)
## (Intercept) 7507.6 1 378.4638 < 2.2e-16 ***
## factora 233.9 2 5.8947 0.007513 **
## Residuals 535.6 27
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

## 3.6 Recommended approach for a basic 1-way ANOVA with planned contrasts

Although this section has reviewed many options, the core initial approach to a 1-way ANOVA can be fairly quick and direct with just a few steps:

- Choose a set of planned contrasts and create the orthogonal set.
- Perform the omnibus ANOVA with the
`aov`

function - Examine the omnibus F test and the single df contrast tests with either the “split” capability or with
`summary.lm`

- Follow up with post hoc tests, effect size computations, evaluation of assumptions and possible alternative methods outlined in later chapters and/or the
`p.adjust`

methods shown earlier in this chapter.

Here is the sequence (repeating the functions shown above):

First create the contrasts:

```
<- matrix(c(2,-1,-1, 0,1,-1),ncol=2)
contrasts.factora contrasts(hays$factora) <- contrasts.factora
contrasts(hays$factora)
```

```
## [,1] [,2]
## control 2 0
## fast -1 1
## slow -1 -1
```

Perform the base omnibus ANOVA and partition the factor into contrasts using “split”:

```
.3 <- aov(dv~factora, data=hays)
fitsummary(fit.3,
split=list(factora=list(ac1=1, ac2=2)))
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## factora 2 233.9 116.93 5.895 0.00751 **
## factora: ac1 1 232.1 232.07 11.699 0.00200 **
## factora: ac2 1 1.8 1.80 0.091 0.76555
## Residuals 27 535.6 19.84
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

or, obtain inferences on the contrasts by using `summary.lm`

. This will produce “t-tests” of the coding vectors, but in an equal-N situation the squares of the t’s will equal the F’s. With unequal N’s the t-tests are tests equivalent to Type III SS (a more important issue in Factorial designs than in 1-way designs)

`summary.lm(fit.3)`

```
##
## Call:
## aov(formula = dv ~ factora, data = hays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4 -2.7 0.4 2.1 8.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.4667 0.8132 28.858 <2e-16 ***
## factora1 1.9667 0.5750 3.420 0.002 **
## factora2 -0.3000 0.9959 -0.301 0.766
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 27 degrees of freedom
## Multiple R-squared: 0.3039, Adjusted R-squared: 0.2524
## F-statistic: 5.895 on 2 and 27 DF, p-value: 0.007513
```