# Chapter 10 Robust Methods and Resampling Methods

Violations of assumptions can wreak havoc with the standard methods of ANOVA, leading to increases in the Type I error rate. Wilcox (2016) has reviewed the literature on this issue and concludes that researchers have been systematically taught, in error, that the standard parametric methods are resistant to violations of their assumptions. He argues, in a compelling fashion, that the standard approach of evaluating assumptions with inferential tests (e.g., Levene's test for homogeneity of variance, and various tests of non-normality) is underpowered at the sample sizes common in much research. Reliance on the "non-significance" of such tests leads to flawed application of standard inferential methods such as the omnibus F test employed in ANOVA. Two general approaches, sometimes intertwined, are available in light of this perspective. Robust methods replace the usual descriptive statistics, such as means and standard deviations, with alternative estimators and employ those in the analyses; a large class of modified central tendency estimators can be used. The second class of alternatives comprises the resampling methods that we have covered elsewhere. In this chapter, a few examples of the application of both classes of methods are shown.

## 10.1 Robust statistics and 1-way ANOVA

Wilcox has implemented an easy-to-use suite of tools in an R package for robust methods, **WRS2** (Mair & Wilcox, 2018). Some background in the concepts of robust central tendency estimation, such as trimming and winsorizing, is useful for fully understanding these approaches. By default, the `t1way` function performs 20% trimming when computing the location indices and then employs the Welch F method seen above. An effect size statistic is also obtained.
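As a quick illustration of what trimming and winsorizing do, here is a minimal base-R sketch using made-up data (the `winsorize` helper is a hypothetical illustration, not a **WRS2** function):

```r
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)  # one extreme value inflates the mean

mean(x)             # ordinary mean, pulled upward by the outlier
mean(x, trim = 0.2) # 20% trimmed mean: drops the lowest/highest 20% of scores

# Winsorizing replaces the extreme scores with the nearest retained scores
winsorize <- function(x, tr = 0.2) {
  xs <- sort(x)
  k <- floor(tr * length(xs))
  xs[1:k] <- xs[k + 1]
  xs[(length(xs) - k + 1):length(xs)] <- xs[length(xs) - k]
  xs
}
mean(winsorize(x))  # winsorized mean
```

With this sample, the ordinary mean is dragged toward the single extreme score, while the trimmed and winsorized means stay near the bulk of the data.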

`t1way(formula = dv~factora, data = hays)`

```
## Call:
## t1way(formula = dv ~ factora, data = hays)
##
## Test statistic: F = 8.1743
## Degrees of freedom 1: 2
## Degrees of freedom 2: 9.65
## p-value: 0.00838
##
## Explanatory measure of effect size: 0.68
## Bootstrap CI: [0.26; 1.07]
```

Pairwise multiple comparisons can also be performed, on these trimmed means, with a correction for error rate inflation built in.

`lincon(formula = dv~factora, data = hays)`

```
## Call:
## lincon(formula = dv ~ factora, data = hays)
##
## psihat ci.lower ci.upper p.value
## control vs. fast 7.16667 1.57906 12.75427 0.01221
## control vs. slow 5.33333 0.98750 9.67916 0.01221
## fast vs. slow -1.83333 -7.15661 3.48994 0.34025
```

For both functions, the degree of trimming can be controlled with the `tr` argument.

`t1way(formula = dv~factora, data = hays, tr=.10)`

```
## Call:
## t1way(formula = dv ~ factora, data = hays, tr = 0.1)
##
## Test statistic: F = 4.3376
## Degrees of freedom 1: 2
## Degrees of freedom 2: 12.02
## p-value: 0.03817
##
## Explanatory measure of effect size: 0.65
## Bootstrap CI: [0.36; 1.01]
```

`lincon(formula = dv~factora, data = hays, tr=.10)`

```
## Call:
## lincon(formula = dv ~ factora, data = hays, tr = 0.1)
##
## psihat ci.lower ci.upper p.value
## control vs. fast 6.500 -0.50481 13.50481 0.05124
## control vs. slow 5.375 0.04328 10.70672 0.05124
## fast vs. slow -1.125 -7.21691 4.96691 0.60863
```

The `med1way` function from **WRS2** performs a 1-way ANOVA based on medians.

`med1way(formula = dv~factora, data = hays, iter=1000)`

```
## Call:
## med1way(formula = dv ~ factora, data = hays, iter = 1000)
##
## Test statistic F: 1.3439
## Critical value: 1.2377
## p-value: 0.046
```

## 10.2 Resampling methods: Permutation testing

I have been using the **lmPerm** package for permutation tests. Unfortunately, the source code has been moved into the CRAN archive for unsupported packages. It is, nonetheless, accessible. Follow these steps:

1. Download the file lmPerm_1.1-2.tar.gz from http://cran.r-project.org/src/contrib/Archive/lmPerm/ and save it to a convenient place on your local machine.

2. MS Windows users may need to install RTools from http://cran.r-project.org/bin/windows/Rtools/. Mac and Linux users can skip this step.

3. Run `install.packages(file.choose(), repos=NULL, type="source")` and select the downloaded file.

The `aovp` function is simple to use; its defaults are employed here.

```
#library(lmPerm)
#library(multcomp)
set.seed(15) # so that reruns of this produce the same outcome as seen here
fit <- aovp(dv~factora, data=hays, perm="Prob")
```

`## [1] "Settings: unique SS "`

`anova(fit)`

```
## Analysis of Variance Table
##
## Response: dv
## Df R Sum Sq R Mean Sq Iter Pr(Prob)
## factora 2 233.87 116.933 5000 0.002 **
## Residuals 27 535.60 19.837
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
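To make clear what `aovp` is doing, a permutation test can be sketched by hand: shuffle the group labels many times and locate the observed F statistic within the resulting null distribution. A minimal sketch, with simulated data standing in for the hays data set:

```r
set.seed(15)
# Simulated stand-in for the hays data: three groups of 10 scores
sim <- data.frame(
  dv = c(rnorm(10, 62, 4), rnorm(10, 55, 4), rnorm(10, 57, 4)),
  factora = factor(rep(c("control", "fast", "slow"), each = 10))
)

obs_F <- anova(aov(dv ~ factora, data = sim))[1, "F value"]

# Permutation distribution: recompute F after shuffling the group labels
perm_F <- replicate(2000, {
  shuffled <- transform(sim, factora = sample(factora))
  anova(aov(dv ~ factora, data = shuffled))[1, "F value"]
})

mean(perm_F >= obs_F)  # permutation p-value
```

The p-value is simply the proportion of label shufflings that produce an F at least as large as the one observed, with no normality assumption required.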

Alternatively, the **coin** package provides a general independence test:

```
#library(coin)
independence_test(dv ~ factora,
data = hays)
```

```
##
## Asymptotic General Independence Test
##
## data: dv by factora (control, fast, slow)
## maxT = 2.9574, p-value = 0.008792
## alternative hypothesis: two.sided
```

Pairwise comparisons can be obtained with the `pairwisePermutationTest` function from **rcompanion**, here with FDR adjustment of the p-values:

```
#library(rcompanion)
ppairs <- pairwisePermutationTest(dv ~ factora,
                                  data = hays,
                                  method="fdr")
ppairs
```

```
## Comparison Stat p.value p.adjust
## 1 control - fast = 0 2.334 0.01958 0.02937
## 2 control - slow = 0 2.576 0.009982 0.02937
## 3 fast - slow = 0 -0.336 0.7369 0.73690
```

Or organized another way:

```
ppm <- pairwisePermutationMatrix(dv ~ factora,
                                 data = hays,
                                 method="fdr")
ppm
```

```
## $Unadjusted
## control fast slow
## control NA 0.01958 0.009982
## fast NA NA 0.736900
## slow NA NA NA
##
## $Method
## [1] "fdr"
##
## $Adjusted
## control fast slow
## control 1.00000 0.02937 0.02937
## fast 0.02937 1.00000 0.73690
## slow 0.02937 0.73690 1.00000
```

## 10.3 Resampling methods: Bootstrapping

This section briefly reviews implementation of bootstrapping capability from two different R packages, **WRS2** and **lmboot**.

### 10.3.1 Percentile t bootstrapping along with robust mean estimation

Wilcox recommends a percentile t method of bootstrapping for 1-way ANOVA (Wilcox, 2016). This is implemented with the `t1waybt` function, which also permits specification of the amount of trimming for the means, adding a further robustness feature to the analysis. This method is appropriate when heteroscedasticity, non-normality, or both are present. The variance explained is a proportion of variance, and the effect size is a Cohen's f value. These values will differ from those obtained previously because of the trimming and the bootstrapping of error terms.

`t1waybt(dv~factora,tr=.2,nboot=800, data=hays)`

```
## Warning in t1waybt(dv ~ factora, tr = 0.2, nboot = 800, data = hays): Some
## bootstrap estimates of the test statistic could not be computed.
```

```
## Call:
## t1waybt(formula = dv ~ factora, data = hays, tr = 0.2, nboot = 800)
##
## Effective number of bootstrap samples was 796.
##
## Test statistic: 8.1743
## p-value: 0.01382
## Variance explained: 0.466
## Effect size: 0.683
```

The user can employ another **WRS2** function, `mcppb20`, to do pairwise post hoc tests comparing pairs of groups within the bootstrapping and trimming framework:

`mcppb20(dv~factora,tr=.2,nboot=800, data=hays)`

```
## Call:
## mcppb20(formula = dv ~ factora, data = hays, tr = 0.2, nboot = 800)
##
## psihat ci.lower ci.upper p-value
## control vs. fast 7.16667 0.33333 12.33333 0.0150
## control vs. slow 5.33333 1.16667 10.16667 0.0025
## fast vs. slow -1.83333 -6.16667 3.66667 0.4575
```
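The core idea behind `mcppb20`, a percentile bootstrap confidence interval for a difference in trimmed means, can be sketched in a few lines of base R (simulated data, not the hays values):

```r
set.seed(15)
g1 <- rnorm(10, 62, 4)  # simulated scores for one group
g2 <- rnorm(10, 55, 4)  # simulated scores for a second group

# Observed difference in 20% trimmed means
diff_obs <- mean(g1, trim = 0.2) - mean(g2, trim = 0.2)

# Resample each group with replacement and recompute the difference
boot_diff <- replicate(2000, {
  mean(sample(g1, replace = TRUE), trim = 0.2) -
    mean(sample(g2, replace = TRUE), trim = 0.2)
})

quantile(boot_diff, c(0.025, 0.975))  # 95% percentile bootstrap CI
```

If the interval excludes zero, the groups differ on the trimmed means; `mcppb20` additionally builds in the familywise error control across the set of pairwise comparisons.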

## 10.4 Residual/Wild Bootstrapping

The **lmboot** package provides a function to implement residual/wild bootstrapping, which is recommended for models with heteroscedasticity. The simple output is only the p-value for the test of the single IV in the 1-way ANOVA. The modeling is based on Type I SS by default and cannot be changed. I rerun the base model so that the results can be compared. The `ANOVA.boot` function permits specification of both a seed, which helps with reproducibility within this document, and the number of bootstrap samples (`B`).

```
#library(lmboot)
anova(aov(dv~factora, data=hays))
```

```
## Analysis of Variance Table
##
## Response: dv
## Df Sum Sq Mean Sq F value Pr(>F)
## factora 2 233.87 116.933 5.8947 0.007513 **
## Residuals 27 535.60 19.837
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
boot1 <- ANOVA.boot(dv~factora, data=hays, seed=1234, B=5000)
boot1$`p-values` #bootstrap p-values for 1-way model
```

`## [1] 0.0058`
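One common variant of the wild bootstrap for this design can be sketched by hand: keep the null-model fitted values (the grand mean), flip the signs of the full-model residuals at random (Rademacher weights), and recompute the F statistic on each rebuilt response. This is a sketch of the general idea with simulated data standing in for the hays data set; the exact algorithm inside `ANOVA.boot` may differ in its details.

```r
set.seed(1234)
# Simulated stand-in for the hays data
sim <- data.frame(
  dv = c(rnorm(10, 62, 4), rnorm(10, 55, 4), rnorm(10, 57, 4)),
  factora = factor(rep(c("control", "fast", "slow"), each = 10))
)

obs_F <- anova(aov(dv ~ factora, data = sim))[1, "F value"]
res   <- residuals(aov(dv ~ factora, data = sim))  # full-model residuals
mu0   <- mean(sim$dv)                              # null-model fit (grand mean)

# Wild bootstrap under the null: random-sign the residuals, recompute F
boot_F <- replicate(2000, {
  dv_star <- mu0 + res * sample(c(-1, 1), length(res), replace = TRUE)
  anova(aov(dv_star ~ factora, data = sim))[1, "F value"]
})

mean(boot_F >= obs_F)  # bootstrap p-value
```

Because the sign flips preserve each observation's residual magnitude, the procedure respects group-specific spread, which is why the wild bootstrap is suited to heteroscedastic data.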