## Visualize Sum of Squares and Variance

[Plots: High Dispersion Variable and Low Dispersion Variable]

##### Visualizing the Variance

This app provides a way of visualizing the variance of a variable as a geometric entity. The variance is treated here as the average of the areas of N squares because the formula for variance involves a squared value. Textbook definitions of the sample variance label it in various ways, including $$S^2$$ (an upper case S), $$SD^2$$, and $$VAR_X$$.

This sample variance is usually defined with this expression: $$\frac{{\sum\limits_{i = 1}^N {({X_i} - \bar X)^2} }}{{N}}$$

Writing out the square in the numerator permits a rewrite: $$\frac{{\sum\limits_{i = 1}^N {({X_i} - \bar X)({X_i} - \bar X)} }}{{N}}$$

Either of these expressions makes it clear that the deviation of each X value from the mean is multiplied by itself, yielding the 'area of a geometric square' perspective that the plots in this app emphasize. Doing this for each X value and summing the N resulting quantities gives the numerator of the expression. This numerator is usually given the shorthand notation SS because it is the 'sum of squared deviations from the mean'. So we can write the variance expression a third way: $$\frac{{SS}}{{N}}$$

Students are encouraged to obtain the data values from the 'show the data' tab of this app and perform the calculations for these expressions in Excel or with a calculator.
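For readers who prefer code to a spreadsheet, the same steps can be sketched in Python. This is only an illustration: the function name `sum_of_squares` and the scores are hypothetical, and the scores are not taken from the app's data tab.

```python
def sum_of_squares(values):
    """Return the mean, the squared deviations, and their sum (SS)."""
    n = len(values)
    mean = sum(values) / n
    # Each squared deviation is the area of a square whose side is (X_i - mean).
    squared_deviations = [(x - mean) ** 2 for x in values]
    return mean, squared_deviations, sum(squared_deviations)


# Hypothetical scores chosen so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean, squares, ss = sum_of_squares(scores)
variance = ss / len(scores)  # SS / N, the average area of the N squares

print("mean:", mean)
print("areas of squares:", squares)
print("SS:", ss)
print("variance (SS / N):", variance)
```

With these scores the mean is 5, the areas sum to SS = 32, and the variance is 32 / 8 = 4.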

Some users of this app may have been taught to compute the variance a different way. This expression is similar to the ones above, differing only in the denominator: $$\frac{{\sum\limits_{i = 1}^N {({X_i} - \bar X)^2} }}{{N-1}}$$.

The numerator is still called Sum of Squares (SS), but the expression is no longer the arithmetic average of the squared deviations. It is close, but slightly larger (by a factor of $$\frac{N}{N-1}$$) whenever the scores are not all identical.

This modified version of the expression is usually denoted as $$s^2$$, with a lower case s. It is the more commonly used variance expression because it is an unbiased estimate of the variance in the population from which the N X scores were randomly sampled. Although it is no longer exactly the arithmetic average of the areas of the squares, it is close, especially for larger sample sizes (N).
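The gap between the two denominators can be checked with a short Python sketch. It uses the standard library's `statistics.pvariance` (divides SS by N) and `statistics.variance` (divides SS by N - 1). The scores are hypothetical, not the app's data.

```python
import statistics

# Hypothetical scores; not taken from the app's data tab.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

var_n = statistics.pvariance(scores)         # SS / N
var_n_minus_1 = statistics.variance(scores)  # SS / (N - 1)

# The two versions always differ by the factor N / (N - 1).
ratio = var_n_minus_1 / var_n
print("SS / N:      ", var_n)
print("SS / (N - 1):", var_n_minus_1)
print("ratio:", ratio, "expected N / (N - 1):", n / (n - 1))

# The factor shrinks toward 1 as the sample size grows.
factors = {size: size / (size - 1) for size in (5, 10, 100, 1000)}
for size, factor in factors.items():
    print(f"N = {size:5d}  N / (N - 1) = {factor:.4f}")
```

The loop at the end shows why the distinction matters less for large samples: the factor is 1.25 for N = 5 but only about 1.001 for N = 1000.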

Introductory Statistics coursework will expand on the difference between these two variance computations. In this app we use the first of the two versions, which is typically called the sample variance. Since the 'area of a square' concept is related to the numerator, the distinction between the two expressions doesn't alter the way we understand Sum of Squares.

## Tools for Statistics Instruction using R and Shiny

Author: Bruce Dudek at the University at Albany

Built using Shiny by RStudio and R, the statistical programming language.

This app works well as an adjunct to the use of Excel for implementing the Sum of Squares computation. In a spreadsheet it is simple for the student to see each of the deviations, their squares, and the summing/averaging process that produces the sample variance. The idea of averaging squares makes the expression come to life for many students who would otherwise treat formulas only as memorization tasks.

Ver 1.0, Aug. 21, 2018