Descriptive Statistics

Data Sets and Variables Available for Analysis

Variables from these data sets are useful for exploring questions about the shapes of distributions, outliers, bin widths in frequency histograms, and kernel density smoothing techniques. This application is intended for the evaluation of quantitative variables.
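As a rough illustration of how bin width and kernel smoothing choices can be computed, the following Python sketch implements two common histogram rules (Sturges and Freedman-Diaconis) and a basic Gaussian kernel density estimate. These are standard textbook formulas and are not necessarily the ones this application uses. The function names and the sample values are hypothetical and are not taken from any of the data sets described here.

```python
import math
import statistics


def sturges_bins(values):
    """Number of bins by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(len(values))) + 1


def freedman_diaconis_width(values):
    """Bin width by the Freedman-Diaconis rule: 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(values) ** (1 / 3)


def gaussian_kde(x, values, bandwidth):
    """Gaussian kernel density estimate at point x for a given bandwidth."""
    n = len(values)
    total = 0.0
    for v in values:
        z = (x - v) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / (n * bandwidth)


# Made-up illustrative values (assumption), right-skewed with one outlier.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9, 12, 15, 27]

print("Sturges bins:", sturges_bins(sample))
print("Freedman-Diaconis width:", round(freedman_diaconis_width(sample), 3))

# A larger bandwidth gives a smoother (flatter) density estimate.
for h in (1.0, 3.0):
    print("KDE at x=3 with bandwidth", h, "=", round(gaussian_kde(3, sample, h), 4))
```

Because the Freedman-Diaconis rule is based on the interquartile range, it is less affected by outliers than rules based on the range or standard deviation, which makes it a useful contrast when discussing skewed variables.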

Four data sets are available for examination (go to the Plots/Summary tab to use them):

  1. Aron Table 1-5 is a variable from the introductory statistics textbook: Aron, A., Coups, E. J., & Aron, E. (2013). Statistics for psychology (6th ed.). Boston: Pearson. The data set contains one variable, the number of social interactions during a week for 94 college students.

  2. Mouse Alcohol Drinking. Data are on alcohol consumption in a set of 379 mice tested across four days of two-bottle choice, in which they could drink fluid from either a bottle containing water or a bottle containing alcohol. The variable available from this data set is measured as a daily average of g/kg of alcohol consumed across the four days. Data are from Crabbe, J. C., Wahlsten, D., & Dudek, B. C. (1999) Genetics of mouse behavior: interactions with laboratory environment. Science, 284(5420), 1670-1672. PMID: 1035639.

  3. Morning Person Survey Question. Students in two introductory statistics classes were asked a question as part of a survey:
         To what extent are you a morning person?
              The answers used a 7-point rating scale where
              1 indicated “not at all a morning person”, and
              7 indicated “very much a morning person”.
    The data available here are the combined data from both classes.
    These data provide a good illustration of some of the problems associated with using Likert scales as if they were quantitative variables. They give instructors the opportunity to discuss the psychometric, statistical, and graphing issues that emerge. For example, do the data have only ordinal properties? Or can these variables be construed to reflect an underlying interval scale? What does it mean to compute an arithmetic mean or a variance/standard deviation on these types of data?

  4. Old Faithful Geyser Eruption Data Set. This data set on the famous Yellowstone geyser is included in the standard R distribution. It contains two variables. The first, “duration”, is the duration of each eruption (min). The second, “interval”, is the latency between successive eruptions (min).

    See the Wikipedia Entry on “Old Faithful”
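To make the questions raised for the Morning Person ratings (data set 3) concrete, the short Python sketch below compares summaries that assume interval properties (mean, standard deviation) with summaries that need only ordinal properties (median, frequency table). The responses are invented for illustration and are not the survey data.

```python
import statistics
from collections import Counter

# Hypothetical 7-point ratings (assumption), bunched at both ends of the scale.
responses = [1, 1, 1, 2, 2, 3, 4, 6, 7, 7, 7]

# Summaries that treat the ratings as interval-scale numbers.
mean_rating = statistics.mean(responses)
sd_rating = statistics.stdev(responses)

# Summaries that rely only on the ordering of the categories.
median_rating = statistics.median(responses)
counts = Counter(responses)

print("mean:", round(mean_rating, 2), "sd:", round(sd_rating, 2))
print("median:", median_rating)
for category in range(1, 8):
    # A text bar chart: one '#' per respondent in each category.
    print(category, "#" * counts[category])
```

With ratings bunched at both extremes, the mean lands on a part of the scale that few respondents chose, which is one way to open a discussion of whether a mean or standard deviation is meaningful for such data.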

Tools for Statistics Instruction using R and Shiny

Author: Bruce Dudek at the University at Albany.

Assistance with R coding was provided by Jason Bryer, University at Albany and Excelsior College.

The equal area histogram and diagonally cut histogram use the “dhist” function provided by Denby and Mallows, 2009 (Journal of Computational and Graphical Statistics, 18(1), pp. 21-31).

Built using Shiny by RStudio and R, the statistical programming language.

Ver 1.5, Jan 22, 2017