The very useful dabestr package for R creates Gardner-Altman style plots and is described here:
Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A (2019). “Moving beyond P values: Everyday data analysis with estimation plots.” doi: 10.1038/s41592-019-0470-3, https://rdcu.be/bHhJ4.
https://cran.r-project.org/web/packages/dabestr/index.html
Work in this app is based on the following literature:
References
Altman, D., Machin, D., Bryant, T., & Gardner, M. (2013). Statistics with confidence: confidence intervals and statistical guidelines. John Wiley & Sons.
Austin, P. C., & Hux, J. E. (2002). A brief note on overlapping confidence intervals. Journal of Vascular Surgery, 36(1), 194-195.
Baguley, T. (2012). Serious stats: a guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.
Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers Misunderstand Confidence Intervals and Standard Error Bars. Psychological Methods, 10(4), 389-396. https://doi.org/10.1037/1082-989X.10.4.389
Bolker, B. (2015). Overlapping confidence intervals. https://rstudio-pubs-static.s3.amazonaws.com/132971_a902bb2b962b407e9e9436559c6f5d36.html
Cumming, G. (2007). Inference by eye: Pictures of confidence intervals and thinking about levels of confidence. Teaching Statistics, 29(3), 89-93.
Cumming, G. (2009). Inference by eye: reading the overlap of independent confidence intervals. Statistics in Medicine, 28(2), 205-220.
Cumming, G., Fidler, F., & Vaux, D. L. (2007). Error bars in experimental biology. Journal of Cell Biology, 177(1), 7-11. https://doi.org/10.1083/jcb.200611141
Cumming, G., & Finch, S. (2005). Inference by eye: confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170-180. https://doi.org/10.1037/0003-066X.60.2.170
Cumming, G., Williams, J., & Fidler, F. (2004). Replication and Researchers' Understanding of Confidence Intervals and Standard Error Bars. Understanding Statistics, 3(4), 299-311. https://doi.org/10.1207/s15328031us0304_5
Finch, S., & Cumming, G. (2009). Putting research in context: understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34(9), 903-916. https://doi.org/10.1093/jpepsy/jsn118
Franz, V. H., & Loftus, G. R. (2012). Standard errors and confidence intervals in within-subjects designs: generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts. Psychonomic Bulletin & Review, 19(3), 395-404. https://doi.org/10.3758/s13423-012-0230-1
Gardner, M. J., & Altman, D. G. (1986). Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal (Clinical Research Ed.), 292(6522), 746-750. https://doi.org/10.1136/bmj.292.6522.746
Goldstein, H., & Healy, M. J. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(1), 175-177.
Ho, J., Tumkaya, T., Aryal, S., Choi, H., & Claridge-Chang, A. (2019). Moving beyond P values: data analysis with estimation graphics. Nature Methods, 16(7), 565-566. https://doi.org/10.1038/s41592-019-0470-3
Julious, S. A. (2004). Using confidence intervals around individual means to assess statistical significance between two means. Pharmaceutical Statistics: The Journal of Applied Statistics in the Pharmaceutical Industry, 3(3), 217-222.
Knezevic, A. (2008). Overlapping confidence intervals and statistical significance. StatNews: Cornell University Statistical Consulting Unit, 73(1).
Marmolejo-Ramos, F., & Matsunaga, M. (2009). Getting the most from your curves: exploring and reporting data using informative graphical techniques. Tutorials in Quantitative Methods for Psychology, 5, 40-50.
Masson, M. E. J. (2004). “Using confidence intervals for graphically based data interpretation”: Correction to Masson and Loftus (2003). Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 58(4), 289. https://doi.org/10.1037/h0087451
Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 57(3), 203-220. https://doi.org/10.1037/h0087426
Morrison, G. R., & Weaver, B. (1995). Exactly how many p values is a picture worth? A commentary on Loftus’s plot-plus-error-bar approach. Behavior Research Methods, Instruments, & Computers, 27(1), 52-56.
Noguchi, K., & Marmolejo-Ramos, F. (2016). Assessing Equality of Means Using the Overlap of Range-Preserving Confidence Intervals. The American Statistician, 70(4), 325-334. https://doi.org/10.1080/00031305.2016.1200487
Payton, M. E., Greenstone, M. H., & Schenker, N. (2003). Overlapping confidence intervals or standard error intervals: what do they mean in terms of statistical significance? Journal of Insect Science, 3(1).
Payton, M. E., Miller, A. E., & Raun, W. R. (2000). Testing statistical hypotheses using standard error bars and confidence intervals. Communications in Soil Science and Plant Analysis, 31(5-6), 547-551.
Ryan, G. W., & Leadbetter, S. D. (2002). On the misuse of confidence intervals for two means in testing for the significance of the difference between the means. Journal of Modern Applied Statistical Methods, 1(2), 56.
Schenker, N., & Gentleman, J. F. (2001). On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals. The American Statistician, 55(3), 182-186. https://doi.org/10.1198/000313001317097960
Wolfe, R., & Hanley, J. (2002). If we're so different, why do we keep overlapping? When 1 plus 1 doesn't make 2. CMAJ, 166(1), 65-66.
Wright, T., Klein, M., & Wieczorek, J. (2018). A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals. The American Statistician, 73(2), 165-178. https://doi.org/10.1080/00031305.2017.1392359