eScholarship
Open Access Publications from the University of California

Articles

An educator’s perspective of the tidyverse

Computing makes up a large and growing component of data science and statistics courses. Many of those courses, especially when taught by faculty who are statisticians by training, teach R as the programming language. A number of instructors have opted to build much of their teaching around use of the tidyverse. The tidyverse, in the words of its developers, "is a collection of R packages that share a high-level design philosophy and low-level grammar and data structures, so that learning one package makes it easier to learn the next" (Wickham et al. 2019). These shared principles have led to the widespread adoption of the tidyverse ecosystem. No small part of this adoption stems from the fact that the tidyverse tools have been intentionally designed to ease the learning process and reduce cognitive load for users as they engage with each new piece of the larger ecosystem. Moreover, the functionality offered by the packages within the tidyverse spans the entire data science cycle, which includes data import, visualization, wrangling, modeling, and communication. We believe the tidyverse provides an effective and efficient pathway to data science mastery for students at a variety of experience levels. In this paper, we introduce the tidyverse from an educator's perspective, touching on the what (a brief introduction to the tidyverse), the why (pedagogical benefits, opportunities, and challenges), the how (scoping and implementation options), and the where (details on courses, curricula, and student populations).

Technology Innovations

NormalityAssessment: An Interactive Classroom Tool for Testing Normality Visually

Many inferential and predictive statistical procedures rest on theoretical assumptions that should be met for the results of those procedures to be considered reliable. One assumption associated with methods for population means, including linear regression coefficients, is normality of the underlying population(s). Two graphical tools often used to assess normality are normal quantile-quantile (QQ) plots and histograms. Although these tools are popular, they still present challenges for many users because interpreting them is often subjective. In this article, we describe a free, interactive Shiny application, downloadable as an R package, which implements two recently developed graphical inference procedures with a specific emphasis on assessing normality. The application was designed with a focus on statistics education.

Supporting Statistics and Data Science Education with learnr

A modern statistics or data science course aims to equip students with both conceptual and computing skills. This is a challenging task: instructors do not want to increase students’ cognitive load with new tools and technical details, and they must balance limited teaching time so that students achieve the learning outcomes for both content and tool use. Interactive tutorials, built with the R package learnr, can support student learning through progressive reveal of content, interactive code exercises, quizzes with automatic feedback, and an interface that has the potential to reduce technical burdens when deployed as a web application. Based on our own teaching experiences, we describe different use cases for learnr tutorials in introductory and upper-level statistics and data science courses. We also discuss the common benefits of, and lessons learned from, implementing and teaching with learnr tutorials.