Volume 12, Issue 1, 2019
When experienced analysts explore data in a rich environment, they often transform the dataset. For example, they may choose to group or filter data, calculate new variables and summary measures, or reorganize a dataset by changing its structure or merging it with other information. Such actions background, highlight, or even fundamentally change particular features of the data, allowing different types of questions to be explored. We call these actions data moves. In this paper, we argue that paying explicit attention to data moves, as well as their purposes and consequences, is necessary for educators to support student learning about data. This is especially needed in an era when students are expected to develop critical literacy around data and engage in purposeful, self-directed exploration of large and often complex datasets.
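To make the idea concrete, the four kinds of data moves named above (filtering, grouping with a summary measure, calculating a new variable, and merging) can be sketched in a few lines of plain Python. The records, field names, and the `homeroom` lookup table below are hypothetical and invented purely for illustration; they do not come from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration only.
students = [
    {"name": "Ana", "grade": 9, "height_cm": 160, "arm_span_cm": 158},
    {"name": "Ben", "grade": 9, "height_cm": 170, "arm_span_cm": 173},
    {"name": "Cy", "grade": 10, "height_cm": 175, "arm_span_cm": 175},
    {"name": "Di", "grade": 10, "height_cm": 165, "arm_span_cm": 161},
]
# Hypothetical second table, keyed by grade, to merge in.
homeroom = {9: "Room A", 10: "Room B"}

# Filter: keep only the students who are at least 165 cm tall.
tall = [s for s in students if s["height_cm"] >= 165]

# Calculate a new variable: arm span minus height for each student.
with_diff = [{**s, "span_minus_height": s["arm_span_cm"] - s["height_cm"]}
             for s in students]

# Group and summarize: mean height within each grade.
heights_by_grade = defaultdict(list)
for s in students:
    heights_by_grade[s["grade"]].append(s["height_cm"])
mean_height = {grade: mean(hs) for grade, hs in heights_by_grade.items()}

# Merge: attach the homeroom from the second table to every record.
merged = [{**s, "homeroom": homeroom[s["grade"]]} for s in students]
```

Each move leaves the original `students` list untouched and produces a new view of it, which is one way to see how a move foregrounds some features of the data (here, tall students or grade-level averages) while backgrounding others.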
This paper revisits the box model, a metaphor developed by David Freedman to explain sampling distributions and statistical inference to introductory statistics students. The basic idea is to represent all random phenomena in terms of drawing tickets at random from a box. In this way, random sampling from a population can be described in the same way as everyday phenomena, like coin tossing and card dealing. For Freedman, box models were merely a thought experiment; calculations were still done using normal approximations. In this paper, we propose a more modern view that treats the box model as a practical simulation framework for conducting inference. We show how concepts in introductory statistics and probability classes can be motivated by simulating from a box model. To facilitate this simulation-based approach to teaching box models, we developed an online, open-source "box model simulator".
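As a rough illustration of simulating from a box model, the sketch below represents 100 coin tosses as 100 draws with replacement from a box holding the tickets 0 and 1, then repeats the experiment many times to approximate the sampling distribution of the sum. It is a minimal sketch under stated assumptions, not the paper's simulator: the function names `draw_from_box` and `simulate_sums`, the seed, and the number of repetitions are all hypothetical choices made here.

```python
import random
from statistics import mean, pstdev

def draw_from_box(box, n_draws, rng, replace=True):
    """Draw n_draws tickets at random from the box (hypothetical helper)."""
    if replace:
        return [rng.choice(box) for _ in range(n_draws)]
    return rng.sample(box, n_draws)  # without replacement, like card dealing

def simulate_sums(box, n_draws, n_reps, seed=0):
    """Repeat the experiment n_reps times and record the sum of the draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(draw_from_box(box, n_draws, rng)) for _ in range(n_reps)]

# Coin tossing: tickets 0 (tails) and 1 (heads); the sum counts heads.
coin_box = [0, 1]
n_draws = 100
sums = simulate_sums(coin_box, n_draws=n_draws, n_reps=2000)

# Simulation-based estimate of the chance of 60 or more heads.
p_at_least_60 = sum(s >= 60 for s in sums) / len(sums)

# Reference values for the sum of draws made with replacement:
# expected value = n * (box average), standard error = sqrt(n) * (box SD).
expected_sum = n_draws * mean(coin_box)
standard_error = (n_draws ** 0.5) * pstdev(coin_box)

print(mean(sums), pstdev(sums), p_at_least_60)
print(expected_sum, standard_error)
```

Comparing the simulated mean and spread of `sums` with `expected_sum` and `standard_error` shows how the simulation and the normal-approximation calculation describe the same sampling distribution. Changing the contents of `coin_box`, or passing `replace=False` to `draw_from_box`, adapts the same sketch to other chance processes.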