Publications & Resources

Sampling Variability of Performance Assessments

Mar 1993

Richard J. Shavelson, Xiaohong Gao, and Gail P. Baxter

The authors of this study examined the sources of measurement error in a number of science performance assessments. In one part of the study, 186 fifth- and sixth-grade students completed each of three science tasks: an experiment to measure the absorbency of paper towels; a task that measured students’ ability to infer the electrical contents of a black “mystery box”; and a task requiring students to determine sow bugs’ preferences for various environments (damp vs. dry, light vs. dark). The researchers found that the measurement error was largely due to task sampling variability: student performance varied significantly from one task to another. Based on their study of both science and mathematics performance assessments, the authors concluded that “regardless of the subject matter (mathematics or science), domain (education or job performance) or the level of analysis (individual or school), large numbers of tasks are needed to get a generalizable [dependable] measure of performance.” Given that science experiments are time consuming, the roughly ten tasks needed may represent a significant cost burden for schools, districts, or even states.

In another part of the study, the researchers compared several methods of assessing students on the same experiments: direct observation of students conducting the hands-on tasks; a notebook method, in which students conducted the experiment and then described in a notebook the procedures they followed and their conclusions; computer simulations of the tasks; and short-answer problems in which students answered questions about planning, analyzing, or interpreting the tasks. The notebooks and direct observation were the only methods that appeared to be fairly interchangeable; the results from both the short-answer problems and the computer simulations were disappointing. Increasing the number of tasks is costly and time consuming, the authors conclude, but they warn that trying to explain away technical problems is dangerous.
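The study’s central point, that large task-to-task variability forces many tasks for a dependable score, comes from generalizability theory. A minimal sketch of the idea, using simulated (not the study’s) data and a standard two-way person-by-task variance decomposition, shows how the generalizability coefficient climbs as tasks are added:

```python
import numpy as np

# Illustrative sketch only (simulated data, not the authors' analysis):
# a person-x-task generalizability analysis showing why large
# task-sampling variability demands many tasks for a dependable score.
rng = np.random.default_rng(0)
n_persons, n_tasks = 186, 3  # mirrors the study's design

# Simulated scores: true ability + task effect + large person-x-task noise
ability = rng.normal(0.0, 1.0, (n_persons, 1))
task = rng.normal(0.0, 0.5, (1, n_tasks))
noise = rng.normal(0.0, 1.5, (n_persons, n_tasks))  # task-sampling variability
scores = ability + task + noise

# Variance components from a two-way crossed ANOVA (one score per cell)
grand = scores.mean()
ms_p = n_tasks * np.sum((scores.mean(axis=1) - grand) ** 2) / (n_persons - 1)
ss_res = np.sum((scores
                 - scores.mean(axis=1, keepdims=True)
                 - scores.mean(axis=0, keepdims=True)
                 + grand) ** 2)
ms_res = ss_res / ((n_persons - 1) * (n_tasks - 1))

var_res = ms_res                   # person-x-task interaction (+ error)
var_p = (ms_p - ms_res) / n_tasks  # true person variance

def g_coef(k):
    """Generalizability coefficient for a k-task average (relative decisions)."""
    return var_p / (var_p + var_res / k)

print(f" 3 tasks: {g_coef(3):.2f}")
print(f"10 tasks: {g_coef(10):.2f}")
```

With the large interaction variance assumed here, averaging over three tasks yields a mediocre coefficient, while ten tasks raise it substantially, which is the cost-versus-dependability trade-off the authors describe.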

Shavelson, R. J., Gao, X., & Baxter, G. P. (1993). Sampling variability of performance assessments (CSE Report 361). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).