
Generalizability of New Standards Project 1993 Pilot Study Tasks in Mathematics

Jan 1995

Robert L. Linn, Elizabeth Burton, Lizanne DeStefano, and Matthew Hanson


Students may have to take as many as 9 to 17 “long” performance assessment tasks if educators are to be confident that student performance matches true ability in a given domain, according to this important new CRESST study. Because a long task typically requires complex, multifaceted responses and takes one to three hours to administer, the time and cost implications are significant. The performance tasks analyzed are from the New Standards Project, a joint project of the National Center on Education and the Economy and the Learning Research and Development Center. Robert Linn, Elizabeth Burton, Lizanne DeStefano, and Matthew Hanson conducted the CRESST study.

Using a generalizability analysis of New Standards tasks, the CRESST researchers analyzed two primary sources of measurement error that typically make measures of student performance unreliable: performance tasks and raters, together with the interactions of students with tasks or raters. Because the New Standards raters were carefully trained and monitored, rating consistency was generally very high. The greatest source of error, therefore, was the tasks. Essentially, student performance varied greatly from one performance task to another, suggesting either that the tasks measure different skills or that the skills were not measured well by the different tasks. The results confirm findings from several other studies. States or school districts that administer just a few performance tasks and then report individual student scores may face unacceptably large measurement error.

The authors make recommendations that may help resolve some of these problems. “Since each task,” write the authors, “requires an hour or more to administer, a strategy needs to be developed either for combining some shorter tasks with long tasks or for collecting information about student performance over more extended periods of time.” The authors add that researchers in the New Standards Project are pursuing both strategies.
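The reasoning above, that task-to-task variation rather than rater disagreement drives the number of tasks a student must take, can be illustrated with a generalizability decision (D) study. The sketch below assumes a fully crossed persons × tasks × raters design and computes the generalizability coefficient for relative decisions as the universe-score variance divided by itself plus the relative error variance. The variance components used are hypothetical illustrative values, not the study's estimates, and the names `VarianceComponents`, `g_coefficient`, and `tasks_needed` are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VarianceComponents:
    """Hypothetical variance components for a crossed p x t x r design."""
    person: float        # sigma^2_p: universe-score ("true") variance
    person_task: float   # sigma^2_pt: person-by-task interaction
    person_rater: float  # sigma^2_pr: person-by-rater interaction
    residual: float      # sigma^2_ptr,e: three-way interaction plus error


def g_coefficient(vc: VarianceComponents, n_tasks: int, n_raters: int) -> float:
    """Generalizability coefficient for relative decisions.

    Each interaction component is divided by the number of conditions it is
    averaged over; the person variance is not reduced by adding tasks.
    """
    relative_error = (
        vc.person_task / n_tasks
        + vc.person_rater / n_raters
        + vc.residual / (n_tasks * n_raters)
    )
    return vc.person / (vc.person + relative_error)


def tasks_needed(
    vc: VarianceComponents, target: float, n_raters: int, max_tasks: int = 100
) -> Optional[int]:
    """Smallest number of tasks reaching `target`, or None if none up to max_tasks does."""
    for n_tasks in range(1, max_tasks + 1):
        if g_coefficient(vc, n_tasks, n_raters) >= target:
            return n_tasks
    return None


if __name__ == "__main__":
    # Hypothetical values: a large person-by-task component and small
    # rater-related components, mirroring the qualitative pattern described.
    vc = VarianceComponents(person=1.0, person_task=1.0, person_rater=0.02, residual=0.1)
    print(g_coefficient(vc, n_tasks=1, n_raters=1))
    print(tasks_needed(vc, target=0.8, n_raters=1))
    print(tasks_needed(vc, target=0.8, n_raters=2))
    print(tasks_needed(vc, target=0.9, n_raters=1))
```

With these made-up components, a single task yields a coefficient of about 0.47, reaching 0.80 takes 5 tasks whether one or two raters score each response, and reaching 0.90 takes 13 tasks. Because the person-by-task term shrinks only as tasks are added, adding raters barely helps, which is the qualitative pattern the abstract describes.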

Linn, R. L., Burton, E., DeStefano, L., & Hanson, M. (1995). Generalizability of New Standards Project 1993 pilot study tasks in mathematics (CSE Report 392). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).