Publications & Resources

The Reliability of Scores From the 1992 Vermont Portfolio Assessment Program

Feb 1993

Daniel Koretz, Daniel McCaffrey, Stephen Klein, Robert Bell and Brian Stecher

A follow-up to an earlier study (see CSE Report 350), this report presents CRESST's findings about the reliability of scores from the Vermont portfolio assessment program. In this component, we focused not on the program's impact as an educational intervention, but rather on its quality as an assessment tool. Rater reliability (that is, the extent of agreement between raters about the quality of students' portfolio work) was on average low in both mathematics and writing. Reliability varied with subject, grade level, and the particular scoring criterion, and in a few instances it could be characterized as moderate. Overall, however, the pattern was one of low reliability, and in no instance was the scoring highly reliable. Although it may be unrealistic to expect the reliability of portfolio scores to reach the levels obtained in standardized performance assessments, the Vermont portfolio assessment reliability coefficients are low enough to seriously limit the uses of the 1992 assessment results. The report concludes with an analysis of issues that need to be considered in improving the technical quality of the assessment.

Koretz, D., McCaffrey, D., Klein, S., Bell, R., & Stecher, B. (1993). The reliability of scores from the 1992 Vermont portfolio assessment program (CSE Report 355). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).