Value-Added Interpretations

Student growth measures can be used within systems for evaluating teachers, schools, and districts. Several methods attempt to estimate the distinct effects of teachers and schools on student growth. These methods are intended to support value-added interpretations.

The Multivariate Model

The Multivariate model is designed for the primary purpose of supporting value-added inferences for teachers and schools. By simultaneously considering multiple years of student scores across subject areas, the Multivariate model attempts to attribute student performance to individual teachers and schools. A well-known example of the Multivariate model is the Education Value-Added Assessment System (EVAAS). Because of its technical complexity and extensive data requirements, the Multivariate model is not supported here.

Value-Added Interpretations from Other Models

With proper attribution of student growth measures, value-added interpretations can be made from simpler models, including the Conditional Score Averages, Projection, and Student Growth Percentile models.

Note that the Conditional Score Averages and Projection models both produce predicted scores. For a group of students, a normative measure of aggregate growth can be constructed by averaging the residual scores (the differences between actual and predicted scores). If the mean residual score is significantly greater than 0, then the group of students performed better than predicted, on average. If the mean residual score is significantly less than 0, then the group of students performed worse than predicted, on average. Simple statistical tests can be used to determine if the mean residual score is significantly greater or less than 0.
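As a minimal sketch of this calculation, the Python code below computes residual scores, their mean, and a one-sample t-test of the mean against 0. The score pairs and the function names (t_two_sided_p, one_sample_t_test) are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule so that only the standard library is needed. In practice a statistics package routine (for example, scipy.stats.ttest_1samp) would normally be used instead.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value for Student's t, by Simpson's rule on the t density."""
    # Log of the normalizing constant of the t density (lgamma avoids overflow).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # probability between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def one_sample_t_test(values, null_mean=0.0):
    """Return (mean, t statistic, two-sided p-value); assumes values vary."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)  # standard error of the mean
    t_stat = (m - null_mean) / se
    return m, t_stat, t_two_sided_p(t_stat, n - 1)


# Hypothetical (actual, predicted) score pairs for one group of students.
scores = [(18, 16.8), (21, 19.5), (15, 15.9), (23, 21.2),
          (19, 18.1), (17, 17.4), (22, 20.3), (20, 18.7)]
residuals = [actual - predicted for actual, predicted in scores]

mean_resid, t_stat, p_value = one_sample_t_test(residuals)
print(f"mean residual = {mean_resid:.3f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

The p-value here is two-sided; whether a one-sided or two-sided test is appropriate depends on the evaluation design and is left as an assumption.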

For the Student Growth Percentile (SGP) model, the mean (or median) SGP is a normative measure of aggregate growth. If the mean SGP is significantly greater than 50, then the group of students performed better than predicted, on average. If the mean SGP is significantly less than 50, then the group of students performed worse than predicted, on average. Simple statistical tests can be used to determine if the mean SGP is significantly greater or less than 50.
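One way to sketch the comparison against 50 is to summarize a group's SGPs and check whether 50 falls inside a confidence interval for the mean. The example below is illustrative only: the SGP values and the function name sgp_summary are hypothetical, and the interval uses a normal approximation rather than a t-based critical value, which is a simplification that is less accurate for small groups.

```python
import math
from statistics import NormalDist, mean, median, stdev


def sgp_summary(sgps, reference=50.0, confidence=0.95):
    """Summarize a group's SGPs and compare the mean with the reference value."""
    n = len(sgps)
    m = mean(sgps)
    se = stdev(sgps) / math.sqrt(n)  # standard error of the mean
    # Normal-approximation critical value (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    low, high = m - z * se, m + z * se
    return {
        "mean": m,
        "median": median(sgps),
        "ci": (low, high),
        # True when the reference value lies outside the interval.
        "differs_from_reference": not (low <= reference <= high),
    }


# Hypothetical SGPs for one group of students in one subject area.
sgps = [62, 45, 71, 38, 55, 80, 49, 66, 23, 58, 74, 41]
summary = sgp_summary(sgps)
print(summary)
```

If the interval excludes 50, the sign of the difference between the mean and 50 indicates whether aggregate growth is above or below the norm.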

Value-added interpretations of teachers and schools are better supported when individual student performance can be attributed directly to the teacher and school. Factors that negatively affect confidence in attribution include:

  • Misalignment of timing of assessments and timing of instruction
  • Misalignment of assessment content coverage to instructional content coverage
  • Team teaching, multidisciplinary instruction, or other situations where multiple teachers affect student learning
  • Student migration, absenteeism, and other personal events that affect academic performance

Generally, attribution is difficult because many forces act on student performance and their effects are hard to disentangle. Because of difficulties with attribution and year-to-year inconsistency in estimates of teacher and school effects, many experts suggest that teacher and school effect estimates should supplement, not replace, other sources of information for evaluation.

In some cases, it may be unreasonable to attribute student performance to a single teacher. In other cases, it may be more reasonable to assign weights to student growth measures representing the level of attribution to individual teachers.
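The weighting idea can be sketched as a weighted mean of student growth measures. The text does not prescribe a weighting scheme, so the one below is an assumption: each student's growth measure (for example, a residual score) carries a weight between 0 and 1 representing the share of instruction attributed to the teacher. The function name weighted_mean_growth and the data are hypothetical.

```python
def weighted_mean_growth(records):
    """Weighted mean of growth measures.

    records is a list of (growth_measure, weight) pairs, where weight is the
    assumed share of the student's instruction attributed to the teacher.
    """
    total_weight = sum(weight for _, weight in records)
    if total_weight == 0:
        raise ValueError("no student growth is attributed to this teacher")
    weighted_sum = sum(growth * weight for growth, weight in records)
    return weighted_sum / total_weight


# Hypothetical residual scores with attribution weights for one teacher:
# two students taught all year, one team-taught, one who transferred in late.
teacher_records = [(1.5, 1.0), (-0.5, 0.5), (2.0, 0.25), (0.8, 1.0)]
weighted_mean = weighted_mean_growth(teacher_records)
print(f"weighted mean growth = {weighted_mean:.3f}")
```

Students with a weight of 0 drop out of the calculation entirely, which corresponds to the case where attribution to that teacher is judged unreasonable.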

Examples

In the first two examples below, students’ predicted scores are compared to their actual scores to obtain a residual score. If the mean residual score is significantly greater than 0, then the group of students performed better than predicted, on average. If the mean residual score is significantly less than 0, then the group of students performed worse than predicted, on average. A simple statistical test known as the one-sample t-test is used to determine if the mean residual score is significantly greater or less than 0. If it is significantly greater than 0, the aggregate growth is classified as “Above Expected”; if it is significantly less than 0, the aggregate growth is classified as “Below Expected.” Otherwise, the aggregate growth is classified as “Expected.” In the third example, a statistical test is used to determine if the mean SGP is significantly greater or less than 50.
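The classification rule described above can be written as a small function. This is a sketch under stated assumptions: the function name classify_growth is hypothetical, the 0.05 threshold is taken from the p-value comparisons in the examples, and the p-value is assumed to come from a test of the mean against the reference value (0 for residual scores, 50 for SGPs).

```python
def classify_growth(mean_value, p_value, reference=0.0, alpha=0.05):
    """Classify aggregate growth from a group mean and its test p-value."""
    if p_value < alpha:
        # The mean differs significantly from the reference value.
        if mean_value > reference:
            return "Above Expected"
        if mean_value < reference:
            return "Below Expected"
    # Not significant (or exactly equal to the reference).
    return "Expected"


# Hypothetical inputs: residual-based means use reference 0, SGP means use 50.
print(classify_growth(0.9, 0.01))                   # residual scores
print(classify_growth(-0.3, 0.269))                 # residual scores
print(classify_growth(47.48, 0.40, reference=50))   # mean SGP
```

The same function serves all three examples; only the reference value changes between the residual-based models and the SGP model.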

Example 1

In this example, 124 students took the ACT Explore assessment in 9th grade and then took the ACT Plan assessment in 10th grade, 12 months later. Students’ predicted ACT Plan scores are determined by their ACT Explore score in the same subject area using the Conditional Score Averages model. Predicted ACT Plan scores can be compared to actual ACT Plan scores to obtain residual scores that measure growth relative to peers. The predicted values and residual scores are highlighted in the spreadsheet. The worksheet named “Conditional Score Averages” is obtained from the Conditional ACT Explore grade 9 to ACT Plan grade 10 (10 to 14 months) file.

The mean residual scores are calculated for each subject area. Each of the mean residual scores is greater than 0, and the t-test p-values (<0.05) indicate that the means are significantly greater than 0. Therefore, the growth in each subject area is classified as “Above Expected.”

Example 2

In this example, 131 students took the ACT Plan assessment in 10th grade and then took the ACT in 11th grade (11 or 12 months later). Students’ predicted ACT scores are determined by their ACT Plan scores in all four subject areas, as well as the number of months elapsed between the ACT Plan and ACT tests. Predicted ACT scores can be compared to actual ACT scores to obtain residual scores that measure growth relative to peers. The predicted values and residual scores are highlighted in the spreadsheet. The worksheet named “Projection Model Parameters” is obtained from the Projection ACT Plan grade 10 to ACT grade 11 (10–14 months) file.

The mean residual scores are calculated for each subject area. For English, reading, and science, the mean of the residual scores is less than 0, and the t-test p-values (<0.05) indicate that the means are significantly less than 0. Therefore, the growth in English, reading, and science is classified as “Below Expected.” For mathematics, the mean of the residual scores is less than 0, but the t-test p-value (0.269) indicates that the mean is not significantly less than 0. Therefore, the growth in mathematics is classified as “Expected.”

Example 3

In this example, 50 students took ACT Aspire in the spring of grade 10 and then took the ACT in the spring of grade 11. Each student was tested in four subject areas (English, mathematics, reading, and science). The spreadsheet displays each student’s ACT Aspire and ACT scores, as well as their SGP in each subject area. SGP values are highlighted.

The mean SGP values are calculated for each subject area. The mean SGP values range from 47.48 for reading to 53.22 for English. For each subject area, the t-test p-value is greater than 0.05, so there is not enough evidence to conclude that the mean SGP is significantly different from 50 in any subject area. Therefore, the growth is classified as “Expected” for each subject area.