One Picture Does Not Always Tell the Entire Story
ACT Response to AIR Editorial
The American Institutes for Research (AIR) recently posted an opinion piece written by its president of assessment, Jon Cohen, entitled “College Entrance Exams as Statewide Accountability Exams: Why Not? Here is a Reason.” What is that reason to which he refers?
After providing scant and suspect supporting evidence, Cohen concludes that (1) college admissions tests may not cover every content standard, (2) admissions tests may not provide adequate information about the postsecondary readiness of the lowest-performing students, and (3) consortia-developed Common Core tests such as PARCC and Smarter Balanced are preferable for use in college admissions.
Cohen makes a lot of claims and assertions but fails to acknowledge that national admissions tests are among the most scrutinized assessments in the world. They have been the subject of serious scientific review for decades, including studies by the National Academy of Sciences, hundreds of independent academic research studies, and hundreds of additional studies conducted by the assessment organizations themselves. This research has focused on a range of important issues, including bias, subgroup differences, item-level validity, socioeconomic status, incremental validity, coaching, differential prediction, and predictive validity for all groups of students across many outcomes and many different types of institutions.
Cohen, however, prefers to base his claims on “one picture,” a scatterplot. He provides one figure, with no data or collateral information, from one state, in order to make a claim about assessments that he may perceive as a threat to those his company develops or supports.
One picture does not tell the story in measurement or science.
The new federal Every Student Succeeds Act (ESSA) allows states to propose the use of a national admissions test, such as the ACT or SAT, in place of their own statewide high school exam. What might that statewide high school exam be? Potentially one of the multi-state consortium assessments, such as PARCC or Smarter Balanced, or a custom state assessment created by AIR. In short, Cohen and his company have a stake in the outcome, and that alone is reason to take Cohen’s views on this issue with a grain of salt.
Nevertheless, let’s examine the reasoning and evidence provided by Cohen. Each of the three issues he identifies is addressed below.
Alignment
Alignment is given far greater weight in educational accountability today than many other important measurement characteristics, such as validity evidence. The claim is that national admissions tests do not measure every single state standard or Common Core standard. That’s true, and they shouldn’t.
The ACT test was not designed to measure the Common Core State Standards or any other set of state standards; rather, it is designed to measure the skills that ACT’s data and empirical research have shown to be most important for college and career readiness. Nevertheless, it still covers over 90 percent of the Common Core standards in far less time than either of the two federally funded consortia assessments.
The ACT (which includes a science test) takes less than three hours to complete without the optional writing test and just over 3.5 hours with the writing test. The Smarter Balanced high school assessment, in contrast, recommends 8.5 hours of testing time—without even touching science.
Assessments have always been designed to sample behavior and knowledge, and measurement professionals have long been able to construct valid and reliable tests that describe achievement and predict future performance without resorting to excessively lengthy tests that cover each and every standard. The obsession with covering every standard has produced state assessments that require multiple testing days for each student and testing windows that extend six to eight weeks, disrupting valuable classroom instructional time.
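The measurement point here can be made concrete with a standard psychometric result, the Spearman-Brown prophecy formula, which shows the diminishing returns of lengthening an already reliable test. The sketch below is illustrative only; the reliability values are hypothetical and are not drawn from any assessment named in this article.

```python
# Spearman-Brown prophecy formula: the reliability of a test whose
# length is changed by a factor k, given its current reliability r.
# The values used here are hypothetical, chosen only to illustrate
# how little reliability is gained by making a long test even longer.

def spearman_brown(r: float, k: float) -> float:
    """Projected reliability after lengthening a test by factor k."""
    return k * r / (1 + (k - 1) * r)

# A test with reliability 0.85 gains little from doubling its length...
print(round(spearman_brown(0.85, 2), 3))  # 0.919
# ...and proportionally even less from quadrupling it.
print(round(spearman_brown(0.85, 4), 3))  # 0.958
```

In other words, once a well-constructed sample of items is in place, piling on items to chase every remaining standard buys very little measurement precision at a substantial cost in testing time.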
The use of admissions tests for accountability reduces testing time dramatically, which seems to be an important objective of ESSA and a goal of parents and educators who have fueled opt-out movements in many states and communities around the country. The call by AIR leadership to require double-testing ensures that students will continue to devote more and more instructional time to assessments and less and less to learning. It also ignores the finding that 11th and 12th graders are often not motivated when taking state assessments that provide them no personal benefit.
The ACT is used by all colleges and universities for admissions and often for course placement decisions, and it also provides an early indication of whether students are on track to be ready for college and career. When all students in a school, district, or state complete an admissions test, longitudinal trend data can be used to examine growth in scores and the extent to which high schools have prepared students for college success. That seems like a legitimate goal for states interested in measuring college readiness. And, unlike the information provided by the PARCC and Smarter Balanced assessments, the information provided by the ACT is based not on judgment but on empirical data and established benchmarks for success.
It is also worth noting that, when the National Assessment of Educational Progress (NAEP) attempted to gauge how its results related to college readiness, it linked scores from the ACT and SAT to its scale and ignored state assessment results. Additional studies examined the relationship between ACT Explore and 8th-grade NAEP, and between ACT WorkKeys and NAEP, to support further inferences about college and career readiness for all students.
States certainly may decide that they prefer to develop their own test or to use a consortium test to measure state standards, but that decision should not rest on the claim that admissions tests cannot serve as valid and reliable measures of student readiness or provide valuable school-level trends that can inform accountability.
Finally, the quality of test questions and test forms is an essential ingredient of statewide assessments, and the consortium tests, including the Smarter Balanced tests that Cohen cites, continue to struggle with some of these issues. For example, a recent study by the Thomas B. Fordham Institute found that individual Smarter Balanced test forms sometimes “contained two or three items measuring the same skill that were nearly identical” and asserted that “such near-duplication may not impact the accuracy of the score, but a greater variety of question stems/scenarios is desirable.” The Fordham report also found Smarter Balanced math items that contained mathematical errors or lacked a correct answer.
Low Scores on Admissions Tests
Cohen contends that “existing college entrance exams fail to support accountability because they only measure the performance of students who are succeeding,” suggesting that the ACT doesn’t provide information on the bottom quarter of students in one state. In the ACT-tested high school graduating class, fewer than 3 percent of students score below a 21 on a combined English and reading score, and in the three Smarter Balanced states using the ACT as a statewide exam, fewer than 3.5 percent do. The data reported by AIR showing nearly a quarter of students below these scores do not appear plausible, and the accuracy of the dataset AIR used is questionable. The ACT provides useful information on students across the entire distribution of scores, and even Cohen’s own figure illustrates a strong relationship for over 97 percent of the U.S. population.
Cohen’s other assertions simply cannot be investigated because he provides no data or clarification. For example, he uses the same figure to claim that the ACT doesn’t provide valid scores for the majority of ELL students and students with disabilities, despite validity studies demonstrating the opposite. He does not report the sample size, nor does he tell us which state he is looking at, so we cannot compare that state’s demographics to the nation’s. Is the state representative of U.S. students in terms of ELL, minority, and urban populations? It’s impossible to tell, and while the casual reader may take the figure at face value, we should demand more from a research institution such as AIR, or from any organization making such claims.
He also does not report the correlation between the two tests or the data needed to determine whether Smarter Balanced has a ceiling effect that could contribute to his claims. Overall, Smarter Balanced is less reliable than the ACT and has even lower reliability at the top of its scale, and, since it is a brand-new test, additional data are needed to understand how it performs across the entire distribution of scores.
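To see why a ceiling effect matters, consider a simple simulation. The sketch below uses entirely made-up data, not ACT or Smarter Balanced results; it merely shows how a test that cannot differentiate among high scorers can weaken the apparent relationship in a scatterplot, especially at the top of the scale.

```python
# Illustrative sketch (hypothetical data only): how a score ceiling can
# distort the picture a single scatterplot paints.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Simulate a latent "true readiness" trait and two noisy measures of it.
true_score = rng.normal(0, 1, n)
test_a = true_score + rng.normal(0, 0.4, n)   # stand-in for one test
test_b = true_score + rng.normal(0, 0.4, n)   # stand-in for another test

# Impose an artificial ceiling on test_b: high scores pile up at the cap.
ceiling = 1.0
test_b_capped = np.minimum(test_b, ceiling)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

print("correlation, no ceiling:   ", round(corr(test_a, test_b), 3))
print("correlation, with ceiling: ", round(corr(test_a, test_b_capped), 3))

# The distortion is worst among high scorers, where the capped test
# can no longer differentiate students.
top = test_a > 1.0
print("top group, no ceiling:     ", round(corr(test_a[top], test_b[top]), 3))
print("top group, with ceiling:   ", round(corr(test_a[top], test_b_capped[top]), 3))
```

Without the correlation, the sample size, and distributional information of this kind, a single scatterplot cannot support the conclusions Cohen draws from it.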
Use of Smarter Balanced for Admissions
Postsecondary institutions use many factors in admissions, and while national admissions tests are considered important by a majority of institutions, other factors such as grades, course rigor, and extracurricular experiences are also used in comprehensive admissions systems. ACT firmly believes that admissions test scores can help confirm a student’s abilities and skills for college or career training programs, but we also believe that test results should be one of a variety of factors in an admissions decision. Students are more than a single test score.
State or consortium assessments may eventually be able to offer additional collateral evidence about college success if and when predictive validity studies are conducted, but using Smarter Balanced results for admissions before appropriate validity studies have been conducted for all groups of students would be inappropriate today.
But even if such evidence were available, Smarter Balanced tests are taken by only a portion of students attending public high schools in 15 states, while the ACT and SAT are used by and accessible to all students in all states and internationally, regardless of what school they attend.
So, despite the lack of any published empirical studies on postsecondary outcomes of students taking Smarter Balanced tests, Cohen believes that decades of research and longitudinal data from admissions tests can easily be replaced with a test used by a subset of states for accountability.
In addition, the Smarter Balanced test administration model does not provide the level of security that admissions tests provide. Performance tasks are reused across forms and, with the long testing windows states use, some students have access to test questions weeks before other students in their district or state take the same test. At present, there is little incentive for students or commercial test-preparation organizations to cheat on Smarter Balanced tests because the results do not affect most students.
Conclusion
We conclude where we began: one picture does not tell the entire story. ACT urges great caution in basing important educational decisions on a single picture in any field, but especially when it comes to student assessment and accountability.