Are standardized tests fair and helpful evaluation tools?
Not really. On standardized exams, all test takers answer the same questions under the same conditions, usually in multiple-choice format. Such tests reward quick answers to superficial questions. They do not measure the ability to think deeply or creatively in any field. Their use encourages a narrowed curriculum, outdated methods of instruction, and harmful practices such as grade retention and tracking.
Standardized tests are not designed for, and should not be used for, promoting or ranking students, evaluating teachers, or grading schools. They are a better predictor of a student’s socio-economic status and a parent’s educational attainment level.
Are standardized tests objective?
The only objective part of most standardized tests is scoring, when done by an accurately programmed machine evaluating multiple-choice answers. Decisions about what items to include on the test, how questions are worded, which answers are scored as “correct,” how the test is administered, and how exam results are used are all made by subjective human beings, replete with human biases.
How we define educational standards and levels of achievement (for standardized tests, through the setting of cut scores) is highly subjective. Cut scores are generally driven by what percentage of students the test administrators are comfortable labeling as “failing” to meet or “exceeding” standards.
Are test scores “reliable”?
A test is completely reliable if you would get exactly the same results the second time you administered it. However, all tests have “measurement error.” This means an individual’s score may vary significantly from day to day due to testing conditions or the test-taker’s mental or emotional state. Scores of young children and scores on sub-sections of tests are particularly unreliable. Coaching and preparation for college admissions tests effectively reduce their reliability as a measure of innate abilities.
Do test scores reflect significant differences among people?
Not necessarily. The goal of most tests is to sort and rank. To do that, test makers make small differences appear large. Questions most people get right or wrong are removed because they don’t help with ranking. Because of measurement error, two people with very different scores on one exam administration might get similar scores on a retest, or vice versa. On the SAT, for example, two students’ scores must differ by at least 144 points (out of 1,600) before the test’s sponsors are willing to say the students’ measured abilities really differ.
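The role of measurement error in comparing two scores can be illustrated with a little arithmetic. The Python sketch below is only an illustration, not the method the SAT’s sponsors use. It assumes both scores carry the same standard error of measurement (SEM), that the two test-takers’ errors are independent, and it uses a hypothetical SEM of 40 points with a conventional z value of 1.96. The function names are invented for this example.

```python
import math


def min_meaningful_difference(sem: float, z: float = 1.96) -> float:
    """Smallest score gap that stands out from measurement noise.

    Assumption: both scores have the same standard error of measurement
    (sem) and their errors are independent, so the standard error of the
    difference between the two scores is sem * sqrt(2).
    """
    standard_error_of_difference = sem * math.sqrt(2)
    return z * standard_error_of_difference


def scores_really_differ(score_a: float, score_b: float,
                         sem: float, z: float = 1.96) -> bool:
    """True if the gap between two scores reaches the noise threshold."""
    return abs(score_a - score_b) >= min_meaningful_difference(sem, z)


if __name__ == "__main__":
    HYPOTHETICAL_SEM = 40.0  # illustrative value, not a published figure
    threshold = min_meaningful_difference(HYPOTHETICAL_SEM)
    print(f"Threshold under these assumptions: {threshold:.1f} points")
    print(scores_really_differ(1100, 1200, HYPOTHETICAL_SEM))
    print(scores_really_differ(1100, 1220, HYPOTHETICAL_SEM))
```

Under these assumed values, a gap of 100 points would not count as a real difference, while a gap of 120 points would. The actual threshold for any test depends on that test’s own published measurement error.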
Don’t test-makers remove bias from tests?
Most test-makers review items for obvious biases, such as offensive words. But many forms of bias are not superficial. Test-makers also use statistical bias-reduction techniques. However, these cannot detect underlying bias in the test’s form or content. As a result, biased cultural assumptions built into the test as a whole often are not removed by test-makers.
Standardized tests have been proven to have “predictive bias” for various subgroups. There are various plausible explanations for why this is so, ranging from the cultural context of test construction to expectations of success or failure prior to sitting for an exam to interventions that have occurred in some communities but not others. For a recent study looking at college entrance exams, see
- Aguinis, H., & Culpepper, S. A. Improving our understanding of predictive bias in testing. Journal of Applied Psychology. https://doi.org/10.1037/apl0001152
Do tests reflect current knowledge about how students learn?
Not at all. While our understanding of the brain and how people learn and think has progressed enormously, standardized tests have remained the same. Test makers still assume that knowledge can be broken into separate bits and that people learn by absorbing these individual parts. Today, cognitive and developmental psychologists understand that knowledge does not consist of separable bits and that people (including children) learn by connecting what they already know with what they are trying to learn. If they cannot actively make meaning out of what they are doing, they do not learn or remember.
Do multiple-choice or short-answer tests measure important student achievement?
These kinds of tests are very poor yardsticks of student learning. They are weak measures of the ability to comprehend complex material, write, apply math, understand scientific methods or reasoning, or grasp social science concepts. Nor do they adequately measure thinking skills or assess what people can do on real-world tasks.
Standardized tests also prize speed over depth of thought (“power”) and thus are testing something far less relevant to the real world.
Are test scores helpful to teachers?
Classroom surveys show most teachers do not find scores from standardized tests very useful. The tests do not help a teacher understand what to do next in working with a student because they do not indicate how the student learns or thinks. Nor do they measure much of what students should learn. Good evaluation provides useful information to teachers.
How did “No Child Left Behind” (NCLB) affect the use of standardized tests in the U.S.?
NCLB led to a huge increase in testing. It required state testing of every student in grades 3-8 and once in high school, more than twice the testing required by previous federal mandates. NCLB also led to an explosion of other standardized exams, including “benchmark” tests often administered 3-10 times per year. U.S. students are now the most tested on Earth.
Did the “Every Student Succeeds Act” (ESSA 2015) change testing?
Not as much as you might think. The Every Student Succeeds Act of 2015, which reauthorized the Elementary and Secondary Education Act and replaced No Child Left Behind, was hailed as making substantive changes to assessments. It did eliminate the consequences and punishments for students, schools, districts and states for failing to make “adequate yearly progress” on test scores. The requirement remained, however, to test every student every year in grades 3-8 and once in high school in English Language Arts and math, plus once in science at each of the elementary, middle and high school levels. The opportunity to pilot a few alternatives to state and federal consortium tests was a red herring: states still had to administer the federally mandated standardized tests during the pilot program, and the pilot’s goal was to scale alternative assessments to one statewide test. In reality, almost no assessment innovation for accountability purposes has stuck under ESSA. K-12 education still lives in a world of lots of standardized tests.
What is high-stakes testing?
High-stakes tests are used to make important decisions such as student promotion or graduation, granting teacher tenure, or sanctioning schools for poor performance. Nine states now have graduation tests; some states and districts have tests for grade promotion.
ESSA attaches sanctions to test results. Even though NCLB and ESSA have failed to improve schools, policy makers continue to rely on high-stakes tests.
What happens when tests become high stakes?
High-stakes testing often results in a narrow focus on teaching just the tested material (test preparation). Other content in that subject as well as untested subjects such as social studies, art and music are cut back or eliminated. High-stakes tests also are a misuse of standardized instruments according to professional research and psychometric standards, given their snapshot nature, biases and variability among individual test-takers.
Furthermore, high-stakes tests distort the purpose of education. Children come to believe that the purpose of learning is to perform well on tests; those who don’t perform well come to believe that they are not capable learners. Teachers perceive low-scoring students as deficient and less capable. Students and teachers alike focus on how many points they need rather than the ability to think deeply or innovatively.
What are other consequences of high-stakes testing?
Attaching high stakes to test results increases cheating and other efforts to boost scores without improving educational quality. Such efforts include arranging for low-scoring students to be absent on test day or pushing them out of school.
Schools in predominantly low-income areas that serve large numbers of students of color have been placed in unhelpful receivership and eventually closed for failing to adequately raise test scores. Students — disproportionately Black and Latinx — have been denied diplomas.
Are there better ways to evaluate student achievement or ability?
Yes. Good teacher observation, documentation of student work, and performance-based assessment, all of which involve the direct evaluation of real learning tasks, provide useful material for teachers, parents, and the public. Many nations that do the best in international comparisons, like Finland, use these techniques instead of large-scale standardized testing.
►Other FairTest fact sheets and reports provide details and research evidence to support the points in this fact sheet.