Multiple-Choice Tests

A multiple-choice test usually has dozens of questions or “items.” For each question, the test-taker is supposed to select the “best” choice among a set of four or five options. (Such tests are sometimes called “selected-response tests.”) For example:

What causes night and day?

A. The earth spins on its axis.
B. The earth moves around the sun.
C. Clouds block out the sun’s light.
D. The earth moves into and out of the sun’s shadow.
E. The sun goes around the earth.

(Source: P. M. Sadler, “Psychometric Models of Student Conceptions in Science,” Journal of Research in Science Teaching, 1998, Vol. 35, No. 3, pp. 265-296.)

The “wanted” answer is “A.” The other answer options are called “distractors.”

Most standardized tests, including state exams and most commercial achievement tests, are made up primarily of multiple-choice items. A few state tests have a quarter, a half or even more “open-ended” (or “constructed-response”) items, usually short-answer questions. These ask a student to write and perhaps explain, not just select, an answer. Many short-answer questions are not much more than multiple-choice items without the answer options, and they share many of the limits and problems of multiple-choice items.

Are multiple-choice tests “objective”?

Test-makers often promote multiple-choice tests as “objective.” This is because there is no human judgement in the scoring, which usually is done by machine. However, humans decide what questions to ask, how to phrase questions, and what “distractors” to use. All these are subjective decisions that can be biased in ways that unfairly reward or harm some test-takers. Therefore, multiple-choice tests are not really objective.

Any use of test results involves additional human decisions, such as setting a “cut-off” or passing-level score on a test. Some people also claim multiple-choice tests avoid the subjective views of any one teacher, who may be biased or have low expectations. This is true, but there are many other ways to address these problems, such as having independent groups of teachers and others review student essays, projects, portfolios or other more comprehensive forms of assessment.

What can multiple-choice items be used for?

Multiple-choice items are best used for checking whether students have learned facts and routine procedures that have one clearly correct answer. However, an item may have two reasonable answer options, which is why test directions usually ask test-takers to select the “best” answer. If, on a reading test, a student selected a somewhat plausible answer, does it mean that she cannot read, or that she does not see things exactly the way the test-maker does?

In some subjects, carefully written multiple-choice items with good distractors can fairly accurately distinguish students who grasp a basic concept from those who do not. Look again at the “night and day” question. Those who don’t quite get it often are attracted by answer B. Those who have little or no knowledge usually select C, D or E.

Multiple-choice and critical thinking

It is possible to get multiple-choice items correct without knowing much or doing any real thinking. Because the answers are in front of the student, some people call these tests “multiple-guess.” Multiple-choice items can be easier than open-ended questions asking the same thing. This is because it is harder to recall an answer than to recognize it. Test-wise students know that it is sometimes easier to work backwards from the answer options, looking for the one that best fits. It also is possible to choose the “right” answer for the wrong reason or to simply make a lucky guess.

Some people claim that multiple-choice tests can be useful for measuring whether students can analyze material. This item was released by test publishers as an example of how multiple-choice items supposedly measure “thinking” skills:

Was the infantry invasion of Japan a viable alternative to the use of the atomic bomb to end World War II? If so, why? If not, why not?

A. Yes; transport ships were available in sufficient numbers.
B. Yes; island defenses in Japan were minimal.
C. No; estimated casualties would have been much greater.*
D. No; Japan was on the verge of having an atomic bomb.
* Wanted answer.

(From Measuring Thinking in the Classroom, Northwest Regional Educational Laboratory, 1988, Oak Park, IL.)

Claiming there is one right answer to this complex historical issue actually demonstrates how this sort of question short-circuits the thinking process it claims to measure. Since “C” is the explanation given in most high-school texts for using the bomb, choosing the wanted answer would be a matter of recall for many students. For students who did not recall the textbook response, no information is provided to actually analyze the question and come up with the wanted answer. Beyond that, there remains an intense debate among historians about the justification for the use of the atomic bomb. Thus, what is treated as “true” may not be. A question really asking for critical thinking would have students weigh evidence and defend a position.

Most researchers agree that multiple-choice items are poor tools for measuring the ability to synthesize and evaluate information or apply knowledge to complex problems. In math, for example, they can measure knowledge of basic facts and the ability to apply standard procedures and rules. Carefully written multiple-choice questions also can measure somewhat more complex mathematical knowledge such as integrating information or deciding which mathematical procedures to use to solve problems. However, as students move toward solving non-routine problems, analyzing, interpreting, and making mathematical arguments, multiple-choice questions are not useful.

In sum, multiple-choice items are an inexpensive and efficient way to check on factual (“declarative”) knowledge and routine procedures. However, they are not useful for assessing critical or higher-order thinking in a subject, the ability to write, or the ability to apply knowledge or solve problems.

Informing instruction

Even with carefully written distractors, as in the “night and day” example, it is often hard to know why a student got a question wrong or right. But unless a teacher has that information, the test result is not useful for improving instruction for the individual.

A standardized multiple-choice test may point to some broad areas that need improvement. For example, a test may show that students in a school or district need to improve on double-digit multiplication. However, the tests do not provide information that will help teachers do a better job of teaching double-digit multiplication because they do not show why the class generally did not do well.

If students were asked to explain how they got their answers, then their teachers would have a lot more information. This information is vital for teachers to make instruction more effective. For example, students who did not know why “the earth spins on its axis” is the correct answer to “night and day” but happened to guess the correct answer would be unable to explain why. Their mistaken views would be visible to the teacher, who could then address the misunderstanding and clarify the concept.

Dangers of relying on multiple-choice tests

Relying on multiple-choice tests as a primary method of assessment is educationally dangerous for many reasons:

1) Because of cultural assumptions and biases, the tests may be inaccurate. (Of course, other kinds of assessments also can be biased.) Assuming the test is accurate because of its supposedly “objective” format may lead to making bad decisions about how best to teach a student.

2) Students may recognize or know facts or procedures well enough to score high on the test, but not be able to think about the subject or apply knowledge, even though being able to think and apply is essential to “knowing” any subject. Therefore, the conclusion or inference that a student “knows” history or science because she got a high score on a multiple-choice test may be false.

3) What is easily measurable may not be as important as what is not measurable or is more difficult to measure. A major danger with high-stakes multiple-choice and short-answer tests (tests that have a major impact on curriculum and instruction) is that only things that are easily measured are taught.

4) Since the questions usually must be answered quickly and have only one correct answer, students learn that problems for which a single answer cannot be chosen quickly are not important.

5) When schools view multiple-choice tests as important, they often narrow their curriculum to cover only what is on the exams. For example, to prepare for multiple-choice tests, curriculum may focus on memorizing definitions and recognizing (naming) concepts. This will not lead students to understand important scientific principles, grasp how science is done, and think about how science affects their lives.

6) When narrow tests define important learning, instruction often gets reduced to “drill and kill”: lots of practice on questions that look just like those on the test. In this case, students often get no chance to read real books, to ask their own questions, to have discussions, to challenge texts, to conduct experiments, to write extended papers, to explore new ideas; that is, to think about and really learn a subject.

Should multiple-choice tests be used at all?

The decision to use multiple-choice tests or include multiple-choice items in a test should be based on what the purpose of the test is and the uses that will be made of its results. If the purpose is only to check on factual and procedural knowledge, if the test will not have a major effect on overall curriculum and instruction, and if conclusions about what students know in a subject will not be reduced to what the test measures, then a multiple-choice test might be somewhat helpful, provided it is unbiased, well written, and related to the curriculum. If multiple-choice tests substantially control curriculum or instruction, or are the basis of major conclusions that are reported to the public (e.g., how well students read or know math), or are used to make important decisions about students, then they are quite dangerous.

Students should learn to think and apply knowledge. Facts and procedures are necessary for thinking, but schools should not be driven by multiple-choice testing into minimizing or eliminating thinking and problem-solving. Therefore, classroom assessments and standardized tests should rely only to a small extent on multiple-choice or short-answer items. Instead, other well-designed forms of assessment should be implemented and used properly. Most importantly, all teachers need to be capable of high-quality assessment to help their students learn (see Implementing Performance Assessment from FairTest).