Module 26
What Is Intelligence?
However, sacrifices are made in group testing that in some cases may outweigh
the benefits. For instance, group tests generally offer fewer kinds of questions than
do tests administered individually. Furthermore, people may be more motivated to
perform at their highest ability level when working on a one-to-one basis with a test
administrator than they are in a group. Finally, in some cases, it is simply impossible
to employ group tests, particularly with young children or people with unusually
low IQs (Aiken, 1996).
RELIABILITY AND VALIDITY: TAKING THE MEASURE OF TESTS
When we use a ruler, we expect to find that it measures an inch in the same way it did
the last time we used it. When we weigh ourselves on the bathroom scale, we hope
that the variations we see on the scale are due to changes in our weight and not to
errors on the part of the scale (unless the change in weight is in an unwanted direction!).
In the same way, we hope that psychological tests have reliability: that they
measure consistently what they are trying to measure. We need to be sure that each
time we administer the test, a test-taker will achieve the same results, assuming that
nothing about the person has changed relevant to what is being measured.
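One common way to put a number on this kind of consistency is to give the same test twice to the same people and correlate the two sets of scores. The sketch below assumes that approach (the passage itself does not name a statistic), and the scores and the names `pearson_r`, `first_sitting`, and `second_sitting` are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five test-takers on two administrations of one test.
first_sitting = [400, 520, 610, 480, 700]
second_sitting = [410, 510, 600, 490, 690]

# A value near 1.0 suggests the test ranks people consistently across sittings.
retest_r = pearson_r(first_sitting, second_sitting)
```

A correlation close to 1.0 would indicate that people who scored high the first time also scored high the second time; a value near 0 would suggest the scores are not consistent from one sitting to the next.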
Suppose, for instance, that when you first took the SAT exams, you scored 400
on the verbal section of the test. Then, after taking the test again a few months later,
you scored 700. Upon receiving your new score, you might well stop celebrating for
a moment to question whether the test is reliable, for it is unlikely that your abilities
could have changed enough to raise your score by 300 points (T. R. Coyle, 2006).
But suppose your score changed hardly at all, and both times you received a
score of about 400. You couldn’t complain about a lack of reliability. However, if you
knew your verbal skills were above average, you might be concerned that the test
did not adequately measure what it was supposed to measure. In sum, the question
has now become one of validity rather than reliability. A test has validity when it
actually measures what it is supposed to measure.
Knowing that a test is reliable is no guarantee that it is also valid. For instance,
Sir Francis Galton assumed that skull size is related to intelligence, and he was able
to measure skull size with great reliability. However, the measure of skull size was
not valid: it had nothing to do with intelligence. In this case, then, we have
reliability without validity.
However, if a test is unreliable, it cannot be valid. Assuming that all other factors—
motivation to score well, knowledge of the material, health, and so forth—are similar,
if a person scores high the first time he or she takes a specific test and low the second
time, the test cannot be measuring what it is supposed to measure. Therefore, the test
is both unreliable and not valid.
Test validity and reliability are prerequisites for accurate assessment of
intelligence—as well as for any other measurement task carried out by psychologists.
Consequently, the measures of personality carried out by personality psychologists,
clinical psychologists’ assessments of psychological disorders, and social
psychologists’ measures of attitudes must meet the tests of validity and reliability for the
results to be meaningful (Feldt, 2005; Phelps, 2005; Yao, Zhour, & Jiang, 2006).
Assuming that a test is both valid and reliable, one further step is necessary in
order to interpret the meaning of a particular test-taker’s score: the establishment of
norms. Norms are standards of test performance that permit the comparison of one
person’s score on a test to the scores of others who have taken the same test. For
example, a norm permits test-takers to know that they have scored, say, in the top
15% of those who have taken the test previously. Tests for which norms have been
developed are known as standardized tests.
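The comparison a norm makes possible can be sketched as a percentile rank: the percentage of earlier test-takers who scored below a given score. The convention used here (counting only scores strictly below) is one of several in use and is an assumption, as are the function name `percentile_rank` and the hypothetical `norm_group` scores.

```python
def percentile_rank(score, norm_scores):
    """Percentage of scores in the norm group that fall strictly below `score`."""
    if not norm_scores:
        raise ValueError("norm group must contain at least one score")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

# Hypothetical scores from 20 people who took the test previously.
norm_group = [380, 420, 450, 470, 500, 510, 530, 560, 600, 640,
              480, 520, 550, 590, 610, 440, 490, 540, 570, 620]

# Where does a new test-taker's score of 615 stand relative to the norm group?
rank = percentile_rank(615, norm_group)

# Scoring in the top 15% corresponds to a percentile rank of at least 85.
in_top_15_percent = rank >= 85
```

With a real standardized test the norm group would be far larger and chosen to represent the people for whom the test is intended; the calculation itself stays the same.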
Test designers develop norms by calculating the average score achieved by a
specific group of people for whom the test has been designed. Then the test designers
can determine the extent to which each person’s score differs from the scores of