Since the concepts of addition, dot product, and angle between vectors scale up to many dimensions, vector mathematics adapts well to statistical computations. Consider the test-score data on five students shown in the table below. The deviation scores form vectors with five components. The science vector is $\mathbf{s} = (5, 0, 1, -1, -5)$. The math vector is $\mathbf{m} = (5, -3, 3, -2, -3)$.
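| Student | Science raw score X | Math raw score Y | Science deviation score x | Math deviation score y |
|---|---|---|---|---|
| Albert | 85 | 25 | 5 | 5 |
| Manuel | 80 | 17 | 0 | -3 |
| Bonnie | 81 | 23 | 1 | 3 |
| Sharon | 79 | 18 | -1 | -2 |
| Elena | 75 | 17 | -5 | -3 |
| Average (mean) | 80 | 20 | | |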
Test data on five students. Deviation scores are computed by taking the test score minus the mean (e.g., $x_{\text{Bonnie}} = 81 - 80 = 1$; $y_{\text{Elena}} = 17 - 20 = -3$).
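The caption's recipe can be checked with a short Python sketch that rebuilds both deviation vectors from the raw scores. It assumes the scores are listed in the table's row order (Albert, Manuel, Bonnie, Sharon, Elena). The names `science_raw`, `math_raw`, and `deviations` are illustrative choices, not part of the original text.

```python
from statistics import fmean

# Raw scores from the table, in row order: Albert, Manuel, Bonnie, Sharon, Elena.
science_raw = [85, 80, 81, 79, 75]
math_raw = [25, 17, 23, 18, 17]

def deviations(scores):
    """Return each score minus the mean of all the scores."""
    mean = fmean(scores)
    return [score - mean for score in scores]

s = deviations(science_raw)  # science deviation vector
m = deviations(math_raw)     # math deviation vector

print(fmean(science_raw), fmean(math_raw))  # 80.0 20.0
print(s)  # [5.0, 0.0, 1.0, -1.0, -5.0]
print(m)  # [5.0, -3.0, 3.0, -2.0, -3.0]
```

Because the mean is a float, the deviations come out as floats. Their values match the integer entries in the table.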
Computations with the vectors give the lengths to be

$$|\mathbf{s}| = \sqrt{5^2 + 0^2 + 1^2 + (-1)^2 + (-5)^2} = \sqrt{52} \quad\text{and}\quad |\mathbf{m}| = \sqrt{56}.$$
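A vector's length is the square root of the sum of its squared components. The minimal Python check below types the deviation vectors in directly from the table. The helper name `length` is a hypothetical choice for illustration.

```python
import math

s = [5, 0, 1, -1, -5]   # science deviation scores
m = [5, -3, 3, -2, -3]  # math deviation scores

def length(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))

print(sum(c * c for c in s), sum(c * c for c in m))  # 52 56
print(round(length(s), 3), round(length(m), 3))      # 7.211 7.483
```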
When each of these lengths is scaled by the reciprocal of the square root of the number of dimensions, $1/\sqrt{5}$, the computation produces the standard deviation for each vector. These are about 3.22 for $\mathbf{s}$ and 3.35 for $\mathbf{m}$. (See Standard Deviation for uses and formulas.) The cosine of the angle between the two vectors is
$$\cos\theta = \frac{\mathbf{s}\cdot\mathbf{m}}{|\mathbf{s}|\,|\mathbf{m}|} = \frac{5(5) + 0(-3) + 1(3) + (-1)(-2) + (-5)(-3)}{\sqrt{52}\,\sqrt{56}} \approx 0.83.$$
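The remaining arithmetic is just as short. The Python sketch below does the following:

* scales each length by $1/\sqrt{5}$ to get the standard deviations;
* forms the cosine from the dot product;
* converts the cosine to the angle and its square, the two quantities discussed in the next paragraph.

The helper names `dot` and `length` are illustrative assumptions. As a cross-check, it compares the scaled length with `statistics.pstdev`, the standard library's population standard deviation, applied to the raw science scores.

```python
import math
import statistics

s = [5, 0, 1, -1, -5]   # science deviation scores
m = [5, -3, 3, -2, -3]  # math deviation scores
n = len(s)              # number of dimensions (one per student)

def dot(u, v):
    """Dot product: sum of the products of matching components."""
    return sum(a * b for a, b in zip(u, v))

def length(v):
    """Vector length as the square root of the vector dotted with itself."""
    return math.sqrt(dot(v, v))

sd_s = length(s) / math.sqrt(n)           # standard deviation of science scores
sd_m = length(m) / math.sqrt(n)           # standard deviation of math scores
r = dot(s, m) / (length(s) * length(m))   # cosine of the angle = correlation r
angle = math.degrees(math.acos(r))        # angle between the vectors, in degrees

print(dot(s, m))                                      # 45
print(round(sd_s, 2), round(sd_m, 2))                 # 3.22 3.35
print(round(r, 2), round(angle, 1), round(r * r, 2))  # 0.83 33.5 0.7

# Cross-check: the scaled length equals the population standard deviation.
science_raw = [85, 80, 81, 79, 75]
print(math.isclose(sd_s, statistics.pstdev(science_raw)))  # True
```

Dividing by $\sqrt{n}$ rather than $\sqrt{n-1}$ matches the text's scaling. That is why the check uses `pstdev` (population) and not `stdev` (sample).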
This is called the correlation coefficient for the two vectors and is commonly designated with the letter $r$. We say that for this group of students, science scores correlate 0.83 with math scores. Because it is a cosine, the correlation coefficient $r$ ranges from $-1$ to $+1$. Correlations of $+1$ (angle $\theta = 0^\circ$) and $-1$ (angle $\theta = 180^\circ$) indicate that the vectors are collinear. At $r = 1$ the vectors point in the same direction, and at $r = -1$ they point in opposite directions. Correlations close to 0 (angle $\theta = 90^\circ$) indicate that the vectors are nearly perpendicular, so the two sets of scores show little linear relationship. Our correlation coefficient for the science and math tests ($r = 0.83$) corresponds to an angle between the vectors of about $33.5^\circ$.

In a space of five dimensions, these vectors are separate enough that each one is measuring some underlying skills that are different for different students. They are also measuring something that is the same for all students. Generally, students who scored high on science also scored high on math. The square of the cosine provides a measure of overlap. This coefficient of determination is $r^2 \approx 0.834^2 \approx 0.70$. It