Part One
Single-Equation Regression Models
that is,

$$13.1043 \le E(Y \mid X = 20) \le 15.8260 \tag{5.10.5}$$
Thus, given $X_0 = 20$, in repeated sampling, 95 out of 100 intervals like Equation 5.10.5 will include the true mean value; the single best estimate of the true mean value is of course the point estimate 14.4656.
If we obtain 95 percent confidence intervals like Eq. (5.10.5) for each of the $X$ values given in Table 3.2, we obtain what is known as the confidence interval, or confidence band, for the population regression function, which is shown in Figure 5.6.
Individual Prediction
If our interest lies in predicting an individual $Y$ value, $Y_0$, corresponding to a given $X$ value, say, $X_0$, then, as shown in Appendix 5, Section 5A.4, a best linear unbiased estimator of $Y_0$ is also given by Eq. (5.10.1), but its variance is as follows:
$$\operatorname{var}(Y_0 - \hat{Y}_0) = E[Y_0 - \hat{Y}_0]^2 = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right] \tag{5.10.6}$$

It can be shown further that $Y_0$ also follows the normal distribution with mean and variance given by Eqs. (5.10.1) and (5.10.6), respectively. Substituting $\hat{\sigma}^2$ for the unknown $\sigma^2$, it follows that

$$t = \frac{Y_0 - \hat{Y}_0}{\operatorname{se}(Y_0 - \hat{Y}_0)}$$
FIGURE 5.6 Confidence intervals (bands) for mean $Y$ and individual $Y$ values. [Plot of mean wage ($Y$) against education ($X$) showing the fitted line $\hat{Y}_i = -0.0144 + 0.7240X_i$ together with the confidence interval for mean $Y$ and the confidence interval for individual $Y$.]
guj75772_ch05.qxd 07/08/2008 12:46 PM Page 128
Chapter 5
Two-Variable Regression: Interval Estimation and Hypothesis Testing
also follows the $t$ distribution. Therefore, the $t$ distribution can be used to draw inferences about the true $Y_0$. Continuing with our example, we see that the point prediction of $Y_0$ is 14.4656, the same as that of $\hat{Y}_0$, and its variance is 1.2357 (the reader should verify this calculation). Therefore, the 95 percent confidence interval for $Y_0$ corresponding to $X_0 = 20$ is seen to be
$$(12.0190 \le Y_0 \mid X_0 = 20 \le 16.9122) \tag{5.10.7}$$
Comparing this interval with Eq. (5.10.5), we see that the confidence interval for individual $Y_0$ is wider than that for the mean value of $Y_0$. (Why?) Computing confidence intervals like Equation 5.10.7 conditional upon the $X$ values given in Table 3.2, we obtain the 95 percent confidence band for the individual $Y$ values corresponding to these $X$ values. This confidence band along with the confidence band for $\hat{Y}_0$ associated with the same $X$’s is shown in Figure 5.6.
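The interval in Eq. (5.10.7) can be checked from the two numbers quoted in the text, the point prediction 14.4656 and the variance 1.2357. The sketch below is only an illustration: the critical value 2.201 (two-tailed 5 percent, 11 df) is an assumption taken from a standard $t$ table, and the function name `prediction_interval` is hypothetical.

```python
import math

# Values quoted in the text for the wages-education example.
point_prediction = 14.4656   # predicted Y0 at X0 = 20
variance = 1.2357            # var(Y0 - Y0_hat), as given by Eq. (5.10.6)

# Assumption: two-tailed 5 percent critical t value for 11 df,
# read from a standard t table (not stated in the text).
T_CRIT_11_DF = 2.201


def prediction_interval(point, var, t_crit):
    """Return (lower, upper) = point -/+ t_crit * sqrt(var)."""
    half_width = t_crit * math.sqrt(var)
    return point - half_width, point + half_width


lower, upper = prediction_interval(point_prediction, variance, T_CRIT_11_DF)
print(f"{lower:.4f} <= Y0 <= {upper:.4f}")
```

Because the inputs are rounded, the limits should agree with Eq. (5.10.7) to about three decimal places rather than exactly.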
Notice an important feature of the confidence bands shown in Figure 5.6. The width of these bands is smallest when $X_0 = \bar{X}$. (Why?) However, the bands widen sharply as $X_0$ moves away from $\bar{X}$. (Why?) This change would suggest that the predictive ability of the historical sample regression line falls markedly as $X_0$ departs progressively from $\bar{X}$. Therefore, one should exercise great caution in “extrapolating” the historical regression line to predict $E(Y \mid X_0)$ or $Y_0$ associated with a given $X_0$ that is far removed from the sample mean $\bar{X}$.
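One way to see why the bands widen is to evaluate the bracketed factor in Eq. (5.10.6), $1 + 1/n + (X_0 - \bar{X})^2 / \sum x_i^2$, at several values of $X_0$. The sketch below does this for a hypothetical sample of 13 evenly spaced $X$ values; $n = 13$ is consistent with the 11 df reported for the example, but the values themselves are an assumption made for illustration, not the data of Table 3.2, and `variance_multiplier` is a hypothetical helper name.

```python
# Hypothetical sample: 13 evenly spaced X values (6, 7, ..., 18).
# n = 13 matches df = 11, but these values are an assumption,
# not taken from Table 3.2.
xs = [float(x) for x in range(6, 19)]

n = len(xs)
x_bar = sum(xs) / n
# Sum of squared deviations from the mean (the "sum of x_i^2" term).
sum_sq_dev = sum((x - x_bar) ** 2 for x in xs)


def variance_multiplier(x0):
    """Bracketed factor in Eq. (5.10.6): 1 + 1/n + (x0 - x_bar)^2 / sum x_i^2."""
    return 1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sum_sq_dev


for x0 in (x_bar, x_bar + 4, x_bar + 8, x_bar + 16):
    print(f"X0 = {x0:5.1f}  multiplier of sigma^2 = {variance_multiplier(x0):.4f}")
```

The factor is smallest at $X_0 = \bar{X}$, where the squared-distance term vanishes, and grows quadratically as $X_0$ moves away in either direction.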
5.11 Reporting the Results of Regression Analysis
There are various ways of reporting the
results of regression analysis, but in this text we
shall use the following format, employing the wages-education example of Chapter 3 as an
illustration:
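$$
\begin{aligned}
\hat{Y}_i &= -0.0144 + 0.7240\,X_i \\
\text{se} &= (0.9317) \quad (0.0700) \qquad r^2 = 0.9065 \\
t &= (-0.0154) \quad (10.3428) \qquad \text{df} = 11 \\
p &= (0.987) \quad (0.000) \qquad F_{1,11} = 108.30
\end{aligned}
\tag{5.11.1}
$$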
In Equation 5.11.1 the figures in the first set of parentheses are the estimated standard errors of the regression coefficients, the figures in the second set are estimated $t$ values computed from Eq. (5.3.2) under the null hypothesis that the true population value of each regression coefficient individually is zero (e.g., $10.3428 = 0.7240 / 0.0700$), and the figures in the third set are the estimated $p$ values. Thus, for 11 df the probability of obtaining a $t$ value of 10.3428 or greater is 0.00009, which is practically zero.
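The $t$ values in Eq. (5.11.1) are simply each coefficient divided by its standard error under the zero null hypothesis. A minimal sketch of that arithmetic follows, using only the rounded figures reported above; the helper name `t_ratio` is hypothetical.

```python
# Coefficient estimates and standard errors as reported in Eq. (5.11.1).
estimates = {"intercept": -0.0144, "slope": 0.7240}
std_errors = {"intercept": 0.9317, "slope": 0.0700}


def t_ratio(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


t_values = {name: t_ratio(estimates[name], std_errors[name]) for name in estimates}
for name, t in t_values.items():
    print(f"{name}: t = {t:.4f}")
```

Since the inputs are rounded to four decimal places, the computed ratios may differ from the reported $t$ values in the last digit.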
By presenting the $p$ values of the estimated $t$ coefficients, we can see at once the exact level of significance of each estimated $t$ value. Thus, under the null hypothesis that the true population slope value is zero (that is, education has no effect on mean wages), the exact probability of obtaining a $t$ value of 10.3428 or greater is practically zero. Recall that the smaller the $p$ value, the smaller the probability of making a mistake if we reject the null hypothesis.