78
Part One
Single-Equation Regression Models
That is,

$$r^2 = \frac{\left(\sum y_i \hat{y}_i\right)^2}{\left(\sum y_i^2\right)\left(\sum \hat{y}_i^2\right)} \qquad (3.5.14)$$

where $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y}$ are deviations from the mean, $Y_i$ = actual $Y$, $\hat{Y}_i$ = estimated $Y$, and $\bar{Y} = \bar{\hat{Y}}$ = the mean of $Y$. For proof, see Exercise 3.15. Expression 3.5.14 justifies the description of $r^2$ as a measure of goodness of fit, for it tells how close the estimated $Y$ values are to their actual values.

3.6 A Numerical Example

We illustrate the econometric theory developed so far by considering the data given in Table 2.6, which relates mean hourly wage ($Y$) and years of schooling ($X$). Basic labor economics theory tells us that, among many variables, education is an important determinant of wages.

In Table 3.2 we provide the necessary raw data to estimate the quantitative impact of education on wages.
FIGURE 3.10 Correlation patterns (adapted from Henri Theil, Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ, 1978, p. 86). Panels (a) through (h) plot $Y$ against $X$ for the following cases: $r = +1$; $r = -1$; $r$ close to $+1$; $r$ close to $-1$; $r$ positive but close to zero; $r$ negative but close to zero; $r = 0$; and $Y = X^2$ but $r = 0$.
guj75772_ch03.qxd 23/08/2008 02:34 PM Page 78
Chapter 3
Two-Variable Regression Model: The Problem of Estimation
79
Obs    $Y_i$      $X_i$   $x_i$   $y_i$      $x_i^2$   $y_i x_i$
  1     4.4567      6      −6     −4.218       36       25.308
  2     5.77        7      −5     −2.9047      25       14.5235
  3     5.9787      8      −4     −2.696       16       10.784
  4     7.3317      9      −3     −1.343        9        4.029
  5     7.3182     10      −2     −1.3565       4        2.713
  6     6.5844     11      −1     −2.0903       1        2.0903
  7     7.8182     12       0     −0.8565       0        0
  8     7.8351     13       1     −0.8396       1       −0.8396
  9    11.0223     14       2      2.3476       4        4.6952
 10    10.6738     15       3      1.9991       9        5.9973
 11    10.8361     16       4      2.1614      16        8.6456
 12    13.615      17       5      4.9403      25       24.7015
 13    13.531      18       6      4.8563      36       29.1378
Sum   112.7712    156       0      0          182      131.7856

Obs    $X_i^2$   $Y_i^2$     $\hat{Y}_i$   $\hat{u}_i = Y_i - \hat{Y}_i$   $\hat{u}_i^2$
  1       36      19.86217     4.165294       0.291406                     0.084917
  2       49      33.2929      4.916863       0.853137                     0.727843
  3       64      35.74485     5.668432       0.310268                     0.096266
  4       81      53.75382     6.420001       0.911699                     0.831195
  5      100      53.55605     7.17157        0.14663                      0.0215
  6      121      43.35432     7.923139      −1.33874                      1.792222
  7      144      61.12425     8.674708      −0.85651                      0.733606
  8      169      61.38879     9.426277      −1.59118                      2.531844
  9      196     121.4911     10.17785        0.844454                     0.713103
 10      225     113.93       10.92941       −0.25562                      0.065339
 11      256     117.4211     11.68098       −0.84488                      0.713829
 12      289     185.3682     12.43255        1.182447                     1.398181
 13      324     183.088      13.18412        0.346878                     0.120324
Sum     2054    1083.376     112.7712         0                            9.83017
Note: $x_i = X_i - \bar{X}$; $y_i = Y_i - \bar{Y}$
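$$\hat{\beta}_2 = \frac{\sum y_i x_i}{\sum x_i^2} = \frac{131.7856}{182.0} = 0.7240967$$

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = 8.674708 - 0.7240967(12) = -0.01445$$

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} = \frac{9.83017}{11} = 0.893652; \qquad \hat{\sigma} = 0.945332$$

$$\operatorname{var}(\hat{\beta}_2) = \frac{\hat{\sigma}^2}{\sum x_i^2} = \frac{0.893652}{182.0} = 0.004910; \qquad \operatorname{se}(\hat{\beta}_2) = \sqrt{0.004910} = 0.070072$$

$$r^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{9.83017}{105.1188} = 0.9065$$

$$r = \sqrt{r^2} = 0.9521$$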
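$$\operatorname{var}(\hat{\beta}_1) = \frac{\sum X_i^2}{n \sum x_i^2}\,\hat{\sigma}^2 = \frac{2054}{13(182)}\,(0.893652) = 0.775808; \qquad \operatorname{se}(\hat{\beta}_1) = \sqrt{0.775808} = 0.8808$$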
TABLE 3.2 Raw Data Based on Table 2.6
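The hand calculations behind Table 3.2 can be reproduced in a few lines. What follows is a minimal sketch in plain Python, added for illustration only; the variable names are our own, and the variance of the intercept is computed with the usual formula $\hat{\sigma}^2 \sum X_i^2 / (n \sum x_i^2)$. Because the sketch works from the raw $X$ and $Y$ columns and recomputes the fitted values from the estimated coefficients, its residual-based quantities ($\sum \hat{u}_i^2$, $\hat{\sigma}^2$, the standard errors, and $r^2$) may differ somewhat from the figures printed with the table.

```python
from math import sqrt

# Raw data from Table 3.2: mean hourly wage (Y) and years of schooling (X).
Y = [4.4567, 5.77, 5.9787, 7.3317, 7.3182, 6.5844, 7.8182,
     7.8351, 11.0223, 10.6738, 10.8361, 13.615, 13.531]
X = list(range(6, 19))  # 6, 7, ..., 18 years of schooling

n = len(Y)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the sample means (the lowercase x_i and y_i of the text).
x = [xi - x_bar for xi in X]
y = [yi - y_bar for yi in Y]

sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

beta2 = sum_xy / sum_x2           # slope estimate
beta1 = y_bar - beta2 * x_bar     # intercept estimate

# Residuals from the fitted line and the residual sum of squares.
resid = [yi - (beta1 + beta2 * xi) for xi, yi in zip(X, Y)]
rss = sum(u * u for u in resid)

sigma2 = rss / (n - 2)            # estimate of the error variance
var_beta2 = sigma2 / sum_x2
var_beta1 = sigma2 * sum(xi * xi for xi in X) / (n * sum_x2)
r2 = 1 - rss / sum_y2

print(f"beta2 = {beta2:.7f}, beta1 = {beta1:.5f}")
print(f"RSS = {rss:.4f}, sigma^2 = {sigma2:.4f}")
print(f"se(beta2) = {sqrt(var_beta2):.4f}, se(beta1) = {sqrt(var_beta1):.4f}")
print(f"r^2 = {r2:.4f}")
```

The deviation form is used on purpose so that each intermediate sum (`sum_xy`, `sum_x2`, `sum_y2`) can be checked against the corresponding column total in the table.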
From the data given in this table, we obtain the estimated regression line as follows:

$$\hat{Y}_i = -0.0144 + 0.7240 X_i \qquad (3.6.1)$$
Geometrically, the estimated regression line is as shown in Figure 3.11.
As we know, each point on the regression line gives an estimate of the mean value of $Y$ corresponding to the chosen $X$ value; that is, $\hat{Y}_i$ is an estimate of $E(Y \mid X_i)$. The value of $\hat{\beta}_2 = 0.7240$, which measures the slope of the line, shows that, within the sample range of $X$ between 6 and 18 years of education, as $X$ increases by 1, the estimated increase in mean hourly wages is about 72 cents. That is, each additional year of schooling, on average, increases hourly wages by about 72 cents.
The value of $\hat{\beta}_1 = -0.0144$, which is the intercept of the line, indicates the average level of wages when the level of education is zero. Such a literal interpretation of the intercept does not make any sense in the present case. How could there be negative wages? As we will see throughout this book, very often the intercept term has no viable practical meaning. Besides, a zero level of education lies outside the range of education observed in our sample. As we will see in Chapter 5, the observed value of the intercept is not statistically different from zero.
The $r^2$ value of about 0.90 suggests that education explains about 90 percent of the variation in hourly wage. Considering that $r^2$ can be at most 1, our regression line fits the data very well. The coefficient of correlation, $r = 0.9521$, shows that wages and education are highly positively correlated.
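The link between $r^2$ and the fitted values stated in Expression 3.5.14 can be checked numerically on the wage-education data. The sketch below is an added illustration in plain Python; the helper names `deviations` and `correlation` are our own. It computes $r^2$ in three ways: as the squared correlation between $X$ and $Y$, as the squared correlation between $Y$ and $\hat{Y}$, and as one minus the ratio of the residual to the total sum of squares. Since it starts from the raw data, the printed value may differ slightly from the figure quoted in the text.

```python
from math import sqrt

# Wage (Y) and schooling (X) data from Table 3.2.
Y = [4.4567, 5.77, 5.9787, 7.3317, 7.3182, 6.5844, 7.8182,
     7.8351, 11.0223, 10.6738, 10.8361, 13.615, 13.531]
X = list(range(6, 19))
n = len(Y)


def deviations(values):
    """Return the deviations of each value from the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def correlation(a, b):
    """Sample coefficient of correlation between two equal-length series."""
    da, db = deviations(a), deviations(b)
    num = sum(p * q for p, q in zip(da, db))
    return num / sqrt(sum(p * p for p in da) * sum(q * q for q in db))


# OLS fit in deviation form, then the fitted values.
x, y = deviations(X), deviations(Y)
beta2 = sum(p * q for p, q in zip(x, y)) / sum(p * p for p in x)
beta1 = sum(Y) / n - beta2 * sum(X) / n
Y_hat = [beta1 + beta2 * xi for xi in X]

r_xy = correlation(X, Y)                      # correlation of X and Y
r2_from_fit = correlation(Y, Y_hat) ** 2      # Expression 3.5.14
rss = sum((yi - fi) ** 2 for yi, fi in zip(Y, Y_hat))
r2_from_rss = 1 - rss / sum(v * v for v in y)

print(f"r = {r_xy:.4f}")
print(f"r^2 (X,Y) = {r_xy ** 2:.6f}")
print(f"r^2 (Y,Y_hat) = {r2_from_fit:.6f}")
print(f"r^2 (1 - RSS/TSS) = {r2_from_rss:.6f}")
```

In the two-variable model the three numbers agree up to floating-point error, which is why $r$ can be recovered as the square root of $r^2$ with the sign of the slope.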
Before we leave our example, note that our model is extremely simple. Labor economics theory tells us that, besides education, variables such as gender, race, location, labor unions, and language are also important factors in the determination of hourly wages. After we study multiple regression in Chapters 7 and 8, we will consider a more extended model of wage determination.
FIGURE 3.11 Estimated regression line for wage-education data from Table 2.6. (Horizontal axis: Education; vertical axis: Mean hourly wage.)
3.7 Illustrative Examples

EXAMPLE 3.1 Consumption–Income Relationship in the United States, 1960–2005
Let us revisit the consumption–income data given in Table I.1 of the Introduction. We have already shown the data in Figure I.3, along with the estimated regression line in Eq. (I.3.3). Now we provide the underlying OLS regression results, which were obtained from EViews 6. Note that $Y$ = personal consumption expenditure (PCE) and $X$ = gross domestic product (GDP), both measured in billions of 2000 dollars. In this example the data are time series data.
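$$\hat{Y}_t = -299.5913 + 0.7218 X_t \qquad (3.7.1)$$

$$\operatorname{var}(\hat{\beta}_1) = 827.4195 \qquad \operatorname{se}(\hat{\beta}_1) = 28.7649$$

$$\operatorname{var}(\hat{\beta}_2) = 0.0000195 \qquad \operatorname{se}(\hat{\beta}_2) = 0.004423$$

$$r^2 = 0.9983 \qquad \hat{\sigma}^2 = 73.56689$$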
Equation 3.7.1 is the aggregate, or economywide, Keynesian consumption function. As this equation shows, the marginal propensity to consume (MPC) is about 0.72, suggesting that if real income goes up by a dollar, the average personal consumption expenditure goes up by about 72 cents. According to Keynesian theory, MPC is expected to lie between 0 and 1.
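To make the MPC interpretation concrete, the fitted consumption function can be evaluated at a few income levels. The following is a small illustrative sketch in Python; the function name `predicted_pce` and the GDP values are hypothetical choices of ours, picked only to show the arithmetic, and the intercept is entered with the negative sign described in the text.

```python
# Coefficients reported in Eq. (3.7.1); units are billions of 2000 dollars.
INTERCEPT = -299.5913
MPC = 0.7218  # slope: marginal propensity to consume


def predicted_pce(gdp):
    """Estimated mean personal consumption expenditure for a given GDP."""
    return INTERCEPT + MPC * gdp


# Hypothetical GDP levels, chosen for illustration only.
gdp_levels = [8000.0, 8001.0, 10000.0]
for gdp in gdp_levels:
    print(f"GDP = {gdp:8.1f}  ->  estimated PCE = {predicted_pce(gdp):.4f}")

# A one-unit rise in GDP raises estimated PCE by the slope, whatever the level.
extra = predicted_pce(8001.0) - predicted_pce(8000.0)
print(f"change in estimated PCE per unit of GDP = {extra:.4f}")
```

Because the function is linear, the change in estimated consumption per unit of income is the same at every income level, which is exactly what a constant MPC means.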
The intercept value in this example is negative, which has no viable economic
interpretation. Literally interpreted, it means that if the value of GDP were zero, the
average level of personal consumption expenditure would be a negative value of about
299 billion dollars.
The $r^2$ value of 0.9983 means that about 99.8 percent of the variation in personal consumption expenditure is explained by variation in the GDP. This value is quite high, considering that $r^2$ can at most be 1. As we will see throughout this book, in regressions involving time series data one generally obtains high $r^2$ values. We will explore the reasons behind this in the chapter on autocorrelation and also in the chapter on time series econometrics.
EXAMPLE 3.2 Food Expenditure in India
Refer to the data given in Table 2.8 of Exercise 2.15. The data relate to a sample of 55 rural households in India. The regressand in this example is expenditure on food and the regressor is total expenditure, a proxy for income, both figures in rupees. The data in this example are thus cross-sectional data.
On the basis of the given data, we obtained the following regression:
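$$\widehat{\text{FoodExp}}_i = 94.2087 + 0.4368\,\text{TotalExp}_i \qquad (3.7.2)$$

$$\operatorname{var}(\hat{\beta}_1) = 2560.9401 \qquad \operatorname{se}(\hat{\beta}_1) = 50.8563$$

$$\operatorname{var}(\hat{\beta}_2) = 0.0061 \qquad \operatorname{se}(\hat{\beta}_2) = 0.0783$$

$$r^2 = 0.3698 \qquad \hat{\sigma}^2 = 4469.6913$$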
From Equation 3.7.2 we see that if total expenditure increases by 1 rupee, on average, expenditure on food goes up by about 44 paise (1 rupee = 100 paise). If total expenditure were zero, the average expenditure on food would be about 94 rupees. Again, such a mechanical interpretation of the intercept may not be meaningful. However, in this example one could argue that even if total expenditure is zero (e.g., because of loss of a job), people may still maintain some minimum level of food expenditure by borrowing money or by dissaving.
The $r^2$ value of about 0.37 means that only 37 percent of the variation in food expenditure is explained by the total expenditure. This might seem a rather low value, but as we will see throughout this text, in cross-sectional data, typically one obtains low $r^2$ values, possibly because of the diversity of the units in the sample. We will discuss this topic further in the chapter on heteroscedasticity (see Chapter 11).