Bayesian Logistic Regression Models for Credit Scoring by Gregg Webster

Chapter 3: Methodology and Theoretical Considerations

3.1 Methodology
A credit scoring data set analysed by Wielenga, Lucas and Georges (1999) was obtained.
This is a home equity data set and the aim is to predict whether an applicant will
eventually default or be seriously delinquent on a loan that allows owners to borrow
against the equity of their homes. The data set consists of loan performance for 5,960
home equity loans. The dependent variable is a dummy variable indicating whether a
default occurred over the life of the loan. The proportion of applicants who
defaulted in the data set is approximately 20%. There are twelve independent variables.
These variables are: the reason for obtaining the credit, the type of job the applicant has,
the amount of the loan request, the amount due on the existing mortgage, the value of the
current property, the applicant's debt-to-income ratio, the number of years the applicant has
been working at a current job, the number of major derogatory reports, the number of trade
lines (this is the number of other loans the applicant currently has), the number of
delinquent trade lines, the age of the oldest trade line and the number of recent credit
inquiries.
The bank is assumed to be either expanding into a new economic location or changing its
procedure. The goal is to produce a good scoring model for the new location or the new
procedure when only limited data are available there. Expert knowledge from the current
location or the old procedure is to be incorporated into that model. In this study the
change is taken to be one of economic location. The scoring procedure in the current
economic location is assumed to be exactly the same as in the new economic location, so
exactly the same variables are used to model good and bad applicants. To replicate this
situation, the home equity data set is split as follows:
- 50% of the observations are randomly selected and labelled as the set of observations
  that are “old”. These observations are assumed to come from the current or home economic
  location.
- 10% of the observations are randomly selected and labelled as the set of observations
  that are “new”. These observations are assumed to come from the new or foreign economic
  location.
- 10% of the observations are randomly selected and used as a validation set, from which an
  optimal cut-off probability will be obtained. These observations are assumed to come from
  the current or home economic location.
- The remaining 30% of the observations are used as test data. These observations are
  assumed to come from the new economic location and are used to assess the performance of
  the models fitted on the limited amount of data in the new economic location.
The “old” data set serves as prior information, and the “new” data set represents the data
available in the new location or under the new procedure.
To ensure that each random selection has a proportion of approximately 20% bad
applicants, a stratified random sampling procedure is used.
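A stratified split of this kind can be sketched as follows. This is a minimal illustration only, not the procedure actually used in the study: the function name `stratified_split`, the seed, and the synthetic label vector (5,960 observations with exactly 20% bad applicants) are assumptions introduced for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions, seed=0):
    """Assign each observation index to a named subset, stratified by label.

    labels    -- list of 0/1 default indicators (1 = bad applicant)
    fractions -- dict mapping subset name to its share; shares should sum to 1
    """
    rng = random.Random(seed)
    # Group observation indices by class so that each class is split separately.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    names = list(fractions)
    subsets = {name: [] for name in names}
    for indices in by_class.values():
        rng.shuffle(indices)
        start = 0
        cumulative = 0.0
        for i, name in enumerate(names):
            cumulative += fractions[name]
            # The last subset takes whatever remains, so no index is lost to rounding.
            if i == len(names) - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            subsets[name].extend(indices[start:end])
            start = end
    return subsets

# Synthetic labels (assumption): same size as the data set, exactly 20% bad.
labels = [1] * 1192 + [0] * 4768
fractions = {"old": 0.5, "new": 0.1, "validation": 0.1, "test": 0.3}
subsets = stratified_split(labels, fractions)

for name, idx in subsets.items():
    bad_rate = sum(labels[i] for i in idx) / len(idx)
    print(name, len(idx), round(bad_rate, 3))
```

Because good and bad applicants are shuffled and cut separately, each subset keeps a bad rate close to that of the full data set, whatever the seed.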
The following steps are then undertaken:
- The data set is first checked and cleaned. This includes removing outliers and estimating
  missing values.
- A logistic regression model is fitted to the “old” data set. Its coefficients are used as
  prior information when the “new” procedure is introduced or the business expands into a
  new market.
- An optimal cut-off probability is obtained on the validation data using the model fitted
  on the “old” data.
- A logistic regression model is fitted to the “new” data.
- A Bayesian logistic regression model is fitted to the “new” data using the coefficients
  from the “old” data set as priors.
- A Bayesian logistic regression model with a non-informative prior is fitted to the “new”
  data.
- The performances of the logistic regression model and the Bayesian logistic regression
  models fitted on the “new” data are compared on the test data.
- The performances of the models are also considered using different sizes of the “new”
  data.
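The idea of carrying coefficients from the “old” data into the fit on the “new” data can be illustrated with a small sketch. It rests on several assumptions that are not taken from the text above: independent normal priors centred on the “old” coefficients, a prior standard deviation of 0.5, a posterior mode (MAP estimate) found by gradient ascent rather than a full posterior, and simulated data with one covariate. The names `fit_logistic`, `simulate` and `true_beta` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, prior_mean=None, prior_sd=None, steps=1000, lr=0.5):
    """Gradient ascent on the log-likelihood, or on the log-posterior if a prior is given.

    X          -- list of rows, each starting with 1.0 for the intercept
    prior_mean -- prior means, one per coefficient (normal prior, an assumption)
    prior_sd   -- prior standard deviations, one per coefficient
    """
    n, p = len(X), len(X[0])
    beta = list(prior_mean) if prior_mean is not None else [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for row, target in zip(X, y):
            err = target - sigmoid(sum(b * x for b, x in zip(beta, row)))
            for j in range(p):
                grad[j] += err * row[j]
        if prior_mean is not None:
            # Gradient of the normal log-prior pulls beta towards prior_mean.
            for j in range(p):
                grad[j] -= (beta[j] - prior_mean[j]) / prior_sd[j] ** 2
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def simulate(n, beta, rng):
    # Draw one standard-normal covariate and a Bernoulli default indicator.
    X, y = [], []
    for _ in range(n):
        row = [1.0, rng.gauss(0.0, 1.0)]
        prob = sigmoid(sum(b * x for b, x in zip(beta, row)))
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y

rng = random.Random(42)
true_beta = [-1.5, 1.0]                      # hypothetical, shared by both locations
X_old, y_old = simulate(500, true_beta, rng)  # plentiful "old" data
X_new, y_new = simulate(60, true_beta, rng)   # limited "new" data

old_beta = fit_logistic(X_old, y_old)         # ordinary fit on the "old" data
mle_new = fit_logistic(X_new, y_new)          # ordinary fit on the "new" data
map_new = fit_logistic(X_new, y_new, prior_mean=old_beta, prior_sd=[0.5, 0.5])

print("old fit :", [round(b, 3) for b in old_beta])
print("new MLE :", [round(b, 3) for b in mle_new])
print("new MAP :", [round(b, 3) for b in map_new])
```

With equal prior standard deviations, the MAP estimate lies no further from the “old” coefficients than the ordinary fit on the “new” data does; a smaller `prior_sd` pulls it closer. A very large `prior_sd` makes the prior nearly flat, which is one rough way to approximate a non-informative prior in this sketch.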
