Bayesian Logistic Regression Models for Credit Scoring by Gregg Webster


Chapter 3: Methodology and Theoretical Considerations



 
3.1 Methodology 
A credit scoring data set analysed by Wielenga, Lucas and Georges (1999) was obtained. This is a home equity data set, and the aim is to predict whether an applicant will eventually default or become seriously delinquent on a loan that allows owners to borrow against the equity in their homes. The data set records loan performance for 5,960 home equity loans. The dependent variable is a dummy variable indicating whether a default occurred during the term of the loan. Approximately 20% of the applicants in the data set defaulted. There are twelve independent variables: the reason for obtaining the credit, the type of job the applicant has, the amount of the loan request, the amount due on the existing mortgage, the value of the current property, the applicant's debt-to-income ratio, the number of years the applicant has been working at the current job, the number of major derogatory reports, the number of trade lines (the number of other loans the applicant currently has), the number of delinquent trade lines, the age of the oldest trade line, and the number of recent credit inquiries.
It is assumed that the bank is either expanding into a new economic location or changing its procedure. The goal is to produce a good scoring model for the new location, or under the new procedure, when only limited data are available. Expert knowledge from the current location, or from the old procedure, is to be incorporated into the model for the new location or new procedure. The scenario considered here is a change in economic location, and the scoring procedure in the current location is assumed to be exactly the same as in the new location. Therefore, exactly the same variables are used to model good and bad applicants in both locations. To replicate this situation, the home equity data set is split as follows:
- 50% of the observations are randomly selected and labelled as the "old" set. These observations are assumed to come from the current, or home, economic location.
- 10% of the observations are randomly selected and labelled as the "new" set. These observations are assumed to come from the new, or foreign, economic location.
- 10% of the observations are randomly selected and used as a validation set, from which an optimal cut-off probability is obtained. These observations are assumed to come from the current, or home, economic location.
- The remaining 30% of the observations are used as test data. These observations are assumed to come from the new economic location and are used to assess the performance of the models fitted on the limited amount of data available there. Throughout, the "old" data set serves as prior information and the "new" data set as the data for the new procedure.
To ensure that each random selection has a proportion of approximately 20% bad 
applicants, a stratified random sampling procedure is used.
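The stratified split described above can be sketched as follows. This is a minimal, stdlib-only Python illustration, not the author's code: it assumes the outcome is coded 1 for a bad (defaulted) applicant and 0 otherwise, and the function name `stratified_split`, the part names and the seed are hypothetical choices. Each class is shuffled and cut into the 50/10/10/30 proportions separately, so every part inherits roughly the same 20% bad rate as the full data set.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions, seed=0):
    """Split row indices into parts whose class mix matches `labels`.

    labels: list of 0/1 outcomes (1 = bad / default), one per row.
    fractions: dict mapping part name -> fraction; fractions must sum to 1.
    Returns a dict mapping part name -> sorted list of row indices.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    names = list(fractions)
    parts = {name: [] for name in names}
    for rows in by_class.values():
        rng.shuffle(rows)
        n, start, cum = len(rows), 0, 0.0
        for k, name in enumerate(names):
            cum += fractions[name]
            # Cumulative boundaries keep the cuts monotone; the last part
            # takes whatever remains so no row is lost to rounding.
            end = n if k == len(names) - 1 else min(n, round(cum * n))
            parts[name].extend(rows[start:end])
            start = end
    return {name: sorted(rows) for name, rows in parts.items()}


if __name__ == "__main__":
    # Synthetic labels with the same size and 20% bad rate as the data set.
    labels = [1] * 1192 + [0] * 4768
    split = stratified_split(
        labels, {"old": 0.5, "new": 0.1, "validation": 0.1, "test": 0.3}
    )
    print({name: len(rows) for name, rows in split.items()})
```

Stratifying by class, rather than drawing each part from the pooled rows, guarantees that the small "new" and validation sets do not end up with a noticeably different default rate by chance.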
The following steps are then undertaken: 
- The data set is first checked and cleaned. This involves removing outliers, estimating missing values and similar tasks.
- A logistic regression model is fitted to the "old" data set. Its coefficients are used as prior information when the new procedure is introduced or the business expands into a new market.
- An optimal cut-off probability is obtained on the validation data using the model fitted on the "old" data.
- A logistic regression model is fitted to the "new" data.
- A Bayesian logistic regression model is fitted to the "new" data, using the coefficients from the "old" data set as priors.
- A Bayesian logistic regression model with a non-informative prior is fitted to the "new" data.
- The performance of the logistic regression model and the Bayesian logistic regression model fitted on the "new" data is compared on the test data.
- The performance of the models is also assessed for different sizes of the "new" data set.
