analysis, mathematical programming, recursive partitioning (decision trees), expert systems, neural networks, nonparametric smoothing methods and time-varying models. They state that “there is no overall best model” (Hand and Henley, 1997, p. 535), because the best model depends on the structure of the data. They also mention that neural networks might provide a good modelling approach when the data structure is poorly understood. However, these models provide a “black box” approach, and usually little understanding can be gained from the fitted model.
There have been a number of studies comparing these methods in credit scoring. Altman et al. (1994) provided one of the first investigations of neural networks in credit scoring. Neural networks were compared to linear discriminant analysis (LDA), and it was found that LDA performed better. Desai et al. (1996) obtained different results: using a credit union data set, a neural network performed better than LDA but did not perform significantly better than logistic regression. In a master’s degree study by Komorád (2002), logistic regression was compared to multilayer perceptron and radial basis function neural networks for credit scoring. These models were trained and their performance tested on confidential data from a French bank. It was found that the multilayer perceptron and radial basis function neural networks gave very similar results, but logistic regression performed the best.
Thomas (2009) claims that logistic regression is the most commonly used method for the construction of scorecards. Logistic regression is part of a wider class of generalized linear models (GLMs), as shown by Nelder and Wedderburn (1972). The reason for this is that the binomial distribution, which is the distribution of the response in logistic regression, is a member of the exponential family of distributions. GLMs include a number of models such as normal linear regression, logistic regression and Poisson regression.
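To make the GLM connection concrete, the short sketch below fits the same model in two ways: directly as a logistic regression, and as a GLM with a binomial response and logit link, using the statsmodels library in Python. The data are synthetic and the settings are illustrative only; they are not taken from any of the studies discussed in this section.

```python
# Illustrative sketch only: synthetic data, not from any data set cited in the text.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 3)))   # intercept + 3 hypothetical applicant characteristics
beta_true = np.array([-1.0, 0.8, -0.5, 0.3])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))       # logit link applied to the linear predictor
y = rng.binomial(1, p)                         # 1 = default, 0 = non-default (hypothetical coding)

# Logistic regression fitted directly ...
logit_fit = sm.Logit(y, X).fit(disp=0)

# ... and the same model expressed as a GLM: binomial response with the canonical logit link.
glm_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# The two coefficient vectors agree, illustrating that logistic regression
# is the binomial member of the GLM family.
print(np.allclose(logit_fit.params, glm_fit.params, atol=1e-4))
```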
One of the first applications of logistic regression to credit scoring is given by Steenackers and Goovaerts (1989). Based on data from a Belgian credit company, they developed a logistic regression model. Nineteen predictor variables were utilized and, using stepwise logistic regression, 11 variables were chosen for a final model. Steenackers and Goovaerts (1989) also mention that the model relies on historical data; therefore, a periodic review of the model is necessary to adjust for shifts in the underlying factors. To solve this problem in credit scoring, Whittaker et al. (2007) developed a Kalman filter for a credit scorecard. Here, the scorecard is updated by combining the new applicant data with the previous best estimate.
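As a simplified illustration of this kind of recursive update (not a reproduction of Whittaker et al.’s formulation), the sketch below treats the scorecard coefficients as the state of a Kalman filter: the previous estimate and its covariance are blended with a coefficient estimate obtained from a new batch of applicants. All numbers are hypothetical.

```python
# Simplified sketch of a Kalman-style scorecard update (not Whittaker et al.'s exact model):
# the coefficient vector is the state; each new batch of applicants yields a noisy
# "measurement" of that state, which is blended with the previous best estimate.
import numpy as np

def kalman_update(beta_prev, P_prev, beta_new, R_new, Q=0.0):
    """Combine the previous estimate (beta_prev, P_prev) with a new-batch estimate
    (beta_new, R_new). Q optionally inflates the prior covariance to allow drift."""
    k = len(beta_prev)
    P_pred = P_prev + Q * np.eye(k)                    # predict step: coefficients may drift over time
    K = P_pred @ np.linalg.inv(P_pred + R_new)         # Kalman gain
    beta_upd = beta_prev + K @ (beta_new - beta_prev)  # blend old estimate with new data
    P_upd = (np.eye(k) - K) @ P_pred                   # updated uncertainty
    return beta_upd, P_upd

# Hypothetical numbers: previous scorecard coefficients and an estimate from new applicants.
beta_prev = np.array([-1.00, 0.80, -0.50])
P_prev = np.diag([0.04, 0.02, 0.02])                   # uncertainty of the previous estimate
beta_new = np.array([-0.90, 0.85, -0.40])              # estimate from the latest batch of applicants
R_new = np.diag([0.10, 0.05, 0.05])                    # uncertainty of the new-batch estimate

beta_upd, P_upd = kalman_update(beta_prev, P_prev, beta_new, R_new, Q=0.01)
print("updated coefficients:", np.round(beta_upd, 3))
```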
A Bayesian approach can also be used to update a model: the posterior distribution is updated as soon as new information becomes available. Greenberg (2008)
stated that Bayesian updating is a very attractive feature of Bayesian inference. With
Bayesian logistic regression, numerical methods are used to update the model. The reason
for this is that conjugate priors (priors for which the posterior distribution belongs to the same family as the prior) do not exist for this model. A popular method used to update the model is Markov chain Monte Carlo (MCMC).
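As a minimal illustration of this idea, the sketch below draws from the posterior of a Bayesian logistic regression with independent normal priors using a random-walk Metropolis sampler, one member of the MCMC family. The data, prior standard deviation and proposal scale are illustrative assumptions rather than choices from the cited literature; in an updating setting, the posterior draws from one period would inform the prior for the next.

```python
# Illustrative sketch: Bayesian logistic regression via random-walk Metropolis
# (one MCMC algorithm). All data and tuning constants below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic applicant data: intercept plus two hypothetical characteristics.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-1.0, 0.7, -0.4])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def log_posterior(beta, X, y, prior_sd=10.0):
    """Log posterior = Bernoulli log-likelihood + independent N(0, prior_sd^2) log prior."""
    eta = X @ beta
    log_lik = np.sum(y * eta - np.log1p(np.exp(eta)))
    log_prior = -0.5 * np.sum(beta**2) / prior_sd**2
    return log_lik + log_prior

def metropolis(X, y, n_draws=5000, step=0.05):
    """Random-walk Metropolis: propose beta' = beta + noise, accept with prob min(1, ratio)."""
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    draws = np.empty((n_draws, X.shape[1]))
    for i in range(n_draws):
        proposal = beta + step * rng.normal(size=beta.shape)
        candidate = log_posterior(proposal, X, y)
        if np.log(rng.uniform()) < candidate - current:   # accept/reject step
            beta, current = proposal, candidate
        draws[i] = beta
    return draws

draws = metropolis(X, y)
print("posterior means:", draws[2500:].mean(axis=0))      # discard the first half as burn-in
```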