Discrimination
Some of the computational methods presented in this chapter are used to determine which
aspects of which features best separate the data. For example, if there are many different
kinds of tests that can be performed on a person to diagnose a disease, each of the different
kinds of measurement will have a different degree of importance to the outcome. Also,
certain combinations of measurements may be important for classification, in either a
positive or negative sense. In essence, with such information we wish to determine the
view of the feature vectors that best reveals the correlations and distinctions. To take a very
simple three-dimensional example, imagine that the problem is to distinguish blurred points
of light by taking photographs. Here you would not expect to be able to separate the
different lights if the camera view meant that one light lay directly behind the other; the
best separation for two lights would be a view perpendicular to the line between them.
Generalising the problem to any feature space, we would seek a projection (view)
of the data in which differences or groups are most obvious. Implicit in this reasoning is the
tactic of mapping several different kinds of features into a simpler, flatter representation,
otherwise known as dimensional reduction.
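To make the photograph analogy concrete, the following is a minimal Python sketch (assuming NumPy is available; the point coordinates and cluster spreads are invented purely for illustration) that projects two clusters of three-dimensional points, standing in for the blurred lights, onto a two-dimensional ‘camera plane’ chosen so that the clusters stay apart:

import numpy as np

# Two clusters of blurred 3D points stand in for the two lights;
# the coordinates and spreads are invented purely for illustration
np.random.seed(0)
lightA = np.random.normal((0.0, 0.0, 0.0), 0.1, (50, 3))
lightB = np.random.normal((1.0, 1.0, 0.0), 0.1, (50, 3))
points = np.vstack([lightA, lightB])

# A 2D 'camera plane' spanned by two orthonormal axes; this view keeps
# the direction between the light centres in the image, so the clusters
# remain separated after projection
axis1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # along the centre-to-centre line
axis2 = np.array([0.0, 0.0, 1.0])                 # perpendicular; completes the plane
plane = np.vstack([axis1, axis2])

projected = points @ plane.T  # 3D -> 2D; each row is now an (x, y) image point
print(projected.shape)        # (100, 2)

Viewing along the centre-to-centre line instead, i.e. choosing both plane axes perpendicular to it, would superimpose the two clusters: exactly the bad camera angle described above.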
Taking a photograph of real objects involves going from three dimensions to a two-
dimensional projection, so this is an example of dimensional reduction, although for the
purposes of data discrimination we would not take just any view, but rather the one that
gives optimal separation. If there are only two data categories to be separated, we
could draw a line from the ‘centre’ of one category to the centre of the other. Although we know
where this line is in the feature space of the data, the line itself is only a one-dimensional
object that charts the transition from one group to the other. By transforming
multi-dimensional data (lots of features) to points on an optimally positioned one-
dimensional line, we automatically create an axis for separation; a decision boundary
would be a point on this line between the groups (see the sketch after this paragraph). It is noteworthy that although
dimensional reduction can often simplify a problem involving large numbers of features,
including giving human beings the kinds of graphs and 2D pictures they can visually
appreciate, this simplification is not a prerequisite for separating data items. Many
methods allow data to be grouped and separated in its original high-dimensional feature-vector
form. Where possible, separating the unmapped data should be considered first,
given that dimensional reduction loses information, which may obscure the separation.
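As a sketch of the centre-to-centre line just described (again assuming NumPy; the two feature-vector arrays are hypothetical data), the following projects two categories of five-dimensional data onto the line joining their ‘centres’ and places a decision boundary at the midpoint:

import numpy as np

# Hypothetical data: two categories of feature vectors (40 samples, 5 features each)
np.random.seed(0)
classA = np.random.normal(0.0, 1.0, (40, 5))
classB = np.random.normal(3.0, 1.0, (40, 5))

# Unit vector along the line joining the two category centres
centreA = classA.mean(axis=0)
centreB = classB.mean(axis=0)
axis = centreB - centreA
axis /= np.linalg.norm(axis)

# Project every multi-dimensional point to a position on this 1D axis
posA = classA @ axis
posB = classB @ axis

# A simple decision boundary: the midpoint between the projected centres
boundary = 0.5 * (centreA @ axis + centreB @ axis)
print(np.mean(posA < boundary), np.mean(posB >= boundary))  # fractions correctly split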
In this chapter we will look at two approaches to data discrimination: principal
component analysis (PCA) and linear discriminant analysis (LDA). Either may be used to
work out a best-separating projection (view) of data represented as feature vectors.
Accordingly, they may also be used as a means of dimensional reduction.
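As a brief taster before those methods are covered properly, the following sketch applies the standard PCA recipe, centring the data, forming its covariance matrix and keeping the leading eigenvectors; the data are invented, and this generic recipe is not necessarily the exact implementation developed later in the chapter:

import numpy as np

# Invented 4D data with two dominant directions of variation,
# so that reduction to two principal components is meaningful
np.random.seed(0)
base = np.random.normal(0.0, 1.0, (100, 2))
mix = np.array([[1.0, 0.5, 0.2, 0.0],
                [0.0, 0.3, 1.0, 0.8]])
data = base @ mix + np.random.normal(0.0, 0.1, (100, 4))

# Standard PCA recipe: centre the data, then eigendecompose the covariance
centred = data - data.mean(axis=0)
cov = np.cov(centred.T)                 # 4 x 4 covariance matrix
eigVals, eigVecs = np.linalg.eigh(cov)  # eigh gives ascending eigenvalues
components = eigVecs[:, ::-1][:, :2]    # top two axes by variance

reduced = centred @ components          # 4D -> 2D projection
print(reduced.shape)                    # (100, 2)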