There are some properties of machine learning algorithms that make them attractive to
use, even when there are alternative approaches that could be used to achieve the same
task. Firstly, you don’t have to think about the precise details of what is going on. As long
as you can train a program to do a good job, then that may be enough; although in some
cases people will be distressed at the lack of a proper ‘reason why’. Secondly, most
machine learning methods are not fussy about the kind of data they are given,
as long as it can be encoded numerically; all sorts of disparate kinds of input data may be
combined if they improve the prediction being made. Thirdly, machine learning methods
can make predictions when the relationships between data items are not straightforward, including,
for example, when two sets of input are generally very similar but some subtle correlation
causes a completely different result.
In this chapter we will cover four different machine learning examples, and you can try
these for your own computational problems where appropriate. For each, we describe a
simple Python implementation (or as simple as we can make it while still being useful)
and aim to point out the advantages and disadvantages of each method. We start with the
k-nearest neighbour algorithm, which is perhaps the simplest of all machine learning
algorithms and relatively easy to understand. Despite its simplicity, however, in some
situations it can make good classifications with relatively little effort. Also, it introduces
some of the principles, like vector input, that will be discussed in the other methods. Next
we will describe a self-organising map as an example of an unsupervised method, and
then go on to two supervised methods: a feed-forward artificial neural network and a
support vector machine. Both of these methods can be used in a large number of different
situations where there is training data available. The support vector machine is the more
recent invention and holds certain advantages over neural networks: it is generally
deterministic, giving the same result on the same training data, and it is much less
susceptible to over-training, where a method ‘learns’ the training data too well and is not
general enough to make the best predictions on data not seen before. Nonetheless, we
include the feed-forward neural network because it is easier to implement, especially for
multi-option classification, and often a good place to start in order to judge whether
machine learning is an effective strategy for any given situation.
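To give an early feel for what 'vector input' means in practice, here is a minimal sketch of the k-nearest neighbour idea. It is not the implementation developed later in the chapter. The function names (euclidean_distance, knn_classify), the choice of Euclidean distance and the tiny data set are all assumptions made purely for illustration. Each known example is a vector of numbers paired with a label, and a query vector is given the label that is most common among its k closest known examples.

```python
from collections import Counter
from math import sqrt


def euclidean_distance(vector_a, vector_b):
    """Straight-line distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(vector_a, vector_b)))


def knn_classify(query, training_data, k=3):
    """Return the most common label among the k training vectors nearest to query.

    training_data is a list of (feature_vector, label) pairs.
    """
    # Sort the known examples by distance to the query and keep the closest k
    neighbours = sorted(training_data,
                        key=lambda item: euclidean_distance(query, item[0]))[:k]

    # Simple majority vote over the labels of those neighbours
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up two-feature data, purely for illustration
training_data = [
    ((1.0, 1.1), 'low'),
    ((0.9, 1.3), 'low'),
    ((1.2, 0.8), 'low'),
    ((3.0, 3.2), 'high'),
    ((3.1, 2.9), 'high'),
    ((2.8, 3.3), 'high'),
]

prediction = knn_classify((2.9, 3.0), training_data, k=3)
print(prediction)
```

Note that there is no separate training step here: the known examples are simply stored and compared against each query, which is part of why the method is so easy to understand. The value of k and the distance measure are both choices that would normally be tuned to the problem at hand.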