Multiclass classification
Multiclass classification extends the binary setting. The primary difference is that the label y ∈ {1, ..., n} can now take on one of several values. For example, we may want to categorize a document by the language in which it was written (English, French, German, Spanish, Hindi, Japanese, Chinese, etc.); see Figure 1.6 for an example. A further change from the binary case is that the cost of an error can vary significantly depending on the type of mistake we make. For example, when assessing cancer risk, it makes a significant difference whether we misclassify an early stage of cancer as healthy (in which case the patient is likely to die) or as an advanced stage of cancer (in which case the patient is likely to be inconvenienced by overly aggressive treatment).
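The cancer-screening example above can be made concrete with a cost-sensitive decision rule. The sketch below is illustrative only: the class labels, the cost matrix entries, and the helper `cost_sensitive_predict` are hypothetical choices, not taken from the text. It shows how, given a model's class probabilities, we can pick the label that minimizes expected cost rather than the most probable label.

```python
import numpy as np

# Hypothetical 3-class screening problem:
# 0 = healthy, 1 = early-stage cancer, 2 = advanced-stage cancer.
# cost[i, j] = cost of predicting class j when the true class is i.
# Missing a cancer (predicting "healthy") is assigned a far larger
# cost than overly aggressive treatment.
cost = np.array([
    [0.0,   1.0,  2.0],   # true: healthy
    [100.0, 0.0,  5.0],   # true: early-stage
    [100.0, 10.0, 0.0],   # true: advanced-stage
])

def cost_sensitive_predict(class_probs, cost):
    """Return the label minimizing expected cost under the model's
    class probabilities, instead of the most probable label."""
    expected_cost = class_probs @ cost  # one expected cost per label
    return int(np.argmin(expected_cost))

# The model is slightly more confident in "healthy" than "early-stage",
# yet the asymmetric costs make it safer to predict early-stage cancer.
probs = np.array([0.55, 0.40, 0.05])
print(cost_sensitive_predict(probs, cost))  # prints 1, not argmax = 0
```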
Structured estimation goes beyond basic multiclass estimation by assuming that the labels y have some additional structure that can be exploited in the estimation process. For example, y may be a path in an ontology when categorizing web pages, or a permutation when matching objects, doing collaborative filtering, or ranking documents in a retrieval setting. When doing named entity recognition, y might be an annotation of a text. Each of these problems has its own peculiarities in terms of the set of labels y we may consider admissible, and in how to search this space. We will look at a few of these problems in Chapter ??.
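To make the permutation case concrete, one common structured loss counts the pairs of items that two rankings order oppositely (the Kendall tau distance). This is a minimal illustrative sketch, not a method from the text; the function name and the document identifiers are invented for the example.

```python
def discordant_pairs(y_true, y_pred):
    """Structured loss for permutation-valued labels: the number of
    item pairs ranked in opposite order by the two permutations
    (the Kendall tau distance)."""
    # Position of each item in each ranking.
    pos_t = {item: i for i, item in enumerate(y_true)}
    pos_p = {item: i for i, item in enumerate(y_pred)}
    items = list(y_true)
    n = len(items)
    d = 0
    for i in range(n):
        for j in range(i + 1, n):
            a, b = items[i], items[j]
            # A pair is discordant if the two rankings disagree
            # on which of a, b comes first.
            if (pos_t[a] - pos_t[b]) * (pos_p[a] - pos_p[b]) < 0:
                d += 1
    return d

print(discordant_pairs(["d1", "d2", "d3"], ["d1", "d2", "d3"]))  # 0
print(discordant_pairs(["d1", "d2", "d3"], ["d3", "d2", "d1"]))  # 3
```

Note that, unlike a plain multiclass loss, this loss is graded: rankings that are nearly correct incur a smaller penalty than completely reversed ones.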
Another frequent task is regression. Given a pattern x, the goal is to estimate a real-valued variable y ∈ ℝ (see e.g. Figure 1.7). For example, we might want to predict the next day's value of a stock, the yield of a semiconductor fab given the current process, the iron content of ore based on mass spectroscopy readings, or an athlete's heart rate based on accelerometer data. One of the fundamental ways in which regression problems differ from one another is the choice of loss. When predicting stock prices, for example, our loss for a put option will be asymmetric. A hobby athlete, on the other hand, might only care that our estimate of the heart rate is close to the true average.
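The contrast between a symmetric and an asymmetric regression loss can be sketched as follows. This is an illustrative example rather than anything from the text: the pinball (quantile) loss with the assumed parameter tau = 0.9 penalizes under-prediction nine times more heavily than over-prediction, which is the kind of lopsided penalty the stock example calls for, while the squared loss treats both directions equally, matching the hobby athlete's needs.

```python
def squared_loss(y, y_hat):
    """Symmetric: over- and under-estimation are penalized equally."""
    return (y - y_hat) ** 2

def pinball_loss(y, y_hat, tau=0.9):
    """Asymmetric 'pinball' (quantile) loss: with tau = 0.9,
    under-predicting costs 9x more than over-predicting
    by the same amount."""
    e = y - y_hat
    return tau * e if e >= 0 else (tau - 1) * e

# Errors of equal size, opposite sign:
print(squared_loss(10.0, 9.0), squared_loss(10.0, 11.0))  # 1.0 1.0
print(pinball_loss(10.0, 9.0))   # 0.9  (under-prediction)
print(pinball_loss(10.0, 11.0))  # ~0.1 (over-prediction)
```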
The term "novelty detection" is a bit of a misnomer. It refers to the problem of identifying "abnormal" readings relative to a set of past data. Clearly, deciding what is unusual is a rather subjective matter; the common assumption is that unusual events occur rarely. Hence one possible goal is to design a system that assigns each observation a score indicating how novel it is. Readers familiar with density estimation might argue that the latter would be a reasonable solution. However, we neither need a score that integrates to 1 over the entire domain, nor do we care much about novelty scores for typical observations; we will later see how this somewhat easier goal can be achieved directly. Figure 1.8 shows novelty detection applied to an optical character recognition database.
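One simple way to produce such a novelty score, without estimating a normalized density, is to score each new observation by its distance to the nearest previously seen data. This is a minimal sketch under assumed settings (synthetic 2-D Gaussian "historical" data, k = 3 nearest neighbors, a hand-picked outlier); none of it comes from the text.

```python
import numpy as np

def novelty_scores(train, test, k=3):
    """Score each test point by its mean distance to its k nearest
    training points: a large score means far from previously seen
    data. Unlike a density estimate, scores need not sum to 1."""
    scores = []
    for x in test:
        d = np.sort(np.linalg.norm(train - x, axis=1))
        scores.append(d[:k].mean())
    return np.array(scores)

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))           # "normal" historical readings
test = np.array([[0.0, 0.0], [6.0, 6.0]])   # one typical, one abnormal point
s = novelty_scores(train, test)
print(s[1] > s[0])  # the abnormal point gets the larger novelty score
```

Note that the scores are only used to rank observations by how unusual they are, which is exactly the relaxation of density estimation described above.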