Let’s extend this same concept to a real-world application. In the following example,
you will look at a factory that produces screws and try to determine what an
anomaly could be in this context. The factory produces screws in massive batches,
and samples from each batch are tested to ensure that a certain level of quality
is maintained. For each sample, assume that the density and tensile strength (how
The intersections of the dotted lines create several different regions containing data
points. Of interest is the bounding box (solid lines) created from the intersection of both
dotted lines, since it contains the data points for samples deemed acceptable (Figure 1-5).
Any data point outside of that specific box will be considered anomalous.
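The bounding-box rule can be expressed as a simple range check on both features. The sketch below assumes the dotted lines correspond to a minimum and maximum for each feature. The class `AcceptableBox`, the helper `classify`, and the threshold values (density in g/cm³, tensile strength in MPa) are hypothetical and chosen only for illustration; they are not taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptableBox:
    """Axis-aligned bounding box of acceptable values (hypothetical limits)."""
    density_min: float
    density_max: float
    strength_min: float
    strength_max: float

    def contains(self, density: float, strength: float) -> bool:
        # A sample is acceptable only if BOTH features fall inside their
        # ranges, i.e. the point lies inside the rectangle formed by the lines.
        return (self.density_min <= density <= self.density_max
                and self.strength_min <= strength <= self.strength_max)


def classify(box: AcceptableBox, density: float, strength: float) -> str:
    """Label a sample 'good' if it lies inside the box, otherwise 'anomaly'."""
    return "good" if box.contains(density, strength) else "anomaly"


# Hypothetical thresholds: density in g/cm^3, tensile strength in MPa.
box = AcceptableBox(density_min=7.75, density_max=7.95,
                    strength_min=800.0, strength_max=1000.0)

print(classify(box, 7.85, 900.0))  # good
print(classify(box, 7.85, 300.0))  # anomaly
```

Points that sit exactly on a solid line are treated as acceptable here. Whether the boundary should be inclusive is a design choice the factory would have to make.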
Now that you know which points are and aren’t acceptable, let’s pick out a sample
from a new batch of screws and check where its data falls on the graph (Figure 1-6).
Figure 1-5. Data points are identified as good or anomalous based on their location
Chapter 1 What Is anomaly DeteCtIon?
The data for this sample screw falls within the acceptable range. That means this
batch of screws is good to use, since its density and tensile strength are appropriate
for the consumer. Now let’s look at a sample from the next batch of screws and check
its data (Figure 1-7).
Figure 1-6. A new data point representing the new sample screw is generated, with the data falling within the bounding box
The data falls far outside the acceptable range: for its density, the screw has abysmal
tensile strength and is unfit for use. Since the sample has been flagged as an anomaly, the
factory can investigate why this batch of screws turned out to be brittle. A factory of
considerable size must hold a high standard of quality while also maintaining a high,
steady volume of output to keep up with consumer demand. For a task of that scale,
automatically detecting anomalies so that faulty screws are never shipped is essential,
and it has the added benefit of scaling easily as output grows.
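As a rough illustration of that automation, the sketch below applies the same bounding-box check to one sample per batch and returns the batches that need investigation. The `LIMITS` values, the batch IDs, and the measurements are all hypothetical, and `is_anomalous` and `flag_batches` are illustrative helper names rather than part of any library.

```python
from typing import Dict, List, Tuple

# Hypothetical acceptance limits (min, max) for each measured feature.
LIMITS: Dict[str, Tuple[float, float]] = {
    "density": (7.75, 7.95),              # g/cm^3 (assumed)
    "tensile_strength": (800.0, 1000.0),  # MPa (assumed)
}


def is_anomalous(sample: Dict[str, float]) -> bool:
    """True if any feature of the sample falls outside its acceptance range."""
    return any(not (lo <= sample[name] <= hi)
               for name, (lo, hi) in LIMITS.items())


def flag_batches(samples: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the IDs of batches whose sample lies outside the bounding box."""
    return [batch_id for batch_id, sample in samples.items()
            if is_anomalous(sample)]


# Hypothetical measurements, one tested sample per batch.
batch_samples = {
    "batch-001": {"density": 7.86, "tensile_strength": 910.0},
    "batch-002": {"density": 7.84, "tensile_strength": 310.0},  # brittle
    "batch-003": {"density": 7.90, "tensile_strength": 870.0},
}

print(flag_batches(batch_samples))  # ['batch-002']
```

Because each check is independent, the same function handles three batches or thousands without modification. That independence is what makes this kind of rule-based screening easy to scale.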
So far, you have explored anomalies as data points that are either out of place, in the
case of the black swan, or unwanted, in the case of faulty screws. So what happens when
you introduce time as a new variable?
Figure 1-7. A new data point is generated for another sample, but it falls outside the bounding box