proportion of the data entries, pretty much drowning out the anomalous data. Not only
a new data frame that contains a portion of normal data entries and all of the anomalous
As in Chapter 2, the normal labels are encoded as 4 so you can use them as the basis
to separate the normal entries from the anomalies.
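
A minimal sketch of that separation follows. It assumes a pandas data frame whose label column is called `label`; that name, the `duration` column, and the toy rows are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data set.
# Per the text, normal entries carry the encoded label 4.
df = pd.DataFrame({
    "duration": [0, 12, 3, 0, 7, 1],
    "label": [4, 4, 1, 4, 0, 4],
})

NORMAL_CODE = 4

# Boolean mask: True for normal entries, False for anomalies.
is_normal = df["label"] == NORMAL_CODE

normal = df[is_normal]
anomalies = df[~is_normal]

print(len(normal), len(anomalies))
```

Keeping the mask in its own variable makes it easy to reuse the same split later without repeating the comparison.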
Since the data set is so large, the entries are shuffled randomly ten times before a sample of
50,000 is selected from them. This ensures that the sample is drawn at random from the entire
data set rather than consisting only of the first 50,000 entries. The output is shown in
Figure 5-45.
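
The shuffle-and-sample step could look roughly like the following. This is a sketch under stated assumptions, not the book's listing: the frame names, the `label` and `value` columns, the seed, and the scaled-down sizes are all hypothetical, chosen so the example runs quickly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seed is an assumption, for repeatability

# Hypothetical stand-ins: 1,000 normal rows (label 4) and 20 anomalies (label 1).
normal = pd.DataFrame({"value": np.arange(1000), "label": 4})
anomalies = pd.DataFrame({"value": np.arange(1000, 1020), "label": 1})

SAMPLE_SIZE = 100  # the text uses 50,000 on the full data set
N_SHUFFLES = 10    # the text shuffles ten times

# Reorder the normal rows at random, N_SHUFFLES times in a row.
for _ in range(N_SHUFFLES):
    normal = normal.iloc[rng.permutation(len(normal))]

# Keep the first SAMPLE_SIZE shuffled normal rows plus every anomaly.
sample = pd.concat([normal.iloc[:SAMPLE_SIZE], anomalies], ignore_index=True)

print(sample["label"].value_counts())
```

A single shuffle already produces a uniformly random order, so the repeated shuffles here mirror the text rather than add randomness. Calling `normal.sample(n=SAMPLE_SIZE)` would be a one-call alternative for drawing the normal rows.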