training set and testing set, you can use a built-in
scikit-learn function called train_test_split, as shown below:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df2, labels,
    test_size=0.2, random_state=42)
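To try the call without this chapter's data set, here is a minimal sketch on a small synthetic DataFrame. The df2 and labels it builds are hypothetical stand-ins (100 rows, three random feature columns, random 0/1 labels). The sketch shows the 80/20 split that test_size=0.2 produces and confirms that reusing random_state=42 selects the same test rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for df2 and labels: 100 rows, 3 numeric features
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
labels = pd.Series(rng.integers(0, 2, size=100), name="label")

# Hold out 20% of the rows as test data; random_state seeds the shuffle
x_train, x_test, y_train, y_test = train_test_split(
    df2, labels, test_size=0.2, random_state=42)
print(len(x_train), len(x_test))  # 80 20

# Calling again with the same random_state reproduces the same split
x_train2, x_test2, _, _ = train_test_split(
    df2, labels, test_size=0.2, random_state=42)
print(x_test.index.equals(x_test2.index))  # True
```

Because pandas inputs keep their row index, x_train and y_train stay aligned row for row.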
The parameters are as follows: x, y, test_size, and random_state. Here x and y are the data and the corresponding labels to be split (df2 and labels in the call above). test_size is the fraction of the data set to use as test data (0.2 means 20 percent). random_state is a number used to initialize the random number generator that determines which data entries are chosen for the training data set and which for the test data set.

Figure 2-20. Shuffling the values in df and creating your training, testing, and validation data sets

Chapter 2 Traditional Methods of Anomaly Detection
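The Figure 2-20 caption describes shuffling df and then creating training, testing, and validation sets. Below is a sketch of that workflow under stated assumptions, not the book's exact listing. It uses a hypothetical 1,000-row DataFrame and shuffles it with df.iloc[np.random.permutation(len(df))]. It then keeps the first 800 shuffled rows (a hypothetical split point) as df2 for train_test_split and sets the remaining rows aside as validation data. Dropping the label column from the features is also an assumption of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data set: 1,000 rows with a 'label' column
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["f1", "f2", "f3"])
df["label"] = rng.integers(0, 2, size=len(df))

# Shuffle the rows by indexing with a random permutation of row positions
np.random.seed(42)
df = df.iloc[np.random.permutation(len(df))]

# Hypothetical split point: first 800 shuffled rows for training/testing,
# the remaining 200 rows for validation
split = 800
df2 = df.iloc[:split]
df_validate = df.iloc[split:]

# Assumption: use only the feature columns as x, so the label is not a feature
features = ["f1", "f2", "f3"]
x_train, x_test, y_train, y_test = train_test_split(
    df2[features], df2["label"], test_size=0.2, random_state=42)
x_val, y_val = df_validate[features], df_validate["label"]

print(len(x_train), len(x_test), len(x_val))  # 640 160 200
```

Shuffling before slicing matters if the rows are ordered, for example by time or by class. Without it, the validation rows could differ systematically from the rest.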
Finally, you delegate the rest of the data to the