With
df = df.iloc[np.random.permutation(len(df))]
you randomly shuffle all the rows of the data set, so that abnormal entries
do not end up pooled in any one region of it.
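The shuffle can be tried on a small frame first. The following is only a sketch: the toy columns (feature, label), the ten-row size, and the fixed seed are assumptions made for illustration and are not part of the original data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real data set has far more rows and columns.
df = pd.DataFrame({"feature": range(10), "label": [0] * 8 + [1] * 2})

np.random.seed(0)  # assumption: fixed seed so the shuffle is repeatable
order = np.random.permutation(len(df))  # random ordering of 0 .. len(df) - 1
shuffled = df.iloc[order]               # rows reordered by position

# Every row is still present exactly once; only the order has changed.
print(sorted(shuffled.index) == list(df.index))
```

Because iloc selects rows by position, each row keeps its original index label and its feature and label values stay together; only the row order changes.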
With
df2 = df[:500000]
you assign the first 500,000 rows of the shuffled df to the variable df2.
In the next line of code, labels = df2["label"], you assign the label column of
df2 to the variable labels. Then, with df_validate = df[500000:], you assign the
remaining rows of the data frame to df_validate, which becomes the validation data set.
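The three assignments above can be checked on a small frame. This is a minimal sketch under stated assumptions: n_train is a hypothetical stand-in for 500000, and the ten-row frame with feature and label columns is invented for illustration.

```python
import numpy as np
import pandas as pd

n_train = 6  # assumption: stands in for 500000 in the text

# Hypothetical toy frame, shuffled the same way as in the text.
df = pd.DataFrame({"feature": np.arange(10), "label": [0, 1] * 5})
df = df.iloc[np.random.permutation(len(df))]

df2 = df[:n_train]          # first n_train rows of the shuffled frame
labels = df2["label"]       # label column of those rows
df_validate = df[n_train:]  # all remaining rows form the validation set

print(len(df2), len(df_validate))
```

An integer slice such as df[:n_train] selects rows by position, so the two pieces never share a row and together cover the whole shuffled frame.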
To split your data into the