Data Analysis From Scratch With Python: Step By Step Guide

Download 2,79 Mb.

Pdf ko'rish

bet	16/60
Sana	30.05.2022
Hajmi	2,79 Mb.
	#620990

1 ... 12 13 14 15 16 17 18 19 ... 60

Bog'liq
Data Analysis From Scratch With Python Beginner Guide using Python, Pandas, NumPy, Scikit-Learn, IPython, TensorFlow and... (Peters Morgan) (z-lib.org)

import pandas as pd
dataset = pd.read_csv('Data.csv')
This step is often necessary before Python and your computer can work on the
data. So whenever you’re working on a CSV file and you’re using Python, it’s
good to immediately have those two lines of code at the top of your project.
Then, we set the input values (X) and the output values (y). Often, the y values
are our target outputs. For example, the common goal is to learn how certain
values of X affect the corresponding y values. Later on, that learning can be
applied on new X values and see if that learning is useful in predicting y values
(unknown at first).
After the data becomes readable and usable, often the next step is to ensure that
the values don’t vary much in scale and magnitude. That’s because values in
certain columns might be in a different league than the others. For instance, the
ages of customers can range from 18 to 70. But the income range are in the
range of 100000 to 9000000. The gap in the ranges of the two columns would
have a huge effect on our model. Perhaps the income range will contribute
largely to the resulting predictions instead of treating both ages and income
range equally.
To do feature scaling (scaling values in the same magnitude), one way to do this
is by using the following lines of code:
from sklearn.preprocessing import
StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)

X_test = sc_X.transform(X_test)
# sc_y = StandardScaler()
# y_train = sc_y.fit_transform(y_train)
The goal here is to scale the values in
the same magnitude so all the values from different columns or features will
contribute to the predictions and outputs.
In data analysis and machine learning, it’s often a general requirement to divide
the dataset into Training Set and Test Set. After all, we need to create a model
and test its performance and accuracy. We use the Training Set so our computer
can learn from the data. Then, we use that learning against the Test Set and see if
its performance is good enough.
A common way to accomplish this is through the following code:

Download 2,79 Mb.

Do'stlaringiz bilan baham:

1 ... 12 13 14 15 16 17 18 19 ... 60