Data Analysis From Scratch With Python: Step By Step Guide





Data Analysis From Scratch With Python Beginner Guide using Python, Pandas, NumPy, Scikit-Learn, IPython, TensorFlow and... (Peters Morgan)

0,135426.92,0,California,42559.73
542.05,51743.15,0,New York,35673.41
0,116983.8,45173.06,California,14681.4
Notice that there are multiple features, or independent variables (R&D Spend,
Administration, Marketing Spend, State). Again, the goal here is to reveal a
relationship between the independent variables and the target (Profit).
Also notice that the data under the ‘State’ column is text, not numbers.
You’ll see New York, California, and Florida instead of numeric values. How do
you deal with this kind of data?
One convenient way is to transform the categorical data (New York, California,
Florida) into numerical data. We can accomplish this with the following lines
of code:
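from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
# Replace the State names in column 3 with integer codes. Note this line.
X[:, 3] = labelencoder.fit_transform(X[:, 3])
# categorical_features was removed from OneHotEncoder in scikit-learn 0.22,
# so the next two lines only run on older versions.
onehotencoder = OneHotEncoder(categorical_features = [3])
X = onehotencoder.fit_transform(X).toarray()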
Pay attention to the line X[:, 3] = labelencoder.fit_transform(X[:, 3]). It
transforms the data in the fourth column (State), replacing each State name
with an integer code. The index is 3 because Python indexing starts at zero (0).
The goal is to turn the categorical data into something we can work on. To do
this, we’ll create “dummy variables” which take the values of 0 or 1. In other
words, they indicate the presence or absence of something.
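On scikit-learn 0.22 and later, OneHotEncoder no longer accepts categorical_features, so the listing above needs a different form there. A minimal sketch of one equivalent, assuming X holds the four feature columns with State at index 3 (the three rows are the sample rows shown earlier; the names ct and X_encoded are illustrative, not from the guide):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Feature columns only: R&D Spend, Administration, Marketing Spend, State.
# The Profit column is left out because it is the target, not a feature.
X = np.array([
    [0, 135426.92, 0, "California"],
    [542.05, 51743.15, 0, "New York"],
    [0, 116983.8, 45173.06, "California"],
], dtype=object)

# One-hot encode column 3 (State) and pass the other columns through unchanged.
# sparse_threshold=0 asks for a dense array instead of a sparse matrix.
ct = ColumnTransformer(
    transformers=[("state", OneHotEncoder(), [3])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = np.asarray(ct.fit_transform(X), dtype=float)

print(X_encoded.shape)
print(X_encoded)
```

The dummy columns come first, followed by the untouched numeric columns. Only two states appear in these three rows, so only two dummy columns are produced here; the full dataset, with three states, would give three.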
For example, we have the following data with categorical variables:
3.5, New York
2.0, California
6.7, Florida
If we use dummy variables, the above data will be transformed into this:
3.5, 1, 0, 0
2.0, 0, 1, 0
6.7, 0, 0, 1
Notice that the column for State became equivalent to 3 columns:
       New York   California   Florida
3.5    1          0            0
2.0    0          1            0
6.7    0          0            1
As mentioned earlier, dummy variables indicate the presence or absence of
something. They are commonly used as “substitute variables” so we can do a
quantitative analysis on qualitative data. From the new table above we can
quickly see that 3.5 is for New York (1 New York, 0 California, and 0 Florida).
It’s a convenient way of representing categories as numeric values.
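The transformation in the small example above can be sketched in plain Python. The helper name to_dummies and the variable names are hypothetical, chosen only for illustration; the column order follows the table in the text.

```python
# The three example rows from the text: a numeric value and a State.
rows = [(3.5, "New York"), (2.0, "California"), (6.7, "Florida")]

# Column order used in the text's table.
states = ["New York", "California", "Florida"]

def to_dummies(value, categories):
    """Return one 0/1 flag per category: 1 where value matches, else 0."""
    return [1 if value == category else 0 for category in categories]

# Keep the numeric value and append the dummy columns for the State.
encoded = [[number] + to_dummies(state, states) for number, state in rows]

for row in encoded:
    print(row)
```

Each output row keeps the original number and carries exactly one 1, in the column of its State.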
However, there’s the so-called “dummy variable trap,” in which one variable is
redundant because it can be predicted from the others. In our example above,
notice that when the columns for New York and California are both zero (0), you
automatically know it’s Florida. You can already tell which State it is with
just 2 variables. In a regression model with an intercept, that redundancy
makes the dummy columns perfectly collinear.
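The redundancy can be checked directly. This is a minimal sketch on the three-row example, with a hypothetical decode helper: the Florida column always equals 1 minus the other two, and the State can still be recovered from the first two columns alone.

```python
# Dummy columns in the order: New York, California, Florida.
dummies = [
    (1, 0, 0),  # New York
    (0, 1, 0),  # California
    (0, 0, 1),  # Florida
]

# The third column carries no new information: it is determined by the others.
florida_is_redundant = all(fl == 1 - ny - ca for ny, ca, fl in dummies)
print(florida_is_redundant)

def decode(new_york, california):
    """Recover the State from only two dummy columns."""
    if new_york == 1:
        return "New York"
    if california == 1:
        return "California"
    return "Florida"  # both zero means the remaining category

recovered = [decode(ny, ca) for ny, ca, _ in dummies]
print(recovered)
```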
Continuing with our work on 50_Startups.csv, we can avoid the dummy variable
trap by including this in our code: 
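As a sketch of one common way to do this (an assumption here, not necessarily the exact line the guide uses), dropping one dummy column removes the redundancy. The array below reuses the small three-state example rather than 50_Startups.csv, and assumes the dummy columns come first, as a one-hot encoder would place them.

```python
import numpy as np

# Assumed layout: New York, California, Florida dummies, then the numeric value.
X = np.array([
    [1, 0, 0, 3.5],
    [0, 1, 0, 2.0],
    [0, 0, 1, 6.7],
])

# Drop the first dummy column; the two remaining dummies still identify the State.
X_reduced = X[:, 1:]

print(X_reduced)
```

Newer scikit-learn versions can also drop a column during encoding with OneHotEncoder(drop="first"), which avoids the manual slice.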
