Nik piepenbreier datagy. Io 31 tips and tricks for super stars



Download 372,05 Kb.
Pdf ko'rish
bet7/8
Sana06.07.2022
Hajmi372,05 Kb.
#743883
1   2   3   4   5   6   7   8
Bog'liq
2 5215210330325523624

Name
Weight in Pounds
0
Nik
180
1
Jim
175
2
Alice
143
3
Jane
155
4
Matt
167
5
Kate
189
Let's convert these weights to kilograms: 
In [42]: 
df[

'Weight in Kg'

] = df[

'Weight in Pounds'

] / 

2.2
df
Out[42]:
Name
Weight in Pounds
Weight in Kg
0
Nik
180
81.82
1
Jim
175
79.55
2
Alice
143
65
3
Jane
155
70.45
4
Matt
167
75.91
5
Kate
189
85.91
Nik Piepenbreier - 

datagy.io
 


32 
Tip #22: "Text to Columns" from Excel in 
Pandas 
Need to split a column containing text into multiple columns? 
Use the split() function with the expand=True argument! 
In [43]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'

], 
'Location'

: [

'Toronto, Ontario'



'Los Angeles, California'


'Portland, Oregon'

]}) 
df
Out[43]: 
Name
Location
0
Nik
Toronto, Ontario
1
Jim
Los Angeles, California
2
Alice
Portland, Oregon
Let's split location into two columns: 
In [44]: 
df[[

'City'



'State'

]] = df[

'Location'

].str.split(pat=

','


expand=

True


df
Out[44]: 
Name
Location
City
State
0
Nik
Toronto, Ontario
Toronto
Ontario
1
Jim
Los Angeles, 
California
Los Angeles
California
2
Alice
Portland, Oregon
Portland
Oregon
Nik Piepenbreier - 

datagy.io
 


33 
Tip #23: Explode a Column in Pandas 
Sometimes you'll get data back that contains lists of lists in a dataframe. 
This is particularly true for surveys. 
Using the Pandas explode() function, you can create a long dataset out of 
this: 
In [45]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'

], 
'Cities Visited'

: [[

'Toronto'



'Los Angeles'


'Oregon'

],[

'Montreal'



'Toronto'



'Atlanta'

],[

'Toronto'


'Vancouver'



'Los Angeles'

]]}) 
df
Out[45]: 
Name
Cities Visited
0
Nik
[Toronto, Los 
Angeles, Oregon]
1
Jim
[Montreal, 
Toronto, Atlanta]
2
Alice
[Toronto, 
Vancouver, Los 
Angeles]
Now let's explode this dataset: 
In [46]: 
df = df.explode(

'Cities Visited'


df.head()
Out[46]:
Name
Cities Visited
0
Nik
Toronto
0
Nik
Los Angeles
0
Nik
Oregon
1
Jim
Montreal
1
Jim
Toronto
Nik Piepenbreier - 

datagy.io
 


34 
Tip #24: Create a Pivot Table 
Let's turn this exploded dataset into a table. 
In [47]: 
df[

'count'

] = 

1
pivot = df.pivot_table(index=

'Name'

, columns=

'Cities Visited'


values = 

'count'

, aggfunc=

'count'

, fill_value=

0


pivot
Out[47]: 
Name
Atlanta
Los Angeles
Montreal
Oregon
Toronto
Vancouver
Alice
0
1
0
0
1
1
Jim
1
0
1
0
1
0
Nik
0
1
0
1
1
0
To learn everything you need to know about Pandas Pivot Tables, check out 
my post here: ​
https://datagy.io/python-pivot-tables/ 
Nik Piepenbreier - 

datagy.io
 


35 
Tip #25: Change Dataframe Column 
Order 
Need to change the order that your columns appear in? 
This simple tricks makes it incredibly easy: 
In [48]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'Score'

: [

100



120



96



75



68



123

], 
'Height'

: [

178



180



160



165



185



187

], 
'Weight'

: [

180



175



143



155



167



189

]}) 
df
Out[48]:
Name
Score
Height
Weight
0
Nik
100
178
180
1
Jim
120
180
175
2
Alice
96
160
143
3
Jane
75
165
155
4
Matt
68
185
167
5
Kate
123
187
189
Use two sets of square brackets to change the order: 
In [49]: 
df = df[[

'Name'



'Weight'



'Height'



'Score'

]] 
df
Out[49]:
Name
Weight
Height
Score
0
Nik
180
178
100
1
Jim
175
180
120
2
Alice
143
160
96
3
Jane
155
165
75
4
Matt
167
185
68
5
Kate
189
187
123
Nik Piepenbreier - 

datagy.io
 


36 
Tip #26: Calculate a Rolling Average 
Sometimes you will want to smooth out your charts by producing a rolling 
average. 
This is especially helpful when loading in data that fluctuates wildly, and 
where trends are more important than individual data points. 
In [50]: 
df = 
pd.read_csv(

'https://raw.githubusercontent.com/datagy/mediumdata/m
aster/sales.csv'


df.plot(kind=

'line'

)
Out[50]: 
 
You can see how jagged the lines are. 
Let's smooth this out with a rolling average. 
In [51]: 
df[

'rolling_average'

] = df[

'Sales'

].rolling(

5

).mean() 
df[

'rolling_average'

].plot(kind=

'line'

)
Out[51]: 
 
Nik Piepenbreier - 

datagy.io
 


37 
With this, you get a much better picture of the trends of the data! 
The rolling() function takes a number of arguments, including the first being 
window (which 
Nik Piepenbreier - 

datagy.io
 


38 
Tip #27: Convert String-Numbers to 
Numbers 
In some datasets, numbers will import as strings. 
This can be annoying and trying to use the astype() function will fail. 
This is where the tonumeric() function comes in handy: 
In [52]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'Score'

: [

'100'



120



'96'



75



68



123

]}) 
df.info()
Out[52]:
<

class

'

pandas

.

core

.

frame

.

DataFrame

'> 
RangeIndex



6

entries, 

0

to 

5
Data columns (total 

2

columns): 
Name

6

non-null object 
Score

6

non-null object 
dtypes: object(

2


memory usage: 

224.0

+ bytes
Let's convert the Score column to integers: 
In [53]: 
df[

'Score'

] = df[

'Score'

].apply(pd.to_numeric, errors=

'coerce'


df.info()
Out[53]:
<

class

'

pandas

.

core

.

frame

.

DataFrame

'> 
RangeIndex



6

entries, 

0

to 

5
Data columns (total 

2

columns): 
Name

6

non-null object 
Score

6

non-null int64 
dtypes: int64(

1

), object(

1


memory usage: 

224.0

+ bytes
Let’s see what this dataframe looks like: 
Nik Piepenbreier - 

datagy.io
 


39 
In [54]: 
df
Out[54]: 
Name
Score
0
Nik
100
1
Jim
120
2
Alice
96
3
Jane
75
4
Matt
68
5
Kate
123
Nik Piepenbreier - 

datagy.io
 


40 
Tip #28: Impute Missing Values with 
fillna() 
Most data sets aren't perfect. You'll often encounter missing data. What to do 
with it depends on the data itself. One option is to impute the value by using 
its mean (average). 
This can be done easily by using the fillna() function in Pandas: 
In [55]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'Sales'

: [

100



120



90



75



68



123

]}) 
df.iloc[

3

,

1

] = np.nan 
df
Out[55]:
Name
Sales
0
Nik
100
1
Jim
120
2
Alice
90
3
Jane
NaN
4
Matt
68
5
Kate
123
Imputing a number isn't always the best option. If you feel confident it won't 
skew the data, try the code below: 
In [56]: 
df = df.fillna(df.mean()) 
df
Out[56]:
Name
Sales
0
Nik
100
1
Jim
120
2
Alice
90
3
Jane
100.2
4
Matt
68
5
Kate
123
Nik Piepenbreier - 

datagy.io
 


41 
Tip #29: Get the Year from a Date 
You've been given a lot of data and want to create a column that tells you the 
year that row belongs to. 
Pandas makes this easy: 
In [57]: 
raw_data = {

'name'

: [

'Nik'



'Jen'



'Alex'



'Mel'

], 

'birth_date'

: [

'01-02-1988'



'08-05-1989'



'04-28-1987'


'12-16-1990'

]} 
df = pd.DataFrame(raw_data) 
df[

'birth_date'

] = pd.to_datetime(df[

'birth_date'

]) 
df
Out[57]: 
name
birth_date
0
Nik
1988-01-02
1
Jen
1989-08-05
2
Alex
1987-04-28
3
Mel
1990-12-16
Let's calculate the birth year of everyone in the dataframe: 
In [58]: 
df[

'birthyear'

] = df[

'birth_date'

].dt.year 
df
Out[58]:
name
birth_date
birthyear
0
Nik
1988-01-02
1988
1
Jen
1989-08-05
1989
2
Alex
1987-04-28
1987
3
Mel
1990-12-16
1990
Nik Piepenbreier - 

datagy.io
 


42 
Tip #30: Create a Frequency Table (incl. 
with Percentages!) 
Sometimes you just need a quick frequency table to see how often different 
values show up. 
Pandas makes this quite easy using the value_counts() function: 
In [59]: 
df = 
pd.read_excel(

'https://github.com/datagy/pivot_table_pandas/raw/ma
ster/sample_pivot.xlsx'


df
Out[59]:

Download 372,05 Kb.

Do'stlaringiz bilan baham:
1   2   3   4   5   6   7   8




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish