Nik piepenbreier datagy. Io 31 tips and tricks for super stars



Download 372,05 Kb.
Pdf ko'rish
bet6/8
Sana06.07.2022
Hajmi372,05 Kb.
#743883
1   2   3   4   5   6   7   8
Bog'liq
2 5215210330325523624

Name
Score
Height
Weight
0
Nik
100
178
180
1
Jim
120
180
175
2
Alice
96
160
143
3
Jane
75
165
155
4
Matt
68
185
167
5
Kate
123
187
189
Let's see how to reshape this using the melt() function: 
In [24]: 
melted = df.melt(id_vars=

'Name'

, value_vars=[

'Score'



'Height'


'Weight'

]) 
melted
Out[24]:
 
Name
variable
value
0
Nik
Score
100
1
Jim
Score
120
2
Alice
Score
96
3
Jane
Score
75
4
Matt
Score
68
Nik Piepenbreier - 

datagy.io
 


22 
Tip #14: set_option to Format Values 
You may run into data where you have just ​too​ many decimal places. 
This can be less than optimal for when you're trying to display data. 
Thankfully, Pandas has the set_option function that lets you set a certain 
precision for decimal values. 
In [25]: 
import

numpy 

as

np 
df = pd.DataFrame(np.random.randn(

5



5

)) 
df
Out[25]:
0
1
2
3
4
0
0.495358
0.256962
-0.356198
0.190293
0.182815
1
-0.208766
-0.454087
-1.231011
-0.284256
-0.080261
2
0.936962
-0.111395
-0.695576
-1.777353
-1.794223
3
1.08996
0.005278
-0.313701
1.622452
0.720692
4
0.818865
-0.79598
-0.000649
0.866517
0.628177
Let's say you wanted to only show two decimal places, you could write: 
In [26]: 
pd.set_option(

'precision'



2


df
Out[26]:
 
0
1
2
3
4
0
0.5
2.57E-01
-3.56E-01
0.19
0.18
1
-0.21
-4.54E-01
-1.23E+00
-0.28
-0.08
2
0.94
-1.11E-01
-6.96E-01
-1.78
-1.79
3
1.09
5.28E-03
-3.14E-01
1.62
0.72
4
0.82
-7.96E-01
-6.49E-04
0.87
0.63
Nik Piepenbreier - 

datagy.io
 


23 
Tip #15: Filter by Relative Dates 
Sometimes you may want to filter a dataframe based on dates. If you have a 
column that is formatted as dates, you can use dates to effectively filter your 
data down. 
In [27]: 
raw_data = {

'name'

: [

'Nik'



'Jen'



'Alex'



'Mel'

], 

'birth_date'

: [

'01-02-1988'



'08-05-1989'



'04-28-1987'


'12-16-1990'

]} 
df = pd.DataFrame(raw_data) 
df[

'birth_date'

] = pd.to_datetime(df[

'birth_date'

]) 
df
Out[27]:
name
birth_date
0
Nik
1988-01-02
1
Jen
1989-08-05
2
Alex
1987-04-28
3
Mel
1990-12-16
In [28]: 
filtered = df[df[

'birth_date'

] < 

'1990-01-01'


filtered
Out[28]:
name
birth_date
0
Nik
1988-01-02
1
Jen
1989-08-05
2
Alex
1987-04-28
Similarly, you can filter just for a single year, by using the date year attribute.
In [29]: 
just_1988 = df[df[

'birth_date'

].dt.year == 

1988


just_1988
Out[29]:
name
birth_date
0
Nik
1988-01-02
Nik Piepenbreier - 

datagy.io
 


24 
Tip #16: Calculate Cumulative Sum 
Need a running total of a particular column in a dataframe? 
In [30]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'Sales'

: [

100



120



96



75



68



123

]}) 
df
Out[30]: 
Name
Sales
0
Nik
100
1
Jim
120
2
Alice
96
3
Jane
75
4
Matt
68
5
Kate
123
This is where the cumsum() function comes in handy. 
Assign this function to a new column and it'll calculate the cumulative sum 
as a new series! 
In [31]: 
df[

'running_total'

] = df[

'Sales'

].cumsum() 
df
Out[31]: 
Name
Sales
running_total
0
Nik
100
100
1
Jim
120
220
2
Alice
96
316
3
Jane
75
391
Nik Piepenbreier - 

datagy.io
 


25 
4
Matt
68
459
5
Kate
123
582
This also works with other running functions, such as: 

cummax() - for running max 

cummin() - for running min 

cumprod() - for running product 
Let's get the running max: 
In [32]: 
df[

'running_max'

] = df[

'Sales'

].cummax() 
df
Out[32]: 
Name
Sales
running_total
running_max
0
Nik
100
100
100
1
Jim
120
220
120
2
Alice
96
316
120
3
Jane
75
391
120
4
Matt
68
459
120
5
Kate
123
582
123
Nik Piepenbreier - 

datagy.io
 


26 
Tip #17: Drop Multiple Columns in 
Pandas 
When importing a data set, chances are you'll run into many columns you 
don't need. It's best to just not import these. 
However, say you've been manipulating a dataset and no longer need some 
columns. 
Use Pandas' drop function: 
In [33]: 
df = 
pd.read_csv(

'https://raw.githubusercontent.com/datagy/pivot_table_
pandas/master/select_columns.csv'


df
Out[33]:
Name
Age
Height
Score
Random_
A
Random_
B
Random_
C
Random_
D
Random_
E
0
Joe
28
5'9
30
73
59
5
4
31
1
Melissa
26
5'5
32
30
85
38
32
80
2
Nik
31
5'11
34
80
71
59
71
53
3
Andrea
33
5'6
38
16
63
86
81
42
4
Jane
32
5'8
29
19
40
48
5
68
Let's drop the last five columns: 
In [34]: 
df.drop(columns=[

'Random_A'



'Random_B'



'Random_C'



'Random_D'


'Random_E'

], inplace=

True


df
Out[34]:
Name
Age
Height
Score
0
Joe
28
5'9
30
1
Melissa
26
5'5
32
2
Nik
31
5'11
34
3
Andrea
33
5'6
38
4
Jane
32
5'8
29
Nik Piepenbreier - 

datagy.io
 


27 
Tip #18: Change all your columns to 
Lowercase 
Sometimes when you import data, your column names will be all over the 
place. 
Inconsistency in column names can make it tricky to select data. 
With this trick, you'll learn how to turn all your columns to lowercase: 
In [35]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'score'

: [

100



120



96



75



68



123

], 
'Height in CM'

: [

178



180



160



165



185



187

], 
'Weight in Pounds'

: [

180



175



143



155



167



189

]}) 

df
Out[35]: 
Name
score
Height in CM
Weight in 
Pounds
0
Nik
100
178
180
1
Jim
120
180
175
2
Alice
96
160
143
3
Jane
75
165
155
4
Matt
68
185
167
5
Kate
123
187
189
In [36]: 
columns = df.columns 
df.columns = [column.lower() 

for

column 

in

columns] 
df
Out[36]: 
Nik Piepenbreier - 

datagy.io
 


28 
name
score
height in cm
weight in 
pounds
0
Nik
100
178
180
1
Jim
120
180
175
2
Alice
96
160
143
3
Jane
75
165
155
4
Matt
68
185
167
5
Kate
123
187
189
Similarly, spaces can cause issues when you want to use the dot selector to 
select columns. 
You can use this same trick to switch spaces for underscores: 
In [37]: 
columns = df.columns 
df.columns = [column.replace(

" "



"_"



for

column 

in

columns] 
df
Out[37]: 
name
score
height_in_cm
weight_in_poun
ds
0
Nik
100
178
180
1
Jim
120
180
175
2
Alice
96
160
143
3
Jane
75
165
155
4
Matt
68
185
167
5
Kate
123
187
189
 
 
Nik Piepenbreier - 

datagy.io
 


29 
Tip #19: Create a List of Unique Values 
in a Column 
Sometimes you may want to pull all the unique values in a column into a list. 
This can be helpful for plotting them as variables in a machine learning 
model. 
When you simply select the unique values and place them into a variable, you 
can use these as a list: 
In [38]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'Score'

: [

100



120



96



75



68



123

]}) 
unique = df[

'Name'

].unique() 
print(

'unique is type:'

,type(unique))
Out[38]:
unique 

is

type: <

class

'

numpy

.

ndarray

'>
You can see that the unique is of type ​numpy.array​, meaning you can't loop 
over it. 
Let's turn it into a list: 
In [39]: 
unique_list = df[

'Name'

].unique().tolist() 
print(

'unique_list is type: '

, type(unique_list)) 
print(unique_list)
Out[39]:
unique_list 

is

type: <

class

'

list

'> 
['

Nik

', '

Jim

', '

Alice

', '

Jane

', '

Matt

', '

Kate

']
Nik Piepenbreier - 

datagy.io
 


30 
Tip #20. Export to Clipboard for Easy 
Pasting 
You may be working with a dataset you don't ​really​ want to save. 
But you want to bring it to Excel. 
Pandas gives you that option by using the to_clipboard() function. 
Let's give it a try: 
In [40]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'Weight in Pounds'

: [

180



175



143



155



167



189

]}) 
df.to_clipboard()
Nik Piepenbreier - 

datagy.io
 


31 
Tip #21: Create New Columns with Math 
Do you have columns that you need to apply formulas to? For example: 

Calculating sales tax? 

Changing pounds to kilograms? 
This can be done easily in Pandas: 
In [41]: 
df = pd.DataFrame({

'Name'

: [

'Nik'



'Jim'



'Alice'



'Jane'



'Matt'


'Kate'

], 
'Weight in Pounds'

: [

180



175



143



155



167



189

]}) 
df
Out[41]:

Download 372,05 Kb.

Do'stlaringiz bilan baham:
1   2   3   4   5   6   7   8




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish