     Date        Region  Type                 Units  Sales
0    2020-07-11  East    Children's Clothing  18     306
1    2020-09-23  North   Children's Clothing  14     448
2    2020-04-02  South   Women's Clothing     17     425
3    2020-02-28  East    Children's Clothing  26     832
4    2020-03-19  West    Women's Clothing     3      33
...  ...         ...     ...                  ...    ...
995  2020-02-11  East    Children's Clothing  35     735
996  2020-12-25  North   Men's Clothing       NaN    1155
997  2020-08-31  South   Men's Clothing       13     208
998  2020-08-23  South   Women's Clothing     17     493
999  2020-08-17  North   Women's Clothing     25     300
1000 rows × 5 columns
Nik Piepenbreier -
datagy.io
43
Let's see how often each region appears:
In [60]:
df['Region'].value_counts()
Out[60]:
East     411
North    316
South    137
West     136
Name: Region, dtype: int64
If you want to turn these into proportions, you just need to use the
normalize=True argument:
In [61]:
df['Region'].value_counts(normalize=True)
Out[61]:
East     0.41
North    0.32
South    0.14
West     0.14
Name: Region, dtype: float64
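As a small self-contained sketch of the same idea, the toy dataframe below shows counts and proportions side by side. Its values are invented for illustration and are not taken from the sample file.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only
toy = pd.DataFrame({'Region': ['East', 'East', 'East', 'North', 'North', 'South']})

counts = toy['Region'].value_counts()                # absolute frequencies
shares = toy['Region'].value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares.round(2))
```

Both calls return a Series indexed by the unique values, so you can look up a single region with counts['East'] or shares['East'].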
Tip #31: Sort Your Dataframes
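It can be helpful to sort your dataframes.
The sort_values() method is well known. But did you know you can sort by
several columns at once? Pass a list of column names to the by argument, and
pandas sorts by the first column and uses each later column to break ties.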
In [62]:
df = pd.read_excel('https://github.com/datagy/pivot_table_pandas/raw/master/sample_pivot.xlsx')
df.head(10)
Out[62]:
   Date        Region  Type                 Units  Sales
0  2020-07-11  East    Children's Clothing  18     306
1  2020-09-23  North   Children's Clothing  14     448
2  2020-04-02  South   Women's Clothing     17     425
3  2020-02-28  East    Children's Clothing  26     832
4  2020-03-19  West    Women's Clothing     3      33
5  2020-02-05  North   Women's Clothing     33     627
6  2020-01-24  South   Women's Clothing     12     396
7  2020-03-25  East    Women's Clothing     29     609
8  2020-01-03  North   Children's Clothing  18     486
9  2020-11-03  East    Children's Clothing  34     374
Let's sort this dataframe by region and then by number of units sold, both in
descending order:
In [63]:
df.sort_values(by=['Region', 'Units'], ascending=False).head(10)
Out[63]:
     Date        Region  Type                 Units  Sales
534  2020-05-17  West    Men's Clothing       35     520
972  2020-09-21  West    Men's Clothing       35     437
755  2020-09-28  West    Men's Clothing       34     182
948  2020-01-27  West    Children's Clothing  34     510
280  2020-10-31  West    Women's Clothing     33     196
533  2020-05-14  West    Men's Clothing       33     840
844  2020-05-14  West    Men's Clothing       33     308
206  2020-08-05  West    Children's Clothing  32     640
244  2020-03-02  West    Men's Clothing       32     56
481  2020-02-02  West    Women's Clothing     32     558
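A single ascending=False sorts every listed column in descending order. If you want a different direction per column, ascending also accepts a list with one entry per column in by. Here is a minimal sketch on invented toy data (not the sample file):

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only
toy = pd.DataFrame({
    'Region': ['West', 'East', 'West', 'East'],
    'Units': [5, 30, 12, 7],
})

# Region from A to Z, then Units from largest to smallest within each region
sorted_toy = toy.sort_values(by=['Region', 'Units'], ascending=[True, False])
print(sorted_toy)
```

The original index labels travel with their rows, which is why the index in a sorted dataframe looks shuffled.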
Bonus Tip: Read Multiple Files at Once
Do you have multiple files you need to read into a single dataframe?
You could create separate dataframes first, or you could use the glob module
to find all matching files and read them into one dataframe in a single step.
In [64]:
import glob

# Path to your files here, including the trailing separator.
# An empty string searches the current working directory.
file_path = ''
df = pd.concat(
    [pd.read_csv(file) for file in glob.glob(f"{file_path}*.csv")],
    ignore_index=True
)
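We use a list comprehension to read each CSV file and then concatenate the
results using the pd.concat() function. Passing ignore_index=True gives the
combined dataframe a fresh index instead of repeating each file's own.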
You can also use the read_excel() function if you're dealing with Excel files!
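To see the pattern end to end without touching your own files, here is a hedged sketch that writes two small CSV files into a temporary folder and reads them back into one dataframe. The file names and values are made up for illustration. The sorted() call is an addition so the files are read in a predictable order, since glob does not guarantee one.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two hypothetical CSV files to combine
    pd.DataFrame({'Region': ['East'], 'Units': [18]}).to_csv(
        os.path.join(tmp_dir, 'part_1.csv'), index=False)
    pd.DataFrame({'Region': ['West'], 'Units': [3]}).to_csv(
        os.path.join(tmp_dir, 'part_2.csv'), index=False)

    # Find every CSV in the folder, then read and stack them
    files = sorted(glob.glob(os.path.join(tmp_dir, '*.csv')))
    combined = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

print(combined)
```

Building the path with os.path.join() avoids having to remember the trailing separator that the f-string version relies on.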
Conclusion
Thanks so much for reading! I hope you enjoyed the tips and that you learned
something new!
Let me know what your favorite tip is over Twitter (@datagyio) or via email
(nik@datagy.ca).
Be sure to check out https://datagy.io for more Python tips, tricks, and
thorough tutorials!