Tip #1: Scrape a Website with One Function
Easily scrape a web table (or multiple web tables) using Pandas.
Use the Pandas read_html() function to extract data from the tables on a webpage.
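In [2]:
import pandas as pd

website = "https://en.wikipedia.org/wiki/All-time_Olympic_Games_medal_table"
# read_html() returns a list with one DataFrame per HTML table on the page
df_list = pd.read_html(website)
# Select the table you need by its position in the list
df = df_list[2]
df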
Out[2]:
    Team (IOC code)                   № Summer  № Winter  № Games
0   Albania (ALB)                            8         4       12
1   American Samoa (ASA)                     8         1        9
2   Andorra (AND)                           11        12       23
3   Angola (ANG)                             9         0        9
4   Antigua and Barbuda (ANT)               10         0       10
... ...                                    ...       ...      ...
74  Republic of China (ROC) [ROC]            3         0        3
75  Saar (SAA) [SAA]                         1         0        1
76  North Yemen (YAR) [YAR]                  2         0        2
77  South Yemen (YMD) [YMD]                  1         0        1
78  Refugee Olympic Team (ROT) [ROT]         1         0        1
79 rows × 4 columns
The function returns a list of dataframes, one for each table it finds on the webpage.
Index into the list to select your table and assign it to a variable.
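To see how the list of tables behaves without depending on a live page, here is a minimal sketch that parses a small, made-up HTML string (the table contents are invented for illustration). It assumes an HTML parser such as lxml is installed, which read_html() needs. Besides picking a table by position, read_html() also accepts a match argument that keeps only the tables whose text matches a string or regular expression.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML snippet standing in for a real webpage with two tables
html = """
<table>
  <thead><tr><th>Team</th><th>Summer</th></tr></thead>
  <tbody><tr><td>Albania</td><td>8</td></tr></tbody>
</table>
<table>
  <thead><tr><th>City</th><th>Year</th></tr></thead>
  <tbody><tr><td>Paris</td><td>1900</td></tr></tbody>
</table>
"""

# read_html() returns one DataFrame per <table> element it finds
tables = pd.read_html(StringIO(html))
print(len(tables))  # 2

# Select a table by its position in the list...
teams = tables[0]

# ...or keep only the tables whose text matches a pattern
city_tables = pd.read_html(StringIO(html), match="City")
print(city_tables[0])
```

Wrapping the literal HTML in StringIO tells read_html() to treat it as file contents rather than as a URL or path.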
Nik Piepenbreier -
datagy.io
Tip #2: Reverse a List in Python
To reverse the values of a list, slice it with a step of -1: [::-1].
In [3]:
original_list = [1, 2, 3, 4, 5]
# A step of -1 walks the list from the last element to the first
reversed_list = original_list[::-1]
reversed_list
Out[3]:
[5, 4, 3, 2, 1]
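This uses extended slices to reverse the list.
Extended slices add a third "step" argument to a slice: list[start:stop:step].
With a step of -1 and no start or stop, Python returns the list in reverse order.
This happens because the slice starts at index -1 (the last value), then steps by -1 to index -2, and so on, until it has included the value at position 0.
Slicing returns a new list, so original_list is left unchanged.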
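The walk described above can be checked directly. This sketch (an illustration added here, not part of the original notebook) compares a forward step of 2 with a step of -1, spells out the negative indices a [::-1] slice visits, and uses the built-in slice.indices() method to show the start, stop, and step Python resolves for a list of five items.

```python
original_list = [1, 2, 3, 4, 5]

# A positive step moves forward: every second element from the start
every_other = original_list[::2]

# A step of -1 with start and stop omitted walks from the last element back to the first
reversed_list = original_list[::-1]

# The same walk written out with explicit negative indices: -1, -2, ..., -5
walked = [original_list[i] for i in range(-1, -len(original_list) - 1, -1)]

# For a length of 5, slice(None, None, -1) resolves to (start, stop, step) = (4, -1, -1);
# range(*resolved) visits indices 4, 3, 2, 1, 0 and stops once index 0 is included
resolved = slice(None, None, -1).indices(len(original_list))

print(every_other)    # [1, 3, 5]
print(reversed_list)  # [5, 4, 3, 2, 1]
print(walked)         # [5, 4, 3, 2, 1]
print(resolved)       # (4, -1, -1)
print(original_list)  # [1, 2, 3, 4, 5] (slicing returns a new list)
```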
Tip #3: Speed Up Loading Dataframes by Assigning Data Types
Pandas has seven commonly used data types:
1. Object (object, including strings and mixed types)
2. Integer (int64)
3. Float (float64)
4. Boolean (bool)
5. Datetime (datetime64)
6. Time delta (timedelta64[ns])
7. Categorical (category) (check out Tip #9 to learn more about categorical data types)
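One way to see these types side by side is to build a small DataFrame with one column of each kind and inspect its dtypes attribute. This is a sketch with made-up column names and values; the exact names Pandas prints (for example the datetime unit) can vary between Pandas versions.

```python
import pandas as pd

# One made-up column for each of the seven types listed above
df = pd.DataFrame(
    {
        "obj": ["a", 1],                                     # mixed values -> object
        "int": [1, 2],                                       # int64
        "float": [1.5, 2.5],                                 # float64
        "bool": [True, False],                               # bool
        "dt": pd.to_datetime(["2021-01-01", "2021-06-01"]),  # datetime64
        "td": pd.to_timedelta(["1 day", "2 days"]),          # timedelta64
        "cat": pd.Categorical(["x", "y"]),                   # category
    }
)

# dtypes lists the single data type Pandas assigned to each column
print(df.dtypes)
```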
Each column in a dataframe has exactly one data type.
When you import data, Pandas infers the data type of each column by scanning the values in that column.
If you already know the data types of specific columns, you can speed up the import by passing them to the import function's dtype argument.
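Here is a minimal, self-contained sketch of the dtype argument, using a small made-up CSV string in place of a real file. It compares the types Pandas infers on its own with the ones you request; the column names and values are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV contents standing in for a real file
csv_text = "Name,Age,Score\nAna,34,88\nBen,29,92\n"

# Without dtype, Pandas infers each column's type by scanning its values
inferred = pd.read_csv(StringIO(csv_text))
print(inferred.dtypes)

# With dtype, the listed columns are read as the requested types
typed = pd.read_csv(
    StringIO(csv_text),
    dtype={"Name": "category", "Age": "int32", "Score": "float64"},
)
print(typed.dtypes)
```

Types can be given as Python types (str, int) or as Pandas/NumPy type names ("category", "int32"); smaller types such as int32 also use less memory than the inferred int64.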
In [4]:
import pandas as pd

# Declare each column's type up front instead of letting read_csv infer it
df = pd.read_csv(
    'https://raw.githubusercontent.com/datagy/pivot_table_pandas/master/select_columns.csv',
    dtype={
        'Name': str,
        'Age': int,
        'Height': str,
        'Score': int,
        'Random_A': int,
        'Random_B': int,
        'Random_C': int,
        'Random_D': int,
    },
)
df
Out[4]: