Name     Age  Height  Score  Random_A  Random_B  Random_C  Random_D  Random_E
Joe      28   5'9     30     73        59        5         4         31
Melissa  26   5'5     32     30        85        38        32        80
Nik      31   5'11    34     80        71        59        71        53
Andrea   33   5'6     38     16        63        86        81        42
Jane     32   5'8     29     19        40        48        5         68
Nik Piepenbreier -
datagy.io
9
Tip #4: Reading Large CSV Files
Need to preview a large CSV file before loading all of it?
Pass the nrows argument to the read_csv function. Pandas will then read only
the first n rows you specify.
In [5]:
df = pd.read_csv(
    'https://raw.githubusercontent.com/datagy/pivot_table_pandas/master/select_columns.csv',
    nrows=2
)
df
Out[5]:
Name     Age  Height  Score  Random_A  Random_B  Random_C  Random_D  Random_E
Joe      28   5'9     30     73        59        5         4         31
Melissa  26   5'5     32     30        85        38        32        80
A preview like this helps you decide which columns to import and which data
types to set.
You can capture the column names by passing df.columns to the list() function.
Similarly, you can assign data types using the dtype argument.
In [6]:
columns = list(df.columns)
columns
Out[6]:
['Name', 'Age', 'Height', 'Score', 'Random_A', 'Random_B', 'Random_C',
 'Random_D', 'Random_E']
Then you can choose to import only the columns you need.
In [7]:
df = pd.read_csv(
    'https://raw.githubusercontent.com/datagy/pivot_table_pandas/master/select_columns.csv',
    usecols=['Name', 'Age', 'Height'],
    dtype={'Name': str, 'Age': int, 'Height': str}
)
df
Out[7]:
Name     Age  Height
Joe      28   5'9
Melissa  26   5'5
Nik      31   5'11
Andrea   33   5'6
Jane     32   5'8
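The preview-then-select workflow above can also be tried without a network connection. The following is a minimal sketch, assuming a small in-memory CSV: the csv_text string is a made-up stand-in for a large file (it reuses three rows and four columns from the example), and the variable names preview and columns are only illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file (rows reused from the example above).
csv_text = """Name,Age,Height,Score
Joe,28,5'9,30
Melissa,26,5'5,32
Nik,31,5'11,34
"""

# Step 1: preview only the first two rows and collect the column names.
preview = pd.read_csv(io.StringIO(csv_text), nrows=2)
columns = list(preview.columns)

# Step 2: read the data again, keeping only the needed columns
# and setting their data types explicitly.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Name', 'Age', 'Height'],
    dtype={'Name': str, 'Age': int, 'Height': str},
)
```

Reading two rows first keeps the preview cheap, and the second call avoids loading columns you do not need.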
Tip #5: Count Missing Values Per Column
It is often helpful to know how many values are missing in a dataset.
Pandas methods such as .info() report how many non-null values each column of
a dataframe holds.
That is useful, but a direct count of the null values is often more useful.
You can chain the .isnull() and .sum() methods to count the missing values in
each column.
In [8]:
df = pd.read_csv(
    'https://raw.githubusercontent.com/datagy/mediumdata/master/sample.csv'
)
df.isnull().sum()
Out[8]:
Name      0
Height    1
Weight    2
dtype: int64
From this, we can see that
● The Name column isn't missing any values,
● The Height column is missing 1 value, and
● The Weight column is missing 2 values.
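The same count can be reproduced on a small dataframe built by hand. This is a sketch under stated assumptions: the values below are made up and only mimic the pattern of gaps in the output above, and the names missing_per_column and total_missing are illustrative.

```python
import pandas as pd

# Made-up data: no gaps in Name, one gap in Height, two gaps in Weight.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Height': [170.0, None, 165.0],
    'Weight': [None, 70.0, None],
})

# .isnull() marks each missing cell as True; .sum() adds them up per column.
missing_per_column = df.isnull().sum()

# Summing once more gives a single count for the whole dataframe.
total_missing = int(missing_per_column.sum())
```

Calling .sum() a second time is handy when you only need to know whether anything is missing at all.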
Tip #6: Import Data from Your Clipboard
Sometimes you have data in Excel and want to import it quickly.
Select the data and copy it to your clipboard.
Then use the read_clipboard function to load the data into a dataframe.
1. Copy data to your clipboard
2. df = pd.read_clipboard()
Tip #7: Skip Rows in Strange Excel Files
You might encounter Excel files that have extra text before the data starts.
Look at our example below, where the true data doesn't start until row 5.
In [9]:
df = pd.read_excel(
    'https://github.com/datagy/pivot_table_pandas/raw/master/sampleweird.xlsx'
)
df
Out[9]:
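One common way to handle leading text like this is the skiprows argument. The sketch below is an assumption about how the tip is meant to continue, not a reproduction of it: it uses a made-up in-memory CSV (raw_text) instead of the Excel file so that it runs without extra dependencies, with four lines of notes before the real header on line 5.

```python
import io

import pandas as pd

# Made-up file contents: four lines of notes, then the real header on line 5.
raw_text = """Sample report
Exported from a spreadsheet
Prepared for illustration only
Notes: values are illustrative
Name,Age
Joe,28
Melissa,26
"""

# skiprows=4 discards the first four lines, so line 5 becomes the header row.
df = pd.read_csv(io.StringIO(raw_text), skiprows=4)
```

pd.read_excel accepts a skiprows argument as well, so the same idea should carry over to a workbook where the data starts on row 5.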