Indexing by Record
When you need to access an entire record, use loc, which selects a record by its index label. With the default integer index, label 0 is the first record. Let’s access the entire first record, then the name within that record:
# directly selecting a record in Pandas using .loc
print( df.loc[0] )
print( df.loc[0]["names"] )    # selecting the value at record 0 in the "names" column
Go ahead and run the cell. We can see that we’re able to output the entire record. When using loc, you specify the record’s index label first, then the column name.
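To make the order of lookups concrete, here is a minimal sketch. The DataFrame is a hypothetical stand-in with made-up names and ages, not the data used earlier in the chapter. It also shows that loc accepts the record label and column name together in one call, which returns the same value as the chained form.

```python
import pandas as pd

# Hypothetical sample data; the names and ages are made up for illustration.
df = pd.DataFrame({
    "names": ["Ann", "Bob", "Cara", "Dan", "Eve", "Finn", "Gina"],
    "ages": [28, 35, 41, 19, 52, 23, 30],
})

record = df.loc[0]              # the whole first record, returned as a Series
name = df.loc[0]["names"]       # chained lookup: record first, then column
same_name = df.loc[0, "names"]  # record label and column name in a single call

print(record)
print(name, same_name)
```

The single-call form is usually the safer habit when you later assign values, since it avoids working on an intermediate copy.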
Slicing a DataFrame
When you want to access a specific number of records, you slice the DataFrame. Slicing in Pandas works the same way as slicing a Python list, using start, stop, and step within a set of brackets. Let’s access the records from index 2 up to, but not including, index 5:
# slicing a DataFrame to grab specific records
print( df[2:5] )
Go ahead and run the cell. This will output the records at index 2, 3, and 4. Again, be careful when slicing: without the colon, the brackets are treated as a column lookup, so df[2] would try to access a column named 2.
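The start, stop, and step behavior can be sketched as follows, again on a hypothetical DataFrame with made-up values. The last line is included only as a contrast: a slice passed to loc works on labels and includes the stop label, unlike plain bracket slicing.

```python
import pandas as pd

# Hypothetical sample data; the names and ages are made up for illustration.
df = pd.DataFrame({
    "names": ["Ann", "Bob", "Cara", "Dan", "Eve", "Finn", "Gina"],
    "ages": [28, 35, 41, 19, 52, 23, 30],
})

middle = df[2:5]        # records at index 2, 3, and 4 (stop is excluded)
every_other = df[::2]   # a step of 2 keeps every second record
last_two = df[-2:]      # negative start counts from the end, as with lists
by_label = df.loc[2:5]  # label-based slice: the stop label 5 is included

print(middle)
print(by_label)
```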
Built-in Methods
These are methods that are frequently used to make your life easier when using Pandas.
It is possible to spend a whole week simply exploring the built-in functions supported by DataFrames in Pandas. However, we will simply highlight a few that will be useful, to give you an idea of what’s possible out of the box with Pandas.
head( )
When you work with large sets of data, you’ll often want to view a few records to get an idea of what you’re looking at. To see the top records in the DataFrame, along with the column names, you use the head() method:
# accessing the top 5 records using .head( )
df.head(5)
Go ahead and run the cell. This will output the top five records. The argument is optional: it sets how many records to show from the top, and defaults to five when omitted.
tail( )
To view a given number of records from the bottom, you would use the tail() method:
# accessing the bottom 3 records using .tail( )
df.tail(3)
Go ahead and run the cell. This will output the bottom three records for us to view.
keys( )
Sometimes you’ll need the column names. Whether you’re making a modular script or analyzing the data you’re working with, using the keys( ) method will help:
# accessing the column headers (keys) using the .keys( ) method
headers = df.keys( )
print(headers)
Go ahead and run the cell. This will output an Index object containing the header names in our DataFrame.
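As a small sketch of how the headers might be used in a modular script, the example below loops over them and converts them to a plain Python list. The DataFrame is hypothetical, with made-up values; the only Pandas features assumed are keys( ), the columns attribute, and the equals( ) comparison on Index objects.

```python
import pandas as pd

# Hypothetical sample data; the names and ages are made up for illustration.
df = pd.DataFrame({
    "names": ["Ann", "Bob", "Cara", "Dan", "Eve", "Finn", "Gina"],
    "ages": [28, 35, 41, 19, 52, 23, 30],
})

headers = df.keys()              # a Pandas Index holding the column names
header_list = list(headers)      # plain Python list, if one is needed
matches_columns = headers.equals(df.columns)  # keys( ) mirrors df.columns

# Loop over the headers without hard-coding any column name.
for column in headers:
    print(column, df[column].dtype)
```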
.shape
The shape of a DataFrame describes the number of records by the number of columns. It’s always important to check the shape to ensure you’re working with the proper amount of data:
# checking the shape, which is the number of records and columns
print( df.shape )
Go ahead and run the cell. We’ll get a (7, 2) tuple returned, representing records and columns.
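The following sketch ties head( ), tail( ), and shape together on a hypothetical seven-record DataFrame with made-up values. It relies on head( ) returning five records when called without an argument, and on shape being an ordinary tuple that can be unpacked.

```python
import pandas as pd

# Hypothetical sample data; the names and ages are made up for illustration.
df = pd.DataFrame({
    "names": ["Ann", "Bob", "Cara", "Dan", "Eve", "Finn", "Gina"],
    "ages": [28, 35, 41, 19, 52, 23, 30],
})

top = df.head()        # no argument: the first five records
bottom = df.tail(3)    # the last three records
rows, cols = df.shape  # shape is a (records, columns) tuple

print(top)
print(bottom)
print(f"{rows} records, {cols} columns")
```

Unpacking the tuple this way is handy for a quick sanity check right after loading data, before any analysis starts.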