The following are key terms we’ll be using throughout this section. Be sure to look over them and reference them when necessary:
Series ➤ One-dimensional labeled array capable of holding data of any type
DataFrame ➤ Two-dimensional labeled table of rows and columns, similar to a spreadsheet
Axis ➤ Row or column direction; axis = 0 refers to rows, axis = 1 refers to columns
Record ➤ A single row
dtype ➤ Data type of the values in a Series or in a DataFrame column
Time Series ➤ Series object that uses time intervals, like tracking weather by the hour
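To make these terms concrete, here is a minimal sketch that touches each one. It assumes Pandas is already installed and imported as pd (both are covered next), and the subjects, scores, and temperatures are made-up values used only for illustration.

```python
import pandas as pd

# Series: a one-dimensional labeled array (made-up temperatures)
temps = pd.Series([61.5, 64.0, 59.2], index=["Boston", "Austin", "Denver"])
print(temps.dtype)  # dtype of the values held by the Series

# DataFrame: a two-dimensional table of rows and columns (made-up scores)
scores = pd.DataFrame({"math": [90, 80, 70], "art": [85, 95, 75]})
print(scores.dtypes)  # one dtype per column

# Axis: axis=0 works down the rows, giving one total per column
col_totals = scores.sum(axis=0)
# axis=1 works across the columns, giving one total per row
row_totals = scores.sum(axis=1)
print(col_totals)
print(row_totals)

# Record: a single row of the DataFrame
record = scores.iloc[0]
print(record)

# Time Series: a Series whose labels are points in time (hourly here)
hours = pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"]
)
hourly_temps = pd.Series([41.0, 40.2, 39.8], index=hours)
print(hourly_temps)
```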
Installing Pandas
To install Pandas, make sure your virtual environment is activated first, then write the following command into the terminal:
$ pip install pandas
After you run the command, pip should also install a few packages that Pandas requires. If you’d like to check that the library was installed properly, run the pip list command and look for pandas in the output.
Importing Pandas
To follow along with the rest of this lesson, let’s open and continue from our previous notebook file “Week_10” and simply add a markdown cell at the bottom that says “Pandas.”
Importing Pandas is simple; however, there is an industry standard when you import the library:
# importing the pandas library
import pandas as pd # industry standard name of pd when importing
Go ahead and run the cell. We import Pandas as pd because it’s shorter and easier to reference.
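If you want to confirm from inside the notebook that the import worked, one quick check is to print the library’s version string. This is only a sanity check, and the version you see depends on what pip installed.

```python
import pandas as pd

# print the installed Pandas version to confirm the import succeeded
print(pd.__version__)
```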
Creating a DataFrame
The central object of study in Pandas is the DataFrame, which is a tabular data structure with rows and columns like an Excel spreadsheet. You can create a DataFrame from a Python dictionary or a file that has tabular data, like a CSV file. Let’s create our own from a dictionary:
1| # using the from_dict method to convert a dictionary into a Pandas DataFrame
2| import random
4| random.seed(3) # generate same random numbers every time, number used doesn't matter
6| names = [ "Jess", "Jordan", "Sandy", "Ted", "Barney", "Tyler", "Rebecca" ]
7| ages = [ random.randint(18, 35) for x in range( len(names) )]
9| people = { "names" : names, "ages" : ages }
11| df = pd.DataFrame.from_dict(people)
12| print(df)
Go ahead and run the cell. We import the random module so that we may create random ages for our people on line 7. Using the seed method on line 4 will give us both the same random numbers to work with. You could pass any number as the argument into seed; however, if you use a number other than 3, you’ll get a different output than this book.
Note: Random numbers generated by a computer aren’t truly random; they follow a specific algorithm, which is why seeding it with the same value produces the same sequence every time.
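A small sketch of that behavior: seeding the generator with the same value twice and drawing the same number of ages gives identical lists. The variable names first_run and second_run are just illustrative.

```python
import random

# seed, then draw seven ages in the same range used above
random.seed(3)
first_run = [random.randint(18, 35) for _ in range(7)]

# re-seed with the same value to restart the sequence
random.seed(3)
second_run = [random.randint(18, 35) for _ in range(7)]

# the two lists match because the generator restarted from the same state
print(first_run == second_run)  # True
```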
After we generate a list of names and random ages for each person, we create a dictionary called “people.” The magic truly happens on line 11, where we use Pandas to create the DataFrame that we’ll be working with. When it’s created, it uses the keys as the column names, and the values match up with the corresponding index, such that names[0] and ages[0] will be a single record. You should output a table that looks like Table 10-1.
Table 10-1. DataFrame created from fake data
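To check for yourself that the keys became column names and that each row lines up with the original lists, you can inspect the DataFrame as in the sketch below. It rebuilds the same DataFrame so it runs on its own, and it uses the columns, shape, and iloc attributes, which haven’t been introduced yet. The last few lines show the other route mentioned earlier, reading tabular data in CSV form; the CSV text is a made-up stand-in for a real file, wrapped in io.StringIO so no file is needed.

```python
import io
import random

import pandas as pd

random.seed(3)
names = ["Jess", "Jordan", "Sandy", "Ted", "Barney", "Tyler", "Rebecca"]
ages = [random.randint(18, 35) for _ in range(len(names))]
people = {"names": names, "ages": ages}
df = pd.DataFrame.from_dict(people)

# the dictionary keys are used as the column names
print(list(df.columns))  # ['names', 'ages']

# seven records (rows) and two columns
print(df.shape)  # (7, 2)

# the first record pairs names[0] with ages[0]
first = df.iloc[0]
print(first["names"] == names[0], first["ages"] == ages[0])  # True True

# hypothetical CSV content standing in for a file on disk
csv_text = "names,ages\nJess,30\nJordan,25\n"
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)
```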