Python Projects for Beginners: A Ten-Week Bootcamp Approach to Python Programming



	ages	names	tenure	age_group
6	18	rebecca	4	teenager
5	20	tyler	6	teenager
2	22	sandy	2	adult
0	25	Jess	5	adult
3	29	ted	8	adult
4	33	Barney	7	adult
1	35	Jordan	5	adult

Note: When the function you pass to apply needs values from more than one column, you must set axis=1 so that it receives each row rather than each column.
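The note above can be illustrated with a minimal sketch. It assumes the DataFrame holds the same seven records shown in the table; the helper age_at_start and the new column start_age are hypothetical names introduced only for this illustration and do not come from the book.

```python
import pandas as pd

# Records copied from the table above, listed in their original index order (0-6).
df = pd.DataFrame({
    "ages": [25, 35, 22, 29, 33, 20, 18],
    "names": ["Jess", "Jordan", "sandy", "ted", "Barney", "tyler", "rebecca"],
    "tenure": [5, 5, 2, 8, 7, 6, 4],
})

def age_at_start(row):
    # Hypothetical helper: needs two columns from the same row.
    return row["ages"] - row["tenure"]

# axis=1 hands each row to the function as a Series;
# with the default axis=0 the function would receive whole columns instead.
df["start_age"] = df.apply(age_at_start, axis=1)

print(df.sort_values("ages"))
```

For simple arithmetic like this, subtracting the columns directly (df["ages"] - df["tenure"]) would usually be faster; apply with axis=1 is mainly useful when the per-row logic does not vectorize easily.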

Aggregations

The raw data plus transformations is generally only half the story. Your objective is to extract actual insights and actionable conclusions from the data, and that means reducing it from potentially billions of rows to a set of summary statistics via aggregation functions. This section assumes some knowledge of SQL and its GROUP BY clause. If you’re not familiar with how GROUP BY works in SQL, visit w3schools⁸ for reference material.

groupby( )

To condense the information down to summary statistics, we’ll use the groupby method that Pandas provides. Whenever you group information together, you need to follow it with an aggregate function that tells the program how to summarize each group. For now, let’s count how many records of each age group there are within our DataFrame:

# grouping the records together to count how many records are in each group
df.groupby("age_group", as_index=False).count().head()

Go ahead and run the cell. When the information is grouped together using the count method, the program counts the non-null values in each column for every category. We’ll have two categories: adult with five records, and teenager with two records. The first argument of our groupby method is the column we want to group on. The second, as_index=False, keeps age_group as a regular column instead of turning it into the index. If it were set to True (the default), the resulting DataFrame would use age_group as its index, that is, as the unique identifier for each row.
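The effect of as_index can be seen by running the grouping both ways. The following is a minimal sketch, assuming the DataFrame contains the seven records from the earlier table with age_group copied directly from it; the variable names counts and indexed are illustrative only.

```python
import pandas as pd

# Records copied from the earlier table, in original index order (0-6).
df = pd.DataFrame({
    "ages": [25, 35, 22, 29, 33, 20, 18],
    "names": ["Jess", "Jordan", "sandy", "ted", "Barney", "tyler", "rebecca"],
    "tenure": [5, 5, 2, 8, 7, 6, 4],
    "age_group": ["adult", "adult", "adult", "adult", "adult",
                  "teenager", "teenager"],
})

# as_index=False: age_group stays an ordinary column and the rows
# get a fresh 0, 1, ... index.
counts = df.groupby("age_group", as_index=False).count()
print(counts)

# as_index=True (the default): age_group becomes the index
# and is no longer listed among the columns.
indexed = df.groupby("age_group").count()
print(indexed)
```

Because count tallies non-null values column by column, every remaining column shows the same number here; the figures would differ only if some cells were missing.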

mean( )

Instead of counting how many records there are in each category, let’s go ahead and find the averages of each column by using the mean method. We’ll group based on the same column:

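# grouping the data to see averages of all numeric columns
# numeric_only=True skips the text column "names", which recent pandas versions refuse to average
df.groupby("age_group", as_index=False).mean(numeric_only=True).head()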

Go ahead and run the cell. Using the mean method, we’ll be able to get the averages for all numerical columns. The output should result in a DataFrame that looks like Table 10-4.
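As a self-contained sketch of this step, the snippet below rebuilds the seven records from the earlier table and averages the numeric columns per group. Passing numeric_only=True is an adjustment made here on the assumption of a recent pandas release, where averaging a text column such as names raises an error; the variable name means is illustrative.

```python
import pandas as pd

# Records copied from the earlier table, in original index order (0-6).
df = pd.DataFrame({
    "ages": [25, 35, 22, 29, 33, 20, 18],
    "names": ["Jess", "Jordan", "sandy", "ted", "Barney", "tyler", "rebecca"],
    "tenure": [5, 5, 2, 8, 7, 6, 4],
    "age_group": ["adult", "adult", "adult", "adult", "adult",
                  "teenager", "teenager"],
})

# Average every numeric column within each age group;
# the non-numeric "names" column is left out of the result.
means = df.groupby("age_group", as_index=False).mean(numeric_only=True)
print(means)
```

With this data, the adult group averages 28.8 for ages and 5.4 for tenure, and the teenager group averages 19.0 and 5.0.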
Chapter 10: Introduction to Data Analysis
Table 10-4. Grouping by age_group and averaging data
