When you need to plot categorical data, a bar plot is a much better choice. Let’s create some fake data for the number of people who chose each movie category as their favorite and plot it:
1| # creating a bar plot using x and y coords
3| num_people, categories = [4, 8, 3, 6, 2], ["Comedy", "Action", "Thriller", "Romance", "Horror"]
5| plt.bar(categories, num_people)
7| plt.title("Favorite Movie Category", fontsize=24)
8| plt.xlabel("Category", fontsize=16)
9| plt.ylabel("# of People", fontsize=16)
10| plt.xticks(fontname="Fantasy")
11| plt.yticks(fontname="Fantasy")
13| plt.show()
Go ahead and run the cell. After creating our data to work with, we create our plot on line 5 using the bar() function. Its first argument supplies the category labels for the x axis and its second supplies the bar heights, so the numerical data ends up on the y axis.
We’ve also added several new customizations to the chart: a font size for the title and axis labels, and a font family for the tick labels. You should render a chart like Figure 10-3.
Figure 10-3. Bar plot of movie categories data
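In practice, counts like these usually come from tallying raw answers rather than being typed in by hand. The following is a minimal sketch of that step using only the standard library. The `responses` list is hypothetical data, made up here so that it tallies to the same counts as the listing above, and the sort order is a design choice rather than anything bar() requires.

```python
from collections import Counter

# Hypothetical raw survey responses (made up for illustration; they tally
# to the same counts used in the bar plot listing).
responses = (
    ["Comedy"] * 4 + ["Action"] * 8 + ["Thriller"] * 3
    + ["Romance"] * 6 + ["Horror"] * 2
)

counts = Counter(responses)  # maps each category to how often it appears

# Rank from most to least popular so the tallest bar would come first.
ranked = counts.most_common()
categories = [name for name, _ in ranked]
num_people = [total for _, total in ranked]

print(categories)  # ['Action', 'Romance', 'Comedy', 'Thriller', 'Horror']
print(num_people)  # [8, 6, 4, 3, 2]
```

The two resulting lists have the same shape as the ones in the listing, so they could be passed straight to plt.bar(categories, num_people).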
Box Plot
Box plots are useful when you need to compare the distribution of a single variable either over time or across categories. Their design resembles a candlestick chart: each box lets you view the minimum, maximum, first quartile (25th percentile), third quartile (75th percentile), and median, which can be useful for displaying data over time. In the case of stocks, currency would be the y axis data and time would be the x axis data. For our example, let’s create two separate groups and display the heights for each:
1| # creating a box plot – showing height data for male-female
3| males, females = [72, 68, 65, 77, 73, 71, 69], [60, 65, 68, 61, 63, 64]
4| heights = [ males, females ]
6| plt.figure(figsize=(15, 8)) # makes chart bigger
7| plt.boxplot(heights) # takes in a list of data; each inner list becomes its own box, and heights contains two lists
9| plt.xticks([1, 2], ["Male", "Female"]) # sets number of ticks and labels on x-axis
10| plt.title("Height by Gender", fontsize=22)
11| plt.ylabel("Height (inches)", fontsize=14)
12| plt.xlabel("Gender", fontsize=14)
14| plt.show()
Go ahead and run the cell. To plot the data in separate categories, we need a list of lists. On line 4, we declare heights, which holds one list of heights for males and one for females. When we plot the data, each list gets its own box. You’ll notice the figure is much larger than usual because we set a new figure size on line 6. To render the chart, we call the boxplot() function on line 7 and pass heights in as our data. Line 9 is one of the more important lines: it sets the tick positions and labels for the categories on the x axis. We order them as “Male” then “Female” because that is the order in which the lists appear in heights on line 4. The chart should render like Figure 10-4.
Figure 10-4. Box plot of height data
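To see the numbers behind each box, it can help to compute the five summary values directly. The sketch below uses only the standard library; `five_number_summary` is a hypothetical helper name introduced for illustration. It assumes the "inclusive" quantile method (linear interpolation between data points), which should line up with the quartiles drawn in the chart, though that match is an assumption rather than something the listing confirms.

```python
from statistics import quantiles

males = [72, 68, 65, 77, 73, 71, 69]
females = [60, 65, 68, 61, 63, 64]


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # n=4 splits the data into quartiles; method="inclusive" interpolates
    # linearly and treats the smallest and largest values as the extremes.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


for label, data in [("Male", males), ("Female", females)]:
    print(label, five_number_summary(data))
# Male (65, 68.5, 71.0, 72.5, 77)
# Female (60, 61.5, 63.5, 64.75, 68)
```

For this data, every height falls within 1.5 times the interquartile range of its box, so the whiskers in the chart reach the true minimum and maximum. With more spread-out data, Matplotlib's default settings would draw the extreme points as separate outlier markers instead.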