MONDAY EXERCISES
1. Test Environment: Create a new virtual environment called “test.” When creating it, install Python version 2.7 instead of the current version. After it’s completed, make sure it installed the proper version of Python by checking the list of installed packages.

2. JavaScript Repositories: Using the requests module and the GitHub API link from our last lesson, figure out how many repositories on GitHub use JavaScript.
Today was an important introduction to data analysis. Not only did we cover how and why to use virtual environments, but we also went over the requests module with a brief introduction to APIs. When using any library for the rest of the week, we’ll need to activate our data_analysis virtual environment. At the end of the week, we’ll cover web scraping, which requires the requests module.
Tuesday: Pandas
When you need to work with data, Pandas is the ultimate tool. It’s essentially Excel on steroids. If you’re familiar with SQL, Pandas will come more easily to you, as it blends Python syntax with SQL-style operations such as filtering, grouping, and joining. By the end of the day, you’ll be able to analyze and work with tabular data more efficiently than with traditional tools like spreadsheets.
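To make the SQL comparison concrete, here is a minimal sketch, assuming Pandas is already installed, of a filter-and-aggregate query written with Pandas, with the equivalent SQL in the comments. The `sales` table and its columns are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sales data, made up for illustration only
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "North"],
    "units": [10, 4, 7, 12, 3],
})

# SQL equivalent:
#   SELECT region, SUM(units) AS total_units
#   FROM sales
#   WHERE units > 3
#   GROUP BY region;
totals = (
    sales[sales["units"] > 3]     # WHERE units > 3
    .groupby("region")["units"]   # GROUP BY region
    .sum()                        # SUM(units)
    .rename("total_units")        # AS total_units
)
print(totals)
```

Each step of the chain maps onto one SQL clause, which is why SQL experience transfers so directly to Pandas.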
As with yesterday’s lesson, we first need to install the Pandas library into our virtual environment. To follow along with today’s lesson, cd into the “python_bootcamp” folder and activate the environment. We’ll start today in the terminal.
Note If you can’t remember how to activate the environment, go back to yesterday’s lesson.
What Is Pandas?
Pandas is a flexible data analysis library, built on top of NumPy, that is excellent for working with tabular data. It is currently the de facto standard for Python-based data analysis, and fluency in Pandas will do wonders for your productivity and, frankly, your resume. It is one of the fastest ways of getting from zero to answer. Because its performance-critical code is written in C (much of it via Cython), it performs calculations far faster than equivalent pure-Python loops. The Pandas module is a high-performance, highly efficient, high-level data analysis library. Its central data structure, the DataFrame, lets us work with large sets of tabular data.
Note NumPy is a fundamental package for scientific computing in Python. Built largely in C, it provides fast multidimensional arrays and can perform calculations on them at high speed.
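The following is a minimal sketch of a DataFrame, assuming Pandas and NumPy are installed in the active environment; the student names and scores are made up. It shows that each column is stored as a NumPy array and that arithmetic on a whole column is vectorized rather than written as a Python loop.

```python
import numpy as np
import pandas as pd

# A small, hypothetical table of student scores
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cho"],
    "quiz": [8.0, 6.5, 9.0],
    "exam": [85.0, 70.0, 92.0],
})

# Each numeric column is a Series backed by a NumPy array
print(type(df["exam"].to_numpy()))  # <class 'numpy.ndarray'>

# Vectorized arithmetic: one expression computes the whole column,
# with the element-by-element loop running in compiled code
df["total"] = df["quiz"] * 10 + df["exam"]
print(df)

# The same calculation done directly with NumPy arrays
totals = np.array([8.0, 6.5, 9.0]) * 10 + np.array([85.0, 70.0, 92.0])
```

Both versions produce the same numbers; Pandas adds labeled rows and columns on top of the underlying NumPy arrays.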
Chapter 10 Introduction to Data Analysis
With the Pandas library, you can do any of the following and more:
Calculate statistics and answer questions about the data, such as the average, median, max, and min of each column
Find correlations between columns
Track the distribution of one or more columns
Visualize the data with the help of matplotlib, using bar plots, histograms, etc.
Clean and filter data, including missing or incomplete values, by applying a user-defined function (UDF) or built-in function
Load tabular data into Python to work with
Export the data to a CSV file, another file format, or a database
Engineer new feature columns to use in your analysis
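Several of the tasks above can be sketched in a few lines. This is a minimal example, assuming Pandas is installed; the weather data is hypothetical and is read from an in-memory string so the sketch needs no files. Plotting with matplotlib is left out so it runs without that extra dependency, and `to_fahrenheit` is a hypothetical UDF written for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV data; Rome's humidity is left blank on purpose
raw = io.StringIO(
    "city,temp_c,humidity\n"
    "Oslo,4,80\n"
    "Rome,18,\n"
    "Cairo,30,20\n"
    "Lima,20,75\n"
)
df = pd.read_csv(raw)  # load tabular data into a DataFrame

# Cleaning: fill the missing humidity value with the column median
df["humidity"] = df["humidity"].fillna(df["humidity"].median())

# Statistics: mean, median, max, and min of each numeric column
stats = df[["temp_c", "humidity"]].agg(["mean", "median", "max", "min"])

# Correlation between two columns (Pearson by default)
corr = df["temp_c"].corr(df["humidity"])

# Distribution: count how many temperatures fall into each bin
distribution = pd.cut(df["temp_c"], bins=[0, 15, 25, 35]).value_counts().sort_index()

# Feature engineering with a hypothetical user-defined function (UDF);
# .apply calls it once per value in the column
def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

df["temp_f"] = df["temp_c"].apply(to_fahrenheit)

# Export to CSV (into a string buffer here instead of a file on disk)
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

For simple arithmetic like this, the vectorized form `df["temp_c"] * 9 / 5 + 32` is usually faster than `.apply`; the UDF is used here only to show the pattern.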
No matter what you need to do with data, Pandas is your end-all-be-all analysis library.