Less popular libraries
Above, we covered the most popular libraries, the ones used by almost every professional data scientist and by anyone interested in the field. There are also less well-known but no less useful libraries for data mining, natural language processing, and data visualization.
Scrapy
Scrapy is used to build spider bots that crawl website pages and collect structured data: prices, contact information, URLs, and so on. Scrapy can also extract data from APIs.
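Here is a minimal sketch of a Scrapy spider. The URL and the CSS selectors are assumptions about a hypothetical product page, not a real site:

```python
import scrapy

class PriceSpider(scrapy.Spider):
    name = "prices"
    start_urls = ["http://example.com/products"]  # hypothetical URL

    def parse(self, response):
        # The selectors below assume a particular page markup
        for product in response.css("div.product"):
            yield {
                "name": product.css("h2::text").get(),
                "price": product.css("span.price::text").get(),
                "url": response.urljoin(product.css("a::attr(href)").get()),
            }
```

A spider like this can be run with `scrapy runspider price_spider.py -o prices.json`, which writes the collected items to a JSON file.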
NLTK
The NLTK set of libraries is designed for natural language processing. Its main functions are text tokenization and tagging, named entity recognition, and building syntax trees that reveal parts of speech and dependencies between words. It is also used for sentiment analysis and automatic summarization.
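A short sketch of NLTK's core pipeline: tokenization, part-of-speech tagging, and named entity recognition. The sample sentence is arbitrary; the `nltk.download` calls fetch the models these functions need:

```python
import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")

sentence = "Guido van Rossum created Python at CWI in the Netherlands."
tokens = nltk.word_tokenize(sentence)  # split text into tokens
tagged = nltk.pos_tag(tokens)          # assign part-of-speech tags
entities = nltk.ne_chunk(tagged)       # group tokens into named entities
print(entities)
```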
Pattern
Pattern combines the functionality of Scrapy and NLTK and is designed for extracting data from the web, natural language processing, machine learning, and social media analysis. Among its tools are a web crawler, APIs for Google, Twitter, and Wikipedia, and text analysis algorithms that can be run in just a few lines of code.
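A hedged sketch of how Pattern combines web mining with text analysis: fetch tweets and score their sentiment. The exact behavior depends on the Pattern version installed (the classic releases target Python 2), and the search query is arbitrary:

```python
from pattern.web import Twitter
from pattern.en import sentiment

twitter = Twitter(language="en")
for tweet in twitter.search("data science", count=10):
    # sentiment() returns a (polarity, subjectivity) pair
    polarity, subjectivity = sentiment(tweet.text)
    print(round(polarity, 2), tweet.text[:60])
```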
Seaborn
Seaborn is a higher-level library built on top of Matplotlib. It makes specific kinds of visualization easier to produce: heat maps, time series plots, and violin plots.
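A minimal sketch of Seaborn's one-line statistical plots, using its built-in "iris" example dataset:

```python
import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")

# Heat map of pairwise correlations between the numeric columns
sns.heatmap(iris.drop(columns="species").corr(), annot=True)
plt.show()

# Violin plot of petal length per species
sns.violinplot(data=iris, x="species", y="petal_length")
plt.show()
```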
Bokeh
Bokeh creates interactive, zoomable graphics that render in the browser via JavaScript widgets. It offers three interface levels, from a high-level one that lets you quickly build complex graphics down to a low-level one.
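A sketch using Bokeh's mid-level bokeh.plotting interface: an interactive line chart written out as a standalone HTML page. The data points are arbitrary:

```python
from bokeh.plotting import figure, output_file, show

output_file("line.html")  # write the result to a local HTML file

p = figure(title="Example", x_axis_label="x", y_axis_label="y")
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)
show(p)  # opens the interactive plot in a browser
```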
Basemap
Basemap is used to create maps. The Folium library, which builds interactive web maps, is based on it; a minimal example of a map created with Folium is sketched below.
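This sketch produces an interactive web map with a single marker, saved as a standalone HTML file. The coordinates (central London) are arbitrary:

```python
import folium

m = folium.Map(location=[51.5074, -0.1278], zoom_start=12)
folium.Marker([51.5074, -0.1278], popup="Example marker").add_to(m)
m.save("map.html")  # open the file in any browser
```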
NetworkX
NetworkX is used to create and analyze graphs and network structures. It's designed to work with standard and non-standard data formats.
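A short sketch of NetworkX: build a small graph and compute a few standard network metrics. The graph itself is an arbitrary example:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

print(nx.degree_centrality(G))        # relative importance of each node
print(nx.shortest_path(G, "A", "D"))  # e.g. ['A', 'C', 'D']
print(nx.is_connected(G))             # True for this graph
```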
Although there are many Python libraries and packages for image and natural language processing, deep learning, neural networks, and so on, it's better to master the five basic libraries first (described in paragraphs 1-5 of this article) and only then move on to more narrowly focused ones.
Conclusion
As you can see, Python offers a very wide range of tools both for collecting information and for analyzing it. Given that the amount of data that needs to be analyzed grows every day, the ability to work with these libraries can be a great plus on your resume (or maybe even a basic requirement at a startup).
If you have experience working with any of the listed libraries, share it: you might help other developers who are just taking their first steps in this area make up their minds.