Python Projects for Beginners: A Ten-Week Bootcamp Approach to Python Programming


Parsing the Response with Beautiful Soup





The Beautiful Soup library provides many attributes and methods that make parsing HTML easier. With it, we can make the code easy to view, scrape, and traverse. To begin, we need to create a BeautifulSoup object by passing in the page content along with the type of parser we want to use. In our case, we’re working with HTML, so we’ll use the HTML parser:

# turning the response into a BeautifulSoup object to extract data
soup = BeautifulSoup(page.content, "html.parser")
print(soup.prettify())

Go ahead and run the cell. The prettify() method returns a well-formatted, indented version of the markup, which makes the actual code easier to read. The soup object knew how to parse the content properly because of the parser we specified. Beautiful Soup can also parse other markup, such as XML, but we’ll be working with HTML for this book. Now that we’ve turned the content into an object we can use, let’s learn how to extract data from it.
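To experiment with the parser without fetching a page, the following minimal sketch builds a soup object from a short HTML string. The html string here is a hypothetical stand-in for page.content, not the page used in this chapter, and the variable name pretty is only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content; the real page is requested earlier in the chapter
html = "<html><body><p>Roses are <b>red</b></p></body></html>"

# Build the soup object with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a string with each tag on its own line, indented by nesting depth
pretty = soup.prettify()
print(pretty)
```

Because prettify() returns a string rather than printing anything itself, the result can also be saved to a file or compared against the compact original.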

Scraping Data


Beautiful Soup offers many methods for extracting data. The following sections cover a few of the main ones. Basic HTML knowledge is assumed for this section.
Chapter 10 Introduction to Data Analysis

.find()


To find a specific element within the code, we can use the find() method. The argument we pass is the tag that we want to search for, and the method returns only the first instance it finds. For example, if there are four bold tags in our code and we search for a bold tag, find() returns only the first one. Let’s try it out:

# using the find method to scrape the text within the first bold tag
title = soup.find("b")
print(title)
print(title.get_text())  # extracts all text within element

Go ahead and run the cell. If you inspect the page with your web browser’s developer tools, you’ll see that the first bold tag within the code is the title of the poem. The first print statement outputs the whole element, including the tags around “Love”, and the second outputs only the text within the element. We were able to extract the text by using the get_text() method.
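The first-match behavior of find() can be checked on a small, self-contained snippet. This is a sketch under stated assumptions: the html string below is hypothetical markup with two bold tags, not the poem page from this chapter. It also shows that find() returns None when no tag matches, which is worth checking before calling get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two bold tags; not the page used in this chapter
html = "<p><b>Title</b> some text <b>Another</b></p>"
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching element
first_bold = soup.find("b")
print(first_bold)             # the whole element, tags included
print(first_bold.get_text())  # only the text inside the element

# find() returns None when nothing matches, so check before calling get_text()
missing = soup.find("i")
if missing is None:
    print("no italic tag found")
```

Calling get_text() directly on a None result would raise an AttributeError, so a check like the one above is a reasonable habit when scraping pages whose structure may change.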
