Python Projects for Beginners: A Ten-Week Bootcamp Approach to Python Programming




.find_all( )


To find all instances of a given element, we use the find_all() method. It returns a list of every matching tag found in the parsed document. Let’s find all bold tags within the code and extract the text:
# get all text within the bold element tag then output each
poem_text = soup.find_all("b")

for text in poem_text:
    print( text.get_text( ) )
Go ahead and run the cell. If you were to look at the code using your inspector tools, you would notice that all the text is within bold tags. The result is an output of the entire poem.
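To try find_all() without making a network request, here is a minimal sketch that parses a small HTML string. The markup and the variable names (html, bold_tags, lines) are made up for illustration and are not the poem page used in this chapter; the sketch assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page whose text sits inside bold tags
html = "<p><b>Roses are red,</b></p><p><b>Violets are blue.</b></p>"

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like collection of every matching tag
bold_tags = soup.find_all("b")

# get_text() strips the tag and keeps only the text inside it
lines = [tag.get_text() for tag in bold_tags]

for line in lines:
    print(line)

# find() returns only the first match, for comparison
first_bold = soup.find("b")
```

If no tag matches, find_all() returns an empty list, so the loop simply does not run.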

Finding Elements by Attributes


HTML elements can have attributes associated with them, such as style, id, or class. You can use Beautiful Soup to find elements with a specific attribute value. Let’s request a response from my personal GitHub page and find the element that shows my username:

1| # finding an element by specific attribute key-values
3| page = requests.get("https://github.com/Connor-SM")
4| soup = BeautifulSoup(page.content, "html.parser")
6| username = soup.find( "span", attrs={ "class" : "vcard-username" } ) # find first span with this class
8| print(username) # will show that element has class of vcard-username among others
9| print( username.get_text( ) )

Go ahead and run the cell. We send a request to GitHub and parse the content into a BeautifulSoup object to work with. On line 6, we search for the first span element whose class attribute has the value “vcard-username.” On line 8, we print the entire span tag, including its text, attributes, and markup. Lastly, on line 9, we extract the text to output the username associated with this page.
Note: Finding elements by attributes also works with the find_all() method. You can also include multiple key-value pairs to look for within the attrs argument.
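As a rough illustration of the note above, the following sketch uses find_all() with one and then two key-value pairs in attrs. The HTML, the class names, and the data-lang attribute are hypothetical and chosen only for this example; they do not come from the GitHub page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch
html = """
<ul>
  <li class="repo" data-lang="python">scraper</li>
  <li class="repo" data-lang="go">server</li>
  <li class="gist" data-lang="python">notes</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# one key-value pair: every li whose class is "repo"
repos = soup.find_all("li", attrs={"class": "repo"})
repo_names = [li.get_text() for li in repos]

# two key-value pairs: an element must match both to be returned
python_repos = soup.find_all("li", attrs={"class": "repo", "data-lang": "python"})
python_repo_names = [li.get_text() for li in python_repos]

print(repo_names)
print(python_repo_names)
```

Adding more pairs narrows the search, since every pair in attrs has to match the same element.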

DOM Traversal


This section covers how to extract information by traversing the DOM hierarchy. The DOM, short for Document Object Model, is a concept in web design that describes the structure of a web page and the relationships between its elements. Any two related elements on a web page share one of three relationships:

  1. Parent-Child

  2. Sibling

  3. Grandparent-Grandchild

Chapter 10 Introduction to Data Analysis
This concept is important to understand when you are web scraping because you may need to access the children of a specific element. The children of an element are all the elements nested within it. Take the following HTML code, for instance:


Title


Sub-title



Text
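To make the three relationships concrete, here is a minimal sketch that walks a small document with Beautiful Soup. The markup below is a hypothetical stand-in written for this example, not the chapter's original HTML, and the id "article" and all variable names are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch
html = """
<div id="article">
  <h1>Title</h1>
  <h2>Sub-title</h2>
  <p><b>Text</b></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
article = soup.find("div", attrs={"id": "article"})

# Parent-child: direct child tags of the div
# (True matches tags only, so whitespace strings are skipped)
child_names = [child.name for child in article.find_all(True, recursive=False)]

# Child to parent: .parent moves one level up the hierarchy
heading = soup.find("h1")
parent_name = heading.parent.name

# Sibling: the next h2 tag that shares the same parent
sibling_text = heading.find_next_sibling("h2").get_text()

# Grandparent-grandchild: two steps up from the bold tag reaches the div
bold = soup.find("b")
grandparent_id = bold.parent.parent.get("id")

print(child_names, parent_name, sibling_text, grandparent_id)
```

Passing True with recursive=False is a deliberate choice here: iterating over .children directly would also yield the newline strings between tags.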


