Python Programming for Biology: Bioinformatics and Beyond



Date: 30.12.2021
[Tim J. Stevens, Wayne Boucher] Python Programming

Reading PubMed XML files

PubMed is a search engine that lets you access the MEDLINE database of citations for
articles in the life sciences. You can download the citations in various formats. In this
section we consider the XML format.

As with all XML formats, ElementTree spares you the pain of parsing, but you still have
to understand the schema (or ‘data model’). The schema for PubMed XML is defined by a
DTD (Document Type Definition), although reading this is not very enjoyable. And, as with
all schemas, it may change in future, which can break application code. Here we use the
DTD that was valid on 1 January 2009. We will show how to extract and print the journal
year and title, and the article title and abstract, from a collection of PubMed XML files
(which, for example, can be downloaded from the PubMed website).
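
Because reading the DTD directly is tedious, one practical way to get a feel for the structure is to print the tag hierarchy of a file you already have. The following is only a minimal sketch and is not part of the original text: the function name outlineTags and the tiny sample document are illustrative assumptions, and the sample is hand-written with invented values and only a handful of the elements that a real PubMed record contains.

```python
from xml.etree import ElementTree

# Hand-written, heavily trimmed sample with invented values (an assumption
# for illustration); real PubMed records contain many more elements.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <Journal>
          <JournalIssue><PubDate><Year>2008</Year></PubDate></JournalIssue>
          <Title>Example Journal</Title>
        </Journal>
        <ArticleTitle>An example title</ArticleTitle>
        <Abstract><AbstractText>An example abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

def outlineTags(elem, depth=0, lines=None):
    """Return a list of indented tag names, one per element, in document order."""
    if lines is None:
        lines = []
    lines.append('  ' * depth + elem.tag)
    # Iterating over an element yields its direct children
    for child in elem:
        outlineTags(child, depth + 1, lines)
    return lines

root = ElementTree.fromstring(SAMPLE_XML)
outline = outlineTags(root)
print('\n'.join(outline))
```

For a downloaded file you would obtain the root with ElementTree.parse(fileName).getroot() instead of fromstring(). The outline shows only what happens to be present in that one file, so it complements the DTD rather than replacing it.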

The root object has the tag ‘PubmedArticleSet’ and underneath that are one or more
children with the tag ‘PubmedArticle’, although here we will just look at the first child.
Underneath that, there is a child with either the tag ‘NCBIArticle’ or ‘MedlineCitation’,
and we will assume the latter. Continuing down the hierarchy we eventually get to the
information we want:

from xml.etree import ElementTree

def printPubmedAbstracts(xmlFiles):

    for xmlFile in xmlFiles:
        tree = ElementTree.parse(xmlFile)
        root = tree.getroot()

        # find() returns the first match, so only the first PubmedArticle
        # is used, and it is assumed to contain a MedlineCitation
        citationElem = root.find('PubmedArticle/MedlineCitation')
        pmid = citationElem.findtext('PMID')

        articleElem = citationElem.find('Article')
        journalElem = articleElem.find('Journal')

        # findtext() gives the text of the first matching descendant
        journalTitle = journalElem.findtext('Title')
        journalYear = journalElem.findtext('JournalIssue/PubDate/Year')
        articleTitle = articleElem.findtext('ArticleTitle')
        articleAbstract = articleElem.findtext('Abstract/AbstractText')

        print('PMID = %s' % pmid)
        print('journalYear = %s' % journalYear)
        print('journalTitle = "%s"' % journalTitle)
        print('articleTitle = "%s"' % articleTitle)
        print('articleAbstract = "%s"' % articleAbstract)
        print('')

The PMID is the PubMed ID of the citation. With variants of this code you could create
your own kind of short summary of MEDLINE citations.
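
As one possible variant, the sketch below collects the same fields for every ‘PubmedArticle’ child rather than only the first, and returns them as dictionaries instead of printing them. It is an illustration under stated assumptions rather than the authors' code: the function name getPubmedSummaries and the inline sample records are invented here, records without a ‘MedlineCitation’ child are simply skipped, and any field that is absent comes back as None.

```python
from xml.etree import ElementTree

def getPubmedSummaries(root):
    """Return one dict of summary fields per PubmedArticle under root."""
    summaries = []
    for pubmedElem in root.findall('PubmedArticle'):
        citationElem = pubmedElem.find('MedlineCitation')
        if citationElem is None:
            continue  # Assumption: skip records that are not MedlineCitation

        # findtext() returns None when the path matches nothing, so a
        # missing Article, Journal or Abstract does not raise an error
        summaries.append({
            'pmid': citationElem.findtext('PMID'),
            'journalTitle': citationElem.findtext('Article/Journal/Title'),
            'journalYear': citationElem.findtext(
                'Article/Journal/JournalIssue/PubDate/Year'),
            'articleTitle': citationElem.findtext('Article/ArticleTitle'),
            'articleAbstract': citationElem.findtext(
                'Article/Abstract/AbstractText'),
        })
    return summaries

# Hand-written sample with invented values; the second record has no abstract
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <PMID>111</PMID>
    <Article>
      <Journal>
        <JournalIssue><PubDate><Year>2008</Year></PubDate></JournalIssue>
        <Title>Example Journal</Title>
      </Journal>
      <ArticleTitle>First example</ArticleTitle>
      <Abstract><AbstractText>Some abstract text.</AbstractText></Abstract>
    </Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID>222</PMID>
    <Article>
      <Journal><Title>Another Journal</Title></Journal>
      <ArticleTitle>Second example</ArticleTitle>
    </Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

summaries = getPubmedSummaries(ElementTree.fromstring(SAMPLE_XML))
for summary in summaries:
    print('%s (%s): %s' % (summary['pmid'], summary['journalYear'],
                           summary['articleTitle']))
```

Returning dictionaries keeps the extraction separate from the presentation, so the same function could feed a printed report, a tab-separated file or a database table.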



