Python Programming for Biology: Bioinformatics and Beyond


Reading a genome alignment file



Download 7,75 Mb.
Pdf ko'rish
bet263/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   259   260   261   262   263   264   265   266   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Reading a genome alignment file

The next example illustrates how to read a genome alignment file, which in this case is in

SAM format and could have been generated by the genomeAlign() defined above. Firstly,

the name of the genomic alignment is specified as alignFile and a set is defined to collect

the  chromosome  identifier.  The  SAM  file  reader  is  used  to  create  Alignment  objects,

noting  here  that  we  choose  to  loop  directly  though  the  reader’s  output,  rather  than

explicitly  defining  a  variable  that  represents  the  open  file  object.  The  alignment  objects

can then be interrogated with the .read attribute to access a sequence read object.




alignFile = 'examples/EcoliGenomeAlign.sam'

chromosomes = set()

for alignment in SAM_Reader(alignFile):

if alignment.aligned:

seqRead = alignment.read

print(seqRead.name)

print(seqRead.seq)

Also,  the  alignment’s  .iv  attribute  (a  GenomicInterval)  provides  a  means  of  accessing

the location that was aligned in the genome sequence:

genomeRegion = alignment.iv

chromo = genomeRegion.chrom

strand = genomeRegion.strand

start = genomeRegion.start

end = genomeRegion.end

chromosomes.add(chromo)

print(chromo, start, end, strand)

The set of chromosome identifiers which were collected above are converted into a list

so that they can be used to make a GenomicArray(), an object that can be used to house

any  data  that  we  may  care  to  map  to  an  efficient  Python  representation  of  the  genome,

which in this case will store the locations of where short-read sequences aligned. Note that

the typecode argument dictates what kind of data is housed in the array and can take the

values  ‘d’,  ‘i’,  ‘b’  and  ‘O’,  respectively  representing  floating  point  numbers,  integers,

Booleans and general Python objects.

chromosomes = list(chromosomes)

hitMap = GenomicArray(chromosomes, stranded=True, typecode='i')

This genomic array can be filled with the results from a SAM alignment file. A check is

made to ensure that the alignment was successful (i.e. that alignment.aligned is true) and

then the genomic interval genomeRegion can be used to set values in the genomic array,

noting that we can access the array as if it were a Python dictionary and set the value of 1

for the leading strand and -1 otherwise.

for alignment in SAM_Reader(alignFile):

if alignment.aligned:

genomeRegion = alignment.iv

if genomeRegion.strand == '+':

hitMap[genomeRegion] = 1

else:


hitMap[genomeRegion] = -1

To  access  the  results  of  the  hitMap  genomic  array  GenomicInterval  objects  are  used

again, although here they are created from scratch to represent the region of interest. As an

example intervals are created for the first chromosome from position zero to 2 Mbp, for

different strands.



chromo = chromosomes[0]

endPoint = 2000000

plusStrand = GenomicInterval(chromo, 0, endPoint, '+')

minusStrand = GenomicInterval(chromo, 0, endPoint, '-')

bothStrands = GenomicInterval(chromo, 0, endPoint, '.')

The hitMap can then simply be queried with the interval object. Converting the result to

a list allows the result to be easily plotted in a graph.

pyplot.plot(list(hitMap[plusStrand]))

pyplot.plot(list(hitMap[minusStrand]))

pyplot.show()




Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   259   260   261   262   263   264   265   266   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish