Python Programming for Biology: Bioinformatics and Beyond
Tim J. Stevens, Wayne Boucher


Separating and grouping data

When dealing with biological information, the question at hand often concerns the ability to separate a pool of data into different groups. This may be a simple two-way split, for example between people who do or do not have a disease, or it may involve many more data categories. Sometimes, however, the number of groups is not known, and it may not even be appropriate to think in terms of rigidly defined groups. Rather, it might be better to first determine the most discriminating features that separate the data, and only afterwards investigate whether groups are present and, if so, how many. Any kind of discrimination exercise naturally requires some form of information on which a judgement can be based, such as the results of an experiment, which may even include things like DNA sequences. Implicit in this sort of analysis is the notion that units of data are being separated, but each unit may relate to several pieces of information. For example, if a unit of data corresponds to a person, that person may be diagnosed using several different parameters and test measurements; if a unit is a biological molecule, it may be categorised by many different properties and experimental results.
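The following minimal sketch makes the idea of units of data and discriminating features concrete. It assumes NumPy is available. Each row of a 2D array is one unit of data and each column is one measured feature. Principal component analysis is used here as one common choice for finding the directions along which the data are most spread out, without knowing any groups in advance. That choice is an assumption and is not necessarily the method the book develops later. The function name `principal_components` and the measurements are hypothetical.

```python
import numpy as np

def principal_components(data):
    """Return (variances, axes, centred data) for the principal components.

    Each row of `data` is one unit of data (e.g. a person or a molecule)
    and each column is one measured feature.
    """
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)          # remove each feature's mean
    cov = np.cov(centred, rowvar=False)         # feature-by-feature covariance
    variances, axes = np.linalg.eigh(cov)       # eigh suits symmetric matrices
    order = np.argsort(variances)[::-1]         # largest variance first
    return variances[order], axes[:, order], centred

# Hypothetical measurements: six units of data, three features each
data = [[1.0, 2.0, 0.5],
        [1.2, 2.1, 0.4],
        [0.9, 1.9, 0.6],
        [4.0, 5.1, 0.5],
        [4.2, 4.9, 0.4],
        [3.9, 5.0, 0.6]]

variances, axes, centred = principal_components(data)
# Position of each unit along the single most discriminating direction
projected = centred @ axes[:, 0]
print(variances.round(3))
print(projected.round(2))
```

Projecting onto the first component places the first three units on one side of zero and the last three on the other. This hints at two groups even though no group labels were supplied. The sign of a principal axis is arbitrary, so the two sides may come out swapped.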

Whatever the situation and type of data, the question being asked sometimes seeks to place each unit of data in one group or another, with no possibility of anything belonging to more than one group. Naturally, whether this is a valid assumption will depend on the context and on how the problem is formulated. In reality, a hard boundary between groups may not be as useful as a fuzzier membership. Returning to the problem of diagnosing a condition in people using experimental test results, two people with identical test results may have different outcomes, so there may be no simple dividing line between groups. We may have official values that distinguish between ‘underweight’, ‘normal’ and ‘overweight’ people to help guide healthcare, but body weight of course lies on a continuous scale. It may therefore be sufficient merely to separate people (e.g. using height, weight and gender information) and so be able to make more flexible decisions that are not based on rigid categories.
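Body mass index (BMI), which is weight in kilograms divided by the square of height in metres, shows the difference between rigid categories and a more flexible membership. This minimal sketch assumes the commonly used adult cut-offs of 18.5 and 25.0 for the BMI categories. The logistic membership function, its `width` parameter and the example people are illustrative assumptions, and gender is ignored for simplicity.

```python
import math

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def hard_category(value):
    """Rigid grouping: every person falls in exactly one category.

    The cut-offs (18.5 and 25.0) are the common adult values, assumed here.
    """
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    return "overweight"

def overweight_membership(value, threshold=25.0, width=1.0):
    """Fuzzy grouping: a smooth 0-to-1 degree of being 'overweight'.

    The logistic shape and the width of 1.0 are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-(value - threshold) / width))

# Hypothetical people of the same height with slightly different weights
for weight, height in [(70.0, 1.75), (76.0, 1.75), (78.0, 1.75)]:
    value = bmi(weight, height)
    print(f"BMI {value:.1f}: {hard_category(value)}, "
          f"overweight membership {overweight_membership(value):.2f}")
```

With a height of 1.75 m, weights of 76 kg and 78 kg give BMIs of about 24.8 and 25.5. The hard categories differ, but the fuzzy memberships (roughly 0.45 and 0.62) are close, which reflects how similar the two people really are.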

Where there are discrete groups, identification and classification will sometimes be based on rich, well-studied data, e.g. people who definitely do or do not have a condition. The groupings may then be used to make predictions from more limited information, where there is no certainty. In such situations it may be appropriate to classify a unit of data into one group or another in a probabilistic manner. While Chapter 21 deals with the concept of probability, here we focus on the process of separating data, both by making groups and by determining the most discriminating information. We refer to the formation of discrete groups by bringing together units of data as clustering. We use discrimination to mean finding the best combination of the different data features to perform the separation.
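The two terms can be contrasted with a minimal sketch, assuming NumPy is available. Clustering is illustrated with k-means, and discrimination with Fisher's linear discriminant for two known groups. Each is one common technique, chosen here as an assumption, and neither is necessarily the method the book goes on to use. The function names `kmeans` and `fisher_direction` and the measurements are hypothetical.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Minimal k-means: split the rows of `data` into k discrete clusters."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct units of data, chosen at random, as the centres
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every unit to every centre: shape (n, k)
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # nearest centre for each unit
        # Move each centre to the mean of its members (keep it if it has none)
        new_centres = np.array([data[labels == j].mean(axis=0)
                                if np.any(labels == j) else centres[j]
                                for j in range(k)])
        if np.allclose(new_centres, centres):
            break  # the centres have stopped moving
        centres = new_centres
    return labels, centres

def fisher_direction(group_a, group_b):
    """Feature weights that best separate two known groups (Fisher's LDA)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Within-group scatter matrix: summed scatter of each group about its mean
    scatter = (np.cov(a, rowvar=False) * (len(a) - 1)
               + np.cov(b, rowvar=False) * (len(b) - 1))
    w = np.linalg.solve(scatter, a.mean(axis=0) - b.mean(axis=0))
    return w / np.linalg.norm(w)

# Hypothetical two-feature measurements for two groups of four units
group_a = [[1.0, 1.2], [1.3, 0.8], [0.8, 1.0], [1.1, 1.1]]
group_b = [[4.0, 4.2], [4.3, 3.9], [3.8, 4.1], [4.1, 3.8]]

# Clustering: ignore the known groups and let k-means form two clusters
labels, centres = kmeans(group_a + group_b, k=2)
print(labels)

# Discrimination: use the known groups to find the best feature weighting
w = fisher_direction(group_a, group_b)
scores_a = np.asarray(group_a) @ w
scores_b = np.asarray(group_b) @ w
print(scores_a.round(2), scores_b.round(2))
```

k-means needs the number of groups `k` to be chosen in advance, which is exactly the assumption questioned at the start of this section. Fisher's direction, by contrast, requires the group membership of the units it learns from to be known beforehand.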

