Python Programming for Biology: Bioinformatics and Beyond


[Tim J. Stevens, Wayne Boucher] Python Programming

Figure 22.7. Pearson’s correlation coefficient values (r) for a variety of different
data samples. The coefficient represents the degree of linear covariance in the two
quantities and is scaled so that the value lies between −1 (for negative correlation) and +1
(positive correlation). Values near zero indicate the quantities are not linearly correlated,
although there may be other patterns or forms of non-linear correlation, which would not
be exposed by this test.

The correlation coefficient is readily calculated in Python using the numpy.corrcoef()
function, and as with the covariance function we get back a matrix of values, for all pairs
of inputs. Testing on the previously used values we get:

from numpy import corrcoef

r1 = corrcoef(xVals, yVals1)[0, 1] # Result is: 0.0231
r2 = corrcoef(xVals, yVals2)[0, 1] # Result is: 0.8145

Hence we can see that xVals has almost no correlation with yVals1, but has a large
positive correlation (0.8145) with yVals2, as we might expect. If we wished we could
naturally also derive the correlation coefficient from the previously calculated covariance,
remembering that we use the unbiased sample standard deviation (ddof=1):

from numpy import std

cov2 = cov2[0,1] # X-Y element
stdDevX = std(xVals, ddof=1)
stdDevY = std(yVals2, ddof=1)
r2 = cov2 / (stdDevX*stdDevY)
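As a self-contained check of this equivalence, the following sketch compares the two routes on synthetic data. The arrays x and y are illustrative assumptions generated here, not the xVals and yVals2 used in the text, so the numerical value of r will differ from the one quoted above.

```python
import numpy as np

# Illustrative data only (an assumption): a noisy linear relationship
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

# Route 1: read the X-Y element straight from the correlation matrix
r_direct = np.corrcoef(x, y)[0, 1]

# Route 2: covariance divided by the product of standard deviations.
# numpy.cov() normalises by n-1 by default, so std() needs ddof=1 to match.
cov_xy = np.cov(x, y)[0, 1]
r_from_cov = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(np.isclose(r_direct, r_from_cov))
```

If ddof=1 were omitted from std(), the two results would differ by a factor of (n−1)/n, which is easy to miss for large samples.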

Although the correlation coefficient is insensitive to different sample means and
variances for the quantities, it should not be forgotten that it is only a test of a linear
relationship. There may be a distinct non-random, non-linear relationship between the
quantities which will not be picked up by the test, although in some instances it is possible
to transform a quantity (e.g. by taking a logarithm) so that the relationship becomes linear.
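The effect of such a transformation can be sketched with a small example. The data here are an assumption made purely for illustration: a noise-free exponential relationship, for which the correlation of the raw values is well below 1 even though y is completely determined by x.

```python
import numpy as np

# Illustrative data only (an assumption): y depends exponentially on x
x = np.linspace(1.0, 10.0, 50)
y = np.exp(x)

# Correlation of the raw values understates the relationship
r_raw = np.corrcoef(x, y)[0, 1]

# After taking the logarithm of y the relationship is linear
r_log = np.corrcoef(x, np.log(y))[0, 1]

print(r_raw, r_log)
```

With real, noisy measurements the transformed correlation would not reach 1, but the same comparison indicates whether a logarithmic scale is the more natural one for the quantity.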

We can subject the correlation coefficient to significance tests if we consider an
uncorrelated null hypothesis, i.e. where the underlying correlation coefficient is 0. The
basic idea here is that even if distributions are really uncorrelated they can appear to be
correlated (points are coincidentally linear), especially if the size of a sample is small. If
the underlying distributions are normal then it can be shown that the null hypothesis can
be rejected at the 0.95 confidence level if the magnitude of the test statistic

$$t = r \sqrt{\frac{n-2}{1-r^2}}$$

is larger than the corresponding T-distribution percent point function with confidence level
0.975 (because our test is two-tailed) and n−2 degrees of freedom. Here n is the number of
sample points in each of X and Y. We can invert the above function, and solve for the
correlation coefficient r as a function of n:

$$r = \sqrt{\frac{t^2}{n-2+t^2}}$$

Accordingly we can plot the critical value of the correlation coefficient as a function of
the sample size, as illustrated in Figure 22.8. If the magnitude of r is larger than this value
then the null hypothesis is rejected.

This is readily done in Python using the .ppf() function of the scipy.stats.t distribution
object and applying the above equation:

from matplotlib import pyplot
from numpy import sqrt
from scipy.stats import t

nVals = range(5, 101)
rVals = []

for n in nVals:
  tVal = t(n-2).ppf(0.975)         # Two-tailed critical t value
  tVal2 = tVal * tVal
  rVal = sqrt(tVal2/(n-2+tVal2))   # Critical r for this sample size
  rVals.append(rVal)

pyplot.plot(nVals, rVals, color='black')
pyplot.show()
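The same test can be applied to a single pair of samples rather than to a range of sample sizes. The sketch below is one possible way to do this, assuming normally distributed data; the function name correlation_is_significant and the synthetic arrays are hypothetical and are not taken from the text. As a cross-check it compares the decision with the two-sided p-value reported by scipy.stats.pearsonr().

```python
import numpy as np
from scipy.stats import t, pearsonr

def correlation_is_significant(x, y, confidence=0.95):
    """Two-tailed t-test of the null hypothesis that the true correlation is 0.

    Hypothetical helper for illustration; assumes |r| < 1 and normal data.
    """
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    # Test statistic with n-2 degrees of freedom
    t_stat = r * np.sqrt((n - 2) / (1.0 - r * r))
    # Two-tailed test: 0.95 confidence uses the 0.975 percent point
    t_crit = t(n - 2).ppf(1.0 - (1.0 - confidence) / 2.0)
    return r, t_stat, abs(t_stat) > t_crit

# Illustrative data only (an assumption): 30 points with a linear trend
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = x + rng.normal(scale=0.5, size=30)

r, t_stat, significant = correlation_is_significant(x, y)

# Cross-check: rejecting at 0.95 should agree with a p-value below 0.05
r_check, p_value = pearsonr(x, y)
print(significant, p_value < 0.05)
```

Comparing the t statistic with the critical value and comparing the p-value with 0.05 are two views of the same decision, so the two printed values should agree.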



