Source code online books for professionals by professionals



Download 4,67 Mb.
Pdf ko'rish
bet103/266
Sana31.12.2021
Hajmi4,67 Mb.
#213682
1   ...   99   100   101   102   103   104   105   106   ...   266
Bog'liq
2 5296731884800181221

BLaCK BOX: tIMSOrt
the algorithm hiding in 
list.sort
 is one invented (and implemented) by tim peters, one of the big names in the 
python community.
7
 the algorithm, aptly named timsort, replaces an earlier algorithm that had lots of tweaks to 
handle special cases such as segments of ascending and descending values, and the like. in timsort, these cases 
are handled by the general mechanism, so the performance is still there (and in some cases, it’s much improved), 
but the algorithm is cleaner and simpler. the algorithm is still a bit too involved to explain in detail here; i’ll try to 
give you a quick overview. For more details, take a look at the source.
8
timsort is a close relative to merge sort. it’s an in-place algorithm, in that it merges segments and leaves the 
result in the original array (although it uses some auxiliary memory during the merging). instead of simply sorting 
the array half-and-half and then merging those, though, it starts at the beginning, looking for segments that are 
already sorted (possibly in reverse), called runs. in random arrays, there won’t be many, but in many kinds of real 
data, there may be a lot—giving the algorithm a clear edge over a plain merge sort and a linear running time in 
the best case (and that covers a lot of cases beyond simply getting a sequence that’s already sorted).
as timsort iterates over the sequence, identifying runs and pushing their bounds onto a stack, it uses some 
heuristics to decide which runs are to be merged when. the idea is to avoid the kind of merge imbalance that 
would give you a quadratic running time while still exploiting the structure in the data (that is, the runs). First, any 
really short runs are artificially extended and sorted (using a stable insertion sort). Second, the following invariants 
are maintained for the three topmost runs on the stack, 
A

B
, and 
C
 (with 
A
 on top): 
len(A) > len(B) + len(C)
 
and 
len(B) > len(C)
. if the first invariant is violated, the smaller of 
A
 and 
C
 is merged with 
B
, and the result 
replaces the merged runs in the stack. the second invariant may still not hold, and the merging continues until 
both invariants hold.
the algorithm uses some other tricks as well, to get as much speed as possible. if you’re interested, i recommend 
you check out the source.
9
 if you’d rather not read C code, you could also take a look at the pure python version 
of timsort, available as part of the pypy project.
10
 their implementation has excellent comments and is clearly 
written. (the pypy project is discussed in appendix a.)
 
7
Timsort is, in fact, also used in Java SE 7, for sorting arrays.
 
8
See, for example, the file listsort.txt in the source code (or online, 
http://svn.python.org/projects/python/
trunk/
Objects/listsort.txt
).
 
9
You can find the actual C code at 
http://svn.python.org/projects/python/trunk/Objects/listobject.c
.
10
See 
https://bitbucket.org/pypy/pypy/src/default/rpython/rlib/listsort.py
.


Chapter 6 

 DiviDe, Combine, anD Conquer
127
How Fast Can We Sort?
One important result about sorting is that divide-and-conquer algorithms such as merge sort are optimal; for arbitrary 
values (where we can figure out which is bigger) it’s impossible, in the worst case, to do any better than W(n lg n). An 
important case where this holds is when we sort arbitrary  real numbers.
11
Note
 

  Counting sort and its relatives (discussed in Chapter 4) seem to break this rule. note that there we can’t sort 
arbitrary values—we need to be able to count occurrences, which means that the objects must be hashable, and we 
need to be able to iterate over the value range in linear time.
How do we know this? The reasoning is actually quite simple. First insight: Because the values are arbitrary and 
we’re assuming that we can figure out only whether one of them is greater than another, each object comparison boils 
down to a yes/no question. Second insight: The number of orderings of n elements is n!, and we’re looking for exactly 
one of them. Where does that get us? We’re back to “think of a particle,” or, in this case, “think of a permutation.”  
This means that the best we can do is to use W(lg n!) yes/no questions (the comparisons) to get the right permutation 
(that is, to sort the numbers). And it just so happens that lg n! is asymptotically equivalent to n lg n.
12
 In other words, 
the running time in the worst case is W(lg n!) = W(n lg n).
How, you say, do we arrive at this equivalence? The easiest way is to just use Stirling’s approximation, which 
says that n! is Q(n

Download 4,67 Mb.

Do'stlaringiz bilan baham:
1   ...   99   100   101   102   103   104   105   106   ...   266




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish