Thomas W. Christopher | Consulting in High Performance Computing. Training in Concurrency, Networking, and Parallelism, especially in Python and Java. |
Abstract
Continuing a tutorial on sorting algorithms, this page animates heap sort.
Author
Thomas W. Christopher
Heap sort is a bit strange in that it begins by typically increasing the number of inversions in the array, but it is still an O(N log N) sorting algorithm. Indeed, guaranteed O(N log N), unlike quick sort that is quadratic in the worst case.
One of the two meanings for "heap" in computer science is a representation for a priority queue. A heap has these elements:
The essence of heap sort is this:
The two examples of heap sort presented here differ only in how the array is initially made into a heap. Both use the same algorithm thereafter.
Suppose that there are N items to be sorted occupying elements 1 through N of the conceptual array. Since Java uses zero-origin addressing, the elements will actually occupy elements 0 through N-1 of the real array. This will cause a certain amount of confusion in the algorithms. Some places the indices are into the conceptual array. Other places, where we are actually examining or exchanging elements, they will be into the real array.
Anyway, once the array has been made into a heap, this is the loop to sort it:
for (int n=N;n>1;n--) {
exchange(n-1,0);
siftUp(1,n-1);
}
In this code, N is the number of elements in the array. It is the upper bound of the conceptual array. In the loop, the conceptual array is divided into two parts: elements 1 through n are a heap. Elements n+1 through N are sorted.
The exchange(n-1,0); uses indices in the real array. It exchanges elements 1 and n in the conceptual array. After this exchange, elements n through N are sorted. Elements 1 through n-1, however, are no longer a heap, but it is only element 1 that is out of place.
Method siftUp takes as its parameters indices in the conceptual array. At the call siftUp(i,n), the following are true:
The basic functioning of siftUp is as follows:
The actual code uses a loop rather than a recursive call.
void siftUp(int i,int n) {
int j,k;
for(j=i,k=2*j;k<=n;j=k,k=2*j) {
if (k<n && get(k-1)<get(k)) k++;
if (get(j-1)<get(k-1)) {
exchange(j-1,k-1);
} else break;
}
}
When we are sifting up from node 1, we take on the order of log N steps since there are log N levels in the tree that the value may have to move through.Therefore heapsort will take O(N log N) time, sifting up to log N levels for each element sorted. Even though the heap gets smaller as the array is sorted, it doesn't get samller very fast. Half the elements in the initial heap are in the leaves, and after being exchanged with the root, each of them can be expected to move about log N levels back.
Heap Sort This first version of heapsort builds the heap the way elements would be added to a priority queue one at a time. The inline code is:
for(i=2;i<=N;i++) siftDown(i);
Method siftDown(i) assumes that the elements 1 through i-1 already form a heap. It moves the value in position i rootwards: if it is larger than its parent, siftDown exchanges it with its parent and then repeats.
void siftDown(int i) {
for (int j=i/2;i>1;i=j,j=i/2) {
if (get(j-1)<get(i-1)) {
exchange(i-1,j-1);
} else break;
i=j;
}
}
There are two problems with this use of siftDown:
Heap Sort2 The trick in this second version of heap sort is to use siftUp to build the heap.
Observe that each leaf of the tree is automatically a heap by default.
We start off at the highest numbered internal node, whose subtrees are already heaps, and make its subtree into a heap. Then we proceed downwards through the array. For each node i, we make its subtree into a heap, knowing that all trees rooted in nodes numbered greater than i are already heaps, the requirement for using siftUp.
The code to create the initial heap is:
for(i=N/2;i>0;i--) siftUp(i,N);
This code, strangely, runs in linear time. The reasoning is as follows:
(N/2)*0 + (N/4)*1 + (N/8)*2 + (N/16)*3 + ...
on the order of N.
But even though the array can be "heapified" in linear time, the actual sorting still takes N log N time.
These AdSense ads may be a little inappropriate for an academic page, but they may help pay the costs of maintaining the website.