Thomas W. Christopher | Consulting in High Performance Computing. Training in Concurrency, Networking, and Parallelism, especially in Python and Java. |
Abstract
Continuing a tutorial on sorting algorithms, this page animates Shell sorts.
Author
Thomas W. Christopher
Shell Sort is basically a trick to make Insertion Sort run faster.
Since Insertion Sort removes one inversion per exchange, it cannot run faster than the number of inversions in the data, which in worst case is O(N2). Of course, it can't run faster than N, either, because it must look at each element, whether or not the element is out of position. We can't do any thing about the lower bound O(N), but we can do something about the number of inversions. If we can get the array in close to sorted order, then Insertion Sort won't have many inversions to remove.
The trick in Shell Sort is to sort the subsequences of elements spaced k elements apart. There are k such sequences starting at positions 0 through k-1 in the array. In these sorts, elements k positions apart are exchanged, removing between 1 and 2(k-1)+1 inversions.
Indeed, not satisfied with just one prepass, a Shell Sort may do several passes with decreasing values of k. The following examples experiment with different series of values of k.
Shell Sort. In this first example, we sort all subsequences of elements 8
apart, then 4, 2, and 1. As we sort them, we color the elements based on what subsequences
they are in. It becomes clear that this version of Shell Sort becomes a kind of merge sort
where pairs of subsequences are merged together. Comparing the performance of this
version to the two remaining, it also appears that this merging is not a great idea. The
subsequences can be quite a distance apart.
Here is the code:
for (k=8;k>0;k/=2) {
for(m=0;m<k;m++) {
setColor(m,series[m]);
for (j=m+k;j<N;j+=k) {
setColor(j,series[m]);
for (i=j;i>=k && get(i)<get(i-k);i-=k) {
exchange(i,i-k);
}
}
}
}
The colors are assigned from the array:
Color series[]={Color.blue,Color.red,
Color.green, Color.yellow,
Color.black,Color.white,
Color.cyan, Color.magenta};
Shell Sort 2. For our second attempt, we sort all subsequences of elements 7
apart, then 5, 3, and 1. For each new step size, the elements of the previous subsequences
are mixed. Notice that the subsequences end up quite close together by the last passes.
Shell Sort 3. The first two attempts used a fixed series of values of k. The initial pass with the largest value of k itself costs O(N2). It may not help much as the array size grows larger. What we want is an adaptive series.
For this third attempt, we first sort the array with k = N/5 and N/7. Then we sort all subsequences of elements 7 apart, then 5, 3, and 1 as in Shell Sort 2. It runs noticably faster than the other two implementations.
The in-line code is:
k=N/5;
for(m=0;m<k;m++) {
isort(m,k);
}
k=N/7;
for(m=0;m<k;m++) {
isort(m,k);
}
for (k=7;k>0;k-=2) {
for(m=0;m<k;m++) {
isort(m,k);
}
}
and the subroutine isort is
void isort(int m,int k) {
int i,j;
setColor(m,series[m%series.length]);
for (j=m+k;j<N;j+=k) {
setColor(j,series[m%series.length]);
for (i=j;i>=k && get(i)<get(i-k);i-=k) {
exchange(i,i-k);
}
}
}
In this example, the colors have to be assigned modulo the number of colors. In the initial passes the same color is assigned to more than one series.
The "divide and conquer" technique for parallel program design works nicely with Shell sort. Each phase of Shell sort divides the array into several non-overlapping subarrays that can be sorted independently. It is only between the phases that all threads must synchronize before they can continue.
Shell Sort using barriers. This version of Shell sort uses a fixed number of threads, numThreads, (here four) that synchronize at barriers between the phases. Each thread has a unique number between 0 and numThreads-1. Sup[pose there are four threads and seven subarrays, starting at elements 0 through 6. Thread 0 will sort the subarrays beginning at positions 0 and 4; thread 1, at positions 1 and 5; thread 2, at positions 2 and 6; and thread 3 the subarray beginning at position 3. Generally, thread j handles all subarrays beginning at positions j, j+numThreads, j+2*numThreads, .... Unfortuantely, the last phase sorts the entire array, and is executed by a single thread.
Shell Sort using RunQueue and Accumulator. This version of Shell sort uses Runnables to perform parts of the sort. There are two kinds of Runnables:
The initial subsorts are all done with SortPass runnables.
The later sorts are divided into parts. Each part is sorted by a SortPass task. Then pairs of the parts are merged with Imerge, and pairs of those are merged, and so on. The accumulators are used instead of TerminationGroups, although they would work as well.
These AdSense ads may be a little inappropriate for an academic page, but they may help pay the costs of maintaining the website.