Thomas W. Christopher Consulting in High Performance Computing.
Training in Concurrency, Networking, and Parallelism, especially in Python and Java.

Shell Sorts

Abstract

Continuing a tutorial on sorting algorithms, this page animates Shell sorts.

Author
Thomas W. Christopher

Shell Sort is basically a trick to make Insertion Sort run faster.

Since Insertion Sort removes one inversion per exchange, it cannot run faster than the number of inversions in the data, which in worst case is O(N2). Of course, it can't run faster than N, either, because it must look at each element, whether or not the element is out of position. We can't do any thing about the lower bound O(N), but we can do something about the number of inversions. If we can get the array in close to sorted order, then Insertion Sort won't have many inversions to remove.

The trick in Shell Sort is to sort the subsequences of elements spaced k elements apart. There are k such sequences starting at positions 0 through k-1 in the array. In these sorts, elements k positions apart are exchanged, removing between 1 and 2(k-1)+1 inversions.

Indeed, not satisfied with just one prepass, a Shell Sort may do several passes with decreasing values of k. The following examples experiment with different series of values of k.

Shell Sort Shell Sort. In this first example, we sort all subsequences of  elements 8 apart, then 4, 2, and 1. As we sort them, we color the elements based on what subsequences they are in. It becomes clear that this version of Shell Sort becomes a kind of merge sort where pairs of  subsequences are merged together. Comparing the performance of this version to the two remaining, it also appears that this merging is not a great idea. The subsequences can be quite a distance apart.
Here is the code:

for (k=8;k>0;k/=2) {
for(m=0;m<k;m++) {
setColor(m,series[m]);
for (j=m+k;j<N;j+=k) {
setColor(j,series[m]);
for (i=j;i>=k && get(i)<get(i-k);i-=k) {
exchange(i,i-k);
}
}
}
}

The colors are assigned from the array:

Color series[]={Color.blue,Color.red,
Color.green, Color.yellow,
Color.black,Color.white,
Color.cyan, Color.magenta};

 

Shell Sort 2 Shell Sort 2. For our second attempt, we sort all subsequences of  elements 7 apart, then 5, 3, and 1. For each new step size, the elements of the previous subsequences are mixed. Notice that the subsequences end up quite close together by the last passes.

Shell Sort 3 Shell Sort 3. The first two attempts used a fixed series of values of k. The initial pass with the largest value of k itself costs O(N2). It may not help much as the array size grows larger. What we want is an adaptive series.

For this third attempt, we first sort the array with k = N/5 and N/7. Then we sort all subsequences of  elements 7 apart, then 5, 3, and 1 as in Shell Sort 2. It runs noticably faster than the other two implementations.

The in-line code is:

    k=N/5;
for(m=0;m<k;m++) {
isort(m,k);
}
k=N/7;
for(m=0;m<k;m++) {
isort(m,k);
}
for (k=7;k>0;k-=2) {
for(m=0;m<k;m++) {
isort(m,k);
}
}

and the subroutine isort is

void isort(int m,int k) {
int i,j;
setColor(m,series[m%series.length]);
for (j=m+k;j<N;j+=k) {
setColor(j,series[m%series.length]);
for (i=j;i>=k && get(i)<get(i-k);i-=k) {
exchange(i,i-k);
}
}
}

In this example, the colors have to be assigned modulo the number of colors. In the initial passes the same color is assigned to more than one series.

Parallel Shell Sorts

The "divide and conquer" technique for parallel program design works nicely with Shell sort. Each phase of Shell sort divides the array into several non-overlapping subarrays that can be sorted independently. It is only between the phases that all threads must synchronize before they can continue.

Shell Sort using barriers. This version of Shell sort uses a fixed number of threads, numThreads, (here four) that synchronize at barriers between the phases. Each thread has a unique number between 0 and numThreads-1. Sup[pose there are four threads and seven subarrays, starting at elements 0 through 6. Thread 0 will sort the subarrays beginning at positions 0 and 4; thread 1, at positions 1 and 5; thread 2, at positions 2 and 6; and thread 3 the subarray beginning at position 3. Generally, thread j handles all subarrays beginning at positions j, j+numThreads, j+2*numThreads, .... Unfortuantely, the last phase sorts the entire array, and is executed by a single thread.

  ShellSort4

Shell Sort using RunQueue and Accumulator. This version of Shell sort uses Runnables to perform parts of the sort. There are two kinds of Runnables:

The initial subsorts are all done with SortPass runnables.

The later sorts are divided into parts. Each part is sorted by a SortPass task. Then pairs of the parts are merged with Imerge, and pairs of those are merged, and so on. The accumulators are used instead of TerminationGroups, although they would work as well.

On to Quick Sort


These AdSense ads may be a little inappropriate for an academic page, but they may help pay the costs of maintaining the website.