Generically sorts arbitrarily shaped data (for example multiple arrays, 1-, 2-
or 3-d matrices, and so on) using a quicksort or mergesort. This class
addresses two problems, namely
- Sorting multiple arrays in sync
- Sorting by multiple sorting criteria (primary, secondary,
tertiary, ...)
Sorting multiple arrays in sync
Assume we have three arrays X, Y and Z. We want to sort all three arrays by X
(or some arbitrary comparison function). For example, we have
X=[3, 2, 1], Y=[3.0, 2.0, 1.0], Z=[6.0, 7.0, 8.0]. The output should
be
X=[1, 2, 3], Y=[1.0, 2.0, 3.0], Z=[8.0, 7.0, 6.0].
How can we achieve this? Here are several alternatives. We could ...
- make a list of Point3D objects, sort the list as desired using a
comparison function, then copy the results back into X, Y and Z. The classic
object-oriented way.
- make an index list [0,1,2,...,N-1], sort the index list using a
comparison function, then reorder the elements of X,Y,Z as defined by the
index list. Reordering cannot be done in-place, so we need to copy X to some
temporary array, then copy in the right order back from the temporary into X.
Same for Y and Z.
- use a generic quicksort or mergesort which, whenever two elements in X
are swapped, also swaps the corresponding elements in Y and Z.
Alternatives 1 and 2 involve quite a lot of copying and allocate significant
amounts of temporary memory. Alternative 3 involves more swapping, more
polymorphic message dispatches, no copying and does not need any temporary
memory.
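For contrast, alternative 2 can be sketched as follows. This is a minimal illustration and not part of this class; the class name IndexSortSketch and the method sortByIndexList are hypothetical. It makes the cost visible: one boxed index array plus one temporary copy per data array.

```java
import java.util.Arrays;

class IndexSortSketch {
    // Hypothetical helper (not a Colt API): sorts x, y and z in sync by x
    // using an index list, as described in alternative 2.
    static void sortByIndexList(int[] x, double[] y, double[] z) {
        int n = x.length;

        // Index list [0, 1, ..., n-1]; boxed so that a comparator can be used.
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Integer.compare(x[a], x[b]));

        // Reordering is not done in-place: each array needs a temporary copy.
        int[] tx = x.clone();
        double[] ty = y.clone();
        double[] tz = z.clone();
        for (int i = 0; i < n; i++) {
            x[i] = tx[idx[i]];
            y[i] = ty[idx[i]];
            z[i] = tz[idx[i]];
        }
    }

    public static void main(String[] args) {
        int[] x = { 3, 2, 1 };
        double[] y = { 3.0, 2.0, 1.0 };
        double[] z = { 6.0, 7.0, 8.0 };
        sortByIndexList(x, y, z);
        System.out.println("X=" + Arrays.toString(x));
        System.out.println("Y=" + Arrays.toString(y));
        System.out.println("Z=" + Arrays.toString(z));
    }
}
```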
This class implements alternative 3. It operates on arbitrarily shaped data.
In fact, it has no idea what kind of data it is sorting. Comparisons and
swapping are delegated to user-provided objects which know their data and can
do the job.
Let's call the generic data g (it may be one array, three linked lists or
whatever). This class takes a user comparison function operating on two
indexes (a,b), namely an IntComparator. The comparison function determines
whether g[a] is equal to, less than or greater than g[b]. The sort, depending
on its implementation, can decide to swap the data at index a with the data
at index b. It calls a user-provided cern.colt.Swapper object that knows how
to swap the data at these indexes.
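To illustrate the delegation idea in isolation, here is a self-contained sketch of a sort that only ever sees indexes. It is not the quicksort or mergesort of this class: it uses a plain insertion sort for brevity, and the interfaces IndexComparator and IndexSwapper are hypothetical stand-ins for IntComparator and Swapper so that the example runs without Colt.

```java
import java.util.Arrays;

class CompareSwapSketch {
    // Hypothetical stand-ins for IntComparator and cern.colt.Swapper.
    interface IndexComparator { int compare(int a, int b); }
    interface IndexSwapper { void swap(int a, int b); }

    // Sorts the index range [from, to). The sort never touches the data;
    // it only asks "how do a and b compare?" and says "swap a and b".
    static void insertionSort(int from, int to, IndexComparator comp, IndexSwapper swapper) {
        for (int i = from + 1; i < to; i++) {
            for (int j = i; j > from && comp.compare(j - 1, j) > 0; j--) {
                swapper.swap(j - 1, j);
            }
        }
    }

    // Sorts three arrays in sync, comparing by x only.
    static void sortInSync(int[] x, double[] y, double[] z) {
        IndexSwapper swapper = (a, b) -> {
            int t1 = x[a]; x[a] = x[b]; x[b] = t1;
            double t2 = y[a]; y[a] = y[b]; y[b] = t2;
            double t3 = z[a]; z[a] = z[b]; z[b] = t3;
        };
        IndexComparator comp = (a, b) -> Integer.compare(x[a], x[b]);
        insertionSort(0, x.length, comp, swapper);
    }

    public static void main(String[] args) {
        int[] x = { 3, 2, 1 };
        double[] y = { 3.0, 2.0, 1.0 };
        double[] z = { 6.0, 7.0, 8.0 };
        sortInSync(x, y, z);
        System.out.println("X=" + Arrays.toString(x));
        System.out.println("Y=" + Arrays.toString(y));
        System.out.println("Z=" + Arrays.toString(z));
    }
}
```

Because the sort depends only on the two callbacks, the same routine would work unchanged for linked lists, matrix rows or any other layout for which the caller can compare and swap by index.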
The following snippet shows how to solve the problem.
final int[] x;
final double[] y;
final double[] z;
x = new int[] { 3, 2, 1 };
y = new double[] { 3.0, 2.0, 1.0 };
z = new double[] { 6.0, 7.0, 8.0 };
// this one knows how to swap two indexes (a,b)
Swapper swapper = new Swapper() {
public void swap(int a, int b) {
int t1;
double t2, t3;
t1 = x[a];
x[a] = x[b];
x[b] = t1;
t2 = y[a];
y[a] = y[b];
y[b] = t2;
t3 = z[a];
z[a] = z[b];
z[b] = t3;
}
};
// simple comparison: compare by X and ignore Y,Z
IntComparator comp = new IntComparator() {
public int compare(int a, int b) {
return x[a] == x[b] ? 0 : (x[a] < x[b] ? -1 : 1);
}
};
System.out.println("before:");
System.out.println("X=" + Arrays.toString(x));
System.out.println("Y=" + Arrays.toString(y));
System.out.println("Z=" + Arrays.toString(z));
GenericSorting.quickSort(0, x.length, comp, swapper);
// GenericSorting.mergeSort(0, x.length, comp, swapper);
System.out.println("after:");
System.out.println("X=" + Arrays.toString(x));
System.out.println("Y=" + Arrays.toString(y));
System.out.println("Z=" + Arrays.toString(z));
Sorting by multiple sorting criteria (primary, secondary, tertiary, ...)
Assume again we have three arrays X, Y and Z. Now we want to sort all three
arrays, primarily by Y, secondarily by Z (if Y elements are equal). For
example, we have
X=[6, 7, 8, 9], Y=[3.0, 2.0, 1.0, 3.0], Z=[5.0, 4.0, 4.0, 1.0]. The
output should be
X=[8, 7, 9, 6], Y=[1.0, 2.0, 3.0, 3.0], Z=[4.0, 4.0, 1.0, 5.0].
Here is how to solve the problem. All code in the above example stays the
same, except that we modify the comparison function as follows
// compare by Y; if that doesn't help, resort to Z
IntComparator comp = new IntComparator() {
public int compare(int a, int b) {
if (y[a] == y[b])
return z[a] == z[b] ? 0 : (z[a] < z[b] ? -1 : 1);
return y[a] < y[b] ? -1 : 1;
}
};
Notes
Sorts that involve floating point data but no comparators, such as those
provided by the JDK's java.util.Arrays and by Colt's cern.colt.Sorting,
handle floating point numbers specially to guarantee that NaNs are moved to
the end and that -0.0 comes before 0.0. Methods that delegate to comparators
cannot do this; they rely on the comparator. Thus, if such boundary cases
matter for the application at hand, comparators must explicitly implement
-0.0 and NaN aware comparisons. Remember: (-0.0 < 0.0) == false,
(-0.0 == 0.0) == true, as well as (5.0 < Double.NaN) == false and
(5.0 > Double.NaN) == false. The same holds for float.
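One way to obtain such a comparison, sketched here as a suggestion rather than as something this class prescribes, is to delegate to the JDK's Double.compare, which orders -0.0 before 0.0 and treats NaN as greater than every other value. The class and method names below are hypothetical.

```java
class NaNAwareCompareSketch {
    // The operator-based pattern: with NaN involved, both orderings of the
    // arguments report "greater", so the comparison is inconsistent.
    static int naiveCompare(double u, double v) {
        return u == v ? 0 : (u < v ? -1 : 1);
    }

    // Body of an index comparator over y that is aware of -0.0 and NaN.
    static int compareAt(double[] y, int a, int b) {
        return Double.compare(y[a], y[b]);
    }

    public static void main(String[] args) {
        double[] y = { 0.0, -0.0, Double.NaN, 5.0 };
        System.out.println(naiveCompare(y[1], y[0]));   // 0: the operators see -0.0 == 0.0
        System.out.println(compareAt(y, 1, 0) < 0);     // true: -0.0 sorts before 0.0
        System.out.println(naiveCompare(y[3], y[2]));   // 1
        System.out.println(naiveCompare(y[2], y[3]));   // 1 as well: inconsistent
        System.out.println(compareAt(y, 3, 2) < 0);     // true: 5.0 sorts before NaN
    }
}
```

For float data, Float.compare plays the same role.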
Implementation
The quicksort is a derivative of the JDK 1.2 V1.26 algorithms (which are, in
turn, based on Bentley's and McIlroy's fine work). The mergesort is a
derivative of the JAL algorithms, with optimisations taken from the JDK
algorithms. Both quicksort and mergesort are "in-place", i.e. do not allocate
temporary memory (helper arrays). Mergesort is stable (by definition),
while quicksort is not. A stable sort preserves the relative position of
equal elements. This is helpful, for example, if matrices are sorted
successively by multiple columns.
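The value of stability for successive sorts can be sketched with the X, Y, Z data from the multiple-criteria example: sorting first by the secondary key Z and then by the primary key Y, each time with a stable sort, gives the same result as one sort with a combined comparator. The sketch below is an assumption-laden illustration: it uses a simple stable insertion sort in place of this class's mergesort, and all names in it are hypothetical.

```java
import java.util.Arrays;

class StableSuccessiveSortSketch {
    interface IndexComparator { int compare(int a, int b); }
    interface IndexSwapper { void swap(int a, int b); }

    // Stable: only adjacent elements that are strictly out of order are
    // swapped, so equal elements never pass each other.
    static void stableSort(int n, IndexComparator comp, IndexSwapper swapper) {
        for (int i = 1; i < n; i++) {
            for (int j = i; j > 0 && comp.compare(j - 1, j) > 0; j--) {
                swapper.swap(j - 1, j);
            }
        }
    }

    // Sorts x, y, z in sync: first by z (secondary), then by y (primary).
    static void sortByYThenZ(int[] x, double[] y, double[] z) {
        IndexSwapper swapper = (a, b) -> {
            int t1 = x[a]; x[a] = x[b]; x[b] = t1;
            double t2 = y[a]; y[a] = y[b]; y[b] = t2;
            double t3 = z[a]; z[a] = z[b]; z[b] = t3;
        };
        stableSort(x.length, (a, b) -> Double.compare(z[a], z[b]), swapper);
        stableSort(x.length, (a, b) -> Double.compare(y[a], y[b]), swapper);
    }

    public static void main(String[] args) {
        int[] x = { 6, 7, 8, 9 };
        double[] y = { 3.0, 2.0, 1.0, 3.0 };
        double[] z = { 5.0, 4.0, 4.0, 1.0 };
        sortByYThenZ(x, y, z);
        System.out.println("X=" + Arrays.toString(x)); // X=[8, 7, 9, 6]
        System.out.println("Y=" + Arrays.toString(y)); // Y=[1.0, 2.0, 3.0, 3.0]
        System.out.println("Z=" + Arrays.toString(z)); // Z=[4.0, 4.0, 1.0, 5.0]
    }
}
```

With an unstable sort, the second pass would be free to reorder the two elements with Y = 3.0, and the ordering by Z established in the first pass could be lost.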