Java VM 1.6.0 Sorting Collections [Java Programming]

Prev: NetBeans/StrutsTestCase - strange error message
Next: encrypted javamail MimeMultipart

From: Tom Anderson on 10 Jun 2010 17:51

On Thu, 10 Jun 2010, Joshua Cranmer wrote:

> On 06/10/2010 07:57 AM, Tom Anderson wrote:
>
>> 1. Java's standard sort has to be stable, which means in practice it's a
>> mergesort, whereas STL's doesn't, so it can be a quicksort. Quicksort
>> will typically be faster than mergesort for random data.
>
> Arrays.sort() will use quicksort for integral types.

For all primitive types, in fact, including doubles. Although i note it
won't sort booleans (for shame!).

I should not have said "Java's standard sort"; i should have said "Java's
standard sort for Lists", or "Collections' standard sort". Since the OP
has a List and doesn't want to use an array, the behaviour of Arrays.sort
is not terribly important - unless the OP wants to get clever and wrap an
array of primitive doubles in a List interface, as has been suggested.

tom

--
One chants out between two worlds Fire - Walk With Me.

From: Patricia Shanahan on 10 Jun 2010 18:13

Jim Janney wrote:
> Jim Janney <jjanney(a)shell.xmission.com> writes:
>
>> You're copying the list to an array, sorting the array, and then
>> copying the array back to the list. You might try coding up a simple
>> quicksort that operates directly on the ArrayList to see what kind of
>> times that gives you.
>
> The evil approach would be to use reflection to get the elementData
> field in ArrayList, and call Arrays.sort directly on that.
>

Before doing that, I suggest testing the performance of sorting an array
of Double references. There are two differences between the List sort
and sorting a double[]:

1. The copying between the List and the array of references that
actually gets sorted.

2. The extra cost of doing a Comparable compare rather than a double
comparison.

The elementData trick would avoid the first of these costs, but not the
second.

Patricia

From: Roedy Green on 10 Jun 2010 18:44

On Thu, 10 Jun 2010 14:51:07 -0700 (PDT), Lew <lew(a)lewscanon.com>
wrote, quoted or indirectly quoted someone who said :

>
>Theoretically HotSpot will be a native compiler for this test, as was
>pointed out to me upthread.

The problem with a HotSpot and a benchmark is:

1. the first part of your benchmark is run unoptimised.

2. part of your benchmark is chewed up optimising the code to native.
With a native compiler, this is all done before your benchmark starts.
HotSpot cannot afford to pore over the fine tuning of the optimisation
because the clock is running.

To mask those effects you need a very long-running benchmarks, or
perhaps running various lengths of benchmark and doing some math to
factor out the effects. You can also run multiple trials and throw
away the early/slow ones.

--
Roedy Green Canadian Mind Products
http://mindprod.com

Have you ever noticed that any computer search in the movies, is always linear, with, for example, candidate fingerprints flashing up on the screen one after another? The public is still under the delusion that electronic files are microscopic filing cabinets made out of tiny wires or magnetic patches inside the computer. Most lay people are surprised that it is easy for a computer to file things simultaneously by a dozen different schemes, and that they can have any report printed in any number of different sorted orders. With physical files, they are limited to one ordering/access.

From: Jim Janney on 10 Jun 2010 18:49

Patricia Shanahan <pats(a)acm.org> writes:

> Jim Janney wrote:
>> Jim Janney <jjanney(a)shell.xmission.com> writes:
>>
>>> You're copying the list to an array, sorting the array, and then
>>> copying the array back to the list. You might try coding up a simple
>>> quicksort that operates directly on the ArrayList to see what kind of
>>> times that gives you.
>>
>> The evil approach would be to use reflection to get the elementData
>> field in ArrayList, and call Arrays.sort directly on that.
>>
>
> Before doing that, I suggest testing the performance of sorting an array
> of Double references. There are two differences between the List sort
> and sorting a double[]:
>
> 1. The copying between the List and the array of references that
> actually gets sorted.
>
> 2. The extra cost of doing a Comparable compare rather than a double
> comparison.
>
> The elementData trick would avoid the first of these costs, but not the
> second.

And the cost of copying is linear, so for large lists the cost of
sorting will still dominate. Log2 of a million is around twenty, so
avoiding the copying may not buy very much.

--
Jim Janney

From: Joshua Cranmer on 10 Jun 2010 19:26

On 06/10/2010 06:49 PM, Jim Janney wrote:
> And the cost of copying is linear, so for large lists the cost of
> sorting will still dominate. Log2 of a million is around twenty, so
> avoiding the copying may not buy very much.

You have to do two copies in List sort, so moving to arrays eliminates 2
out of 22 linear scans, or about 9% of the total sort runtime. Actually,
at the maximum array size (2^31 - 1), lg is 31, so you'd get about 6%
speedup. Perhaps Java should have implemented Collections.sort in such a
manner that sorting something that implements List and RandomAccess is
sorted in-place.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

First | Prev | Next | Last
Pages: 1 2 3 4 5 6
Prev: NetBeans/StrutsTestCase - strange error message
Next: encrypted javamail MimeMultipart