From: Arne Vajhøj on
On 10-06-2010 17:13, Roedy Green wrote:
> On Wed, 9 Jun 2010 21:00:08 -0700 (PDT), petek1976
> <pete.karousos(a)earthlink.net> wrote, quoted or indirectly quoted
> someone who said :
>> I have been having some trouble with the performance of Java. In my
>> code I have narrowed it down to the sort:
>
> another optimisation is to use a native compiler. See

Questionable.

Today conversion to native is done to make decompilation
harder not to improve performance.

Arne

From: Arne Vajhøj on
On 10-06-2010 16:51, Roedy Green wrote:
> On Wed, 9 Jun 2010 21:00:08 -0700 (PDT), petek1976
> <pete.karousos(a)earthlink.net> wrote, quoted or indirectly quoted
> someone who said :
>
>> final int vecSize = 1000000;
>> ArrayList<Double> vec = new ArrayList<Double>(vecSize);
>
> This is a fair hunk of RAM to carve out of the heap.

Not really.

4 MB / 8 MB is nothing today.

Arne
From: Arne Vajhøj on
On 10-06-2010 00:00, petek1976 wrote:
> I have been having some trouble with the performance of Java. In my
> code I have narrowed it down to the sort:
>
> For instance:
>
>
> import java.util.*;
> public class SortTest
> {
> public static void main(String args[])
> {
> final int vecSize = 1000000;
> ArrayList<Double> vec = new ArrayList<Double>(vecSize);
> final int numItr = 10;
> ArrayList<Double> times = new ArrayList<Double>(numItr);
> for (int i=0; i<numItr; ++i)
> {
> for (int k=0; k<vecSize; ++k)
> {
> vec.add(k,Math.random());
> }
> long startTime = System.nanoTime();
> java.util.Collections.sort(vec);
> long endTime = System.nanoTime();
> times.add(i,(endTime-startTime)*1.e-9);
> vec = new ArrayList<Double>(vecSize);
> }
> double avg=0.0;
> for (Double val:times)
> {
> avg += val;
> }
> avg /= times.size();
> System.out.println("To sort " + vec.size() + " elements " +
> numItr + " times took " + avg + " seconds on average");
>
> }
> }
>
>
>
> This prints about 0.58 seconds on average. How can I optimize this
> reasonably? Note I am only timing the sort function. Using an array
> instead of ArrayList is not an option. I tried Vector but it didn't
> help. The equivalent C++ code (using vector) runs in about 0.37
> seconds when built with optimization (g++ version 4.2.1 using -03
> optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
> with 4 GB of ram. I also tried this example on Linux and saw no
> difference. What is causing over 40% difference in speed? I must be
> doing something wrong in my java code.
>
>
> #include<iostream>
> #include<vector>
> #include<cstdlib>
> #include<algorithm>
> #include<sys/time.h>
> #include<numeric>
> using namespace std;
>
> template<class T>
> void FillRand(T start, T end)
> {
> while (start != end)
> {
> *start++ = double(rand())/RAND_MAX;
> }
> }
>
> class TimeStamp
> {
> public:
> void start()
> {
> gettimeofday(&st_val,0);
> }
> void stop()
> {
> gettimeofday(&end_val,0);
> }
> double elapsedSec() const
> {
> return end_val.tv_sec-st_val.tv_sec +
> (end_val.tv_usec-st_val.tv_usec)*1.e-6;
> }
> private:
> timeval st_val;
> timeval end_val;
> };
>
> int main()
> {
> vector<double> vec(1000000);
> const long num_itr = 10;
> vector<double> time_stamps(num_itr);
> TimeStamp ts;
> for (long i=0; i<num_itr; ++i)
> {
> FillRand(vec.begin(),vec.end());
> ts.start();
> std::sort(vec.begin(),vec.end());
> ts.stop();
> time_stamps[i] = ts.elapsedSec();
> }
> double avg=std::accumulate(time_stamps.begin(),time_stamps.end(),
> 0.0)/time_stamps.size();
> cout<< "To sort "<< vec.size()<< " elements took "<< avg<< "
> seconds on average\n";
> }

It does not sound surprising to me.

If we are using C terminology, then:
- in Java you have an array of pointers to double
- in C++ you have an array of double

It sounds very plausible that this will cause a performance
difference.

Use double[] in Java.

Or live with the performance.

Arne

From: Thomas Pornin on
According to Tom Anderson <twic(a)urchin.earth.li>:
> In terms of the java vs C++ comparison:
>
> 1. Java's standard sort has to be stable, which means in practice it's a
> mergesort, whereas STL's doesn't, so it can be a quicksort. Quicksort will
> typically be faster than mergesort for random data.
>
> 2. Java handles the doubles as objects on the heap here, whereas C++, i
> believe, will handle them as primitives (because it resolves templates at
> compile time, essentially, whereas java just erases types and uses the
> same code at runtime for all element types). Which means:
>
> (a) Java's data takes up more memory, and so requires more cache and
> memory bandwidth to work with (you have a million objects, so you're
> looking at 8 MB just for the raw numbers; the object overhead probably
> about quadruples that).
>
> (b) Java has to do comparisons by making virtual (worse - interface!)
> method calls, whereas C++ can just use a single machine instruction.

Actually, for all those reasons plus a few others (Java array accesses
are checked for boundaries; the JVM cannot optimize code as aggressively
because it must work fast), really optimized C++ code ought to be widely
faster than Java code. For that kind of work (no I/O, trivial memory
allocation) a factor of 2 to 4 is typical. That Java achieves 60% of the
C++ speed on this is impressive (it can happen, I have seen it, but not
on code dominated by array accesses), or, more accurately, suspicious.

I made a few measures on the machine I am currently using: an Intel
Core2 Q8200, clocked at 2.33 GHz. GCC-4.3.2, JVM is Sun's 1.6.0_01 (an
update is long past due, but that's not the point here). The JVM is
launched with "-server". The CPU runs in 32-bit mode.

A simplistic C code using qsort() sorts 1000000 random doubles in 0.37s.
Java code which uses an ArrayList<Double> and Collections.sort() does
the same job (after warm-up) in 0.83s. This looks more like what I would
expect (a factor of 2.2). Please note that qsort() uses a function
pointer for the comparison, and that function is not inlined in any way.

If the OP has Java code which works in 0.58s but the C++ code needs
0.37s on the same machine, then either the C++ code, the C++ STL
implementation, or the C++ compiler, is pathetic. The whole idea of
templates with inlining is so that the C++ version is twice _faster_
than the old qsort() function, not twice _slower_.


--Thomas Pornin