From: petek1976 on 10 Jun 2010 00:00 I have been having some trouble with the performance of Java. In my code I have narrowed it down to the sort: For instance: import java.util.*; public class SortTest { public static void main(String args[]) { final int vecSize = 1000000; ArrayList<Double> vec = new ArrayList<Double>(vecSize); final int numItr = 10; ArrayList<Double> times = new ArrayList<Double>(numItr); for (int i=0; i<numItr; ++i) { for (int k=0; k<vecSize; ++k) { vec.add(k,Math.random()); } long startTime = System.nanoTime(); java.util.Collections.sort(vec); long endTime = System.nanoTime(); times.add(i,(endTime-startTime)*1.e-9); vec = new ArrayList<Double>(vecSize); } double avg=0.0; for (Double val:times) { avg += val; } avg /= times.size(); System.out.println("To sort " + vec.size() + " elements " + numItr + " times took " + avg + " seconds on average"); } } This prints about 0.58 seconds on average. How can I optimize this reasonably? Note I am only timing the sort function. Using an array instead of ArrayList is not an option. I tried Vector but it didn't help. The equivalent C++ code (using vector) runs in about 0.37 seconds when built with optimization (g++ version 4.2.1 using -03 optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7 with 4 GB of ram. I also tried this example on Linux and saw no difference. What is causing over 40% difference in speed? I must be doing something wrong in my java code. #include <iostream> #include <vector> #include <cstdlib> #include <algorithm> #include <sys/time.h> #include <numeric> using namespace std; template <class T> void FillRand(T start, T end) { while (start != end) { *start++ = double(rand())/RAND_MAX; } } class TimeStamp { public: void start() { gettimeofday(&st_val,0); } void stop() { gettimeofday(&end_val,0); } double elapsedSec() const { return end_val.tv_sec-st_val.tv_sec + (end_val.tv_usec-st_val.tv_usec)*1.e-6; } private: timeval st_val; timeval end_val; }; int main() { vector<double> vec(1000000); const long num_itr = 10; vector<double> time_stamps(num_itr); TimeStamp ts; for (long i=0; i<num_itr; ++i) { FillRand(vec.begin(),vec.end()); ts.start(); std::sort(vec.begin(),vec.end()); ts.stop(); time_stamps[i] = ts.elapsedSec(); } double avg=std::accumulate(time_stamps.begin(),time_stamps.end(), 0.0)/time_stamps.size(); cout << "To sort " << vec.size() << " elements took " << avg << " seconds on average\n"; }
From: markspace on 10 Jun 2010 00:53 petek1976 wrote: > This prints about 0.58 seconds on average. How can I optimize this > reasonably? Note I am only timing the sort function. Using an array > instead of ArrayList is not an option. Unfortunately I think using an array of double is the only option. I didn't see anything obviously wrong with your code. I got about a 450% speed-up by switching to double[] instead of ArrayList<Double>. Why is using double[] proscribed? If it's the need for a List, you can roll your own.
From: Tom Anderson on 10 Jun 2010 07:57 On Wed, 9 Jun 2010, petek1976 wrote: > This prints about 0.58 seconds on average. How can I optimize this > reasonably?Note I am only timing the sort function. Using an array > instead of ArrayList is not an option. I tried Vector but it didn't > help. The equivalent C++ code (using vector) runs in about 0.37 > seconds In terms of the java vs C++ comparison: 1. Java's standard sort has to be stable, which means in practice it's a mergesort, whereas STL's doesn't, so it can be a quicksort. Quicksort will typically be faster than mergesort for random data. 2. Java handles the doubles as objects on the heap here, whereas C++, i believe, will handle them as primitives (because it resolves templates at compile time, essentially, whereas java just erases types and uses the same code at runtime for all element types). Which means: (a) Java's data takes up more memory, and so requires more cache and memory bandwidth to work with (you have a million objects, so you're looking at 8 MB just for the raw numbers; the object overhead probably about quadruples that). (b) Java has to do comparisons by making virtual (worse - interface!) method calls, whereas C++ can just use a single machine instruction. Java's runtime compiler stands a good chance of optimising that overhead away; i have no idea if it will catch none, some, most, or all of it. You could try dumping the compiled code to see what HotSpot is up to: http://wikis.sun.com/display/HotSpotInternals/PrintAssembly If performance is vital but you want a List interface, than as Mark suggested, write a List<Double> implementation that wraps a double[]. You can then use Arrays.sort to sort the content - for doubles, this is a quicksort and uses direct comparisons, so it should be fast. AbstractList makes this rather easy to implement. You could also try this guy's DoubleArrayList: http://pcj.sourceforge.net/ to which you could easily add internal sorting. It would be nice if there was such a class in the Apache or Google collections libraries, but there isn't. tom -- I'm angry, but not Milk and Cheese angry. -- Mike Froggatt
From: Jim Janney on 10 Jun 2010 12:09 petek1976 <pete.karousos(a)earthlink.net> writes: > I have been having some trouble with the performance of Java. In my > code I have narrowed it down to the sort: > > For instance: > > > import java.util.*; > public class SortTest > { > public static void main(String args[]) > { > final int vecSize = 1000000; > ArrayList<Double> vec = new ArrayList<Double>(vecSize); > final int numItr = 10; > ArrayList<Double> times = new ArrayList<Double>(numItr); > for (int i=0; i<numItr; ++i) > { > for (int k=0; k<vecSize; ++k) > { > vec.add(k,Math.random()); > } > long startTime = System.nanoTime(); > java.util.Collections.sort(vec); > long endTime = System.nanoTime(); > times.add(i,(endTime-startTime)*1.e-9); > vec = new ArrayList<Double>(vecSize); > } > double avg=0.0; > for (Double val:times) > { > avg += val; > } > avg /= times.size(); > System.out.println("To sort " + vec.size() + " elements " + > numItr + " times took " + avg + " seconds on average"); > > } > } > > > > This prints about 0.58 seconds on average. How can I optimize this > reasonably? Note I am only timing the sort function. Using an array > instead of ArrayList is not an option. I tried Vector but it didn't > help. The equivalent C++ code (using vector) runs in about 0.37 > seconds when built with optimization (g++ version 4.2.1 using -03 > optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7 > with 4 GB of ram. I also tried this example on Linux and saw no > difference. What is causing over 40% difference in speed? I must be > doing something wrong in my java code. > > > #include <iostream> > #include <vector> > #include <cstdlib> > #include <algorithm> > #include <sys/time.h> > #include <numeric> > using namespace std; > > template <class T> > void FillRand(T start, T end) > { > while (start != end) > { > *start++ = double(rand())/RAND_MAX; > } > } > > class TimeStamp > { > public: > void start() > { > gettimeofday(&st_val,0); > } > void stop() > { > gettimeofday(&end_val,0); > } > double elapsedSec() const > { > return end_val.tv_sec-st_val.tv_sec + > (end_val.tv_usec-st_val.tv_usec)*1.e-6; > } > private: > timeval st_val; > timeval end_val; > }; > > int main() > { > vector<double> vec(1000000); > const long num_itr = 10; > vector<double> time_stamps(num_itr); > TimeStamp ts; > for (long i=0; i<num_itr; ++i) > { > FillRand(vec.begin(),vec.end()); > ts.start(); > std::sort(vec.begin(),vec.end()); > ts.stop(); > time_stamps[i] = ts.elapsedSec(); > } > double avg=std::accumulate(time_stamps.begin(),time_stamps.end(), > 0.0)/time_stamps.size(); > cout << "To sort " << vec.size() << " elements took " << avg << " > seconds on average\n"; > } Take a look at the source for Collections.sort: public static <T extends Comparable<? super T>> void sort(List<T> list) { Object[] a = list.toArray(); Arrays.sort(a); ListIterator<T> i = list.listIterator(); for (int j=0; j<a.length; j++) { i.next(); i.set((T)a[j]); } } You're copying the list to an array, sorting the array, and then copying the array back to the list. You might try coding up a simple quicksort that operates directly on the ArrayList to see what kind of times that gives you. -- Jim Janney
From: Jim Janney on 10 Jun 2010 12:14 Jim Janney <jjanney(a)shell.xmission.com> writes: > You're copying the list to an array, sorting the array, and then > copying the array back to the list. You might try coding up a simple > quicksort that operates directly on the ArrayList to see what kind of > times that gives you. The evil approach would be to use reflection to get the elementData field in ArrayList, and call Arrays.sort directly on that. -- Jim Janney
|
Next
|
Last
Pages: 1 2 3 4 5 6 Prev: NetBeans/StrutsTestCase - strange error message Next: encrypted javamail MimeMultipart |