From: Regis d'Aubarede on 30 Jun 2010 11:20

Charles Nutter wrote:
>> If with --server on your system JRuby's still slower than IronRuby,...
> Maybe also worth showing an experimental dynopt flag for JRuby that
> seems to improve performance ....

Sorry for my bad English!

My test is meant to verify that a symmetric multi-core (SMP) machine is used well by the VM. In this respect, pure performance is not important; my concern is that the calculation's duration decreases as the number of threads increases. (http://programmingzen.com/2010/06/28/the-great-ruby-shootout-windows-edition/ shows that JRuby is faster than IronRuby...)

To determine whether the issue is on the JRuby side or the JVM side, I ran the same JRuby code, but invoking a pure Java computation:

    (1..nb_threads).map { Thread.new { Calc.calc(p1, n1) } }

with:

    class Calc {
      public static long calc(int a, int b) {
        long res = 0;
        for (int i = 0; i < a; i++)
          for (int j = 0; j < b; j++)
            for (int k = 0; k < 1000; k++)
              res += i + j + k;
        return res;
      }
    }

Results:

    c:\usr\ruby\local>jruby thread_bench2.rb
    1.8.7, java, 2010-05-12
    1000 iterations by 1 threads , Duration = 15404 ms
     500 iterations by 2 threads , Duration =  8147 ms
     333 iterations by 3 threads , Duration =  5812 ms
     250 iterations by 4 threads , Duration =  4690 ms
     200 iterations by 5 threads , Duration =  4648 ms
     166 iterations by 6 threads , Duration =  4749 ms
     142 iterations by 7 threads , Duration =  4371 ms
     125 iterations by 8 threads , Duration =  4222 ms

So the JVM scales right :) And my Intel Core i7 really does have 4 cores...

Attachments: http://www.ruby-forum.com/attachment/4829/thread_bench2.rb
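[For readers without the attachment: the general shape of such an SMP scaling test can be sketched in plain Ruby. This is a hedged reconstruction, not the attached thread_bench2.rb; the `calc` body and constants here are illustrative assumptions. The total work is held fixed and split across 1..N threads, so on a VM with truly parallel threads the duration should fall as the thread count rises.]

```ruby
require 'benchmark'

# A pure-Ruby stand-in for the workload: nested numeric loops.
def calc(a, b)
  res = 0
  a.times { |i| b.times { |j| res += i + j } }
  res
end

TOTAL = 48  # total iterations, kept small for illustration

(1..4).each do |nb_threads|
  per_thread = TOTAL / nb_threads
  ms = Benchmark.realtime do
    # Fan the fixed amount of work out across N threads and wait for all.
    (1..nb_threads).map { Thread.new { calc(per_thread, 100) } }.each(&:join)
  end * 1000
  puts "#{per_thread} iterations by #{nb_threads} threads , Duration = #{ms.round} ms"
end
```

Note that on MRI 1.8/1.9 the GIL prevents the durations from shrinking; the scaling shown above is a JRuby/JVM property.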
From: Charles Oliver Nutter on 30 Jun 2010 13:57
On Wed, Jun 30, 2010 at 10:20 AM, Regis d'Aubarede <regis.aubarede(a)gmail.com> wrote:
> To determine whether the issue is on the JRuby side or the JVM side, I
> ran the same JRuby code, but invoking a pure Java computation:
>     (1..nb_threads).map { Thread.new { Calc.calc(p1, n1) } }
> with:
>
>     class Calc {
>       public static long calc(int a, int b) {
>         long res = 0;
>         for (int i = 0; i < a; i++)
>           for (int j = 0; j < b; j++)
>             for (int k = 0; k < 1000; k++)
>               res += i + j + k;
>         return res;
>       }
>     }

Yes, this result is not surprising to me. In the original case, the benchmark suffers mostly from all the object creation. For example:

* All the numeric loops (in JRuby) create at least one new Fixnum object for every iteration
* All the math operations create Fixnum or Float objects as well

Running an allocation profile of your benchmark (which actually runs pretty slowly, because there's *so much* allocation happening) shows the amount of data being chewed up... it's very likely that the bottleneck for this particular case is allocating all those closures and all those Fixnums:

    ~/projects/jruby > jruby -J-Xrunhprof thread_bench.rb
    1.8.7, java, 2010-06-17
    1000 iterations by 1 threads , Duration = 399267 ms
    ^CDumping Java heap ... allocation sites ... done.
    ~/projects/jruby > egrep "%|objs" java.hprof.txt | head -n 11
    rank  self    accum   bytes     objs    bytes       objs      trace  name
    1     65.18%  65.18%  13545024  423282  1133938432  35435576  302318 org.jruby.RubyFixnum
    2     22.61%  87.79%  4697920   146810  381348672   11917146  302867 org.jruby.RubyFloat
    3      1.32%  89.12%  274992    5350    274992      5350      300000 char[]
    4      0.62%  89.74%  128488    5341    128488      5341      300000 java.lang.String
    5      0.18%  89.92%  38184     1       38184       1         306423 short[]
    6      0.18%  90.10%  38184     1       38184       1         306428 short[]
    7      0.14%  90.24%  28720     718     29400       735       300521 java.util.WeakHashMap$Entry
    8      0.13%  90.37%  27792     70      27792       70        300000 byte[]
    9      0.13%  90.50%  26832     1118    35040       1460      300704 java.util.concurrent.ConcurrentHashMap$HashEntry
    10     0.12%  90.63%  25232     166     25232       166       300557 org.jruby.MetaClass

Note that this is only after the 1000-iteration run; during execution over 1GB of memory was allocated and released, mostly in Fixnum objects, with a smaller amount (380MB+) in Float objects. Running with verbose GC:

    ~/projects/jruby > jruby -J-verbose:gc thread_bench.rb
    1.8.7, java, 2010-06-17
    [GC 13184K->1128K(63936K), 0.0108696 secs]
    [GC 14312K->2124K(63936K), 0.0077762 secs]
    [GC 15308K->1445K(63936K), 0.0010409 secs]
    [GC 14629K->1246K(63936K), 0.0031958 secs]
    ...

Adding up all the size changes (number of GC runs * difference in live object size) produces roughly the same estimate; for the period the 1000-iteration part of the bench runs, it allocates a *lot* of objects.

IronRuby may do better here if it's able to treat Fixnum objects as value types, which the CLR handles more efficiently than the JVM's "every object is on the heap" model. Ultimately this is largely an allocation-rate benchmark, at least on JRuby, since our Fixnum objects are "real" objects (or, to put it in MRI's favor... our Fixnum objects are forced to be "real" objects with heap lifecycles).
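[Editor's aside: the per-iteration allocation Charlie describes can also be observed from Ruby itself, without a Java profiler. This is a minimal sketch, not from the original thread, and it assumes MRI 2.1+ where `GC.stat(:total_allocated_objects)` exists; on JRuby the counters differ. On MRI, machine-word integers are immediates and cost nothing, so the sketch uses an integer beyond the machine-word range, which forces a fresh heap object on every addition, roughly analogous to what every Fixnum operation costs on JRuby.]

```ruby
# Each += on a beyond-word-size Integer produces a new heap object,
# so the loop below allocates at least one object per iteration.
def bignum_loop(n)
  acc = 2**70                 # too big for an immediate: lives on the heap
  n.times { |i| acc += i }    # every addition builds a fresh heap Integer
  acc
end

before = GC.stat(:total_allocated_objects)
bignum_loop(100_000)
after = GC.stat(:total_allocated_objects)
puts "~#{after - before} objects allocated for 100_000 iterations"
```

The delta comes out near (or above) the iteration count, which is the same "allocation-rate benchmark" effect the hprof table shows for RubyFixnum.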
The dynopt work is part of efforts in JRuby to bring math performance closer to Java's, largely by eliminating the excessive object churn and layers of noise around math operations.

- Charlie