From: Patricia Shanahan on 18 Jan 2010 21:44

Roedy Green wrote:
...
> This suggests that interpretive code with a tight core might run
> faster than "highly optimised" machine code since you could arrange
> that the core of it was entirely in cache.
...

How would you implement an interpreter to avoid executing a totally
unpredictable branch for each instruction?

Patricia
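To make the question concrete: a conventional switch-dispatch interpreter
funnels every virtual instruction through a single dispatch branch whose
target depends on the opcode stream, and that is the hard-to-predict branch
Patricia is asking about. Here is a minimal sketch; TinyVM and its toy
opcodes are invented for illustration, not taken from the thread.

// A deliberately tiny stack-machine interpreter. The switch below
// compiles down to one indirect jump whose target changes with every
// opcode, so the hardware branch predictor faces one nearly
// unpredictable branch per virtual instruction.
class TinyVM {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    static int run(int[] code) {
        int[] stack = new int[64];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {   // the dispatch branch in question
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // computes 2 + 3
        System.out.println(run(new int[] { PUSH, 2, PUSH, 3, ADD, HALT }));
    }
}

One classic mitigation is threaded dispatch: each opcode handler ends with
its own copy of the dispatch jump, so the predictor can learn statistics
per preceding opcode. Plain Java cannot express that (no computed goto),
which is part of why compiling hot paths to native code, as Hotspot does,
tends to win.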
From: Lew on 18 Jan 2010 21:45

Lew wrote:
>> Hotspot runs bytecode altogether, at first (JNI excluded from
>> consideration here). Based on actual runtime heuristics, it might
>> convert some parts to native code and run the compiled version. As
>> execution progresses, Hotspot may revert compiled parts back to
>> interpreted bytecode, depending on runtime situations.

Arne Vajhøj wrote:
> Nothing in any spec prevents it from doing so, but I am skeptical
> about whether any implementations would do so.

Well, either Sun is a bunch of big, fat liars, or you can set your
skepticism aside:
<http://java.sun.com/products/hotspot/whitepaper.html#dynamic>
"Both the Java HotSpot Client and Server compilers fully support
dynamic deoptimization."

> If it actually has spent time JIT compiling why should it go
> back to interpreting?

Some of the reasoning is explained in
<http://java.sun.com/products/hotspot/whitepaper.html#3>

There's more detail in
<http://java.sun.com/products/hotspot/docs/general/hs2.html>
"The Java HotSpot Server VM can revert to using the interpreter
whenever compiler deoptimizations are called for because of dynamic
class loading. When a class is loaded dynamically, HotSpot checks to
ensure that the inter-class dependecies [sic] of inlined methods have
not been altered. If any dependencies are affected by dynamically
loaded class [sic], HotSpot can back out affected inlined code, revert
to interpreting for a while, and re-optimize later based on the new
class dependencies."

One of my favorite experts, Brian Goetz, wrote about this back in 2004:
<http://www.ibm.com/developerworks/library/j-jtp12214/>
"[T]he JVM continues profiling, and may recompile the code again later
with a higher level of optimization if it decides the code path is
particularly hot or future profiling data suggests opportunities for
additional optimization. The JVM may recompile the same bytecodes many
times in a single application execution."

and later, discussing inlining,
"... the JVM can figure this out, and will invalidate the generated
code that is based on the now-invalid assumption and revert to
interpretation (or recompile the invalidated code path)."

Despite your skepticism, not only has one (in fact, the) implementation
done dynamic reversion to interpreted bytecode, but it's been doing so
for quite some years.

--
Lew
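For anyone who would rather watch the deoptimization happen than take the
whitepaper's word for it, here is a hypothetical sketch; the class and
method names are mine, and the behavior described in the comments is the
typical one, not a guarantee. Run it with the standard diagnostic flag
-XX:+PrintCompilation and look for "made not entrant" lines around the
"loading subclass" message.

// While Base is the only loaded implementor, Hotspot can treat
// Base.value() as effectively final (class hierarchy analysis) and
// inline it into hot(). Loading Derived invalidates that assumption,
// so the compiled code is thrown away and the method is reinterpreted
// and later recompiled with a real virtual dispatch.
class Base {
    int value() { return 1; }
}

public class DeoptDemo {
    static int hot(Base b) {
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += b.value();          // monomorphic and inlinable, so far
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        Base b = new Base();
        for (int i = 0; i < 50; i++) {
            hot(b);                    // warm up; hot() gets JIT-compiled
        }

        System.out.println("loading subclass...");
        Base d = (Base) Class.forName("Derived")  // lazy load; invalidates
                .getDeclaredConstructor()         // compiled code that
                .newInstance();                   // inlined Base.value()
        System.out.println(hot(d));
    }
}

class Derived extends Base {
    @Override int value() { return 2; }
}

The exact log format varies across JVM versions, and a particular build may
decide not to inline in the first place, so treat this as a probe rather
than a demonstration that always succeeds.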
From: Arne Vajhøj on 18 Jan 2010 21:55

On 18-01-2010 21:45, Lew wrote:
> Well, either Sun is a bunch of big, fat liars, or you can set your
> skepticism aside:
> <http://java.sun.com/products/hotspot/whitepaper.html#dynamic>
> "Both the Java HotSpot Client and Server compilers fully support
> dynamic deoptimization."
...
> Despite your skepticism, not only has one (in fact, the)
> implementation done dynamic reversion to interpreted bytecode, but
> it's been doing so for quite some years.

Then I learned something today. Which is not a bad thing.

Ensuring correct behavior is of course a very good reason to fall back
to interpretation.

Arne
From: Lew on 18 Jan 2010 22:05

Patricia Shanahan wrote:
> Roedy Green wrote:
> ...
>> This suggests that interpretive code with a tight core might run
>> faster than "highly optimised" machine code since you could arrange
>> that the core of it was entirely in cache.
> ...
>
> How would you implement an interpreter to avoid executing a totally
> unpredictable branch for each instruction?

This apparently rhetorical question leads to some interesting
possibilities, e.g., the exploitation of latency. There is likely a
tension between these possibilities and cache locality; however, since
cache is a hack, we can expect its limits to become less restrictive
over time. Latency, OTOH, is likely to become a greater and greater
issue. Hyperthreading is one technique that exploits latency.

An answer to the question is to load all possible branches into the
pipeline during the latencies involved in evaluating the "if" or other
actions. (There is no such thing as a "totally unpredictable branch",
as all branches can be predicted.) If the conclusion of the branch
evaluation finds all, or at least the most likely, options already
loaded up, the system can simply discard the unused branches. This
technique goes by various names; I believe one is "speculative
execution".

The avoidance itself is subject to definition. Do we avoid any
possibility whatsoever of an unpredicted branch? Or do we do what CPUs
already do, and reduce the likelihood of such a branch? Either one
could be called "avoidance".

I think Hotspot itself embodies various answers to the question. It
inlines and compiles to native code based on run-time profiles. It
undoes those optimizations if the assumptions behind them later fail.
It optimizes the more likely branches.

I don't think it's possible to keep all branches of all code, tight
code or not, always in a limited RAM space, such as the 32KB Level 1
cache mentioned upthread, or even an 8MB cache of the not-distant
future. We can continue the existing trend of keeping most of what we
mostly need mostly in the cache most of the time, moving "most"
asymptotically toward unity.

--
Lew
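Lew's parenthetical, that there is no "totally unpredictable branch", is
easy to probe empirically. The sketch below is mine, not Lew's: it times
the same branch over the same values, first in an order the predictor
cannot learn and then sorted so that it can. The usual result is a visible
gap, though JIT warmup and branchless code generation can shrink or erase
it, so the numbers are only indicative.

import java.util.Arrays;
import java.util.Random;

public class BranchDemo {
    static volatile long sink;               // defeats dead-code elimination

    static long sumOverThreshold(int[] data) {
        long sum = 0;
        for (int v : data) {
            if (v >= 128) {                  // the branch under test
                sum += v;
            }
        }
        return sum;
    }

    static long time(int[] data) {
        long best = Long.MAX_VALUE;
        for (int rep = 0; rep < 20; rep++) { // repeat, keep the best run
            long t0 = System.nanoTime();
            sink += sumOverThreshold(data);
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] shuffled = new Random(42).ints(10_000_000, 0, 256).toArray();
        int[] sorted = shuffled.clone();
        Arrays.sort(sorted);

        System.out.println("unpredictable order: " + time(shuffled) + " ns");
        System.out.println("predictable order:   " + time(sorted) + " ns");
    }
}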
From: Roedy Green on 18 Jan 2010 23:36
On Mon, 18 Jan 2010 21:05:24 -0500, Arne Vajhøj <arne(a)vajhoej.dk>
wrote, quoted or indirectly quoted someone who said:

> If it actually has spent time JIT compiling why should it go
> back to interpreting?

Let us say you dynamically load a class that overrides methods that the
JIT had provisionally treated as final and had inlined. It has to do
some pretty fancy footwork: it has to un-inline all that code, turn it
back into byte code, then re-JIT it.

The problem has been solved, but it looks intractable to me. There is
no simple correspondence between machine code and byte code. Data could
be cached in registers. I am blown away that it works at all, much less
that it works reliably.

You'd think the one saving grace is that the points where you have to
re-JIT always occur at a call boundary. But there is no such guarantee
for the other threads.

I'd love to see a webinar on how they pulled this off. Perhaps the JIT
machine code is quite constrained to make this possible.

--
Roedy Green
Canadian Mind Products
http://mindprod.com

I decry the current tendency to seek patents on algorithms. There are
better ways to earn a living than to prevent other people from making
use of one's contributions to computer science.
~ Donald Ervin Knuth (born: 1938-01-10, age: 72)
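For what it's worth, the "fancy footwork" hinges on safepoints: Hotspot
only deoptimizes at points (call sites and loop back-edges) where it has
already recorded enough metadata to reconstruct the interpreter's state,
registers included, and the same machinery runs in the other direction as
on-stack replacement. The sketch below is mine; the '%' marker for OSR
compilations in -XX:+PrintCompilation output is standard, though formats
differ between JVM versions.

// main() is invoked exactly once, so compiling it on method entry would
// never pay off. Instead Hotspot compiles the loop while it is running
// and switches to the compiled code at a loop back-edge safepoint:
// on-stack replacement, the mirror image of the deoptimization
// transition Roedy describes.
// Try: java -XX:+PrintCompilation OsrDemo
// and look for a compilation of OsrDemo::main flagged with '%'.
public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        for (long i = 0; i < 2_000_000_000L; i++) {
            sum += i;                 // hot back-edge triggers OSR
        }
        System.out.println(sum);      // use the result; keep the loop alive
    }
}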