From: Patricia Shanahan on 18 Jan 2010 21:44

Roedy Green wrote:
...
> This suggests that interpretive code with a tight core might run
> faster than "highly optimised" machine code since you could arrange
> that the core of it was entirely in cache.
...

How would you implement an interpreter to avoid executing a totally
unpredictable branch for each instruction?

Patricia
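To make the question concrete: a conventional switch-dispatch interpreter
funnels every virtual instruction through a single dispatch branch whose
target depends on the opcode stream, and that is the hard-to-predict branch
Patricia is asking about. Here is a minimal sketch; TinyVM and its toy
opcodes are invented for illustration, not taken from the thread.

// A deliberately tiny stack-machine interpreter. The switch below
// compiles down to one indirect jump whose target changes with every
// opcode, so the hardware branch predictor faces one nearly
// unpredictable branch per virtual instruction.
class TinyVM {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    static int run(int[] code) {
        int[] stack = new int[64];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {   // the dispatch branch in question
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // computes 2 + 3
        System.out.println(run(new int[] { PUSH, 2, PUSH, 3, ADD, HALT }));
    }
}

One classic mitigation is threaded dispatch: each opcode handler ends with
its own copy of the dispatch jump, so the predictor can learn statistics
per preceding opcode. Plain Java cannot express that (no computed goto),
which is part of why compiling hot paths to native code, as Hotspot does,
tends to win.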
From: Lew on 18 Jan 2010 21:45

Lew wrote:
>> Hotspot runs bytecode altogether, at first (JNI excluded from
>> consideration here). Based on actual runtime heuristics, it might
>> convert some parts to native code and run the compiled version. As
>> execution progresses, Hotspot may revert compiled parts back to
>> interpreted bytecode, depending on runtime situations.

Arne Vajhøj wrote:
> Nothing in any spec prevents it from doing so, but I am skeptical
> about whether any implementations would do so.

Well, either Sun is a bunch of big, fat liars, or you can set your
skepticism aside:
<http://java.sun.com/products/hotspot/whitepaper.html#dynamic>
"Both the Java HotSpot Client and Server compilers fully support
dynamic deoptimization."

> If it actually has spent time JIT compiling why should it go
> back to interpreting?

Some of the reasoning is explained in
<http://java.sun.com/products/hotspot/whitepaper.html#3>

There's more detail in
<http://java.sun.com/products/hotspot/docs/general/hs2.html>
"The Java HotSpot Server VM can revert to using the interpreter
whenever compiler deoptimizations are called for because of dynamic
class loading. When a class is loaded dynamically, HotSpot checks to
ensure that the inter-class dependecies [sic] of inlined methods have
not been altered. If any dependencies are affected by dynamically
loaded class [sic], HotSpot can back out affected inlined code, revert
to interpreting for a while, and re-optimize later based on the new
class dependencies."

One of my favorite experts, Brian Goetz, wrote about this back in 2004:
<http://www.ibm.com/developerworks/library/j-jtp12214/>
"[T]he JVM continues profiling, and may recompile the code again later
with a higher level of optimization if it decides the code path is
particularly hot or future profiling data suggests opportunities for
additional optimization. The JVM may recompile the same bytecodes many
times in a single application execution."

and later, discussing inlining,
"... the JVM can figure this out, and will invalidate the generated
code that is based on the now-invalid assumption and revert to
interpretation (or recompile the invalidated code path)."

Despite your skepticism, not only has one (in fact, the) implementation
done dynamic reversion to interpreted bytecode, but it's been doing so
for quite some years.

--
Lew
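For anyone who would rather watch the deoptimization happen than take the
whitepaper's word for it, here is a hypothetical sketch; the class and
method names are mine, and the behavior described in the comments is the
typical one, not a guarantee. Run it with the standard diagnostic flag
-XX:+PrintCompilation and look for "made not entrant" lines around the
"loading subclass" message.

// While Base is the only loaded implementor, Hotspot can treat
// Base.value() as effectively final (class hierarchy analysis) and
// inline it into hot(). Loading Derived invalidates that assumption,
// so the compiled code is thrown away and the method is reinterpreted
// and later recompiled with a real virtual dispatch.
class Base {
    int value() { return 1; }
}

public class DeoptDemo {
    static int hot(Base b) {
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += b.value();          // monomorphic and inlinable, so far
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        Base b = new Base();
        for (int i = 0; i < 50; i++) {
            hot(b);                    // warm up; hot() gets JIT-compiled
        }

        System.out.println("loading subclass...");
        Base d = (Base) Class.forName("Derived")  // lazy load; invalidates
                .getDeclaredConstructor()         // compiled code that
                .newInstance();                   // inlined Base.value()
        System.out.println(hot(d));
    }
}

class Derived extends Base {
    @Override int value() { return 2; }
}

The exact log format varies across JVM versions, and a particular build may
decide not to inline in the first place, so treat this as a probe rather
than a demonstration that always succeeds.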
From: Arne Vajhøj on 18 Jan 2010 21:55

On 18-01-2010 21:45, Lew wrote:
> Well, either Sun is a bunch of big, fat liars, or you can set your
> skepticism aside:
> <http://java.sun.com/products/hotspot/whitepaper.html#dynamic>
> "Both the Java HotSpot Client and Server compilers fully support
> dynamic deoptimization."
...
> Despite your skepticism, not only has one (in fact, the)
> implementation done dynamic reversion to interpreted bytecode, but
> it's been doing so for quite some years.

Then I learned something today. Which is not a bad thing.

Ensuring correct behavior is of course a very good reason to fall back
to interpretation.

Arne
From: Lew on 18 Jan 2010 22:05

Patricia Shanahan wrote:
> Roedy Green wrote:
> ...
>> This suggests that interpretive code with a tight core might run
>> faster than "highly optimised" machine code since you could arrange
>> that the core of it was entirely in cache.
> ...
>
> How would you implement an interpreter to avoid executing a totally
> unpredictable branch for each instruction?

This apparently rhetorical question leads to some interesting
possibilities, e.g., the exploitation of latency. There is likely a
tension between these possibilities and cache locality; however, since
cache is a hack, we can expect its limits to become less restrictive
over time. Latency, OTOH, is likely to become a greater and greater
issue. Hyperthreading is one technique that exploits latency.

An answer to the question is to load all possible branches into the
pipeline during the latencies involved in evaluating the "if" or other
actions. (There is no such thing as a "totally unpredictable branch",
as all branches can be predicted.) If the conclusion of the branch
evaluation finds all, or at least the most likely, options already
loaded up, the system can simply discard the unused branches. This
technique goes by various names; I believe one is "speculative
execution".

The avoidance itself is subject to definition. Do we avoid any
possibility whatsoever of an unpredicted branch? Or do we do what CPUs
already do, and reduce the likelihood of such a branch? Either one
could be called "avoidance".

I think Hotspot itself embodies various answers to the question. It
inlines and compiles to native code based on run-time profiles. It
undoes those optimizations if the assumptions behind them later fail.
It optimizes the more likely branches.

I don't think it's possible to keep all branches of all code, tight
code or not, always in a limited RAM space, such as the 32KB Level 1
cache mentioned upthread, or even an 8MB cache of the not-distant
future. We can continue the existing trend of keeping most of what we
mostly need mostly in the cache most of the time, moving "most"
asymptotically toward unity.

--
Lew
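Lew's parenthetical, that there is no "totally unpredictable branch", is
easy to probe empirically. The sketch below is mine, not Lew's: it times
the same branch over the same values, first in an order the predictor
cannot learn and then sorted so that it can. The usual result is a visible
gap, though JIT warmup and branchless code generation can shrink or erase
it, so the numbers are only indicative.

import java.util.Arrays;
import java.util.Random;

public class BranchDemo {
    static volatile long sink;               // defeats dead-code elimination

    static long sumOverThreshold(int[] data) {
        long sum = 0;
        for (int v : data) {
            if (v >= 128) {                  // the branch under test
                sum += v;
            }
        }
        return sum;
    }

    static long time(int[] data) {
        long best = Long.MAX_VALUE;
        for (int rep = 0; rep < 20; rep++) { // repeat, keep the best run
            long t0 = System.nanoTime();
            sink += sumOverThreshold(data);
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] shuffled = new Random(42).ints(10_000_000, 0, 256).toArray();
        int[] sorted = shuffled.clone();
        Arrays.sort(sorted);

        System.out.println("unpredictable order: " + time(shuffled) + " ns");
        System.out.println("predictable order:   " + time(sorted) + " ns");
    }
}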
From: Roedy Green on 18 Jan 2010 23:36
On Mon, 18 Jan 2010 21:05:24 -0500, Arne Vajhøj <arne(a)vajhoej.dk>
wrote, quoted or indirectly quoted someone who said:

> If it actually has spent time JIT compiling why should it go
> back to interpreting?

Let us say you dynamically load a class that overrides methods that the
JIT had provisionally treated as final and had inlined. It has to do
some pretty fancy footwork: it has to un-inline all that code, turn it
back into byte code, then re-JIT it.

The problem has been solved, but it looks intractable to me. There is
no simple correspondence between machine code and byte code. Data could
be cached in registers. I am blown away that it works at all, much less
that it works reliably.

You'd think the one saving grace is that the points where you have to
re-JIT always occur at a call boundary. But there is no such guarantee
for the other threads.

I'd love to see a webinar on how they pulled this off. Perhaps the JIT
machine code is quite constrained to make this possible.

--
Roedy Green
Canadian Mind Products
http://mindprod.com

I decry the current tendency to seek patents on algorithms. There are
better ways to earn a living than to prevent other people from making
use of one's contributions to computer science.
~ Donald Ervin Knuth (born: 1938-01-10, age: 72)
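For what it's worth, the "fancy footwork" hinges on safepoints: Hotspot
only deoptimizes at points (call sites and loop back-edges) where it has
already recorded enough metadata to reconstruct the interpreter's state,
registers included, and the same machinery runs in the other direction as
on-stack replacement. The sketch below is mine; the '%' marker for OSR
compilations in -XX:+PrintCompilation output is standard, though formats
differ between JVM versions.

// main() is invoked exactly once, so compiling it on method entry would
// never pay off. Instead Hotspot compiles the loop while it is running
// and switches to the compiled code at a loop back-edge safepoint:
// on-stack replacement, the mirror image of the deoptimization
// transition Roedy describes.
// Try: java -XX:+PrintCompilation OsrDemo
// and look for a compilation of OsrDemo::main flagged with '%'.
public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        for (long i = 0; i < 2_000_000_000L; i++) {
            sum += i;                 // hot back-edge triggers OSR
        }
        System.out.println(sum);      // use the result; keep the loop alive
    }
}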