Crash Course In Modern Hardware [Java Help]


From: Roedy Green on 18 Jan 2010 20:34

On Sun, 17 Jan 2010 20:56:29 -0800, Peter Duniho
<NpOeStPeAdM(a)NnOwSlPiAnMk.com> wrote, quoted or indirectly quoted
someone who said :

>Profiling is definitely important for performance-critical code. It can
>uncover lots of important architecture-independent problems. But it has
>limited value in generalizing solutions for architecture-specific
>issues. Only if you can restrict your installation to the same hardware
>you used for profiling can you address those kinds of problems.

I would have thought by now distributed code would be optimised at the
customer's machine to suit the specific hardware, not by the
application, but by the OS using code provided by the CPU maker.
Presumably you could afford to spend more time in analysis than you
can on the fly in hardware while the code is running.
--
Roedy Green Canadian Mind Products
http://mindprod.com
I decry the current tendency to seek patents on algorithms. There are better ways to earn a living than to prevent other people from making use of one's contributions to computer science.
~ Donald Ervin Knuth (born: 1938-01-10 age: 72)

From: Arne Vajhøj on 18 Jan 2010 21:03

On 18-01-2010 08:39, Tom Anderson wrote:
> On Sun, 17 Jan 2010, Arne Vajhøj wrote:
>
>> On 17-01-2010 22:10, Roedy Green wrote:
>>> On Sun, 17 Jan 2010 18:20:31 -0500, "John B. Matthews"
>>> <nospam(a)nospam.invalid> wrote, quoted or indirectly quoted someone who
>>> said :
>>>> * We need to update our mental performance models as the hardware
>>>> evolves
>>>
>>> I did not realise how important locality had become. A cache miss
>>> going to RAM costs 200 to 300 clock cycles! This penalty dominates
>>> everything else. This suggests that interpretive code with a tight
>>> core might run faster than "highly optimised" machine code since you
>>> could arrange that the core of it was entirely in cache.
>>
>> Why?
>>
>> The data fetched would still be the same.
>
> Not if the bytecode was more compact than the native code.

When I wrote data I actually meant data.

>> And CPU-intensive code such as inner loops seems more likely to fit
>> into the I-cache than the relevant part of the interpreter.
>
> If you have a single inner loop, then yes, the machine code will fit in
> the cache, and there's no performance advantage to bytecode. But if you
> have a large code footprint - something like an app server, say - then
> it's quite possible that more of the code will fit in the cache with
> bytecode than with native code.

It is possible.

Well - it is almost certain that it will be the case for some apps.

But in most cases I would expect most of the time to be spent
executing relatively small pieces of code: the 80-20 or 90-10 rule.

Arne
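
[Illustration] The locality point in this exchange can be tried out from Java. The following is a minimal sketch, not a rigorous benchmark: it sums the same grid twice, once row by row (neighbouring elements of one row array) and once column by column (jumping to a different row array on every access). The class name LocalityDemo, the helper makeGrid and the grid size are assumptions made for the example. Timings depend heavily on the CPU, the cache sizes and the JIT, so no particular numbers are claimed.

```java
final class LocalityDemo {
    // Hypothetical helper: builds an n x n grid where cell (r, c) holds r + c.
    static int[][] makeGrid(int n) {
        int[][] grid = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                grid[r][c] = r + c;
            }
        }
        return grid;
    }

    // Walks each row array from start to end, so consecutive accesses
    // touch adjacent memory within one row.
    static long sumRowMajor(int[][] grid) {
        long sum = 0;
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    // Visits the same cells column by column, so every access lands in a
    // different row array. Assumes a rectangular grid.
    static long sumColumnMajor(int[][] grid) {
        long sum = 0;
        int cols = grid.length == 0 ? 0 : grid[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < grid.length; r++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] grid = makeGrid(2048);
        // Several passes, so that later ones run after the JIT has warmed up.
        for (int pass = 0; pass < 5; pass++) {
            long t0 = System.nanoTime();
            long a = sumRowMajor(grid);
            long t1 = System.nanoTime();
            long b = sumColumnMajor(grid);
            long t2 = System.nanoTime();
            System.out.printf("pass %d: row-major %d us, column-major %d us (sums %d, %d)%n",
                    pass, (t1 - t0) / 1_000, (t2 - t1) / 1_000, a, b);
        }
    }
}
```

Both methods execute the same number of additions on the same data; only the access order differs, which is what makes any timing gap attributable to memory behaviour rather than to instruction count.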

From: Arne Vajhøj on 18 Jan 2010 21:05

On 18-01-2010 19:54, Lew wrote:
> Donkey Hottie wrote:
>>> I thought the bytecode is nowadays always converted to native code by
>>> the JIT. Am I wrong?
>
> Yes.
>
> John B. Matthews wrote:
>> Some, but not all: "The Java Hotspot[VM] does not include a plug-in
>> JIT compiler but instead compiles and inline[s] methods that appear
>> [to be] the most used in the application."
>>
>> <http://java.sun.com/developer/onlineTraining/Programming/JDCBook/perf2.html>
>>
>
> Hotspot runs bytecode altogether, at first (JNI excluded from
> consideration here). Based on actual runtime heuristics, it might
> convert some parts to native code and run the compiled version. As
> execution progresses, Hotspot may revert compiled parts back to
> interpreted bytecode, depending on runtime situations.

Nothing in any spec prevents it from doing so, but I am skeptical
about whether any implementations would do so.

If it actually has spent time JIT compiling, why should it go
back to interpreting?

Arne
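
[Illustration] One commonly cited reason for a JIT to fall back to interpretation is that compiled code was built on an assumption that later stops holding, for example that a call site only ever sees one receiver type. HotSpot has a deoptimization mechanism for such cases. The sketch below is an assumption-laden way to provoke it, not a guaranteed reproduction: the class DeoptDemo and its nested types are made up for the example, and whether a given JVM version actually deoptimizes here can only be checked by running it with the HotSpot flag -XX:+PrintCompilation and looking for "made not entrant" entries.

```java
final class DeoptDemo {
    interface Shape {
        int area();
    }

    static final class Square implements Shape {
        private final int side;
        Square(int side) { this.side = side; }
        public int area() { return side * side; }
    }

    static final class Rect implements Shape {
        private final int width;
        private final int height;
        Rect(int width, int height) { this.width = width; this.height = height; }
        public int area() { return width * height; }
    }

    // Hot loop with a single virtual call site: s.area().
    static long totalArea(Shape[] shapes, int rounds) {
        long total = 0;
        for (int r = 0; r < rounds; r++) {
            for (Shape s : shapes) {
                total += s.area();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Phase 1: the call site only ever sees Square, so a JIT may
        // compile totalArea under that assumption.
        Shape[] squares = new Shape[1000];
        for (int i = 0; i < squares.length; i++) {
            squares[i] = new Square(i % 10);
        }
        long warm = totalArea(squares, 20_000);

        // Phase 2: a second receiver type shows up at the same call site,
        // which may invalidate the code compiled during phase 1.
        Shape[] mixed = squares.clone();
        for (int i = 0; i < mixed.length; i += 2) {
            mixed[i] = new Rect(2, 3);
        }
        long after = totalArea(mixed, 20_000);

        System.out.println("phase 1 total: " + warm + ", phase 2 total: " + after);
    }
}
```

A possible invocation, assuming the class is saved as DeoptDemo.java, is `java -XX:+PrintCompilation DeoptDemo.java`. The program's results are the same either way; only the compilation log shows whether code was discarded and recompiled.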

From: Arne Vajhøj on 18 Jan 2010 21:11

On 17-01-2010 23:56, Peter Duniho wrote:
> Roedy Green wrote:
>> [...]
>> Hyperthreading is a defence. If you have many hardware threads
>> running in the same CPU, when one thread blocks to fetch from RAM, the
>> other threads can keep going and keep multiple adders, instruction
>> decoders etc chugging.
>
> Actually, hyperthreading and even, in some architectures, multi-core
> CPUs can actually make things worse.
>
> I've read claims that Intel has improved things with the Nehalem
> architecture. But the shared-cache design of early hyperthreaded
> processors could easily cause naïve multi-threading implementations to
> perform _much_ worse than a single-threaded implementation. That's
> because having multiple threads all with the same entry point caused
> those threads to often operate with a stack layout identical to each
> other, which in turn caused aliasing in the cache.
>
> The two threads running simultaneously on the same CPU, sharing a cache,
> would spend most of their time alternately trashing the other thread's
> cached stack data and waiting for their own stack data to be brought
> back in to the cache from system RAM after the other thread trashed it.
>
> Hyperthreading is far from a panacea, and I would not call it even a
> defense. Specifically _because_ of how caching is so critical to
> performance today, hyperthreading can cause huge performance problems on
> certain CPUs, and even when it's used properly doesn't produce nearly as
> big a benefit as actual multiple CPU cores would.

SMT capability is obviously not as fast as full cores.

But given that most of the major server CPUs (Xeon, Power and SPARC)
use the technique, there seems to be agreement that it is a good
thing.

Arne

From: Roedy Green on 18 Jan 2010 21:38

On 18 Jan 2010 22:10:36 GMT, Thomas Pornin <pornin(a)bolet.org> wrote,
quoted or indirectly quoted someone who said :

>These games are expensive, not in clock cycles but in RAM: the JIT
>compiler must use more bytes than what a C compiler would do on the
>equivalent C source code.

An alternative is static compilation using Jet.

See http://mindprod.com/jgloss/jet.html

I notice, though, that it does a lot of loop unrolling and loop
versioning, where several variant loop bodies are created without ifs
in them, with a selection at the top to choose which body to use.

Ironically, all this work may be slowing things down on the latest
CPUs.
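
[Illustration] The transformation described above can be written out by hand to show its shape. This is only a sketch of the general idea, not Jet's actual output: the class LoopVersioningDemo and its methods are invented for the example. The naive version tests a loop-invariant flag on every iteration; the versioned one tests it once at the top and then runs one of two specialised loop bodies. Compiler literature often calls this particular hoisting "loop unswitching".

```java
final class LoopVersioningDemo {
    // One loop body, with the loop-invariant flag tested on every iteration.
    static void scaleNaive(int[] data, int factor, boolean clamp, int max) {
        for (int i = 0; i < data.length; i++) {
            int v = data[i] * factor;
            if (clamp && v > max) {
                v = max;
            }
            data[i] = v;
        }
    }

    // Two loop bodies; the flag is tested once, before either loop starts.
    static void scaleVersioned(int[] data, int factor, boolean clamp, int max) {
        if (clamp) {
            // Variant 1: clamping always applies.
            for (int i = 0; i < data.length; i++) {
                data[i] = Math.min(data[i] * factor, max);
            }
        } else {
            // Variant 2: no clamping, so no comparison in the body at all.
            for (int i = 0; i < data.length; i++) {
                data[i] = data[i] * factor;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = a.clone();
        scaleNaive(a, 3, true, 5);
        scaleVersioned(b, 3, true, 5);
        System.out.println(java.util.Arrays.toString(a) + " " + java.util.Arrays.toString(b));
    }
}
```

The cost is the one raised in this thread: each extra variant is more machine code competing for instruction cache, so the trade-off depends on how large the bodies are and how hot the loop is.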
