From: Lew on
According to Lew:
>> or even the 8MB Level 1 cache of the not-distant future

Thomas Pornin wrote:
> This is a bold prediction.

It's not a prediction; it's satire.

--
Lew
From: Lew on
Thomas Pornin wrote:
> My current PC, bought in January 2009, has 32 KB of "fast RAM"

per core, right? I'm seeing four-core CPUs with 128KB of L1 cache all over
the place.

--
Lew
From: Lew on
Peter Duniho wrote:
> And that's only theoretically possible. I've never heard any
> suggestions that Java actually does include architecture-specific
> optimizations, either in the JVM itself, or as part of the optimizer in
> the JIT [HotSpot?] compiler.

HotSpot most definitely does do architecture-specific optimizations.
<http://java.sun.com/products/hotspot/whitepaper.html#optimizations>
"System-specific runtime routines generated at VM startup time"

<http://java.sun.com/products/hotspot/docs/whitepaper/Java_Hotspot_v1.4.1/Java_HSpot_WP_v1.4.1_1002_4.html>
"The [Java HotSpot Server] compiler is highly portable, relying on a machine
description file to describe all aspects of the target hardware."

Things that differ between architectures include register allocation.

--
Lew
From: Patricia Shanahan on
Roedy Green wrote:
> On Mon, 18 Jan 2010 18:44:45 -0800, Patricia Shanahan <pats(a)acm.org>
> wrote, quoted or indirectly quoted someone who said :
>
>> How would you implement an interpreter to avoid executing a totally
>> unpredictable branch for each instruction?
>
> In my Forth interpreter, I arranged things so that branches fell
> through on the usual case.

That is a very good strategy from the point of view of instruction cache
locality. I don't see how it helps at all with stalls due to branch
mis-prediction, unless the processor is following a very basic strategy,
such as predicting all forwards branches as not taken.
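
To make the question concrete, here is a minimal sketch of the kind of interpreter loop under discussion. It is a toy, not anyone's actual Forth or JVM interpreter: the class name, opcode names, and encoding are all invented for illustration. The point is only that every interpreted instruction passes through the same dispatch point, whose target depends on the interpreted program rather than on the interpreter's own control flow.

```java
// Hypothetical toy stack VM; opcodes and encoding are assumptions made
// for this sketch, not taken from any real interpreter.
final class TinyStackVm {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // index of the next opcode to execute
        while (true) {
            int op = code[pc++];
            // Single dispatch point: which case runs next is decided by
            // the interpreted program, one opcode at a time.
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++]; // operand follows the opcode
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1]; // result is the top of the stack
                default:
                    throw new IllegalArgumentException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

Arranging each case so that its common path falls through helps within a case, but it does not change the fact that the `switch` itself can go anywhere on each iteration.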

> Think in terms of FORTH chips, that have the interpreter in hardware.
> They can do things like maintain branch history, and overlap RET on
> any instruction.

I'm assuming full state-of-the-art branch prediction, or at least
state-of-the-art as of a few years ago, the last time I took a processor
architecture course.

> A Java Byte code machine with most of the interpreter in hardware
> might be a better architecture since the code is so much more compact.

On the other hand, the code has few opportunities for intra-thread
parallelism, because just about every instruction is dependent on the
stack and modifies the stack. A register-based machine with many
registers may be able to get more done while it is waiting for a slow load.
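
As a rough illustration of the dependency point, the sketch below computes (a + b) * (c + d) two ways. The class and method names are made up for this example. It only models the data dependencies as they appear at the instruction-set level; a JIT compiler will normally map stack slots to registers, so this says nothing about how compiled Java code actually performs.

```java
// Hypothetical illustration of stack-style versus register-style evaluation.
final class StackVsRegisters {
    // Stack style: every step reads and updates sp, so each step
    // depends on the one before it.
    static int stackStyle(int a, int b, int c, int d) {
        int[] s = new int[4];
        int sp = 0;
        s[sp++] = a;                           // push a
        s[sp++] = b;                           // push b
        sp--; s[sp - 1] = s[sp - 1] + s[sp];   // add
        s[sp++] = c;                           // push c
        s[sp++] = d;                           // push d
        sp--; s[sp - 1] = s[sp - 1] + s[sp];   // add
        sp--; s[sp - 1] = s[sp - 1] * s[sp];   // mul
        return s[sp - 1];
    }

    // Register style: r1 and r2 do not depend on each other, so the
    // two additions could in principle proceed in parallel.
    static int registerStyle(int a, int b, int c, int d) {
        int r1 = a + b;
        int r2 = c + d;
        return r1 * r2;
    }

    public static void main(String[] args) {
        System.out.println(stackStyle(1, 2, 3, 4));    // prints 21
        System.out.println(registerStyle(1, 2, 3, 4)); // prints 21
    }
}
```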

Patricia
From: Thomas Pornin on
According to Lew <noone(a)lewscanon.com>:
> per core, right?

Yes, per core. I consider my four-core CPU to actually consist of four
one-core CPUs which happen to be connected by a very fast network.


--Thomas Pornin