From: Lew on 19 Jan 2010 09:15

According to Lew:
>> or even the 8MB Level 1 cache of the not-distant future

Thomas Pornin wrote:
> This is a bold prediction.

It's not a prediction, it's satire.

--
Lew
From: Lew on 19 Jan 2010 09:17

Thomas Pornin wrote:
> My current PC, bought in January 2009, has 32 KB of "fast RAM"

per core, right? I'm seeing four-core CPUs with 128 KB of L1 cache all over the place.

--
Lew
From: Lew on 19 Jan 2010 09:31

Peter Duniho wrote:
> And that's only theoretically possible. I've never heard any
> suggestions that Java actually does include architecture-specific
> optimizations, either in the JVM itself, or as part of the optimizer in
> the JIT [HotSpot?] compiler.

HotSpot most definitely does do architecture-specific optimizations.

<http://java.sun.com/products/hotspot/whitepaper.html#optimizations>
"System-specific runtime routines generated at VM startup time"

<http://java.sun.com/products/hotspot/docs/whitepaper/Java_Hotspot_v1.4.1/Java_HSpot_WP_v1.4.1_1002_4.html>
"The [Java HotSpot Server] compiler is highly portable, relying on a machine description file to describe all aspects of the target hardware."

Things that differ between architectures include register allocation.

--
Lew
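[For anyone who wants to see this first-hand, here is a small, purely illustrative sketch that is not part of the original thread. Assuming a HotSpot JVM and, for the PrintAssembly option, the hsdis disassembler plugin, it lets you dump the machine code HotSpot actually emits for a hot method on your own CPU. The class name SumDemo and the loop are invented for the example.

    // Illustrative sketch only: a hot loop whose JIT-compiled form differs
    // between architectures (register allocation, unrolling, SIMD use).
    public class SumDemo {
        static int sum(int[] a) {
            int s = 0;
            for (int v : a) {
                s += v;               // candidate for architecture-specific codegen
            }
            return s;
        }

        public static void main(String[] args) {
            int[] data = new int[1000000];
            java.util.Arrays.fill(data, 1);
            long total = 0;
            for (int i = 0; i < 1000; i++) {   // warm up so HotSpot compiles sum()
                total += sum(data);
            }
            System.out.println(total);
        }
    }

Running it with

    java -XX:+PrintCompilation SumDemo

shows which methods the JIT compiles; running it with

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly SumDemo

(hsdis installed) dumps the generated machine code, so you can compare the output on two different architectures and watch details such as register allocation differ, as Lew notes above.]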
From: Patricia Shanahan on 19 Jan 2010 09:57

Roedy Green wrote:
> On Mon, 18 Jan 2010 18:44:45 -0800, Patricia Shanahan <pats(a)acm.org>
> wrote, quoted or indirectly quoted someone who said :
>
>> How would you implement an interpreter to avoid executing a totally
>> unpredictable branch for each instruction?
>
> In my Forth interpreter, I arranged things so that branches fell
> through on the usual case.

That is a very good strategy from the point of view of instruction cache locality. I don't see how it helps at all with stalls due to branch mis-prediction, unless the processor is following a very basic strategy, such as predicting all forward branches as not taken.

> Think in terms of FORTH chips, that have the interpreter in hardware.
> They can do things like maintain branch history, and overlap RET on
> any instruction.

I'm assuming full state-of-the-art branch prediction. At least state-of-the-art as of a few years ago, the last time I took a processor architecture course.

> A Java Byte code machine with most of the interpreter in hardware
> might be a better architecture since the code is so much more compact.

On the other hand, the code has few opportunities for intra-thread parallelism, because just about every instruction is dependent on the stack and modifies the stack. A register-based machine with many registers may be able to get more done while it is waiting for a slow load.

Patricia
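[To make the branch under discussion concrete, here is a minimal sketch of a switch-dispatched bytecode interpreter; it is my own illustration with made-up opcodes, not code from this thread. The switch at the top of the loop compiles to a single indirect branch whose target depends on whatever opcode happens to come next, which is what makes it so hard to predict, and every handler reads and writes the operand stack, which is the serial dependence Patricia describes.

    // Illustrative sketch: a tiny switch-dispatched bytecode interpreter.
    public class TinyVM {
        static final int PUSH = 0, ADD = 1, HALT = 2;

        static int run(int[] code) {
            int[] stack = new int[64];
            int sp = 0;                        // stack pointer: next free slot
            int pc = 0;                        // program counter into code[]
            while (true) {
                switch (code[pc++]) {          // one indirect branch per executed instruction
                    case PUSH: stack[sp++] = code[pc++]; break;              // push literal operand
                    case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;  // every handler touches the stack
                    case HALT: return stack[--sp];
                    default:   throw new IllegalStateException("bad opcode");
                }
            }
        }

        public static void main(String[] args) {
            // Computes 2 + 3 and prints 5.
            System.out.println(run(new int[] { PUSH, 2, PUSH, 3, ADD, HALT }));
        }
    }

The usual software mitigation is "threaded" dispatch, where each handler ends with its own copy of the dispatch jump (for example via computed goto in C, or in hand-written assembly), so the predictor can keep a separate history per opcode; plain Java cannot express that, which is part of why doing the dispatch in hardware comes up in this thread.]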
From: Thomas Pornin on 19 Jan 2010 10:30
According to Lew <noone(a)lewscanon.com>:
> per core, right?

Yes, per core. I consider my four-core CPU to actually consist of four one-core CPUs which happen to be connected by a very fast network.

--Thomas Pornin