From: Arne Vajhøj on 21 Jan 2010 22:08

On 21-01-2010 19:15, Martin Gregorie wrote:
> On Thu, 21 Jan 2010 22:23:22 +0000, Tom Anderson wrote:
>> On Mon, 18 Jan 2010, Arne Vajhøj wrote:
>>> When I wrote data I actually meant data.
>>
>> Doh! Sorry, Arne, i completely failed to understand there. You're quite
>> right, of course. And i would imagine that in most applications, reads
>> of data far outweigh reads of code (once you account for the caches). I
>> would be very interested to see numbers for that across different kinds
>> of program, though.
>>
> It depends what you mean by 'read'.
> If you look at the instruction flow into the CPU, i.e. out of any caches
> and into the CPU proper, the instruction flow is considerably larger than
> the data flow in almost any architecture.

Yes.

But the flow from main memory and L3, which are the really slow ones,
should have a good chance of reading more data than code.

Arne
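[Editor's note: Arne's point — that the traffic out of L3 and main memory is mostly data rather than code — can be illustrated with a small sketch. The class name and sizes below are arbitrary choices for illustration: a summing loop whose JIT-compiled body is at most a few dozen bytes, resident in the instruction cache, streams a 64 MB array across the memory bus, so the off-chip traffic is almost entirely data.]

```java
// Sketch: a tiny hot loop moves far more data than code.
// The loop compiles to a handful of instructions that stay resident
// in the instruction cache, while the array (much larger than any
// L3 cache) has to come across the memory bus in full.
public class DataVsCode {
    public static void main(String[] args) {
        int n = 16 * 1024 * 1024;            // 16 Mi ints = 64 MB of data
        int[] data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i & 0xFF;

        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += data[i];  // code: ~tens of bytes; data: 64 MB
        long t1 = System.nanoTime();

        // sum = 2139095040; the time is dominated by data traffic
        System.out.println("sum = " + sum + " in " + (t1 - t0) / 1_000_000 + " ms");
    }
}
```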
From: Martin Gregorie on 22 Jan 2010 10:01

On Thu, 21 Jan 2010 22:08:18 -0500, Arne Vajhøj wrote:
> On 21-01-2010 19:15, Martin Gregorie wrote:
>> On Thu, 21 Jan 2010 22:23:22 +0000, Tom Anderson wrote:
>>> On Mon, 18 Jan 2010, Arne Vajhøj wrote:
>>>> When I wrote data I actually meant data.
>>>
>>> Doh! Sorry, Arne, i completely failed to understand there. You're
>>> quite right, of course. And i would imagine that in most applications,
>>> reads of data far outweigh reads of code (once you account for the
>>> caches). I would be very interested to see numbers for that across
>>> different kinds of program, though.
>>>
>> It depends what you mean by 'read'.
>> If you look at the instruction flow into the CPU, i.e. out of any
>> caches and into the CPU proper, the instruction flow is considerably
>> larger than the data flow in almost any architecture.
>
> Yes.
>
> But the flow from main memory and L3, which are the really slow ones,
> should have a good chance of reading more data than code.
>
Agreed, but optimising that is a property of the cache rather than
anything else. The use of Intel-style multi-level caching doesn't affect
the argument.

In a multi-CPU system cache management can become a nightmare: consider
the situation where CPU A gets a cache miss that triggers a cache read
from main memory for a piece of data that has been read and modified by
CPU B, but whose cache has not yet been flushed. This is a common
situation with copy-back caching. In general copy-back caching is faster
than write-through, though at the cost of added complexity: all caches
must sniff the address bus and be capable of grabbing it for an immediate
write-back if another CPU is asking for modified data which it holds but
hasn't yet written.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
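[Editor's note: the snooping write-back scenario Martin describes can be sketched as a toy simulation. The classes below are hypothetical, not any real cache API: CPU B dirties a line in its private write-back cache, and when CPU A later misses on that address, the bus snoop forces B to flush the modified line before A's read is satisfied.]

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of two CPUs with private write-back (copy-back) caches
// snooping a shared bus. Addresses stand in for whole cache lines.
public class SnoopDemo {
    static class Memory {
        final Map<Integer, Integer> cells = new HashMap<>();
        int read(int addr)           { return cells.getOrDefault(addr, 0); }
        void write(int addr, int v)  { cells.put(addr, v); }
    }

    static class Cpu {
        final Memory mem;
        final Map<Integer, Integer> cache = new HashMap<>();
        final Map<Integer, Boolean> dirty = new HashMap<>();
        Cpu other;                   // the peer whose bus traffic we snoop

        Cpu(Memory mem) { this.mem = mem; }

        // Write-back: modify only the cache and mark the line dirty;
        // main memory is left stale until a flush is forced.
        void store(int addr, int v) { cache.put(addr, v); dirty.put(addr, true); }

        int load(int addr) {
            if (cache.containsKey(addr)) return cache.get(addr);  // hit
            other.snoop(addr);       // miss: peer must flush if it holds the line dirty
            int v = mem.read(addr);  // main memory is now current
            cache.put(addr, v);
            dirty.put(addr, false);
            return v;
        }

        // Bus snoop: if we hold the requested line modified,
        // write it back immediately so the reader sees fresh data.
        void snoop(int addr) {
            if (Boolean.TRUE.equals(dirty.get(addr))) {
                mem.write(addr, cache.get(addr));
                dirty.put(addr, false);
            }
        }
    }

    public static void main(String[] args) {
        Memory mem = new Memory();
        Cpu a = new Cpu(mem), b = new Cpu(mem);
        a.other = b; b.other = a;

        b.store(0x40, 99);           // B modifies the line; memory still stale
        System.out.println("memory before A's miss: " + mem.read(0x40)); // 0
        int seen = a.load(0x40);     // A misses; snoop forces B's write-back
        System.out.println("A reads: " + seen);                          // 99
    }
}
```

With write-through caching the stale-memory window never opens, because every store goes to main memory immediately — which is exactly the performance cost Martin contrasts against.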
From: Patricia Shanahan on 22 Jan 2010 10:49
Martin Gregorie wrote:
> On Thu, 21 Jan 2010 22:08:18 -0500, Arne Vajhøj wrote:
>
>> On 21-01-2010 19:15, Martin Gregorie wrote:
>>> On Thu, 21 Jan 2010 22:23:22 +0000, Tom Anderson wrote:
>>>> On Mon, 18 Jan 2010, Arne Vajhøj wrote:
>>>>> When I wrote data I actually meant data.
>>>> Doh! Sorry, Arne, i completely failed to understand there. You're
>>>> quite right, of course. And i would imagine that in most applications,
>>>> reads of data far outweigh reads of code (once you account for the
>>>> caches). I would be very interested to see numbers for that across
>>>> different kinds of program, though.
>>>>
>>> It depends what you mean by 'read'.
>>> If you look at the instruction flow into the CPU, i.e. out of any
>>> caches and into the CPU proper, the instruction flow is considerably
>>> larger than the data flow in almost any architecture.
>> Yes.
>>
>> But the flow from main memory and L3, which are the really slow ones,
>> should have a good chance of reading more data than code.
>>
> Agreed, but optimising that is a property of the cache rather than
> anything else. The use of Intel-style multi-level caching doesn't affect
> the argument.
>
> In a multi-CPU system cache management can become a nightmare: consider
> the situation where CPU A gets a cache miss that triggers a cache read
> from main memory for a piece of data that has been read and modified by
> CPU B, but whose cache has not yet been flushed. This is a common
> situation with copy-back caching. In general copy-back caching is faster
> than write-through, though at the cost of added complexity: all caches
> must sniff the address bus and be capable of grabbing it for an immediate
> write-back if another CPU is asking for modified data which it holds but
> hasn't yet written.

Bus snooping is an excellent strategy if you have few enough processors,
close enough together, to put them all on one bus. For larger systems,
life gets much more complicated than that.
Generally, it is a good idea to avoid situations in which any processor
is writing to a cache line at the same time as other processors are
accessing it.

I don't know how cache-line aware the JVM implementations are.

Patricia
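[Editor's note: the hazard Patricia points at is usually called false sharing. A minimal Java sketch, assuming 64-byte cache lines (a common but not universal size): two threads increment independent counters that happen to sit on the same line, then repeat the work with padding between the counters. Field layout is ultimately up to the JVM, so the manual padding here is only a best-effort illustration; the counts are asserted for correctness, while the timings are machine-dependent.]

```java
// False-sharing sketch: each thread writes only its own counter, but
// when both counters share one cache line, the line ping-pongs between
// cores on every write. Padding usually makes the same work faster.
public class FalseSharingDemo {
    static final int ITERS = 20_000_000;

    // volatile forces each increment to be made visible, so the
    // cache-coherence traffic shows up in the timing.
    static class Shared { volatile long a, b; }   // a and b likely on one line
    static class Padded {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7;          // 56 bytes of padding
        volatile long b;
    }

    static long run(Runnable r1, Runnable r2) throws InterruptedException {
        Thread t1 = new Thread(r1), t2 = new Thread(r2);
        long t0 = System.nanoTime();
        t1.start(); t2.start(); t1.join(); t2.join();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        Shared s = new Shared();
        long shared = run(() -> { for (int i = 0; i < ITERS; i++) s.a++; },
                          () -> { for (int i = 0; i < ITERS; i++) s.b++; });

        Padded p = new Padded();
        long padded = run(() -> { for (int i = 0; i < ITERS; i++) p.a++; },
                          () -> { for (int i = 0; i < ITERS; i++) p.b++; });

        System.out.println("shared line: " + shared + " ms, padded: " + padded + " ms");
        System.out.println("counts: " + s.a + " " + s.b + " " + p.a + " " + p.b);
    }
}
```

On Patricia's closing question: later JVMs did grow cache-line awareness — JDK 8 added an @Contended annotation that asks the runtime to pad a field onto its own line, which is the supported alternative to manual padding like the above.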