From: Arne Vajhøj on
On 21-01-2010 19:15, Martin Gregorie wrote:
> On Thu, 21 Jan 2010 22:23:22 +0000, Tom Anderson wrote:
>> On Mon, 18 Jan 2010, Arne Vajhøj wrote:
>>> When I wrote data I actually meant data.
>>
>> Doh! Sorry, Arne, i completely failed to understand there. You're quite
>> right, of course. And i would imagine that in most applications, reads
>> of data far outweigh reads of code (once you account for the caches). I
>> would be very interested to see numbers for that across different kinds
>> of program, though.
>>
> It depends what you mean by 'read'.
> If you look at the instruction flow into the CPU, i.e. out of any caches
> and into the CPU proper, the instruction flow is considerably larger than
> the data flow in almost any architecture.

Yes.

But the flow from main memory and L3, which are the really slow
ones, should have a good chance of carrying more data than code.

Arne

From: Martin Gregorie on
On Thu, 21 Jan 2010 22:08:18 -0500, Arne Vajhøj wrote:

> On 21-01-2010 19:15, Martin Gregorie wrote:
>> On Thu, 21 Jan 2010 22:23:22 +0000, Tom Anderson wrote:
>>> On Mon, 18 Jan 2010, Arne Vajhøj wrote:
>>>> When I wrote data I actually meant data.
>>>
>>> Doh! Sorry, Arne, i completely failed to understand there. You're
>>> quite right, of course. And i would imagine that in most applications,
>>> reads of data far outweigh reads of code (once you account for the
>>> caches). I would be very interested to see numbers for that across
>>> different kinds of program, though.
>>>
>> It depends what you mean by 'read'.
>> If you look at the instruction flow into the CPU, i.e. out of any
>> caches and into the CPU proper, the instruction flow is considerably
>> larger than the data flow in almost any architecture.
>
> Yes.
>
> But the flow from main memory and L3, which are the really slow ones,
> should have a good chance of carrying more data than code.
>
Agreed, but optimising that is a property of the cache rather than
anything else. The use of Intel-style multi-level caching doesn't affect
the argument.

In a multi-CPU system cache management can become a nightmare: consider
the situation where CPU A gets a cache miss that triggers a cache read
from main memory for a piece of data that has been read and modified by
CPU B, but whose cache has not yet been flushed. This is a common
situation with copy-back caching. In general copy-back caching is faster
than write-through though at the cost of added complexity: all caches
must sniff the address bus and be capable of grabbing it for an immediate
write-back if another CPU is asking for modified data which it holds but
hasn't yet written.
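The write-back scenario above can be sketched in miniature. The class and
field names below are illustrative only, and the model collapses a whole
cache down to one line per CPU, but it shows why a cache holding dirty data
must intercept another CPU's read and write the line back first:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one cache line under copy-back (write-back) caching:
// CPU B modifies the line, but main memory is not updated until another
// cache's read of that address forces a snoop-triggered write-back.
public class SnoopDemo {
    static Map<Integer, Integer> memory = new HashMap<>(); // "main memory"

    // CPU B's cached copy of a single line
    static int bAddr = -1;
    static int bValue;
    static boolean bDirty;

    static void cpuBWrite(int addr, int value) {
        bAddr = addr;
        bValue = value;
        bDirty = true;               // copy-back: memory is now stale
    }

    // Every cache watches addresses on the bus; if it holds the requested
    // line dirty, it must write it back before the other read completes.
    static void snoop(int addr) {
        if (bDirty && bAddr == addr) {
            memory.put(bAddr, bValue);
            bDirty = false;
        }
    }

    static int cpuARead(int addr) {
        snoop(addr);                 // the "grab the bus" step
        return memory.getOrDefault(addr, 0);
    }

    public static void main(String[] args) {
        memory.put(100, 1);
        cpuBWrite(100, 42);          // memory still holds the old value 1
        int seenByA = cpuARead(100); // snoop forces the write-back first
        System.out.println(seenByA); // prints 42, not the stale 1
    }
}
```

With write-through caching the `snoop` step would be unnecessary for reads,
because `cpuBWrite` would update memory immediately; the price is a memory
write on every store.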


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
From: Patricia Shanahan on
Martin Gregorie wrote:
> On Thu, 21 Jan 2010 22:08:18 -0500, Arne Vajhøj wrote:
>
>> On 21-01-2010 19:15, Martin Gregorie wrote:
>>> On Thu, 21 Jan 2010 22:23:22 +0000, Tom Anderson wrote:
>>>> On Mon, 18 Jan 2010, Arne Vajhøj wrote:
>>>>> When I wrote data I actually meant data.
>>>> Doh! Sorry, Arne, i completely failed to understand there. You're
>>>> quite right, of course. And i would imagine that in most applications,
>>>> reads of data far outweigh reads of code (once you account for the
>>>> caches). I would be very interested to see numbers for that across
>>>> different kinds of program, though.
>>>>
>>> It depends what you mean by 'read'.
>>> If you look at the instruction flow into the CPU, i.e. out of any
>>> caches and into the CPU proper, the instruction flow is considerably
>>> larger than the data flow in almost any architecture.
>> Yes.
>>
>> But the flow from main memory and L3, which are the really slow ones,
>> should have a good chance of carrying more data than code.
>>
> Agreed, but optimising that is a property of the cache rather than
> anything else. The use of Intel-style multi-level caching doesn't affect
> the argument.
>
> In a multi-CPU system cache management can become a nightmare: consider
> the situation where CPU A gets a cache miss that triggers a cache read
> from main memory for a piece of data that has been read and modified by
> CPU B, but whose cache has not yet been flushed. This is a common
> situation with copy-back caching. In general copy-back caching is faster
> than write-through though at the cost of added complexity: all caches
> must sniff the address bus and be capable of grabbing it for an immediate
> write-back if another CPU is asking for modified data which it holds but
> hasn't yet written.

Bus snooping is an excellent strategy if you have few enough processors,
close enough together, to put them all on one bus. For larger systems,
life gets much more complicated than that.

Generally, it is a good idea to avoid situations in which any processor
is writing to a cache line at the same time as other processors are
accessing it. I don't know how cache-line aware the JVM implementations are.
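For what it's worth, a common manual trick in Java has been to pad a hot
field so that two independently written counters cannot land on the same
cache line. The class names, the assumed 64-byte line size, and the padding
layout below are all illustrative; the Java Language Specification gives no
guarantee about field layout, so this is a sketch of the idea rather than a
portable recipe:

```java
// Manual padding to keep two heavily written counters on separate
// (assumed 64-byte) cache lines, avoiding false sharing between the
// two writer threads. Layout is JVM-dependent and not guaranteed.
public class PaddedCounters {
    static class Padded {
        volatile long value;
        // Seven unused longs push the next object's hot field onto a
        // different line on typical JVMs (illustrative, not guaranteed).
        long p1, p2, p3, p4, p5, p6, p7;
    }

    static final Padded a = new Padded();
    static final Padded b = new Padded();

    public static void main(String[] args) throws InterruptedException {
        // Each counter has a single writer, so the volatile increments
        // are safe; the padding only affects cache-line contention.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) a.value++;
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) b.value++;
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(a.value + " " + b.value);
    }
}
```

Without the padding the two `value` fields of adjacent objects can share a
line, and each write by one thread invalidates the other thread's cached
copy, which is exactly the situation Patricia recommends avoiding.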

Patricia