Larrabee delayed: anyone know what's happening? [Computer Architecture]

Prev: PEEEEEEP
Next: Texture units as a general function

From: "Andy "Krazy" Glew" on 29 Dec 2009 01:05

nmm1(a)cam.ac.uk wrote:
> That doesn't help significantly. For the real benefit, you need to
> be able to ensure that objects that are accessed in separate threads
> can be separated by the implementation to any degree that it needs.
> That's a problem.

How can you do this with linked datastructures?

E.g. the flavor of sparse array that has cells
(row,col,data,ptr_to_next_in_row,ptr_to_next_in_col)
with linked lists on both dimensions.
Sometimes parallelized by row, sometimes by columns;
sometimes both at the same time.

From: nmm1 on 29 Dec 2009 04:25

In article <MMOdnSSpXI_wwaTWnZ2dnUVZ_tGdnZ2d(a)bestweb.net>,
Mayan Moudgill <mayan(a)bestweb.net> wrote:
>>
>>>Sorry, don't understand. Can you give a more concrete example? What do
>>>you mean by "byte-separation accesses"?
>>
>> The example you give is one such, but consider this:
>>
>> unsigned char flags[1000000000];
>> int offsets[num_threads];
>> /* This passes different sections of the flags to different threads,
>> for updating. ]
>> fred(&flags[offsets[thread]],offsets[thread+1]-offsets[thread]);
>>
>> In the case where the offsets are determined by the algorithm, and
>> cannot conveniently be made multiples of the cache line size. Even
>> when they can be, it is at best disgusting to have to build in such
>> a system-specific value just to ensure that a program is correct.
>> That is the absolutely antithesis of portability!
>
>Sorry; don't see how this suffers from any kind of degradation. Could
>you elaborate?

Because, if simultaneous updates of the same cache line are not
coherent (in whatever sense required by the language), that code is
incorrect. This has nothing to do with performance, and a great
deal to do with programs giving occasional, non-repeatable wrong
answers.

Regards,
Nick Maclaren.

From: nmm1 on 29 Dec 2009 04:29

In article <4B399C3B.2050207(a)patten-glew.net>,
Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote:
>
>> That doesn't help significantly. For the real benefit, you need to
>> be able to ensure that objects that are accessed in separate threads
>> can be separated by the implementation to any degree that it needs.
>> That's a problem.
>
>How can you do this with linked datastructures?
>
>E.g. the flavor of sparse array that has cells
>(row,col,data,ptr_to_next_in_row,ptr_to_next_in_col)
>with linked lists on both dimensions.
>Sometimes parallelized by row, sometimes by columns;
>sometimes both at the same time.

Just as for arrays. You copy in and copy back, when and as necessary.
Yes, it costs, which is why most Fortran compilers use copy-in/copy-out
for arrays only when essential.

My point is that the language must be such as to ALLOW the compiler
to do that. Fortran, Python etc. by and large do. C, C++ and Java,
by and large don't.

Regards,
Nick Maclaren.

From: Terje Mathisen "terje.mathisen at on 29 Dec 2009 04:50

nmm1(a)cam.ac.uk wrote:
> In article<MMOdnSSpXI_wwaTWnZ2dnUVZ_tGdnZ2d(a)bestweb.net>,
> Mayan Moudgill<mayan(a)bestweb.net> wrote:
>> Sorry; don't see how this suffers from any kind of degradation. Could
>> you elaborate?
>
> Because, if simultaneous updates of the same cache line are not
> coherent (in whatever sense required by the language), that code is
> incorrect. This has nothing to do with performance, and a great
> deal to do with programs giving occasional, non-repeatable wrong
> answers.

So then you need the compiler to know that all accesses within a cache
line (or whatever the coherency size is) of both edges of the range are
special, and must be handled with totally separate instructions, right?

Sounds like something it should be trivial to add to any given C
compiler! :-)

Actually it seems to me that a compiler which supports OpenMP or similar
must do quite a bit of this when automatically partitioning a loop...

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Terje Mathisen "terje.mathisen at on 29 Dec 2009 04:56

Andy "Krazy" Glew wrote:
> nmm1(a)cam.ac.uk wrote:
>> That doesn't help significantly. For the real benefit, you need to
>> be able to ensure that objects that are accessed in separate threads
>> can be separated by the implementation to any degree that it needs.
>> That's a problem.
>
> How can you do this with linked datastructures?
>
> E.g. the flavor of sparse array that has cells
> (row,col,data,ptr_to_next_in_row,ptr_to_next_in_col)
> with linked lists on both dimensions.
> Sometimes parallelized by row, sometimes by columns;
> sometimes both at the same time.

This would work with coherency limited to 4/8-byte words, since all of
those items are at least as large as this, ref. our previous discussion
about reduced tracking of 8/16-bit updates.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

First | Prev | Next | Last
Pages: 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
Prev: PEEEEEEP
Next: Texture units as a general function