Itanium had appeal [Computer Architecture]

From: Anton Ertl on 28 Apr 2010 15:33

"nedbrek" <nedbrek(a)yahoo.com> writes:
>Hello all,
>
>"Anton Ertl" <anton(a)mips.complang.tuwien.ac.at> wrote in message
>news:2010Apr23.175039(a)mips.complang.tuwien.ac.at...
>> Robert Myers <rbmyersusa(a)gmail.com> writes:
>>>On Apr 22, 7:30 am, "nedbrek" <nedb...(a)yahoo.com> wrote:
>>>> The irony in Itanium was that the compiler would only use software
>>>> pipelining in floating point code (i.e. short code segments).
>>>> I think the memcpy in libc used it too.
>>>> That accounted for the only times I saw it in integer code.
>>
>> Does it have rotation for integer registers?
>
>Yes, unless my memory has completely failed...

I looked it up in the meantime. Yes, it has.

>> IIRC I read about the hardware for transparent register stack engine
>> operation not working, requiring a fallback to exception-driven
>> software spilling and refilling. That would not be a big problem on
>> most workloads. AFAIK SPARC and AMD29k have always used
>> exception-driven software spilling and refilling.
>
>There were multiple modes the RSE could be put into. The most aggressive
>would load and store registers asynchronously. This is the mode which is
>much lamented and never used (I think it could be made to work, but never
>studied it).
>
>I am unaware of any problems with the least aggressive mode (load/store on
>demand, no exception needed).

Yes, I probably read about the asynchronous and synchronous hardware
modes, and then forgot about the synchronous mode being hardware.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

From: Anton Ertl on 28 Apr 2010 15:37

Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
>Anton Ertl wrote:
>> Terje Mathisen<"terje.mathisen at tmsw.no"> writes:
>>> The ALAT is similar to LLSC in that it will detect all modifications,
>>> including a rewrite of the same value.
>>
>> I don't think that the ALAT does any better. It will notice that
>> something was stored at the loaded address, but it won't notice if
>> somebody changed something that dependent loads accessed, so the
>> program will have to check them with the ALAT in all situations where
>> they will have to do it with the double-loading approach.
>
>No, the critical difference is the ability to detect all writes to
>protected locations, including rewrites of the same value:
>
>This means that any app which blindly writes the entire chain when
>updating would only need to check the start of the chain.

You lost me completely. The ALAT is about loads. If there is a chain
of writes, none of them would have a corresponding check instruction.
But I guess I misunderstand what you are trying to say. Could you
give an example?

- anton

From: Rick Jones on 28 Apr 2010 16:04

Anton Ertl <anton(a)mips.complang.tuwien.ac.at> wrote:
> Rick Jones <rick.jones2(a)hp.com> writes:

> >SPECcpu2006 explicitly disallows PBO in base and only allows it in
> >peak. That was a change from SPECcpu2000, which allowed PBO in
> >both.

> Good; thanks for that update. And the IA-64 Cint and CFP results I
> see on the SPEC website contain both baseline and "result" (peak)
> columns; also good.

> Of course, SPEC CPU is still designed in other aspects in ways that
> make it unrepresentative for much of the computer use I (and, I think,
> many others) experience, and therefore make it not very suitable (at
> least without complementing benchmarks) for the purposes that it is
> often used for: as an indicator of computer performance; and for
> directing compiler optimization development. Some things that come to
> mind are that process creation and dynamic linking overhead is much
> more important in my usage than in SPEC CPU.

Not sure if it is past various deadlines, and without getting into a
discussion of the extent to which process creation and dynamic linking
performance are within the scope of SPECcpu, I will point out that SPEC
is often looking for benchmarks:

http://www.spec.org/cpuv6/

rick jones
--
the road to hell is paved with business decisions...
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

From: Anton Ertl on 28 Apr 2010 15:42

MitchAlsup <MitchAlsup(a)aol.com> writes:
>On Apr 27, 4:36 pm, Terje Mathisen <"terje.mathisen at tmsw.no">
>wrote:
>> Anton Ertl wrote:
>> > Terje Mathisen<"terje.mathisen at tmsw.no"> writes:
>> >> The ALAT is similar to LLSC in that it will detect all modifications,
>> >> including a rewrite of the same value.
>>
>> > I don't think that the ALAT does any better. It will notice that
>> > something was stored at the loaded address, but it won't notice if
>> > somebody changed something that dependent loads accessed, so the
>> > program will have to check them with the ALAT in all situations where
>> > they will have to do it with the double-loading approach.
>>
>> No, the critical difference is the ability to detect all writes to
>> protected locations, including rewrites of the same value:
>>
>> This means that any app which blindly writes the entire chain when
>> updating would only need to check the start of the chain.
>
>With respect to the ABA problem and synchronization, a rewrite of a
>critical location even with the same resulting bit pattern is still an
>event that must terminate a synchronization attempt. At the very
>minimum, it is exceedingly dangerous to allow a synchronize to assume
>that it is successful when some other party has write-touched one of
>its critical storage locations. This was THE situation where the ABA
>problem acquired its name.

What you wrote would be called the AA problem, if it was a problem.
But I never heard about that, so I guess it isn't a problem.

In any case, AFAIK ALAT is not there as an implementation of LL/SC.
Or maybe it is, but then I don't know about that.

What I was writing about was speculative execution of a load before a
store in the same thread that is logically before the load. And for
that purpose ABA is no problem.

As for compiling the loads in the order that the memory model
requires, and what this means for the Daisy approach: yes, this may
require more loads than would be necessary if you had a very weak
memory model (like Alpha), but it only makes a difference between the
Daisy approach and the ALAT in cases that I don't expect to occur in
practice. Here's an example:

Original code:
*w=a;
y=*x;
z=*y;

Now we want to move the first load up above the store, but need to
check against aliasing:

y=*x;
*w=a;
check y=*x -> fixup;
cont: z=*y;
....

fixup:
y=*x;
goto cont;

Now assume we want to move the second load above the check, but not
above the branch (I don't see why one would want to do this, except to
create a contrived example:-):

y=*x;
*w=a;
z=*y;
check y=*x -> fixup;
cont: ...

fixup:
y=*x;
z=*y;
goto cont;

Here another thread might change *x in an ABA way, and change *y after
changing *x back to A, so we also have to check the second load if we
use the Daisy approach. With the ALAT, this code would probably be
ok. So the Daisy-correct variant of the example above is:

y=*x;
*w=a;
z=*y;
check y=*x -> fixup;
check z=*y -> fixup2;
cont: ...

fixup: y=*x;
fixup2: z=*y;
goto cont;

Anyway, given that one will usually not move a load between the store
and the check, this difference between the approaches will rarely have
an effect.

If we move the second load above the store, we need the second check
with either approach.
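
The schedules above can be modelled in a few lines of Python. This is
only a sketch under stated assumptions: memory is a dict from
hypothetical string addresses to values, the "check" is the value
re-load comparison of the Daisy approach as described in this post, and
the interfere callback is an invented stand-in for another thread that
runs between the second load and the check. It does not model the ALAT
or any real hardware.

```python
def run_in_order(mem, x, w, a):
    """Original code: *w=a; y=*x; z=*y;"""
    mem[w] = a
    y = mem[x]
    z = mem[y]
    return y, z


def run_speculative(mem, x, w, a, check_second_load, interfere=None):
    """Both loads executed before the check, verified by re-loading."""
    y = mem[x]              # y=*x; (hoisted above the store)
    mem[w] = a              # *w=a;
    z = mem[y]              # z=*y; (moved above the check)
    if interfere is not None:
        interfere(mem)      # hypothetical: another thread runs here
    if mem[x] != y:         # check y=*x -> fixup
        y = mem[x]          # fixup:  y=*x;
        z = mem[y]          # fixup2: z=*y;
    elif check_second_load and mem[y] != z:   # check z=*y -> fixup2
        z = mem[y]          # fixup2: z=*y;
    return y, z


def aba(mem):
    """Other thread: change *x from A to B and back, then change *y."""
    mem["x"] = "q"          # A -> B
    mem["x"] = "p"          # B -> A
    mem["p"] = 99           # new value behind the unchanged pointer


# Same-thread aliasing (w == x): the first check alone repairs it.
print(run_speculative({"x": "p", "p": 1, "q": 2}, "x", "x", "q", False))
# ABA interference: one check keeps the stale z, two checks reload it.
print(run_speculative({"x": "p", "p": 1, "w": 0}, "x", "w", 5, False, aba))
print(run_speculative({"x": "p", "p": 1, "w": 0}, "x", "w", 5, True, aba))
```

In the aliasing case the speculative schedule returns the same pair as
run_in_order. In the ABA case the single-check variant returns the old
*y, because the re-loaded *x compares equal; only the variant with the
second check picks up the new value.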

- anton

From: MitchAlsup on 28 Apr 2010 17:13

On Apr 28, 2:42 pm, an...(a)mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
> MitchAlsup <MitchAl...(a)aol.com> writes:
> >On Apr 27, 4:36 pm, Terje Mathisen <"terje.mathisen at tmsw.no">
> >wrote:
> >> Anton Ertl wrote:
> >> > Terje Mathisen<"terje.mathisen at tmsw.no"> writes:
> >> >> The ALAT is similar to LLSC in that it will detect all modifications,
> >> >> including a rewrite of the same value.
>
> >> > I don't think that the ALAT does any better. It will notice that
> >> > something was stored at the loaded address, but it won't notice if
> >> > somebody changed something that dependent loads accessed, so the
> >> > program will have to check them with the ALAT in all situations where
> >> > they will have to do it with the double-loading approach.
>
> >> No, the critical difference is the ability to detect all writes to
> >> protected locations, including rewrites of the same value:
>
> >> This means that any app which blindly writes the entire chain when
> >> updating would only need to check the start of the chain.
>
> >With respect to the ABA problem and synchronization, a rewrite of a
> >critical location even with the same resulting bit pattern is still an
> >event that must terminate a synchronization attempt. At the very
> >minimum, it is exceedingly dangerous to allow a synchronize to assume
> >that it is successful when some other party has write-touched one of
> >its critical storage locations. This was THE situation where the ABA
> >problem acquired its name.
>
> What you wrote would be called the AA problem, if it was a problem.
> But I never heard about that, so I guess it isn't a problem.
>
> In any case, AFAIK ALAT is not there as an implementation of LL/SC.
> Or maybe it is, but then I don't know about that.
>
> What I was writing about was speculative execution of a load before a
> store in the same thread that is logically before the load. And for
> that purpose ABA is no problem.

Outside of synchronization events, I don't see how it hurts anything
if the value loaded matches the value predicted to be loaded. Within
synchronization events, on the variables used in synchronizing, I
can. Discriminating those loads that are participating in a
synchronization event from those that do not is tricky in most
instruction sets. {Where tricky generally means dangerous to extremely
dangerous.}
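
The distinction between "the value still matches" and "nobody wrote
the location" can be illustrated with a small Python sketch. The Cell
class and its writes counter are hypothetical: the counter merely
stands in for any mechanism that observes stores rather than values
(an LL/SC-style reservation, say), and is not a description of how any
particular instruction set implements it.

```python
class Cell:
    """A memory location that also counts how often it was stored to."""

    def __init__(self, value):
        self.value = value
        self.writes = 0     # hypothetical stand-in for store detection

    def store(self, value):
        self.value = value
        self.writes += 1


def value_check(cell, seen_value):
    """Passes if the location holds the value that was read earlier."""
    return cell.value == seen_value


def write_check(cell, seen_writes):
    """Passes only if no store at all has hit the location since."""
    return cell.writes == seen_writes


cell = Cell("A")
seen_value, seen_writes = cell.value, cell.writes   # start of the attempt

cell.store("B")     # another party changes the location...
cell.store("A")     # ...and restores the original bit pattern

print(value_check(cell, seen_value))    # True: the ABA change goes unnoticed
print(write_check(cell, seen_writes))   # False: the attempt must be retried
```

For an ordinary speculated load the value check is enough, since only
the loaded value matters. For a location used in synchronization, the
write check is the one that terminates the attempt after a rewrite
with the same bit pattern.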

Mitch
