From: Nick Maclaren on 15 Aug 2008 12:56

In article <g84783$1ge$1(a)s1.news.oleane.net>, Jan Vorbrüggen
<Jan.Vorbrueggen(a)not-thomson.net> writes:
|>
|> > Oh, it was worse than that! After he had done the initial design
|> > (which was reasonable, if not excellent), he was elbowed out, and
|> > half of his design was thrown out to placate the god Benchmarketing.
|> >
|> > The aspect that I remember was that the GUI was brought back from
|> > where he had exiled it to the 'kernel' - and, as we all know, the
|> > GUIs are the source of all ills on modern systems :-(
|>
|> Yep - I think that was part of the 3.51 to 4.0 transition. As I
|> understand it, the thing was just too resource-hungry for the
|> hardware of the day to be marketable in that state.

As I heard it, that was as much an excuse as a reason. What I heard was
that it did perform like a dog, but that didn't distinguish it from any
of the other major releases. And that problem was temporary.

The other reason I heard was that the GUI (and other components?) were
so repulsive that moving all of their privileged actions to the other
side of an interface (ANY interface) was beyond the programmers. But
they didn't want to admit that, so they mounted an internal propaganda
campaign about the performance.

However, you know what such stories are like. I neither believe nor
disbelieve it.

Regards,
Nick Maclaren.
From: Dirk Bruere at NeoPax on 16 Aug 2008 22:30

Jan Panteltje wrote:
> On a sunny day (Sun, 10 Aug 2008 17:05:31 +0000) it happened ChrisQ
> <blackhole(a)devnull.com> wrote in <g7n75m$vi$1(a)aioe.org>:
>
>> Jan Panteltje wrote:
>>
>>> John Lennon:
>>>
>>> 'You know I am a dreamer' .... ' And I hope you join us someday'
>>>
>>> (well what I remember of it). You should REALLY try to program a Cell
>>> processor some day.
>>>
>>> Dunno what you have against programmers, there are programmaers who
>>> are amazingly clever with hardware resources. I dunno about NT and
>>> MS, but IIRC MS plucked programmers from unis, and sort of
>>> brainwashed them then.. the result we all know.
>>>
>> That's just the problem - programmers have been so good at hiding the
>> limitations of poorly designed hardware that the whole world thinks
>> that hardware must be perfect and needs no attention other than making
>> it go faster.
>>
>> If you look at some modern i/o device architectures, it's obvious the
>> hardware engineers never gave a second thought about how the thing would
>> be programmed efficiently...
>>
>> Chris (with embedded programmer hat on :-(
>
> Interesting.
> For me, I have a hardware background, but also software, the two
> came together with FPGA, when I wanted to implement DES as fast as possible.
> I did wind up with just a bunch of gates and 1 clock cycle, so no program :-)
> No loops (all unfolded in hardware).
> So, you need to define some boundary between hardware resources (that one
> used a lot of gates), and software resources, I think.

Unless you blur the boundary further by using on-the-fly reprogrammable gate arrays.

--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.theconsensus.org/ - A UK political party
http://www.onetribe.me.uk/wordpress/?cat=5 - Our podcasts on weird stuff
From: Michel Hack on 18 Aug 2008 15:02

On Aug 13, 5:52 pm, "Wilco Dijkstra" <Wilco.removethisDijks...(a)ntlworld.com> wrote:
> Btw Do you happen to know the reasoning behind signed left shifts being
> undefined while right shifts are implementation defined?

On some machines the high-order bit is shifted out, on others (e.g. S/370)
it remains unchanged: 0x80000001 << 1 can become 0x80000002 and not
0x00000002 in a 32-bit register. The S/370 way parallels the common
sign-propagation method of arithmetic right shifts: the sign does not change.
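The sign-propagating result can be had portably without relying on what '>>'
does to a negative value. A minimal sketch, assuming two's complement and a
shift count smaller than the width of int (the helper name asr is made up
for illustration):

    /* Arithmetic right shift without applying '>>' to a negative left
     * operand, whose result is implementation defined.  For v < 0, ~v is
     * non-negative, so (~v >> s) is fully defined; complementing again
     * restores the sign-filled, round-toward-minus-infinity result.
     */
    static int asr(int v, unsigned s)
    {
        return v < 0 ? ~(~v >> s) : v >> s;
    }

For example, asr(-5, 1) is -3 on any conforming implementation, which is
what a machine arithmetic-shift-right instruction produces.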
From: Wilco Dijkstra on 18 Aug 2008 17:46

"Nick Maclaren" <nmm1(a)cus.cam.ac.uk> wrote in message news:g81arl$6a0$1(a)gemini.csx.cam.ac.uk...
>
> In article <V4Uok.4995$Od3.4795(a)newsfe28.ams2>,
> "Wilco Dijkstra" <Wilco.removethisDijkstra(a)ntlworld.com> writes:
> |>
> |> I'd certainly be interested in the document. My email is above, just make
> |> the obvious edit.
>
> Sent.

Thanks, I've received it; I'll have a look at it soon (it's big...).

> |> > |> I bet that most code will compile and run without too much trouble.
> |> > |> C doesn't allow that much variation in targets. And the variation it
> |> > |> does allow (eg. one-complement) is not something sane CPU
> |> > |> designers would consider nowadays.
> |> >
> |> > The mind boggles. Have you READ the C standard?
> |>
> |> More than that. I've implemented it. Have you?
>
> Some of it, in an extremely hostile environment. However, that is a lot
> LESS than having written programs that get ported to radically different
> systems - especially ones that you haven't heard of when you wrote the
> code. And my code has been so ported, often without any changes needed.

My point is that such weird systems no longer get designed. The world has
standardized on two's complement, 8-bit char, 32-bit int etc., and that is
unlikely to change. Given that, there isn't much variation possible. Putting
in extra effort to allow for a theoretical system with a sign-magnitude
5-bit char or a 31-bit one's complement int is completely insane.

> |> It's only when you implement the standard you realise many of the issues are
> |> irrelevant in practice. Take sequence points for example. They are not even
> |> modelled by most compilers, so whatever ambiguities there are, they simply
> |> cannot become an issue.
>
> They are relied on, heavily, by ALL compilers that do any serious
> optimisation. That is why I have seen many problems caused by them,
> and one reason why HPC people still prefer Fortran.

It's only source-to-source optimizers that might need to consider these
issues, but these are very rare (we bought one of the few still available).
Most compilers, including the highly optimizing ones, do almost all
optimization at a far lower level. This not only avoids most of the issues
you're talking about, but it also ensures badly behaved programs are
correctly optimized, while well behaved programs are still optimized
aggressively.

> |> Similarly various standard pendantics are moaning
> |> about shifts not being portable, but they can never mention a compiler that
> |> fails to implement them as expected...
>
> Shifts are portable if you code them according to the rules, and don't
> rely on unspecified behaviour. I have used compilers that treated
> signed right shifts as unsigned, as well as ones that used only the
> bottom 5/6/8 bits of the shift value, and ones that raised a 'signal'
> on left shift overflow. There are good reasons for all of the
> constraints.
>
> No, I can't remember which, offhand, but they included the ones for
> the System/370 and Hitachi S-3600. But there were also some
> microprocessor ones - PA-RISC? Alpha?

S/370, Alpha and PA-RISC all support arithmetic right shifts. There is no
information available on the S-3600.

> |> Btw Do you happen to know the reasoning behind signed left shifts being
> |> undefined while right shifts are implementation defined.
>
> Signed left shifts are undefined only if they overflow; that is undefined
> because anything can happen (including the CPU stopping).
> Signed right shifts are only implementation defined for negative values;
> that is because they might be implemented as unsigned shifts.

No. The standard is quite explicit that any left shift of a negative value
is undefined, even if there is no overflow. This is an inconsistency, as
compilers change multiplies by a power of 2 into a left shift and vice
versa. There is no similar undefined behaviour for multiplies, however.

> |> It will work as long as the compiler supports a 32-bit type - which it will of
> |> course. But in the infinitesimal chance it doesn't, why couldn't one
> |> emulate a 32-bit type, just like 32-bit systems emulate 64-bit types?
>
> Because then you can't handle the 64-bit objects returned from the
> library or read in from files!

You're missing the point. A theoretical 64-bit CPU that only supports
64-bit operations could emulate support for 8-bit char, 16-bit short and
32-bit int. Without such emulation it would need 64-bit char, 128-bit
short/int, 256-bit int/long in order to support C. Alpha is proof this is
perfectly feasible: the early versions emulated 8/16-bit types in software
without too much overhead.

Once we agree that it is feasible to emulate types, it is reasonable to
mandate that each implementation supports the sized types.

Wilco
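To make the emulation argument concrete, here is a minimal sketch, assuming
a little-endian machine whose only memory operations are aligned 64-bit
loads and stores (memory is modelled as a plain uint64_t array and the
helper names are invented): an 8-bit access becomes a word access plus a
shift and mask, which is roughly what compilers generated on early Alpha
before the BWX byte/word extensions.

    #include <stddef.h>
    #include <stdint.h>

    /* Emulated 8-bit load: fetch the containing 64-bit word and shift the
     * wanted byte down.  'addr' is a byte address into the word array.
     */
    uint8_t load_u8(const uint64_t *mem, size_t addr)
    {
        uint64_t word  = mem[addr / 8];        /* aligned 64-bit load        */
        unsigned shift = (addr % 8) * 8;       /* byte offset, little-endian */
        return (uint8_t)(word >> shift);
    }

    /* Emulated 8-bit store: read-modify-write of the containing word. */
    void store_u8(uint64_t *mem, size_t addr, uint8_t value)
    {
        uint64_t word  = mem[addr / 8];
        unsigned shift = (addr % 8) * 8;
        word &= ~((uint64_t)0xff << shift);    /* clear the target byte */
        word |= (uint64_t)value << shift;      /* insert the new byte   */
        mem[addr / 8] = word;
    }

A compiler hiding these sequences behind ordinary char accesses gives such a
machine the 8-bit char and 16/32-bit types C expects, at the cost of a few
extra instructions per access.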
From: Wilco Dijkstra on 18 Aug 2008 18:17
"Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message news:ibudnfv81sCstDnVnZ2dnUVZ8tHinZ2d(a)giganews.com... > Wilco Dijkstra wrote: >> "Terje Mathisen" <terje.mathisen(a)hda.hydro.com> wrote in message news:V92dnbsbmsAAST7VRVnyvwA(a)giganews.com... >>> How many ways can you define such a function? >>> >>> The only serious alternatives would be in the handling of negative-or-zero inputs or when rounding the actual fp >>> result to integer: >>> >>> Do you want the Floor(), i.e. truncate, Ceil() or Round_to_nearest_or_even()? >>> >>> Using the latest alternative could make it harder to come up with a perfect implementation, but otherwise it should >>> be trivial. >> >> It was a trivial routine, just floor(log2(x)), so just finding the top bit that is set. >> The mistakes were things like not handling zero, using signed rather than >> unsigned variables, looping forever for some inputs, returning the floor result + 1. >> >> Rather than just shifting the value right until it becomes zero, it created a mask >> and shifted it left until it was *larger* than the input (which is not going to work >> if you use a signed variable for it or if the input has bit 31 set etc). >> >> My version was something like: >> >> int log2_floor(unsigned x) >> { >> int n = -1; >> for ( ; x != 0; x >>= 1) >> n++; >> return n; >> } > > <BG> > > That is _identical_ to the code I originally wrote as part of my post, but then deleted as it didn't really add to my > argument. :-) > > There are of course many possible alternative methods, including inline asm to use a hardware bitscan opcode. > > Here's a possibly faster version: > > int log2_floor(unsigned x) > { > int n = -1; > while (x >= 0x10000) { > n += 16; > x >>= 16; > } > if (x >= 0x100) { > n += 8; > x >>= 8; > } > if (x >= 0x10) { > n += 4; > x >>= 4; > } > /* At this point x has been reduced to the 0-15 range, use a > * register-internal lookup table: > */ > uint32_t lookup_table = 0xffffaa50; > int lookup = (int) (lookup_table >> (x+x)) & 3; > > return n + lookup; > } I like the lookup in a register method. I once did something like this: uint8 table[32] = { ... }; int log2_floor(unsigned x) { if (x == 0) return -1; x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; x *= 0x... // multiply with magic constant return table[x >> 27]; // index into table } The shifted OR's force all bits after the leading one to be set too. This reduces the number of possibilities to just 32. The multiply then shifts the magic constant by N bits. It is chosen so that the top 5 bits end up containing a unique bitpattern for each of the 32 possible values of x. It took 10 instructions plus 32 bytes of table. Placing the table immediately after the return instruction allowed the use of the LDRB r0,[PC,r0,LSR #27] instruction, so it didn't even need an instruction to create the table address... Wilco |