From: Terje Mathisen "terje.mathisen at tmsw.no" on 26 Dec 2009 04:43

Stephen Fuld wrote:
> I believe that Burroughs had an I/O backplane, into which different
> cards (called data link processors, or DLPs) could be placed. There were
> different types of DLPs for different peripheral types.
>
> CDC had separate CPUs, called Peripheral Processors, to do I/O.
>
> I don't know about any others.

Norsk Data's ND100/ND500 were the big brothers of the original ND10,
which btw was a _very_ good realtime control system design: it had 16
sets of registers, along with 15 interrupt priorities on top of the base
level, so to handle an interrupt the cpu would just switch to the
corresponding register set and get going in a single cycle or two.

The problem was that ND never got a real operating system for the ND10
machines, so a lot of customer hardware was supported by user-level
machine code talking directly to the interfaces.

A few years later the ND100 changed word lengths and a lot of other
things, so to be able to keep customer code running, they bolted an ND10
onto the front to handle all IO.

This meant that a file transfer program, talking to the terminal
interface at one end and the file system (disks) at the other, would run
completely inside the IO cpu, using close to zero "real" cpu time,
right?

A few seconds later, the host OS would notice the idle process and
deschedule it, taking away those few cycles needed to keep the IO part
running, and the file transfer would slow down by 10x.

The solution was of course to let the main (host) program spin around in
a loop, pretending to be busy. :-(

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
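Purely as an illustration of that last workaround (this is not ND code;
the flag name is hypothetical), the host-side "pretend to be busy" loop
amounts to something like this C11 sketch:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Assumed to be set by the IO-processor completion path. */
    static _Atomic bool transfer_done;

    void wait_for_io_keeping_busy(void)
    {
        while (!atomic_load(&transfer_done)) {
            /* Deliberately spin instead of blocking: blocking would let
               the host OS mark the process idle and deschedule it,
               starving the IO-side transfer of the few cycles it needs. */
        }
    }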
From: Mayan Moudgill on 26 Dec 2009 06:53

Bernd Paysan wrote:
> Mayan Moudgill wrote:
>
>> Bernd Paysan wrote:
>>
>>> Sending chunks of code around which are automatically executed by
>>> the receiver is called "active messages".
>>
>> I'm not so sure. The original Active Messages stuff from Thorsten von
>> Eicken et.al. was more like passing a pointer to a user space
>> interrupt handler along with an inter-processor message, so that the
>> message could be handled with zero-copies/low-latency (OK, it wasn't
>> always quite that - but it's close in flavor). The interrupt handler
>> code was already resident on the processor.
>
> The original stuff is obviously a specialization. The idea is that the
> message "processes itself", right at arrival. Threaded code (i.e. a
> sequence of pointers to programs in memory) is certainly a possible
> subset, and a single pointer is certainly a subset of threaded code.
>
>> I've never heard of pushing code for execution on another processor
>> being called "active messages" - citations? references?
>
> How would you call it?

Good question.

Clearly the issue does not arise in shared memory systems, since the
code is in memory accessible at both the source and destination; you'd
simply pass the pointer to the code (though, I guess, in principle you
might push a copy of the code as an optimization).

In a multi-OS-copy distributed system (e.g. networked machines), the OS
has to have provisions for the interconnect hardware to respond to
messages, extract the data (device drivers) and process it (daemons).
If the daemon can process commands (e.g. sshd), then it meets the
criteria for user-pushed code. But they don't really call it something
special.

However, I think the specific case you are looking for is one where the
device driver itself extracts and executes the embedded commands/code
in the message/packet in *device-driver space* - some implementations
of iSCSI targets *might* qualify. Again, I haven't encountered any
special name for it.

Of course, things get a little more interesting in the case of
single-OS-image non-shared-memory systems - distributed OSes. They
would avoid some of the protection issues, and might be able to really
execute messages safely in the driver. But I don't remember anything
relevant. Python on Amoeba, maybe? I don't know if Amoeba had any
drivers that would interpret Python directly.

I've been assuming a somewhat standard ethernet hardware + TCP/IP stack
model, where the earliest point that the driver will try to
interpret/execute messages is fairly high in the stack. On a system
with specialized drivers (and maybe even specialized interconnects), it
*may* be possible and desirable to interpret commands at a lower level.
I'm wondering whether any transputer based systems did this, and if so,
what they named this idiom.

So, summarizing: I still don't think active messages is the right name.
I haven't encountered any real-life instances where people actually
send code to be executed (or even interpreted) at a low level inside
the device driver. Even the active message people did not - they sent
code pointers, rather than code.
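To make the distinction Mayan is drawing concrete, here is a minimal,
hypothetical C sketch of the original Active Messages style: the message
carries only a *reference* to handler code that is already resident on
the receiver, plus data, and the receiver dispatches to that handler
right at arrival. None of the names below come from any particular
system.

    #include <stddef.h>
    #include <stdint.h>

    typedef void (*am_handler_t)(const void *payload, size_t len);

    /* Handler table registered identically on every node at startup,
       so a small index is enough to name code that is already resident. */
    #define AM_MAX_HANDLERS 64
    static am_handler_t am_handlers[AM_MAX_HANDLERS];

    struct am_message {
        uint32_t handler_index;   /* which resident handler to run */
        uint32_t payload_len;
        uint8_t  payload[];       /* data only - no code is shipped */
    };

    /* Called from the receive path: no queueing, no separate daemon -
       the handler runs on the message as it arrives. */
    static void am_deliver(const struct am_message *msg)
    {
        if (msg->handler_index < AM_MAX_HANDLERS &&
            am_handlers[msg->handler_index])
            am_handlers[msg->handler_index](msg->payload, msg->payload_len);
    }

What Bernd describes - shipping the code itself in the message and
executing it on the receiver - would replace handler_index with an
actual code image or interpretable program, which is exactly the case
Mayan says he has not seen called "active messages".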
From: nmm1 on 26 Dec 2009 06:54

In article <jg0d07-mtb.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Andy "Krazy" Glew wrote:
>>>>
>>>> I.e. I am suspecting that full cache coherency is overkill, but that
>>>> completely eliminating cache coherency is underkill.

That is probably correct.

>>> I agree, and I think most programmers will be happy with word-size
>>> tracking, i.e. we assume all char/byte operations happen on private
>>> memory ranges.
>>
>> Doesn't that fly in the face of the Alpha experience, where originally
>> they did not have byte memory operations, but were eventually forced to?
>>
>> Why? What changed? What is different?
>
>What's different is easy:
>
>I am not proposing we get rid of byte-sized memory operations, "only"
>that we don't promise they will be globally consistent, i.e. you only
>use 8/16-bit operations on private memory blocks.

And it is that which flies in the face of the Alpha experience. Sorry,
Terje, but you have (unusually for you) missed the key requirements.

>> I suspect that there is at least some user level parallel code that
>> assumes byte writes are - what is the proper term? Atomic? Non-lossy?
>> Not implemented via a non-atomic RMW?
>
>There might be some such code somewhere, but only by accident; I don't
>believe anyone is using it intentionally: you want semaphores to be
>separated at least by a cache line if you care about performance, but I
>guess it is conceivable some old coder decided to pack all his lock
>variables into a single byte range.

Andy is right, I am afraid. You aren't thinking of the right problem;
it's not the atomic/synchronisation requirements that are the issue.

Consider a program that has a large array of one-byte flags, and needs
to update that array in parallel. No single flag will be accessed by
two different threads without synchronisation, but the adjacency
properties are such that it cannot practically be separated into one
array for each thread. You can get the same issue by splitting up a
very large string (think DNA or whatever).

ALL current mainstream languages have built the concept that memory
independence occurs at the byte level into their basic memory model,
and are building their parallel features on top of that. And, because
subsetting objects and passing pointers can be done separately,
compiled code can't tell if it has a subset of such a shared structure
or a structure that is private to the thread.

Regards,
Nick Maclaren.
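A minimal C sketch of Nick's scenario (the names are hypothetical):
every mainstream memory model treats the byte stores below as
independent locations, so the program is race-free as written. If the
hardware instead emulated byte stores with a non-atomic word-wide
read-modify-write, the stores near the boundary between the two
threads' halves could silently clobber each other's flags.

    #include <pthread.h>
    #include <stddef.h>

    #define NFLAGS 1000000
    static unsigned char flags[NFLAGS];   /* one-byte flag per item */

    struct range { size_t lo, hi; };      /* each thread's slice */

    static void *mark_range(void *arg)
    {
        const struct range *r = arg;
        for (size_t i = r->lo; i < r->hi; i++)
            flags[i] = 1;   /* plain byte store: the language promises it
                               cannot disturb flags[i-1] or flags[i+1] */
        return NULL;
    }

    int main(void)
    {
        /* Two threads own adjacent halves, so flags[NFLAGS/2 - 1] and
           flags[NFLAGS/2] - owned by different threads - are neighbouring
           bytes, very likely in the same word and cache line. */
        struct range a = { 0, NFLAGS / 2 }, b = { NFLAGS / 2, NFLAGS };
        pthread_t ta, tb;
        pthread_create(&ta, NULL, mark_range, &a);
        pthread_create(&tb, NULL, mark_range, &b);
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        return 0;
    }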
From: nmm1 on 26 Dec 2009 07:49

In article <7pk6fnF8n6U1(a)mid.individual.net>,
Del Cecchi <delcecchiofthenorth(a)gmail.com> wrote:
>"Anton Ertl" <anton(a)mips.complang.tuwien.ac.at> wrote in message
>news:2009Dec25.142847(a)mips.complang.tuwien.ac.at...
>>
>> There is something wrong with your quoting:
>
>Sorry for that. The quoting peculiarities seem to only be on Robert
>Meyers for some odd reason. This quoting appears to me to be ok on
>this reply, for example.

I wondered what the hell was going on when I saw one of your postings,
which contained Robert Myers's, er, usual flattery unquoted. It didn't
seem plausible for you, and he isn't a nymshifter.

Regards,
Nick Maclaren.
From: Del Cecchi on 26 Dec 2009 09:41
<nmm1(a)cam.ac.uk> wrote in message
news:hh50p0$mtr$1(a)smaug.linux.pwf.cam.ac.uk...
> In article <7pk6fnF8n6U1(a)mid.individual.net>,
> Del Cecchi <delcecchiofthenorth(a)gmail.com> wrote:
>>"Anton Ertl" <anton(a)mips.complang.tuwien.ac.at> wrote in message
>>news:2009Dec25.142847(a)mips.complang.tuwien.ac.at...
>>>
>>> There is something wrong with your quoting:
>>
>>Sorry for that. The quoting peculiarities seem to only be on Robert
>>Meyers for some odd reason. This quoting appears to me to be ok on
>>this reply, for example.
>
> I wondered what the hell was going on when I saw one of your
> postings, which contained Robert Myers's, er, usual flattery unquoted.
> It didn't seem plausible for you, and he isn't a nymshifter.
>
> Regards,
> Nick Maclaren.

Apparently Robert is the only one posting (that I might respond to) via
Google Groups. Through my usual elegant trial and error I finally
figured out what was going on. Apparently it is an interaction between
Outlook Express and Google Groups.

Outlook Express is selected because of the "block sender" menu item that
also deletes current headers from that sender, unlike the filters in
Thunderbird. (If someone can tell me how to get tbird to filter a sender
AFTER the headers have been downloaded, I would be grateful.)

del