From: Martin Brown on
On 10/06/2010 19:32, John Larkin wrote:
> On Thu, 10 Jun 2010 18:16:59 +0100, Martin Brown
> <|||newspam|||@nezumi.demon.co.uk> wrote:
>
>> On 10/06/2010 16:52, John Larkin wrote:
>>> On Thu, 10 Jun 2010 06:56:56 -0700 (PDT), MooseFET
>>> <kensmith(a)rahul.net> wrote:
>>>
>>>> On Jun 1, 11:07 am, John Larkin
>>>> <jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
>>>>> http://online.wsj.com/article/SB1000142405274870340660457527867166190...
>>>>>
>>>>> John
>>>>
>>>> 50 seems an odd number. I would expect a power of 2 or a power of 3
>>>> number of cores.
>>>
>>> Maybe they did 64 and only get 50 to work?
>>
>> From what limited technical info has leaked out it seems the chips are
>> 32 core so heaven knows where this 50 core number comes from. Perhaps a
>> pair of the prototype 32 core chips have 50 working cores between them.
>>
>> The whole thing appears to be PR fluff for Wall Street and investors to
>> drool over - there is very little about it on their website.
>>>
>>>>
>>>> The power of 2 number is just because things tend to be doubled and
>>>> doubled etc.
>>>>
>>>> The power of 3 number is because if you imagine a hypercube
>>>> like arrangement where each side is a bus for communication
>>>> directly between cores, it makes sense to have 3 processors
>>>> on a bus because while A and B are talking, C can't be having
>>>> a conversation with either. This would allow the array or cores
>>>> to get information quickly between themselves. It assumes
>>>> that they each have a cache that the transfer works to sync.
>>>>
>>>> At some point, adding more of the same cores stops working
>>>> as well as adding some special purpose hardware to a fraction
>>>> of the cores.
>>>>
>>>> Not every core needs to be able to do a floating point at all.
>>>> Some would be able to profit from a complex number ALU
>>>> or perhaps a 3D alu.
>>>>
>>>> Chances are, one core would get stuck with the disk I/O etc
>>>> that core would profit from having fast interrupt times. The
>>>> others less so.
>>>
>>> Eventually we'll have a CPU as every device driver, and a CPU for
>>> every program thread, with real execution protection. No more buffer
>>> overflow exploits, no more crashed OSs, no more memory leaks.
>>
>> And replace them with horrendous memory contention and cache coherency
>> problems - great!
>
> Synchronous logic. Hardware semaphores. Absolute hardware protections.
> OS that shares nothing with apps. Bulletproof.

Absolute hardware protection can be done on one CPU with a segmented
architecture and a viciously defensive TLB. Even better if you use a
Harvard architecture, which for obvious reasons prevents data execution.

If your multiple CPUs share a common flat address space, as is currently
in vogue, any protection your separate physical cores offer is largely
illusory. You would be better off with virtual CPUs and a tiny
hypervisor with slightly paranoid behaviour watching over them.

Hardware contention is trivial for two CPUs, requires slight thought
for three (but is still trivial), and can go wrong for four. Contention
issues for N CPUs scale as N(N-1), which becomes non-trivial for N>4 -
at least if you want to gain some performance from adding the extra
CPU.
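
As a back-of-envelope illustration - the N(N-1) count below is just the
ordered pairs of cores that could fight over a shared resource, nothing
specific to any real chip:

/* Rough illustration of how potential contention pairings grow. */
#include <stdio.h>

int main(void)
{
    for (int n = 2; n <= 8; n++)
        printf("%d cores -> %d potential contention pairings\n",
               n, n * (n - 1));
    return 0;
}

Two cores give 2 pairings, three give 6, four give 12 - which is about
where hand-managing it stops being trivial.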
>
>>
>> Mickeysoft can barely cope with programming on 4 cores.
>
> That's because they are trying to run Windows on more cores, and
> worse, often trying to distribute one computational problem among
> multiple cores. That is insane.

Distributing some computational problems across multiple cores is the
only way to get them done fast enough. That is how just about all the
ray tracing engines in fancy graphics cards do it. SIMD.
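
A minimal sketch of what that distribution looks like in software,
assuming nothing about any particular engine - shade() and the image
size are placeholders, and the split uses plain pthreads rather than
real SIMD hardware:

/* Split a per-pixel workload (e.g. ray tracing) across cores. */
#include <pthread.h>
#include <stdlib.h>

#define WIDTH   640
#define HEIGHT  480
#define NCORES  4

static float image[HEIGHT][WIDTH];

/* Placeholder for the real per-pixel work. */
static float shade(int x, int y) { return (float)(x ^ y); }

struct slice { int row0, row1; };

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (int y = s->row0; y < s->row1; y++)     /* each thread owns   */
        for (int x = 0; x < WIDTH; x++)         /* a band of rows:    */
            image[y][x] = shade(x, y);          /* no sharing, no locks */
    return NULL;
}

int main(void)
{
    pthread_t tid[NCORES];
    struct slice s[NCORES];

    for (int i = 0; i < NCORES; i++) {
        s[i].row0 = i * HEIGHT / NCORES;
        s[i].row1 = (i + 1) * HEIGHT / NCORES;
        pthread_create(&tid[i], NULL, worker, &s[i]);
    }
    for (int i = 0; i < NCORES; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

Each worker owns its own band of rows, so there is nothing to lock and
nothing for the caches to fight over. The trouble starts when the
sub-tasks are not independent like this.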

Where it gets tricky is when you split up a complex task and do not
understand what you are doing. All too common in software these days and
also pretty common in hardware.

Regards,
Martin Brown
From: John Larkin on
On Thu, 10 Jun 2010 21:15:45 +0100, Martin Brown
<|||newspam|||@nezumi.demon.co.uk> wrote:

>On 10/06/2010 19:32, John Larkin wrote:
>> On Thu, 10 Jun 2010 18:16:59 +0100, Martin Brown
>> <|||newspam|||@nezumi.demon.co.uk> wrote:
>>
>>> On 10/06/2010 16:52, John Larkin wrote:
>>>> On Thu, 10 Jun 2010 06:56:56 -0700 (PDT), MooseFET
>>>> <kensmith(a)rahul.net> wrote:
>>>>
>>>>> On Jun 1, 11:07 am, John Larkin
>>>>> <jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
>>>>>> http://online.wsj.com/article/SB1000142405274870340660457527867166190...
>>>>>>
>>>>>> John
>>>>>
>>>>> 50 seems an odd number. I would expect a power of 2 or a power of 3
>>>>> number of cores.
>>>>
>>>> Maybe they did 64 and only get 50 to work?
>>>
>>> From what limited technical info has leaked out it seems the chips are
>>> 32 core so heaven knows where this 50 core number comes from. Perhaps a
>>> pair of the prototype 32 core chips have 50 working cores between them.
>>>
>>> The whole thing appears to be PR fluff for Wall Street and investors to
>>> drool over - there is very little about it on their website.
>>>>
>>>>>
>>>>> The power of 2 number is just because things tend to be doubled and
>>>>> doubled etc.
>>>>>
>>>>> The power of 3 number is because if you imagine a hypercube
>>>>> like arrangement where each side is a bus for communication
>>>>> directly between cores, it makes sense to have 3 processors
>>>>> on a bus because while A and B are talking, C can't be having
>>>>> a conversation with either. This would allow the array or cores
>>>>> to get information quickly between themselves. It assumes
>>>>> that they each have a cache that the transfer works to sync.
>>>>>
>>>>> At some point, adding more of the same cores stops working
>>>>> as well as adding some special purpose hardware to a fraction
>>>>> of the cores.
>>>>>
>>>>> Not every core needs to be able to do a floating point at all.
>>>>> Some would be able to profit from a complex number ALU
>>>>> or perhaps a 3D alu.
>>>>>
>>>>> Chances are, one core would get stuck with the disk I/O etc
>>>>> that core would profit from having fast interrupt times. The
>>>>> others less so.
>>>>
>>>> Eventually we'll have a CPU as every device driver, and a CPU for
>>>> every program thread, with real execution protection. No more buffer
>>>> overflow exploits, no more crashed OSs, no more memory leaks.
>>>
>>> And replace them with horrendous memory contention and cache coherency
>>> problems - great!
>>
>> Synchronous logic. Hardware semaphores. Absolute hardware protections.
>> OS that shares nothing with apps. Bulletproof.
>
>Absolute hardware protection can be done on one CPU with segmented
>architecture and a viciously defensive TLB. Even better if you use
>Harvard architecture which for obvious reasons prevents data execution.
>
>If your multi-CPUs share a common flat address space as is currently in
>vogue any protection your separate physical cores offer is largely
>illusory. You would be better off with virtual CPUs and a tiny
>hypervisor with slightly paranoid behaviour watching over them.

I was thinking that each would have an MMU (a real MMU, with serious
privilege categories, not an Intel toy) that was controlled by the OS
CPU, not by the local one. DRAM is cheap, so dump virtual memory and
make the world a better place.
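
Something like the following, purely as a sketch of the idea - the
table layout and field names are invented, and the check would of
course live in hardware, not C:

/* Hypothetical sketch: each worker core sees a small table of physical
 * regions with permission bits, and only the OS core may write it.
 * No virtual addressing, just physical regions and permissions. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

enum perm { P_NONE = 0, P_READ = 1, P_WRITE = 2, P_EXEC = 4 };

struct region_entry {
    uint64_t base;     /* physical base address of the region */
    uint64_t limit;    /* region length in bytes              */
    uint8_t  perms;    /* OR of enum perm bits                */
    uint8_t  owner;    /* core id allowed to touch the region */
};

#define NREGIONS 8
static struct region_entry mmu_table[NREGIONS];  /* written by OS core only */

/* Check an access from a worker core; in hardware this is the MMU. */
static bool access_ok(uint8_t core, uint64_t addr, uint8_t want)
{
    for (int i = 0; i < NREGIONS; i++) {
        const struct region_entry *e = &mmu_table[i];
        if (e->owner == core &&
            addr >= e->base && addr < e->base + e->limit)
            return (e->perms & want) == want;
    }
    return false;   /* no matching region: fault */
}

int main(void)
{
    /* OS core grants core 1 read/write on a 4 KB region at 0x10000. */
    mmu_table[0] = (struct region_entry){ 0x10000, 0x1000,
                                          P_READ | P_WRITE, 1 };

    printf("core 1 write @0x10004: %s\n",
           access_ok(1, 0x10004, P_WRITE) ? "ok" : "fault");
    printf("core 2 read  @0x10004: %s\n",
           access_ok(2, 0x10004, P_READ) ? "ok" : "fault");
    return 0;
}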

>
>Distributing some computational problems across multiple cores is the
>only way to get them done fast enough. That is how just about all the
>ray tracing engines in fancy graphics cards do it. SIMD.

Let a video card do that. PCs don't need speed all that much anymore.
My major speed problem is that Windows has so much overhead, and
everything slows down by *seconds* whenever Windows gets a little
confused. Multiple CPUs would fix that and, in real life, be a lot
faster. Ordinary people don't do computational fluid dynamics sort of
stuff.

John

From: Paul Keinanen on
On Thu, 10 Jun 2010 21:15:45 +0100, Martin Brown
<|||newspam|||@nezumi.demon.co.uk> wrote:

>Absolute hardware protection can be done on one CPU with segmented
>architecture and a viciously defensive TLB. Even better if you use
>Harvard architecture which for obvious reasons prevents data execution.
>
>If your multi-CPUs share a common flat address space as is currently in
>vogue any protection your separate physical cores offer is largely
>illusory. You would be better off with virtual CPUs and a tiny
>hypervisor with slightly paranoid behaviour watching over them.

If you are sharing the same RAM chips between multiple cores, you are
still going to end up with a single (physical) address space.

Execution prevention, as well as read-only data pages, has been done by
TLBs in mid-1970s minicomputers, so this is not really anything new.

Of course, in a multi-core system each core must have its own TLBs
and must have a trusted method to set up these TLBs.

Having separate TLBs for each core is not so bad, since even now some
architectures have the TaskId as part of the virtual address, so a
full TLB reload is not required during task switching.
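
For illustration, a tagged TLB entry might look roughly like this -
field widths and names are made up, not taken from any particular
machine:

/* TLB entry tagged with a task/address-space id (ASID), so a task
 * switch only changes the current ASID instead of flushing the TLB. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

struct tlb_entry {
    uint32_t vpn;       /* virtual page number                       */
    uint32_t pfn;       /* physical frame number                     */
    uint16_t asid;      /* task id baked into the match              */
    bool     valid;
    bool     no_exec;   /* execution prevention, as on the old minis */
    bool     read_only;
};

#define TLB_SIZE 64
static struct tlb_entry tlb[TLB_SIZE];
static uint16_t current_asid;      /* updated on task switch, no flush */

static bool tlb_lookup(uint32_t vpn, uint32_t *pfn)
{
    for (int i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].asid == current_asid &&
            tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;       /* miss: refill from the page tables */
}

int main(void)
{
    uint32_t pfn;

    tlb[0] = (struct tlb_entry){ .vpn = 0x12, .pfn = 0x345, .asid = 7,
                                 .valid = true };
    current_asid = 7;
    printf("task 7, vpn 0x12: %s\n",
           tlb_lookup(0x12, &pfn) ? "hit" : "miss");

    current_asid = 8;              /* task switch: no flush, just retag */
    printf("task 8, vpn 0x12: %s\n",
           tlb_lookup(0x12, &pfn) ? "hit" : "miss");
    return 0;
}

A task switch then just changes current_asid; entries belonging to
other tasks simply stop matching.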

From: MooseFET on
On Jun 10, 11:52 pm, John Larkin
<jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
> On Thu, 10 Jun 2010 06:56:56 -0700 (PDT), MooseFET
>
> <kensm...(a)rahul.net> wrote:
> >On Jun 1, 11:07 am, John Larkin
> ><jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
> >>http://online.wsj.com/article/SB1000142405274870340660457527867166190...
>
> >> John
>
> >50 seems an odd number. I would expect a power of 2 or a power of 3
> >number of cores.
>
> Maybe they did 64 and only get 50 to work?
>
>
>
>
>
> >The power of 2 number is just because things tend to be doubled and
> >doubled etc.
>
> >The power of 3 number is because if you imagine a hypercube
> >like arrangement where each side is a bus for communication
> >directly between cores, it makes sense to have 3 processors
> >on a bus because while A and B are talking, C can't be having
> >a conversation with either. This would allow the array or cores
> >to get information quickly between themselves. It assumes
> >that they each have a cache that the transfer works to sync.
>
> >At some point, adding more of the same cores stops working
> >as well as adding some special purpose hardware to a fraction
> >of the cores.
>
> >Not every core needs to be able to do a floating point at all.
> >Some would be able to profit from a complex number ALU
> >or perhaps a 3D alu.
>
> >Chances are, one core would get stuck with the disk I/O etc
> >that core would profit from having fast interrupt times. The
> >others less so.
>
> Eventually we'll have a CPU as every device driver, and a CPU for
> every program thread, with real execution protection. No more buffer
> overflow exploits, no more crashed OSs, no more memory leaks.

Multiple cores will be able to do all of those things and more.
There will be a large shared memory space to allow great
gobs of data to be handed back and forth. This will be where
one CPU can step on the output of another as it is being handed
off to the 3rd and 4th. When running the multi-core version of
Windows-9, there will still be crashes and the computer will still
be just fast enough to run Freecell.


Thinking about doing something like a sort on a multicore machine
with caches on each core has started me thinking about a bit of
code I wrote a long time ago. It sorted files up in the
megabyte size range when RAM was restricted to 48K of free
space. The trick to making it go fast is to sort chunks that will
fit into memory and then do a merge operation on the sorted
chunks. I nested the merge operation partly within the sort to
save one level of read-process-write.
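
Roughly this shape, shown here on an in-memory array for brevity -
CHUNK stands in for "whatever fits in 48K", and the original of course
worked on files and nested the merge into the sort:

/* Sort chunks that fit in memory, then k-way merge the sorted runs. */
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4            /* stand-in for "what fits in 48K" */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge nruns sorted runs of length CHUNK (last may be shorter). */
static void kway_merge(const int *in, int n, int nruns, int *out)
{
    int *pos = calloc(nruns, sizeof *pos);   /* cursor into each run */
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int r = 0; r < nruns; r++) {
            int start = r * CHUNK;
            int len = (start + CHUNK <= n) ? CHUNK : n - start;
            if (pos[r] < len &&
                (best < 0 ||
                 in[start + pos[r]] < in[best * CHUNK + pos[best]]))
                best = r;
        }
        out[k] = in[best * CHUNK + pos[best]++];
    }
    free(pos);
}

int main(void)
{
    int data[] = { 9, 3, 7, 1, 8, 2, 6, 4, 5, 0 };
    int n = sizeof data / sizeof data[0];
    int nruns = (n + CHUNK - 1) / CHUNK;
    int *out = malloc(n * sizeof *out);

    for (int r = 0; r < nruns; r++) {        /* sort each chunk */
        int len = (r * CHUNK + CHUNK <= n) ? CHUNK : n - r * CHUNK;
        qsort(data + r * CHUNK, len, sizeof(int), cmp_int);
    }

    kway_merge(data, n, nruns, out);         /* then merge the runs */

    for (int k = 0; k < n; k++)
        printf("%d ", out[k]);
    putchar('\n');
    free(out);
    return 0;
}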
>
> John

From: John Larkin on
On Fri, 11 Jun 2010 08:46:27 +0300, Paul Keinanen <keinanen(a)sci.fi>
wrote:

>On Thu, 10 Jun 2010 21:15:45 +0100, Martin Brown
><|||newspam|||@nezumi.demon.co.uk> wrote:
>
>>Absolute hardware protection can be done on one CPU with segmented
>>architecture and a viciously defensive TLB. Even better if you use
>>Harvard architecture which for obvious reasons prevents data execution.
>>
>>If your multi-CPUs share a common flat address space as is currently in
>>vogue any protection your separate physical cores offer is largely
>>illusory. You would be better off with virtual CPUs and a tiny
>>hypervisor with slightly paranoid behaviour watching over them.
>
>If you are sharing the same RAM chips between multiple cores, you are
>still going to end up with a single (physical) address space.
>
>Execution prevention as well as read only data pages has been done by
>TLBs in mid 1970's minicomputers, so this is not really anything new.
>
>Of course, in a multi core system each core must have their own TLBs
>and must have a trusted method to set up these TLBs.
>
>Having separate TLBs for each core is not so bad, since even now, some
>architectures have the TaskId as part of the virtual address, thus, a
>full TLB reload is not required during task switching.
>

Right. And if you dump virtual addressing, you don't need a gigantic
number of mapping registers.

John