From: JosephKK on 11 Jun 2010 22:12

On Fri, 11 Jun 2010 07:01:41 -0700, John Larkin
<jjlarkin(a)highNOTlandTHIStechnologyPART.com> wrote:

>On Fri, 11 Jun 2010 08:46:27 +0300, Paul Keinanen <keinanen(a)sci.fi>
>wrote:
>
>>On Thu, 10 Jun 2010 21:15:45 +0100, Martin Brown
>><|||newspam|||@nezumi.demon.co.uk> wrote:
>>
>>>Absolute hardware protection can be done on one CPU with segmented
>>>architecture and a viciously defensive TLB. Even better if you use
>>>Harvard architecture which for obvious reasons prevents data execution.
>>>
>>>If your multi-CPUs share a common flat address space as is currently in
>>>vogue any protection your separate physical cores offer is largely
>>>illusory. You would be better off with virtual CPUs and a tiny
>>>hypervisor with slightly paranoid behaviour watching over them.
>>
>>If you are sharing the same RAM chips between multiple cores, you are
>>still going to end up with a single (physical) address space.
>>
>>Execution prevention as well as read only data pages has been done by
>>TLBs in mid 1970's minicomputers, so this is not really anything new.
>>
>>Of course, in a multi core system each core must have their own TLBs
>>and must have a trusted method to set up these TLBs.
>>
>>Having separate TLBs for each core is not so bad, since even now, some
>>architectures have the TaskId as part of the virtual address, thus, a
>>full TLB reload is not required during task switching.
>
>Right. And if you dump virtual addressing, you don't need a gigantic
>number of mapping registers.
>
>John

It has been a while since I have seen a cogent thought on the subject
expressed.

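
The execution prevention and read-only data pages Keinanen mentions are the
same page-protection machinery a modern MMU exposes to user programs. Below
is a minimal C sketch of that mechanism in action, assuming a Linux/POSIX
system with mmap() and mprotect(); it is an illustration only, not code from
any poster.

    /* Illustrative sketch only, not code from the thread: the MMU/TLB
     * enforcing a read-only data page, as discussed above.  Linux/POSIX. */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void on_fault(int sig)
    {
        (void)sig;
        /* Only async-signal-safe calls in a handler. */
        const char msg[] = "SIGSEGV: hardware blocked the write to a read-only page\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);
        _exit(0);
    }

    int main(void)
    {
        long pagesz = sysconf(_SC_PAGESIZE);

        /* One page of ordinary read/write memory. */
        char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello");                 /* fine: page is still writable */

        signal(SIGSEGV, on_fault);
        if (mprotect(p, pagesz, PROT_READ) != 0) { perror("mprotect"); return 1; }

        printf("read still works: %s\n", p);
        p[0] = 'H';                         /* faults: page is now read-only */
        printf("never reached\n");
        return 0;
    }

Running it prints the successful read, then the handler fires when the write
is refused by the page tables rather than by any software check.
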
From: JosephKK on 11 Jun 2010 22:15

On Fri, 11 Jun 2010 07:19:23 -0700 (PDT), MooseFET <kensmith(a)rahul.net>
wrote:

>On Jun 11, 10:01 pm, John Larkin
><jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
>> On Fri, 11 Jun 2010 08:46:27 +0300, Paul Keinanen <keina...(a)sci.fi>
>> wrote:
>>
>> >On Thu, 10 Jun 2010 21:15:45 +0100, Martin Brown
>> ><|||newspam...(a)nezumi.demon.co.uk> wrote:
>>
>> >>Absolute hardware protection can be done on one CPU with segmented
>> >>architecture and a viciously defensive TLB. Even better if you use
>> >>Harvard architecture which for obvious reasons prevents data execution.
>>
>> >>If your multi-CPUs share a common flat address space as is currently in
>> >>vogue any protection your separate physical cores offer is largely
>> >>illusory. You would be better off with virtual CPUs and a tiny
>> >>hypervisor with slightly paranoid behaviour watching over them.
>>
>> >If you are sharing the same RAM chips between multiple cores, you are
>> >still going to end up with a single (physical) address space.
>>
>> >Execution prevention as well as read only data pages has been done by
>> >TLBs in mid 1970's minicomputers, so this is not really anything new.
>>
>> >Of course, in a multi core system each core must have their own TLBs
>> >and must have a trusted method to set up these TLBs.
>>
>> >Having separate TLBs for each core is not so bad, since even now, some
>> >architectures have the TaskId as part of the virtual address, thus, a
>> >full TLB reload is not required during task switching.
>>
>> Right. And if you dump virtual addressing, you don't need a gigantic
>> number of mapping registers.
>
>If you keep some virtual addressing, you can make the code in each core
>not need to know where its address space actually lives in the grand
>scheme of things. Instead of an address map per task, you'd have
>one per core. This would only need to apply to the portion of memory
>that is shared. The private memory would not be virtual.
>
>when you do a fork(), a big bunch of stuff needs to fly from one core's
>private space to the new one.

This would result in fork used only by fairly lightweight manager
processes.

From: JosephKK on 11 Jun 2010 22:23

On Fri, 11 Jun 2010 06:59:35 -0700 (PDT), MooseFET <kensmith(a)rahul.net>
wrote:

>On Jun 10, 11:52 pm, John Larkin
><jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
>> On Thu, 10 Jun 2010 06:56:56 -0700 (PDT), MooseFET
>> <kensm...(a)rahul.net> wrote:
>> >On Jun 1, 11:07 am, John Larkin
>> ><jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
>> >>http://online.wsj.com/article/SB1000142405274870340660457527867166190....
>>
>> >> John
>>
>> >50 seems an odd number. I would expect a power of 2 or a power of 3
>> >number of cores.
>>
>> Maybe they did 64 and only get 50 to work?
>>
>> >The power of 2 number is just because things tend to be doubled and
>> >doubled etc.
>>
>> >The power of 3 number is because if you imagine a hypercube
>> >like arrangement where each side is a bus for communication
>> >directly between cores, it makes sense to have 3 processors
>> >on a bus because while A and B are talking, C can't be having
>> >a conversation with either. This would allow the array or cores
>> >to get information quickly between themselves. It assumes
>> >that they each have a cache that the transfer works to sync.
>>
>> >At some point, adding more of the same cores stops working
>> >as well as adding some special purpose hardware to a fraction
>> >of the cores.
>>
>> >Not every core needs to be able to do a floating point at all.
>> >Some would be able to profit from a complex number ALU
>> >or perhaps a 3D alu.
>>
>> >Chances are, one core would get stuck with the disk I/O etc
>> >that core would profit from having fast interrupt times. The
>> >others less so.
>>
>> Eventually we'll have a CPU as every device driver, and a CPU for
>> every program thread, with real execution protection. No more buffer
>> overflow exploits, no more crashed OSs, no more memory leaks.
>
>Multiple cores will be able to do all of those things and more.
>There will be a large shared memory space to allow great
>gobs of data to be handed back and forth. This will be where
>one CPU can step on the output of another as it is being handed
>off to the 3rd and 4th. When running the multi-core version of
>Windows-9, there will still be crashes and the computer will still
>be just fast enough to run Freecell.
>
>Thinking about doing something like a sort on a multicore machine
>with caches on each core has started me thinking about a bit of
>code I wrote a long time ago. It was a sort of files up in the
>megabyte size range when RAM was restricted to 48K of free
>space. The trick to making it go fast is to sort chunks that will
>fit into memory and then do a merge operation on the sorted
>chunks. I nested the merge operation partly within the sort to
>save one level of read-process-write.

So you re-invented parts of merge sort, almost cool.

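
The chunk-and-merge scheme described above is essentially an external merge
sort. A minimal C sketch of that idea follows, assuming integer records and
made-up file names, chunk size, and run limit; the original program's record
format and its trick of nesting the merge inside the sort are not reproduced
here.

    /* Illustrative sketch only: external merge sort.  Pass 1 sorts
     * memory-sized chunks into temporary "run" files; pass 2 merges them. */
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 4096          /* records that fit in "RAM" at once (assumed) */
    #define MAX_RUNS 64         /* sketch limit: extra data is ignored */

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        FILE *in = fopen("unsorted.dat", "rb");
        FILE *out = fopen("sorted.dat", "wb");
        FILE *run[MAX_RUNS];
        static int buf[CHUNK];
        int nruns = 0;
        size_t n;

        if (!in || !out) { perror("fopen"); return 1; }

        /* Pass 1: sort each chunk in memory, write it out as a sorted run. */
        while ((n = fread(buf, sizeof buf[0], CHUNK, in)) > 0 && nruns < MAX_RUNS) {
            char name[32];
            qsort(buf, n, sizeof buf[0], cmp_int);
            snprintf(name, sizeof name, "run%d.tmp", nruns);
            run[nruns] = fopen(name, "wb+");
            if (!run[nruns]) { perror("fopen run"); return 1; }
            fwrite(buf, sizeof buf[0], n, run[nruns]);
            rewind(run[nruns]);
            nruns++;
        }

        /* Pass 2: naive k-way merge of the sorted runs. */
        int head[MAX_RUNS], alive[MAX_RUNS];
        for (int i = 0; i < nruns; i++)
            alive[i] = (fread(&head[i], sizeof head[i], 1, run[i]) == 1);

        for (;;) {
            int best = -1;
            for (int i = 0; i < nruns; i++)
                if (alive[i] && (best < 0 || head[i] < head[best]))
                    best = i;
            if (best < 0) break;            /* all runs exhausted */
            fwrite(&head[best], sizeof head[best], 1, out);
            alive[best] = (fread(&head[best], sizeof head[best], 1, run[best]) == 1);
        }
        return 0;                            /* temp run files left behind in this sketch */
    }

Each run only needs one record in memory during the merge, which is what lets
a 48K machine (or one core's cache) sort files far larger than itself.
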
From: MooseFET on 11 Jun 2010 23:05

On Jun 11, 11:21 pm, John Larkin
<jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
> On Fri, 11 Jun 2010 06:59:35 -0700 (PDT), MooseFET
[...]
> >Thinking about doing something like a sort on a multicore machine
> >with caches on each core has started me thinking about a bit of
> >code I wrote a long time ago. It was a sort of files up in the
> >megabyte size range when RAM was restricted to 48K of free
> >space. The trick to making it go fast is to sort chunks that will
> >fit into memory and then do a merge operation on the sorted
> >chunks. I nested the merge operation partly within the sort to
> >save one level of read-process-write.
>
> If you think of multicore as a way to get speed through parallelism,
> it will always be tough. If you are willing to waste flipflops to make
> a system brutally reliable, multicore is the way to go.
>
> I don't want more speed. I want reliability.

For some tasks, we need both. A big part of gaining speed through
multiple cores will be in picking what tasks can be spread out over
many.

Another issue is to make sure that code that lives on the far side of a
cache knows when to do a bus-locked operation and when it is safe not
to. If you are doing a non-blocking table of pointers to a bunch of
structures, you can malloc a chunk of RAM for the structure and assume
that nobody else will write into it while you make the structure. When
you go to add the structure to the shared list of pointers, however,
you had better use a "compare and swap" for that update to make sure
that nobody else does the write to the same cell.

I can also imagine a serious increase in speed in the running of Spice.
Since most subcircuits only have a few ports, a core could do the work
of all the internal nodes and then interact with the others to exchange
data.

>
> John

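
A short C11 sketch of the publish-by-compare-and-swap pattern just described:
fill in a freshly malloc'd structure in private memory, then use an atomic
compare-and-swap to install its pointer into a shared table so that two cores
cannot claim the same slot. The structure fields and table size are
illustrative assumptions, not details from the thread.

    /* Illustrative sketch only: lock-free publication of a pointer into a
     * shared table using C11 atomics. */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SLOTS 256

    struct record { int key; double value; };      /* made-up payload */

    /* Shared, visible to every core/thread.  NULL means "slot free". */
    static _Atomic(struct record *) table[SLOTS];

    /* Returns the slot index claimed, or -1 if the table is full. */
    int publish(struct record *r)
    {
        for (int i = 0; i < SLOTS; i++) {
            struct record *expected = NULL;
            /* Only the allocating thread has touched *r so far, so filling
             * it in needed no lock; the CAS is the only contended step. */
            if (atomic_compare_exchange_strong(&table[i], &expected, r))
                return i;                   /* we won the race for this slot */
            /* expected now holds the current occupant; try the next slot */
        }
        return -1;
    }

    int main(void)
    {
        struct record *r = malloc(sizeof *r);
        if (!r) return 1;
        r->key = 42;
        r->value = 3.14;                    /* private until published */
        printf("published at slot %d\n", publish(r));
        return 0;
    }

The compare-and-swap is the only operation that touches shared state, which
is what keeps the table non-blocking across cores and caches.
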
From: MooseFET on 11 Jun 2010 23:09
On Jun 12, 10:15 am, "JosephKK"<quiettechb...(a)yahoo.com> wrote:
> On Fri, 11 Jun 2010 07:19:23 -0700 (PDT), MooseFET <kensm...(a)rahul.net>
> wrote:
>
> >On Jun 11, 10:01 pm, John Larkin
> ><jjlar...(a)highNOTlandTHIStechnologyPART.com> wrote:
> >> On Fri, 11 Jun 2010 08:46:27 +0300, Paul Keinanen <keina...(a)sci.fi>
> >> wrote:
>
> >> >On Thu, 10 Jun 2010 21:15:45 +0100, Martin Brown
> >> ><|||newspam...(a)nezumi.demon.co.uk> wrote:
>
> >> >>Absolute hardware protection can be done on one CPU with segmented
> >> >>architecture and a viciously defensive TLB. Even better if you use
> >> >>Harvard architecture which for obvious reasons prevents data execution.
>
> >> >>If your multi-CPUs share a common flat address space as is currently in
> >> >>vogue any protection your separate physical cores offer is largely
> >> >>illusory. You would be better off with virtual CPUs and a tiny
> >> >>hypervisor with slightly paranoid behaviour watching over them.
>
> >> >If you are sharing the same RAM chips between multiple cores, you are
> >> >still going to end up with a single (physical) address space.
>
> >> >Execution prevention as well as read only data pages has been done by
> >> >TLBs in mid 1970's minicomputers, so this is not really anything new.
>
> >> >Of course, in a multi core system each core must have their own TLBs
> >> >and must have a trusted method to set up these TLBs.
>
> >> >Having separate TLBs for each core is not so bad, since even now, some
> >> >architectures have the TaskId as part of the virtual address, thus, a
> >> >full TLB reload is not required during task switching.
>
> >> Right. And if you dump virtual addressing, you don't need a gigantic
> >> number of mapping registers.
>
> >If you keep some virtual addressing, you can make the code in each core
> >not need to know where its address space actually lives in the grand
> >scheme of things. Instead of an address map per task, you'd have
> >one per core. This would only need to apply to the portion of memory
> >that is shared. The private memory would not be virtual.
>
> >when you do a fork(), a big bunch of stuff needs to fly from one core's
> >private space to the new one.
>
> This would result in fork used only by fairly lightweight manager
> processes.

There is also clone(), which needs the two tasks to continue to share
the same space. A fork() doesn't get the same data space, and code
space should never ever be written at this level, so I think it works
out for the best.
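
A small C sketch of the distinction being drawn here, assuming Linux/POSIX:
after fork() a child's write lands in its own copy-on-write data space, while
a thread (created with pthread_create(), which on Linux is built on clone()
with CLONE_VM) writes into the very memory the parent still sees. Variable
names and values are illustrative only.

    /* Illustrative sketch only: fork() copies the data space, a thread
     * (clone() with CLONE_VM underneath) shares it. */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int shared_counter = 0;

    static void *thread_body(void *arg)
    {
        (void)arg;
        shared_counter = 2;      /* same address space: parent will see this */
        return NULL;
    }

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }

        if (pid == 0) {
            shared_counter = 1;  /* child's copy-on-write page; parent unaffected */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("after fork:   %d\n", shared_counter);   /* prints 0 */

        pthread_t t;
        if (pthread_create(&t, NULL, thread_body, NULL) != 0) return 1;
        pthread_join(t, NULL);
        printf("after thread: %d\n", shared_counter);   /* prints 2 */
        return 0;
    }

The fork()ed child's write never reaches the parent, while the thread's write
does, which is why a fork across cores implies moving or copying the private
data space as described above.
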