From: D Yuniskis on
Hi Paul,

Paul Keinanen wrote:
> On Fri, 21 May 2010 09:41:34 +0100, Martin Brown
> <|||newspam|||@nezumi.demon.co.uk> wrote:
>
>> For security I favour Harvard architecture.
>> Complete separation of code and data spaces prevents a lot of common
>> forms of attack against system integrity.
>
> The 80386 had read, write and execute bits in the segment descriptor
> register; why didn't they use these to limit access to different
> segments?

Some OS's *do*. I rely on "write protection" to support
CoW in my DSM implementation.
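A minimal sketch of copy-on-write driven by a write-protect bit. It is a toy software model: the `write_protected` flag stands in for the MMU's write-protect bit, and `cow_write()` plays the role of the write-fault handler. All names and the 64-byte page size are hypothetical. A real DSM would take an actual protection fault (e.g. via `mprotect()` and a `SIGSEGV` handler on POSIX) rather than checking a flag.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 64  /* tiny pages, for illustration only */

/* A physical frame shared by one or more mappings. */
typedef struct {
    unsigned char data[PAGE_SIZE];
    int refcount;
} frame_t;

/* A mapping with a software "write-protect" bit standing in for the MMU's. */
typedef struct {
    frame_t *frame;
    int write_protected;
} mapping_t;

/* Share src's frame with dst; both mappings become write-protected. */
void cow_share(mapping_t *dst, mapping_t *src) {
    dst->frame = src->frame;
    dst->frame->refcount++;
    dst->write_protected = 1;
    src->write_protected = 1;
}

/* Write one byte. A write to a protected page is where the fault handler
   would run: copy the frame if it is still shared, then lift protection. */
int cow_write(mapping_t *m, size_t off, unsigned char value) {
    if (off >= PAGE_SIZE) return -1;
    if (m->write_protected) {
        if (m->frame->refcount > 1) {
            /* Still shared: give this mapping its own private copy. */
            frame_t *copy = malloc(sizeof *copy);
            if (!copy) return -1;
            memcpy(copy->data, m->frame->data, PAGE_SIZE);
            copy->refcount = 1;
            m->frame->refcount--;
            m->frame = copy;
        }
        m->write_protected = 0;  /* sole owner now */
    }
    m->frame->data[off] = value;
    return 0;
}
```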

Some newer processors cut corners on their "memory protection"
schemes (basically, just giving you write protection and some
"access protection"). Many also don't implement real virtual
memory (restartable instructions, etc.).

There is a *huge* difference in "feel" when working on "cheap"
systems vs. more "full fledged" platforms. You do things
very differently and lean on the hardware a lot more to give
you extra capabilities.

> In 386 and later models, the segment registers are still there
> (usually mapped from 0..4 GiB in an overlapped way), before going
> through the virtual to physical translation.
>
>> Flat linear address spaces
>> which are in vogue now may one day be seen as a very bad idea.
>
> Even decent 1970s virtual memory minicomputers had read, write and
> execute bits on each virtual memory page.
>
> A large complex program running in a single huge address space can be
> a problem, if something is malfunctioning.
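To make the per-page protection idea concrete, here is a sketch of a permission check against a small page table. It assumes 4 KiB pages, a 16-page address space and invented names (`page_perms`, `access_ok`). Marking code pages read/execute and data pages read/write gives a flat address space some of the code/data separation Martin attributes to Harvard machines.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u  /* assume 4 KiB pages */
#define NUM_PAGES  16u  /* tiny address space for illustration */

enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* Hypothetical per-page permission table, one entry per virtual page. */
static uint8_t page_perms[NUM_PAGES];

void set_page_perms(uint32_t page, uint8_t perms) {
    if (page < NUM_PAGES) page_perms[page] = perms;
}

/* True if an access needing `need` (PERM_R/W/X, possibly combined)
   to virtual address vaddr is allowed. */
bool access_ok(uint32_t vaddr, uint8_t need) {
    uint32_t page = vaddr >> PAGE_SHIFT;
    if (page >= NUM_PAGES) return false;  /* unmapped */
    return (page_perms[page] & need) == need;
}
```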

Yes. With some machines you can hack the (cheap) memory protection
units *dynamically* -- but that requires a lot of care in how
you set up the linkage editor for the build.

It's *really* nice to be able to have the OS "ride herd" over
"tasks" (let's not debate what those are) and bring errant
tasks under control without jeopardizing the rest of the system.

It's also a nice way of doing things, since you can move things out of
the kernel, which makes them easier to maintain *and* more powerful.

> However, now that people have learned how to write multithreaded

Ha! ----------------------------^^^^^^^^^^^ :>

> programs, it should not be a huge leap for most programmers to write
> truly multitasking programs, with each program running in a separate,
> protected address space. The communication can be handled with OS
> services or with shared memory areas that share only the data that is
> meant to be shared, not everything.

I do almost everything via RPC/IPC. One big advantage is that it
makes it relatively easy to move to true multiprocessor systems
without relying on SMP/UMA. It gives you a lot more flexibility
in applying horsepower to a problem -- move tasks to different
processors to get true parallelism, etc.
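A sketch of the kind of mailbox that RPC/IPC can be built on, assuming a fixed-size ring buffer and a made-up `msg_t` layout. It is single-threaded, with no locking or blocking, both of which a real kernel would need. Because tasks share only messages, the same interface could in principle be backed by a link to another processor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MBOX_SLOTS 8u

/* Hypothetical request message; real RPC would marshal arguments. */
typedef struct {
    uint16_t sender;
    uint16_t opcode;
    int32_t  arg;
} msg_t;

typedef struct {
    msg_t  slots[MBOX_SLOTS];
    size_t head, tail, count;
} mailbox_t;

/* Copy a message into the mailbox; false if it is full. */
bool mbox_send(mailbox_t *mb, const msg_t *m) {
    if (mb->count == MBOX_SLOTS) return false;  /* caller retries or blocks */
    mb->slots[mb->tail] = *m;
    mb->tail = (mb->tail + 1) % MBOX_SLOTS;
    mb->count++;
    return true;
}

/* Take the oldest message; false if the mailbox is empty. */
bool mbox_recv(mailbox_t *mb, msg_t *out) {
    if (mb->count == 0) return false;
    *out = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MBOX_SLOTS;
    mb->count--;
    return true;
}
```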
From: D Yuniskis on
Hi Paul,

Paul Keinanen wrote:
> On Fri, 21 May 2010 06:38:01 -0700 (PDT), MooseFET
> <kensmith(a)rahul.net> wrote:
>
>> Many people don't like self modifications but seem not to
>> understand that the ability to write into a variable is just
>> as dangerous as a GOTO or self modification.
>
> One reason for the popularity of self modifying code in the old days
> was simply the lack of usable addressing modes.

I often use it as an expedient. I.e., rewrite the target of a
"call" instead of having to call *through* a pointer or via
a jump table, switch or condition tree.

It is especially useful in ISRs where every cycle saved can
actually represent a significant portion of the ISR itself.
E.g., my ISRs are little RAM-based routines that typically
invoke a set of (often unrelated) "handlers" each of which
may change from one invocation to the next. Being able to
rewrite the targets of each of these "handlers" makes dispatch
a lot slicker.
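Patching `call` targets inside RAM-resident code is not something portable C can express. A rough C analogue, sketched under that caveat, keeps the handlers in a rewritable table that the ISR walks in order; `timer_isr`, the slot count and the example handlers are hypothetical. It still pays for an indirect call per slot, which is exactly the overhead that rewriting the call instructions themselves avoids.

```c
#include <stddef.h>

#define MAX_HANDLERS 4u

typedef void (*handler_fn)(void);

/* RAM-resident list of handlers called in order; NULL slots are skipped.
   Rewriting an entry is the portable analogue of patching a CALL target. */
static volatile handler_fn isr_handlers[MAX_HANDLERS];

void isr_set_handler(size_t slot, handler_fn fn) {
    if (slot < MAX_HANDLERS) isr_handlers[slot] = fn;
}

/* Body of a hypothetical timer ISR: no switch or condition tree. */
void timer_isr(void) {
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        handler_fn fn = isr_handlers[i];
        if (fn) fn();
    }
}

/* Example handlers, for illustration only. */
static unsigned ticks, debounce_calls;
void tick_handler(void) { ticks++; }
void debounce_handler(void) { debounce_calls++; }
```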

> The lack of index registers, or poor indirect addressing support, often
> forced programmers to modify the effective address part of the
> instruction in order to step through each element of a table.
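To illustrate the addressing-mode point, here is a toy interpreter for an invented accumulator machine with no index register. The only way to step through a table is for the program to increment the address field of its own `ADD` instruction (`OP_INCADDR` below). The instruction set is made up for illustration and does not model any particular CPU.

```c
#include <stdint.h>

/* Invented instruction set: no index register, so the loop below walks
   the table by rewriting the address field of its own ADD instruction. */
enum op { OP_ADD, OP_INCADDR, OP_DJNZ, OP_HALT };

typedef struct { enum op op; uint16_t addr; } insn_t;

/* Sum n words starting at data[start]. */
int32_t sum_table(const int32_t *data, uint16_t start, uint16_t n) {
    if (n == 0) return 0;
    insn_t prog[] = {
        { OP_ADD, start },  /* 0: acc += data[prog[0].addr]          */
        { OP_INCADDR, 0 },  /* 1: prog[0].addr++ (self-modification) */
        { OP_DJNZ, 0 },     /* 2: if (--count != 0) goto 0           */
        { OP_HALT, 0 },     /* 3: stop                               */
    };
    int32_t acc = 0;
    uint16_t count = n;
    for (unsigned pc = 0;;) {
        insn_t *in = &prog[pc];
        switch (in->op) {
        case OP_ADD:     acc += data[in->addr]; pc++; break;
        case OP_INCADDR: prog[in->addr].addr++; pc++; break;
        case OP_DJNZ:    pc = (--count != 0) ? in->addr : pc + 1; break;
        case OP_HALT:    return acc;
        }
    }
}
```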
From: krw on
On Sat, 22 May 2010 14:15:01 -0700, D Yuniskis <not.going.to.be(a)seen.com>
wrote:

>Hi Paul,
>
>Paul Keinanen wrote:
>> On Fri, 21 May 2010 06:29:43 -0700 (PDT), MooseFET
>> <kensmith(a)rahul.net> wrote:
>>
>>> On May 20, 8:10 pm, D Yuniskis <not.going.to...(a)seen.com> wrote:
>>>> MooseFET wrote:
>>>>> On May 20, 9:59 am, D Yuniskis <not.going.to...(a)seen.com> wrote:
>>>>> [...]
>>>>>> And I could never wrap my head around the i432. Just too "radical"
>>>>>> for my sensibilities (at the time), I guess... :<
>>>>> Each instruction of the 432 was basically like this:
>>>>> if (I am allowed to) Memory[AddressingTable[i]]
>>>>> = Memory[AddressingTable[j]]
>>>>> + Memory[AddressingTable[k]]
>>>>> There was no way to directly address the memory. The machine always
>>>>> had to take the time to look up the address where the object was and
>>>>> check to see if this instruction is allowed on this object.
>>>> Yes, they tried to make everything an "object". It was too
>>>> wacky thinking for the time. I suspect it would still be too
>>>> inefficient even with today's technology.
>>
>> Current virtual memory machines have dedicated translation buffers,
>> these TLBs work just like caches. In normal operation, most virtual to
>> physical address translations are done in TLBs; only occasionally
>> is there a miss, requiring a new table entry to be loaded
>> from main memory.
>
>Yes, but even two-level TLBs have relatively few entries,
>and a single entry covers a significant amount of memory.
>I think the 432 dealt with much smaller "objects" than
>"physical pages".

IIRC, the '432 had a flat memory model, single-level store, much like IBM's FS
and AS/400. Every byte, no matter where it was (or on what device), had a flat
address. So, an object was a byte (pretty sure a '432 byte = 8 bits). It's
been too long...
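MooseFET's pseudocode for an i432 operation, quoted above, can be sketched in C roughly as follows. This is a heavy simplification: operands are indices into a per-context object table whose entries hold a location and a rights mask, and the names and rights encoding are hypothetical, not the real i432 descriptor format. A single add costs three table lookups and three rights checks before any arithmetic happens, which is the overhead being debated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rights encoding. */
enum { RIGHT_READ = 1, RIGHT_WRITE = 2 };

typedef struct {
    double  *addr;    /* where the object currently lives */
    uint8_t  rights;  /* what this context may do with it */
} obj_entry_t;

/* Object[dst] = Object[a] + Object[b], only if every lookup and rights
   check passes; operands are table indices, never raw addresses. */
bool obj_add(obj_entry_t *table, unsigned ntable,
             unsigned dst, unsigned a, unsigned b) {
    if (dst >= ntable || a >= ntable || b >= ntable) return false;
    if (!(table[a].rights & RIGHT_READ) || !(table[b].rights & RIGHT_READ))
        return false;
    if (!(table[dst].rights & RIGHT_WRITE)) return false;
    *table[dst].addr = *table[a].addr + *table[b].addr;
    return true;
}
```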

>> In the same way, the i432 addressing could work quite effectively
>> these days.
>>
>> With suitable caching, the TMS9900 style "register set in memory"
>> would also be quite effective these days.
>
>I think it depends on how many register sets you end up using.
>I.e., one tends to think of a register set in the context of
>a thread state. However, I could easily see compilers
>using this feature to implement small stack frames (I bet
>most blocks could benefit from a single "register context"!).
>If that's the case, you'll frequently have cache misses
>as you enter new blocks, etc. (of course, you will win
>*while* in that context... but I am not sure what the overall
>cost would be for those misses vs. having a register file
>*in* the processor.)
>
>Sounds like a good (safe) "senior project" :>
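A sketch of the TMS9900-style "register set in memory" under discussion, assuming a 64 KiB memory modelled as an array of 16-bit words. The CPU keeps only a workspace pointer (`wp`), and register Rn is the word at byte address `wp + 2n`. Switching context just loads a new `wp`; on the real TMS9900, `BLWP` additionally saves the old WP, PC and status into R13 to R15 of the new workspace, which is omitted here. Function names are hypothetical. Every register access is a memory access, which is where the cache-miss concern comes from.

```c
#include <stdint.h>

static uint16_t mem[32768];  /* 64 KiB as 16-bit words */
static uint16_t wp;          /* byte address of the current workspace */

/* Register Rn lives in memory at byte address wp + 2n. */
static uint16_t *reg(unsigned n) {
    return &mem[(uint16_t)(wp + 2u * (n & 15u)) >> 1];
}

/* "Add words": Rd += Rs (register-direct operands only in this sketch). */
void op_add(unsigned rd, unsigned rs) { *reg(rd) += *reg(rs); }

/* Context switch: load a new WP. Nothing is saved, because the old
   registers already live in memory. Returns the previous WP. */
uint16_t switch_workspace(uint16_t new_wp) {
    uint16_t old = wp;
    wp = new_wp;
    return old;
}

uint16_t get_reg(unsigned n) { return *reg(n); }
void set_reg(unsigned n, uint16_t v) { *reg(n) = v; }
```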
From: Paul Keinanen on
On Sat, 22 May 2010 07:17:13 -0700 (PDT), MooseFET
<kensmith(a)rahul.net> wrote:

>
>The 432 would still need an extra cycle or it would have to have a
>huge amount of extra hardware to make the translated version of the
>address table.
>
>The number of elements in the translation table would be far higher
>than the number of elements in the TLBs. Virtual memory works with
>pages much larger than just one floating point value. This would
>make the accesses slower because the speed of memory tends to fall
>as the sqrt() of the size.

It depends how you define an "object".

If every array element is a separate object with its own object
descriptor, the TLBs would be huge.

However, if some kind of AddressTable[object]+offset addressing is
supported, it would make sense to make each array and structure a single
object and use the offset to access the individual elements.
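A minimal sketch of `AddressTable[object]+offset` addressing with one descriptor per array or structure, assuming a byte-granular base/length descriptor (`descriptor_t` and `load_byte` are hypothetical names). The bounds check on the offset is what keeps an overrun from spilling into a neighbouring object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one entry per array or structure. */
typedef struct {
    uint8_t *base;    /* current location of the object */
    size_t   length;  /* size in bytes */
} descriptor_t;

/* Load the byte at table[obj].base + offset; bad object numbers and
   out-of-range offsets are rejected instead of touching other objects. */
bool load_byte(const descriptor_t *table, size_t ntable,
               size_t obj, size_t offset, uint8_t *out) {
    if (obj >= ntable || offset >= table[obj].length) return false;
    *out = table[obj].base[offset];
    return true;
}
```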

To save TLB entries, even a function's local variables could be treated
as a structure and handled as a single object. The compiler could use
discretion here: it could assign a separate object to each individual
variable, e.g. during debugging or when security requirements are high,
or alternatively put all local elementary values into the same structure
to speed up execution.

Such a mixed model would have quite good security, and dynamic memory
fragmentation and garbage collection problems could be avoided by
updating the address table and copying the data for a specific dynamic
memory element.
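A sketch of how updating the address table lets the heap be compacted, assuming programs hold only handles (indices into `handle_tab`) and never raw pointers. The bump allocator, sizes and names are illustrative. Moving a block is a `memmove()` plus a one-entry table update, so nothing else in the program has to change; a real system would also have to ensure no pointer obtained from `h_ptr()` is held across a compaction.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE   256u
#define MAX_HANDLES 16u

/* Hypothetical handle table entry: where a block lives and how big it is. */
typedef struct { size_t off, len; int live; } handle_t;

static unsigned char heap[HEAP_SIZE];
static handle_t handle_tab[MAX_HANDLES];
static size_t heap_top;  /* bump pointer: next free byte */

/* Returns a handle, or -1 if out of space/handles or len == 0. */
int h_alloc(size_t len) {
    if (len == 0 || len > HEAP_SIZE - heap_top) return -1;
    for (int h = 0; h < (int)MAX_HANDLES; h++) {
        if (!handle_tab[h].live) {
            handle_tab[h] = (handle_t){ heap_top, len, 1 };
            heap_top += len;
            return h;
        }
    }
    return -1;
}

void h_free(int h) {
    if (h >= 0 && h < (int)MAX_HANDLES) handle_tab[h].live = 0;
}

unsigned char *h_ptr(int h) { return &heap[handle_tab[h].off]; }

/* Slide live blocks down, in address order, to close the gaps. */
void h_compact(void) {
    size_t dst = 0;
    for (;;) {
        int next = -1;  /* live block with the lowest offset >= dst */
        for (int h = 0; h < (int)MAX_HANDLES; h++)
            if (handle_tab[h].live && handle_tab[h].off >= dst &&
                (next < 0 || handle_tab[h].off < handle_tab[next].off))
                next = h;
        if (next < 0) break;
        memmove(&heap[dst], &heap[handle_tab[next].off], handle_tab[next].len);
        handle_tab[next].off = dst;  /* the only "pointer" that changes */
        dst += handle_tab[next].len;
    }
    heap_top = dst;
}
```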

Of course, if you want a truly object oriented computer, you would
need a few extra bits on each byte for the type tag :-).

From: MooseFET on
On May 22, 11:27 pm, Paul Keinanen <keina...(a)sci.fi> wrote:
> On Sat, 22 May 2010 07:17:13 -0700 (PDT), MooseFET
> <kensm...(a)rahul.net> wrote:
>
> >The 432 would still need an extra cycle or it would have to have a
> >huge amount of extra hardware to make the translated version of the
> >address table.
>
> >The number of elements in the translation table would be far higher
> >than the number of elements in the TLBs.  Virtual memory works with
> >pages much larger than just one floating point value.  This would
> >make the accesses slower because the speed of memory tends to fall
> >as the sqrt() of the size.
>
> It depends how you define an "object".
>
> If every array element is a separate object with its own object
> descriptor, the TLBs would be huge.

The 432 did this. Every object needed a table entry and a single
floating point number was an object.

> However, if some kind of AddressTable[object]+offset addressing is
> supported, it would make sense to make each array and structure a single
> object and use the offset to access the individual elements.

That is sort of what the 286 was doing. The 432 wouldn't hear of
it. If you make segments and put different things in different
segments, then nothing can be written off the end of one segment
and into another.

> To save TLB entries, even a function's local variables could be treated
> as a structure and handled as a single object. The compiler could use
> discretion here: it could assign a separate object to each individual
> variable, e.g. during debugging or when security requirements are high,
> or alternatively put all local elementary values into the same structure
> to speed up execution.

As soon as you start letting the compiler decide to combine things, you
are trusting software for the security. Page-based protection also does
this. There is no advantage to segmentation other than allowing things
to be packed tighter than a page. Page-based protection wins out because
the extra memory doesn't cost much and the simpler hardware runs faster.

> Such a mixed model would have quite good security, and dynamic memory
> fragmentation and garbage collection problems could be avoided by
> updating the address table and copying the data for a specific dynamic
> memory element.

If you just allow translation of page addresses, by translating the
upper bits of the address, you gain almost all of the benefit of
avoiding fragmentation. This still makes for simple hardware while
letting you change the logical position of memory sections.
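A sketch of translating only the upper address bits, assuming 1 KiB pages and a 64-entry page table (sizes and names are hypothetical). A logically contiguous section can sit in scattered physical frames, and relocating it only means editing table entries, which is where most of the fragmentation benefit comes from.

```c
#include <stdint.h>

#define PAGE_BITS  10u                        /* assume 1 KiB pages */
#define PAGE_MASK  ((1u << PAGE_BITS) - 1u)
#define NUM_VPAGES 64u                        /* 64 KiB logical space */

/* Hypothetical page table: logical page number -> physical frame number. */
static uint16_t page_table[NUM_VPAGES];

/* Only the upper bits are translated; the offset passes through unchanged.
   Bits above the 64 KiB logical space are ignored by the modulo. */
uint32_t translate(uint32_t logical) {
    uint32_t vpn = (logical >> PAGE_BITS) % NUM_VPAGES;
    return ((uint32_t)page_table[vpn] << PAGE_BITS) | (logical & PAGE_MASK);
}

/* Moving a section's physical placement means editing entries only. */
void map_page(uint32_t vpn, uint16_t frame) {
    if (vpn < NUM_VPAGES) page_table[vpn] = frame;
}
```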

If your version of malloc() uses a smart version of the first fit
and last fit methods, you get very little fragmentation of memory.
Simple first fit does quite well. Adding the option of putting some
things as high in memory as possible gives a small further
improvement, if the things put in high memory are well chosen.
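A sketch of combining first fit from the bottom with a "last fit" from the top, assuming a heap of 64 fixed-size units tracked by a `used[]` array. The "smart version" mentioned above is not specified, so this is only the plain form. The idea is that long-lived allocations go high via `alloc_high()`, keeping them out of the way of short-lived churn at the bottom.

```c
#include <stdbool.h>
#include <stddef.h>

#define UNITS 64u  /* heap measured in fixed-size allocation units */

static bool used[UNITS];

/* Mark n units starting at `start` as used and return start. */
static int claim(size_t start, size_t n) {
    for (size_t i = 0; i < n; i++) used[start + i] = true;
    return (int)start;
}

/* First fit: the lowest run of n free units, or -1. */
int alloc_low(size_t n) {
    size_t run = 0;
    for (size_t i = 0; i < UNITS; i++) {
        run = used[i] ? 0 : run + 1;
        if (n > 0 && run == n) return claim(i + 1 - n, n);
    }
    return -1;
}

/* "Last fit": the highest run of n free units, or -1. */
int alloc_high(size_t n) {
    size_t run = 0;
    for (size_t i = UNITS; i-- > 0;) {
        run = used[i] ? 0 : run + 1;
        if (n > 0 && run == n) return claim(i, n);
    }
    return -1;
}

void release(int start, size_t n) {
    for (size_t i = 0; i < n; i++) used[(size_t)start + i] = false;
}
```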