From: Peter J. Holzer on 1 Aug 2010 07:40

On 2010-07-31 23:40, Ilya Zakharevich <nospam-abuse(a)ilyaz.org> wrote:
> On 2010-07-31, Peter J. Holzer <hjp-usenet2(a)hjp.at> wrote:
>>> E.g., Generally speaking, on "well-designed" 32-bit architecture, one
>>> would be able to address 8GB of memory (4GB of data, and 4GB of
>>> code).
>
>> And we could split off the stack into a different segment, too and
>> then address 12 GB of memory.
>
> Not with C.

I used to think so, but it's not quite true.

> The same subroutine should accept stack data pointers and
> heap data pointers.

Yes, but

* automatic variables don't have to be on the "hardware stack".
* only those variables which actually are accessed via pointer need to
  be in a pointer-accessible space.

So a compiler could put stuff like

* return addresses
* non-array automatic variables which don't have their address taken
* function arguments and return values
* temporary variables

into the stack segment, and automatic variables which do have their
address taken into the data segment. Among other things this means
that return addresses are not accessible with a pointer and can't be
overwritten by a buffer overflow. It also means that the size of the
stack segment will almost always be very small (arrays will (almost)
never be there) and that function call and return are more expensive
(you need to maintain a second "stack"). So I'm not sure whether the
advantages outweigh the disadvantages.

But that's moot. I don't expect any new segmented architectures, and
existing ones are either obsolete or used in "flat" mode.

>> the code also much, much smaller than 4GB
>
> This is not what I experience with my system (which has less than 4GB
> memory, though). The monsters like Mozilla take more text memory than
> data memory (unless one loads a LOT of HTML into the browser).

Mozilla is a monster, but it still uses only about 40 MB of code
memory, which is about 1% of 4 GB:

    % perl -e 'while (<>) { my ($b, $e, $p) = /^(\w+)-(\w+) (\S+)/;
               $s = hex($e) - hex($b); $s{$p} += $s }
               for (keys %s) { printf "%s %9d\n", $_, $s{$_} / 1024 }' \
           /proc/18752/maps | sort -n -k 2
    ---p        64
    rwxp       192
    r--s       496
    rw-s       768
    r-xp     40656
    r--p     98524
    rw-p    279960

(this is firefox 3.6 running on 32bit linux for about a day)

So if you moved that code into a different segment, you could use 4GB
instead of 3.96GB for data. Doesn't seem like much of an improvement
(especially if you consider that on most 32-bit OSs the address space
is limited to 2 or 3 GB anyway - lifting that limit would have a much
larger effect).

>> I see the smiley but I'd like to clarify for our young readers that
>> 32bit Linux uses near pointers. On the 386, a far pointer would be 48
>> bits.
>
> ... but only if you round up to a multiple of 8bit; otherwise 46bit.
>
>>> [*] AFAIK, Solaris (tries to?) separate code from data AMAP. On
>>> Solaris/i386, are they in different segments?
>
>> I don't think so. Sun likes Java. Java uses JIT compilers. JIT
>> compilers and separated address spaces for code and data don't mesh
>> well.
>
> Do not see how this would be related.

A JIT compiler needs to generate executable code which is immediately
executed by the same process. This is hard to do if the JIT compiler
can't put the code into a place where it can be executed.

> AFAI suspect (basing on the sparse info I have seen), the only way to
> load code on solaris is to write an executable module on disk, and
> dlopen() it.

That would absolutely kill the performance of a JIT compiler.
If Solaris/x86 uses separate code and data segments (which I doubt)
then there is probably some way (maybe with mmap) to map a region of
memory into both the data and the code segment. More likely they use a
common address space and just use mprotect to prevent execution of
data which isn't meant to be code.

        hp
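The mmap+mprotect pattern described in this post can be made concrete
in a few lines of C. The following is a minimal sketch, assuming a
POSIX system with MAP_ANONYMOUS and an x86/x86-64 CPU (where the byte
0xC3 is a RET instruction); it is not Solaris's actual loader
mechanism, just the generic technique a JIT compiler relies on.
Hardened kernels with strict W^X policies may refuse the final
mprotect.

    /* Minimal JIT-style sketch: map a page writable, emit one machine
     * instruction into it, flip the page to executable, and call it.
     * Assumes POSIX mmap/mprotect and an x86 or x86-64 CPU. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = (size_t)sysconf(_SC_PAGESIZE);  /* one page */
        unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        buf[0] = 0xC3;                  /* x86 RET: the whole "program" */

        /* Flip the page from writable to executable (W^X style). */
        if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
            perror("mprotect");
            return 1;
        }

        /* Converting an object pointer to a function pointer is not
         * ISO C, but it is the usual POSIX idiom for this. */
        void (*fn)(void) = (void (*)(void))buf;
        fn();                           /* runs the generated code */
        puts("generated code executed and returned");

        munmap(buf, len);
        return 0;
    }

The key point is the order of operations: the page is never writable
and executable at the same time. This is exactly what a single flat
address space with per-page protection allows, and what a
write-to-disk-then-dlopen() round trip would make painfully slow.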
From: Peter J. Holzer on 2 Aug 2010 05:16

On 2010-08-01 23:13, Ilya Zakharevich <nospam-abuse(a)ilyaz.org> wrote:
> On 2010-08-01, Peter J. Holzer <hjp-usenet2(a)hjp.at> wrote:
>> Mozilla is a monster, but it still uses only about 40 MB of code
>> memory, which is about 1% of 4 GB:
>
> I suspect your system has 4K virtual address space granularity.

Yes.

> Mine has 64K.

So that would increase the average internal fragmentation per code
region from 2 kB to 32 kB (half the granularity - of course that
depends on the size distribution, but it's good enough for a back of
the envelope calculation). On Linux, Firefox maps 132 code regions
into memory (the GNOME people have a serious case of
shared-libraryritis). So that's 132 * (32 kB - 2 kB) = 3960 kB, or
about 4 MB more. Noticeable, but probably less than the effects of
other differences between OS/2 and Linux.

> What is important is the ratio of data/text.

No. What is important is the ratio between code and the usable address
space.

> In your case, it is less than 10. (With more memory, you run more of
> OTHER monsters. ;-)

Yes, but those other monsters get their own virtual address space, so
they don't matter in this discussion.

>> instead of 3.96GB for data. Doesn't seem like much of an improvement
>> (especially if you consider that on most 32-bit OSs the address space
>> is limited to 2 or 3 GB anyway - lifting that limit would have a much
>> larger effect).
>
> No, the effect would be the opposite: 40M/2G is LARGER than 40M/4G. ;-)

No, you misunderstood. If you now have an address space of 2 GB for
code+data, and you move the code to a different segment, you win 40 MB
for data. But if the OS is changed to give each process a 4 GB address
space, then you win 2 GB, which is a lot more than 40 MB.

        hp
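Anyone who wants to redo this back-of-the-envelope estimate for their
own system can use the following small C sketch (the same idea as the
Perl one-liner in the earlier post): it counts the r-xp mappings of a
process and prices the jump from 4 kB to 64 kB granularity at roughly
half a page of extra internal fragmentation per region. It assumes the
Linux /proc/<pid>/maps format; pass a PID as the first argument, or
run it without arguments to inspect itself.

    /* Count executable (r-xp) mappings of a process and estimate the
     * extra internal fragmentation at 64 kB page granularity compared
     * to 4 kB, i.e. (32 - 2) kB per region on average. */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char path[64], line[512];
        snprintf(path, sizeof path, "/proc/%s/maps",
                 argc > 1 ? argv[1] : "self");
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return 1; }

        long regions = 0;
        while (fgets(line, sizeof line, f)) {
            unsigned long b, e;
            char perms[5];
            /* maps lines look like: b7f00000-b7f21000 r-xp ... */
            if (sscanf(line, "%lx-%lx %4s", &b, &e, perms) == 3
                && strcmp(perms, "r-xp") == 0)
                regions++;
        }
        fclose(f);

        printf("%ld code regions, "
               "extra fragmentation at 64 kB pages: ~%ld kB\n",
               regions, regions * (32 - 2));
        return 0;
    }

With 132 regions this prints ~3960 kB, matching the figure above.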
From: Peter J. Holzer on 3 Aug 2010 12:44
On 2010-08-02 21:19, Ilya Zakharevich <nospam-abuse(a)ilyaz.org> wrote:
> On 2010-08-02, Peter J. Holzer <hjp-usenet2(a)hjp.at> wrote:
>>> What is important is the ratio of data/text.
>>
>> No. What is important is the ratio between code and the usable
>> address space.
>
> I see (below) that we discuss different scenarios.
>
>>> In your case, it is less than 10. (With more memory, you run more of
>>> OTHER monsters. ;-)
>
>> Yes, but those other monsters get their own virtual address space, so
>> they don't matter in this discussion.
>
> They do on OS/2: the DLL's-related memory is loaded into shared
> address region. (This way one does not need any "extra"
> per-process-context patching or redirection of DLL address accesses.)

Sounds a bit like the pre-ELF shared library system in Linux. Of
course that was designed when 16 MB was a lot of RAM and abandoned
when 128 MB became normal for a server (but then I guess the same is
true for OS/2).

I'd still be surprised if anybody ran an application mix on OS/2 where
the combined code size of all DLLs exceeds 1 GB. Heck, I'd be
surprised if anybody did it on Linux. (With code I really mean code -
many systems put read-only data into the text segment of an
executable, but you couldn't move that to a different address space,
so it doesn't count here.)

>> No, you misunderstood. If you now have an address space of 2 GB for
>> code+data, and you move the code to a different segment, you win
>> 40 MB for data. But if the OS is changed to give each process a 4 GB
>> address space, then you win 2 GB, which is a lot more than 40 MB.
>
> I do not see how one would lift this limit (without a segmented
> architecture ;-).

If you can move code to a different segment you obviously have a
segmented architecture. But even without ...

> I expect that (at least) this would make context switch majorly
> costlier...

I don't see why the kernel should need a large address space in the
same context as the running process.

When both the size of physical RAM and the maximum VM of any process
could realistically be expected to be much smaller than 4GB, a fixed
split between user space and kernel space (traditionally 2GB + 2GB in
Unix, but 3GB + 1GB in Linux) made some sense: within a system call,
the kernel could access the complete address space of the calling
process and the complete RAM without fiddling with page tables. But
when physical RAM exceeded the kernel space, that was no longer
possible anyway, so there was no longer a reason to reserve a huge
part of the address space of each process for the kernel.

But of course making large changes for a factor of at most 2 doesn't
make much sense in a world governed by Moore's law, and anybody who
needed the space moved to 64 bit systems anyway.

        hp
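The user/kernel split discussed above is easy to observe directly.
The following sketch (assuming Linux) prints representative code,
data, heap and stack addresses; on a stock 32-bit kernel with the
traditional 3GB + 1GB split, all of them come out below 0xC0000000,
the start of the reserved kernel region. On a 64-bit build the
addresses are simply much larger.

    /* Print where code, data, heap and stack live in this process,
     * to make the user/kernel address space split concrete. */
    #include <stdio.h>
    #include <stdlib.h>

    int global;                          /* data segment */

    int main(void)
    {
        int local;                       /* stack */
        void *heap = malloc(16);         /* heap */
        printf("code:  %p\n", (void *)main);
        printf("data:  %p\n", (void *)&global);
        printf("heap:  %p\n", heap);
        printf("stack: %p\n", (void *)&local);
        free(heap);
        return 0;
    }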