From: sfuerst on 15 Mar 2010 13:05

On Mar 15, 7:36 am, Rainer Weikusat <rweiku...(a)mssgmbh.com> wrote:
> sfuerst <svfue...(a)gmail.com> writes:
>
> [...]
>
> > If you have a wide size-range, then you should know that writing a
> > general purpose allocator isn't a 200 line job. To get something
> > better than the above allocators will probably require a few
> > thousand lines or more.
>
> This should have been "implementing the malloc interface such that the
> implementation performs reasonably well for the usual benchmark cases
> isn't a 200 line job". Although it actually even might be: the Linux
> kernel uses a power-of-two freelist allocator based on the
> object-caching allocator it also uses. And 'if it is good enough for
> Linux, it will be good enough for $my_application' is usually a
> sensible proposition. But it is really much better not to try to do
> this to begin with. And for this case, a 'general purpose allocator'
> which basically avoids external fragmentation and whose allocation and
> deallocation operations are guaranteed to be fast and complete in
> constant time can be implemented in less than 200 lines of code[*],
> provided that scalability regarding concurrent allocations/
> deallocations from multiple threads is not a priority concern.

The kernel allocator needs to live in kernel-space, which places extra
constraints on it. You might want to write your 200-line userspace
allocator - there is nothing preventing you from doing it. A simple
power-of-two allocator is a nice place to start. However, I think
you'll find that the result will be 50-100% slower than the standard
glibc allocator that the OP was originally using.

Why the slowdown? The problem is that an allocation plus a free will
take O(lg n) memory references. The better allocators take O(1)
references nearly all the time. The difference is noticeable for
something that executes in as few instructions as a memory allocator.
Thus, in order to approach the speed of the competition, you'll need to
add something like fast look-aside lists.

The next problem is that the OP specifically mentioned a notoriously
difficult case for memory allocators: the producer-consumer pattern.
Preventing the producer and consumer threads from interfering with each
other isn't easy. Even a single extra cache-line bounce can kill
performance in this case. Then there is the problem of "memory
blow-up", which a few allocators unfortunately suffer from in this
situation. Your simple power-of-two allocator isn't going to work
nearly as well as you might hope in this difficult case. Benchmark, and
see. :-)

In short, these days the generic allocators are very, very good. They
use all sorts of tricks that the average programmer hasn't thought of
in order to increase speed. The only way to beat them is in specialized
situations, and even then you need to benchmark a bit to see if the
extra code you added actually helped. The only common case I have found
where a generic allocator can easily be beaten is where you are
allocating and freeing many, many objects of the same size. In that
case, it pays to have a bounded-size per-thread freelist that you check
first for objects to reuse. (This is ignoring the situations where you
can use alloca() or the new C99 variable-length arrays to allocate
things with per-function lifetimes, because there you don't need a new
allocator; just use an intrinsic part of the C language.)

Steven
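[A minimal sketch of the bounded-size per-thread freelist Steven
describes, layered on top of malloc/free. All names (obj_alloc,
obj_free, OBJ_SIZE, CACHE_MAX) and both constants are illustrative
assumptions, not taken from any existing allocator; _Thread_local needs
a C11 compiler.]

#include <stdlib.h>

#define OBJ_SIZE  64     /* the one fixed object size this cache serves */
#define CACHE_MAX 128    /* upper bound on cached objects per thread    */

struct free_node { struct free_node *next; };

static _Thread_local struct free_node *cache_head;
static _Thread_local unsigned cache_len;

void *obj_alloc(void)
{
    if (cache_head) {            /* fast path: pop from the local cache */
        struct free_node *n = cache_head;
        cache_head = n->next;
        cache_len--;
        return n;
    }
    return malloc(OBJ_SIZE);     /* slow path: fall back to malloc */
}

void obj_free(void *p)
{
    if (!p)
        return;
    if (cache_len < CACHE_MAX) { /* keep the block for reuse...         */
        struct free_node *n = p;
        n->next = cache_head;
        cache_head = n;
        cache_len++;
    } else {
        free(p);                 /* ...but bound the per-thread blow-up */
    }
}

[Because the cache is thread-local, the fast paths touch no shared
state, and the CACHE_MAX bound keeps a one-way producer/consumer flow
from growing any thread's cache without limit.]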
From: Chris Friesen on 15 Mar 2010 13:04

On 03/15/2010 08:36 AM, Rainer Weikusat wrote:
> For instance, another approach is to get a 'large' block of memory
> from the kernel and use that to satisfy allocation requests for 'small
> structures' from many subsystems which all cache their allocated
> structures internally and reuse them before allocating new structures.
> Especially in a program which isn't supposed to ever stop, every
> situation which did occur in the past will happen again in future and
> this implies that letting go of memory which has been allocated for a
> specific purpose just means 'running the risk that it won't be
> available next time'.

In systems with disk swap, not letting go of memory is no guarantee
that it will be available next time--it could be swapped out to disk,
the swap space could be full, and there could be no available memory in
the system. Unless you've mlock()'d the memory, it's not guaranteed to
be available.

I've worked on projects where we preallocated (and mlock()'d) all
memory up-front for performance/reliability reasons. This can be useful
for dedicated systems.

However, I wouldn't want all the dozens of apps on my desktop to all
refuse to return memory back to the system just because they might want
it again at some point in the future (thus unnecessarily forcing
swapping to disk or a memory upgrade). Unless there is a very specific
reason to think that performance is critical, my personal view is that
it's polite for a userspace app to return memory back to the underlying
system whenever it is reasonably possible.

Chris
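[For the dedicated-system case Chris mentions, a start-up sequence
along these lines is sometimes used. This is only an illustrative
sketch: the pool size is an assumption, and mlockall() typically needs
privilege or a suitable RLIMIT_MEMLOCK to succeed.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t pool_size = 64 * 1024 * 1024;   /* assumed working-set size */
    char *pool = malloc(pool_size);
    if (!pool) {
        perror("malloc");
        return 1;
    }

    memset(pool, 0, pool_size);    /* touch every page so it is resident */

    /* Lock current and future mappings so they cannot be paged out. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");

    /* ... hand `pool` over to the application's own allocator ... */
    return 0;
}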
From: Rainer Weikusat on 15 Mar 2010 16:02

Chris Friesen <cbf123(a)mail.usask.ca> writes:
> On 03/15/2010 08:36 AM, Rainer Weikusat wrote:

[...]

>> Especially in a program which isn't supposed to ever stop, every
>> situation which did occur in the past will happen again in future and
>> this implies that letting go of memory which has been allocated for a
>> specific purpose just means 'running the risk that it won't be
>> available next time'.

[...]

> I've worked on projects where we preallocated (and mlock()'d) all
> memory up-front for performance/reliability reasons. This can be
> useful for dedicated systems.
>
> However, I wouldn't want all the dozens of apps on my desktop to all
> refuse to return memory back to the system just because they might
> want it again at some point in the future (thus unnecessarily forcing
> swapping to disk or a memory upgrade). Unless there is a very specific
> reason to think that performance is critical, my personal view is that
> it's polite for a userspace app to return memory back to the
> underlying system whenever it is reasonably possible.

'malloc' may or may not return memory to the system. Usually, it won't,
except in fringe cases (eg 'large allocations' done via mmap). Memory
allocations which originally happened by calling brk/sbrk cannot easily
be returned to the system anyway; that only works if freeing them
happens to release a chunk of memory just below the current break.
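[As an aside not made in the thread: glibc exposes this
"release-what-sits-just-below-the-break" operation directly as
malloc_trim(). The sketch below is illustrative only; malloc_trim() is
a glibc extension declared in <malloc.h>, not standard C, and whether
it actually releases anything depends on where the free chunks ended
up.]

#include <malloc.h>    /* glibc-specific: malloc_trim() */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    enum { N = 100000 };
    static void *p[N];

    for (int i = 0; i < N; i++)    /* grow the heap with small allocations */
        p[i] = malloc(256);
    for (int i = 0; i < N; i++)    /* free them all, leaving space below the break */
        free(p[i]);

    /* Ask the allocator to hand the top of the heap back to the kernel;
     * returns 1 if any memory was actually released, 0 otherwise. */
    printf("trimmed: %d\n", malloc_trim(0));
    return 0;
}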
From: Rainer Weikusat on 15 Mar 2010 16:22

lacos(a)ludens.elte.hu (Ersek, Laszlo) writes:
> In article <87d3z6r36g.fsf(a)fever.mssgmbh.com>, Rainer Weikusat
> <rweikusat(a)mssgmbh.com> writes:
>> Eric Sosman <esosman(a)ieee-dot-org.invalid> writes:
>>> On 3/14/2010 4:40 PM, Rainer Weikusat wrote:
>>>>
>>>> Realistically, memory leaks are especially common in code which
>>>> lacks any kind of memory management, IOW, just uses malloc/free
>>>> naively.
>
> This is two statements:
>
> (1) "memory leaks are especially common in code which lacks any kind
>     of memory management",
>
> (2) using malloc/free naively is equivalent to no memory management at
>     all ("lacks any kind of").
>
> A third statement might come as a consequence,
>
> (3) "memory leaks are especially common in code which [...] just uses
>     malloc/free naively"
>
> I tend to accept (1) -- not free()ing consistently can be considered a
> lack of memory management. I strongly disagree with (2)

In plain English, this seems to mean "I (meaning you) strongly desire
to use the terms 'you' (meaning me) were using to describe a specific
situation to describe a fundamentally different situation" ...

[...]

> An approach with ownership responsibilities laid out clearly and
> strictly can still be called naive. That stuff can be drawn up in all
> kinds of diagrams, commented on in the code etc.

... in particular, your idea of 'code which implements memory
management' seems to be 'code which doesn't implement memory
management, but whose associated documentation contains something on
this topic in some form'. That's not what I was writing about.

[...]

>> - Usually, people who are having problems they cannot deal with
>>   anymore manually tend to rather write tools to help them with being
>>   able to continue to act in problem-creating ways than to consider
>>   changing their habits. Valgrind is such a tool and it has excellent
>>   support for 'straight' malloc/free and basically no support for
>>   anything else.
>
> I explicitly said I am not endorsing programming *for* valgrind -- I
> expected such a followup.

I have seriously no idea how you managed to understand my statement
above (automated tools for detecting memory leaks in code which 'just
uses' malloc/free exist => memory leaks are a severe problem in such
code) in this way.

[...]

> (On a side note, you chose to ignore my earlier question (or so it
> seems) targeted at the portability of those 50-300 SLOC custom
> allocators and whether they are based on malloc().)

malloc is a library routine on UNIX(*) which uses some facility the
kernel provides for actually allocating address space (not memory,
strictly speaking). This will usually be brk/sbrk or mmap. Because
library routines will likely call malloc behind one's back, using
brk/sbrk in a more sophisticated custom allocator would be asking for
trouble. Which leaves mmap. While brk is no longer a 'standardized'
UNIX(*) routine, it will be available on UNIX(*) because existing
allocators (eg, dlmalloc) use it. mmap is a standardized UNIX(*)
routine.
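[A sketch of the mmap-based approach Rainer describes for giving a
custom allocator its own backing store, independent of the libc malloc
arena. The function names (arena_grow, arena_release) are made up for
illustration; MAP_ANONYMOUS is spelled MAP_ANON on some older systems.]

#include <stddef.h>
#include <sys/mman.h>

/* Obtain a fresh chunk of address space directly from the kernel.
 * Returns NULL on failure. */
void *arena_grow(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand a whole chunk back; unlike brk/sbrk memory, any chunk can be
 * released independently of allocation order. */
void arena_release(void *p, size_t len)
{
    munmap(p, len);
}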
From: Eric Sosman on 15 Mar 2010 16:37
On 3/15/2010 4:02 PM, Rainer Weikusat wrote:
>
> 'malloc' may or may not return memory to the system. Usually, it
> won't, except in fringe cases (eg 'large allocations' done via mmap).
> Memory allocations which originally happened by calling brk/sbrk
> cannot easily be returned to the system anyway; that only works if
> freeing them happens to release a chunk of memory just below the
> current break.

The one time I actually tried returning memory from an sbrk-funded
allocator (I didn't want to do it, but management had a bee in their
bonnet), the actual amount of returnable memory turned out to be pretty
small. Long- and short-lived objects were pretty well mixed, with the
result that there was usually a long-lived object not very far below
the break, "pinning" everything below it.

We switched to getting memory from mmap (or mmap-like things on
non-Unix platforms), and things got better -- but not by an enormous
amount. Again, since long- and short-lived objects lived cheek by jowl,
there was an excellent chance that any given page contained a
long-lived object somewhere, preventing the whole page from being
returned.

Eventually we moved to a mixed strategy, where "small" requests were
satisfied by subdividing pages that never got returned, while "large"
objects resided on dedicated pages that could be released when/if the
object was freed. (I'm given to understand that glibc uses a strategy
something like this.) The trick was to choose a small/large threshold
that worked well: set it too high and you'll retain too much memory,
too low and you'll waste time thrashing pages back and forth to the
kernel. The proper threshold seemed to depend a lot on the way the
product was used; experiments didn't turn up One Best Value for all
usage patterns.

--
Eric Sosman
esosman(a)ieee-dot-org.invalid
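[A rough sketch of the mixed strategy Eric describes: requests at or
below a tunable threshold come from pages that are subdivided and never
returned, while larger requests get their own anonymous mapping that
goes straight back to the kernel on free. Everything here is an
illustrative assumption: the names, the threshold value, and the
sized-free interface (a real allocator would record block sizes in its
own metadata rather than asking the caller); small_alloc/small_free are
trivial stand-ins for the page-subdividing side.]

#include <stdlib.h>
#include <sys/mman.h>

#define LARGE_THRESHOLD 4096   /* the tunable small/large cut-off */

/* Stand-ins: a real version would carve small requests out of pages
 * that are retained for the life of the process. */
static void *small_alloc(size_t n) { return malloc(n); }
static void  small_free(void *p)   { free(p); }

void *mixed_alloc(size_t n)
{
    if (n <= LARGE_THRESHOLD)
        return small_alloc(n);

    /* Large objects get dedicated, page-aligned mappings of their own. */
    void *m = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return m == MAP_FAILED ? NULL : m;
}

void mixed_free(void *p, size_t n)
{
    if (!p)
        return;
    if (n <= LARGE_THRESHOLD)
        small_free(p);
    else
        munmap(p, n);   /* the dedicated pages go back to the kernel */
}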