From: Boudewijn Dijkstra on
Op Wed, 24 Feb 2010 17:31:05 +0100 schreef D Yuniskis
<not.going.to.be(a)seen.com>:
> Boudewijn Dijkstra wrote:
>> Op Wed, 24 Feb 2010 07:48:35 +0100 schreef D Yuniskis
>> <not.going.to.be(a)seen.com>:
>>> But, I wonder if dynamic changes to the algorithms used
>>> by malloc/free can be exploited to provide *application
>>> specific* deterministic behavior in certain contexts.
>>>
>>> E.g., implement malloc(3) more like:
>>>
>>> malloc(size_t size, int (* selector)(fragment_t *))
>> What are you implying here? The user seems to be required to supply a
>> pointer to a "fragment selector function". What is that supposed to do?
>
> Obviously: select the "best" fragment to fit the request!

I'm afraid it cannot be as simple as that.

I understand that the function argument is a pointer to an array of
fragments. Is it sorted by size? Does it contain fragments that are too
small? What if some other thread releases a "perfect fit" fragment within
an acceptable time-frame?

Why not just specify a size and a time-out, and let the malloc algorithm
do the rest?


--
Gemaakt met Opera's revolutionaire e-mailprogramma:
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
From: Didi on
On Feb 24, 6:31 pm, D Yuniskis <not.going.to...(a)seen.com> wrote:
> Boudewijn Dijkstra wrote:
> > Op Wed, 24 Feb 2010 07:48:35 +0100 schreef D Yuniskis
> > <not.going.to...(a)seen.com>:
> >> Quasi-deterministic malloc/free implementations tend to
> >> be inefficient (i.e., wasteful of resources).
>
> > Non-deterministic malloc/free implementations tend to
> > be inefficient (i.e., unpredictable timing).
>
> That is well-known.  My point is, those attempts at producing
> (quasi)deterministic algorithms usually work at the expense
> of wasting memory (c.a.e)
>
> >> But, I wonder if dynamic changes to the algorithms used
> >> by malloc/free can be exploited to provide *application
> >> specific* deterministic behavior in certain contexts.
>
> >> E.g., implement malloc(3) more like:
>
> >> malloc(size_t size, int (* selector)(fragment_t *))
>
> > What are you implying here?  The user seems to be required to supply a
> > pointer to a "fragment selector function".  What is that supposed to do?
>
> Obviously: select the "best" fragment to fit the request!
>

But Don, this is far from being the "best" approach, that stuff
has been analyzed in the early days of OSs and seems to have been
well forgotten today.
The best approach by a good margin is the worst-fit allocate
strategy, that is, you always allocate at the beginning of the
largest contiguous block you have. This results in the least
possible fragmentation over time of all known approaches.
This is how DPS does it for both system memory and disk
space - slightly modified, actually, when allocating more
space to a file it is first checked if it can
be allocated so that the file remains contiguous.
The overhead for discovering the largest contiguous block is
constant (and nothing too huge) so the time to allocate is
also more or less constant.
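As a toy illustration of worst-fit (not DPS's actual code — types and names here are made up), over a fixed table of free fragments:

```c
#include <stddef.h>

/* Toy worst-fit allocator over a table of free fragments
 * (offset/length pairs into some backing store). */
typedef struct { size_t off, len; } frag_t;

/* Allocate 'size' units from the start of the largest fragment,
 * i.e. the worst-fit strategy.  Returns the offset of the
 * allocation, or (size_t)-1 on failure.  One linear scan, so the
 * cost is constant in the table size, as described above. */
size_t wf_alloc(frag_t *frags, size_t nfrags, size_t size)
{
    size_t best = (size_t)-1;
    for (size_t i = 0; i < nfrags; i++)
        if (frags[i].len >= size &&
            (best == (size_t)-1 || frags[i].len > frags[best].len))
            best = i;            /* remember the largest fragment */
    if (best == (size_t)-1)
        return (size_t)-1;
    size_t off = frags[best].off;  /* carve from the front */
    frags[best].off += size;
    frags[best].len -= size;
    return off;
}
```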
This has been so since day one of DPS - (that day was > 15 years
ago... how can that be, I could swear it was yesterday :-) ).
A few years back I made an object - DPS has its runtime object
handler and tree of objects - which I call "memory_pool", very
convenient to be used by other objects and accessed from shell
scripts. It also uses worst-fit allocate, of course :-) .
Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

From: D Yuniskis on
Hi Boudewijn,

Boudewijn Dijkstra wrote:
> Op Wed, 24 Feb 2010 17:31:05 +0100 schreef D Yuniskis
> <not.going.to.be(a)seen.com>:
>> Boudewijn Dijkstra wrote:
>>> Op Wed, 24 Feb 2010 07:48:35 +0100 schreef D Yuniskis
>>> <not.going.to.be(a)seen.com>:
>>>> But, I wonder if dynamic changes to the algorithms used
>>>> by malloc/free can be exploited to provide *application
>>>> specific* deterministic behavior in certain contexts.
>>>>
>>>> E.g., implement malloc(3) more like:
>>>>
>>>> malloc(size_t size, int (* selector)(fragment_t *))
>>> What are you implying here? The user seems to be required to supply
>>> a pointer to a "fragment selector function". What is that supposed
>>> to do?
>>
>> Obviously: select the "best" fragment to fit the request!
>
> I'm afraid it cannot be as simple as that.
>
> I understand that the function argument is a pointer to an array of
> fragments. Is it sorted by size? Does it contain fragments that are
> too small? What if some other thread releases a "perfect fit" fragment
> within an acceptable time-frame?

The classic contract with malloc(3) makes *no* guarantees other
than "you will get a block of memory that will completely hold
an object of 'size' -- or you will get nothing".

How it picks that fragment is entirely up to its (malloc's)
implementor.

I contend that the user (programmer) often has knowledge of
how he *will* be using memory -- what sorts of allocations
he will request as well as when and in what pattern. By
hiding all of the allocation mechanisms inside malloc(3),
you do the developer a disservice.

I am advocating augmenting the interface to malloc -- call
it nalloc if you like -- so that the user can influence the
decision making process based on *his* intimate knowledge of
*his* application (which, obviously, is more than any standard
library malloc's knowledge of that application can be!)
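In C, such a "nalloc" might be sketched like this -- fragment_t and every other name here is illustrative, not a real API:

```c
#include <stddef.h>

/* Hypothetical fragment descriptor exposed to the caller. */
typedef struct { void *addr; size_t len; } fragment_t;

/* nalloc walks the free list and hands each candidate fragment that
 * is large enough to the caller's selector; a nonzero return accepts
 * the fragment.  The selector embodies the application's knowledge
 * of its own allocation patterns. */
void *nalloc(fragment_t *free_list, size_t nfrags,
             size_t size, int (*selector)(fragment_t *))
{
    for (size_t i = 0; i < nfrags; i++)
        if (free_list[i].len >= size && selector(&free_list[i]))
            return free_list[i].addr;  /* split/bookkeeping elided */
    return NULL;
}

/* Example selector: first fit -- accept any fragment that is big
 * enough. */
int first_fit(fragment_t *f) { (void)f; return 1; }
```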

> Why not just specify a size and a time-out, and let the malloc algorithm
> do the rest?

Because malloc has *no* information about the application
itself. And, because the resource that it manages (memory)
is typically pervasive throughout the application.
From: D Yuniskis on
Hi Dimiter,

Didi wrote:
> On Feb 24, 6:31 pm, D Yuniskis <not.going.to...(a)seen.com> wrote:
>> Boudewijn Dijkstra wrote:
>>> Op Wed, 24 Feb 2010 07:48:35 +0100 schreef D Yuniskis
>>> <not.going.to...(a)seen.com>:
>>>> Quasi-deterministic malloc/free implementations tend to
>>>> be inefficient (i.e., wasteful of resources).
>>> Non-deterministic malloc/free implementations tend to
>>> be inefficient (i.e., unpredictable timing).
>> That is well-known. My point is, those attempts at producing
>> (quasi)deterministic algorithms usually work at the expense
>> of wasting memory (c.a.e)
>>
>>>> But, I wonder if dynamic changes to the algorithms used
>>>> by malloc/free can be exploited to provide *application
>>>> specific* deterministic behavior in certain contexts.
>>>> E.g., implement malloc(3) more like:
>>>> malloc(size_t size, int (* selector)(fragment_t *))
>>> What are you implying here? The user seems to be required to supply a
>>> pointer to a "fragment selector function". What is that supposed to do?
>> Obviously: select the "best" fragment to fit the request!
>
> But Don, this is far from being the "best" approach, that stuff
> has been analyzed in the early days of OSs and seems to have been
> well forgotten today.
> The best approach by a good margin is the worst-fit allocate
> strategy, that is, you always allocate at the beginning of the
> largest contiguous block you have. This results in the least
> possible fragmentation over time of all known approaches.

Any "static" allocation algorithm has to make assumptions
about how memory will be used. OTOH, the programmer
*knows* (or, *should* know) how he *will* be using memory.
I am advocating letting the user bias malloc's allocation
strategy based on his (the programmer's) knowledge of
how he will be using that memory.

In one of the applications I am working on currently,
crt0 builds the "system heap". My "init" task then
allocates blocks of memory for use by the RTOS. These
contain static data structures, etc. Some are small,
some are large. THEY WILL NEVER BE FREED.

Interspersed among these allocations, there are often
numerous other allocations (which *will* be freed,
typically).

A naive malloc will treat all of these the same -- since
the only information that it has at its disposal is the
*size* of the request.

I, on the other hand, know which allocations will be
static. I can pass a selector() to malloc which will
bias its choice of how (and "where") it selects its
fragment.

For example, the static allocations might be, BY CONVENTION,
selected from the "back" of the heap (highest addresses).
So, the top end of the heap slowly disappears as these
(static) allocations are satisfied. Once they have all been
satisfied, the memory footprint looks like the heap was
just smaller than you originally thought (because the "back"
has been *permanently* consumed).

Meanwhile, other allocations can proceed (again, by convention)
to be satisfied from the *front* of the heap. Their dynamic
nature doesn't result in any fragments embedded among the
"static" allocations at the "back" of the heap.
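Stripped of free-list details, that two-ended convention is just a pair of bump pointers (names hypothetical):

```c
#include <stddef.h>

/* Permanent allocations are carved from the "back" (top) of the heap
 * and never freed; transient ones come from the "front".  The two
 * regions can never fragment each other. */
#define HEAP_SIZE 1024
static unsigned char heap[HEAP_SIZE];
static size_t front = 0;          /* grows upward   */
static size_t back  = HEAP_SIZE;  /* shrinks downward */

void *alloc_static(size_t size)   /* never freed */
{
    if (back - front < size) return NULL;
    back -= size;
    return &heap[back];
}

void *alloc_dynamic(size_t size)  /* freed eventually (not shown) */
{
    if (back - front < size) return NULL;
    void *p = &heap[front];
    front += size;
    return p;
}
```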

As each task is created, I allocate a stack for that task off
the system heap. For some tasks, I allocate a heap at the
far end of that stack segment for the exclusive use of that
task (i.e., that task's memory requirements are constrained;
alternatively, that task's memory requirements are *guaranteed*!)

How each of these mallocs ideally behaves would obviously be
related to the needs of their clients. But, their clients
would obviously be more knowledgeable about those needs
than the "standard library" would! :>

When I am working in paged memory environments, I manage
memory differently than when I have no MMU support. So,
*that* malloc behaves differently than those used in fixed
memory environments, etc.

If the user can provide more information to malloc to
bias the selection algorithm(s), then you can also get
more deterministic behavior (in some classes of applications
and memory usage patterns) than you could with a "boilerplate"
malloc.

> This is how DPS does it for both system memory and disk
> space - slightly modified, actually, when allocating more
> space to a file it is first checked if it can
> be allocated so that the file remains contiguous.
> The overhead for discovering the largest contiguous block is
> constant (and nothing too huge) so the time to allocate is
> also more or less constant.
> This has been so since day one of DPS - (that day was > 15 years
> ago... how can that be, I could swear it was yesterday :-) ).
> A few years back I made an object - DPS has its runtime object
> handler and tree of objects - which I call "memory_pool", very
> convenient to be used by other objects and accessed from shell
> scripts. It also uses worst-fit allocate, of course :-) .

My point is, instead of relying on one algorithm and *hoping*
it satisfies all your requests "good enough", why not provide
hints to malloc that could make it more efficient? Best fit,
worst fit, first fit, last fit, smallest containing, etc.
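Viewed as fragment-choice policies over the free-fragment table, those hints are interchangeable functions (types and names here are just for illustration):

```c
#include <stddef.h>

typedef struct { size_t off, len; } frag_t;

/* Each policy scans the table and returns the index of the chosen
 * fragment, or (size_t)-1 if nothing fits. */
size_t best_fit(const frag_t *f, size_t n, size_t size)
{
    size_t pick = (size_t)-1;
    for (size_t i = 0; i < n; i++)
        if (f[i].len >= size &&
            (pick == (size_t)-1 || f[i].len < f[pick].len))
            pick = i;             /* smallest fragment that fits */
    return pick;
}

size_t worst_fit(const frag_t *f, size_t n, size_t size)
{
    size_t pick = (size_t)-1;
    for (size_t i = 0; i < n; i++)
        if (f[i].len >= size &&
            (pick == (size_t)-1 || f[i].len > f[pick].len))
            pick = i;             /* largest fragment */
    return pick;
}

size_t ff_fit(const frag_t *f, size_t n, size_t size)
{
    for (size_t i = 0; i < n; i++)
        if (f[i].len >= size)
            return i;             /* first fragment that fits */
    return (size_t)-1;
}
```

A malloc that accepted such a policy pointer per call site could behave deterministically where the application knows the pattern, and fall back to a default elsewhere.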

What I hoped to glean from this post was pointers to research
as to the efficacy of various allocation algorithms wrt their
temporal behaviors.
From: Didi on
On Feb 24, 10:21 pm, D Yuniskis <not.going.to...(a)seen.com> wrote:

Hi Don,
> ....
> Any "static" allocation algorithm has to make assumptions
> about how memory will be used.

I was referring to dynamic allocation, whether paged or not.
DPS uses static allocation at boot time for pieces which will
never be released (there is a setup.syst file; one can
install device drivers using it so they get stuck together
on a .l basis rather than being cluster aligned etc.).
Then DPS also has an area of "BAT translated" memory; one
needs it in MMU systems to place IRQ handlers in it so an
IRQ does not cause a page fault (so the DSI exception
can run without masking IRQ). Now this is also statically
allocated, default or via setup.syst.
All that static allocation DPS does by simply adding to the
heap before calling it a boot, then allocating the entire
heap taken so far in the CAT (Cluster Allocation Table).

> ...  OTOH, the programmer
> *knows* (or, *should* know) how he *will* be using memory.
> I am advocating letting the user bias malloc's allocation
> strategy based on his (the programmer's) knowledge of
> how he will be using that memory.

I now get some of your point. Of course this is so. I did not get
what exactly you were after because I don't use C (I regard it
as being "write only" and generally a waste of my time) and
so when programming I have access to all of the allocate
facilities (I use VPA, Virtual Processor Assembly, which
I based on 68k assembly and took forward, meanwhile quite
a distance forward; I have yet to encounter someone matching my
efficiency using some other language :-) ).

> In one of the applications I am working on currently,
> crt0 builds the "system heap".  My "init" task then
> allocates blocks of memory for use by the RTOS.  These
> contain static data structures, etc.  Some are small,
> some are large.  THEY WILL NEVER BE FREED.
>
> Interspersed among these allocations, there are often
> numerous other allocations (which *will* be freed,
> typically).
>
> A naive malloc will treat all of these the same -- since
> the only information that it has at its disposal is the
> *size* of the request.

Well, if the language allowed passing malloc not just
the requested size but also some more info, it could choose
which system facility to use - if I get what you are after.
E.g. give me that much memory but guarantee it will be
contiguous - one needs this for buffers used by say DMA;
or allocate that much memory at this fixed address - one may
need this and well, may be denied it :-); etc. etc.
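That richer request could be sketched as a struct of size plus flags; everything below is hypothetical, and a real kernel allocator would dispatch to different facilities based on the flags:

```c
#include <stddef.h>

/* Flags describing what the caller actually needs. */
enum { MEM_ANY = 0, MEM_CONTIGUOUS = 1, MEM_FIXED_ADDR = 2 };

typedef struct {
    size_t   size;
    unsigned flags;
    void    *fixed_addr;   /* used only with MEM_FIXED_ADDR */
} mem_req_t;

/* Choose which allocation facility services the request, e.g. a
 * physically contiguous pool for DMA buffers vs. the ordinary heap. */
int pick_facility(const mem_req_t *r)
{
    if (r->flags & MEM_FIXED_ADDR)  return MEM_FIXED_ADDR;
    if (r->flags & MEM_CONTIGUOUS)  return MEM_CONTIGUOUS;
    return MEM_ANY;
}
```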

But I am not sure I get exactly what you are after since
I have long since decided I will not let HLL restrictions
stand in my way, life is too short :-).

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
