From: Bennu Strelitzia on 16 Dec 2009 09:31

I would appreciate any pointers you may have on a few issues, which I will make separate postings about. Sorry if they are obvious or otherwise unsuitable questions, but I have spent some time reading trying to answer them and have not been able to do so.

What I would wish for in garbage collection in Lisp:

Is there a flavor of Lisp that, due to immutability of structures, makes referential cycles impossible, eliminating the requirement for a full-blown GC? I.e., reference counting could do the trick for an efficient minimal implementation if there were no referential cycles. I hate the real weight of every full-blown GC I have ever seen in a language, in spite of all the "wisdom" that claims it is not a problem. I love nothing better than a simple C program that can run very fast start to finish in a few K of space, and dislike how impossible most GC schemes seem to make this, in my experience.

I also happen to really like the model where every mutation to a structure creates a new structure sharing the applicable read-only parts of the old structure, making many things inherently more concurrency-safe. The two things seem to go hand in hand: if each mutation of a structure creates a new structure, there is no opportunity for a reference loop, because any structure can only refer to objects older than itself.

I have read that Clojure likes this sort of immutability in its collections, but unfortunately it is stuck with the Java garbage collector. Is this something another Lisp might address without the fatness of a JVM or other full-blown GC?

Bennu
From: Rahul Jain on 16 Dec 2009 11:58

Bennu Strelitzia <bennu.strelitzia(a)gmail.com> writes:

> Is there a flavor of lisp that due to immutability of structures makes
> referential cycles impossible, eliminating a requirement for full-blown GC?

Look at Haskell. It doesn't have Lisp syntax, though, and the way it deals with the fact that the real world has state makes it a bit arcane for writing complex applications.

> i.e. reference counting could do the trick for an efficient minimal
> implementation if there were no referential cycles, I hate the real
> weight of every full-blown GC I have ever seen in a language in spite of
> all the "wisdom" that claims that it is not a problem. I love nothing
> better than a simple C program that can run very fast start to finish in
> a few K of space, and dislike how impossible most GC schemes seem to
> make this, in my experience.

Refcounting uses more space and takes more CPU time in a purely functional language. Garbage takes no time to collect; what takes time is the collection (tracing) of the non-garbage. In a language without side effects you will generate a lot of garbage, so you want to optimize for the garbage case. This means not using refcounting.

--
Rahul Jain rjain(a)nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Duane Rettig on 16 Dec 2009 12:36

On Dec 16, 6:31 am, Bennu Strelitzia <bennu.strelit...(a)gmail.com> wrote:

> I would appreciate any pointers you may have on a few issues which I
> will make separate postings about. Sorry if they are obvious or
> otherwise-unsuitable questions, but I have spent some time reading
> trying to answer them and have not been able to do so.

Likely because those who use modern Lisps simply don't find a need to discuss it: it is seldom a problem, and when it is, it is something to discuss with their Lisp vendor rather than publicly. However, it is a perfectly fine question, so I'll attempt to answer it.

> What I would wish for in garbage collection in Lisp:
>
> Is there a flavor of lisp that due to immutability of structures makes
> referential cycles impossible, eliminating a requirement for full-blown GC?

When you say "full-blown GC", it makes me think that you are likely unaware of recent (within the last 30 years) developments in garbage collection. Depending on what you mean by "full-blown", generational garbage collectors have been the norm for many years now, and they almost never have to work on the whole heap. The theory is that most data are either short-lived or long-lived, so if the gc works mostly on the short-lived data, it can be most efficient.

> i.e. reference counting could do the trick for an efficient minimal
> implementation if there were no referential cycles, I hate the real
> weight of every full-blown GC I have ever seen in a language in spite of
> all the "wisdom" that claims that it is not a problem. I love nothing
> better than a simple C program that can run very fast start to finish in
> a few K of space, and dislike how impossible most GC schemes seem to
> make this, in my experience.

A C program can indeed run very fast with very little space, and it will do very little (this is not a pejorative statement; it may be that the task only requires a little bit of work).
Lisps can do this as well, and most Lisps provide the ability to move aside the gc activity in favor of all-out speed for the task. In practice this does not occur very often: Lisp users like to have their Lisps available for long periods of time for many different tasks, and because the GCs on these Lisps really aren't a problem (they tend to do only the work that's needed), users keep their Lisps running for long periods and put up with the percent or two of gc time.

But back to your C analogy: once you start getting into larger tasks, data-driven and using malloc/free, you are suddenly up against the same wall as every other language ("how do I manage my heap space?"), and C doesn't do it very well. First, since malloc/free are manually managed, there is a lot of potential for user error, and many C program bugs are due to use of already-deallocated memory (there are even whole systems, like Purify, geared to precisely this problem). Second, your version of malloc/free, or its current parameterization, will determine whether it runs slowly or wastes memory. If it is set to use a "best fit" algorithm, it will take malloc time to search for the best fit for the request, and it will also take time, either at malloc time or at free time, to coalesce multiple free blocks. At the other end of the scale, a bucket-style malloc might hand the user power-of-two sized blocks even if the request was just larger than the next smaller power of two. There is therefore a huge potential for space waste.
> I also happen to really like the model that every mutation to a
> structure creates a new structure sharing applicable read-only parts of
> the old structure, making many things inherently more concurrency-safe,
> so it seems like the two things would go hand in hand, because if each
> mutation of a structure creates a new structure, then you have no
> opportunity for a reference loop because any structure can only refer to
> objects older than itself.

STM is indeed a powerful tool, but it is a hammer that forces you to shape your nails like STM nails. If you can do that, then great; your job is done if you find an STM system (Clojure is just such a system, as you mention below). But if your problem does not fit as easily into the STM model, then you have to deal with that fact and use tools that solve your problem. (Note that Clojure may also have these tools, but since I am not a Clojure advocate, I'll let someone who is put in any plugs for it.)

> I have read that Clojure likes this sort of immutability in collections,
> but unfortunately it is stuck with the Java garbage collector. Is this
> something another Lisp might address without the fatness of a JVM or
> other full-blown GC?

Many other Lisps take the following philosophy:

1. The JVM is here to stay, and thus we must be able to work with it. We do this via inter-language communication techniques, including in some cases generating Java byte codes, but in other cases simply using RPC-style communication with the JVM.

2. The JVM is not everything, and we can generally get much more efficient code generation and heap management by taking on the task ourselves.

As for concurrency, most of the Lisp vendors are working on or have concurrency models, with or without STM, based on the needs of their user base. You should do some shopping and find the one that suits your needs. You haven't stated any needs in your post here, so it will be up to you to decide whether a particular Lisp suits your needs or not.
As for the fear you've expressed that the Lisp GCs would be "full-blown", implying that they are inefficient and/or will cause noticeable pauses in your program's operation, I say try some for yourself, and become pleasantly surprised at how little most Lispers even think about their Lisp's garbage-collector. Duane
From: Kaz Kylheku on 16 Dec 2009 13:04

On 2009-12-16, Bennu Strelitzia <bennu.strelitzia(a)gmail.com> wrote:

> i.e. reference counting could do the trick for an efficient minimal
> implementation if there were no referential cycles, I hate the real

Reference counting is not efficient. Except in some highly contrived benchmarks that don't reflect real-world use, it is slower than garbage collection. Even simple mark-and-sweep can beat reference counting in realistic situations.

Refcounting does not eliminate "pauses" from the program execution. If you drop the last reference to an object which itself holds the last reference to a large number of other objects, this triggers a cascade of refcounts dropping to zero, which takes time to complete.

Refcounting can't be deferred, so even a short-lived program whose heap is cleaned up by the operating system has to waste cycles on refcounting, whereas the program with GC wins simply by never invoking GC over its lifetime.

Every time an object's refcount is modified, an entire cache line becomes dirty and has to be flushed to main memory. When another processor in the system wants to access anything which lands in that cache line, having noticed that it's dirty, it has to fetch the latest version of that cache line from main memory. This means that two or more processors can't work with the same refcounted object in a way that is scalable, if at all they touch the refcounts, even if they are treating the objects as read-only! As the refcounts are bumped, you get performance-wasting cache ping-pong, and this can happen even with separate objects (false sharing) if they land wholly or partially in the same cache line. Refcounts also have to be updated in a thread-safe way, using expensive atomic operations that can cause stalls of thousands of cycles. The consequence is that a refcounted program simply will not scale well on a multiprocessor system.
Cycles can't be avoided in Lisp as easily as you might think, because of lexical closures. Closure environments are mutable, because variables are assignable, and can easily generate cycles:

  ;; cycle created: lexical variable fun points to
  ;; the closure, and is captured by it at the same time:
  (setf fun (lambda () ...))

So what you have to do is banish all variable assignment.

Without cycles, you can't express graph structures, so an entire branch of computer science goes out the window. Many useful algorithms call for graphs. You can't compile a regular expression to an NFA using the standard algorithms without generating a graph. So with refcounting you need hacks, like the use of some uncounted backreferences, or maintaining the structure completely outside of the managed memory.
From: Tamas K Papp on 16 Dec 2009 13:21
On Wed, 16 Dec 2009 18:04:14 +0000, Kaz Kylheku wrote:

> Without cycles, you can't express graph structures, so an entire branch
> of computer science goes out the window. Many useful algorithms

I disagree. Graphs can be represented, for example, in a matrix (a[i,j] is nonzero iff i->j, etc). You can use this to construct a lookup table, basically emulating pointers. Which would, of course, be a pretty pointless thing to do -- but the OP's questions have been pretty pointless so far, so maybe this would be a harmonious outcome :-)

Cheers, Tamas