From: Kaz Kylheku on 7 Dec 2009 16:00

On 2009-12-07, Tim Bradshaw <tfb(a)cley.com> wrote:
> On 2009-12-07 18:22:07 +0000, tar(a)sevak.isi.edu (Thomas A. Russ) said:
>
>> Well, that's one option.  There are others.
>
> That actually turns out to be the best option if you care about
> performance for large bit-based structures as well, even if you have an
> implementation where the fixnum limit is not a problem, unless you have
> a heroic compiler.

I think that, barring some fantastically exceptional circumstances, the
use of a 500 megabyte bit array indicates a computer science flunkie
between the keyboard and chair.

For one thing, such a representation is space-efficient only if the data
is random, with a "white" distribution.  If there is any obvious
sparseness (long runs of 1's or 0's), you can compress the data with
some alternate data structure: a radix tree or whatever.  Even if
compression takes more operations, it may be beneficial.  You don't have
a 500 megabyte monster mapped into your address space, and you get much
better cache behavior.

For instance, consider random accesses to a huge bit vector.  In a
compressed scheme, the all-zero or all-one areas can be represented
efficiently, by a small number of nodes in the radix tree or whatever
structure.  These can fit into your L1 cache, and if not that, then the
L2.  In the naive representation, random read accesses are spread
through a vast area of memory.  Each access is very likely to go to a
different L1 cache line from that of any other recent access, and so on.
You will likely thrash your L1, most of your L2, and the TLB.  Yuck,
talk about dumb.
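[A minimal sketch of one such "compressed" alternative, simpler than a
radix tree: store only the nonzero 64-bit chunks of a conceptually huge
bit vector in a hash table keyed by chunk index.  The names here are
illustrative, not from any particular library.]

```lisp
;; Sparse bit set: hash table mapping chunk-index -> nonzero 64-bit word.
;; All-zero regions cost nothing; only chunks containing 1-bits exist.
(defun make-sparse-bits ()
  (make-hash-table))

(defun sparse-bit (table index)
  "Read the bit at INDEX; absent chunks read as 0."
  (multiple-value-bind (chunk offset) (floor index 64)
    (ldb (byte 1 offset) (gethash chunk table 0))))

(defun (setf sparse-bit) (bit table index)
  "Write BIT at INDEX, dropping chunks that become all-zero."
  (multiple-value-bind (chunk offset) (floor index 64)
    (let ((word (dpb bit (byte 1 offset) (gethash chunk table 0))))
      (if (zerop word)
          (remhash chunk table)
          (setf (gethash chunk table) word))
      bit)))
```

Setting bit 10^12 in this scheme allocates one hash-table entry, not
125 gigabytes of backing store.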
Anyway, on the theme of how to overcome the bit array + fixnum
limitation: without resorting to any bit-packing tricks, what you can
also do is this.  Instead of using some least significant bits of the
index as a bit offset into a word which contains packed bits, use bits
on the /opposite/ end of the index to select which of several bit
arrays to access.  Then the least significant 30 bits (or whatever the
fixnum mantissa size is) are used to index into that array.  This way,
you keep the bit arrays and don't have to write any code to pack bits,
or worry about choosing the best cell size, etc.
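[The multi-array trick above can be sketched like so.  The names and
the chunk size are mine; in practice CHUNK-BITS would be chosen at or
below the implementation's fixnum / ARRAY-DIMENSION-LIMIT constraints.]

```lisp
;; High bits of the index select a sub-array; low bits index within it.
(defconstant +chunk-bits+ 24)                 ; 2^24 bits per sub-array (sketch)
(defconstant +chunk-size+ (ash 1 +chunk-bits+))

(defun make-big-bit-vector (nbits)
  "A vector of simple bit vectors covering NBITS bits in total."
  (let* ((nchunks (ceiling nbits +chunk-size+))
         (chunks (make-array nchunks)))
    (dotimes (i nchunks chunks)
      (setf (aref chunks i)
            (make-array +chunk-size+ :element-type 'bit
                                     :initial-element 0)))))

(defun big-bit (chunks index)
  (aref (aref chunks (ash index (- +chunk-bits+)))   ; high bits: which array
        (logand index (1- +chunk-size+))))           ; low bits: which bit

(defun (setf big-bit) (bit chunks index)
  (setf (aref (aref chunks (ash index (- +chunk-bits+)))
              (logand index (1- +chunk-size+)))
        bit))
```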
From: Rob Warnock on 8 Dec 2009 07:04

Thomas A. Russ <tar(a)sevak.isi.edu> wrote:
+---------------
| BTW, just for reference, here are the fixnum sizes for a number of
| Lisps that I happen to have easy access to.
...
| 29  CMUCL 19e
...
| 29  CMUCL 19c
|
| So, it looks like 29 bits is one of the most popular choices for
| 32-bit architectures.
+---------------

While it is true that (32-bit) CMUCL uses a 3-bit "lowtag" for object
encoding, it is also the case that CMUCL allocates *two* lowtag
codepoints (of the eight available) to fixnums.  Thus fixnums in CMUCL
are actually 30 bits, not 29.  [And given Duane's comments about
scaling fixnums to machine addresses, I suspect the same is true for
ACL.]

You later wrote:
+---------------
| I used (integer-length most-positive-fixnum) to generate the sizes.
+---------------

But that only counts the positive fixnums and ignores the negative
ones.  To get the correct answer, try this instead:

    cmu> (integer-length (- most-positive-fixnum most-negative-fixnum))

    30
    cmu>

Similar experiments in other CLs should bump all of your fixnum sizes
by one.  ;-}

-Rob

-----
Rob Warnock			<rpw3(a)rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Duane Rettig on 8 Dec 2009 11:39

On Dec 8, 4:04 am, r...(a)rpw3.org (Rob Warnock) wrote:
> Thomas A. Russ <t...(a)sevak.isi.edu> wrote:
> +---------------
> | BTW, just for reference, here are the fixnum sizes for a number of
> | Lisps that I happen to have easy access to.
> ...
> | 29  CMUCL 19e
> ...
> | 29  CMUCL 19c
> |
> | So, it looks like 29 bits is one of the most popular choices for
> | 32-bit architectures.
> +---------------
>
> While it is true that (32-bit) CMUCL uses a 3-bit "lowtag" for object
> encoding, it is also the case that CMUCL allocates *two* lowtag
> codepoints (of the eight available) to fixnums.  Thus fixnums in CMUCL
> are actually 30 bits, not 29.  [And given Duane's comments about
> scaling fixnums to machine addresses, I suspect the same is true for
> ACL.]

It is, and it is likely a widely used concept.  Tags 0 and 4 would be
used for even and odd fixnums, thus doubling the range of fixnums over
what bits are usually available for addresses when dealing with the
normal 3-bit tags (for 32-bit lisps).  The same concept is true for
64-bit lisps, with the tag width being 4 and the two tags used for
fixnums being 0 and 8.  This relationship would tend to hold for any
power-of-two natural word size, for any implementation which allocates
Lisp objects on a two-word basis.

At Franz, we call these "Allocation Units", or AUs.  An AU is always
two natural words, and always starts on a boundary aligned to whatever
the tag width dictates (e.g. LSbits of #b000 for 32-bit lisps and
#b0000 for 64-bit lisps).  Allegro CL allocates the cons with the
smallest allocation possible, i.e. one AU.  But because a natural word,
which is used to represent a Lisp tagged pointer, is half the size of
an AU, it requires one extra bit to encode one of these tagged
pointers, which we call a LispVal.  If you do the math, you will see
that if we do a port to a 128-bit machine, the numbers will scale
perfectly well; you already know what our design will be.
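[The two-lowtag fixnum encoding described above can be illustrated with
a little arithmetic.  Using tags 0 and 4 for even and odd fixnums on a
32-bit lisp means a fixnum N is stored as the machine word N << 2: the
low two bits are always zero, and the third-lowest bit is the parity of
N, so the 3-bit lowtag comes out as #b000 or #b100.  This is a sketch
of the general scheme, not any particular implementation's code.]

```lisp
;; Model the tagged-word encoding: fixnum value shifted left by 2,
;; leaving a 3-bit lowtag of either 0 (even fixnum) or 4 (odd fixnum).
(defun encode-fixnum (n)
  (ash n 2))

(defun lowtag (word)
  "The low 3 bits of a 32-bit-style tagged machine word."
  (logand word #b111))
```

So both lowtags 0 and 4 decode as "fixnum", and 30 bits of value fit in
a 32-bit word.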
> You later wrote:
>
> +---------------
> | I used (integer-length most-positive-fixnum) to generate the sizes.
> +---------------
>
> But that only counts the positive fixnums and ignores the negative
> ones.  To get the correct answer, try this instead:
>
>     cmu> (integer-length (- most-positive-fixnum most-negative-fixnum))
>
>     30
>     cmu>
>
> Similar experiments in other CLs should bump all of your fixnum
> sizes by one.  ;-}

I think "29 bits plus sign" is a perfectly acceptable way to say it, as
long as you don't forget the "plus sign" part.

Duane
From: Spiros Bousbouras on 8 Dec 2009 17:15

On Sun, 06 Dec 2009 09:01:52 +0100
pjb(a)informatimago.com (Pascal J. Bourguignon) wrote:
> Kaz Kylheku <kkylheku(a)gmail.com> writes:
>
> > On 2009-12-05, Spiros Bousbouras <spibou(a)gmail.com> wrote:
> >> On Sat, 5 Dec 2009 05:42:42 +0000 (UTC)
> >> Kaz Kylheku <kkylheku(a)gmail.com> wrote:
> >>>
> >>> The trade off is there. You are perfectly free to ask your Lisp for
> >>> an array bigger than array-dimension-limit,
> >>
> >> And I would almost certainly get "no" for an answer :-D
> >>
> >>> and your implementation
> >>> is free to provide it to you.
> >>
> >> Not if it wants to conform to the standard.
> >
> > So you propose that if a program asks for an oversized array, bigger
> > than array-dimension-limit, and the implementation gives it to the
> > program, the implementation is not conforming to the standard?
>
> No, it would be non-conforming on the part of the program.
> Check Section "1.5 Conformance"; there are two classes of conformance.

I read section 1.5.  I still don't see why it is the responsibility of
the program to not create arrays which are too big.  For example the
page on ARRAY-TOTAL-SIZE-LIMIT says "The actual limit on the array
total size imposed by the implementation"...  Note the "imposed by the
implementation" part.  And note also that the MAKE-ARRAY page does not
say anywhere "a created array shall have size smaller than
ARRAY-TOTAL-SIZE-LIMIT".

> To make the program conformant, it would have to test against
> ARRAY-DIMENSION-LIMIT (and ARRAY-TOTAL-SIZE-LIMIT in the case of
> multi-dimensional arrays).
>
> ;; non-conformant program:
> (make-array size)
>
> ;; conformant program:
> (if (< size array-dimension-limit)
>     (make-array size :initial-element 0)
>     #+implementation-with-bigger-arrays (make-array size :initial-element 0)
>     #-implementation-with-bigger-arrays (error "Cannot make an array big enough"))

--
Who's your mama ?
From: Spiros Bousbouras on 8 Dec 2009 17:22
On Sun, 6 Dec 2009 02:14:55 +0000 (UTC)
Kaz Kylheku <kkylheku(a)gmail.com> wrote:
> On 2009-12-05, Spiros Bousbouras <spibou(a)gmail.com> wrote:
> > On Sat, 5 Dec 2009 05:42:42 +0000 (UTC)
> > Kaz Kylheku <kkylheku(a)gmail.com> wrote:
> >> The trade off is there. You are perfectly free to ask your Lisp for
> >> an array bigger than array-dimension-limit,
> >
> > And I would almost certainly get "no" for an answer :-D
> >
> >> and your implementation
> >> is free to provide it to you.
> >
> > Not if it wants to conform to the standard.
>
> So you propose that if a program asks for an oversized array, bigger
> than array-dimension-limit, and the implementation gives it to the
> program, the implementation is not conforming to the standard?

That's what I was saying.

> I am not able to find a citation which would support this belief.

I cannot give you a citation which says this exactly, but I thought
this is the gist of what the standard is saying.  I elaborate in my
reply to Pascal.

> It looks like the behavior of requesting a too-large array is simply
> not defined.

I hope you're right.

> There is no requirement for this situation to be detected and signaled
> with an error condition.
>
> Any response whatsoever, to undefined behavior, is conforming.

--
There are only two kinds of storage devices - those that have failed,
and those that are about to fail.
    Jonathan Schwartz
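[Since the behavior beyond ARRAY-DIMENSION-LIMIT is, on this reading,
at best implementation-dependent, a portable program can test the limit
first and fail explicitly rather than rely on whatever the
implementation does.  A minimal sketch along the lines of Pascal's
example; TRY-MAKE-BIT-VECTOR is my own name, not a standard function.]

```lisp
;; Request a bit vector only if the size is known to be representable;
;; otherwise signal a clear error instead of invoking undefined behavior.
(defun try-make-bit-vector (size)
  (if (< size array-dimension-limit)
      (make-array size :element-type 'bit :initial-element 0)
      (error "~D exceeds ARRAY-DIMENSION-LIMIT (~D) on this implementation"
             size array-dimension-limit)))
```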