From: Kaz Kylheku on
On 2009-12-07, Tim Bradshaw <tfb(a)cley.com> wrote:
> On 2009-12-07 18:22:07 +0000, tar(a)sevak.isi.edu (Thomas A. Russ) said:
>
>> Well, that's one option.
>> There are others.
>
> That actually turns out to be the best option if you care about
> performance for large bit-based structures as well even if you have an
> implementation where the fixnum limit is not a problem, unless you have
> a heroic compiler.

I think that, barring some fantastically exceptional circumstances, the use of
a 500 megabyte bit array indicates a computer science flunkie between the
keyboard and chair.

For one thing, such a representation is space-efficient only if the data
is random, with a ``white'' distribution.

If there is any obvious sparseness (long runs of 1's or 0's), you can
compress the data with some alternate data structure: a radix tree
or whatever.

Even if compression takes more operations, it may be beneficial.
For one thing, you don't have a 500 megabyte monster mapped into your
address space. For another, you get much better cache behavior.

For instance, consider random accesses to a huge bit vector.
In a compressed scheme, most of the all-zero or all-one areas can
be represented efficiently by a small number of nodes in the
radix tree or whatever structure you choose. These can fit into your
L1 cache, and failing that, into the L2.

In the naive representation, random read accesses are spread through a vast
area of memory. Each access is very likely to hit a different L1
cache line from any other recent access, and so on.
You will likely thrash your L1, most of your L2, and the TLB.
Yuck, talk about dumb.

Anyway, on the theme of how to overcome the bit-array-plus-fixnum
limitation: without resorting to any bit-packing tricks, you can also do
this. Instead of using some least significant bits of the index as a bit
offset into a word of packed bits, use bits at the /opposite/ end of the
index to select which of several huge arrays to access. The least
significant 30 bits (or whatever the fixnum mantissa size is) then index
into that array. This way, you keep plain bit arrays and don't have to
write any code to pack bits, worry about choosing the best cell size, etc.
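For what it's worth, that scheme can be sketched in a few lines of portable
Common Lisp. The chunk size and every name below are made up for the
example; a real version would derive the chunk size from
ARRAY-DIMENSION-LIMIT instead of hard-coding it:

```lisp
;; Hypothetical sketch: a "big" bit vector built from ordinary bit
;; vectors.  The high bits of the index select a chunk; the low
;; +chunk-bits+ bits index within that chunk.  No bit packing needed.
(defconstant +chunk-bits+ 20)
(defconstant +chunk-size+ (ash 1 +chunk-bits+))

(defun make-big-bit-vector (n)
  "Return a vector of bit-vector chunks covering N bits."
  (let ((chunks (make-array (ceiling n +chunk-size+))))
    (dotimes (i (length chunks) chunks)
      (setf (aref chunks i)
            (make-array +chunk-size+ :element-type 'bit
                                     :initial-element 0)))))

(defun big-bit (bv index)
  "Read the bit at INDEX: high bits pick the chunk, low bits the slot."
  (sbit (svref bv (ash index (- +chunk-bits+)))
        (logand index (1- +chunk-size+))))

(defun (setf big-bit) (bit bv index)
  (setf (sbit (svref bv (ash index (- +chunk-bits+)))
              (logand index (1- +chunk-size+)))
        bit))
```

The point is that each chunk stays a genuine specialized bit vector, so the
implementation's packed representation and fast SBIT access are preserved.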
From: Rob Warnock on
Thomas A. Russ <tar(a)sevak.isi.edu> wrote:
+---------------
| BTW, just for reference, here is the fixnum sizes for a number of lisps
| that I happen to have easy access to.
....
| 29 CMUCL 19e
....
| 29 CMUCL 19c
|
| So, it looks like 29 bits is one of the most popular choices for 32-bit
| architectures.
+---------------

While it is true that (32-bit) CMUCL uses a 3-bit "lowtag" for object
encoding, it is also the case that CMUCL allocates *two* lowtag codepoints
(of the eight available) to fixnums. Thus fixnums in CMUCL are actually
30 bits, not 29. [And given Duane's comments about scaling fixnums to
machine addresses, I suspect the same is true for ACL.]

You later wrote:
+---------------
| I used (integer-length most-positive-fixnum) to generate the sizes.
+---------------

But that only counts the positive fixnums and ignores the negative ones.
To get the correct answer, try this instead:

cmu> (integer-length (- most-positive-fixnum most-negative-fixnum))

30
cmu>

Similar experiments in other CLs should bump all of your fixnum
sizes by one. ;-}


-Rob

-----
Rob Warnock <rpw3(a)rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

From: Duane Rettig on
On Dec 8, 4:04 am, r...(a)rpw3.org (Rob Warnock) wrote:
> Thomas A. Russ <t...(a)sevak.isi.edu> wrote:
> +---------------
> | BTW, just for reference, here is the fixnum sizes for a number of lisps
> | that I happen to have easy access to.
> ...
> |   29   CMUCL 19e
> ...
> |   29   CMUCL 19c
> |
> | So, it looks like 29 bits is one of the most popular choices for 32-bit
> | architectures.
> +---------------
>
> While it is true that (32-bit) CMUCL uses a 3-bit "lowtag" for object
> encoding, it is also the case that CMUCL allocates *two* lowtag codepoints
> (of the eight available) to fixnums. Thus fixnums in CMUCL are actually
> 30 bits, not 29. [And given Duane's comments about scaling fixnums to
> machine addresses, I suspect the same is true for ACL.]

It is, and it is likely a widely used concept. Tags 0 and 4 would be used
for even and odd fixnums, thus doubling the range of fixnums over what
bits are usually available for addresses when dealing with the normal
3-bit tags (for 32-bit lisps). The same concept holds for 64-bit
lisps, where the tag width is 4 and the two tags used for fixnums
are 0 and 8.
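A toy model of that encoding (my own sketch, not any implementation's
actual code) makes the arithmetic concrete for the 32-bit case:

```lisp
;; Toy model of the two-lowtag fixnum encoding: with 3-bit lowtags,
;; a fixnum N is stored as the machine word (ash N 2), i.e. shifted
;; by only 2 bits instead of 3.  The low three bits then come out
;; as #b000 for even N and #b100 for odd N -- the tags 0 and 4
;; mentioned above -- which is what buys the extra bit of range.
(defun encode-fixnum (n) (ash n 2))
(defun lowtag (word) (ldb (byte 3 0) word))
```

So (lowtag (encode-fixnum 6)) is 0 while (lowtag (encode-fixnum 7)) is 4,
and a tagged word needs only 2 tag bits' worth of shifting, leaving 30
bits for the fixnum itself.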

This relationship would tend to be constant for any power-of-two
natural-word-size, for any implementation which tends to allocate Lisp
objects on a two-word basis. At Franz, we call these "Allocation
Units", or AUs. An AU is always two natural words, and always starts
on a boundary aligned to whatever the tag-width dictates (e.g. LSbits
of #b000 for 32 bit lisps and #b0000 for 64-bit lisps). Allegro CL
allocates the cons with the smallest allocation possible, i.e. one
AU. But because a natural word, which is used to represent a Lisp
tagged-pointer, is half the size of an AU, it requires one extra bit
to encode one of these tagged pointers, which we call a LispVal. If
you do the math, you will see that if we do a port to a 128-bit
machine, the numbers will scale perfectly well; you already know what
our design will be.

> You later wrote:
>
> +---------------
> | I used (integer-length most-positive-fixnum) to generate the sizes.
> +---------------
>
> But that only counts the positive fixnums and ignores the negative ones.
> To get the correct answer, try this instead:
>
>     cmu> (integer-length (- most-positive-fixnum most-negative-fixnum))
>
>     30
>     cmu>
>
> Similar experiments in other CLs should bump all of your fixnum
> sizes by one.  ;-}

I think "29 bits plus sign" is a perfectly acceptable way to say it,
as long as you don't forget the "plus sign" part.

Duane

From: Spiros Bousbouras on
On Sun, 06 Dec 2009 09:01:52 +0100
pjb(a)informatimago.com (Pascal J. Bourguignon) wrote:
> Kaz Kylheku <kkylheku(a)gmail.com> writes:
>
> > On 2009-12-05, Spiros Bousbouras <spibou(a)gmail.com> wrote:
> >> On Sat, 5 Dec 2009 05:42:42 +0000 (UTC)
> >> Kaz Kylheku <kkylheku(a)gmail.com> wrote:
> >>>
> >>> The trade off is there. You are perfectly free to ask your Lisp for
> >>> an array biger than array-dimension-limit,
> >>
> >> And I would almost certainly get "no" for an answer :-D
> >>
> >>> and your implementation
> >>> is free to provide it to you.
> >>
> >> Not if it wants to conform to the standard.
> >
> > So you propose that if a program asks for an oversized array, bigger
> > than array-dimension-limit, and the implementation gives it to the program,
> > the implementation is not conforming to the standard?
>
> No, it would be non conforming on the part of the program.
> Check Section "1.5 Conformance", there are two classes of conformances.

I read section 1.5. I still don't see why it is the responsibility of
the program to avoid creating arrays which are too big. For example, the
page on ARRAY-TOTAL-SIZE-LIMIT says "The actual limit on the array
total size imposed by the implementation"... Note the "imposed by the
implementation" part. And note also that the MAKE-ARRAY page does not
say anywhere "a created array shall have a size smaller than

> To make the program conformant, it would have to test against
> ARRAY-DIMENSION-LIMIT (and ARRAY-TOTAL-SIZE-LIMIT in the case of
> multi-dimensional arrays).
>
>
> ;; non-conformant program:
> (make-array size)
>
> ;; conformant program:
> (if (< size array-dimension-limit)
> (make-array size :initial-element 0)
> #+implementation-with-bigger-arrays (make-array size :initial-element 0)
> #-implementation-with-bigger-arrays (error "Cannot make an array big enough"))

--
Who's your mama ?
From: Spiros Bousbouras on
On Sun, 6 Dec 2009 02:14:55 +0000 (UTC)
Kaz Kylheku <kkylheku(a)gmail.com> wrote:
> On 2009-12-05, Spiros Bousbouras <spibou(a)gmail.com> wrote:
> > On Sat, 5 Dec 2009 05:42:42 +0000 (UTC)
> > Kaz Kylheku <kkylheku(a)gmail.com> wrote:
> >> The trade off is there. You are perfectly free to ask your Lisp for
> >> an array biger than array-dimension-limit,
> >
> > And I would almost certainly get "no" for an answer :-D
> >
> >> and your implementation
> >> is free to provide it to you.
> >
> > Not if it wants to conform to the standard.
>
> So you propose that if a program asks for an oversized array, bigger
> than array-dimension-limit, and the implementation gives it to the program,
> the implementation is not conforming to the standard?

That's what I was saying.

> I am not able to find a citation which would support this belief.

I cannot give you a citation which says this exactly but I thought this
is the gist of what the standard is saying. I elaborate in my reply
to Pascal.

> It looks like the behavior of requesting a too-large array is simply
> not defined.

I hope you're right.

> There is no requirement for this situation to be detected and signaled
> with an error condition.
>
> Any response whatsoever, to undefined behavior, is conforming.

--
There are only two kinds of storage devices - those that have failed,
and those that are about to fail.
Jonathan Schwartz