3 questions about arrays [Lisp]

Prev: Checking for declarations in macros
Next: scripting in CL -- please comment

From: Thomas A. Russ on 7 Dec 2009 13:03

Spiros Bousbouras <spibou(a)gmail.com> writes:

> On Thu, 03 Dec 2009 20:49:51 -0500
> Paul Wallich <pw(a)panix.com> wrote:
> >
> > Nuh-uh.
> >
> > 1) That's already the case in plenty of CL implementations -- unless a
> > lot of people have 60 bits of physical memory they're not telling anyone
> > about.
>
> It may be the case with plenty CL implementations but not all. For
> example on the computer I'm using right now most-positive-fixnum of
> SBCL is 536870911. This means that I can't create a bit vector with
> more than 536870911 members which is ridiculous since I have 1 GB ram.

Well, that's the point of having more than one implementation. You can
choose the implementation that suits your needs. So it looks like you
need to get a 64-bit lisp to use. That should get you around 60 bits of
fixnum space.

> > (Even on 32-bit operating systems, a single fixnum-denominated
> > array can already use up at least all of physical memory and quite
> > possibly more, depending on what's in the array.)
>
> Of course it can use all the memory if each element is large enough.
> The problem is that it might miss a lot of memory when every member of
> the array is small.

I suppose. But then you still end up with the situation that array
access will be, perhaps, unnecessarily slow. The issue is that you
would need to be prepared to do bignum arithmetic on all array accesses
unless there was specific information to the contrary.

And since the default in lisp is to not include type declarations, you
would end up with slow array access in the majority of cases. I think
that would be a bad trade-off on the part of the standard.

> > 2) Now array-size limit depends not on the lisp but on the OS and the
> > disks you have handy and how much someone allocated for /swap and what
> > other programs are running. Good luck with that.
>
> I'm not sure what you mean "not on the lisp". If you do your
> programming in Lisp then array size will obviously depend on the Lisp.
> A good Lisp implementation in turn will make optimal use of what the
> hardware and operating system make available. Actually this is true for
> every programming language ; luck has nothing to do with it.

Well, I think it would be rather odd to find a programming language
implementation that changed the sizes of things based on the particular
hardware it is running on. I mean, in C you wouldn't expect that "int"
would have its size depend on the amount of RAM installed.

> The problem here is that the standard does not allow a Lisp
> implementation to do this on the computer I'm using. It's a 32 bit
> computer so a fast fixnum should be at most 32 bits. I assume SBCL uses
> 2 bits for internal book-keeping.

Actually, I believe that it uses 3, since fixnums are 29 bits long.

> I don't know anything about the
> internal design of SBCL but if we assume that it was a good design
> decision to take these 2 bits away from the range of values then the
> conundrum becomes apparent : either use a fixnum greater than 32 bits
> which would have a negative impact on the speed of all integer
> arithmetic or arrive at a situation where the programmer cannot create
> a bit vector of size greater than 2**29 despite the 1 GB ram available.

Well, you as the programmer can always create larger bit vectors if you
really need them. It should be relatively easy to do.

(defstruct big-bit-vector
size
subvectors)

(defun create-big-bit-vector (size)
(multiple-value-bind (full remainder)
(floor size (1- most-positive-fixnum))
(let ((subvectors (make-array (if (zerop remainder) full (1+ full)))))
(loop for i from 0 below full
do (setf (aref subvectors i)
(make-array (1- most-positive-fixnum)
:element-type 'bit :initial-element 0)))
(unless (zerop remainder)
(setf (aref subvectors full)
(make-array remainder
:element-type 'bit :initial-element 0)))
(make-big-bit-vector :size size :subvectors subvectors))))

(defun big-bit-vector-aref (vector index)
(multiple-value-bind (full remainder)
(floor index (1- most-positive-fixnum))
(unless (zerop remainder) (incf full))
(aref (aref (big-bit-vector-subvectors vector) full) remainder)))

;; The setf method is left as an exercise for the reader.

.... etc.
Note that this is just a sketch, I didn't really do much testing. I do
suggest you set *print-array* to NIL before working with this code.

CL-USER> (setq *print-array* nil)
NIL
CL-USER> (defvar *bv* (create-big-bit-vector (* 2 most-positive-fixnum)))
#S(BIG-BIT-VECTOR :SIZE 1073741822 :SUBVECTORS #<Vector @ #x81612732>)
CL-USER> (length (big-bit-vector-subvectors *bv*))
3
CL-USER> (map 'list #'length (big-bit-vector-subvectors *bv*))
(536870910 536870910 2)

BTW, why do you need bit vectors with more than 500 million elements?

> It doesn't contradict my point if there are Lisp implementations ,
> possibly many , where a similar conundrum does not arise. If there is
> even one Lisp implementation where the conundrum arises then the
> standard made a poor choice in enforcing the fixnum restriction.

I think you have this backwards. As long as the standard allows an
implementation that doesn't have the problem, then the defect cannot be
in the standard. It must be in the implementation.

> > 3) Abandoning the fixnum limit, especially given adjustability, is
> > potentially really bad for performance even for arrays that aren't
> > bigger than that limit;
>
> Every time you pass from fixnums to bignums it is "potentially really
> bad for performance". This is something for the programmer to worry
> about.

Except that the programmer can't really do anything about the internal
implementation of AREF. So it really is the language implementer that
has to worry about such things. And I want my array references to be
fast, thank you very much.

> The standard could offer a declaration by which the programmer
> could inform the compiler that the total size of an array will be
> a fixnum. That's how it works with everything else in Lisp : the
> language allows flexibility and if you want to increase performance you
> restrict your flexibility through the appropriate declarations. I don't
> see why array access should be different.

I suppose. Although that would mean that by default array accesses
would be slow rather than fast.

> It's worth considering also that in the case of array access you may
> not even need a declaration at all. When you create the array or adjust
> its size the compiler knows if the (new) size is so large that it needs
> bignums or not. If not then it can automatically arrange for all access
> calculations to use fixnum arithmetic.

Um, the compiler can only know this if the array dimensions are
compile-time constants. If you dynamically create an array, then the
compiler doesn't have this information.

For example:

(defun create-my-new-array (size)
(make-array (list size size) :element-type ...))

The compile can't possibly know if size is fixnum or bignum, unless you
put in type declarations.

Similarly

(defun get-element (x y)
(aref *my-array* x y))

runs into similar problems.

> > if you want really bad performance, you can roll it yourself.
>
> I don't want really bad performance , I want arrays which can be larger
> than most-positive-fixnum. How can I roll this myself ?
>
> > 4) The problem has historically been with array-size-limit rather
> > smaller than the fixnum limit.
>
> Well things change. Memory and disks get bigger while the standard
> stays the same.

The standard just mandates things in terms of FIXNUM size, which does
not have an upper bound in the standard. Nothing stops lisp systems on
64-bit architectures from offering larger fixnums. And, surprise,
surprise, the 64-bit lisp systems do have larger fixnums. So just get a
64-bit lisp and all your problems will be solved. This is all well
within the range of design choices allowed by the standard.

--
Thomas A. Russ, USC/Information Sciences Institute

From: Thomas A. Russ on 7 Dec 2009 13:07

Spiros Bousbouras <spibou(a)gmail.com> writes:

> On 4 Dec 2009 15:54:00 GMT
> Tamas K Papp <tkpapp(a)gmail.com> wrote:

> > Eg if you are
> > writing some method which handles array indexes, you can always rely
> > on them being fixnums, regardless of the implementation. And then the
> > implementation is allowed to say what it considers a fixnum, and if
> > you don't like that, just pick another implementation.
>
> What if you want to use a 32-bit computer ?

Then don't complain about limitations imposed by the computer
architecture on your lisp system.

--
Thomas A. Russ, USC/Information Sciences Institute

From: Thomas A. Russ on 7 Dec 2009 13:22

Spiros Bousbouras <spibou(a)gmail.com> writes:

> On 04 Dec 2009 10:01:06 -0800
> tar(a)sevak.isi.edu (Thomas A. Russ) wrote:
> > Spiros Bousbouras <spibou(a)gmail.com> writes:
> >
> > > > > 2. Why the requirement that the dimensions of an array are fixnums ?
> > > >
> > > > Why not? This is not a big restriction, if an implementation wants
> > > > bigger arrays, it only has to provide bigger fixnums.
> > >
> > > fixnums are supposed to be efficient , that's the criterion for
> > > choosing the size of fixnums not the maximum size of arrays.
> >
> > Yes. And you want to make sure array access is efficient. Thus the
> > limitation on only allowing fixnums as indicies
>
> No , you want to make sure that the programmer has the freedom to
> choose the trade-off which is most appropriate for his application. If
> he is willing to sacrifice some speed in order to gain larger arrays he
> should have that choice.

He does have that choice.

He can always implement his own array storage system. See the sketch of
such an implementation in another reply.

But I would expect that an implementation that didn't use fixnum limits
would have generally slow array accesses, and so it would tend to be
avoided by must users. And thus such an implementation is unlikely to
exist for long in the lisp ecosystem.

> > > On the
> > > other hand the maximum size of arrays should be what the operating
> > > system can handle even if it means that part of the array would be in
> > > swap memory.
> >
> > Um, the question of fixnum sizes and the physical/virtual memory sizes
> > are completely orthogonal.
>
> I 99.99% agree. However in the part of my post you're quoting I'm not
> making a connection between size of fixnum and size of memory , I'm
> making a connection between size of arrays and size of memory.
>
> > As Pascal pointed out, most lisp systems can
> > generally index their entire address space (at a word level) using their
> > fixnums,
>
> I'm not sure what the "at a word level" part means but I believe I
> understand your overall gist.

As Duane Rettig also points out, the standard array which holds any lisp
object has storage sizes for machine-word size objects, since that is
typically what is used for FIXNUM and POINTERS. Since, on a 32-bit
architecture, there are 4 bytes per word, and addresses are typically
measured in bytes, you can cleverly array to address all of virtual
memory using fixnum indices for undifferentiated arrays.

The only exception would seem to be bit vectors, where -- if the
implementations supports it -- you can pack 8 elements into a single
byte. [Note that the standard doesn't require this.]

> So what about those lisp systems which
> can't index their entire address space using their fixnums ? From what
> Pillsy said CLisp probably can't.

Well, that certainly isn't the standard's fault. Choose a different
lisp implementation. That's why there are so many -- different design
points.

> What if you want a large bit vector ? Are you supposed to use something
> larger than a bit for the array element and then calculate bit
> addresses and do your own bit fiddling ?

Well, that's one option.
There are others.

> > Even if a lot of systems use 29 rather than 30 bits for
> > fixnums, this still gives you most of memory, and some copying GCs need
> > can only safely use half of the virtual memory in order to guarantee
> > enough space to copy into....
> >
> > Besides, it would be ludicrous to expect that the fixnum size of a lisp
> > implementation would depend on the actual amount of physical RAM
> > installed in a computer, rather than the processor (and software)
> > architecture.
>
> I agree. On the other hand it makes sense to expect that the maximum
> array size would depend on the actual amount of physical RAM installed
> in a computer so the standard is at fault for creating a connection
> between fixnum size and array size.

Why?
Why physical RAM and not virtual address space?

Typcially the fixnum size is tied to virtual address space instead, so
you get what you want.

And tying the array size to particular machine configurations makes it
really hard to distribute your program, since you can't count on the
maximum array size being anything in particular.

And you don't have the option of running more slowly with less RAM. You
wouldn't be able to run at all.

> > How do you address a multiple dimension array in RAM without
> > multiplying?
>
> If all the dimensions are powers of 2 then you can do it by ORing and
> shifting. Perhaps there are some nice tricks available in other cases
> too.

Well, yes, but they get harder when you have to do them with BIGNUMs.
They are no longer single machine instructions.

> > > > Moreover, there would be no point, because in general
> > > > the size of arrays slots is one word, and fixnum is all you need to
> > > > index all the words you can store in memory. (eg. with a 32-bit
> > > > processor, a fixnum of 30-bit is enough to index the whole memory.
> > >
> > > Is it ?
> >
> > Sure. Why do you think it isn't?
>
> I'm thinking it *may* not be because you can cover a larger range of
> values with 32 bits than with 30.

See comment about object sizes, word alignment and byte addressing.

It's only an issue for bit vectors or other specialized arrays that take
less than one word per element.

> > > 5 , so ? In fact if the point for the limitation in the size of arrays
> > > is to never have a bignum as an array subscript then each dimension
> > > could be up to (1+ most-positive-fixnum) so the definition of
> > > ARRAY-DIMENSION-LIMIT should be different in order to allow for such a
> > > possibility.
> >
> > Um, not really. Because the ARRAY-DIMENSION-LIMIT has to apply to all
> > arrays, including single-dimensional arrays that you want to displace to
> > your multi-dimensional array.
> >
> > I also suspect that ROW-MAJOR-AREF also plays into this limit.
>
> As I've said to my reply to Tamas "the standard could allow
> ARRAY-DIMENSION-LIMIT and ARRAY-TOTAL-SIZE-LIMIT to be at most (1+
> most-positive-fixnum) (which means that both should be allowed to be
> bignums). This would still ensure that ARRAY-ROW-MAJOR-INDEX can work
> with only fixnums while giving the programmer 1 extra subscript to work
> with". Issue "ARRAY-DIMENSION-LIMIT-IMPLICATIONS" acknowledges this. It
> says

I think this is getting to the ridiculous hair-splitting point. Are
your requirements really so constrained that getting ONE or TWO
additional array elements out of 500 million makes a difference?

--
Thomas A. Russ, USC/Information Sciences Institute

From: Tim Bradshaw on 7 Dec 2009 14:44

On 2009-12-07 18:22:07 +0000, tar(a)sevak.isi.edu (Thomas A. Russ) said:

> Well, that's one option.
> There are others.

That actually turns out to be the best option if you care about
performance for large bit-based structures as well even if you have an
implementation where the fixnum limit is not a problem, unless you have
a heroic compiler.

From: Kaz Kylheku on 7 Dec 2009 15:19

On 2009-12-06, Tim Bradshaw <tfb(a)cley.com> wrote:
> On 2009-12-05 04:44:33 +0000, Spiros Bousbouras <spibou(a)gmail.com> said:
>
>> So the issue of multiplication doesn't even arise with vectors. For
>> arrays which are not vectors it could be handled with appropriate
>> declarations. Again , see my reply to Paul Wallich.
>
> Yes, it does. Most (all?) modern systems are byte-addressed. So
> unless your array is not an array of bytes, the implementation will
> have to multiply or divide, typically by a power of two, to get the
> reference.

Ah, but on x86, the only architecture Spiros knows about, you can do a load
with two registers (base and offset), and a small power-of-two scale
by which the offset is multiplied, all in one instruction.

:)

First | Prev | Next | Last
Pages: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Prev: Checking for declarations in macros
Next: scripting in CL -- please comment