From: Lew on
Andreas Leitgeb wrote:
> On some deeper level, a relational DB seems to actually use the "separate
> arrays" approach, too. Otherwise I cannot explain the relatively low cost
> of adding another column to a table of 100 million entries already in it.

There's a big difference between a database with tens of thousands of
man-hours, maybe far more, of theory, development, testing, user feedback,
optimization efforts, commercial competition and evolution behind it, and an
ad-hoc use of in-memory arrays by a solo programmer.

A database system is far, far more than a simple "separate arrays" approach.
There are B[+]-trees, caches, indexes, search algorithms, stored procedures,
etc., etc., etc. Your comment is like saying that "on some deeper level" a
steel-and-glass skyscraper is like the treehouse you built for your kid in the
back yard.

--
Lew
From: Tom McGlynn on
On Jul 26, 1:33 pm, Tom Anderson <t...(a)urchin.earth.li> wrote:
> On Mon, 26 Jul 2010, Tom McGlynn wrote:
> > On Jul 26, 8:01 am, Tom Anderson <t...(a)urchin.earth.li> wrote:
>
> >> But Joshua was talking about using instances of Color, where those
> >> instances are singletons (well, flyweights is probably the right term
> >> when there are several of them), exposed in static final fields on
> >> Color, and
>
> > While I agree with Tom's main point here, I am dubious about his
> > suggestion for what the static instances should be called in a class
> > that only creates a unique closed set of such instances.
>
> > I don't think flyweights is the right word.  For me flyweights are
> > classes where part of the state is externalized for some purpose. This
> > is orthogonal to the concept of singletons. E.g., suppose I were running
> > a simulation of galaxy mergers of two 100-million-star galaxies.  Stars
> > differ only in position, velocity and mass.  Rather than creating 200
> > million Star objects I might create a combination flyweight/singleton
> > Star where each method call includes an index that is used to find the
> > mutable state in a few external arrays.
>
> I am 90% sure that is absolutely not how 'flyweight' is defined in the
> Gang of Four book, from which its use in the programming vernacular
> derives. If you want to use a different definition, then that's fine, but
> you are of course wrong.
>
> > So is there a better word than flyweights for the extension of a
> > singleton to a set with cardinality > 1?
>
> Multipleton.
>
> More seriously, enumeration.
>


Thanks, Tom, for responding to my real question about the vocabulary
rather than the example. That was just intended to clarify a
distinction between a singleton and my idea of a flyweight.

Here's a bit of what the GOF has to say about flyweights. (Page 196
in my version)....


"A flyweight is a shared object that can be used in multiple contexts
simultaneously. The flyweight acts as an independent object in each
context--it's indistinguishable from an instance of the object that's
not shared.... The key concept here is the distinction between
intrinsic and extrinsic state. Intrinsic state is stored in the
flyweight. It consists of information that's independent of the
flyweight's context, thereby making it shareable. Extrinsic state
depends on and varies with the flyweight's context and therefore can't
be shared. Client objects are responsible for passing extrinsic state
to the flyweight when it needs it."

That's reasonably close to what I had in mind. In my simple example
stars may share some common state (e.g., age), but the information
about the position and velocity is extrinsic and supplied when the
object is used. In a more realistic example I might have multiple
flyweights for different types of stars.
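
A minimal sketch of that idea in Java might look like the following. The
class and field names are invented here purely for illustration; the point
is only that the one shared Star holds the intrinsic state, while every
call takes an index into arrays the simulation owns:

    // One shared Star per stellar type; an index locates the extrinsic state.
    public final class Star {

        // Intrinsic state, common to every star this flyweight stands for.
        private final double ageInYears;

        // Extrinsic state, externalized into arrays owned by the simulation.
        private final double[] mass;
        private final double[] x, y, z;
        private final double[] vx, vy, vz;

        public Star(double ageInYears, double[] mass,
                    double[] x, double[] y, double[] z,
                    double[] vx, double[] vy, double[] vz) {
            this.ageInYears = ageInYears;
            this.mass = mass;
            this.x = x;   this.y = y;   this.z = z;
            this.vx = vx; this.vy = vy; this.vz = vz;
        }

        // Intrinsic state needs no index.
        public double age() {
            return ageInYears;
        }

        // Every method call includes the star's index, as described above.
        public double kineticEnergy(int i) {
            double v2 = vx[i] * vx[i] + vy[i] * vy[i] + vz[i] * vz[i];
            return 0.5 * mass[i] * v2;
        }

        public void drift(int i, double dt) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
            z[i] += vz[i] * dt;
        }
    }

With multiple types of stars there would simply be a handful of such
instances, each with its own intrinsic state, all sharing the same arrays.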

Getting back to my original concern, I don't think enumeration is a
good word for the concept either. Enumerations are often used as an
implementation of the basis set -- favored in Java by special syntax.
However, the word enumeration strongly suggests a list. In general
the set of special values may have a non-list relationship (e.g., they
could form a hierarchy). I like the phrase 'basis set' I used above,
but that suggests that other elements can be generated by combining
the elements of the basis, so it's not really appropriate either.
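
For concreteness, the closed set usually shows up in Java in one of two
forms: the special enum syntax, or the older idiom of static final
instances on the class itself, the way java.awt.Color exposes RED, GREEN
and friends. A throwaway sketch (names invented):

    // The special syntax:
    public enum StarClass { O, B, A, F, G, K, M }

    // ...or the hand-rolled closed set of named instances:
    final class SpectralType {
        public static final SpectralType DWARF = new SpectralType("dwarf");
        public static final SpectralType GIANT = new SpectralType("giant");

        private final String name;

        private SpectralType(String name) { this.name = name; }

        @Override
        public String toString() { return name; }
    }

Either way the language gives you the closed set; it just doesn't give you
a good generic name for the pattern.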

Regards,
Tom McGlynn
From: Alan Gutierrez on
Lew wrote:
> Andreas Leitgeb wrote:
>> On some deeper level, a relational DB seems to actually use the "separate
>> arrays" approach, too. Otherwise I cannot explain the relatively low cost
>> of adding another column to a table of 100 million entries already in it.
>
> There's a big difference between a database with tens of thousands of
> man-hours, maybe far more, of theory, development, testing, user feedback,
> optimization efforts, commercial competition and evolution behind it,
> and an ad-hoc use of in-memory arrays by a solo programmer.
>
> A database system is far, far more than a simple "separate arrays"
> approach. There are B[+]-trees, caches, indexes, search algorithms,
> stored procedures, etc., etc., etc. Your comment is like saying that
> "on some deeper level" a steel-and-glass skyscraper is like the
> treehouse you built for your kid in the back yard.

In other words, nobody ever got fired for buying IBM.

--
Alan Gutierrez - alan(a)blogometer.com - http://twitter.com/bigeasy
From: Alan Gutierrez on
Andreas Leitgeb wrote:
> Lew <lew(a)lewscanon.com> wrote:
>> Lew wrote:
>>>> Except that that "mutable state in a few external arrays" still has to
>>>> maintain state for order 10^8 objects, coordinating position, velocity
>>>> (relative to ...?) and mass. So you really aren't reducing very much,
>>>> except simplicity in the programming model and protection against
>>>> error.
>> Andreas Leitgeb wrote:
>>> About 2.4GB are not much?
>>> (200 mill * (8bytes plain Object + 4bytes for some ref to each))
>> No, it isn't. Around USD 75 worth of RAM.
>
> Except, if the machine is already RAM-stuffed to its limits...
>
> Even if the machine wasn't yet fully RAM'ed, then buying more RAM
> *and* using the arrays-kludge (yes, that's it, after all) would allow
> even larger galaxies to be simulated.

The "RAM is cheaper than programmer time" argument is useful to salt the
tail of the newbie who seeks to dive down every micro-optimization
rabbit hole he comes across on the path to the problems that truly
deserve such intense consideration. You have to admire the moxie of the
newbie who wants to catenate last name first as fast as possible, but
you explain to them that there are plenty of dragons to slay further
down the road.

It is not a good argument for someone who brings a problem that is truly
limited by available memory. Memory management is an appropriate
consideration for the problem. Memory management is the problem.

Memory procurement is the non-programmer solution. Throw money at it.
Scale up rather than scaling out, because we can scale up with cash, but
scaling out requires programmers who understand algorithms.

You're right that scaling up hits a foreseeable limit. I like to have
the limitations of my program be unforeseeable. That is, if I'm going to
read something into memory, say, every person in the world who would
loan money to me personally without asking questions, I'd like to know
that hitting the limits of the finite resource employed on a
contemporary computer system correlates to a situation in reality that is
unimaginable.

Moore's Law does not excuse brute force.

Which is why I am similarly taken aback to hear RAM prices quoted for
something that has obvious solutions in plain old Java.

>>> I for myself might choose the perhaps less cpu-efficient (due to all
>>> the repeated indexing and for the defied locality) and also less simple
>>> programming model, if it allowed me to solve larger problems with the
>>> available RAM.
>> With that much data to manage, I'd go with the straightforward object
>> model and a database or the $75 worth of RAM chips. Or both.
>
> On some deeper level, a relational DB seems to actually use the "separate
> arrays" approach, too. Otherwise I cannot explain the relatively low cost
> of adding another column to a table of 100 million entries already in it.

On some deeper level, a relational database, accessed through an
object-relational mapping layer, will be paging information in and out of
memory, on and off of disk, as you need it. That is the feature you need
to address your memory problem.

Lately, I've been mucking about with `MappedByteBuffer`, so I imagine
that for your (hypothetical) problem of modeling the Galaxy, you would
model it by keeping the primitives you describe in the `MappedByteBuffer`
and creating objects from them as needed. This is not `Flyweight` to my
mind. In a flyweight, you keep objects that map to a finite set of values,
and those values are assembled into a larger structure in an infinite
number of permutations. The atomic components exist within the larger
structure, but they are reused. An interned `String` is a flyweight to my
mind.
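
A throwaway illustration of that last point, just to pin down what I mean
by shared:

    // Interning hands back one shared instance per distinct character
    // sequence -- the flyweight quality: many references, one immutable object.
    public class InternDemo {
        public static void main(String[] args) {
            String built = new StringBuilder("gal").append("axy").toString();
            String literal = "galaxy";

            System.out.println(built == literal);            // false: two objects
            System.out.println(built.intern() == literal);   // true: the shared copy
        }
    }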

I'm not sure what the pattern is for the short-term objectification of a
record, but that is a lot of what Hibernate is about: making objecty
that which is stringy, just long enough for you to do your CRUD in the
security of your type-safe world.

>> And what about when the model changes, and you want to track star age,
>> brightness, color, classification, planets, name, temperature,
>> galactic quadrant, ...? With parallel arrays the complexity and risk
>> of bugs just goes up and up. With an object model, the overhead of
>> maintaining that model becomes less and less significant, but the
>> complexity holds roughly steady.
>
> 100% agree to these points.

You create a `Star` object that can read the information from a
`MappedByteBuffer` at a particular index, and you can simply change the
`read` and `write` methods of the star.
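
A rough sketch of what I mean, with a record layout (seven doubles per
star) and a file name that are purely made up here:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Objectify one record of the mapped file just long enough to work on it.
    public final class Star {

        // Assumed layout: mass, x, y, z, vx, vy, vz -- seven doubles per record.
        private static final int RECORD_SIZE = 7 * 8;

        private double mass, x, y, z, vx, vy, vz;

        // Fill this Star's fields from record i of the mapped file.
        public void read(MappedByteBuffer buffer, int i) {
            int offset = i * RECORD_SIZE;
            mass = buffer.getDouble(offset);
            x  = buffer.getDouble(offset + 8);
            y  = buffer.getDouble(offset + 16);
            z  = buffer.getDouble(offset + 24);
            vx = buffer.getDouble(offset + 32);
            vy = buffer.getDouble(offset + 40);
            vz = buffer.getDouble(offset + 48);
        }

        // Write this Star's fields back to record i of the mapped file.
        public void write(MappedByteBuffer buffer, int i) {
            int offset = i * RECORD_SIZE;
            buffer.putDouble(offset, mass);
            buffer.putDouble(offset + 8,  x);
            buffer.putDouble(offset + 16, y);
            buffer.putDouble(offset + 24, z);
            buffer.putDouble(offset + 32, vx);
            buffer.putDouble(offset + 40, vy);
            buffer.putDouble(offset + 48, vz);
        }

        public static void main(String[] args) throws Exception {
            int starCount = 1000;
            RandomAccessFile file = new RandomAccessFile("stars.dat", "rw");
            MappedByteBuffer buffer = file.getChannel()
                .map(FileChannel.MapMode.READ_WRITE, 0, (long) starCount * RECORD_SIZE);

            Star star = new Star();
            star.read(buffer, 42);      // make record 42 objecty for a moment
            star.vx += 0.001;
            star.write(buffer, 42);     // and put it back
            file.close();
        }
    }

Only the records you touch get paged in, which is the paging behavior the
ORM+RDBMS stack would otherwise be giving you.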

You've reached down to the deeper level of the ORM+RDBMS stack and
extracted the only design pattern you need to address the problem of
reading the Universe into memory.

--
Alan Gutierrez - alan(a)blogometer.com - http://twitter.com/bigeasy
From: Martin Gregorie on
On Tue, 27 Jul 2010 12:34:27 -0500, Alan Gutierrez wrote:

>
> In other words, nobody ever got fired for buying IBM.
>
Regardless of what you might think of their business methods (and in the
past they didn't exactly smell of roses), their software quality control
and their hardware build quality are both hard to beat. I've used S/38
and AS/400 quite a bit and never found bugs in their system software or
lost work time due to hardware problems.

For elegant systems design, ICL had them beat hands down, but although
ICL quality was OK by IT standards, the IBM kit was more reliable.

IME anyway.


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |