From: Lew on
Alan Gutierrez wrote:
> The RAM is cheaper than programmer time argument is useful to salt the
> tail of the newbie that seeks to dive down every micro-optimization
> rabbit hole that he comes across on the path to the problems that truly
> deserve such intense consideration. You have to admire the moxie of the
> newbie that wants to catenate last name first as fast as possible, but
> you explain to them that there are plenty of dragons to slay further
> down the road.
>
> It is not a good argument for someone who brings a problem that is truly
> limited by available memory. Memory management is an appropriate
> consideration for the problem. Memory management is the problem.
>
> Memory procurement is the non-programmer solution. Throw money at it.
> Scale up rather than scaling out, because we can scale up with cash, but
> scaling out requires programmers who understand algorithms.
>
> You're right that scaling up hits a foreseeable limit. I like to have
> the limitations of my program be unforeseeable. That is, if I'm going to
> read something into memory, say, every person in the world who would
> loan money to me personally without asking questions, I'd like to know
> that hitting the limits of the finite resource employed on a
> contemporary computer system correlates to a situation in reality that is
> unimaginable.
>
> Moore's Law does not excuse brute force.
>
> Which is why I am similarly taken aback to hear RAM prices quoted for
> something that has obvious solutions in plain old Java.

I'm pretty surprised to hear a clean object model described as "brute force",
but OK. The point of a spirited discussion is to expose all sides of an issue.

I'd go with clean design first, which to my mind an object model is, then play
around with non-expandable, hard-to-maintain, bug-prone parallel-array
solutions if the situation truly demanded it, but I just don't see that demand
in the scenario under discussion.

--
Lew
From: Alan Gutierrez on
Lew wrote:
> Alan Gutierrez wrote:
>> The RAM is cheaper than programmer time argument is useful to salt the
>> tail of the newbie that seeks to dive down every micro-optimization
>> rabbit hole that he comes across on the path to the problems that truly
>> deserve such intense consideration. You have to admire the moxie of the
>> newbie that wants to catenate last name first as fast as possible, but
>> you explain to them that there are plenty of dragons to slay further
>> down the road.
>>
>> It is not a good argument for someone who brings a problem that is truly
>> limited by available memory. Memory management is an appropriate
>> consideration for the problem. Memory management is the problem.
>>
>> Memory procurement is the non-programmer solution. Throw money at it.
>> Scale up rather than scaling out, because we can scale up with cash, but
>> scaling out requires programmers who understand algorithms.
>>
>> You're right that scaling up hits a foreseeable limit. I like to have
>> the limitations of my program be unforeseeable. That is, if I'm going to
>> read something into memory, say, every person in the world who would
>> loan money to me personally without asking questions, I'd like to know
>> that hitting the limits of the finite resource employed on a
>> contemporary computer system correlates to a situation in reality that is
>> unimaginable.
>>
>> Moore's Law does not excuse brute force.
>>
>> Which is why I am similarly taken aback to hear RAM prices quoted for
>> something that has obvious solutions in plain old Java.
>
> I'm pretty surprised to hear a clean object model described as "brute
> force", but OK. The point of a spirited discussion is to expose all
> sides of an issue.
>
> I'd go with clean design first, which to my mind an object model is,
> then play around with non-expandable, hard-to-maintain, bug-prone
> parallel-array solutions if the situation truly demanded it, but I just
> don't see that demand in the scenario under discussion.

The scenario under discussion is, I want to do something that will reach
the limits of system memory. Your solution is to procure memory. My
solution is to use virtual memory.

Again, it seems to me that `MappedByteBuffer` and a bunch of little
facades to the contents of the `MappedByteBuffer` is a preferred
solution that respects memory usage. The design is as expandable,
easy to maintain, and bug-free as a great big array of objects, without
having to think much about memory management at all.

I don't know where "parallel" arrays come into play in the problem
described. I imagine that, if the records consist entirely of numeric
values, you can treat them as fixed-length records.
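
To make that concrete, here is a minimal sketch of the sort of thing I
mean, with a made-up file name and record layout; note that a single
`MappedByteBuffer` tops out at 2GB, so a really big data set would need
several mappings:

package comp.lang.java.programmer;

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class FixedRecords {
    public static void main(String[] args) throws Exception {
        int recordLength = 3 * 8;  // say, three doubles per record
        int count = 1000000;       // a hypothetical record count
        RandomAccessFile file = new RandomAccessFile("records.dat", "rw");
        MappedByteBuffer bytes = file.getChannel().map(
            FileChannel.MapMode.READ_WRITE, 0, (long) count * recordLength);
        // Read and write records in place; the operating system pages the
        // file in and out, so none of this lives on the Java heap.
        bytes.putDouble(42 * recordLength, 3.14);
        System.out.println(bytes.getDouble(42 * recordLength));
        file.close();
    }
}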

--
Alan Gutierrez - alan(a)blogometer.com - http://twitter.com/bigeasy
From: Alan Gutierrez on
Martin Gregorie wrote:
> On Tue, 27 Jul 2010 12:34:27 -0500, Alan Gutierrez wrote:
>
>> In other words, nobody ever got fired for buying IBM.
>>
> Regardless of what you might think of their business methods, and in the
> past they didn't exactly smell of roses, their software quality control
> and their hardware build quality are both hard to beat. I've used S/38
> and AS/400 quite a bit and never found bugs in their system software or
> lost work time due to hardware problems.
>
> For elegant systems design ICL had them beat hands down, but although ICL
> quality was OK by IT standards the IBM kit was more reliable.
>
> IME anyway.

I wasn't really picking on IBM.

I was addressing the fallacy of the appeal to authority: the argument
that a monolithic system embodies institutionalized knowledge that is
superior to any other solution offered for a problem that the monolithic
system could conceivably address.

--
Alan Gutierrez - alan(a)blogometer.com - http://twitter.com/bigeasy
From: Lew on
Alan Gutierrez wrote:
> The scenario under discussion is, I want to do something that will reach
> the limits of system memory. Your solution is to procure memory. My
> solution is to use virtual memory.
>
> Again, it seems to me that `MappedByteBuffer` and a bunch of little
> facades to the contents of the `MappedByteBuffer` is a preferred
> solution that respects memory usage. The design is as expandable,
> easy to maintain, and bug-free as a great big array of objects, without
> having to think much about memory management at all.

I like that idea.

> I don't know where "parallel" arrays come into play in the problem

Did you read this thread? Like, say, yesterday, when Tom McGlynn wrote:
>>> E.g., suppose I were running a simulation of galaxy mergers
>>> of two 100-million-star galaxies. Stars differ only in position,
>>> velocity and mass. Rather than creating 200 million Star objects
>>> I might create a combination flyweight/singleton Star where each
>>> method call includes an index that is used to find the mutable
>>> state in a few external arrays.

Alan Gutierrez wrote:
> described. I imagine that, if the records consist entirely of numeric
> values, you can treat them as fixed-length records.

--
Lew
From: Alan Gutierrez on
Lew wrote:
> Alan Gutierrez wrote:
>> The scenario under discussion is, I want to do something that will reach
>> the limits of system memory. Your solution is to procure memory. My
>> solution is to use virtual memory.
>>
>> Again, it seems to me that `MappedByteBuffer` and a bunch of little
>> facades to the contents of the `MappedByteBuffer` is a preferred
>> solution that respects memory usage. The design is as expandable,
>> easy to maintain, and bug-free as a great big array of objects, without
>> having to think much about memory management at all.
>
> I like that idea.

Oh, yeah! Well another thing mister... You, I, uh, but... Wait...

Well, golly gee. Thanks.

I ran off and wrote some code to illustrate my point.

package comp.lang.java.programmer;

import java.nio.ByteBuffer;

public interface ElementIO<T> {
    // `offset` is a byte offset into the buffer, not a record index;
    // `BigList` below does the multiplication.
    public void write(ByteBuffer bytes, int offset, T item);
    public T read(ByteBuffer bytes, int offset);
    public int getRecordLength();
}

package comp.lang.java.programmer;

import java.nio.MappedByteBuffer;
import java.util.AbstractList;

public class BigList<T> extends AbstractList<T> {
    private final ElementIO<T> io;

    private final MappedByteBuffer bytes;

    private int size;

    public BigList(ElementIO<T> io, MappedByteBuffer bytes, int size) {
        this.io = io;
        this.bytes = bytes;
        this.size = size;
    }

    // The result is not `==` to the value given to `set`, so only use
    // element types that define `equals` (and `hashCode`).
    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException();
        }
        return io.read(bytes, index * io.getRecordLength());
    }

    @Override
    public T set(int index, T item) {
        T result = get(index); // `get` checks the bounds
        io.write(bytes, index * io.getRecordLength(), item);
        return result;
    }

    @Override
    public void add(int index, T element) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException();
        }
        size++;
        // Shift every record at or after `index` one slot to the right.
        for (int i = size - 2; i >= index; i--) {
            set(i + 1, get(i));
        }
        set(index, element);
    }

    // And `remove` and the like, but of course only `get`, `set`
    // and `add` at the very end can be counted on to be performant.

    @Override
    public int size() {
        return size;
    }
}

Create the above with however much `MappedByteBuffer` you need for your
Universe. Define `ElementIO` to read and write your `Star` type. Each
time you read a `Star` through `ElementIO` you do mint a new `Star`, so
it is like a Flyweight in some ways, but it also looks a little like a
Bridge or an Adapter.
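
Something like this, say, for a cut-down `Star`; the fields and the
layout here are my own invention, mass and position only, velocities
left out for brevity:

package comp.lang.java.programmer;

public class Star {
    private final double mass, x, y, z;

    public Star(double mass, double x, double y, double z) {
        this.mass = mass;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public double getMass() { return mass; }
    public double getX() { return x; }
    public double getY() { return y; }
    public double getZ() { return z; }
}

package comp.lang.java.programmer;

import java.nio.ByteBuffer;

public class StarIO implements ElementIO<Star> {
    public void write(ByteBuffer bytes, int offset, Star star) {
        bytes.putDouble(offset, star.getMass());
        bytes.putDouble(offset + 8, star.getX());
        bytes.putDouble(offset + 16, star.getY());
        bytes.putDouble(offset + 24, star.getZ());
    }

    public Star read(ByteBuffer bytes, int offset) {
        return new Star(bytes.getDouble(offset),
                        bytes.getDouble(offset + 8),
                        bytes.getDouble(offset + 16),
                        bytes.getDouble(offset + 24));
    }

    public int getRecordLength() {
        return 4 * 8; // four doubles per record
    }
}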

If you shut down softly and record the size, you can reopen the list. If
you change `Star` you need to update `ElementIO` and rebuild your list,
but probably not the code that references `Star` or the `BigList`.
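
Wiring it together would go something like this; the file name and the
count are made up, and a real program would map some headroom past
`savedSize` so `add` has room to grow:

package comp.lang.java.programmer;

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Reopen {
    public static void main(String[] args) throws Exception {
        int savedSize = 1000000; // the size recorded at the last soft shutdown
        StarIO io = new StarIO();
        RandomAccessFile file = new RandomAccessFile("stars.dat", "rw");
        MappedByteBuffer bytes = file.getChannel().map(
            FileChannel.MapMode.READ_WRITE, 0,
            (long) savedSize * io.getRecordLength());
        BigList<Star> stars = new BigList<Star>(io, bytes, savedSize);
        stars.set(0, new Star(1.989e30, 0.0, 0.0, 0.0));
        System.out.println(stars.get(0).getMass());
        file.close();
    }
}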

>> I don't know where "parallel" arrays come into play in the problem
>
> Did you read this thread? Like, say, yesterday, when Tom McGlynn wrote:
>>>> E.g., suppose I were running a simulation of galaxy mergers
>>>> of two 100-million-star galaxies. Stars differ only in position,
>>>> velocity and mass. Rather than creating 200 million Star objects
>>>> I might create a combination flyweight/singleton Star where each
>>>> method call includes an index that is used to find the mutable
>>>> state in a few external arrays.

I see it now. Searching the long thread for the word "parallel" didn't
turn it up for me, but that's what is described here. That does sound a
bit fragile.
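
If I follow, the parallel-array version is something like this; my own
reconstruction of what Tom describes, cut down to mass and a
two-dimensional position:

public class Stars {
    private final double[] mass;
    private final double[] x;
    private final double[] y;

    public Stars(int count) {
        mass = new double[count];
        x = new double[count];
        y = new double[count];
    }

    // Every accessor takes the star's index; there is no `Star` object,
    // just slots in arrays that have to be kept in step by hand.
    public double getMass(int index) { return mass[index]; }
    public void setMass(int index, double value) { mass[index] = value; }
    public double getX(int index) { return x[index]; }
    public void setX(int index, double value) { x[index] = value; }
    public double getY(int index) { return y[index]; }
    public void setY(int index, double value) { y[index] = value; }
}

Add a field and you touch every array, every constructor, and every loop
that walks them, which is the fragility I mean.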

Anyway, it seems like there is a middle ground between ORM+RDBMS and
everything in memory. My hobby horse. (Rock, rock, rock.)

--
Alan Gutierrez - alan(a)blogometer.com - http://twitter.com/bigeasy