From: Patricia Shanahan on
On 7/9/2010 5:15 AM, Eric Sosman wrote:
> On 7/8/2010 9:11 PM, Patricia Shanahan wrote:
>> Arne Vajhøj wrote:
>>> On 08-07-2010 17:35, Boris Punk wrote:
>>>> Integer.MAX_VALUE = 2147483647
>>>>
>>>> I might need more items than that. I probably won't, but it's nice to
>>>> have
>>>> extensibility.
>>>
>>> It is a lot of data.
>>>
>>> I think you should assume YAGNI.
>>
>>
>> Historically, each memory size has gone through a sequence of stages:
>>
>> 1. Nobody will ever need more than X bytes.
>>
>> 2. Some people do need to run multiple jobs that need a total of more
>> than X bytes, but no one job could possibly need that much.
>>
>> 3. Some jobs do need more than X bytes, but no one data structure could
>> possibly need that much.
>>
>> 4. Some data structures do need more than X bytes.
>>
>> Any particular reason to believe 32 bit addressing will stick at stage
>> 3, and not follow the normal progression to stage 4?
>
> None. But Java's int isn't going to grow wider, nor will the
> type of an array's .length suddenly become non-int; too much code
> would break. When Java reaches the 31-bit wall, I doubt it will
> find any convenient door; Java's descendants may pass through, but
> I think Java will remain stuck on this side.
>
> In ten years, we'll all have jobs converting "legacy Java code"
> to Sumatra.
>

I don't think the future for Java is anywhere near as bleak as you paint it.

The whole collections issue could be handled by creating a parallel
hierarchy based on java.util.long_collections (or something similar for
those who don't like separating words in package names). It would
replicate the class names in the java.util hierarchy, but with long
replacing int wherever necessary to remove the size limits. It could be
implemented, using arrays of arrays where necessary, without any JVM
changes.
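
A minimal sketch of the idea (all names here are invented; the real
API would be up to the library designers) might look like:

    // Hypothetical long-indexed list backed by arrays of arrays;
    // implementable today with no JVM changes.
    public class LongArrayList<E> {
        private static final int CHUNK_BITS = 20;          // 1Mi elements per chunk
        private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
        private static final int CHUNK_MASK = CHUNK_SIZE - 1;

        private Object[][] chunks = new Object[1][];
        private long size;

        public void add(E element) {
            int chunk = (int) (size >>> CHUNK_BITS);
            int offset = (int) (size & CHUNK_MASK);
            if (chunk == chunks.length) {                  // grow the outer array
                chunks = java.util.Arrays.copyOf(chunks, chunks.length * 2);
            }
            if (chunks[chunk] == null) {
                chunks[chunk] = new Object[CHUNK_SIZE];
            }
            chunks[chunk][offset] = element;
            size++;
        }

        @SuppressWarnings("unchecked")
        public E get(long index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException(Long.toString(index));
            }
            return (E) chunks[(int) (index >>> CHUNK_BITS)]
                             [(int) (index & CHUNK_MASK)];
        }

        public long size() {                               // long, not int
            return size;
        }
    }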

To migrate a program to the new collections one would first change the
import statements to pick up the new packages, and then review all int
declarations to see if they should be long. Many of the ones that need
changing would show up as errors.
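
Using the sketch above, the change at a call site might be as small
as:

    // Before: int-indexed collection from java.util
    java.util.List<String> names = new java.util.ArrayList<String>();
    for (int i = 0; i < names.size(); i++) {
        System.out.println(names.get(i));
    }

    // After: the hypothetical long-indexed counterpart; the index
    // variable widens from int to long along with the type change
    LongArrayList<String> bigNames = new LongArrayList<String>();
    for (long i = 0; i < bigNames.size(); i++) {
        System.out.println(bigNames.get(i));
    }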

Arrays are a worse problem, requiring JVM changes. The size field
associated with an array would have to be long. There would also need to
be a new "field" longLength. Attempts to use arrayRef.length for an
array with more than Integer.MAX_VALUE elements would throw an
exception. arrayRef.length would continue to work for small arrays for
backwards compatibility.
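
The proposal itself needs a modified JVM, but the intended semantics
can be emulated in a wrapper class today; a rough sketch (again, all
names invented):

    // Emulates the proposed semantics: longLength always works, while
    // the old int length keeps working only for small arrays.
    public class BigIntArray {
        private static final int CHUNK_BITS = 20;
        private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
        private static final int CHUNK_MASK = CHUNK_SIZE - 1;

        private final int[][] chunks;
        private final long longLength;

        public BigIntArray(long length) {
            this.longLength = length;
            int nChunks = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
            chunks = new int[nChunks][];
            for (int i = 0; i < nChunks; i++) {
                long remaining = length - ((long) i << CHUNK_BITS);
                chunks[i] = new int[(int) Math.min(CHUNK_SIZE, remaining)];
            }
        }

        public long longLength() {
            return longLength;
        }

        // As proposed above: length keeps working for small arrays,
        // but throws once the element count no longer fits an int.
        public int length() {
            if (longLength > Integer.MAX_VALUE) {
                throw new ArithmeticException(
                    "length exceeds Integer.MAX_VALUE; use longLength()");
            }
            return (int) longLength;
        }

        public int get(long index) {
            return chunks[(int) (index >>> CHUNK_BITS)]
                         [(int) (index & CHUNK_MASK)];
        }

        public void set(long index, int value) {
            chunks[(int) (index >>> CHUNK_BITS)]
                  [(int) (index & CHUNK_MASK)] = value;
        }
    }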

I suspect Eclipse would have "Source -> Long Structures" soon after the
first release supporting this, and long before most programs would need
to migrate.

Patricia
From: Tom McGlynn on
On Jul 9, 10:31 am, Patricia Shanahan <p...(a)acm.org> wrote:
> I don't think the future for Java is anywhere near as bleak as you paint it.
>
....
> The whole collections issue could be handled by creating a parallel
> hierarchy based on java.util.long_collections (or something similar for
> those who don't like separating words in package names). It would
> replicate the class names in the java.util hierarchy, but with long
> replacing int wherever necessary to remove the size limits. It could be
> implemented, using arrays of arrays where necessary, without any JVM
> changes.

An alternative would be to update the existing classes: overload
methods that take int arguments to allow longs, and add new methods
(e.g., longSize()) to parallel the functions that return ints. Are
there discriminators between these two approaches? I don't think
either would be especially difficult to implement, assuming that we
have arrays with long indices.
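
A sketch of what that retrofit might look like (none of these
long-flavored members exist in the real java.util.List; they are
invented here for illustration):

    // Hypothetical additions to an existing collection interface.
    public interface RetrofittedList<E> {
        int size();                  // existing method, unchanged
        E get(int index);            // existing method, unchanged

        long longSize();             // new: parallels size()
        E get(long index);           // new: overload taking a long index
        E set(long index, E element);
    }

One visible difference between the two approaches: with overloading,
get(0) still resolves to the int version and get(0L) to the long one,
so existing call sites keep their exact current behavior, while a
parallel hierarchy leaves the old classes untouched entirely.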

I agree that the issues at the Java API and language level are
probably trivial compared to the changes needed in the JVM.


Regards,
Tom McGlynn
From: BGB / cr88192 on

"Eric Sosman" <esosman(a)ieee-dot-org.invalid> wrote in message
news:i173u6$vhi$1(a)news.eternal-september.org...
> On 7/8/2010 9:11 PM, Patricia Shanahan wrote:
>>> Arne Vajhøj wrote:
>>> On 08-07-2010 17:35, Boris Punk wrote:
>>>> Integer.MAX_VALUE = 2147483647
>>>>
>>>> I might need more items than that. I probably won't, but it's nice to
>>>> have
>>>> extensibility.
>>>
>>> It is a lot of data.
>>>
>>> I think you should assume YAGNI.
>>
>>
>> Historically, each memory size has gone through a sequence of stages:
>>
>> 1. Nobody will ever need more than X bytes.
>>
>> 2. Some people do need to run multiple jobs that need a total of more
>> than X bytes, but no one job could possibly need that much.
>>
>> 3. Some jobs do need more than X bytes, but no one data structure could
>> possibly need that much.
>>
>> 4. Some data structures do need more than X bytes.
>>
>> Any particular reason to believe 32 bit addressing will stick at stage
>> 3, and not follow the normal progression to stage 4?
>
> None. But Java's int isn't going to grow wider, nor will the
> type of an array's .length suddenly become non-int; too much code
> would break. When Java reaches the 31-bit wall, I doubt it will
> find any convenient door; Java's descendants may pass through, but
> I think Java will remain stuck on this side.
>
> In ten years, we'll all have jobs converting "legacy Java code"
> to Sumatra.
>

more likely, they would simply expand the field to long, although they would
have to do something about the 'implicit downcasting being an error' case
(maybe softening it to 'implicit downcasting is a warning').

alternatively, a special-case implicit downcast could be supported (such as
via a modifier flag or special type), but at the cost that it would throw an
exception if the long won't fit into an int.
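
for example, a checked narrowing helper (hypothetical) could express
that special-case downcast:

    // checked narrowing: cast long to int, throw if the value
    // doesn't survive the round trip.
    static int toIntChecked(long value) {
        int narrowed = (int) value;
        if (narrowed != value) {
            throw new ArithmeticException(value + " won't fit in an int");
        }
        return narrowed;
    }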


another issue though is that current JBC (Java bytecode) depends
somewhat on array indices being int, and so expanding them to long
would break the bytecode (unless they did something hacky, like making
long-based array indexing be a method call):

invokestatic "java/lang/Array/_aiload_l([IJ)I"

that, or JBC would have to add lots of new opcodes; but from what I
can tell, this sort of thing is largely avoided (it apparently being
preferable to hack over method calls rather than expand the core
instruction set).
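
a rough sketch of the helper such a call might land on, with the
long-indexed array faked via chunked int[][] storage (class and method
names invented, matching the descriptor above only in spirit):

    public final class LongIndexedArrays {
        private static final int CHUNK_BITS = 20;   // 1Mi ints per chunk
        private static final int CHUNK_MASK = (1 << CHUNK_BITS) - 1;

        // what the invokestatic would bottom out in, instead of a
        // brand-new aiload-style opcode.
        public static int ailoadL(int[][] chunks, long index) {
            return chunks[(int) (index >>> CHUNK_BITS)]
                         [(int) (index & CHUNK_MASK)];
        }

        public static void aistoreL(int[][] chunks, long index, int value) {
            chunks[(int) (index >>> CHUNK_BITS)]
                  [(int) (index & CHUNK_MASK)] = value;
        }
    }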


or whatever...


From: Wayne on
On 7/9/2010 12:31 AM, Patricia Shanahan wrote:
> Wayne wrote:
>> On 7/8/2010 5:35 PM, Boris Punk wrote:
>>> Integer.MAX_VALUE = 2147483647
>>>
>>> I might need more items than that. I probably won't, but it's nice to have
>>> extensibility.
>>
>> To me, it is unlikely your system will run well if this one data structure
>> consumes 2G of memory. (You didn't really state the application or system;
>> certainly there are exceptions to the rule.) I would suggest you use a
>> more flexible system, where you keep the data on storage (disk) and use
>> memory as a cache. Perhaps an ArrayList of soft references would work well.
>> It might even be possible in your particular case to run a daemon thread
>> that pre-fetches items into the cache.
>
> What's the difference between one data structure occupying over 2 GB and a set of
> data structures that use that much space?
>
> Certainly, given enough memory, Java can support total data structure sizes well over
> 2 GB without excessive paging.
>
> Patricia

A reduction in the number of page faults. There was an interesting
article about this topic in this month's Communications of the ACM, by
Poul-Henning Kamp, who was one of the lead developers of the FreeBSD
kernel. He applied his insight to Varnish, a web proxy replacement for
Squid, and was able to replace 12 Squid machines with 3 Varnish ones.
It used a modified binary heap he called a B-heap, which respected the
page size of memory. The article was titled "You're Doing It Wrong".
The message I came away with was: don't ignore the fact that computers
use paging when designing large data structures. I was thinking that
lesson might apply to the OP's situation.
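
As a back-of-the-envelope illustration (assuming 4-byte heap entries
and 4 KiB pages; the numbers are invented for illustration):

    import java.util.HashSet;
    import java.util.Set;

    // Count distinct 4 KiB pages touched walking from a deep leaf of a
    // classic binary heap up to the root (1-based layout, parent = i/2).
    public class HeapPageWalk {
        static long pageOf(long index) {
            return (index * 4) / 4096;      // 4-byte entries, 4 KiB pages
        }

        public static void main(String[] args) {
            Set<Long> pages = new HashSet<Long>();
            int depth = 0;
            for (long i = 100000000L; i >= 1; i /= 2) {
                pages.add(pageOf(i));
                depth++;
            }
            System.out.println("depth " + depth
                + ", distinct pages " + pages.size());
        }
    }

Near the root the nodes share a page, but most levels of the walk sit
on their own page -- one potential fault per level. That locality
problem is what a B-heap attacks by keeping whole subtrees within a
single page.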

--
Wayne
From: Boris Punk on

"Kevin McMurtrie" <mcmurtrie(a)pixelmemory.us> wrote in message
news:4c36aa98$0$22174$742ec2ed(a)news.sonic.net...
> In article <4c368bee$0$4837$9a6e19ea(a)unlimited.newshosting.com>,
> Wayne <nospan(a)all.invalid> wrote:
>
>> On 7/8/2010 5:35 PM, Boris Punk wrote:
>> > Integer.MAX_VALUE = 2147483647
>> >
>> > I might need more items than that. I probably won't, but it's nice to
>> > have
>> > extensibility.
>>
>> To me, it is unlikely your system will run well if this one data
>> structure
>> consumes 2G of memory. (You didn't really state the application or
>> system;
>> certainly there are exceptions to the rule.) I would suggest you use a
>> more flexible system, where you keep the data on storage (disk) and use
>> memory as a cache. Perhaps an ArrayList of soft references would work
>> well.
>> It might even be possible in your particular case to run a daemon thread
>> that pre-fetches items into the cache.
>>
>> Keep in mind a modern general-purpose computer will use virtual memory,
>> typically with 4kiB pages. Any data structure larger than that will
>> likely end up swapped to disk anyway. If you need the semantics of
>> a "BigList", try a custom class, a List of <pagesize> lists with
>> appropriate set and get methods to access the items.
>>
>> Questions like yours are missing context. If you want a good answer,
>> you need to post the problem you are really trying to solve, rather
>> than posting a question about how to implement the solution you've
>> already decided on.
>>
>> Hope this helps!
>
> 24GB of RAM is a standard server configuration this year. Even my
> laptop has 8GB and can only run 64 bit Java. A Java array indexing
> limit of 2147483647 is a growing problem, not a future problem.
>
> Multiplexing to smaller arrays through a class isn't a great solution.
> First, it's unlikely that an application needing a 2+ GB array can
> tolerate the performance hit of not using an array directly. Some
> critical JIT optimizations for memory caching and range checking won't
> work because of the multiplexing logic. Second, such a class could not
> be compatible with anything else because it can't support the Collection
> design. Oracle can't define "Collection64 extends Collection" and be
> done with it because such a design can not be compatible in Java.
> --
> I won't see Google Groups replies because I must filter them as spam

Is it not as simple as redefining int as 64-bit and long as 128-bit in
newer versions?