From: Mike Schilling on


"Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
news:4c37bcf4$0$282$14726298(a)news.sunsite.dk...
> On 09-07-2010 02:07, Mike Schilling wrote:
>> "Arne Vajhøj" <arne(a)vajhoej.dk> wrote in message
>> news:4c366580$0$280$14726298(a)news.sunsite.dk...
>>> On 08-07-2010 17:15, Lew wrote:
>>>> From the JLS, which I strongly urge you to study:
>>>
>>> Unless the poster has a solid programming experience,
>>> then the JLS may not be the best to study.
>>>
>>> Sure it is by definition correct,
>>
>> mod typos and misstatements, of course.
>
> What counts: "what is written in the spec" or "what
> should have been written in the spec" ?

When the implementations all match the latter, then the latter counts.
Honestly, the JLS isn't the Bible, and we don't have to pretend that the sun
goes around the earth.

From: Mike Schilling on


"Eric Sosman" <esosman(a)ieee-dot-org.invalid> wrote in message
news:i173u6$vhi$1(a)news.eternal-september.org...

>
> In ten years, we'll all have jobs converting "legacy Java code"
> to Sumatra.

"It was a class which is associated with the giant array of Sumatra, a
construct for which the world is not yet prepared."

From: Lew on
Arne Vajhøj wrote:
>>>> Unless the poster has a solid programming experience,
>>>> then the JLS may not be the best to study.
>>>>
>>>> Sure it is by definition correct,

Mike Schilling wrote:
>>> mod typos and misstatements, of course.

Arne Vajhøj wrote:
>> What counts: "what is written in the spec" or "what
>> should have been written in the spec" ?

Mike Schilling wrote:
> When the implementations all match the latter, then the latter counts.
> Honestly, the JLS isn't the Bible, and we don't have to pretend that the
> sun goes around the earth.

Actually it's just as valid to say the Sun revolves around the Earth as the
other way around; it's just that the math is so much easier heliocentrically.

There was a young lady from Bright
who traveled faster than light.
She set out one day
in a relative way
and returned the previous night.

--
Lew

From: ClassCastException on
On Fri, 09 Jul 2010 21:53:10 -0400, Arne Vajhøj wrote:

> On 09-07-2010 10:31, Patricia Shanahan wrote:
>> On 7/9/2010 5:15 AM, Eric Sosman wrote:
>>> On 7/8/2010 9:11 PM, Patricia Shanahan wrote:
>>>> Arne Vajhøj wrote:
>>>>> On 08-07-2010 17:35, Boris Punk wrote:
>>>>>> Integer.MAX_VALUE = 2147483647
>>>>>>
>>>>>> I might need more items than that. I probably won't, but it's nice
>>>>>> to have
>>>>>> extensibility.
>>>>>
>>>>> It is a lot of data.
>>>>>
>>>>> I think you should assume YAGNI.
>>>>
>>>> Historically, each memory size has gone through a sequence of stages:
>>>>
>>>> 1. Nobody will ever need more than X bytes.
>>>>
>>>> 2. Some people do need to run multiple jobs that need a total of more
>>>> than X bytes, but no one job could possibly need that much.
>>>>
>>>> 3. Some jobs do need more than X bytes, but no one data structure
>>>> could possibly need that much.
>>>>
>>>> 4. Some data structures do need more than X bytes.
>>>>
>>>> Any particular reason to believe 32 bit addressing will stick at
>>>> stage 3, and not follow the normal progression to stage 4?
>>>
>>> None. But Java's int isn't going to grow wider, nor will the type of
>>> an array's .length suddenly become non-int; too much code would break.
>>> When Java reaches the 31-bit wall, I doubt it will find any convenient
>>> door; Java's descendants may pass through, but I think Java will
>>> remain stuck on this side.
>>>
>>> In ten years, we'll all have jobs converting "legacy Java code" to
>>> Sumatra.
>>
>> I don't think the future for Java is anywhere near as bleak as you
>> paint it.
>>
>> The whole collections issue could be handled by creating a parallel
>> hierarchy based on java.util.long_collections (or something similar for
>> those who don't like separating words in package names). It would
>> replicate the class names in the java.util hierarchy, but with long
>> replacing int wherever necessary to remove the size limits. It could be
>> implemented, using arrays of arrays where necessary, without any JVM
>> changes.
>>
>> To migrate a program to the new collections one would first change the
>> import statements to pick up the new packages, and then review all int
>> declarations to see if they should be long. Many of the ones that need
>> changing would show up as errors.
>
> Collections is certainly solvable.
>
>> Arrays are a worse problem, requiring JVM changes. The size field
>> associated with an array would have to be long. There would also need
>> to be a new "field" longLength. Attempts to use arrayRef.length for an
>> array with more than Integer.MAX_VALUE elements would throw an
>> exception. arrayRef.length would continue to work for small arrays for
>> backwards compatibility.
>>
>> I suspect Eclipse would have "Source -> Long Structures" soon after the
>> first release supporting this, and long before most programs would need
>> to migrate.
>
> It is not a perfect solution.
>
> When calling a library some arrays would have to be marked as
> @SmallArray to indicate that you can not call with a big array, because
> the method calls length.

IMO this is barking up the wrong tree. Changing existing arrays to use
long lengths is going to break a ton of stuff and be very difficult to
pull off without a LOT of headaches.

So what should be done is to introduce a parallel *new* data structure
that is a long array and is treated as a different family of types from
the existing array types. You'd create one by using a long constant in the
square brackets: new Foo[5000000000L]. If you wanted to, you could make a
"long" array new Foo[3L] that would be a long array in terms of type
compatibility while not actually being long; so you could mix arrays of
shorter-than-2^31 and longer arrays in the same code if you had to. The
Arrays class would have long-array versions of its methods and methods to
convert short to long arrays.

The trickier part is that we'd also need a bunch of new type names; Foo[]
would have to remain "a short array of Foo", so we'd need to allow, say,
Foo[L] or some such notation to mean "a long array of Foo" when an array
type needed to be specified. (The Arrays method signatures would then
need type parameters and argument overloads for both T[] and T[L], and
of course a <T> T[L] makeLongArray(T[] shortArray).)

The supersized BigCollection classes could be written before these
changes, using hierarchical array structures under the hood, and later
have their innards retrofitted to use long arrays.
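The "hierarchical array structures under the hood" idea can be sketched in today's Java without any JVM changes. The class below is hypothetical (BigDoubleArray is not a JDK or library type) and assumes a fixed chunk size of 2^20 elements: a long index is split into a chunk number (the high bits) and an offset within the chunk (the low bits).

```java
// Hypothetical sketch, not a JDK class: a long-indexed array of doubles
// stored as an array of ordinary int-indexed chunks.
final class BigDoubleArray {
    private static final int CHUNK_BITS = 20;                 // assumed: 2^20 elements per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final long OFFSET_MASK = CHUNK_SIZE - 1;

    private final double[][] chunks;
    private final long length;

    BigDoubleArray(long length) {
        if (length < 0) {
            throw new NegativeArraySizeException(Long.toString(length));
        }
        long chunkCount = (length + CHUNK_SIZE - 1) >>> CHUNK_BITS;   // round up
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too long for this sketch: " + length);
        }
        this.length = length;
        this.chunks = new double[(int) chunkCount][];
        for (int i = 0; i < chunks.length; i++) {
            // Every chunk is full except possibly the last one.
            long remaining = length - ((long) i << CHUNK_BITS);
            chunks[i] = new double[(int) Math.min(remaining, CHUNK_SIZE)];
        }
    }

    long length() {
        return length;
    }

    double get(long index) {
        checkIndex(index);
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & OFFSET_MASK)];
    }

    void set(long index, double value) {
        checkIndex(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & OFFSET_MASK)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException("Index: " + index + ", length: " + length);
        }
    }
}
```

The chunk size is a tuning choice; a fuller version would add bulk copy and fill operations, which is where the long-array versions of the Arrays methods would come in.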

As for numerics using arrays, if you really need fast numerics you might
want to contemplate simply going native, as long as you wrap whole
lengthy computations in JNI rather than each little individual step;
otherwise the overhead of crossing the JNI boundary in both directions all
the time will ruin performance. The downside is that you lose easy
portability. At some point Java needs a good numerics library that has
many cross-platform versions and takes advantage of SIMD, pipelining, and
other CPU-enhancement tricks on the appropriate architectures. Probably
this means an add-on language and compiler that takes source code
containing Java method declarations whose bodies are written in a subset
of FORTRAN, or something of the sort, and produces a .class file plus a
DLL implementing the JNI methods. Or it could even just be a "native math
enabled compiler" that turns those Java methods in your source code that
meet certain criteria (basically, doing only arithmetic on primitives)
into native methods.

Actually that might be too limiting. Really you'd need some sort of
metacompiler or templating facility. Situations like that make me want to
use Lisp macros instead of just Java, so I can have higher-level source
code that still converts into primitive-arithmetic code after
macroexpansion and can then be eligible to become optimized native math
code.

Actually, what's *really* needed is for the JVM JIT to take full
advantage of the host CPU's numerics features. The problem is that by the
time the JIT is optimizing assembly, any large-scale features of the
problem (e.g., that it's doing vector sums) that could inform such
optimizations have dissolved into a low-level representation where it
can't see the forest for the trees.

Nonetheless I've seen impressive performance from JITted code on -server
class machines, especially if the source was Clojure with macros used to
reduce high-level concepts at compile time into a tight arithmetic loop
or whatever. The results are comparable to a basic stab at coding the
arithmetic loop in assembly, e.g. 7-10ns per iteration for a few FPU
mults and adds with some compare-and-tests on a GHz CPU: the kind of
speed you'd get if the loop compiled down to just the obvious FPU
instructions with fairly efficient register use but no fancy
SIMD/MMX/etc. feature use or GPU use. Java gets the same speed with the
FP loop coded in Java in the obvious way; what macros get you is the
ability to have parameters in that loop that affect its structure in
various ways, and if they're set at compile time the loop is as fast as
if it were simple. A good javac implementation might get you equivalent
performance if if-statements whose conditions are compile-time-constant
false compile away to just the else clause (or nothing), and ones whose
conditions are compile-time-constant true compile down to just the then
clause. With Lisp eval and the JIT, though, you get the same even if some
parts of the loop aren't known until runtime, which AFAICT is pretty much
impossible in plain Java.
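The compile-time case can be sketched with static final boolean constants, which Java treats as constant expressions. In practice javac drops the bytecode of an if whose condition is a constant false and emits only the body when it is a constant true (JLS 14.21 describes this "conditional compilation" idiom), so the loop that reaches the JIT is the simple one. DotProduct, WEIGHTED and CLAMP_NEGATIVE are hypothetical names for illustration.

```java
// Hypothetical sketch: loop structure fixed at compile time through
// static final boolean constants. Change a flag and recompile to get a
// differently shaped loop, with no runtime test left behind.
final class DotProduct {
    static final boolean WEIGHTED = true;        // assumed compile-time switch
    static final boolean CLAMP_NEGATIVE = false; // assumed compile-time switch

    static double dot(double[] a, double[] b, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double term = a[i] * b[i];
            if (WEIGHTED) {          // constant true: only the body is emitted
                term *= w[i];
            }
            if (CLAMP_NEGATIVE) {    // constant false: no code is emitted
                if (term < 0) {
                    term = 0;
                }
            }
            sum += term;
        }
        return sum;
    }
}
```

This only covers switches known when javac runs; a flag read from a config file at startup would stay as a real branch in the bytecode.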

OK, rambling a bit. The upshot is that the JIT is the place that applies
the most leverage to optimizing numerics, since it could do so across all
JVM hosted languages and not just Java. The language might better support
this if, among other things, it supported long arrays. Floating-point
types larger than double and more efficient bignum types would also go a
long way. One problem with making a more efficient bignum type is that
there's no way in Java to check if an integer operation left the carry
bit set in the CPU, so you have to make do with 31- or 63-bit chunks in
your bignums and explicit tests of the high bit everywhere. The latter's
the bigger performance killer; if Java had an "if carry" you could use
immediately following an integer arithmetic expression, you could do
things like

newLow = low1 + low2;
if-carry {
    newHigh = high1 + high2 + 1;
} else {
    newHigh = high1 + high2;
}

with the compiler arranging it that low1 + low2 is done and stored in
some register; then the carry bit is tested; etc.

Better yet,

newLow = low1 + low2;
newHigh = high1 + high2 + carry;

would be nice! This could compile and JIT into very efficient assembly on
architectures that provide add-with-carry instructions (such as x86's ADC)
as well as ordinary adds, and similarly for other arithmetic operations,
providing all the building blocks to assemble bignums right on the chip.
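For comparison, here is roughly what the carry test costs in Java without language support. This is a sketch with a hypothetical UInt128 class holding a 128-bit unsigned value as two longs. It relies on the fact that an unsigned 64-bit add wrapped around exactly when the result is unsigned-smaller than an operand, and it uses Long.compareUnsigned, which only arrived in Java 8. Whether the JIT turns this into a real add-with-carry instruction is up to the JVM.

```java
// Hypothetical sketch: 128-bit unsigned addition from two 64-bit halves,
// recovering the carry without access to the CPU's carry flag.
final class UInt128 {
    final long high;
    final long low;

    UInt128(long high, long low) {
        this.high = high;
        this.low = low;
    }

    UInt128 add(UInt128 other) {
        long newLow = this.low + other.low;              // wraps modulo 2^64
        // The low add carried out iff the wrapped sum is (unsigned) below an operand.
        long carry = Long.compareUnsigned(newLow, this.low) < 0 ? 1L : 0L;
        long newHigh = this.high + other.high + carry;   // overflow past 128 bits is dropped
        return new UInt128(newHigh, newLow);
    }
}
```

The extra compare and select on every limb is the "explicit test" overhead that a language-level carry would remove.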

Of course, the most efficient bignum implementation will also depend on
the largest one-cycle-arithmetic word size the CPU supports. Maybe it's
best if bignums are special-case library classes instead. The existing
BigInteger (binary, stored as 32-bit words) and BigDecimal (a BigInteger
with a base-10 scale) would be kept for compatibility, and new BigInt and
BigFloat classes added that are binary and built for maximum speed, with
a group of architecture-selected native method implementations provided
with them and the one appropriate to the current host hardware selected
on the fly by the JIT the first time a bignum native method got called.

Of course, then this new functionality should be made available to all
JNI users: the ability to supply several versions of the native code for
different architectures, labeled in some manner, for the JIT to select.
When code that calls the native method runs the first time, the most
appropriate one will be selected and the calling method will immediately
be JITted to an optimized assembly form that calls the specific,
appropriate native method for the CPU, so subsequent calls to that
calling method will not have to repeat the test-and-selection process (on
the presumption, valid for the foreseeable future, that the CPU
architecture will not change in the middle of a single program run -- but
if, in the future, program runs can be hibernated and then resumed on
changed hardware, all JIT caches will have to be invalidated on such
occasions anyway).
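The select-once idea can be approximated in plain Java today, without JIT changes: choose an implementation when the class is initialized and keep it in a static final field, which HotSpot can treat as a constant when compiling callers. In this sketch ArchDispatch and its methods are hypothetical, the os.arch check is only illustrative, and both variants are pure-Java stand-ins for what would really be per-architecture JNI libraries.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical sketch: pick one implementation per process based on the
// host, then let every call site use it without repeating the test.
final class ArchDispatch {
    // Selected once, during class initialization.
    static final LongBinaryOperator MULTIPLY = select(System.getProperty("os.arch", ""));

    static LongBinaryOperator select(String arch) {
        switch (arch) {
            case "amd64":
            case "x86_64":
                // Stand-in for an x86-64-tuned native routine.
                return (a, b) -> a * b;
            default:
                // Portable fallback: shift-and-add multiplication.
                return ArchDispatch::portableMultiply;
        }
    }

    // Same result as a * b (both are exact modulo 2^64).
    static long portableMultiply(long a, long b) {
        long result = 0;
        while (b != 0) {
            if ((b & 1) != 0) {
                result += a;
            }
            a <<= 1;
            b >>>= 1;
        }
        return result;
    }
}
```

What this cannot do is what the proposal asks for: bind the caller directly to a native method chosen at JIT time.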

So, final conclusion:
* Add a parallel collection library that allows long-indexed collections
(size, indexing methods, etc. return long). Add a RandomAccessIterator,
alongside the existing Iterator and ListIterator, that allows jumping
forward or backward by an arbitrary (long) number of positions. Add a
RandomAccessList interface that ArrayList implements and that provides a
RandomAccessIterator. Let the new collection interfaces take an exception
type parameter for the exception their methods can throw, so specific
implementations can be backed by disk files and throw IOException or by
databases and throw SQLException, while the basic in-memory ones would
have this type parameter set to RuntimeException to indicate that no
additional checked exceptions get thrown. RandomAccessFile would be
retrofitted to implement RandomAccessList<Byte>.
* Long arrays would be a good idea, but add them as new, parallel data
structures to the existing arrays so as not to add too many
compatibility headaches. The supersized collection implementations would
be internally reworked to exploit the new long arrays to make them more
efficient.
* JIT should be improved to optimize the use of long arrays.
* JIT should be improved to allow native method calls whose callers get
JITted to start calling an architecture-optimized version of the native
method selected at JIT time from among several provided alternatives
based on the actual host hardware at JIT time.
* JNI toolkit should provide a way to generate such alternative version
sets of native methods.
* JIT should invalidate code caches on any session save-and-restore, if
JVMs ever gain such a capability. Just save the session with the cache
empty, or else save hardware info with the session and invalidate the
cache if the CPU architecture has changed on restore.
* BigInt and BigFloat should be added to the standard library, with
efficient multi-architecture native method alternative-sets for the major
arithmetic operations and specifiable binary precision in bits. (The
actual precision will be rounded up to the next multiple of N bits, with
N usually either 32 or 64. You ask for a minimum precision and you get
the most efficient precision that's no lower.) Possibly add BigFixed, a
fixed-point non-integer type. BigInteger and BigDecimal don't change
from their present forms, again to avoid compatibility problems.
* Possibly add support for arrays that store contiguous blocks of records
of dissimilar primitive types, also to aid numerics. E.g. an array of
float float float int, float float float int blocks that gets stored as
a contiguous memory block. This might be implemented by adding a
primitiverecord class type to go along with class, interface, and enum,
which has pass-by-value semantics and can only contain primitive,
primitive array, and primitiverecord instance members. A
primitiverecord type is not a reference type! And it cannot contain any
references as instance members! Perhaps it shouldn't be allowed to have
instance methods or constructors, either; all fields public and zeroed at
instance creation. Arrays of a primitiverecord type store the records as
contiguous blocks. Disallow "char" and "byte" to discourage creating
imitation-legacy COBOLish code storing non-numeric data or rigid, brittle
binary file formats; allow float, double, int, long, and possibly enums.
Allow any static members.
* In the further future, possibly add a sophisticated higher-level
numerics library that uses the above.
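The exception type parameter from the first bullet is already expressible with ordinary generics, because a method may declare throws E for a type variable E extends Exception. The sketch below is hypothetical (LongList, InMemoryLongList, FailingDiskList and LongLists are invented names, not JDK types) and only shows the typing: with E = RuntimeException callers need no try/catch, while a disk-backed implementation makes them handle IOException.

```java
import java.io.IOException;

// Hypothetical long-indexed list whose methods throw a caller-chosen exception type E.
interface LongList<T, E extends Exception> {
    long size() throws E;
    T get(long index) throws E;
}

// In-memory variant: E = RuntimeException, so no checked exceptions escape.
final class InMemoryLongList<T> implements LongList<T, RuntimeException> {
    private final T[] items; // small backing store, for illustration only

    @SafeVarargs
    InMemoryLongList(T... items) {
        this.items = items;
    }

    @Override public long size() {
        return items.length;
    }

    @Override public T get(long index) {
        if (index < 0 || index >= items.length) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        return items[(int) index];
    }
}

// Stand-in for a disk-backed variant: E = IOException.
final class FailingDiskList implements LongList<Byte, IOException> {
    @Override public long size() throws IOException {
        throw new IOException("disk unavailable");
    }

    @Override public Byte get(long index) throws IOException {
        throw new IOException("disk unavailable");
    }
}

final class LongLists {
    // Generic over E: the compiler infers which checked exception, if any, escapes.
    static <T, E extends Exception> long countNonNull(LongList<T, E> list) throws E {
        long n = 0;
        for (long i = 0; i < list.size(); i++) {
            if (list.get(i) != null) {
                n++;
            }
        }
        return n;
    }
}
```

Calling LongLists.countNonNull on an InMemoryLongList compiles without a try block, while the same call on a FailingDiskList does not compile unless IOException is caught or declared.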

The above changes, taken over time and in the order specified, would help
transition Java to 64-bit architectures and ever larger applications,
data sets, and machines, as well as gaining it some respectability as a
language for performing numeric calculations, overall making it better
suited for portably implementing the very large simulations that will be
increasingly important in the future in engineering, climate science, and
numerous other fields of endeavor.

From: ClassCastException on
On Fri, 09 Jul 2010 21:57:23 -0400, Arne Vajhøj wrote:

> On 09-07-2010 08:15, Eric Sosman wrote:
>> On 7/8/2010 9:11 PM, Patricia Shanahan wrote:
>>> Arne Vajhøj wrote:
>>>> On 08-07-2010 17:35, Boris Punk wrote:
>>>>> Integer.MAX_VALUE = 2147483647
>>>>>
>>>>> I might need more items than that. I probably won't, but it's nice
>>>>> to have
>>>>> extensibility.
>>>>
>>>> It is a lot of data.
>>>>
>>>> I think you should assume YAGNI.
>>>
>>>
>>> Historically, each memory size has gone through a sequence of stages:
>>>
>>> 1. Nobody will ever need more than X bytes.
>>>
>>> 2. Some people do need to run multiple jobs that need a total of more
>>> than X bytes, but no one job could possibly need that much.
>>>
>>> 3. Some jobs do need more than X bytes, but no one data structure
>>> could possibly need that much.
>>>
>>> 4. Some data structures do need more than X bytes.
>>>
>>> Any particular reason to believe 32 bit addressing will stick at stage
>>> 3, and not follow the normal progression to stage 4?
>>
>> None. But Java's int isn't going to grow wider, nor will the type of an
>> array's .length suddenly become non-int; too much code would break.
>> When Java reaches the 31-bit wall, I doubt it will find any convenient
>> door; Java's descendants may pass through, but I think Java will remain
>> stuck on this side.
>>
>> In ten years, we'll all have jobs converting "legacy Java code" to
>> Sumatra.
>
> If Java gets 20 years as "it" and 20 years as "legacy", then that would
> actually be more than OK.
>
> Things evolve and sometimes it is better to start with a blank sheet of
> paper.
>
> 64 bit array indexes, functions as first class type, bigint and
> bigdecimal as language types etc.

Clojure already has all of this except 64-bit array indexes, and it runs
on the JVM.

Clojure doesn't even have arrays, though, unless you drop down to Java to
use Java's arrays. Clojure's collections are built on Java's arrays and
collections, so some limits might start kicking in when you get to 2^31
elements; I'm not sure how they behave if they get that big.

A *real* future-proof language would, of course, have bigint array
indexes. :-)