From: Joshua Maurice on
On May 9, 9:30 am, "Mike Schilling" <mscottschill...(a)hotmail.com>
wrote:
> Joshua Maurice wrote:
> > It is technically impossible to do a true minimal recompile
> > algorithmically. Let's define it as "Let a build be a set of file
> > compilations. Let the minimum recompile be the minimum such set for
> > which the output class file are equivalent to the class files of a
> > full clean build." First, we'd have to prove such a minimum exists.
> > That's relatively straightforward. With that out of the way, I think I
> > could then prove that the problem is equivalent to the Halting
> > problem. If you define "equivalent" generously, I'm pretty sure this
> > is the case. If you define it as "same binary file content", then
> > perhaps not, though still possibly yes.
>
> Why would you think that?  The ways in which a change to A can affect B is
> finite and well-defined.

Let's suppose the source changed from "2" to "(1+1)". Using the strict
interpretation, the output class files would probably be equivalent
(ignoring debug information like original source code). Thus a rebuild
is technically not required.

Or, a slightly less trivial case: Suppose we have class A which has 2
public static functions A1 and A2 which have independent
implementations. Class C uses A1. Suppose someone comes along and
changes A2 in some meaningful way. Class C does not need to be
recompiled, but any sort of file level dependency analysis which is
correct would recompile Class C.

If you define it as binary file contents as equivalent output files,
then it's not equivalent to the Halting problem, I think. However, if
you define it as "The output class files display the same visible
behavior across all valid input", then I think the general case \is\
the Halting problem. You would have to prove for all allowed inputs to
the built program that the behavior is the same with the "minimal"
rebuild and with a full clean build, and that there is no smaller
rebuild which would display the same behavior.

If you instead simplify it to "If a file is touched, then all direct
dependencies and ghost dependencies should be recompiled, and repeat
until the leaves of the cascade are unchanged", then I think this is
quite doable. However, someone did mention "But what if you change
just a comment?". I can extend this too "but what if you just changed
a '2' to '1+1'?", which can probably be extended further. Hopefully
you see my point.
From: Joshua Maurice on
On May 9, 4:42 pm, EJP <esmond.not.p...(a)not.bigpond.com> wrote:
> On 10/05/2010 1:26 AM, Tom Anderson wrote:
>
> Doesn't/didn't 'jikes' do all this?

As far as I can tell, Jikes is no longer in development or supported.
Also, I have doubts as to its correctness. Also, my company would
probably feel better if the compiler in use was Sun' javac and not
Jikes.
From: Joshua Maurice on
On May 9, 8:26 am, Tom Anderson <t...(a)urchin.earth.li> wrote:
> On Sun, 9 May 2010, Joshua Maurice wrote:
> > I am still in the process of implementing, so I haven't really been able
> > to test,
>
> You're doing it wrong. Test first, test incrementally, build things in a
> way you can test as you go. Before you start work for real, do a
> higher-level test, a 'spike solution', to make sure that all your
> assumptions (like this one) are valid.

I am. It's just that implementing this class file parser, ghost
dependency, build system, etc., from scratch will take some time. I
almost have it working on a toy example, at which point I can throw
what little tests I have against it.
From: Mike Schilling on
EJP wrote:
> Doesn't/didn't 'jikes' do all this?

It did something of the sort (I don't know how thoroughly), but it's very
obsolete, never having supported the language features that were new in JDK
1.5. But having thought about this a bit, I think that the onlt feasible
way to support anything like minimal recompilation is by having the compiler
calculate and persist dependency information.


From: Mike Schilling on
Joshua Maurice wrote:
> On May 9, 9:30 am, "Mike Schilling" <mscottschill...(a)hotmail.com>
> wrote:
>> Joshua Maurice wrote:
>>> It is technically impossible to do a true minimal recompile
>>> algorithmically. Let's define it as "Let a build be a set of file
>>> compilations. Let the minimum recompile be the minimum such set for
>>> which the output class file are equivalent to the class files of a
>>> full clean build." First, we'd have to prove such a minimum exists.
>>> That's relatively straightforward. With that out of the way, I
>>> think I could then prove that the problem is equivalent to the
>>> Halting problem. If you define "equivalent" generously, I'm pretty
>>> sure this is the case. If you define it as "same binary file
>>> content", then perhaps not, though still possibly yes.
>>
>> Why would you think that? The ways in which a change to A can affect
>> B is finite and well-defined.
>
> Let's suppose the source changed from "2" to "(1+1)". Using the strict
> interpretation, the output class files would probably be equivalent
> (ignoring debug information like original source code). Thus a rebuild
> is technically not required.

If the change is from

public static final int I = 2;

to

public static final int I = 1 + 1;

I'd accept a system that recompiles all users of I as still "minimal". (Not
that it's a difficult optimization to make, since the rules for what's a
compile-time constant are straightforward.) If this change isn't to a
compile-time constant, it would have no effect on anything defined in a
different source file.

>
> Or, a slightly less trivial case: Suppose we have class A which has 2
> public static functions A1 and A2 which have independent
> implementations. Class C uses A1. Suppose someone comes along and
> changes A2 in some meaningful way.

That is, chaged its signature. Merely chaning its implementation would not
rquire C to be recompiled.
..
> Class C does not need to be
> recompiled, but any sort of file level dependency analysis which is
> correct would recompile Class C.

Right, strictly minimal recompilation wouild need to know not only that C
uses A but what parts of A it uses. Quite strightforward, though perhaps not
a good idea. (It's possbile that gathering too much dependency information,
and evaluationing it at too granular a level, slows the whole process down,
compared to allowing some recompilations that are strictly speaking
unnecessary.)

>
> If you define it as binary file contents as equivalent output files,
> then it's not equivalent to the Halting problem, I think. However, if
> you define it as "The output class files display the same visible
> behavior across all valid input", then I think the general case \is\
> the Halting problem.

If I understand you, you're distinguishing between:

1. A recompilation would result in the same class file that already
exists, and
2. A recompilation would result in a change to the class file, but the
result of running the two will always be the same

I agree that a system that tries to determine 2 is infeasible. And my point
about adding the comment isn't that the file itself would be recompiled.
That's acceptable, and might even be necessary to get the line numbers seen
by a debugger to be correct. My point is that all files which refer to it
are recompiled, even though nothing they use (the class definitions, the
field and method definitions, and the values of the compile-time constants)
has changed. The same is true when e.g. method implementations change or an
init block is added; I was using "add a comment" as the limiting case of a
change no other file would care about.