From: Arne Vajhøj on
On 04-07-2010 19:19, Joshua Cranmer wrote:
> On 07/04/2010 05:16 PM, BGB / cr88192 wrote:
>> "Arne Vajh�j"<arne(a)vajhoej.dk> wrote in message
>> news:4c30c519$0$281$14726298(a)news.sunsite.dk...
>>> 10 years ago people considered it very cool that you could
>>> decompile.
>>>
>>> Today it is old news and only those that for some unusual reason
>>> really need the functionality are interested.
>>
>> I suspect part of the issue may be that most practical uses of
>> decompilation
>> are either questionable (decompiling code for which one doesn't
>> legally have
>> ownership) or nefarious (the previous, but with intent to either steal
>> said
>> code, or attempt to circumvent or exploit).
>
> Researchers have pretty much established that decompilation has
> substantial valid uses (supposedly, 20% of all source code just simply
> doesn't exist anymore); I myself had to decompile my own code due to an
> undiscovered feature in my version control system.

Some source code is certainly lost.

But my guess is that the percentage is lower for Java, because
there is not that much 40-year-old Java code.

And if the source code is lost, there is a decent
chance that it is because nobody has needed to modify it
for a long time, which makes it less likely to ever need
modification.

On the other hand, oopses do sometimes happen: for the
OP, for your VCS feature, or for many other reasons (I
guess most people have lost source code at some point).

But I hope those circumstances still qualify as an
"unusual reason".

Arne
From: Arne Vajhøj on
On 04-07-2010 19:12, Tom Anderson wrote:
> On Sun, 4 Jul 2010, Joshua Cranmer wrote:
>> In any case, interest in decompiling has significantly waned over the
>> past decade or so. A project or two on sourceforge claim to support
>> Java 5 decompilation, but I haven't tested it in depth.
>
> I wonder if the driver of the fall of decompilation is the rise of open
> source, and perhaps also open standards. If your landscape consists of,
> say, the JDK, JBoss, Spring, and Hibernate, then there are easier and
> more reliable ways to get hold of source code than decompilation.

If 50% of all Java code is open source then it should reduce
the need by 50%.

Unless one does as the OP did and modifies the open source
code and then loses the modifications.

Arne
From: Joshua Cranmer on
On 07/04/2010 07:12 PM, Tom Anderson wrote:
> On Sun, 4 Jul 2010, Joshua Cranmer wrote:
>> In any case, interest in decompiling has significantly waned over the
>> past decade or so. A project or two on sourceforge claim to support
>> Java 5 decompilation, but I haven't tested it in depth.
>
> I wonder if the driver of the fall of decompilation is the rise of open
> source, and perhaps also open standards. If your landscape consists of,
> say, the JDK, JBoss, Spring, and Hibernate, then there are easier and
> more reliable ways to get hold of source code than decompilation.

I think a better explanation is that it was never really a widespread
avenue of research to begin with. Academically, it consists of
disassembly [1], control structure identification, and typing and
variable analysis. The middle part is pretty much a solved problem, and
I'm reasonably sure that the type/variable analysis is also pretty well
solved. Disassembly has, by and large, remained generally difficult for
native code, but great strides have been made in the last 20 years or so.

Since Java bytecode doesn't mash data and code together in the same
space, and given how much of the structure information is left in the
bytecode, Java induced a massive spurt in decompilers because it was easy
to decompile. I'm guessing this spurt was more of a proof-of-concept
than a full-blown branching out. Since fully automated disassembly is
the most unsolved portion of decompiling, Java is academically
uninteresting to decompile; furthermore, you don't need to go the full
decompiler route to showcase improvements in disassembly. On top of all
of this, one of the major problem classes for reverse engineering in
general is dealing with malware, which mostly exists in native code and
not bytecode languages. You can see that there are a handful of
decompilers, defunct or otherwise, for other bytecodes (I know of two or
three for both Python and .NET); the only two languages which have a
large number of decompilers are Java (because it was easier) and C
(because it was harder).
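
As a quick illustration (a throwaway sketch; the class and method here
are just placeholders), compile the class below with "javac -g
Example.java" and run it. It shells out to the JDK's javap tool
(assumed to be on the PATH) to dump its own bytecode, and the loop
structure, local variable table, and line number table are sitting
right there for a decompiler to pick up:

// Minimal sketch: javac -g Example.java && java Example
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class Example {
    public int sumTo(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;          // structured loop, easy to recover
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Dump this class's bytecode plus local variable and line
        // number tables; javap resolves "Example" from the classpath.
        ProcessBuilder pb = new ProcessBuilder("javap", "-c", "-l", "Example");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        BufferedReader r =
            new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = r.readLine()) != null) {
            System.out.println(line);
        }
        p.waitFor();
    }
}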

In short, academically, Java decompilers are effectively solved, but
maintaining an up-to-date decompiler for Java (or any other bytecode
language) is not something many people wish to do. That was probably
true even before Java was created: the current lack of modern
decompilers says less about declining interest than about the abnormal
burst of interest generated by Java being the first major bytecode
language in existence.

For an open source project to survive, it needs a critical threshold of
developers. The Java decompiler market is already crowded with several
"good enough" solutions, C decompilers are effectively beyond the start
of the art [2], and the interest for other markets is generally
insufficient to sustain even a small operation. Perhaps a tool which
could become the "gcc" of decompilers (able to go from many source
architectures to many destination languages) might achieve this
threshold. But unless a tool achieves substantially better results, it
is probably not going to be successful as a project.

[1] I'm glossing over a lot of stuff here which is actually quite
difficult for native code, but many of the problems don't exist in Java.

[2] In the sense of fully-automated decompilation. x86 disassembly is a
royal pain in the butt; while there exist tools that can do this well
(IDA!), I'm not aware of anything that could be used in open-source
software [3].

[3] On reflection, I suppose LLVM is utilizing its x86 assembler
infrastructure for disassembly (for debugging purposes).
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
From: Lew on
Tom Anderson wrote:
>> I wonder if the driver of the fall of decompilation is the rise of open
>> source, and perhaps also open standards. If your landscape consists of,
>> say, the JDK, JBoss, Spring, and Hibernate, then there are easier and
>> more reliable ways to get hold of source code than decompilation.

Arne Vajhøj wrote:
> If 50% of all Java code is open source then it should reduce
> the need by 50%.
>
> Unless one does as the OP and modify the open source code
> and lose the modifications.

I wonder what those modifications to the OP's "large JAR files such as
Hibernate" were. It's hard to imagine that local variations which were
carelessly never put under source control should be so extensive or so
necessary that one could not abandon them altogether, or do as I've had
to do on occasion in my career and recapitulate them from spec.

If the modifications were so all-fired important then the modifiers were
criminally negligent not to preserve their source. I advise the OP to abandon
his dependency on them and go to the canonical versions of those "large JAR
files" (really, libraries of classes - confusing classes with files is all too
common a mistake).

--
Lew
From: Arved Sandstrom on
Lew wrote:
[ SNIP ]

> As for justification to rewrite parts of Hibernate, I am at best
> skeptical. Hibernate is a robust and rather complete set of libraries.
> I have to wonder what changes it required that would not have been
> better served by writing libraries or client code extrinsic to the
> Hibernate libraries themselves. OP? What raises my suspicions even
> further is that the rewrites were performed by people who didn't have
> the wisdom to protect their code against loss.

In my experience, permitted customizations (through inheritance or
interface implementation), along with the occasional application of
known patches (which may be officially available only for later
versions than what you've got deployed), sometimes end up bundled in
the original JARs. Throw in a reluctance to put third-party source
under version control (*), and possibly a hot-fix system with
inadequate tracking, and you can easily end up with what I call
"mystery" or "magic" JARs: they're labelled as a well-known JAR but
aren't in fact quite the same thing. It's more bad configuration
management than it is strictly bad VC.
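
Just to be concrete about what I mean by a permitted customization
(the class names below are invented for illustration; this is not real
Hibernate API), the override belongs in your own jar:

// Hypothetical illustration only: VendorNamingStrategy stands in for a
// third-party extension point; it is not an actual Hibernate class.
// The customization lives in *our* artifact, so the vendor JAR stays
// byte-for-byte identical to the published release.
package com.example.persistence;

// Stand-in for the vendor's published extension point.
class VendorNamingStrategy {
    public String tableName(String entityName) {
        return entityName;
    }
}

// Our local customization, shipped in our own jar.
public class LegacyTableNamingStrategy extends VendorNamingStrategy {
    @Override
    public String tableName(String entityName) {
        // Legacy schema convention: prefix every table with "T_".
        return "T_" + super.tableName(entityName).toUpperCase();
    }
}

The trouble starts when somebody compiles that sort of override
straight into the vendor's JAR instead.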

What happens with mystery JARs is that as soon as the original
developers are no longer actively involved, the prime directive is to
carefully preserve them. Knowledge of how to _make_ them is lost.
Experienced developers of later generations can tell they have the
mystery JARs by looking at the file sizes; the secret of using these
JARs is passed down carefully, and nobody is willing to rip them out
because the knowledge of what they do differently has vanished into
the mists of time.
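
A cheap defence, sketched below (the jar name in the usage comment is
just an example), is to fingerprint a suspect JAR and compare the
digest against the checksum published for the stock release, rather
than squinting at file sizes:

// Rough sketch: print the SHA-1 of a jar so it can be checked against
// the checksum published alongside the official release.
// Usage: java JarFingerprint hibernate3.jar
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class JarFingerprint {
    public static void main(String[] args) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        InputStream in = new FileInputStream(args[0]);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha1.update(buf, 0, n);
            }
        } finally {
            in.close();
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(args[0] + "  SHA-1: " + hex);
    }
}

If the digest doesn't match anything the project ever shipped, you
have a mystery JAR on your hands.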

I agree with you that the best thing to do in this scenario is to go
back to the stock JARs and simply deal with what breaks. Frequently
_nothing_ breaks because all of your other stuff has moved on and the
custom stuff is not being used.

AHS

* I've seen this time and time again: a strong aversion to making
in-house mods to open source; hence no willingness to have third-party
open source in one's own version control. There is absolutely no
hesitation in using third-party OSS, though. And yet there are no
qualms about letting your developers hack your *own* code beyond
recognition (and beyond effective documentation and design). It's like
saying that the
developers of the third-party code knew what they were doing but your
own developers don't.

--
Without requirements or design, programming is the art of adding bugs to
an empty text file.
-- Louis Srygley