JavaScript code mangler / scrambler / ... khm, more than obfuscator... :) [JavaScript]

Prev: Code Guidelines
Next: JavaScript code mangler / scrambler / ... khm, more than obfuscator... :)

From: Scott Sauyet on 28 Dec 2009 10:48

On Dec 27, 2:00 pm, Andrew <anz...(a)gmail.com> wrote:
> On Dec 23, 7:45 pm, Scott Sauyet <scott.sau...(a)gmail.com> wrote:
>> [ ... ]
> I am beginning to think that the example was not such a good idea...
> yes it's flawed. It is an EXAMPLE, not proof of concept. You can do
> better if you wish. :)

But the point is that this was an attempt at a simple example. If we
can't get a simple case to work, how likely is it that the more
complicated ones will be workable? Let's take it simpler still. How
about if our mangler had a rule to turn this:

function f(x) {
return x + 1;
}

into this:

function f_prime(x) {
var y = x;
return y + 1;
}

Except noting possible boundary issues, it's pretty clear that we've
maintained the correspondence between f and f_prime. But now what if
we tried it on this:

function g(x) {
eval("var y = y ? y + x : x")
return x + 1
}

to get this:

function g_prime(x) {
var y = x;
eval("var y = y ? y + x : x")
return y + 1
}

Then g(20) = 21, but g_prime(20) = 41.

And of course we could rig it so that the "y" is not easily visible
inside the eval statement. The point is that this really is
difficult.

> - of course it is possible to do it, at least to some extent (still
> not buying the Rice's theorem having to do anything with this - as
> Lasse pointed out, compilers do that all the time)

I'm certainly not buying Thomas Lahn's assertion that Rice's theorm
implies that

| [Y]ou cannot devise a general function that computes another
function
| that computes the same result for the same input (a trivial
property)
| as a third function.

I think the Y-Combinator [1] demonstrates that this is false.

But I also think Lasse Nielsen is overstating when he says that this
is so similar to what a compiler does. A compiler is doing something
akin to transcription. I think your mangler would have to do
something closer to translation. While I have read Hofstadter [2] and
understand something about how difficult it might be to draw the line
between the two, I think it makes a difference in how easy it would be
to create an algorithm for one versus the other.

>> In other words, while I think this is an interesting abstract
>> discussion, I don't think there are any real chances of building a
>> tool that does what you want.
>
> Well, that might be true. But not because of all the things others
> have written in these posts, but because JS is such a difficult
> language for this task. I am no way an expert in JS, but from what I
> have seen (closures come to mind) there are some concepts that make it
> really hard to do this kind of thing. Not impossible, but difficult.

Actually, I think closures would be one of the best tools to use to
accomplish some obfuscation.

function f_prime2 = (function(test) {
var y = test ? 0 : 1;
return function(x) {
return x + y;
}
})(false);

This is already fairly obfuscated from the original function, and I
can clearly imagine additional layers. But I don't see that as likely
to hide much from even a mildly persistent snoop.

I think the worst problem would come with "eval" and its cousins.

> Not sure it would be a bad business idea though - seeing how
> obfuscators sell for $60-$250 when they are not even remotely
> successful. If someone is capable of doing it, of course. I would have
> bought a copy right now, but of course, it would have to be so good
> that a capable programmer would have difficulties deducting what the
> code does.

I think the best assessment in this thread was in Lasse Nielsen's
first response where he discussed the factors to be considered in
approaching this type of tool and concluding:

| I personally think any informed evaluation of that kind would
| end in rejecting the idea.

It's not that something couldn't be built. Clearly someone could
build a tool that would accomplish much of what you suggest. But as
the inputs got more complicated, it's more and more likely that the
tool would actually break working code, and of course it would be by
design much more difficult to debug the issues that arose.

-- Scott

[1] http://en.wikipedia.org/wiki/Fixed_point_combinator
[2] http://en.wikipedia.org/wiki/Metamagical_Themas

From: Thomas 'PointedEars' Lahn on 28 Dec 2009 18:49

[Cancel & Supersedes]

Scott Sauyet wrote:

> Andrew wrote:
>> - of course it is possible to do it, at least to some extent (still
>> not buying the Rice's theorem having to do anything with this - as
>> Lasse pointed out, compilers do that all the time)
>
> I'm certainly not buying Thomas Lahn's assertion that Rice's theorm
> implies that
>
> | [Y]ou cannot devise a general function that computes another
> | function that computes the same result for the same input
> | (a trivial property) as a third function.
>
> I think the Y-Combinator [1] demonstrates that this is false.

I do not think the Y-Combinator applies here. Especially, we can read
where "Y-combinator" redirects to:

,-<http://en.wikipedia.org/wiki/Fixed_point_combinator>
|
| A fixed point combinator (or fixed-point operator) is a higher-order
| function that computes a fixed point of other functions. A fixed point
| of a function f is a value x such that f(x) = x. For example, 0 and 1
| are fixed points of the function f(x) = x2, because 02 = 0 and 12 = 1.
| Whereas a fixed-point of a first-order function (a function on "simple"
| values such as integers) is a first-order value, a fixed point of a
| higher-order function f is another function p such that f(p) = p.
| A fixed point combinator, then, is a function g which produces such a
| fixed point p for any function f:
|
| p = g(f), f(p) = p
|
| or, alternately:
|
| f(g(f)) = g(f).
|
| Because fixed-point combinators are higher-order functions, their history
| is intimately related to the development of lambda calculus. One well
| known fixed point combinator in the untyped lambda calculus is Haskell
| Curry's Y = ?f�(?x�f (x x)) (?x�f (x x)). The name of this combinator is
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| incorrectly used sometimes to refer to any fixed point combinator. The
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| untyped lambda calculus however, contains an infinity of fixed point
| combinators. Fixed point combinators do not necessarily exist in more
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| restrictive models of computation. For instance, they do not exist in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| simply typed lambda calculus.

As for Rice's theorem:

<http://en.wikipedia.org/wiki/Rice%27s_theorem#Proof_by_reduction_from_the_halting_problem>

| Proof by reduction from the halting problem
| ---------------------------------------------------------------------
|
| Proof sketch
|
| Suppose, for concreteness, that we have an algorithm for examining
| a program p and determining infallibly whether p is an implementation
| of the squaring function, which takes an integer d and returns d^2.
| The proof works just as well if we have an algorithm for deciding any
| other nontrivial property of programs, and will be given in general
| below.
|
| The claim is that we can convert our algorithm for identifying
| squaring programs into one which identifies functions that halt.
| We will describe an algorithm which takes inputs a and i and
| determines whether program a halts when given input i.
|
| The algorithm is simple: we construct a new program t which (1)
| temporarily ignores its input while it tries to execute program a on
| input i, and then, if that halts, (2) returns the square of its input.
| Clearly, t is a function for computing squares if and only if step (1)
| halts. Since we've assumed that we can infallibly identify program for
| computing squares, we can determine whether t is such a program, and
| therefore whether program a halts on input i. Note that we needn't
| ctually execute t; we need only decide whether it is a squaring program,
| and, by hypothesis, we know how to do this.
|
| t(n) {
| a(i)
| return n�n
| }
|
| This method doesn't depend specifically on being able to recognize
| functions that compute squares; as long as some program can do what
| we're trying to recognize, we can add a call to a to obtain our t.
| We could have had a method for recognizing programs for computing
| square roots, or programs for computing the monthly payroll, or
| programs that halt when given the input "Abraxas", or programs that
| commit array bounds errors; in each case, we would be able to solve
| the halting problem similarly.
|
| [Following the proof that this cannot be done, because if it were
| doable that would mean the halting problem was decidable, which
| it is not.]

Now, in order to write a program `P' that obfuscates code in a way that
statements are replaced by more complex code (e.g., function calls) that
computes the same as the original code for any given input, that program
`P' must be able to recognize, for example, a set of statements that is an
implementation of the squaring function as such a program `p'. Since the
proof above shows that this is not possible, there cannot be such a program
`P' (even if there can be such programs `p' or `t', which was not disputed
before).

A corollary of this is what I originally stated: If you cannot
programmatically determine if a set of statements is an implementation of a
certain function `p', you can _not_ write a program `P' that computes or
selects another function `t' that computes the same values as this set of
statements for the same inputs.

I find it rather curious that nobody appears to have recognized that even
though the proof for Rice's theorem in the Wikipedia is based on the very
same trivial property that the OP used in their question (the squaring
function), and I pointed that out in my answer.

PointedEars
--
realism: HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness: XHTML 1.1 as application/xhtml+xml
-- Bjoern Hoehrmann

From: Scott Sauyet on 29 Dec 2009 10:50

On Dec 28, 6:49 pm, Thomas 'PointedEars' Lahn <PointedE...(a)web.de>
wrote:
> Scott Sauyet wrote:
>> Andrew wrote:
>>> - of course it is possible to do it, at least to some extent (still
>>> not buying the Rice's theorem having to do anything with this - as
>>> Lasse pointed out, compilers do that all the time)
>
>> I'm certainly not buying Thomas Lahn's assertion that Rice's theorm
>> implies that
>
>> | [Y]ou cannot devise a general function that computes another
>> | function that computes the same result for the same input
>> | (a trivial property) as a third function.
>
>> I think the Y-Combinator [1] demonstrates that this is false.
>
> I do not think the Y-Combinator applies here. Especially, we can read
> where "Y-combinator" redirects to:
>
> ,-<http://en.wikipedia.org/wiki/Fixed_point_combinator>

Which, incidentally, is the exact link I supplied.

> | [ ... ] The name of this combinator is
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> | incorrectly used sometimes to refer to any fixed point combinator.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, I did not misuse the term. I used the Y-combinator as the
best-known example of a fixed-point combinator. Since it was a
demonstration of existence, one example is plenty.

> | [ ... ] Fixed point combinators do not necessarily exist in more
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> | restrictive models of computation. For instance, they do not exist in
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> | simply typed lambda calculus.

But it does exist in Javascript. A quick web search turns up numerous
examples. This is my preferred form:

var Y = function(f) {
return (function(g) {
return g(g);
})(function(h) {
return function() {
return f(h(h)).apply(null, arguments);
};
});
};

Can you find a function f and a value x such that the following do not
yield the same result:

f(x)
Y(function() {return f;})(x)

?

> As for Rice's theorem:
> <http://en.wikipedia.org/wiki/Rice%27s_theorem#Proof_by_reduction_from...>
> | Proof by reduction from the halting problem

Yes, we all know how to read Wikipedia...

> Now, in order to write a program `P' that obfuscates code in a way that
> statements are replaced by more complex code (e.g., function calls) that
> computes the same as the original code for any given input, that program
> `P' must be able to recognize, for example, a set of statements that is an
> implementation of the squaring function as such a program `p'.

No, it doesn't have to do that. I could replace every function in the
original program with another that computes the same result, but which
is syntactically more complicated. This does not go very far towards
the OP's goal, though, as I'm sure it would be straightforward to
write code to recognize and reverse this transformation.

> Since the
> proof above shows that this is not possible, there cannot be such a program
> `P' (even if there can be such programs `p' or `t', which was not disputed
> before).

Sorry, I don't understand the parenthetical remark. But your
conclusion is clearly based upon a misreading of Rice's Theorem.

> A corollary of this is what I originally stated: If you cannot
> programmatically determine if a set of statements is an implementation of a
> certain function `p', you can _not_ write a program `P' that computes or
> selects another function `t' that computes the same values as this set of
> statements for the same inputs.

But you don't necessarily have to come at this from that direction.
You don't have to demonstrate an understanding of any set of
statements. You can do these transformations at the syntactic level,
using, for example, a fixed-point combinator like the Y-combinator on
the functions presented.

I'm not trying to support the OP. I think the idea is not well-
conceived, and that even if it were to work reasonably well, it would
cause more problems than it could possibly solve, but Rice's theorem
is not relevant.

You didn't respond to Lasse Nielsen's last post, but, if you're right
that Rice's theorem says that you cannot programatically convert a
function to another of identical behavior, how do explain the
existence of compilers?

> I find it rather curious that nobody appears to have recognized that even
> though the proof for Rice's theorem in the Wikipedia is based on the very
> same trivial property that the OP used in their question (the squaring
> function), and I pointed that out in my answer.

That means only that we can't algorithmically recognize any and all
sets of statements that represent squaring functions. It doesn't
imply that we can't transform one squaring function into another.

Do you think the following functions are equally transparent?:

var f1 = function(s) {return x * x;}

var f2 = (function(f) {
return (function(g) {
return g(g);
})(function(h) {
return function() {
return f(h(h)).apply(null, arguments);
};
});
})(function() {
return function(x) {return x * x;}
});

The second is simply the result of the Y-combinator applied to the
first.

We're pretty far afield from the OP's goals, though. The thread has
given a pretty strong consensus that those goals are at best very
difficult and, even if attainable, are not worth pursuing.

-- Scott

From: Scott Sauyet on 29 Dec 2009 12:08

On Dec 29, 10:50 am, I <scott.sau...(a)gmail.com> wrote:
> Do you think the following functions are equally transparent?:
>
> var f1 = function(s) {return x * x;}
^^^
Sorry, typo:

var f1 = function(x) {return x * x;}

-- Scott

From: Thomas 'PointedEars' Lahn on 29 Dec 2009 12:46

Scott Sauyet wrote:

> Thomas 'PointedEars' Lahn wrote:
>> As for Rice's theorem:
>>
<http://en.wikipedia.org/wiki/Rice%27s_theorem#Proof_by_reduction_from...>
>> | Proof by reduction from the halting problem
>
> Yes, we all know how to read Wikipedia...

Apparently not.

> You didn't respond to Lasse Nielsen's last post, but, if you're right
> that Rice's theorem says that you cannot programatically convert a
> function to another of identical behavior, how do explain the
> existence of compilers?

A compiler transforms source code into machine-dependent code; IOW, it
translates one higher-level language into another, lower-level language.
That is a concept fundamentally different from that which Rice's theorem
addresses.

>> I find it rather curious that nobody appears to have recognized that
>> even though the proof for Rice's theorem in the Wikipedia is based on
>> the very same trivial property that the OP used in their question (the
>> squaring function), and I pointed that out in my answer.
>
> That means only that we can't algorithmically recognize any and all
> sets of statements that represent squaring functions.

It means that no *program* can do that.

> It doesn't imply that we can't transform one squaring function into
> another.

It does imply that a *program* cannot do that.

> Do you think the following functions are equally transparent?:
>
> var f1 = function(s) {return x * x;}
>
> var f2 = (function(f) {
> return (function(g) {
> return g(g);
> })(function(h) {
> return function() {
> return f(h(h)).apply(null, arguments);
> };
> });
> })(function() {
> return function(x) {return x * x;}
> });
>
> The second is simply the result of the Y-combinator applied to the
> first.

What you are still missing is that the "Y-combinator" was applied by *you*
here, an (arguably) intelligent being applying informed reasoning; _not_ a
program implementing an algorithm. For a simple implementation of the
example used in the proof for Rice's theorem, a program implementing an
algorithm would need to recognize whether f() in

function a(j)
{
while (--j) b(j);
}

function f(x)
{
a(i);
return x * x;
}

was an implementation of the squaring function or not in order to compute
or select a suitable replacement for obfuscation. It cannot do that
because it depends on the pre-call value of `i' (and the behavior of b(j)),
whether a(i) halts, and if it halts, if execution even reaches the `return'
statement in f(), *regardless* that it is there. And the program cannot
recognize whether a(i), i.e. the while-loop in it, halts, let alone
whether, if it halts, the `return' statement in f() is ever reached.

> We're pretty far afield from the OP's goals, though.

Obfuscation was part of the goal, and that has been attained to a certain
degree. However, it has necessarily _not_ been attained *through a
program*, of course, and so missing the other part, of course.

> The thread has given a pretty strong consensus that those goals are at
> best very difficult and, even if attainable, are not worth pursuing.

ACK

PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10
Prev: Code Guidelines
Next: JavaScript code mangler / scrambler / ... khm, more than obfuscator... :)