From: Maaartin on
On Apr 29, 6:34 pm, bmearns <mearn...(a)gmail.com> wrote:
> Alright, I've got a new routine that I'm calling "raking" the deck
> (for no particular reason). I think that it can be done pretty quickly

It seems to me that you get rid of 2 cards in each step, so you need
to do quite a lot of work. Or am I wrong?

Currently, I don't have the time for our "card play", but I'll read it
carefully later; it looks interesting. I've also got some new ideas
to try out.

> To bury a card, X, take N to be the rank of X (a number in [1,13]).

Interpreting cards as numbers in 1..13 is different from what we (at
least I and J.D.) did before. We interpreted Aces as 14, which is not
a good idea for the following reasons:
- Ace looks like 1, so you can easily be misled.
- Counting usually starts with 1 instead of 2 (ok, 0 is better, but
not here).
- We lay out the initial deck like A2345..., so A=1.

> So like I said if we add in the output function, this might work out
> really well. I'll start running some statistical tests, and if those
> are promising, I'll start implementing some more thorough tests.

These tests are important; I'm going to evaluate some new ideas of
mine before I start to describe them precisely.

Could you post the test code here?


I've got a proposal for a general test function. The interface could
look like

evaluate(input_function, finalizing_function, output_function, effort,
seed)

where

input_function takes two arguments (deck, letter) and returns the new
deck after absorbing letter

finalizing_function takes a single argument (deck) and returns the new
deck after postprocessing (like my 3 buries)

output_function takes a single argument (deck) and returns [deck,
output_letter]

effort is a number determining the running time

seed is the seed for a deterministic prng used for the tests

The result of evaluate could be a list of numbers, all of which
should be 0 for a perfect algorithm; the further they are from 0, the
worse.
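
For concreteness, a skeleton of what I have in mind (the message
length and the final scoring are just stand-ins to be filled in):

import random

def evaluate(input_function, finalizing_function, output_function,
             effort, seed):
    rng = random.Random(seed)   # deterministic prng for the tests
    digests = []
    for _ in range(effort):     # effort determines the running time
        deck = list(range(52))
        for letter in [rng.randrange(26) for _ in range(10)]:
            deck = input_function(deck, letter)
        deck = finalizing_function(deck)
        deck, output_letter = output_function(deck)
        digests.append(deck)
    # compute the scores from the collected digests; each number
    # should be 0 for a perfect algorithm (stand-in for now)
    return [0.0]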


What do you think about it?
From: bmearns on
On Apr 29, 1:25 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 29, 12:34 pm, bmearns <mearn...(a)gmail.com> wrote:
>
>
>
> > So that's the rake operation, and the overall algorithm is simply
> > feeding in input cards using the old shuffle_1 routine,
>
> Which one was the old shuffle_1 routine?  I've lost track of what was
> what...

shuffle_1 was the very simple one: split the deck into top and bottom
parts, above and below the input card. Move the input card to the
front of the top part, then move the next card after it to the back of
the top part. Then put the top part behind the bottom part. It's
almost just a simple cut, but you move the input card up to the top
before cutting.
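
In python, my reading of it is roughly this (the edge case where the
input card is already at the very end is a guess on my part):

def shuffle_1(deck, x):
    i = deck.index(x)
    top = [x] + deck[:i]        # input card to the front of the top part
    bottom = deck[i+1:]
    if bottom:                  # next card after it goes to the back
        top = top + bottom[:1]  # of the top part
        bottom = bottom[1:]
    return bottom + top         # top part goes behind the bottom part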

So far I've just been running collision tests on this latest routine
of mine, without even using a final output function. I've tried
250,000 unique random inputs of 10 cards each and gotten 250,000
unique outputs. That's definitely a good start, but we know collision
testing isn't everything.

I'm also going to need to spend several hours otherwise engaged, but
I'll plan to continue running collision tests in the background, and
then work on the more useful tests when I'm free.

-Brian
From: bmearns on
On Apr 29, 1:34 pm, Maaartin <grajc...(a)seznam.cz> wrote:
> On Apr 29, 6:34 pm, bmearns <mearn...(a)gmail.com> wrote:
>
> > Alright, I've got a new routine that I'm calling "raking" the deck
> > (for no particular reason). I think that it can be done pretty quickly
>
> It seems to me that you get rid of 2 cards in each step, so you need
> to do quite a lot of work. Or am I wrong?

No, it only gets rid of 1 at a time, but I may not have described it
very well. Here's the python:

###############################################
def color(card, deckSize):
    half = deckSize >> 1
    if card < half:
        return 0
    else:
        return 1

def bury(deck, x, suitSize):
    """
    Buries card x into the given deck. Interprets x as a
    number under modulo of the specified suit size, then
    adds one (so it's just the rank of the card), counts
    that many spaces down the deck, removes the /next/
    card, and replaces it with x. Returns a two-tuple
    (output, deck) where output is the removed card,
    and deck is the resulting deck with the output card
    replaced by x.
    """
    n = (x % suitSize) + 1
    deckSize = len(deck)
    if n >= deckSize:
        n = deckSize - 1
    head = deck[0:n]
    output = deck[n]
    tail = deck[n+1:]
    deck = head + [x] + tail
    return (output, deck)


def rake(deck, suits):
    """
    Arguments:
    deck  - A permutation of numerical cards. The
            red cards are the lowest numbers, the
            black cards are the rest. Goes from 0
            up to N-1.

    suits - The number of suits in the deck. For
            a real deck, this is the number of
            cards (the length of 'deck') divided
            by 13. But you can set it however
            you want. Dividing this into the
            length of the deck determines how big
            each suit is, and this in turn is
            used to determine the rank of each
            card (by taking the numerical value
            modulo the size-of-a-suit).
    """
    deckSize = len(deck)
    suitSize = deckSize // suits    # integer division
    result = []

    s = deck[0]
    scol = color(s, deckSize)
    deck = deck[1:]
    while len(deck) > 1:
        t = deck[0]
        tcol = color(t, deckSize)

        #Same color, bury t
        if tcol == scol:

            #Count down by #t, take next card as output
            # and replace with t.
            output, deck = bury(deck, t, suitSize)

            #An artifact of doing it in code: now we've
            # got t buried, we also have it at the front of
            # the deck, so remove it.
            deck = deck[1:]

            #Append output
            result += [output]

        #Different color, bury s, then pick up t as new s.
        else:
            #Count down by #s, take next card after that
            # as output, and replace it with s.
            output, deck = bury(deck, s, suitSize)

            #But the deck hasn't shrunk at all: we removed
            # a card (output), but we replaced it
            # with the selected card. So now pick up the
            # first card as the new selected.
            s = deck[0]
            scol = color(s, deckSize)
            deck = deck[1:]

            #Append output
            result += [output]

    #When there's only one card left in the deck (and one
    # in our hand), no matter what they are, we're going to
    # end up outputting the one in the deck, so just do that,
    # and then tack on the selected.
    return result + deck + [s]

###############################################
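
For example, a quick way to try it out (just a sanity check on a
sorted 52-card deck with 4 suits, not part of the algorithm itself):

deck = list(range(52))   # red cards are 0..25, black cards 26..51
print(rake(deck, 4))     # all 52 cards come back out, scrambled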


>
> Currently, I don't have the time for our "card play", but I'll read it
> carefully later; it looks interesting. I've also got some new ideas
> to try out.

Yeh, I'll have to take a few hours off as well. Hopefully I'll get
back at it sometime tonight.


> > To bury a card, X, take N to be the rank of X (a number in [1,13]).
>
> Interpreting cards as numbers in 1..13 is different from what we (at
> least I and J.D.) did before. We interpreted Aces as 14, which is not
> a good idea for the following reasons:
[snip]

Yeh, I've been taking ace as 1, but I think I noticed you were using
14. To a certain extent, using 14 makes sense because it gives a
bigger range. For me, it's more natural to associate ace with 1, as
you pointed out, and it's more natural to start counting at 1 than 2.
Likewise, converting to letters is more direct if ace is 1. All in
all, I don't think it makes a big difference; it's mostly just a
matter of convention. As long as you make a choice and stick with it
every time you run the algorithm, it shouldn't really matter much.
I'll stick with ace=1 for the reasons you cited.


>
> > So like I said if we add in the output function, this might work out
> > really well. I'll start running some statistical tests, and if those
> > are promising, I'll start implementing some more thorough tests.
>
> These tests are important; I'm going to evaluate some new ideas of
> mine before I start to describe them precisely.

In addition to the "direction" and "2-card sequence" tests I described
previously, I've come up with the following tests that I haven't
implemented yet:

* See where each card ends up in the final output. Should be uniformly
distributed.

* For each card as first input, see where the card ends up with
different numbers of inputs. At a certain minimum number of inputs, it
should be uniform.

* For each card as last input, see where the card ends up with
different numbers of inputs. At a certain minimum number of inputs, it
should be uniform.

* See how far every card moves between two digests when different
numbers of additional cards are fed in.
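
As a sketch of the first of these (hash_fn is a stand-in name for
whatever absorb/finalize chain we are testing; it takes an initial
deck and a message and returns the digest deck):

import random
from collections import Counter

def position_test(hash_fn, deckSize, trials, seed):
    # Hash many random 10-letter inputs and record where a fixed
    # card (card 0) lands; the counts should be roughly uniform.
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        msg = [rng.randrange(26) for _ in range(10)]
        digest = hash_fn(list(range(deckSize)), msg)
        counts[digest.index(0)] += 1
    return counts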

If anyone can think of other tests to do, or comment on these, that'd
be great.


> Could you post the test code here?

I will, but not yet. It's kind of a shambles right now: in addition to
general cleaning up, I want to implement a logical and standard
interface like what you've suggested below. I'm also working on using
a file-based database to prevent the system memory from filling up too
much when running large numbers of tests and collecting large numbers
of statistics. (Fortunately python has this functionality built in.)

>
> I've got a proposal for a general test function. The interface could
> look like
>
> evaluate(input_function, finalizing_function, output_function, effort,
> seed)
>
> where
>
> input_function takes two arguments (deck, letter) and returns the new
> deck after absorbing letter
>
> finalizing_function takes a single argument (deck) and returns the new
> deck after postprocessing (like my 3 buries)
>
> output_function takes a single argument (deck) and returns [deck,
> output_letter]
>
> effort is a number determining the running time
>
> seed is the seed for a deterministic prng used for the tests
>
> The result of evaluate could be a list of numbers, all of which
> should be 0 for a perfect algorithm; the further they are from 0, the
> worse.
>
> What do you think about it?

That looks pretty good, but do you mean there would be a different one
of these functions to test every metric? I think we can set it up a
little more efficiently so that we can test a number of different
metrics on a single deck. For instance, my crappy statistical testing
code outputs data in CSV format, which can be read and plotted in
Excel to compare different things (though I'll probably move to using
GNUPlot instead).


I've got a Subversion server running at home. I'll probably start
putting my code in there so you can always get the latest from there
if you want. If you're not familiar with Subversion, it has a simple
web interface so you can always just download it. If you are familiar
with Subversion, I can give you write access as well, so we can all
contribute to the same code.

But that will have to wait till later tonight. In the mean time, just
let me know if you want write access, and if you do give me a username
you want to use and an email address I should send your login info to.
Actually, if you use PGP, you can encrypt a username and password to
me with key-id 0x3aa70848. Just don't use an important password; I
don't have Subversion set up to be very secure with your passwords.

-Brian

From: J.D. on
I have some analysis for the two algorithms that I tested by hand
(mine and Maaartin's). Specifically I have been thinking about
reversibility, and how resistant each algorithm is to preimage
attack. For the moment I am going to just focus on Maaartin's
algorithm (this post is long enough already). Hopefully this will
give us some useful tools for thinking about other potential
algorithms.

Maaartin's Algorithm and Preimage Attacks:

The first thing to note is that both the "bury" and the fixed (1,2)
deal operations are easily undoable. Here is how to undo them:

"Disinter" (undoes "bury" step)
i) Look at the last card, interpret it as a value, "N" (which during
my tests was in a range from 2 to 14), and set it aside,
ii) Take the bottom N cards from the end of the deck and move them in
a block to the front,
iii) Take the former last card and put it on the top of the deck.
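
In python, a rough sketch of this (rank being a stand-in for however
we map a card to its value, 2 to 14 in my tests):

def disinter(deck, rank):
    n = rank(deck[-1])            # i) read the last card as a value N
    card, rest = deck[-1], deck[:-1]
    rest = rest[-n:] + rest[:-n]  # ii) bottom N cards move to the front
    return [card] + rest          # iii) former last card goes on top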

"Break Deal" (undoes fixed (1,2) deal step)
i) Take the first 18 cards off the top of the deck -- this is the
"short stack", the remaining 34 cards is the "tall stack", and there
is a "result stack" that is currently empty,
ii) Take one card off the top of the short stack and put it on the
result stack,
iii) Take two cards off the top of the tall stack and put them on the
result stack,
iv) Repeat steps ii and iii in that order until all the cards are in
the result stack -- the result is the deck as it existed before the
deal step.
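
Or as code (a sketch; I treat the front of a list as the top of a
stack, and I gloss over the face-up/face-down details by just
appending dealt cards to the result in order):

def break_deal(deck):
    short, tall = deck[:18], deck[18:]  # i) split off the two stacks
    result = []
    while short or tall:
        result += short[:1]             # ii) one from the short stack
        short = short[1:]
        result += tall[:2]              # iii) two from the tall stack
        tall = tall[2:]
    return result                       # the deck before the deal step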

So, let's say that we are given the digest, consisting of the full
deck after all letters have been absorbed and then three buries
performed during finalization. We don't know how long the message was
(though we can assume it was relatively short, since this was done by
hand and the message was an easily memorable password). We do know
the starting condition of the deck before any of the letters were
absorbed.

The first step in reversing the hash is to undo those three buries,
which can be done by calling "Disinter" three times. Now we call
"Break Deal" once, and right away we can determine with certainty the
last letter of the hashed message. That is because the last step not
undone was a shuffle-cut, and that would have moved an instance of
that last letter to the very end of the deck. So all we have to do is
look at the last card and interpret it as a letter, and that gives us
the last letter of the hashed message.
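
Chaining the sketches above together (letter_of being a stand-in for
however we map a card back to a letter):

def last_letter(digest, rank, letter_of):
    deck = digest
    for _ in range(3):
        deck = disinter(deck, rank)  # undo the three finalization buries
    deck = break_deal(deck)          # undo the final deal step
    return letter_of(deck[-1])       # an instance of the last letter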

Generally you don't want to let the attacker be able to easily
determine even one of the letters of the message with 100% certainty.
So that is an area where Maaartin's algorithm needs improvement.
However I think we can go further than just finding the last letter.
To go any further and recover more letters of the message requires
that we be able to undo the combination of two shuffle-cuts and a bury
step.

The difficulty with reversing the "shuffle-cut + bury + shuffle-cut"
steps is that while each shuffle-cut moves the first instance to the
end, if we don't know what the deck looked like before the shuffle-cut
then it is somewhat hard to determine precisely how many cards were in
the deck ahead of that first instance (and hence how many cards were
moved to the end along with the first instance by the shuffle-cut).
But we can narrow down the possibilities...

Consider what happens to the deck when a letter is absorbed. First,
the hasher translates the letter into a pair of cards (the two
instances of that letter, X and Y). Then he searches through the deck
from the top for the first of those two instances to appear, and once
he finds it he moves it and all the cards above that first instance to
the end. To visualize:

Before first shuffle-cut:
[first unknown no. of cards] X [second unknown no. of cards] Y [third
unknown no. of cards]
After first shuffle-cut:
[second unknown no. of cards] Y [third unknown no. of cards] [first
unknown no. of cards] X

Now he performs a bury. To do this, he looks at the new first card,
interprets it as a value, "N", and moves it and the next N cards to
the end like so:

Before bury:
"N" [N cards] [?] Y [?] X
After bury:
[?] Y [?] X [N cards] "N"
(I assume for the moment that the next instance, Y, was not "N" or
among the N cards moved -- later we'll look at what happens when it
is)

Then the second shuffle-cut:
[?] Y [?] X [N cards] "N" ==> [?] X [N cards] "N" [?] Y

That bury step in the middle is the weak point, and will enable us to
(with a certain amount of guessing) undo all three steps. Here is how
to start undoing them, given the output of the three steps, and still
assuming that the second instance was not moved to the bottom by the
bury step:

1) Look at the last card -- this will be an instance of the absorbed
letter (Y);
2) Find the other instance of that letter in the deck (X), "N" will be
between them;
3) To find "N", look for all of the self-justifying cards between X
and Y;

General Definition: A self-justifying card is a card with a face value
of "N", and there are N cards between it and some "anchor-point". The
relevant anchor-point given our assumption is the current highest
instance (X). To illustrate:

A self-justifying card where X is the anchor point:
[?] X [N cards] "N" [...] Y

Any card that satisfies the above illustrated condition is a potential
candidate for the determinant of the bury step (the card that
determined how many cards below it to bury). In the simplest case,
let's say there is only one such candidate (there will always be at
least one). In that case reversing the last shuffle-cut and then the
bury is easy:

[?] X [N cards] "N" [...] Y
Reverse last shuffle-cut:
[...] Y [?] X [N cards] "N"
Reverse bury:
"N" [N cards] [...] Y [?] X

So what if our assumption was wrong, and the other instance was moved
by the bury step? If that is the case then we can still reverse the
two steps by looking for self-justifying cards. For example, let's
say that the other instance was among the N cards specified by "N".
In which case, after the bury, the deck will look like this:

Bury:
"N" {[a] Y [b]} [?] X ==> [?] X {[a] Y [b]} "N"
(where a + b + 1 = N)
Second shuffle-cut:
[?] X {[a] Y [b]} "N" ==> {[a] Y [b]} "N" [?] X

Note that "N" is self-justifying when the anchor-point is the top of
the deck (i.e. a card with a face value of "N" is self-justifying if
there are exactly N cards above it in the deck). This holds even if
we assume that the other instance was also the bury-determinant:

1st shuffle-cut:
[?] X Y [z cards] ==> Y [z] [?] X
Bury:
[z-Y] [?] X [Y cards] Y
2nd shuffle-cut:
[Y cards] Y [z-Y] [?] X

i.e. "Y" is Y cards away from the top of the deck.

Putting this all together, let's return to our procedure and specify
it more completely:

"Seek Justification" (undoes the second shuffle-cut and bury steps):
i) Look at the last card -- this will be an instance of the absorbed
letter, (Y);
ii) Find the other instance of that letter in the deck, (X) -- the
actual bury determinant, "N", will either be between the two instances
or will be X itself;
iii) To find "N", look for all of the self-justifying cards from X
(including X) up to Y (not including Y) -- a code sketch of this step
follows the procedure;
A card is self-justifying iff:
- the card has a face value of "N" and there are N cards between it
and X, OR
- the card has a face value of "N" and there are only N cards above
it.
iv) If there is more than one self-justifying card then we will need
to branch our analysis -- i.e. pursue multiple possible paths to the
answer -- For each and every self-justifying card, "J":
a) Take all the cards below "J" (including Y) and move them as a block
to the top,
b) Put "J" aside, and take J many cards from the bottom and move them
to the top,
c) Then put "J" on the top,
d) Record the current condition of the deck as 'Condition of Deck for
candidate "J"', and then return the deck to the condition it was in
before (a), and repeat steps (a)-(d) for the next self-justifying card
until all the possible candidates for the actual bury determinant have
been processed and recorded.
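
As code, step (iii) of "Seek Justification" might look roughly like
this (value and same_letter are stand-ins for however we map a card
to its face value and decide whether two cards encode the same
letter):

def find_self_justifying(deck, value, same_letter):
    y = deck[-1]                          # the last card is an instance Y
    x = next(i for i in range(len(deck) - 1)
             if same_letter(deck[i], y))  # the other instance X
    candidates = []
    for j in range(x, len(deck) - 1):     # from X up to, not including, Y
        n = value(deck[j])
        if (j - x - 1) == n or j == n:    # N cards between it and X, or
            candidates.append(j)          # only N cards above it
    return candidates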

The actual condition of the deck before the bury function and the
second shuffle-cut will be on that list. But since we don't know
which one it was (unless there is only one self-justifying card), we
have to investigate all of them until we find one that leads to a
valid preimage. We do that by picking one of the possible "J"s,
assuming it was in fact the correct bury determinant, and continuing
with the next phase of the reversal analysis under that assumption.
Then we repeat for each and every "J".

Reversing the First Shuffle-Cut:

The next phase in the analysis is reversing the first shuffle-cut, and
will probably require even more guessing and more branching of our
analysis. After the second shuffle-cut and bury steps are reversed,
we can say with certainty that the original first card (the top of the
deck before the first shuffle-cut) is either an element of the
interval between the two instances, or is X itself:

"N" [N cards] [...] Y <<<[interval] X>>>
Original top card is within <<< >>>, which I will call "the Zone of
possible top cards".

The number of cards in the Zone determines how difficult reversing the
first shuffle will be (the larger this number is, the harder it is to
undo). If there is only one card in the Zone (i.e. only X), then we
don't need to guess or branch the analysis at all; we just move that
card to the top and the first shuffle-cut is undone with certainty.
But if there is more than one card in the Zone then we need to test
each one and see if it meets certain criteria. Those that do not meet
the criteria can be ignored (as they will not be possible top cards),
but all those that do meet the criteria have to be pursued as possible
paths to a valid preimage.

So what criteria should we use, and how do we evaluate the candidates?
The procedure I have come up with is as follows:

"Seek More Justification" (undoes first shuffle-cut)
For each and every card, "C", in the Zone:
i) Assume that "C" was in fact the top card before the first shuffle-
cut, and reverse that first shuffle-cut by taking "C" and every card
below "C" and moving them as a block to the top;
ii) Criterion One: Compare the deck as it is now with the starting
condition of the deck before any letters are absorbed -- if the deck
is identical to the starting condition then we can stop all of our
analysis, because with a high degree of probability we have found the
point where the hashing began. The message can be extracted by
looking at the various reverse-transformations of the digest that we
took in order to reach this starting condition.
iii) Criterion Two: If the deck under the assumption that "C" was the
top card fails Criterion One, then there are other tests we can
perform on it to see if it is nonetheless on the right path to a
valid preimage:
a) Perform "Break Deal" on the deck,
b) Perform "Seek Justification" on the resulting deck,
c) If "Seek Justification" does NOT produce any self-justifying cards,
then the assumption that "C" was the top card cannot have been
correct, and we do not need to pursue this branch any further. (If it
isn't clear to you why that is, tell me and I will explain it.) On
the other hand, if "Seek Justification" does find one or more self-
justifying cards then we must consider this a "fruitful" branch of our
analysis, and record "C" as a true possibility.
iv) Return the deck to the condition it was in before (i) and continue
steps (i) through (iii) with the other possible cards in the Zone.

The card that was in fact the top card before the first shuffle-cut
will be on the list of 'true possibilities'. But since we can't be
sure which one it is (unless the list has only one element), we need
to pursue each true possibility (each "fruitful" branch of the
analysis) until we find a valid preimage, the same way that we had to
follow up on every self-justifying card in the previous phase.

The next phase is to repeat our cycle of "Break Deal", then "Seek
Justification" and then "Seek More Justification" for each of the
fruitful branches, in order to peel off the next letter of the
message. We keep repeating that cycle, peeling off more and more
letters (and branching more and more), until we either run out of
memory (because we have too many fruitful branches to pursue) or find
a valid preimage (one that meets Criterion One in the "Seek More
Justification" phase).

Attack Complexity:

The complexity of this attack is a function of the average number of
fruitful branches that occur in order to peel off a single letter.
The attack will be faster than a brute force attack if and only if the
average number of branches is below a certain threshold. As I
understand it, this threshold is 26 -- i.e. so long as the average
number of branches that arise during our attempt to peel off a letter
is less than 26, our algorithm (on average) will require fewer
operations to find a valid preimage than a brute force preimage
attack.

Why 26? Well, if the brute force attacker doesn't know the length of
the message, but does know it is likelier to be short rather than long
(for the reasons described at the top of this post) then the attacker
would probably first search all possible messages of length 1, then
all possible messages of length 2, and so on until he finds a message
that produces the hash in question. The number of possible messages of
length N is 26^N, and to systematically search all possible messages
up to length N in that fashion requires SUM_{i=1..N} 26^i
evaluations of the hash function.
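
For a sense of the scale involved (a quick computation, not part of
the analysis itself):

def brute_force_work(n, alphabet=26):
    # number of messages up to length n a brute force search must try
    return sum(alphabet**i for i in range(1, n + 1))

# brute_force_work(8) == 217180147158, i.e. roughly 2*10^11 hash
# evaluations to cover every message of up to 8 letters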

The reversal attack algorithm described in this post also has an
exponential complexity (it requires exponentially more operations to
find valid preimages the longer they are). Like the brute force
attack, the exponent is the length L of the message in question, and
the algorithm searches all shorter preimages before moving onto longer
and longer ones. But for the reversal attack, the base is the average
number of fruitful branches required to peel off a letter. So long as
the base is less than 26 it will be a faster way to find a valid
preimage of length L than the brute force attack.

So what is the average number of fruitful branches?

I don't know -- that is an empirical question that must be determined
by testing. However I can say more precisely what we need to look for
while testing. Recall that branches occur in two places while peeling
off a single letter -- at the "Seek Justification" stage, and at the
"Seek More Justification" stage. The average number of fruitful
branches at the first stage, "P", is identical to the average number
of self-justifying cards found during the "Seek Justification" step.
The average number of fruitful branches at the second stage, "Q", is
the product of two numbers: (a) the average number of cards in the
Zone; and (b), the average proportion of Zone cards that cannot be
ignored because they might possibly lead to a valid preimage
(expressed as a fraction of the total number of Zone cards). The
average number of branches that arise when peeling off a letter is
thus P*Q. So long as P*Q < 26, the attack described in this post will
break Maaartin's algorithm (though it may not be a practical break for
very large messages).

I have a similar analysis for my own algorithm, but for now let's make
sure that the above analysis is a) clear, and b) correct. Is there
any part that you guys need clarified or expanded upon? Do you spot
any errors in the analysis?


From: J.D. on
Brian,

I'll take a closer look at your new "raking" idea tomorrow. For now I
am kinda wiped out...
