From: J.D. on 27 Apr 2010 14:02

On Apr 27, 12:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> Alternatively, maybe if the 'deal' amount is defined by the latest
> letter to be absorbed (assuming I understand your 'deal' function at
> all). e.g. to absorb 'c', double shuffle with instances of 'c', and
> then deal-sort the deck by triplets (as 'c' = 3); to absorb 'd',
> double shuffle with instances of 'd', then deal-sort the deck by
> quadruplets...and so on. This would be less effective for 'a' and 'z'
> if 'a' = 1 and 'z' = 26, but perhaps we could assign every letter a
> number in the range 2 to 14 (with 'a' and 'n' both = Aces = 14).

OK, I have implemented this idea (reusing a lot of your code):

hashing "qwerty":
EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha
hashing "qwarty":
GzAxMBCJKDEFgTPQOHpijkIaUVhLnobcdXYyRZqNrSlefstuWmvw
hashing "kwerty":
MTPQUVkjRZOabHIxiuScdJpfghKezANCnoBqrstyXYDEFwlmLGvW
hashing "qwerti":
apijXYItRklnofghKDEvwxyrszATPQuWmJMNOBCeUFGZqVSLbcdH

A visual inspection of these few examples suggests that good mixing
occurs after as few as 6 letters are hashed (no secondary hashing is
necessary), and possibly a good avalanche too. However, significantly
more statistical testing is necessary to confirm how well it
avalanches...

Code (only the new stuff is in double_shuffle):

def new_deck(n=26, occurrences=2):
    # build a deck such as "abc...zABC...Z": each letter appears twice,
    # once lowercase and once uppercase
    deck = ""
    for i in range(occurrences):
        for j in range(n):
            deck = deck + chr(ord("aA"[i & 1]) + j)
    return deck

def deal(deck, n):
    # deal the deck to one player, n cards at a time; this reverses the
    # deck by blocks of n
    result = ""
    while len(deck) > n:
        result = deck[:n] + result
        deck = deck[n:]
    result = deck + result
    return result

def double_shuffle(deck, letter):
    # find the first and the last occurrence of the letter
    for i in range(len(deck)):
        if deck[i].lower() == letter:
            break
    for j in range(len(deck) - 1, 0, -1):
        if deck[j].lower() == letter:
            break
    head = deck[0:i]
    middle = deck[i+1:j]
    tail = deck[j+1:]
    tempdeck = middle + deck[j] + tail + deck[i] + head
    # deal-sort by an amount determined by the letter:
    # 'a'/'n' = aces = 14, 'b'/'o' = 2, ..., 'm'/'z' = 13
    if letter in "anAN":
        return deal(tempdeck, 14)
    elif letter in "boBO":
        return deal(tempdeck, 2)
    elif letter in "cpCP":
        return deal(tempdeck, 3)
    elif letter in "dqDQ":
        return deal(tempdeck, 4)
    elif letter in "erER":
        return deal(tempdeck, 5)
    elif letter in "fsFS":
        return deal(tempdeck, 6)
    elif letter in "gtGT":
        return deal(tempdeck, 7)
    elif letter in "huHU":
        return deal(tempdeck, 8)
    elif letter in "ivIV":
        return deal(tempdeck, 9)
    elif letter in "jwJW":
        return deal(tempdeck, 10)
    elif letter in "kxKX":
        return deal(tempdeck, 11)
    elif letter in "lyLY":
        return deal(tempdeck, 12)
    else:
        return deal(tempdeck, 13)
From: J.D. on 27 Apr 2010 14:23

Sorry, forgot to add...

def csh(string):
    # "csh" = candidate shuffle hash
    deck = new_deck()
    for letter in string:
        deck = double_shuffle(deck, letter)
    return deck
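[For reference, the example hashes in the previous post should be
reproducible by loading both posts' code into one session and calling
csh directly (an illustrative check; the expected output is the one
quoted above):

print csh("qwerty")
=> EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha]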
From: Maaartin on 27 Apr 2010 15:49

On Apr 27, 6:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 9:42 am, Maaartin <grajc...(a)seznam.cz> wrote:
> > I changed the representation to letters, I use both lowercase and
> > uppercase, but they're treated exactly the same. I also added a new
> > optional operation: deal(deck, n), which is very simple: Take n cards
> > from the top, put them on the table, repeat until there are no more
> > cards left (in the last step there may be fewer than n cards). This is
> > like dealing the cards to one player; it reverses the deck and for n>1
> > shuffles it a bit.
>
> Let me see if I understand you: You are taking some set of cards and
> then partially re-ordering them by, let's say, pairs, such that the
> first pair becomes the last, the second pair becomes the second from
> last, and so on. e.g. (0,1)(2,3)(4,5) --> (4,5)(2,3)(0,1)
>
> Is that correct?

Yes. This time it's way easier to try than to explain:

print deal("012345", 2)
=> 452301
print deal("012345678", 2)
=> 867452301

Note that it is exactly like dealing cards to a single person, two at
a time. Obviously, dealing to more persons could perform better, so I
rewrote it:

def deal(deck, list):
    result = []
    for n in list:
        result += [""]
    while len(deck) > 0:
        for i in range(len(list)):
            n = min(list[i], len(deck))
            #print "@", i, n, result, deck
            result[i] = deck[:n] + result[i]
            deck = deck[n:]
        #print "@@", result
    result = "".join(result)
    return result

You can use the commented-out lines for viewing how it works. The
second parameter is a list whose n-th entry determines how many cards
the n-th person is given at a time.

print deal("012345678", (2,))
=> 867452301
# obviously, a single-element list works just as before

print deal("012345678", (2,1))
=> 673401852
# here the first person is given 2 cards at a time, and the second is
# given 1 card; the dealt stacks "67 34 01" for the first person and
# "8 5 2" for the second get concatenated

> > The function double_shuffle(deck, letter, shouldDeal) looks for the
> > first and the last (i.e., second) occurrence of letter, and does what
> > you described. With shouldDeal=True it also uses the function deal for
> > the head and tail parts with n=2 and n=2, respectively.
> >
> > The program is very simple:
>
> Thank you for implementing it. I can say with certainty that if you
> are new to python you are still much further along than I.

I'm really new to Python, apart from having once changed (with a lot
of swearing) two lines in a short Python program.

> Unfortunately, with some testing using your code, it appears my idea
> does not really increase the mix rate as much as I had hoped (even
> with your deal function to boost it)

I disagree. I repeat here only the last line after mixing "qwerty" in:

y => ZqabcdefghijklmnopwrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX
     ...!!!!!!!!!!!!!!!..!!!!..!!!!!!..!!!!!!!!!!!....!!! 39

The exclamation marks denote pairs in the original order; the number
to the right is their count. With dealing it looks like

y => zAxDBCijEnklopmSTFGKHILMJQNOXYswutvdqrcheVgfUabPWZyR
     .!...!.!...!.!..!.!..!.!...!.!.......!........!..... 13

So after 6 inputs there are hardly any pairs left untouched. Of
course, that by itself implies no avalanche. I used the following
function:

def eval_neighbours(deck):
    # mark with "!" every card whose left neighbour (cyclically) is its
    # predecessor in the original deck order, and count the marks
    result = ""
    cnt = 0
    for i in range(0, len(deck)):
        left = deck[i-1].lower()
        right = deck[i].lower()
        diff = ord(right) - ord(left)
        b = diff == 1 or diff == -25
        result += "!" if b else "."
        if b:
            cnt += 1
    return result + " " + str(cnt)

> -- there is still far too much
> retention of the original card order after 10, 15 and even 20 message
> letters are absorbed. Maybe bmearns' improvement above will do
> better.

Currently I can imagine too many possibilities...

> Alternatively, maybe if the 'deal' amount is defined by the latest
> letter to be absorbed (assuming I understand your 'deal' function at
> all). e.g. to absorb 'c', double shuffle with instances of 'c', and
> then deal-sort the deck by triplets (as 'c' = 3); to absorb 'd',
> double shuffle with instances of 'd', then deal-sort the deck by
> quadruplets...and so on. This would be less effective for 'a' and 'z'
> if 'a' = 1 and 'z' = 26, but perhaps we could assign every letter a
> number in the range 2 to 14 (with 'a' and 'n' both = Aces = 14).

Yes, this could help. But I'd prefer to use, let's say, the first card
below the first card corresponding to the letter, so there's another
indirection.

*********

On Apr 27, 6:30 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 10:23 am, bmearns <mearn...(a)gmail.com> wrote:
> Yeah, the persistence of those sequences is kind of annoying. A note
> of caution: the goal is avalanche, not merely to get rid of the
> sequences. The persistence of the sequences is an indication of the
> lack of avalanche -- but the absence of the sequences does not prove
> that avalanche has occurred. So we might come up with a trick that
> appears to thoroughly scramble the sequences, but if different inputs
> don't cause large changes in the output (e.g. significantly change the
> relative ordering of the cards) then the hash will probably still be
> vulnerable.

I fed the three strings "qwerty", "pwerty", "owerty" into a fresh deck
and looked at the result:

without deal
qwerty => ZqabcdefghijklmnopwrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX
pwerty => ZpabcdefghijklmnowqrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX
owerty => ZoabcdefghijklmnwpqrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX

with deal
qwerty => zAxDBCijEnklopmSTFGKHILMJQNOXYswutvdqrcheVgfUabPWZyR
pwerty => CzAEDBfcjghkliorqetGHIFMJKNOLdPZabWTsYuvyxnRUmSpQXwV
owerty => BCzjEDkfiognruqtGHIFMJKNOLWYhTaspVedZbcyAxlRUPSmQXwv

It looks like deal does a good job, but not a good enough one.

> > I still need to do the sequence testing I mentioned, but based on a
> > few tests, I think 14 or 15 inputs will generally suffice in a 26-
> > card deck to destroy the sequences, and as few as 10 inputs seems to
> > be working well if I perform the second iteration. I think that's a
> > fairly reasonable number of letters to remember as a password. It may
> > also be effective to just feed the password in twice...but that might
> > also somehow reveal information (like when the Germans encoded the
> > ring settings twice in Enigma messages).

Feeding it in twice is good, but doubles the work, including the part
which I hate most: mapping letters to cards.

> Yeah, testing with Maaartin's implementation has convinced me that
> this is not enough...

Is there any particular reason for returning

    middle + deck[j] + tail + deck[i] + head

and not, for example,

    tail + deck[j] + middle + deck[i] + head

or

    head + tail + deck[j] + middle + deck[i]

? Actually, I have no idea which one to prefer. According to my last
experiments, it looks like dealing the whole stack after each input is
a good idea.
I mean something like

deck = new_deck()
for x in "qwerty":
    deck = double_shuffle(deck, x, False)
    print x, "=>", deck
    deck = deal(deck, (1, 2, 4))
    print " ", " ", deck

The details don't matter much now; my point is: no matter how small a
change you make in double_shuffle, it gets amplified by deal.

> Yeah, that could work. Though I am concerned that the 'every other
> card' part might be a bit fiddly and error prone (I would have to do
> testing with an actual deck of cards to be sure). I am also not sure
> about how well it would increase avalanche. Testing is required.

If I understand him correctly, this can be done as a sort of dealing,
too: deal the cards to two persons (one at a time), where only the
first person gets her cards face up. Then put the two stacks together
again. Doing this in the hand is surely more demanding, but possible.

*********

On Apr 27, 8:02 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 12:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> OK, I have implemented this idea (reusing a lot of your code):
>
> hashing "qwerty":
> EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha
> hashing "qwarty":
> GzAxMBCJKDEFgTPQOHpijkIaUVhLnobcdXYyRZqNrSlefstuWmvw
> hashing "kwerty":
> MTPQUVkjRZOabHIxiuScdJpfghKezANCnoBqrstyXYDEFwlmLGvW
> hashing "qwerti":
> apijXYItRklnofghKDEvwxyrszATPQuWmJMNOBCeUFGZqVSLbcdH
>
> A visual inspection of these few examples suggests that good mixing
> occurs after as few as 6 letters are hashed (no secondary hashing is
> necessary), and possibly a good avalanche too. However, significantly
> more statistical testing is necessary to confirm how well it
> avalanches...

This looks really nice! You could make your program much shorter using
something like

n = ord(letter.lower()) - ord("a")
n = n % 13 + 1
if n == 1:
    n = 14
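[Spelled out, that suggestion collapses the thirteen-branch chain in
double_shuffle into a small helper. A sketch of my own (deal_amount is
a hypothetical name, not from the thread):

def deal_amount(letter):
    # 'a'/'n' -> 14 (aces high), 'b'/'o' -> 2, ..., 'm'/'z' -> 13,
    # matching the table used in double_shuffle above
    n = (ord(letter.lower()) - ord("a")) % 13 + 1
    return 14 if n == 1 else n

The tail of double_shuffle then becomes simply:

    return deal(tempdeck, deal_amount(letter))]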
From: bmearns on 27 Apr 2010 15:52

On Apr 27, 2:02 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 12:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> > Alternatively, maybe if the 'deal' amount is defined by the latest
> > letter to be absorbed (assuming I understand your 'deal' function at
> > all). e.g. to absorb 'c', double shuffle with instances of 'c', and
> > then deal-sort the deck by triplets (as 'c' = 3); to absorb 'd',
> > double shuffle with instances of 'd', then deal-sort the deck by
> > quadruplets...and so on. This would be less effective for 'a' and 'z'
> > if 'a' = 1 and 'z' = 26, but perhaps we could assign every letter a
> > number in the range 2 to 14 (with 'a' and 'n' both = Aces = 14).
>
> OK, I have implemented this idea (reusing a lot of your code):
>
> hashing "qwerty":
> EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha
> hashing "qwarty":
> GzAxMBCJKDEFgTPQOHpijkIaUVhLnobcdXYyRZqNrSlefstuWmvw
> hashing "kwerty":
> MTPQUVkjRZOabHIxiuScdJpfghKezANCnoBqrstyXYDEFwlmLGvW
> hashing "qwerti":
> apijXYItRklnofghKDEvwxyrszATPQuWmJMNOBCeUFGZqVSLbcdH
>
> A visual inspection of these few examples suggests that good mixing
> occurs after as few as 6 letters are hashed (no secondary hashing is
> necessary), and possibly a good avalanche too. However, significantly
> more statistical testing is necessary to confirm how well it
> avalanches...
> [snip]

Your initial results look good, but I'm really not crazy about this
"deal" mechanism. Besides the increased time complexity of doing this
(especially having to do it differently depending on the letter), I'd
really like to avoid having to put any cards down, as both a matter of
convenience and a matter of security. If it came down to doing that,
I'd think a more thorough transposition would be better. E.g., instead
of keeping blocks of N cards together, deal out N cards in a row, and
then deal out the next N on top of those, etc. So it's just an
N-column columnar transposition, where cards K, K+N, K+2N, ... end up
adjacent (see the sketch below).

Also, your main shuffling algorithm (prior to the dealing) still has
that kink I mentioned earlier: despite the pair of anchor points,
you're actually just splitting the deck around a single card (the
first instance).

I tested out the version I previously suggested, where the middle
interval is alternately moved to the top of the deck. I still haven't
gotten a chance to actually test it by hand to see how easy it is, but
I think the fact that the opposite cards can be moved to a different
location as well (the bottom of the top stack, instead of just having
to be skipped over somehow) should make it not too error-prone.

Anyway, I ran some tests on it, and it looks fairly good. It actually
cascades just slightly worse than the prior version. For a 52-card
deck, looking at the entire deck as the output, the old version (I'll
call it shuffle_1) averages 65.4% of bits changed per input, and the
new version (call it shuffle_2) averages 67%. As far as I understand,
we want it to be as close to 50% as possible: 65% is essentially the
same as 35%, with a bit-wise negation built in. So neither of these
looks awesome in that respect.

However, if you draw a smaller "hand" from the deck to use as your
output, you can get outputs of very reasonable size for use as
passwords that cascade very nicely.
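[To make the columnar-transposition idea above concrete, a minimal
sketch (my own illustration, not bmearns' code; it assumes the
string-deck representation used in the earlier posts):

def columnar(deck, n):
    # deal the deck into n columns, one row at a time, then pick the
    # columns up in order: cards k, k+n, k+2n, ... become adjacent
    return "".join(deck[k::n] for k in range(n))

print columnar("012345678", 3)
=> 036147258]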
To put numbers on the smaller-hand idea: drawing a hand of 12 cards
(from a 52-card deck) gives 66 bits in the result and has average bit
changes of 51.15% (shuffle_1) and 52.46% (shuffle_2). For reference,
md5 cascades 49.997%, sha-1 cascades 44.998%, and sha-256 cascades
50.011%. The std-dev is relevant as well: md5 has 4.44, sha-1 has
3.95, and sha-256 has 3.12, while shuffle_1 has 6.48 and shuffle_2 has
6.25. So my amateur evaluation is that the shuffle_1 and shuffle_2
algorithms both cascade pretty well, with shuffle_1 faring slightly
better.

Note: I've got a bunch of statistics and numbers given below. If you
want to skip them, I'll summarize at the end.

I also looked at a number of statistics related to the sequences.
First I simply counted the number of 2-card sequences in each
permutation (i.e., the number of cards whose next neighbor is exactly
+1 or -1 from them). For random permutations, this averaged 2.003, but
with a pretty significant std-dev of 1.38. Shuffle_1 compared very
poorly in this metric, as we already knew. The stat consistently
decreases as the number of inputs increases (as one would generally
expect), but even with as many as 27 input values, there were still an
average of 13 such sequences per 52-card deck.

Shuffle_2 did much better in this metric: the average number of
sequences held pretty steadily between 1.7 and 1.9 as the length of
the input increased from 5 to 27. It also had a consistent std-dev
very close to that of the random sequences, hovering between about 1.3
and 1.5 for the most part.

I think this is a pretty reasonable metric for showing that an
algorithm is bad, as in the case of shuffle_1. Shuffle_2 did much
better, but this metric alone doesn't tell us much about the quality.
Because of the alternating distribution, we would expect to see very
few sequences, but the presence of "every-other-value" sequences is
still undesirable, and this metric doesn't account for that.

So I did a more general test in which I looked at the "direction" of
each pair of cards. If a card's next neighbor is greater than it, the
direction of that pair is positive; if the next neighbor is less, the
direction is negative. This accounts for the every-other-card
sequences described above, and should generally be a pretty nice
statistic to compare against random permutations.

I looked at a few different metrics with regard to this stat. For
instance, I counted the maximum number of consecutive pairs in the
deck that have the same direction. I also counted the number of times
that the direction changes while walking through the deck, and the
average number of consecutive pairs that are all the same direction
(obviously, these metrics are all pretty tightly coupled to one
another).

For random permutations of 52-card decks, the average number of times
the direction changed was about 33. The maximum number of consecutive
pairs with the same direction averaged to about 3.6, and the standard
deviation of the number of consecutive pairs with the same direction
was about 0.7.

For shuffle_1, the average number of turns only made it up to about 27
even when the input length was a somewhat unreasonable 27 values. This
indicates a tendency towards longer runs of same-direction pairs in
the output compared to the random data. The maximum number of
consecutive pairs in the same direction only got down to 6.6 with
27-value inputs. This reinforces the tendency suggested by the first
metric.
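[For concreteness, rough sketches of the two sequence statistics
described above (my own code, not bmearns' actual test harness; it
assumes the string-deck representation from the earlier posts and
treats the two instances of a letter as equal in value, which a
52-distinct-card test would handle differently):

def count_sequences(deck):
    # number of 2-card sequences: cards whose right neighbor is exactly
    # +1 or -1 from them in value
    vals = [ord(c.lower()) - ord("a") for c in deck]
    return sum(1 for a, b in zip(vals, vals[1:]) if abs(a - b) == 1)

def direction_changes(deck):
    # number of times the up/down direction of neighboring pairs flips
    # while walking through the deck (ties count as "down" here)
    vals = [ord(c.lower()) - ord("a") for c in deck]
    dirs = [b > a for a, b in zip(vals, vals[1:])]
    return sum(1 for d, e in zip(dirs, dirs[1:]) if d != e)]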
Finally, for shuffle_1, the standard deviation of the number of
consecutive pairs was about twice that of the random samples, at 1.4,
which basically just means that there were some long groups and some
short groups, compared to a more even distribution in the random
samples.

Shuffle_2 once again performed quite a bit better on these metrics.
The average number of turns hit the 33 mark (the value for random
samples) with just 10 input values, and actually climbed from there up
to about 42 with 27 inputs. The maximum number of consecutive pairs
was a little high with 10 inputs, averaging to 4.57, but with 13 or
more input values, it hovered just above the benchmark at about 3.7 to
3.8. These two metrics seem to indicate that the shuffle_2 algorithm
acts pretty randomly with regard to this particular statistic. The
std-deviation of the number of consecutive pairs was also very similar
to the random sample, hitting 0.78 with 12 inputs, then quickly
zeroing in on 0.74 and 0.73 with a few additional inputs.

So here's the summary I promised: shuffle_1 cascades slightly better
than shuffle_2, but those sequences are a very Bad Thing and are hard
to get rid of. Shuffle_2 seems to do pretty well at eliminating those
sequences and still cascades quite well. So I'd like to make shuffle_2
the launch pad for going forward, as it seems to be performing well in
all the areas I want it to. I still need to get a deck of cards and
actually try it out, but I'm optimistic.

Basically what I'm looking at right now is a full deck of 52 cards, 13
or 14 values in the master password (which is reasonable for a strong
password anyway, giving 74 or 79 bits), and drawing a hand of probably
10 to 12 cards for the final output, giving passwords of 55 to 66
bits, which is probably far beyond sufficient for logging into gmail,
etc.

I think additional modifications that might be useful would be some
sort of final mixing like Maaartin previously suggested. Still bearing
in mind, of course, the requirements that it be relatively fast, very
easy, not error prone, and not require laying any cards down. Since
it's just a final stage, it can be a bit lengthier than the rest, but
not too long.

I think one of you suggested something like stepping through the deck
by counting a number of cards equal to the value on the facing card,
or something similar. I think that might be reasonable to perform, and
it would add significant obscurity to the output. However, the card
that tells you how many to count should /not/ go into the output. If
it did, an attacker could probably reconstruct the location of every
output card in the final deck order. So I'm thinking something like
this, instead:

1) Look at the top card. Count down the corresponding number of cards
(1-13), moving each counted card to the bottom of the deck (so we
don't have to think about wrapping when we get to the end).
2) After counting/moving those cards, the new top card is the output
value. Move that card to the bottom as well.
3) Go back to 1 until you have enough output values.

I guess adding this step means I have to rerun all my tests. Any
thoughts on it before I do?

Warm regards,
-Brian
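[A sketch of the three-step output stage proposed above, in the
thread's Python (my own rendering, not Brian's tested code; draw_hand
is a hypothetical name, and it reads step 1 as moving the counting
card to the bottom along with the cards counted below it):

def draw_hand(deck, hand_size):
    # deck is a string as in the earlier posts; card values are 1..13
    out = ""
    for k in range(hand_size):
        n = (ord(deck[0].lower()) - ord("a")) % 13 + 1
        # step 1: count down n cards, moving each to the bottom
        deck = deck[n:] + deck[:n]
        # step 2: the new top card is the output value; it goes to the
        # bottom as well
        out += deck[0]
        deck = deck[1:] + deck[0]
        # step 3: repeat until enough output values are drawn
    return out]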
From: Maaartin on 27 Apr 2010 16:24
Output procedure revisited

My previous output procedure liked to cycle, but the remedy is
trivial: throw away the guilty card (ideally, burn it and never use it
again).

single_output(output, deck): Takes two arguments and returns a list of
two strings. The first is the previous output concatenated with the
current output (a single letter); the second is the modified deck. The
current output is the first card but one. The very first card
determines the number of cards to be put to the bottom, and it gets
discarded. Note that it is the counting card, not the card becoming
part of the output, which is thrown away, so any number of repetitions
in the output is possible. Using variable dealing instead of simply
moving n cards would most probably make it better.

multi_output(output, deck, n): Calls single_output n times and prints
the state.

demo(): Starts once with the original deck, and once with a slightly
modified one. The outputs are very different.

#########

def new_deck(n=26, occurrences=2):
    deck = ""
    for i in range(occurrences):
        for j in range(n):
            deck = deck + chr(ord("aA"[i&1]) + j)
    return deck

def card_to_number(card):
    n = ord(card.lower()) - ord("a")
    n %= 13
    return n

def single_output(output, deck):
    output += deck[1]
    n = card_to_number(deck[0]) + 1
    deck = deck[1:]
    deck = deck[n:] + deck[:n]
    return [output, deck]

def multi_output(output, deck, n):
    for i in range(n):
        print output, deck
        [output, deck] = single_output(output, deck)

def demo():
    deck = new_deck()
    multi_output("", deck, 10)
    deck = deck[1:] + deck[0]
    multi_output("", deck, 10)

demo()

#########
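[As a closing illustration, the pieces of the thread can be chained
together (purely my own combination, not something the posters
specified; draw_password is a hypothetical helper that collects
letters from single_output without printing the intermediate state):

def draw_password(deck, n):
    # draw n output letters from the deck using single_output
    output = ""
    for i in range(n):
        [output, deck] = single_output(output, deck)
    return output

deck = csh("qwerty")           # absorb the master password (J.D.'s csh)
print draw_password(deck, 10)  # then draw a 10-letter output hand]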