Call for review: Hashing by hand algorithm [Cryptography]

Prev: Dynamic Hill cipher
Next: PE Scrambler

From: J.D. on 27 Apr 2010 00:09

On Apr 26, 10:39 pm, bmearns <mearn...(a)gmail.com> wrote:
> However, after the rather disastrous results of actually trying to
> perform my first algorithm, I'm very weary of over complicating it. I
> have no doubt that your suggestions would greatly improve the security
> of the algorithm, but it doesn't seem very feasible to perform by
> hand. Multiplying and taking mod 26 is going to slow things way down,
> and is going to be pretty error prone I think.

Yeah, that's the conclusion I came to as well. This is partially
rectifiable by using written tables (so you aren't doing math in your
head), but that defeats the simplicity of the "deck of cards"
approach, and takes time to draw up the tables.

>
> But I think I did come up with a reasonable way to significantly
> improve the security. The algorithm is exactly the same, but after
> feeding in the entire input, I re-hash the resulting deck. For
> instance, I cut a deck in two (Hearts and Clubs, Spades and Diamonds),
> feed my input into one of them. Then, I feed all 26 cards from that
> deck into the second deck, and I can take my output from that. I /
> think/ that will significantly obscure any sequences, even for fairly
> short inputs. And it's pretty reasonable to do since the algorithm is
> so quick. So this is really just applying the concept of iterations.

That's also sounds like a good idea. I would go with the full deck
though. One thing I didn't mention with my above approach (because it
already had enough going against it) was that it effectively cuts the
internal state down to 26 cards, which is roughly equivalent to 88
bits. Brute force collision attacks generally require about 2^n/2
evaluations of a hash function (where n = the size in bits of the
chaining variable, which I think is analogous to the internal state of
the shuffle hash). So brute forcing the shuffle hash function (with a
26 card deck) would take only 2^44 evaluations -- which doesn't take
very long using a computer.

If you are going to go the above route, I would recommend going full
out and using all 52 cards. This increases the internal state to
approximately 225 bits (and makes brute force collision attacks
require approximately 2^112 evaluations). That is to say:
i) feed the message into the (52 card) deck via your algorithm,
ii) let the new message = the 52 card digest from step (i), or some
relatively large subset of that digest (if you can't stomach hashing a
52-letter-long message),
iii) reset the deck and feed the new message into the deck via your
algorithm,
iv) output the final digest.

One complication that you might consider: I really like Maaartin's
idea of using the colors of the cards to increase the mixing rate, but
I kind of want to do something else with that notion. OK, let's say
that each letter of the alphabet is represented by two cards -- one
red and one black. Let's call the two cards that stand for a letter
the "instances" of the letter. e.g. 'A' is represented by the Ace of
Hearts and the Ace of Clubs -- and both cards are "instances" of the
letter A. The first instance you encounter while moving through the
deck from the first card to the last is called the "first instance",
and the second instance encountered is the "second instance".

To feed 'A' into the hash:

i) find the first instance of A,
ii) find the second instance of A,
iii) take every card between the first and second instances of A
(which may be zero) and set them aside for the moment,
iv) take the first instance of A and move it to the front of the deck
(if it's not already there),
v) take all cards from the second instance of A to the end and move
them all to the front of the deck (i.e. swap front and back 'halves'),
vi) take the set aside cards that originally were between the two
instances and move them to the front of the deck (there may have been
none, in which case do nothing).

To illustrate:
Steps i and ii)
<---10 cards--><Ace of Clubs><--5 cards--><Ace of Hearts><--Rest of
deck-->

Step iii)
<---10 cards--><Ace of Clubs><Ace of Hearts><--Rest of deck--
> .....<--5 cards-->

Step iv)
<Ace of Clubs><---10 cards--><Ace of Hearts><--Rest of deck--
> .....<--5 cards-->

Step v)
<Ace of Hearts><--Rest of deck--><Ace of Clubs><---10 cards--
> .....<--5 cards-->

Step vi)
<--5 cards--><Ace of Hearts><--Rest of deck--><Ace of Clubs><---10
cards-->

From: Maaartin on 27 Apr 2010 09:42

On Apr 27, 6:09 am, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 26, 10:39 pm, bmearns <mearn...(a)gmail.com> wrote:

> That's also sounds like a good idea. I would go with the full deck
> though. One thing I didn't mention with my above approach (because it
> already had enough going against it) was that it effectively cuts the
> internal state down to 26 cards, which is roughly equivalent to 88
> bits. Brute force collision attacks generally require about 2^n/2
> evaluations of a hash function (where n = the size in bits of the
> chaining variable, which I think is analogous to the internal state of
> the shuffle hash). So brute forcing the shuffle hash function (with a
> 26 card deck) would take only 2^44 evaluations -- which doesn't take
> very long using a computer.

Right, but collisions are probably not very important, especially when
it gets used for generating passwords. Nonetheless, having a large
internal state seems to be a good idea, and it cost about nothing,
since the work involved is nearly independent of the stack size. The
hardest and most error-prone part is IMHO converting the letters into
cards and vice versa, at least in the simple version I'm heading
towards.

> If you are going to go the above route, I would recommend going full
> out and using all 52 cards. This increases the internal state to
> approximately 225 bits (and makes brute force collision attacks
> require approximately 2^112 evaluations). That is to say:
> i) feed the message into the (52 card) deck via your algorithm,
> ii) let the new message = the 52 card digest from step (i), or some
> relatively large subset of that digest (if you can't stomach hashing a
> 52-letter-long message),
> iii) reset the deck and feed the new message into the deck via your
> algorithm,
> iv) output the final digest.

This means either writing the cards down or use two stacks, right?

> One complication that you might consider: I really like Maaartin's
> idea of using the colors of the cards to increase the mixing rate, but
> I kind of want to do something else with that notion.

The two operation I proposed (two_color_shuffle and
four_shapes_shuffle) are not good. I still hope we could do something
similar, but I like your proposal (below), too.

> OK, let's say
> that each letter of the alphabet is represented by two cards -- one
> red and one black. Let's call the two cards that stand for a letter
> the "instances" of the letter. e.g. 'A' is represented by the Ace of
> Hearts and the Ace of Clubs -- and both cards are "instances" of the
> letter A. The first instance you encounter while moving through the
> deck from the first card to the last is called the "first instance",
> and the second instance encountered is the "second instance".
>
> To feed 'A' into the hash:
>
> i) find the first instance of A,
> ii) find the second instance of A,
> iii) take every card between the first and second instances of A
> (which may be zero) and set them aside for the moment,
> iv) take the first instance of A and move it to the front of the deck
> (if it's not already there),
> v) take all cards from the second instance of A to the end and move
> them all to the front of the deck (i.e. swap front and back 'halves'),
> vi) take the set aside cards that originally were between the two
> instances and move them to the front of the deck (there may have been
> none, in which case do nothing).

The implementation is shorter than the description, see double_shuffle
below.

I changed the representation to letters, I use both lowercase and
uppercase, but they're treated exactly the same. I also added a new
optional operation: deal(deck, n), which is very simple: Take n cards
for the top, put them on the table, repeat until there are no more
cards left (in the last step there may be less than n cards). This is
like dealing the cards to one player, it reverses the deck and for n>1
shuffles it a bit.

The function double_shuffle(deck, letter, shouldDeal) looks for the
first and the last (i.e., second) occurrence of letter, and does what
you described. With shouldDeal=True it also uses the function deal for
the head and tail parts with n=2 and n=2, respectively.

The program is very simple:

#########
def new_deck(n=26, occurrences=2):
deck = "";
for i in range(occurrences):
for j in range(n):
deck = deck + chr(ord("aA"[i&1]) + j)
return deck;

def deal(deck, n):
result = ""
while len(deck)>n:
result = deck[:n] + result
deck = deck[n:]
result = deck + result
return result;

def double_shuffle(deck, letter, shouldDeal):
for i in range(len(deck)):
if deck[i].lower() == letter:
break
for j in range(len(deck)-1, 0, -1):
if deck[j].lower() == letter:
break
head = deck[0:i]
middle = deck[i+1:j]
tail = deck[j+1:]
if shouldDeal:
head = deal(head, 2)
tail = deal(tail, 3)
return middle + deck[j] + tail + deck[i] + head;

def demo(shouldDeal):
deck = new_deck();
print "with" if shouldDeal else "without", "deal"
print " ", deck
for x in "qwerty":
deck = double_shuffle(deck, x, shouldDeal);
print x, " => ", deck
print

demo(False)

demo(True)
#########

Here are the results, it looks like deal is doing a good job:

#########
without deal
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
q => rstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZqabcdefghijklmnop
w => xyzABCDEFGHIJKLMNOPQRSTUVWXYZqabcdefghijklmnopwrstuv
e => FGHIJKLMNOPQRSTUVWXYZqabcdefghijklmnopwrstuvExyzABCD
r => STUVWXYZqabcdefghijklmnopwrstuvExyzABCDRFGHIJKLMNOPQ
t => UVWXYZqabcdefghijklmnopwrstuvExyzABCDRFGHIJKLMNOPQTS
y => ZqabcdefghijklmnopwrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX

with deal
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
q => rstuvwxyzABCDEFGHIJKLMNOPQXYZUVWRSTqopmnklijghefcdab
w => xyzABCDEFGHIJKLMNOPQXYZUVWabfcdghelijmnkqopRSTwvturs
e => FGHIJKLMNOPQXYZUVWabfcdgheurswvtRSTqopmnklijEDBCzAxy
r => swvtRyzAxDBCijEnklopmSTqruhedgfcabVWZUXYPQNOLMJKHIFG
t => RyzAxDBCijEnklopmSTFGKHILMJQNOXYPWZUabVgfchedqrutvsw
y => zAxDBCijEnklopmSTFGKHILMJQNOXYswutvdqrcheVgfUabPWZyR
#########

From: bmearns on 27 Apr 2010 10:23

On Apr 27, 12:09 am, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 26, 10:39 pm, bmearns <mearn...(a)gmail.com> wrote:
>
> > However, after the rather disastrous results of actually trying to
> > perform my first algorithm, I'm very weary of over complicating it. I
> > have no doubt that your suggestions would greatly improve the security
> > of the algorithm, but it doesn't seem very feasible to perform by
> > hand. Multiplying and taking mod 26 is going to slow things way down,
> > and is going to be pretty error prone I think.
>
> Yeah, that's the conclusion I came to as well. This is partially
> rectifiable by using written tables (so you aren't doing math in your
> head), but that defeats the simplicity of the "deck of cards"
> approach, and takes time to draw up the tables.

Agreed. There are already lots of good hash algorithms available; if
we can't come up with one that is extremely simple to perform by hand,
there's really no point.

>
>
>
> > But I think I did come up with a reasonable way to significantly
> > improve the security. The algorithm is exactly the same, but after
> > feeding in the entire input, I re-hash the resulting deck. For
> > instance, I cut a deck in two (Hearts and Clubs, Spades and Diamonds),
> > feed my input into one of them. Then, I feed all 26 cards from that
> > deck into the second deck, and I can take my output from that. I /
> > think/ that will significantly obscure any sequences, even for fairly
> > short inputs. And it's pretty reasonable to do since the algorithm is
> > so quick. So this is really just applying the concept of iterations.
>
> That's also sounds like a good idea. I would go with the full deck
> though. One thing I didn't mention with my above approach (because it
> already had enough going against it) was that it effectively cuts the
> internal state down to 26 cards, which is roughly equivalent to 88
> bits. Brute force collision attacks generally require about 2^n/2
> evaluations of a hash function (where n = the size in bits of the
> chaining variable, which I think is analogous to the internal state of
> the shuffle hash). So brute forcing the shuffle hash function (with a
> 26 card deck) would take only 2^44 evaluations -- which doesn't take
> very long using a computer.
>
> If you are going to go the above route, I would recommend going full
> out and using all 52 cards. This increases the internal state to
> approximately 225 bits (and makes brute force collision attacks
> require approximately 2^112 evaluations). That is to say:
> i) feed the message into the (52 card) deck via your algorithm,
> ii) let the new message = the 52 card digest from step (i), or some
> relatively large subset of that digest (if you can't stomach hashing a
> 52-letter-long message),
> iii) reset the deck and feed the new message into the deck via your
> algorithm,
> iv) output the final digest.

Well, the bad new is I just tried out the "iteration" idea, and it
didn't help a whole lot. With a 26 card deck, I fed in just a few
values and ended up with:
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 3 1
2 4 5
Then I fed that whole thing into a fresh deck and got:
9 8 10 12 11 13 15 14 16 18 17 19 21 20 22 24 23 6 0 25 3 2 1
5 4 7

Those sequences are harder to get rid of than I expected. A 52 card
deck definitely brings the bit count way up, but it also means more
inputs are needed to destroy the sequences.

Also remember that for this application, collision attacks really
aren't an issue. There are essentially two security concerns:
1) It should be difficult for an attacker to guess the output of the
hash without knowing the input.
2) Given the output, it should be difficult for an attacker to figure
out the input.

The first point basically just requires that the output not be biased,
I think. The second point is what I've been calling irreversibility. I
suppose this is sort of like a first preimage attack, except that it
is not sufficient for an attacker to just find any old message that
hashes to the correct value, they need to find the specific value that
was used in order for it to be useful.

That being the case, I think it's more important to eliminate those
tell-tale sequences than it is to increase the maximum number of
states. A 26-card deck has about 88 bits, as you said. For a password,
that's pretty damn good, so as long as the outputs are close to
uniform, I think that satisfies the first point. An attacker would
need to try, on average, 2^87 different passwords before getting the
right one, which I think is probably secure for most people's
purposes?

I still need to do the sequence testing I mentioned, but based on a
few tests, I think 14 or 15 inputs will generally suffice in a 26-
card deck to destroy the sequences, and as few as 10 inputs seems to
be working well if I perform the second iteration. I think that's a
fairly reasonable number of letters to remember as a password. It may
also be effective to just feed the password in twice...but that might
also somehow reveal information (like when the Germans encoded the
ring settings twice in Enigma messages).

>
> One complication that you might consider: I really like Maaartin's
> idea of using the colors of the cards to increase the mixing rate, but
> I kind of want to do something else with that notion. OK, let's say
> that each letter of the alphabet is represented by two cards -- one
> red and one black. Let's call the two cards that stand for a letter
> the "instances" of the letter. e.g. 'A' is represented by the Ace of
> Hearts and the Ace of Clubs -- and both cards are "instances" of the
> letter A. The first instance you encounter while moving through the
> deck from the first card to the last is called the "first instance",
> and the second instance encountered is the "second instance".
>
> To feed 'A' into the hash:
>
> i) find the first instance of A,
> ii) find the second instance of A,
> iii) take every card between the first and second instances of A
> (which may be zero) and set them aside for the moment,
> iv) take the first instance of A and move it to the front of the deck
> (if it's not already there),
> v) take all cards from the second instance of A to the end and move
> them all to the front of the deck (i.e. swap front and back 'halves'),
> vi) take the set aside cards that originally were between the two
> instances and move them to the front of the deck (there may have been
> none, in which case do nothing).
>
> To illustrate:
> Steps i and ii)
> <---10 cards--><Ace of Clubs><--5 cards--><Ace of Hearts><--Rest of
> deck-->
>
> Step iii)
> <---10 cards--><Ace of Clubs><Ace of Hearts><--Rest of deck--
>
> > .....<--5 cards-->
>
> Step iv)
> <Ace of Clubs><---10 cards--><Ace of Hearts><--Rest of deck--
>
> > .....<--5 cards-->
>
> Step v)
> <Ace of Hearts><--Rest of deck--><Ace of Clubs><---10 cards--
>
> > .....<--5 cards-->
>
> Step vi)
> <--5 cards--><Ace of Hearts><--Rest of deck--><Ace of Clubs><---10
> cards-->

I like the idea but take a look at what you did:
[ x ] A [ y ] B [ z ]
turned into
[ y ] B [ z ] A [ x ]

The [ y ] B [ z ] end up together again, in order. All you actually
did was swap the cards before and after the first instance:
[ e ] A [ f ] --> [ f ] A [ e ]

But, I do like this idea, and I think we can work with it.I really
want to get those sequence the hell out of there. Using your same 52-
card deck, and idea of first and second instances of each letter, what
if we take all the cards starting with the first and going up to, but
not including, the second instance, then alternately move them to the
top and bottom of the top part. This is essentially the same as what I
originally had, except there I only did it for two cards. Now we're
doing it for however many cards are between the two instances. As a
final step, we can cut after the second instance.
For example (A and B are the instances):
Start: [ x ] A e f g h i j k B [ y ]
Then: j h f A [ x ] e g i k B [ y ]
Finally: [ y ] j h f A [ x ] e g i k B

I think it should be easy to do without putting down any cards,
writing anything down, referring to any tables, or anything like that,
and it should be pretty quick, too, though I'll need to test it out
when it's more appropriate for me to be playing with cards. I like
this because it forcibly removes some sequences (not irreversibly, of
course, but hopefully after a few feeds they'll scatter more). It also
changes the distance between the two instances by an amount based on
the position of the first card (i.e., the length of [x]). Even feeding
the same value twice in a row just scrambles it more.

And if the instances are adjacent, it looks like this:
Start: [ x ] A B [ y ]
Then: A [ x ] B [ y ]
Finally: [ y ] A [ x ] B

I'll implement this and run some simple tests to see how it looks,
then I'll start some more extensive testing.

Cheers,
-Brian

From: J.D. on 27 Apr 2010 12:03

On Apr 27, 9:42 am, Maaartin <grajc...(a)seznam.cz> wrote:
>
> Right, but collisions are probably not very important, especially when
> it gets used for generating passwords.

That's true.

> This means either writing the cards down or use two stacks, right?

Good point. That is a complication it would be best to avoid, if at
all possible.

>
> I changed the representation to letters, I use both lowercase and
> uppercase, but they're treated exactly the same. I also added a new
> optional operation: deal(deck, n), which is very simple: Take n cards
> for the top, put them on the table, repeat until there are no more
> cards left (in the last step there may be less than n cards). This is
> like dealing the cards to one player, it reverses the deck and for n>1
> shuffles it a bit.

Let me see if I understand you: You are taking some set of cards and
then partially re-ordering them by, let's say, pairs, such that the
first pair becomes the last, the second pair becomes the second from
last, and so on. e.g. (0,1)(2,3)(4,5) --> (4,5)(2,3)(0,1)

Is that correct?

>
> The function double_shuffle(deck, letter, shouldDeal) looks for the
> first and the last (i.e., second) occurrence of letter, and does what
> you described. With shouldDeal=True it also uses the function deal for
> the head and tail parts with n=2 and n=2, respectively.
>
> The program is very simple:

Thank you for implementing it. I can say with certainty that if you
are new to python you are still much further along than I.

Unfortunately, with some testing using your code, it appears my idea
does not really increase the mix rate as much as I had hoped (even
with your deal function to boost it) -- there is still far too much
retention of the original card order after 10, 15 and even 20 message
letters are absorbed. Maybe bmearn's improvement above will do
better.

Alternatively, maybe if the 'deal' amount is defined by the latest
letter to be absorbed (assuming I understand your 'deal' function at
all). e.g. to absorb 'c', double shuffle with instances of 'c', and
then deal-sort the deck by triplets (as 'c' = 3); to absorb 'd',
double shuffle with instances of 'd', then deal-sort the deck by
quadruplets...and so on. This would be less effective for 'a' and 'z'
if 'a' = 1 and 'z' = 26, but perhaps we could assign every letter a
number in the range 2 to 14 (with 'a' and 'n' both = Aces = 14).

From: J.D. on 27 Apr 2010 12:30

On Apr 27, 10:23 am, bmearns <mearn...(a)gmail.com> wrote:
> Well, the bad new is I just tried out the "iteration" idea, and it
> didn't help a whole lot. With a 26 card deck, I fed in just a few
> values and ended up with:
> 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 3 1
> 2 4 5
> Then I fed that whole thing into a fresh deck and got:
> 9 8 10 12 11 13 15 14 16 18 17 19 21 20 22 24 23 6 0 25 3 2 1
> 5 4 7
>
> Those sequences are harder to get rid of than I expected. A 52 card
> deck definitely brings the bit count way up, but it also means more
> inputs are needed to destroy the sequences.

Yeah, the persistence of those sequences is kind of annoying. A note
of caution: the goal is avalanche, not merely to get rid of the
sequences. The persistence of the sequences is an indication of the
lack of avalanche -- but the absence of the sequences does not prove
that avalanche has occurred. So we might come up with a trick that
appears to thoroughly scramble the sequences, but if different inputs
don't cause large changes in the output (e.g. significantly change the
relative ordering of the cards) then the hash will probably still be
vulnerable.

>
> Also remember that for this application, collision attacks really
> aren't an issue. There are essentially two security concerns:
> 1) It should be difficult for an attacker to guess the output of the
> hash without knowing the input.
> 2) Given the output, it should be difficult for an attacker to figure
> out the input.
>
> The first point basically just requires that the output not be biased,
> I think. The second point is what I've been calling irreversibility. I
> suppose this is sort of like a first preimage attack, except that it
> is not sufficient for an attacker to just find any old message that
> hashes to the correct value, they need to find the specific value that
> was used in order for it to be useful.
>
> That being the case, I think it's more important to eliminate those
> tell-tale sequences than it is to increase the maximum number of
> states. A 26-card deck has about 88 bits, as you said. For a password,
> that's pretty damn good, so as long as the outputs are close to
> uniform, I think that satisfies the first point. An attacker would
> need to try, on average, 2^87 different passwords before getting the
> right one, which I think is probably secure for most people's
> purposes?

Well, if you are only using this as a key derivation function then the
attacker would focus on the thing with less entropy. So if the
password has less entropy than the internal state of the hash function
(which is highly likely given that people are lazy) then the attacker
will focus on the password and just try to guess that.

>
> I still need to do the sequence testing I mentioned, but based on a
> few tests, I think 14 or 15 inputs will generally suffice in a 26-
> card deck to destroy the sequences, and as few as 10 inputs seems to
> be working well if I perform the second iteration. I think that's a
> fairly reasonable number of letters to remember as a password. It may
> also be effective to just feed the password in twice...but that might
> also somehow reveal information (like when the Germans encoded the
> ring settings twice in Enigma messages).

What's the avalanche rate after 14 or 15 inputs?

> I like the idea but take a look at what you did:
> [ x ] A [ y ] B [ z ]
> turned into
> [ y ] B [ z ] A [ x ]
>
> The [ y ] B [ z ] end up together again, in order. All you actually
> did was swap the cards before and after the first instance:
> [ e ] A [ f ] --> [ f ] A [ e ]

Yeah, testing with Maaartin's implementation has convinced me that
this is not enough...

>
> But, I do like this idea, and I think we can work with it.I really
> want to get those sequence the hell out of there. Using your same 52-
> card deck, and idea of first and second instances of each letter, what
> if we take all the cards starting with the first and going up to, but
> not including, the second instance, then alternately move them to the
> top and bottom of the top part. This is essentially the same as what I
> originally had, except there I only did it for two cards. Now we're
> doing it for however many cards are between the two instances. As a
> final step, we can cut after the second instance.
> For example (A and B are the instances):
> Start: [ x ] A e f g h i j k B [ y ]
> Then: j h f A [ x ] e g i k B [ y ]
> Finally: [ y ] j h f A [ x ] e g i k B

Hmm, you mean take every other card of the interval (the cards between
the two instances), starting with the first instance, and move them to
the front, and then move the cards after the second instance to the
front?

Yeah, that could work. Though I am concerned that the 'every other
card" part might be a bit fiddly and error prone (I would have to do
testing with an actual deck of cards to be sure). I am also not sure
about how well it would increase avalanche. Testing is required.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Prev: Dynamic Hill cipher
Next: PE Scrambler