From: J.D. on 27 Apr 2010 14:02

On Apr 27, 12:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> Alternatively, maybe if the 'deal' amount is defined by the latest
> letter to be absorbed (assuming I understand your 'deal' function at
> all). e.g. to absorb 'c', double shuffle with instances of 'c', and
> then deal-sort the deck by triplets (as 'c' = 3); to absorb 'd',
> double shuffle with instances of 'd', then deal-sort the deck by
> quadruplets...and so on. This would be less effective for 'a' and 'z'
> if 'a' = 1 and 'z' = 26, but perhaps we could assign every letter a
> number in the range 2 to 14 (with 'a' and 'n' both = Aces = 14).

OK, I have implemented this idea (reusing a lot of your code):

hashing "qwerty":
EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha
hashing "qwarty":
GzAxMBCJKDEFgTPQOHpijkIaUVhLnobcdXYyRZqNrSlefstuWmvw
hashing "kwerty":
MTPQUVkjRZOabHIxiuScdJpfghKezANCnoBqrstyXYDEFwlmLGvW
hashing "qwerti":
apijXYItRklnofghKDEvwxyrszATPQuWmJMNOBCeUFGZqVSLbcdH

A visual inspection of these few examples suggests that good mixing
occurs after as few as 6 letters are hashed (no secondary hashing is
necessary), and possibly a good avalanche too. However, significantly
more statistical testing is necessary to confirm how well it
avalanches...

Code (only the new stuff is in double_shuffle):

def new_deck(n=26, occurrences=2):
    # build a deck such as "abc...zABC...Z": each letter appears twice,
    # once lowercase and once uppercase
    deck = ""
    for i in range(occurrences):
        for j in range(n):
            deck = deck + chr(ord("aA"[i & 1]) + j)
    return deck

def deal(deck, n):
    # deal the deck to one player, n cards at a time; this reverses the
    # deck by blocks of n
    result = ""
    while len(deck) > n:
        result = deck[:n] + result
        deck = deck[n:]
    result = deck + result
    return result

def double_shuffle(deck, letter):
    # find the first and the last occurrence of the letter
    for i in range(len(deck)):
        if deck[i].lower() == letter:
            break
    for j in range(len(deck) - 1, 0, -1):
        if deck[j].lower() == letter:
            break
    head = deck[0:i]
    middle = deck[i+1:j]
    tail = deck[j+1:]
    tempdeck = middle + deck[j] + tail + deck[i] + head
    # deal-sort by an amount determined by the letter:
    # 'a'/'n' = aces = 14, 'b'/'o' = 2, ..., 'm'/'z' = 13
    if letter in "anAN":
        return deal(tempdeck, 14)
    elif letter in "boBO":
        return deal(tempdeck, 2)
    elif letter in "cpCP":
        return deal(tempdeck, 3)
    elif letter in "dqDQ":
        return deal(tempdeck, 4)
    elif letter in "erER":
        return deal(tempdeck, 5)
    elif letter in "fsFS":
        return deal(tempdeck, 6)
    elif letter in "gtGT":
        return deal(tempdeck, 7)
    elif letter in "huHU":
        return deal(tempdeck, 8)
    elif letter in "ivIV":
        return deal(tempdeck, 9)
    elif letter in "jwJW":
        return deal(tempdeck, 10)
    elif letter in "kxKX":
        return deal(tempdeck, 11)
    elif letter in "lyLY":
        return deal(tempdeck, 12)
    else:
        return deal(tempdeck, 13)
From: J.D. on 27 Apr 2010 14:23

Sorry, forgot to add...

def csh(string):
    # "csh" = candidate shuffle hash
    deck = new_deck()
    for letter in string:
        deck = double_shuffle(deck, letter)
    return deck
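[For reference, the example hashes in the previous post should be
reproducible by loading both posts' code into one session and calling
csh directly (an illustrative check; the expected output is the one
quoted above):

print csh("qwerty")
=> EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha]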
From: Maaartin on 27 Apr 2010 15:49

On Apr 27, 6:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 9:42 am, Maaartin <grajc...(a)seznam.cz> wrote:
> > I changed the representation to letters, I use both lowercase and
> > uppercase, but they're treated exactly the same. I also added a new
> > optional operation: deal(deck, n), which is very simple: Take n cards
> > from the top, put them on the table, repeat until there are no more
> > cards left (in the last step there may be fewer than n cards). This is
> > like dealing the cards to one player; it reverses the deck and for n>1
> > shuffles it a bit.
>
> Let me see if I understand you: You are taking some set of cards and
> then partially re-ordering them by, let's say, pairs, such that the
> first pair becomes the last, the second pair becomes the second from
> last, and so on. e.g. (0,1)(2,3)(4,5) --> (4,5)(2,3)(0,1)
>
> Is that correct?

Yes. This time it's way easier to try than to explain:

print deal("012345", 2)
=> 452301
print deal("012345678", 2)
=> 867452301

Note that it is exactly like dealing cards to a single person, two at
a time. Obviously, dealing to more persons could perform better, so I
rewrote it:

def deal(deck, list):
    result = []
    for n in list:
        result += [""]
    while len(deck) > 0:
        for i in range(len(list)):
            n = min(list[i], len(deck))
            #print "@", i, n, result, deck
            result[i] = deck[:n] + result[i]
            deck = deck[n:]
        #print "@@", result
    result = "".join(result)
    return result

You can use the commented-out lines for viewing how it works. The
second parameter is a list whose n-th entry determines how many cards
the n-th person is given at a time.

print deal("012345678", (2,))
=> 867452301
# obviously, a single-element list works just as before

print deal("012345678", (2,1))
=> 673401852
# here the first person is given 2 cards at a time, and the second is
# given 1 card; the dealt stacks "67 34 01" for the first person and
# "8 5 2" for the second get concatenated

> > The function double_shuffle(deck, letter, shouldDeal) looks for the
> > first and the last (i.e., second) occurrence of letter, and does what
> > you described. With shouldDeal=True it also uses the function deal for
> > the head and tail parts with n=2 and n=2, respectively.
> >
> > The program is very simple:
>
> Thank you for implementing it. I can say with certainty that if you
> are new to python you are still much further along than I.

I'm really new to Python, apart from having once changed (with a lot
of swearing) two lines in a short Python program.

> Unfortunately, with some testing using your code, it appears my idea
> does not really increase the mix rate as much as I had hoped (even
> with your deal function to boost it)

I disagree. I repeat here only the last line after mixing "qwerty" in:

y => ZqabcdefghijklmnopwrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX
     ...!!!!!!!!!!!!!!!..!!!!..!!!!!!..!!!!!!!!!!!....!!! 39

The exclamation marks denote pairs in the original order; the number
to the right is their count. With dealing it looks like

y => zAxDBCijEnklopmSTFGKHILMJQNOXYswutvdqrcheVgfUabPWZyR
     .!...!.!...!.!..!.!..!.!...!.!.......!........!..... 13

So after 6 inputs there are hardly any pairs left untouched. Of
course, that by itself implies no avalanche. I used the following
function:

def eval_neighbours(deck):
    # mark with "!" every card whose left neighbour (cyclically) is its
    # predecessor in the original deck order, and count the marks
    result = ""
    cnt = 0
    for i in range(0, len(deck)):
        left = deck[i-1].lower()
        right = deck[i].lower()
        diff = ord(right) - ord(left)
        b = diff == 1 or diff == -25
        result += "!" if b else "."
        if b:
            cnt += 1
    return result + " " + str(cnt)

> -- there is still far too much
> retention of the original card order after 10, 15 and even 20 message
> letters are absorbed. Maybe bmearns' improvement above will do
> better.

Currently I can imagine too many possibilities...

> Alternatively, maybe if the 'deal' amount is defined by the latest
> letter to be absorbed (assuming I understand your 'deal' function at
> all). e.g. to absorb 'c', double shuffle with instances of 'c', and
> then deal-sort the deck by triplets (as 'c' = 3); to absorb 'd',
> double shuffle with instances of 'd', then deal-sort the deck by
> quadruplets...and so on. This would be less effective for 'a' and 'z'
> if 'a' = 1 and 'z' = 26, but perhaps we could assign every letter a
> number in the range 2 to 14 (with 'a' and 'n' both = Aces = 14).

Yes, this could help. But I'd prefer to use, let's say, the first card
below the first card corresponding to the letter, so there's another
indirection.

*********

On Apr 27, 6:30 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 10:23 am, bmearns <mearn...(a)gmail.com> wrote:
> Yeah, the persistence of those sequences is kind of annoying. A note
> of caution: the goal is avalanche, not merely to get rid of the
> sequences. The persistence of the sequences is an indication of the
> lack of avalanche -- but the absence of the sequences does not prove
> that avalanche has occurred. So we might come up with a trick that
> appears to thoroughly scramble the sequences, but if different inputs
> don't cause large changes in the output (e.g. significantly change the
> relative ordering of the cards) then the hash will probably still be
> vulnerable.

I fed the three strings "qwerty", "pwerty", "owerty" into a fresh deck
and looked at the result:

without deal
qwerty => ZqabcdefghijklmnopwrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX
pwerty => ZpabcdefghijklmnowqrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX
owerty => ZoabcdefghijklmnwpqrstuvExyzABCDRFGHIJKLMNOPQTSYUVWX

with deal
qwerty => zAxDBCijEnklopmSTFGKHILMJQNOXYswutvdqrcheVgfUabPWZyR
pwerty => CzAEDBfcjghkliorqetGHIFMJKNOLdPZabWTsYuvyxnRUmSpQXwV
owerty => BCzjEDkfiognruqtGHIFMJKNOLWYhTaspVedZbcyAxlRUPSmQXwv

It looks like deal does a good job, but not a good enough one.

> > I still need to do the sequence testing I mentioned, but based on a
> > few tests, I think 14 or 15 inputs will generally suffice in a 26-
> > card deck to destroy the sequences, and as few as 10 inputs seems to
> > be working well if I perform the second iteration. I think that's a
> > fairly reasonable number of letters to remember as a password. It may
> > also be effective to just feed the password in twice...but that might
> > also somehow reveal information (like when the Germans encoded the
> > ring settings twice in Enigma messages).

Feeding it in twice is good, but doubles the work, including the part
which I hate most: mapping letters to cards.

> Yeah, testing with Maaartin's implementation has convinced me that
> this is not enough...

Is there any particular reason for returning

    middle + deck[j] + tail + deck[i] + head

and not, for example,

    tail + deck[j] + middle + deck[i] + head

or

    head + tail + deck[j] + middle + deck[i]

? Actually, I have no idea which one to prefer. According to my last
experiments, it looks like dealing the whole stack after each input is
a good idea.
I mean something like

deck = new_deck()
for x in "qwerty":
    deck = double_shuffle(deck, x, False)
    print x, "=>", deck
    deck = deal(deck, (1, 2, 4))
    print " ", " ", deck

The details don't matter much now; my point is: no matter how small a
change you make in double_shuffle, it gets amplified by deal.

> Yeah, that could work. Though I am concerned that the 'every other
> card' part might be a bit fiddly and error prone (I would have to do
> testing with an actual deck of cards to be sure). I am also not sure
> about how well it would increase avalanche. Testing is required.

If I understand him correctly, this can be done as a sort of dealing,
too: deal the cards to two persons (one at a time), where only the
first person gets her cards face up. Then put the two stacks together
again. Doing this in the hand is surely more demanding, but possible.

*********

On Apr 27, 8:02 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 12:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> OK, I have implemented this idea (reusing a lot of your code):
>
> hashing "qwerty":
> EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha
> hashing "qwarty":
> GzAxMBCJKDEFgTPQOHpijkIaUVhLnobcdXYyRZqNrSlefstuWmvw
> hashing "kwerty":
> MTPQUVkjRZOabHIxiuScdJpfghKezANCnoBqrstyXYDEFwlmLGvW
> hashing "qwerti":
> apijXYItRklnofghKDEvwxyrszATPQuWmJMNOBCeUFGZqVSLbcdH
>
> A visual inspection of these few examples suggests that good mixing
> occurs after as few as 6 letters are hashed (no secondary hashing is
> necessary), and possibly a good avalanche too. However, significantly
> more statistical testing is necessary to confirm how well it
> avalanches...

This looks really nice! You could make your program much shorter using
something like

n = ord(letter.lower()) - ord("a")
n = n % 13 + 1
if n == 1:
    n = 14
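[Spelled out, that suggestion collapses the thirteen-branch chain in
double_shuffle into a small helper. A sketch of my own (deal_amount is
a hypothetical name, not from the thread):

def deal_amount(letter):
    # 'a'/'n' -> 14 (aces high), 'b'/'o' -> 2, ..., 'm'/'z' -> 13,
    # matching the table used in double_shuffle above
    n = (ord(letter.lower()) - ord("a")) % 13 + 1
    return 14 if n == 1 else n

The tail of double_shuffle then becomes simply:

    return deal(tempdeck, deal_amount(letter))]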
From: bmearns on 27 Apr 2010 15:52

On Apr 27, 2:02 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 12:03 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> > Alternatively, maybe if the 'deal' amount is defined by the latest
> > letter to be absorbed (assuming I understand your 'deal' function at
> > all). e.g. to absorb 'c', double shuffle with instances of 'c', and
> > then deal-sort the deck by triplets (as 'c' = 3); to absorb 'd',
> > double shuffle with instances of 'd', then deal-sort the deck by
> > quadruplets...and so on. This would be less effective for 'a' and 'z'
> > if 'a' = 1 and 'z' = 26, but perhaps we could assign every letter a
> > number in the range 2 to 14 (with 'a' and 'n' both = Aces = 14).
>
> OK, I have implemented this idea (reusing a lot of your code):
>
> hashing "qwerty":
> EvwxGzATPQuWmJKDbcdHMNOBCeUFpijXYyIZqVSLrstRklnofgha
> hashing "qwarty":
> GzAxMBCJKDEFgTPQOHpijkIaUVhLnobcdXYyRZqNrSlefstuWmvw
> hashing "kwerty":
> MTPQUVkjRZOabHIxiuScdJpfghKezANCnoBqrstyXYDEFwlmLGvW
> hashing "qwerti":
> apijXYItRklnofghKDEvwxyrszATPQuWmJMNOBCeUFGZqVSLbcdH
>
> A visual inspection of these few examples suggests that good mixing
> occurs after as few as 6 letters are hashed (no secondary hashing is
> necessary), and possibly a good avalanche too. However, significantly
> more statistical testing is necessary to confirm how well it
> avalanches...
> [snip]

Your initial results look good, but I'm really not crazy about this
"deal" mechanism. Besides the increased time complexity of doing this
(especially having to do it differently depending on the letter), I'd
really like to avoid having to put any cards down, as both a matter of
convenience and a matter of security. If it came down to doing that,
I'd think a more thorough transposition would be better. E.g., instead
of keeping blocks of N cards together, deal out N cards in a row, and
then deal out the next N on top of those, etc. So it's just an
N-column columnar transposition, where cards K, K+N, K+2N, ... end up
adjacent (see the sketch below).

Also, your main shuffling algorithm (prior to the dealing) still has
that kink I mentioned earlier: despite the pair of anchor points,
you're actually just splitting the deck around a single card (the
first instance).

I tested out the version I previously suggested, where the middle
interval is alternately moved to the top of the deck. I still haven't
gotten a chance to actually test it by hand to see how easy it is, but
I think the fact that the opposite cards can be moved to a different
location as well (the bottom of the top stack, instead of just having
to be skipped over somehow) should make it not too error-prone.

Anyway, I ran some tests on it, and it looks fairly good. It actually
cascades just slightly worse than the prior version. For a 52-card
deck, looking at the entire deck as the output, the old version (I'll
call it shuffle_1) averages 65.4% of bits changed per input, and the
new version (call it shuffle_2) averages 67%. As far as I understand,
we want it to be as close to 50% as possible: 65% is essentially the
same as 35%, with a bit-wise negation built in. So neither of these
looks awesome in that respect.

However, if you draw a smaller "hand" from the deck to use as your
output, you can get outputs of very reasonable size for use as
passwords that cascade very nicely.
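[To make the columnar-transposition idea above concrete, a minimal
sketch (my own illustration, not bmearns' code; it assumes the
string-deck representation used in the earlier posts):

def columnar(deck, n):
    # deal the deck into n columns, one row at a time, then pick the
    # columns up in order: cards k, k+n, k+2n, ... become adjacent
    return "".join(deck[k::n] for k in range(n))

print columnar("012345678", 3)
=> 036147258]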
To put numbers on the smaller-hand idea: drawing a hand of 12 cards
(from a 52-card deck) gives 66 bits in the result and has average bit
changes of 51.15% (shuffle_1) and 52.46% (shuffle_2). For reference,
md5 cascades 49.997%, sha-1 cascades 44.998%, and sha-256 cascades
50.011%. The std-dev is relevant as well: md5 has 4.44, sha-1 has
3.95, and sha-256 has 3.12, while shuffle_1 has 6.48 and shuffle_2 has
6.25. So my amateur evaluation is that the shuffle_1 and shuffle_2
algorithms both cascade pretty well, with shuffle_1 faring slightly
better.

Note: I've got a bunch of statistics and numbers given below. If you
want to skip them, I'll summarize at the end.

I also looked at a number of statistics related to the sequences.
First I simply counted the number of 2-card sequences in each
permutation (i.e., the number of cards whose next neighbor is exactly
+1 or -1 from them). For random permutations, this averaged 2.003, but
with a pretty significant std-dev of 1.38. Shuffle_1 compared very
poorly in this metric, as we already knew. The stat consistently
decreases as the number of inputs increases (as one would generally
expect), but even with as many as 27 input values, there were still an
average of 13 such sequences per 52-card deck.

Shuffle_2 did much better in this metric: the average number of
sequences held pretty steadily between 1.7 and 1.9 as the length of
the input increased from 5 to 27. It also had a consistent std-dev
very close to that of the random sequences, hovering between about 1.3
and 1.5 for the most part.

I think this is a pretty reasonable metric for showing that an
algorithm is bad, as in the case of shuffle_1. Shuffle_2 did much
better, but this metric alone doesn't tell us much about the quality.
Because of the alternating distribution, we would expect to see very
few sequences, but the presence of "every-other-value" sequences is
still undesirable, and this metric doesn't account for that.

So I did a more general test in which I looked at the "direction" of
each pair of cards. If a card's next neighbor is greater than it, the
direction of that pair is positive; if the next neighbor is less, the
direction is negative. This accounts for the every-other-card
sequences described above, and should generally be a pretty nice
statistic to compare against random permutations.

I looked at a few different metrics with regard to this stat. For
instance, I counted the maximum number of consecutive pairs in the
deck that have the same direction. I also counted the number of times
that the direction changes while walking through the deck, and the
average number of consecutive pairs that are all the same direction
(obviously, these metrics are all pretty tightly coupled to one
another).

For random permutations of 52-card decks, the average number of times
the direction changed was about 33. The maximum number of consecutive
pairs with the same direction averaged to about 3.6, and the standard
deviation of the number of consecutive pairs with the same direction
was about 0.7.

For shuffle_1, the average number of turns only made it up to about 27
even when the input length was a somewhat unreasonable 27 values. This
indicates a tendency towards longer runs of same-direction pairs in
the output compared to the random data. The maximum number of
consecutive pairs in the same direction only got down to 6.6 with
27-value inputs. This reinforces the tendency suggested by the first
metric.
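[For concreteness, rough sketches of the two sequence statistics
described above (my own code, not bmearns' actual test harness; it
assumes the string-deck representation from the earlier posts and
treats the two instances of a letter as equal in value, which a
52-distinct-card test would handle differently):

def count_sequences(deck):
    # number of 2-card sequences: cards whose right neighbor is exactly
    # +1 or -1 from them in value
    vals = [ord(c.lower()) - ord("a") for c in deck]
    return sum(1 for a, b in zip(vals, vals[1:]) if abs(a - b) == 1)

def direction_changes(deck):
    # number of times the up/down direction of neighboring pairs flips
    # while walking through the deck (ties count as "down" here)
    vals = [ord(c.lower()) - ord("a") for c in deck]
    dirs = [b > a for a, b in zip(vals, vals[1:])]
    return sum(1 for d, e in zip(dirs, dirs[1:]) if d != e)]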
Finally, for shuffle_1, the standard deviation of the number of
consecutive pairs was about twice that of the random samples, at 1.4,
which basically just means that there were some long groups and some
short groups, compared to a more even distribution in the random
samples.

Shuffle_2 once again performed quite a bit better on these metrics.
The average number of turns hit the 33 mark (the value for random
samples) with just 10 input values, and actually climbed from there up
to about 42 with 27 inputs. The maximum number of consecutive pairs
was a little high with 10 inputs, averaging to 4.57, but with 13 or
more input values, it hovered just above the benchmark at about 3.7 to
3.8. These two metrics seem to indicate that the shuffle_2 algorithm
acts pretty randomly with regard to this particular statistic. The
std-deviation of the number of consecutive pairs was also very similar
to the random sample, hitting 0.78 with 12 inputs, then quickly
zeroing in on 0.74 and 0.73 with a few additional inputs.

So here's the summary I promised: shuffle_1 cascades slightly better
than shuffle_2, but those sequences are a very Bad Thing and are hard
to get rid of. Shuffle_2 seems to do pretty well at eliminating those
sequences and still cascades quite well. So I'd like to make shuffle_2
the launch pad for going forward, as it seems to be performing well in
all the areas I want it to. I still need to get a deck of cards and
actually try it out, but I'm optimistic.

Basically what I'm looking at right now is a full deck of 52 cards, 13
or 14 values in the master password (which is reasonable for a strong
password anyway, giving 74 or 79 bits), and drawing a hand of probably
10 to 12 cards for the final output, giving passwords of 55 to 66
bits, which is probably far beyond sufficient for logging into gmail,
etc.

I think additional modifications that might be useful would be some
sort of final mixing like Maaartin previously suggested. Still bearing
in mind, of course, the requirements that it be relatively fast, very
easy, not error prone, and not require laying any cards down. Since
it's just a final stage, it can be a bit lengthier than the rest, but
not too long.

I think one of you suggested something like stepping through the deck
by counting a number of cards equal to the value on the facing card,
or something similar. I think that might be reasonable to perform, and
it would add significant obscurity to the output. However, the card
that tells you how many to count should /not/ go into the output. If
it did, an attacker could probably reconstruct the location of every
output card in the final deck order. So I'm thinking something like
this, instead:

1) Look at the top card. Count down the corresponding number of cards
(1-13), moving each counted card to the bottom of the deck (so we
don't have to think about wrapping when we get to the end).
2) After counting/moving those cards, the new top card is the output
value. Move that card to the bottom as well.
3) Go back to 1 until you have enough output values.

I guess adding this step means I have to rerun all my tests. Any
thoughts on it before I do?

Warm regards,
-Brian
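[A sketch of the three-step output stage proposed above, in the
thread's Python (my own rendering, not Brian's tested code; draw_hand
is a hypothetical name, and it reads step 1 as moving the counting
card to the bottom along with the cards counted below it):

def draw_hand(deck, hand_size):
    # deck is a string as in the earlier posts; card values are 1..13
    out = ""
    for k in range(hand_size):
        n = (ord(deck[0].lower()) - ord("a")) % 13 + 1
        # step 1: count down n cards, moving each to the bottom
        deck = deck[n:] + deck[:n]
        # step 2: the new top card is the output value; it goes to the
        # bottom as well
        out += deck[0]
        deck = deck[1:] + deck[0]
        # step 3: repeat until enough output values are drawn
    return out]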
From: Maaartin on 27 Apr 2010 16:24
Output procedure revisited

My previous output procedure liked to cycle, but the remedy is
trivial: throw away the guilty card (ideally, burn it and never use it
again).

single_output(output, deck): Takes two arguments and returns a list of
two strings. The first is the previous output concatenated with the
current output (a single letter); the second is the modified deck. The
current output is the first card but one. The very first card
determines the number of cards to be put to the bottom, and it gets
discarded. Note that it is the counting card, not the card becoming
part of the output, which is thrown away, so any number of repetitions
in the output is possible. Using variable dealing instead of simply
moving n cards would most probably make it better.

multi_output(output, deck, n): Calls single_output n times and prints
the state.

demo(): Starts once with the original deck, and once with a slightly
modified one. The outputs are very different.

#########

def new_deck(n=26, occurrences=2):
    deck = ""
    for i in range(occurrences):
        for j in range(n):
            deck = deck + chr(ord("aA"[i&1]) + j)
    return deck

def card_to_number(card):
    n = ord(card.lower()) - ord("a")
    n %= 13
    return n

def single_output(output, deck):
    output += deck[1]
    n = card_to_number(deck[0]) + 1
    deck = deck[1:]
    deck = deck[n:] + deck[:n]
    return [output, deck]

def multi_output(output, deck, n):
    for i in range(n):
        print output, deck
        [output, deck] = single_output(output, deck)

def demo():
    deck = new_deck()
    multi_output("", deck, 10)
    deck = deck[1:] + deck[0]
    multi_output("", deck, 10)

demo()

#########
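[As a closing illustration, the pieces of the thread can be chained
together (purely my own combination, not something the posters
specified; draw_password is a hypothetical helper that collects
letters from single_output without printing the intermediate state):

def draw_password(deck, n):
    # draw n output letters from the deck using single_output
    output = ""
    for i in range(n):
        [output, deck] = single_output(output, deck)
    return output

deck = csh("qwerty")           # absorb the master password (J.D.'s csh)
print draw_password(deck, 10)  # then draw a 10-letter output hand]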