From: Maaartin on 27 Apr 2010 16:54

On Apr 27, 9:52 pm, bmearns <mearn...(a)gmail.com> wrote:

> Your initial results look good, but I'm really not crazy about this "deal" mechanism. Besides the increased time complexity of doing this

IMHO, dealing 52 cards takes ages when done one card at a time (as it should be in bridge), but is quite fast when done two or three cards at a time. I didn't try more.

> (especially having to do it differently depending on the letter), I'd really like to avoid having to put any cards down as both a matter of convenience and a matter of security.

You can do it all with the cards facing down, so you don't lose any security, do you?

> If it came down to doing that, I'd think a more thorough transposition would be better. E.g., instead of keeping blocks of N cards together, deal out N cards in a row, and then deal out the next N on top of those, etc. So it's just an N column columnar transposition, where cards K, K+N, K+2N, ... end up adjacent.

Yes, but IMHO it's quite important to let the transformation be data-dependent. Some irregularity like in deal(deck, (2, 3)) is welcome, too.

> Also, your main shuffling algorithm (prior to the dealing) still has that kink I mentioned earlier, where despite the pair of anchor points, you're actually just splitting the deck around a single card (the first instance).
>
> I tested out the version I previously suggested, where the middle interval is alternately moved to the top of the deck. I still haven't gotten a chance to actually test it by hand to see how easy it is, but I think that the fact that the opposite cards can be moved to a different location as well (the bottom of the top stack), instead of just having to skip over them somehow, should make it not too error prone.

Doing it on the table is surely not error-prone: you create 5 stacks, i.e., [head, deck[i], middle, deck[j], tail], and permute them. I didn't try it in hand.

> Anyway, I ran some tests on it, and it looks fairly good. It actually cascades just slightly worse than the prior version. For a 52-card deck, looking at the entire deck as the output, the old version (I'll call it shuffle_1) averages 65.4% of bits changed per input, and the new version (call it shuffle_2) averages 67%. As far as I understand, we want it to be as close to 50% as possible: 65% is essentially the same as 35%, with a bit-wise negation built in. So neither of these looks awesome in that respect.

I'm afraid it all comes from the factoradic system. IMHO, the avalanche (computed correctly, whatever that means) must be much worse, I'd expect something like 5-10%, since perfect avalanche means destroying half of the previous relationship, which is not the case here.

> However, if you draw a smaller "hand" from the deck to use as your output, you can get outputs of very reasonable size for use as passwords that cascade very nicely. For instance, drawing a hand of 12 cards (from a 52-card deck) gives 66 bits in the result and has average bit changes of 51.15% (shuffle_1) and 52.46% (shuffle_2). For reference, md5 cascades 49.997%, sha-1 cascades 44.998%, and sha-256 cascades 50.011%. The std-dev is relevant as well: md5 has 4.44, sha-1 has 3.95, and sha-256 has 3.12, while shuffle_1 has 6.48 and shuffle_2 has 6.25.

I consider the avalanche difference between the two negligible, especially in the face of the huge deviations. IMHO, it's just a sort of measuring error.
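For reference, here is one plausible reading of how such a "bits changed" figure could be computed (a sketch only, and an assumption on my part; bmearns' actual test code isn't shown in the thread beyond the hint that it is based on the factoradic encoding of the permutation):

def lehmer_number(perm):
    # maps a permutation of distinct symbols to an integer in 0 .. n!-1
    # (the factoradic / Lehmer-code value of the permutation)
    value = 0
    remaining = sorted(perm)
    for card in perm:
        i = remaining.index(card)
        value = value * len(remaining) + i
        remaining.pop(i)
    return value

def bits_changed(perm_a, perm_b):
    # number of differing bits between the two encoded permutations
    return bin(lehmer_number(perm_a) ^ lehmer_number(perm_b)).count("1")

A full 52-card permutation encodes to about 226 bits (log2 of 52!), so a percentage would be taken against a figure of that size.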
> I also looked at a number of statistics related to the sequences. First I simply counted the number of 2-card sequences in each permutation (i.e., the number of cards for which its next neighbor is exactly +1 or -1 from it). For random permutations, this averaged 2.003, but with a pretty significant std-dev of 1.38. Shuffle_1 compared very poorly in this metric, as we already knew. The stat consistently decreases as the number of inputs increases (as one would generally expect), but even with as many as 27 input values, there were still an average of 13 such sequences per 52-card sequence. Shuffle_2 did much better in this metric: the average number of sequences held pretty steadily between 1.7 and 1.9 as the length of the input increased from 5 to 27. It also had a consistent std-dev very close to that of the random sequences, hovering between about 1.3 and 1.5 for the most part.
>
> I think this is a pretty reasonable metric for showing that an algorithm is bad, as in the case of shuffle_1. shuffle_2 did much better, but this metric alone doesn't tell us much about the quality.

I fully agree; you can use it only for abandoning an algorithm.

> So here's the summary I promised: shuffle_1 cascades slightly better than shuffle_2, but those sequences are a very Bad Thing and are hard to get rid of. shuffle_2 seems to do pretty well at eliminating those sequences and still cascades quite well.
>
> So I'd like to make shuffle_2 the launch pad for going forward as it seems to be performing well in all the areas I want it to. I still need to get a deck of cards and actually try it out, but I'm optimistic. Basically what I'm looking at right now is a full deck of 52 cards, 13 or 14 values in the master password (which is reasonable for a strong password anyway, giving 74 or 79 bits), and drawing a hand of probably 10 to 12 cards for the final output, giving passwords of 55 to 66 bits, which is probably far beyond sufficient for logging into gmail, etc.

I agree again.

> I think additional modifications that might be useful would be some sort of final mixing like Maaartin previously suggested. Still bearing in mind, of course, the requirements that it be relatively fast, very easy, not error prone, and not require laying any cards down. Since it's just a final stage, it can be a bit lengthier than the rest, but not too long. I think one of you suggested something like stepping through the deck by counting a number of cards equal to the value on the facing card, or something similar. I think that might be reasonable to perform, and add significant obscurity to the output. However, the card that tells you how many to count should /not/ go into the output.

Right, and I was thinking about it.

> If it did, an attacker could probably reconstruct the location of every output card in the final deck order. So I'm thinking something like this, instead:
> 1) look at the top card. Count down the corresponding number of cards (1-13), moving each counted card to the bottom of the deck (so we don't have to think about wrapping when we get to the end).
> 2) After counting/moving those cards, the new top card is the output value. Move that card to the bottom as well.
> 3) Go back to 1 until you have enough output values.

That's similar to my approach and may also lead to cycles. I'd suggest throwing away the top card from step 1.

> I guess adding this step means I have to rerun all my tests. Any thoughts on it before I do?

There'll probably be many more runs, so go ahead.
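A minimal sketch of those three steps (my reading only; the thread leaves open whether the counting card is itself part of the count, and Maaartin's suggestion of discarding it is not applied here), written against the string decks and the card_to_number helper used in Maaartin's program further down:

def draw_hand(deck, hand_size):
    hand = ""
    for k in range(hand_size):
        n = card_to_number(deck[0]) + 1   # the top card gives a count of 1..13
        deck = deck[n:] + deck[:n]        # counted cards go to the bottom
        hand += deck[0]                   # the new top card is an output value
        deck = deck[1:] + deck[0]         # ...and then it goes to the bottom too
    return hand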
I like the tests you invented and trust them much more than the factoradic system based avalanche. I wonder how to convert the permutation to a form suitable for classical randomness tests in a way not masking the regular patterns.
From: J.D. on 27 Apr 2010 17:20

On Apr 27, 3:49 pm, Maaartin <grajc...(a)seznam.cz> wrote:

> Obviously, dealing to more persons could perform better, so I rewrote:
>
> def deal(deck, list):
>     result = []
>     for n in list: result += [""];
>     while len(deck)>0:
>         for i in range(len(list)):
>             n = min(list[i], len(deck))
>             #print "@", i, n, result, deck
>             result[i] = deck[:n] + result[i]
>             deck = deck[n:]
>     #print "@@", result
>     result = "".join(result)
>     return result;
>
> You can use the commented-out lines for viewing how it works. The second parameter is a list where the n-th entry determines how many cards the n-th person is given.
>
> print deal("012345678", (2,))
> =>
> 867452301 # obviously a single-element list works just as before
>
> print deal("012345678", (2,1)) # here the first person is given 2 cards, and the second is given 1 card
> =>
> 673401852 # the dealt decks "67 34 01" for the first person and "8 5" for the second get concatenated

That's an interesting twist. I also like bmearns' idea of dealing cards one by one into N piles (where N is a function of the latest letter absorbed). Let's call all of these proposals (including the original) 'deal variants'. I know he doesn't seem to care for the deal variants (because of time/complexity and because you'd probably have to put cards down), but I think it is the way to go (i.e. it seems worth whatever problems are entailed by laying cards down). Obviously that is entirely his decision in the end, but the deal step does seem to significantly amplify the mixing of whatever primary shuffle method you use, and I think it makes reversing the hash more difficult.

Which is to say, without a deal variant (where the deal amount is a function of the latest input), for any given input x, where the prior value of the hash is known, it is trivial to recover x from shuffle(prior value, x), whatever shuffle algorithm you use. It is more difficult to recover multiple inputs, or it seems to be more difficult -- but I don't know if that is merely an artifact of my own limitations rather than an actual difficulty.

But if every single input affects the result in two different ways (i.e. via the shuffle algorithm and via the input-dependent deal variant) then it stands to reason that each input is probably harder (harder than with shuffle alone) to recover from the result even when the prior result is known. Sort of like how with y = (c+x)*x mod n, it is harder to recover x from y even when c is known -- 'harder' than if the function was y = (c+x) mod n.

> > > The function double_shuffle(deck, letter, shouldDeal) looks for the first and the last (i.e., second) occurrence of letter, and does what you described. With shouldDeal=True it also uses the function deal for the head and tail parts with n=2 and n=2, respectively.
> > >
> > > The program is very simple:
> >
> > Thank you for implementing it. I can say with certainty that if you are new to python you are still much further along than I.
>
> I'm really new to python, except that some time ago I had to change (with a lot of swearing) two lines in a short python program.

Yes, I have certainly had much practice with the 'swearing' part of programming of late.

> Yes, this could help. But I'd prefer to use, let's say, the first card below the first card corresponding to the letter, so there's another indirection.

Hmm. Yeah, seems like that could work well too.
For every step you'd have to convert from cards to amount-to-deal, but maybe that's not any harder than converting from letters to amount-to-deal.

> Is there any particular reason for returning
> middle + deck[j] + tail + deck[i] + head
> and not for example
> tail + deck[j] + middle + deck[i] + head
> or
> head + tail + deck[j] + middle + deck[i]
> ?

Actually, I have no idea which one to prefer. Bmearns is correct in his criticism of my shuffle algorithm -- it does just pivot around the first instance. Perhaps this is better:

head + first + interval + second + tail
==shuffle==>
interval + first + tail + second + head

You don't want the instances to appear in an easy-to-determine place (e.g. at the very top or bottom of the stack), because then it is trivial to recover the input by just looking at the first or last card. A deal step would help in that regard, but not as much as you'd think. E.g., assume an instance of the input always appears at the top or bottom of the deck after the shuffle. If the hash uses an input-dependent deal like in my program above, then the first card will be moved to within a narrow range. Look at the cards in that range, and --for each card at position i within the range-- if the amount moved from the start to i is equal to the distance the first card would be moved had that card at i been an instance of the input, then the card at i is probably an instance of the input.

Your idea of using the card below the first instance to determine deal amounts would (I think) cure that weakness...

> If I understand him correctly, this can be done like a sort of dealing, too: Deal the cards to two persons (one at a time), where only the first person gets her cards face up. Then put the two stacks together again. Doing this in hand is surely more demanding but possible.

I've tested it out with a real deck of cards. His idea of splitting the interval by every other card during the shuffle step is significantly slower than simply shifting the entire interval to some point. However it may well be worth the increase in time. It is a bit fiddly to do without putting cards down, but with more practice I suspect it shouldn't be too difficult or error-prone.

> You could make your program much shorter using something like
>
> n = ord(letter.lower()) - ord("a")
> n %= 13;
> if n<=1: n += 13;

No doubt, though I don't know what those operators are...
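A rough sketch of both ideas (my reading, not code posted in the thread): the revised shuffle spelled out just above, plus the "card below the first instance" indirection for choosing a deal amount. It assumes the deck is a string of letter cards as in the programs elsewhere in the thread, and card_to_number is Maaartin's helper from his later post:

def jd_shuffle(deck, letter):
    # head + first + interval + second + tail  ->  interval + first + tail + second + head
    i = deck.lower().find(letter)     # first instance of the input letter
    j = deck.lower().rfind(letter)    # second (last) instance
    head, first = deck[:i], deck[i]
    interval, second, tail = deck[i+1:j], deck[j], deck[j+1:]
    return interval + first + tail + second + head

def deal_amount(deck, letter):
    # the amount to deal is read from the card directly below the first instance
    i = deck.lower().find(letter)
    return card_to_number(deck[(i + 1) % len(deck)]) + 1   # a value in 1..13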
From: J.D. on 27 Apr 2010 17:29

On Apr 27, 5:20 pm, "J.D." <degolyer...(a)yahoo.com> wrote:

> > You could make your program much shorter using something like
> >
> > n = ord(letter.lower()) - ord("a")
> > n %= 13;
> > if n<=1: n += 13;
>
> No doubt, though I don't know what those operators are...

Now I do. As I said -- I just started on python a couple of days ago, so there are huge amounts of even the basic stuff I still have to learn.
From: Maaartin on 27 Apr 2010 17:56

On Apr 27, 11:20 pm, "J.D." <degolyer...(a)yahoo.com> wrote:
> On Apr 27, 3:49 pm, Maaartin <grajc...(a)seznam.cz> wrote:

> That's an interesting twist. I also like bmearns' idea of dealing cards one by one into N piles (where N is a function of the latest letter absorbed).

I find it a bit too complicated, and it requires a large table. I personally would like to stick to something needing only a place for about 3 decks, so you could do it even on your legs.

> Let's call all of these proposals (including the original) 'deal variants'. I know he doesn't seem to care for the deal variants (because of time/complexity and because you'd probably have to put cards down), but I think it is the way to go (i.e. it seems worth whatever problems are entailed by laying cards down).

So do I; we need some strong operation, and currently the variants of dealing are the only ones we know about.

> Which is to say, without a deal variant (where the deal amount is a function of the latest input), for any given input x, where the prior value of the hash is known it is trivial to recover x from shuffle(prior value, x), whatever shuffle algorithm you use. It is more difficult to recover multiple inputs, or it seems to be more difficult -- but I don't know if that is merely an artifact of my own limitations rather than an actual difficulty.
>
> But if every single input affects the result in two different ways (i.e. via the shuffle algorithm and via the input-dependent deal variant) then it stands to reason that each input is probably harder (harder than with shuffle alone) to recover from the result even when the prior result is known. Sort of like how with y = (c+x)*x mod n, it is harder to recover x from y even when c is known -- 'harder' than if the function was y = (c+x) mod n.

I agree, a data-dependent dealing is a strong non-linear operation here, similar to variable-distance rotation.

> > Yes, this could help. But I'd prefer to use, let's say, the first card below the first card corresponding to the letter, so there's another indirection.
>
> Hmm. Yeah, seems like that could work well too. For every step you'd have to convert from cards to amount-to-deal, but maybe that's not any harder than converting from letters to amount-to-deal.

It is *much* easier, you just look at the card: five means 5, ten means 10, and for JQKA you just remember 11, 12, 13, 14.

> Bmearns is correct in his criticism of my shuffle algorithm -- it does just pivot around the first instance. Perhaps this is better:
>
> head + first + interval + second + tail
> ==shuffle==>
> interval + first + tail + second + head

I hated the dilemma so much that I avoided it completely (see below).

> You don't want the instances to appear in an easy-to-determine place (e.g. at the very top or bottom of the stack), because then it is trivial to recover the input by just looking at the first or last card. A deal step would help in that regard, but not as much as you'd think. E.g., assume an instance of the input always appears at the top or bottom of the deck after the shuffle. If the hash uses an input-dependent deal like in my program above, then the first card will be moved to within a narrow range.
> Look at the cards in that range, and --for each card at position i within the range-- if the amount moved from the start to i is equal to the distance the first card would be moved had that card at i been an instance of the input, then the card at i is probably an instance of the input.
>
> Your idea of using the card below the first instance to determine deal amounts would (I think) cure that weakness...

I'm not sure if I understand.

> I've tested it out with a real deck of cards. His idea of splitting the interval by every other card during the shuffle step is significantly slower than simply shifting the entire interval to some point. However it may well be worth the increase in time. It is a bit fiddly to do without putting cards down, but with more practice I suspect it shouldn't be too difficult or error-prone.

I didn't try it, but I find it quite complicated. After having proposed a lot of complications, now I try to simplify it all.

> > You could make your program much shorter using something like
> >
> > n = ord(letter.lower()) - ord("a")
> > n %= 13;
> > if n<=1: n += 13;
>
> No doubt, though I don't know what those operators are...

lower - converts to lowercase
ord - converts to ASCII
% - remainder
something else?

SIMPLIFIED APPROACH - mg_shuffle1

Combine three simple operations: bury, find_shuffle, and deal.

bury: Similar to what I proposed for the output procedure: convert the top card to a number n in 0..12, put it aside, move n+1 cards from the top to the bottom, and add the put-aside card to the bottom. This is a simple data-dependent permutation of the deck.

def bury(deck):
    n = card_to_number(deck[0]) + 1
    return deck[n:] + deck[1:n] + deck[0]

find_shuffle: Find the card corresponding to the input letter and put all cards above it to the bottom. This is a return to the former simple version, but it gets used twice for each input.

def find_shuffle(deck, letter):
    for n in range(len(deck)):
        if deck[n].lower() == letter: break
    return deck[n:] + deck[:n]

deal: As before.

The combined function just invokes find_shuffle, bury, find_shuffle again, and deal.
######### the whole program

def new_deck(n=26, occurrences=2):
    deck = "";
    for i in range(occurrences):
        for j in range(n):
            deck = deck + chr(ord("aA"[i&1]) + j)
    return deck;

def card_to_number(card):
    n = ord(card.lower()) - ord("a")
    n %= 13
    return n

def bury(deck):
    n = card_to_number(deck[0]) + 1
    return deck[n:] + deck[1:n] + deck[0]

def find_shuffle(deck, letter):
    for n in range(len(deck)):
        if deck[n].lower() == letter: break
    return deck[n:] + deck[:n]

def deal(deck, list):
    result = []
    for n in list: result += [""];
    while len(deck)>0:
        for i in range(len(list)):
            n = min(list[i], len(deck))
            result[i] = deck[:n] + result[i]
            deck = deck[n:]
    result = "".join(result)
    return result;

def mg_shuffle1(deck, letter, shouldPrint=False):
    deck = find_shuffle(deck, letter)
    if shouldPrint: print letter, "=>", deck
    deck = bury(deck)
    if shouldPrint: print "    ", deck
    deck = find_shuffle(deck, letter)
    if shouldPrint: print "    ", deck
    deck = deal(deck, (1, 2))
    if shouldPrint: print "    ", deck
    return deck

def demo():
    print "\n", "processing 'qwerty' in detail"
    deck = new_deck()
    for letter in "qwerty":
        deck = mg_shuffle1(deck, letter, True)
    for s in ["pw", "qw", "rw"]:
        print "\n", "state after processing", repr(s);
        deck = new_deck()
        for letter in s:
            deck = mg_shuffle1(deck, letter)
        print deck

demo()
#########

######### the output

processing 'qwerty' in detail
q => qrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop
     uvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnoprstq
     QRSTUVWXYZabcdefghijklmnoprstquvwxyzABCDEFGHIJKLMNOP
     PMJGDAxusolifcZWTQNOKLHIEFBCyzvwtqprmnjkghdeabXYUVRS
w => WTQNOKLHIEFBCyzvwtqprmnjkghdeabXYUVRSPMJGDAxusolifcZ
     FBCyzvwtqprmnjkghdeabXYUVRSPMJGDAxusolifcZTQNOKLHIEW
     wtqprmnjkghdeabXYUVRSPMJGDAxusolifcZTQNOKLHIEWFBCyzv
     vCWHOTfoxGPVXegnpwyzFBIEKLQNcZliusDAMJRSYUabhdjkrmtq
e => egnpwyzFBIEKLQNcZliusDAMJRSYUabhdjkrmtqvCWHOTfoxGPVX
     yzFBIEKLQNcZliusDAMJRSYUabhdjkrmtqvCWHOTfoxGPVXgnpwe
     EKLQNcZliusDAMJRSYUabhdjkrmtqvCWHOTfoxGPVXgnpweyzFBI
     IzwgPoOCtkhURAuZQEFBeynpVXxGTfWHqvrmdjabSYMJsDliNcKL
r => RAuZQEFBeynpVXxGTfWHqvrmdjabSYMJsDliNcKLIzwgPoOCtkhU
     EFBeynpVXxGTfWHqvrmdjabSYMJsDliNcKLIzwgPoOCtkhUAuZQR
     rmdjabSYMJsDliNcKLIzwgPoOCtkhUAuZQREFBeynpVXxGTfWHqv
     vWGVyFQAkOgIclJSjrHqTfXxnpBeREuZhUCtPozwKLiNsDYMabmd
t => TfXxnpBeREuZhUCtPozwKLiNsDYMabmdvWGVyFQAkOgIclJSjrHq
     eREuZhUCtPozwKLiNsDYMabmdvWGVyFQAkOgIclJSjrHqfXxnpBT
     tPozwKLiNsDYMabmdvWGVyFQAkOgIclJSjrHqfXxnpBTeREuZhUC
     CZRBxqjlgAyWmMsLzthUEuTenpfXrHJSIckOFQGVdvabDYiNwKPo
y => yWmMsLzthUEuTenpfXrHJSIckOFQGVdvabDYiNwKPoCZRBxqjlgA
     TenpfXrHJSIckOFQGVdvabDYiNwKPoCZRBxqjlgAWmMsLzthUEuy
     YiNwKPoCZRBxqjlgAWmMsLzthUEuyTenpfXrHJSIckOFQGVdvabD
     DvGOIHfeuhLmgqRowYabVdFQckJSXrnpyTUEztMsAWjlBxCZKPiN

state after processing 'pw'
acjlsuBDKMwbktFOUWdZfgmioqvpxyEAGHNJSPYVhernCzLIQRXT

state after processing 'qw'
vCWHOTfoxGPVXegnpwyzFBIEKLQNcZliusDAMJRSYUabhdjkrmtq

state after processing 'rw'
VcewoqxDFMOXgpyHQWYZfbhiklsnuvzAGCIJPLURdamjrtEBNKST
#########
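One possible way to tie the program above together with the hand-drawing output step sketched earlier (my own combination, for illustration only; it is not something that was posted as-is in the thread):

def hash_password(master, hand_size=12):
    # assumes the master password consists of letters only, as in demo()
    deck = new_deck()
    for letter in master.lower():
        deck = mg_shuffle1(deck, letter)
    return draw_hand(deck, hand_size)   # draw_hand as in the earlier sketch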
From: J.D. on 27 Apr 2010 18:03
On Apr 27, 3:52 pm, bmearns <mearn...(a)gmail.com> wrote:

> [snip]
>
> Your initial results look good, but I'm really not crazy about this "deal" mechanism. Besides the increased time complexity of doing this (especially having to do it differently depending on the letter), I'd really like to avoid having to put any cards down as both a matter of convenience and a matter of security. If it came down to doing that, I'd think a more thorough transposition would be better. E.g., instead of keeping blocks of N cards together, deal out N cards in a row, and then deal out the next N on top of those, etc. So it's just an N column columnar transposition, where cards K, K+N, K+2N, ... end up adjacent.
>
> Also, your main shuffling algorithm (prior to the dealing) still has that kink I mentioned earlier, where despite the pair of anchor points, you're actually just splitting the deck around a single card (the first instance).

Yeah, I had addressed these concerns in my above post to Maaartin. Let me know what you think...

> I tested out the version I previously suggested, where the middle interval is alternately moved to the top of the deck. I still haven't gotten a chance to actually test it by hand to see how easy it is, but I think that the fact that the opposite cards can be moved to a different location as well (the bottom of the top stack), instead of just having to skip over them somehow, should make it not too error prone.

It is a bit fiddly and slow with a real deck, but with practice it shouldn't be too problematic. It may well be worth the increase in mixing. However, it should probably be compared to a simpler shuffle step combined with an input-dependent deal step.

> Anyway, I ran some tests on it, and it looks fairly good. It actually cascades just slightly worse than the prior version. For a 52-card deck, looking at the entire deck as the output, the old version (I'll call it shuffle_1) averages 65.4% of bits changed per input, and the new version (call it shuffle_2) averages 67%. As far as I understand, we want it to be as close to 50% as possible: 65% is essentially the same as 35%, with a bit-wise negation built in. So neither of these looks awesome in that respect.

How are you measuring the 'bits change per input'? With an ideal hash function (that outputs a digest in the form of a permutation of a deck of cards), changing a single letter of the message (at any position in the message) should cause every card to change with a probability of --I think-- 0.96 (i.e. 25/26). However, probability is not my strong suit, so I may be wrong about that. If I am right, perhaps a good test would be to take a set of messages all of length L, and all of which differ from each other only at position i, and count up how many times the same letter appears at the same position in the corresponding set of digests. Then test many other possible positions of variance (i.e. different i's) with other similar sets of messages, and a number of different lengths L.

> I also looked at a number of statistics related to the sequences. First I simply counted the number of 2-card sequences in each permutation (i.e., the number of cards for which its next neighbor is exactly +1 or -1 from it).

That sounds like a good measure to me.
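A sketch of that letter-position test (my own harness, not code from the thread; it assumes the digest is the full deck of letter cards as in Maaartin's program, and hash_fn is a placeholder for whatever message-to-deck hash is being tested):

import random

def position_agreement(hash_fn, length, i, trials=500):
    letters = "abcdefghijklmnopqrstuvwxyz"
    agree = total = 0
    for t in range(trials):
        msg = [random.choice(letters) for k in range(length)]
        other = list(msg)
        other[i] = random.choice(letters.replace(other[i], ""))  # change position i only
        da, db = hash_fn("".join(msg)), hash_fn("".join(other))
        agree += sum(1 for a, b in zip(da, db) if a.lower() == b.lower())
        total += len(da)
    return agree / float(total)   # roughly 1/26 for the ideal case described above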
> So I did a more general test in which I looked at the "direction" of each pair of cards. If a card's next neighbor is greater than it, then the direction of that pair is positive. If the card's next neighbor is less than it, the direction is negative. This will account for the every-other-card sequences described above, and should generally be a pretty nice statistic to compare against random permutations.

Sounds good too. You may also want to try the average position of digest characters given changes in the input. That is, take a set of messages that only differ from each other at a specific position or small number of positions, then look at the position in the corresponding digests of each output character (e.g. where in the various digests is the ace of spades?). For a random permutation the average position over a large number of 'digests' should be near the middle. If the hash output averages cluster somewhere else, then something is wrong.

Also you could look at the average _relative_ distance between two output characters -- i.e. over many digests, how far apart are the ace of spades and the ace of clubs, and is that average distance different from what you'd expect given a random permutation?

> So I'd like to make shuffle_2 the launch pad for going forward as it seems to be performing well in all the areas I want it to. I still need to get a deck of cards and actually try it out, but I'm optimistic. Basically what I'm looking at right now is a full deck of 52 cards, 13 or 14 values in the master password (which is reasonable for a strong password anyway, giving 74 or 79 bits), and drawing a hand of probably 10 to 12 cards for the final output, giving passwords of 55 to 66 bits, which is probably far beyond sufficient for logging into gmail, etc.

How does shuffle_2 compare to a simple shuffle (like my improved shuffle in the above response to Maaartin) plus a simple input-dependent deal step? That is comparing not just the statistical results, but also the efficiency -- moving every other card of the interval doesn't seem like it should be that much easier and faster than an input-dependent deal step.
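Minimal sketches of those two statistics (assuming digests are strings over the same 52 symbols, e.g. the a..z/A..Z decks used in the programs above):

def average_position(digests, card):
    # where, on average, one particular card lands; about 25.5 for random permutations of 52 cards
    return sum(d.index(card) for d in digests) / float(len(digests))

def average_relative_distance(digests, card1, card2):
    # how far apart two particular cards end up on average;
    # for random permutations of 52 cards this is (52+1)/3, about 17.7
    return sum(abs(d.index(card1) - d.index(card2)) for d in digests) / float(len(digests))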