From: Michael Rudolf on 24 Feb 2010 12:56

On 24.02.2010 18:23, mk wrote:
> Even then I'm not getting completely uniform distribution for some reason:
> d 39411
> l 39376
> f 39288
> a 39275
> s 39225
> r 39172
> p 39159
> t 39073
> k 39071
> u 39064
> e 39005
> o 39005
> n 38995
> j 38993
> h 38975
> q 38958
> c 38938
> b 38906
> g 38894
> i 38847
> m 38819
> v 38712
> z 35321
> y 35228
> w 35189
> x 35075
>
> Code:
>
> import operator
>
> def gen_rand_word(n):
>     with open('/dev/urandom') as f:
>         return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])

The reason is 256 % 26 != 0.
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z.
You might want to skip the values 0-22
to achieve a truly uniform distribution.

FYI: Electronic Cash PINs in Europe (don't know about the rest of the
world) were computed the same way (random hex digit, and just mod it when
it's too large), leading to a high probability that your first digit was
a 1 :)

Regards,
Michael
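Michael's explanation can be checked exhaustively rather than by sampling: push every possible byte value 0-255 through % 26 and count how many land on each letter. This is a minimal Python 3 sketch; byte_counts_per_letter is a hypothetical helper name used only for illustration.

```python
import string
from collections import Counter

def byte_counts_per_letter(byte_values):
    """Count how many of the given byte values map to each letter a-z via % 26."""
    hits = Counter(string.ascii_lowercase[b % 26] for b in byte_values)
    # Include letters that no byte value maps to, with a count of 0.
    return {letter: hits[letter] for letter in string.ascii_lowercase}

if __name__ == "__main__":
    counts = byte_counts_per_letter(range(256))
    print(counts['a'], counts['v'], counts['w'], counts['z'])  # 10 10 9 9
    # Relative excess of a-v over w-z.
    print(counts['a'] / counts['z'])
```

Since 256 = 9 * 26 + 22, each of a-v is reachable from 10 byte values and each of w-z from only 9, so a-v should come up about 10/9 times (roughly 11%) as often, which is close to the ratio of about 39,000 to 35,000 in the counts above.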
From: Steve Holden on 24 Feb 2010 12:59

mk wrote:
> On 2010-02-24 03:50, Paul Rubin wrote:
>> The stuff about converting 4 random bytes to a decimal string and then
>> peeling off 2 digits at a time is pretty awful, and notice that since
>> 2**32 is 4294967296, in the cases where you get 10 digits, the first
>> 2-digit pair is never higher than 42.
>
> Yikes! I didn't think about that. This is probably where (some part of)
> probability skewing comes from.
>
> Anyway, the passwords for authorized users will be copied and pasted
> from email into the application GUI, which will remember them,
> so users will not have to remember and type them in. So I have little in
> the way of limitations of password length - even though in *some* cases
> somebody might have to (or be ignorant enough to) retype the password
> instead of pasting it in.
>
> In that case the "diceware" approach is not necessary, even though I
> will certainly remember this approach for a case when users will have to
> remember & type the passwords in.
>
> The main application will access the data using HTTP (probably), so the
> main point is that an attacker is not able to guess passwords using
> brute force.
>
> Using A-z with 10-char password seems to provide 3 orders of magnitude
> more combinations than a-z:
>
>>>> 57 ** 10
> 362033331456891249L
>>>> 25 ** 10
> 95367431640625L
>
> Even though I'm not sure it is worth it, assuming 1000 brute-force
> guesses per second (which over the web would amount pretty much to DoS),
> this would take # days:
>
>>>> 57 ** 10 / (1000 * 3600 * 24)
> 4190200595L
>>>> 25 ** 10 / (1000 * 3600 * 24)
> 1103789L
>
> Even then I'm not getting completely uniform distribution for some reason:
>
> d 39411
> l 39376
> f 39288
> a 39275
> s 39225
> r 39172
> p 39159
> t 39073
> k 39071
> u 39064
> e 39005
> o 39005
> n 38995
> j 38993
> h 38975
> q 38958
> c 38938
> b 38906
> g 38894
> i 38847
> m 38819
> v 38712
> z 35321
> y 35228
> w 35189
> x 35075
>
> Code:
>
> import operator
>
> def gen_rand_word(n):
>     with open('/dev/urandom') as f:
>         return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])
>
> def count_chars(chardict, word):
>     for c in word:
>         try:
>             chardict[c] += 1
>         except KeyError:
>             chardict[c] = 1  # first occurrence of this letter
>
> if __name__ == "__main__":
>     chardict = {}
>     for i in range(100000):
>         w = gen_rand_word(10)
>         count_chars(chardict, w)
>     counts = list(chardict.items())
>     counts.sort(key = operator.itemgetter(1), reverse = True)
>     for char, count in counts:
>         print char, count
>
>> I'd write your code something like this:
>>
>> nletters = 5
>>
>> def randomword(n):
>>     with open('/dev/urandom') as f:
>>         return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])
>>
>> print randomword(nletters)
>
> Aw shucks, when will I learn to do the stuff in 3 lines well instead of
> 20, poorly. :-/
>
When you've got as much experience as Paul?

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/
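The keyspace figures quoted above use 57 and 25 as alphabet sizes, but a-z has 26 letters, A-Z plus a-z has 52, and the ASCII range from 'A' through 'z' spans 58 code points because it also contains the six characters [ \ ] ^ _ and the backquote. This is a minimal Python 3 sketch redoing the arithmetic, assuming (as the quoted message does) a 10-character password and 1000 guesses per second; keyspace and days_to_exhaust are hypothetical helper names.

```python
import string

SECONDS_PER_DAY = 24 * 3600

def keyspace(alphabet_size, length):
    """Number of distinct passwords of the given length over the alphabet."""
    return alphabet_size ** length

def days_to_exhaust(alphabet_size, length, guesses_per_second=1000):
    """Whole days needed to try every password at the given rate.
    This is the worst case; on average a given password falls in about half that time."""
    return keyspace(alphabet_size, length) // (guesses_per_second * SECONDS_PER_DAY)

if __name__ == "__main__":
    lower = len(string.ascii_lowercase)      # 26
    letters = len(string.ascii_letters)      # 52: A-Z plus a-z
    a_to_z_range = ord('z') - ord('A') + 1   # 58: includes [ \ ] ^ _ and backquote
    for size in (lower, letters, a_to_z_range):
        print(size, keyspace(size, 10), days_to_exhaust(size, 10))
    # Doubling the alphabet multiplies a 10-character keyspace by 2**10.
    print(keyspace(letters, 10) // keyspace(lower, 10))  # 1024
```

With 52 letters the keyspace is exactly 2**10 = 1024 times that of 26 letters, which is where the "3 orders of magnitude" estimate comes from.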
From: mk on 24 Feb 2010 13:13

On 2010-02-24 18:59, Steve Holden wrote:
>> Aw shucks when will I learn to do the stuff in 3 lines well instead of
>> 20, poorly. :-/
>>
> When you've got as much experience as Paul?

And how much experience does Paul have?

(this is mostly not a facile question)

For my part, my more serious effort (on and off) with programming in
Python is under a year.

Regards,
mk
From: mk on 24 Feb 2010 13:35

On 2010-02-24 18:56, Michael Rudolf wrote:
> The reason is 256 % 26 != 0
> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
> approx. 10) more often than w-z.

<Barbie voice>writing secure code is hard...

I'm going to switch to PHP: Python world wouldn't lose much, but PHP
would gain a lot.

> You might want to skip the values 0-22
> to achieve a truly uniform distribution.

Hmm, perhaps you meant to skip values over 256 - 22? Because I'm getting
this (reduced the run to 1000 generated strings):

def gen_rand_word(n):
    with open('/dev/urandom') as f:
        return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x) > 22])

z 3609
b 3608
s 3567
e 3559
j 3556
r 3555
g 3548
p 3540
m 3538
q 3532
h 3528
y 3526
v 3524
i 3500
x 3496
c 3488
k 3488
l 3487
u 3487
a 3469
o 3465
d 3455
t 3439
f 3437
n 3417
w 3175

While with this:

def gen_rand_word(n):
    with open('/dev/urandom') as f:
        return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x) < 235])

a 3852
w 3630
s 3623
v 3582
y 3569
p 3568
c 3558
k 3558
b 3556
r 3553
x 3546
m 3534
n 3522
o 3515
h 3510
d 3505
u 3487
t 3486
i 3482
f 3477
e 3474
g 3460
q 3453
l 3437
z 3386
j 3382

1. I'm systematically getting an 'a' outlier: have no idea why for now.

2. This is somewhat better (except 'a') but still not uniform.

> FYI: Electronic Cash PINs in Europe (don't know about the rest of the
> world) were computed the same way (random hex digit, and just mod it when
> it's too large), leading to a high probability that your first digit was
> a 1 :)

Schadenfreude is deriving joy from others' misfortunes; what is the
German word, if any, for deriving solace from others' misfortunes? ;-)

Regards,
mk
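Both filters above are off by one, which accounts for the single outliers. Keeping ord(x) > 22 discards 23 byte values (0-22) and leaves 233, one short of 9 * 26 = 234, so 'w' (residue 22) ends up with only 8 byte values; keeping ord(x) < 235 leaves 235 values, and the extra value 234 maps to 'a'. Rejection sampling is only uniform when the number of accepted values is a multiple of 26, for example ord(x) < 234, or equivalently skipping 0-21 (ord(x) >= 22), since any 234 consecutive integers cover each residue mod 26 exactly 9 times. The Python 3 sketch below counts byte values per letter under each filter; letters_per_filter and is_uniform are hypothetical helpers, and gen_rand_word here is a hypothetical rewrite using os.urandom, not mk's code.

```python
import os
import string

LETTERS = string.ascii_lowercase

def letters_per_filter(keep):
    """For each letter a-z, count the byte values 0-255 that pass `keep`
    and map to that letter via % 26."""
    counts = dict.fromkeys(LETTERS, 0)
    for b in range(256):
        if keep(b):
            counts[LETTERS[b % 26]] += 1
    return counts

def is_uniform(counts):
    """True when every letter is reachable from the same number of byte values."""
    return len(set(counts.values())) == 1

def gen_rand_word(n):
    """Sketch: n letters drawn uniformly from os.urandom by rejection sampling."""
    letters = []
    while len(letters) < n:
        for b in os.urandom(n - len(letters)):  # bytes iterate as ints in Python 3
            if b < 234:  # 234 = 9 * 26; rejecting 234-255 removes the modulo bias
                letters.append(LETTERS[b % 26])
    return ''.join(letters)

if __name__ == "__main__":
    filters = {
        "ord(x) > 22": lambda b: b > 22,    # keeps 233 values: 'w' loses one
        "ord(x) < 235": lambda b: b < 235,  # keeps 235 values: 'a' gains one
        "ord(x) < 234": lambda b: b < 234,  # keeps 234 = 9 * 26 values
        "ord(x) >= 22": lambda b: b >= 22,  # keeps 234 consecutive values
    }
    for label, keep in filters.items():
        counts = letters_per_filter(keep)
        odd = {k: v for k, v in counts.items() if v != 9}
        print(label, odd or "uniform: 9 byte values per letter")
    print(gen_rand_word(10))
```

Once the threshold is right, the remaining spread in counts like the ones above (roughly 3,400 to 3,600 around 3,500) is about what sampling noise alone produces: the standard deviation of a count near 3,500 is roughly sqrt(3500), about 59.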
From: Robert Kern on 24 Feb 2010 14:01
On 2010-02-24 12:35 PM, mk wrote:
> While with this:
>
> def gen_rand_word(n):
>     with open('/dev/urandom') as f:
>         return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x) < 235])
>
> a 3852
> ....
> 1. I'm systematically getting an 'a' outlier: have no idea why for now.
>
> 2. This is somewhat better (except 'a') but still not uniform.

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
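A minimal Python 3 sketch of that advice: random.SystemRandom draws from the operating system's randomness source (os.urandom), and choice() is called on an instance of it to pick one element of a sequence, so no manual byte-to-letter mapping (and no modulo bias) is involved. The make_password name, its default length of 10, and the default alphabet are illustrative assumptions.

```python
import random
import string

_sysrand = random.SystemRandom()  # backed by os.urandom, not the Mersenne Twister

def make_password(length=10, alphabet=string.ascii_letters):
    """Return a password of `length` characters chosen uniformly from `alphabet`."""
    return ''.join(_sysrand.choice(alphabet) for _ in range(length))

if __name__ == "__main__":
    print(make_password())
    print(make_password(16, string.ascii_letters + string.digits))
```

On Python 3.6 and later, the standard secrets module offers the same thing directly: ''.join(secrets.choice(alphabet) for _ in range(length)).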