From: Michael Rudolf on
On 24.02.2010 18:23, mk wrote:
> Even then I'm not getting completely uniform distribution for some reason:
> d 39411
> l 39376
> f 39288
> a 39275
> s 39225
> r 39172
> p 39159
> t 39073
> k 39071
> u 39064
> e 39005
> o 39005
> n 38995
> j 38993
> h 38975
> q 38958
> c 38938
> b 38906
> g 38894
> i 38847
> m 38819
> v 38712
> z 35321
> y 35228
> w 35189
> x 35075
>
> Code:
>
> import operator
>
> def gen_rand_word(n):
>     with open('/dev/urandom') as f:
>         return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])

The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z. You might want to skip the values 0-22
to achieve a truly uniform distribution.
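
The arithmetic above can be checked directly. The following is only a minimal sketch, assuming Python 3 (the code in this thread is Python 2) and using os.urandom instead of opening /dev/urandom by hand. The names ALPHABET, LIMIT and letter_weights are made up for the illustration. Note that this sketch discards the top of the byte range (234-255) so that exactly 9 * 26 = 234 values remain; that is one possible choice and not necessarily what was meant by "skip the values 0-22".

```python
import os
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Largest multiple of 26 that fits into one byte: 9 * 26 = 234.
LIMIT = 256 - (256 % len(ALPHABET))

def letter_weights():
    """Count how many of the 256 byte values map to each letter under % 26."""
    return Counter(ALPHABET[b % 26] for b in range(256))

def gen_rand_word(n):
    """Return n letters; bytes >= LIMIT are discarded and replaced by fresh ones."""
    letters = []
    while len(letters) < n:
        for b in os.urandom(n):
            if b < LIMIT and len(letters) < n:
                letters.append(ALPHABET[b % 26])
    return "".join(letters)

weights = letter_weights()
# a..v are each produced by 10 byte values, w..z by only 9.
print(weights["a"], weights["v"], weights["w"], weights["z"])
print(gen_rand_word(10))
```

Because rejected bytes are replaced rather than dropped, the word always has the requested length.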

FYI: Electronic Cash PINs in Europe (I don't know about the rest of the
world) were computed the same way (take a random hex digit and just mod it
when it's too large), leading to a high probability that your first digit
was a 1 :)
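
As a rough illustration of that kind of skew, here is a small sketch, assuming Python 3 and assuming that "mod it when it's too large" means folding a hex digit value onto a decimal digit with % 10. The actual PIN scheme is not described in this post, so the sketch only shows the general effect, and does not explain why 1 in particular would stand out for the first digit.

```python
from collections import Counter

# Fold each of the 16 hex digit values onto a decimal digit with % 10
# (an assumed reading of "mod it when it's too large").
weights = Counter(h % 10 for h in range(16))

for digit in range(10):
    # Probability of each decimal digit if all 16 hex digits are equally likely.
    print(digit, weights[digit] / 16)
```

Under this assumption the digits 0-5 each come from two hex values and 6-9 from only one, so the low digits are twice as likely.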

Regards,
Michael
From: Steve Holden on
mk wrote:
> On 2010-02-24 03:50, Paul Rubin wrote:
>> The stuff about converting 4 random bytes to a decimal string and then
>> peeling off 2 digits at a time is pretty awful, and notice that since
>> 2**32 is 4294967296, in the cases where you get 10 digits, the first
>> 2-digit pair is never higher than 42.
>
> Yikes! I didn't think about that. This is probably where (some part of)
> probability skewing comes from.
>
> Anyway, the passwords for authorized users will be copied and pasted
> from email into the application GUI which will remember it for them,
> so they will not have to remember and type them in. So I have little in
> the way of limitations of password length - even though in *some* cases
> somebody might have to (or be ignorant enough) to retype the password
> instead of pasting it in.
>
> In that case the "diceware" approach is not necessary, even though I
> will certainly remember this approach for a case when users will have to
> remember & type the passwords in.
>
> The main application will access the data using HTTP (probably), so the
> main point is that an attacker is not able to guess passwords using
> brute force.
>
> Using A-z with 10-char password seems to provide 3 orders of magnitude
> more combinations than a-z:
>
>>>> 57 ** 10
> 362033331456891249L
>>>> 25 ** 10
> 95367431640625L
>
> Even though I'm not sure it is worth it, assuming 1000 brute-force
> guesses per second (which over the web would amount pretty much to DOS),
> this would take # days:
>
>>>> 57 ** 10 / (1000 * 3600 * 24)
> 4190200595L
>>>> 25 ** 10 / (1000 * 3600 * 24)
> 1103789L
>
> Even then I'm not getting completely uniform distribution for some reason:
>
> d 39411
> l 39376
> f 39288
> a 39275
> s 39225
> r 39172
> p 39159
> t 39073
> k 39071
> u 39064
> e 39005
> o 39005
> n 38995
> j 38993
> h 38975
> q 38958
> c 38938
> b 38906
> g 38894
> i 38847
> m 38819
> v 38712
> z 35321
> y 35228
> w 35189
> x 35075
>
> Code:
>
> import operator
>
> def gen_rand_word(n):
>     with open('/dev/urandom') as f:
>         return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])
>
> def count_chars(chardict, word):
>     for c in word:
>         try:
>             chardict[c] += 1
>         except KeyError:
>             chardict[c] = 1  # the first occurrence counts as 1, not 0
>
> if __name__ == "__main__":
>     chardict = {}
>     for i in range(100000):
>         w = gen_rand_word(10)
>         count_chars(chardict, w)
>     counts = list(chardict.items())
>     counts.sort(key = operator.itemgetter(1), reverse = True)
>     for char, count in counts:
>         print char, count
>
>> I'd write your code something like this:
>>
>> nletters = 5
>>
>> def randomword(n):
>>     with open('/dev/urandom') as f:
>>         return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])
>>
>> print randomword(nletters)
>
> Aw shucks when will I learn to do the stuff in 3 lines well instead of
> 20, poorly. :-/
>
When you've got as much experience as Paul?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/

From: mk on
On 2010-02-24 18:59, Steve Holden wrote:

>> Aw shucks when will I learn to do the stuff in 3 lines well instead of
>> 20, poorly. :-/
>>
> When you've got as much experience as Paul?

And how much experience does Paul have? (this is mostly not a facile
question)

For my part, my more serious effort (on and off) with programming in
Python is under a year.

Regards,
mk

From: mk on
On 2010-02-24 18:56, Michael Rudolf wrote:

> The reason is 256 % 26 != 0
> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
> approx. 10) more often than w-z.

<Barbie voice>writing secure code is hard...

I'm going to switch to PHP: Python world wouldn't lose much, but PHP
would gain a lot.

> You might want to skip the values 0-22
> to achieve a truly uniform distribution.

Hmm, perhaps you meant to skip values over 256 - 22? Because I'm getting
this (reduced the run to 1000 generated strings):

def gen_rand_word(n):
    with open('/dev/urandom') as f:
        return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)
                        if ord(x) > 22])


z 3609
b 3608
s 3567
e 3559
j 3556
r 3555
g 3548
p 3540
m 3538
q 3532
h 3528
y 3526
v 3524
i 3500
x 3496
c 3488
k 3488
l 3487
u 3487
a 3469
o 3465
d 3455
t 3439
f 3437
n 3417
w 3175

While with this:

def gen_rand_word(n):
    with open('/dev/urandom') as f:
        return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)
                        if ord(x) < 235])

a 3852
w 3630
s 3623
v 3582
y 3569
p 3568
c 3558
k 3558
b 3556
r 3553
x 3546
m 3534
n 3522
o 3515
h 3510
d 3505
u 3487
t 3486
i 3482
f 3477
e 3474
g 3460
q 3453
l 3437
z 3386
j 3382

1. I'm systematically getting an 'a' outlier: have no idea why for now.

2. This is somewhat better (except 'a') but still not uniform.
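
One way to look into both outliers is to count, for each filter, how many accepted byte values land on each letter. This is only a sketch, assuming Python 3; the helper names letter_weights and outliers are invented for the illustration. It suggests both effects are off-by-one errors in the cutoff: with > 22 the byte value 22 (a 'w') is dropped along with 0-21, and with < 235 the byte value 234 (234 % 26 == 0, an 'a') is still accepted.

```python
from collections import Counter

def letter_weights(accept):
    """Count how many accepted byte values map to each letter under % 26."""
    return Counter(chr(ord("a") + b % 26) for b in range(256) if accept(b))

def outliers(weights):
    """Return the letters whose weight differs from the most common weight."""
    typical = Counter(weights.values()).most_common(1)[0][0]
    return {c: w for c, w in weights.items() if w != typical}

print(outliers(letter_weights(lambda b: b > 22)))   # 'w' has 8, the rest 9
print(outliers(letter_weights(lambda b: b < 235)))  # 'a' has 10, the rest 9
print(outliers(letter_weights(lambda b: b < 234)))  # no outliers: 9 each
```

The remaining spread between the other letters in the two tables is what sampling noise of this size would be expected to look like. Separately, a filter inside the list comprehension drops rejected bytes without replacing them, so some generated words come out shorter than n.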


> FYI: Electronic Cash PINs in Europe (I don't know about the rest of the
> world) were computed the same way (take a random hex digit and just mod it
> when it's too large), leading to a high probability that your first digit
> was a 1 :)

Schadenfreude is deriving joy from others' misfortunes; what is the
German word, if any, for deriving solace from others' misfortunes? ;-)

Regards,
mk


From: Robert Kern on
On 2010-02-24 12:35 PM, mk wrote:

> While with this:
>
> def gen_rand_word(n):
>     with open('/dev/urandom') as f:
>         return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)
>                         if ord(x) < 235])
>
> a 3852
....

> 1. I'm systematically getting 'a' outlier: have no idea why for now.
>
> 2. This is somewhat better (except 'a') but still not uniform.

I will repeat my advice to just use random.SystemRandom.choice() instead of
trying to interpret the bytes from /dev/urandom directly.
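
For reference, a minimal sketch of that approach, assuming Python 3. choice() is a method, so it is called on a random.SystemRandom instance; the default alphabet (string.ascii_lowercase) and the name _sysrand are choices made for this illustration.

```python
import random
import string

# SystemRandom draws its randomness from os.urandom.
_sysrand = random.SystemRandom()

def gen_rand_word(n, alphabet=string.ascii_lowercase):
    """Return n characters picked independently from alphabet."""
    return "".join(_sysrand.choice(alphabet) for _ in range(n))

print(gen_rand_word(10))
```

This leaves the mapping from random bits to an index to the library, so there is no hand-rolled modulo step to get wrong, and the word always has exactly n characters.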

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
