From: Unruh on 7 Dec 2006 18:47

Peter Fairbrother <zenadsl6186(a)zen.co.uk> writes:

>Unruh wrote:
>> "Joseph Ashwood" <ashwood(a)msn.com> writes:
>
>>> I'll show how by using something we can prove is uniformly distributed, but
>>> is cryptographically weak enough for the process to be seen; reduction
>>> modulo 2^64, let's call it F(x). It is trivial to prove that the output
>>> of F is uniformly distributed across the integers [0, 2^64-1]. However, we
>>
>> It is uniformly distributed across any uniformly distributed
>> input with length greater than 64 bits.

>scary BOO !!!!!

>And, no, it ain't, neither.

>Over a uniformly distributed set of inputs between 0 and 2^64.5 bits, output
>values between 0 and 2^63 will have twice the probability of occurrence
>compared with values between 2^63 and 2^64.

What is half a bit? Note what I said: "across any uniformly distributed
input with length greater than 64 bits". Length, measured in bits, is an
integer. A number is either 64 bits long or 65 bits long, not 64.5 bits
long. (I.e., the words were more carefully chosen than it at first seems.)

>Or something like that, I'm drunk and probably got the numbers wrong
>somewhere. or not, if I'm lucky :)
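The disagreement is easy to check numerically. A minimal simulation sketch
in Python; the trial count and the use of the standard library's
random.randrange are illustrative choices, not anything from the thread:

import random
from math import isqrt

TRIALS = 200_000
MOD = 2**64                       # F(x) = x mod 2^64
HALF = 2**63                      # the boundary Peter talks about

# Peter's case: inputs uniform over [0, 2^64.5).
# floor(2^64.5) computed exactly as isqrt(2^129).
top = isqrt(2**129)
low = sum(1 for _ in range(TRIALS)
          if random.randrange(top) % MOD < HALF)
print(low / TRIALS)   # ~0.65 vs ~0.35 for the upper half: outputs below
                      # 2^63 are nearly (though not exactly) twice as likely

# Unruh's case: inputs uniform over all 65-bit values [0, 2^65),
# a whole multiple of 2^64, so the reduction is exactly uniform.
low = sum(1 for _ in range(TRIALS)
          if random.randrange(2**65) % MOD < HALF)
print(low / TRIALS)   # ~0.50

The simulation agrees with both posters in turn: a fractional-power input
range biases the low outputs, while a range whose size is a whole multiple
of 2^64 reduces uniformly.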
From: Unruh on 7 Dec 2006 18:52

Peter Fairbrother <zenadsl6186(a)zen.co.uk> writes:

>Unruh wrote:
>> "Sergei" <silentser(a)gmail.com> writes:
>>
>>> Joseph Ashwood wrote:
>>>> "Sergei" <silentser(a)gmail.com> wrote in message
>>>> news:1165441615.135737.121220(a)f1g2000cwa.googlegroups.com...
>>>>
>>>>> But if SHA-1 is a uniformly distributed function, how could the Chinese
>>>>> guys claim that it is possible to find a collision in 2^63 hash
>>>>> operations?
>>>>
>>>> I'll show how by using something we can prove is uniformly distributed, but
>>>> is cryptographically weak enough for the process to be seen; reduction
>>>> modulo 2^64, let's call it F(x). It is trivial to prove that the output
>>>> of F is uniformly distributed across the integers [0, 2^64-1]. However, we
>>>> can also trivially compute x's that collide by simply adding 2^64 to the
>>>> previous x. F is perfectly uniform across all integers x, but is trivially
>>>> weak.
>>>>
>>>> Uniform distribution among the output set is not enough for cryptographic
>>>> strength.

>Yes. Though Joseph's example is cr*p.
>
>>> Sorry for another dilettantish question, but what is "a uniformly
>>> distributed function"? I thought this is simply another name for a
>>> random oracle function. If this is a function that simply gives
>>> uniformly distributed outputs when given uniformly distributed inputs,
>>> then, for example, F(x)=x is also uniformly distributed. But what does
>>> it have to do with the collision probabilities in the case described
>>> here?
>>
>> It is precisely relevant. He wants to define collisions as collisions
>> between different inputs producing the same output. Obviously if the inputs
>> are not uniform, then the outputs are not, since the function is
>> deterministic (same input always gives the same output).

>No, they aren't. In fact the precise opposite would be true of a good hash
>function. A good hash function should give randomly-distributed outputs
>whether or not the inputs were randomly distributed (as long as the inputs
>are different).

Of course it would not. The outputs are deterministically dependent on the
input. Thus if the input is biased, so is the output. Of course the biases
may not be obvious. For example, let's say that there are only two inputs;
then there are only two, definite, outputs.

>
>> Then assuming that the input is uniform, so is the output (or at least that
>> is the assumption).

>No, that isn't the assumption.

>> If that is true, then given 2^x different inputs, the
>> probability
>> of having a collision is 1-exp(-2^(2x-N)) for x<N.

>No, it isn't. It wouldn't be even if that was true (which it sort-of is, but

No, what isn't? I am afraid your sentence is all tangled up in its referents.

>it's not the assumption)

>You know better than this.

>--
>Peter
>who would hide behind your chair
>and steal your crystallised ginger
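For reference, the usual birthday-bound form of the formula above is
p ~ 1 - exp(-n^2 / 2^(N+1)) for n inputs into an N-bit hash, i.e.
1 - exp(-2^(2x-N-1)) with n = 2^x; the extra factor of two in the exponent
may be what Peter's "sort-of" refers to. A small sketch in Python (the
sample values of x are illustrative):

from math import expm1

def collision_prob(x, N):
    """Birthday approximation: P(at least one collision) among
    2^x uniformly random N-bit outputs."""
    return -expm1(-(2.0 ** (2 * x - N - 1)))   # == 1 - exp(...)

# SHA-1-sized output, N = 160 bits:
for x in (60, 70, 80, 81):
    print(x, collision_prob(x, 160))
# Negligible until x approaches N/2; at x = 80 (the usual 2^80
# rule of thumb) the probability is about 0.39.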
From: Joseph Ashwood on 8 Dec 2006 04:19

"Sergei" <silentser(a)gmail.com> wrote in message
news:1165502590.529789.195440(a)n67g2000cwd.googlegroups.com...
> Sure. But what does random data have to do with "a few dozens of
> __terabytes__ of data", which should be partitioned into blocks and hashed?

My estimate went to the level of 1024 TB in 512 KB pieces, and scales very
well up to 2^275 bytes, for which we don't even have designations.

> As I understood, all the estimations here were made by assuming that the
> data was random.

Actually they assume a much weaker constraint: that the data wasn't chosen
to either collide on the block size or to break the hash you choose. My
proposal included encryption only because that is generally desirable in
such large situations.

> But why do you think this "few dozens of __terabytes__ of data" contains
> random data?

We don't.
Joe
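A back-of-the-envelope check of those figures (Python; assuming binary
units, 1 TB = 2^40 bytes and 1 KB = 2^10 bytes):

total = 1024 * 2**40        # 1024 TB = 2^50 bytes
block = 512 * 2**10         # 512 KB  = 2^19 bytes
print(total // block)       # 2^31 = 2147483648 blocks to hash,
                            # far below the ~2^80 birthday region
                            # of a 160-bit hash
print((2**275) // block == 2**256)   # True: even the upper end Joe
                                     # mentions is "only" 2^256 blocks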
From: Sergei on 8 Dec 2006 04:54

Joseph Ashwood wrote:
> "Sergei" <silentser(a)gmail.com> wrote in message
> news:1165502590.529789.195440(a)n67g2000cwd.googlegroups.com...
> > Sure. But what does random data have to do with "a few dozens of
> > __terabytes__ of data", which should be partitioned into blocks and hashed?
>
> My estimate went to the level of 1024 TB in 512 KB pieces, and scales very
> well up to 2^275 bytes, for which we don't even have designations.
>
> > As I understood, all the estimations here were made by assuming that the
> > data was random.
>
> Actually they assume a much weaker constraint: that the data wasn't chosen
> to either collide on the block size or to break the hash you choose. My
> proposal included encryption only because that is generally desirable in
> such large situations.
>
> > But why do you think this "few dozens of __terabytes__ of data" contains
> > random data?
>
> We don't.
> Joe

Ok, but can't such estimations as "approximately one collision among 2^64
(resp. 2^80) different inputs" be made only for the ideal case, when the
hashes produce truly random outputs for any kind of input? Basically, I
want to know what you would say if I claimed a slightly different
estimation for MD5 (resp. SHA-1): approximately one collision among 2^62
(resp. 2^77) different inputs. Would you object?

I understand this is a little bit off topic, but there are no crypto
specialists in my department and I'm interested in this topic.

Sergei
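One way to see what Sergei's adjusted figures would mean: under the ideal
birthday model, 2^62 (resp. 2^77) inputs are still well short of the
expected first collision, so claiming a collision there amounts to saying
the hash behaves a few bits worse than ideal. A sketch reusing the same
approximation (the pairings of input counts and output sizes are just
Sergei's numbers):

from math import expm1

for label, x, N in [("MD5,   2^64 inputs", 64, 128),
                    ("MD5,   2^62 inputs", 62, 128),
                    ("SHA-1, 2^80 inputs", 80, 160),
                    ("SHA-1, 2^77 inputs", 77, 160)]:
    p = -expm1(-(2.0 ** (2 * x - N - 1)))   # birthday approximation
    print(f"{label}: P(collision) ~ {p:.4f}")
# 0.3935 vs 0.0308, and 0.3935 vs 0.0078: an ideal hash is quite
# unlikely to have collided by 2^62 (resp. 2^77) inputs, so the
# adjusted estimate is a claim of non-ideality, not a rounding change.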
From: Joseph Ashwood on 8 Dec 2006 05:49
"Sergei" <silentser(a)gmail.com> wrote in message news:1165571665.354823.86170(a)f1g2000cwa.googlegroups.com... > > Joseph Ashwood wrote: >> "Sergei" <silentser(a)gmail.com> wrote in message >> news:1165502590.529789.195440(a)n67g2000cwd.googlegroups.com... >> > Sure. But what random data has to do with "few doesns of __terabytes__ >> > of data", which should be partishioned in blocks and hashed? >> >> My estimate went to the level of 1024 TB in 512KB pieces, and scales very >> well up to 2^275 bytes for which we don't even have designations. >> >> > As I >> > understood, all the estimations here were made by assuming that the >> > data was random. >> >> Actually they assume a much weaker constraint that the data wasn't chosen >> to >> either collide on the block size or to break the hash you choose. My >> proposal included encryption only because that is generally desirable in >> such large situations. >> >> > But why do you think this "few doesns of __terabytes__ >> > of data" contain a random data? >> >> We don't. >> Joe > > Ok, but aren't such estimations as "approximately one collision among > 2^64 (resp. 2^80) different inputs" can be made only for the ideal > case when the hashes produce truly random outputs for any kinds of > inputs? You can think of cryptographic hash functions as entropy distillers, that is you can think of them as collecting as much randomness from the input as possible and generating their output. This means that in most environments they do function as if it were the ideal case. > Basically, I want to know, what would say if I claim a slightly > different estimation for MD5 (resp. SHA-1): approximately one collision > among 2^62 (resp. 2^77) different inputs? Would you object? I would not object with any real conviction. Such differences are irrelevant when talking about merely 2^48 bytes (256 TB). If you were at the level of 2^60 blocks, then it would start to make a difference, but then I would recommend moving to SHA-256 which will be good to 2^128 blocks anyway. > I undestand, this is a little bit off topic, but there is no crypto > specialists in my department and I'm interested in this topic. Actually, it's not too far off topic, the effort necessary to find a collision in cryprographic hashes is a valid topic here. Joe |