From: bmearns on
On Feb 9, 9:10 am, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote:
> WTShaw wrote:
> > Remember that with higher bases, entropy may not be easily described
> > in bits, and might have more useful bang for the buck. Too-difficult
> > subjects are avoided because they are that, not because they are not
> > worth knowing or are even superior to the usual gruel.
>
> In textbook sources, English characters (hence base 26) are said to
> have an entropy somewhere between 1.0 and 1.5 bits per character (see
> Stinson). As a layman I don't know the details of how these figures
> were evaluated. Certainly that's not a trivial task, maybe indeed very
> difficult, as you said. But it has nonetheless been done!
>
> M. K. Shen

That's not the English characters themselves, it's the English
language, which has a large amount of redundancy built in (as do all
natural languages). For instance, there are 26^3 = 17576 possible
three-letter combinations, but not all are used as words. Of course,
only a fraction of those are pronounceable, but there are still
pronounceable three-letter combinations that are not used (baz, tuv,
caf, &c.). The language therefore does not convey nearly as much
information per character as it potentially could (of course, there's
a reason for that: redundancy is useful for error detection and
correction).
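To put a rough number on that, here's a quick Python sketch. It
assumes a Unix-style word list at /usr/share/dict/words; that path and
the exact count are system-dependent, so treat the output as
illustrative only.

    # Compare the number of possible three-letter strings against the
    # number actually used as English words. The word-list path is an
    # assumption; counts will vary by system.
    import string

    possible = 26 ** 3  # 17576 possible three-letter combinations
    with open("/usr/share/dict/words") as f:
        words = {line.strip().lower() for line in f}
    used = sum(1 for w in words
               if len(w) == 3 and all(c in string.ascii_lowercase for c in w))

    print("possible three-letter strings:", possible)
    print("actually used as words:       ", used)  # a small fraction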

What Shaw is referring to (if I may) is simply a non-binary alphabet,
unrelated to natural language. You could use the 26-character English
alphabet, or just as well use any other 26 characters, and get an
entropy of about 4.7 Shannons per character (that's simply the base-2
log of the size of the alphabet; 1 Shannon is the maximum amount of
information that 1 bit can convey).
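If you want to check that figure, it's a one-liner (a minimal sketch;
the values are just the base-2 log of the alphabet size):

    # Maximum entropy per character is the base-2 log of the alphabet size.
    from math import log2

    for size in (2, 26, 256):
        print(size, "symbols:", round(log2(size), 4), "Shannons/character")
    # 26 symbols -> 4.7004 Shannons/character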

If you tried to represent this alphabet in binary, you would
necessarily lose some entropy per symbol, because 26 is not a power
of two. To represent each character uniquely and independently would
require 5 bits per character (that's just the ceiling of 4.7, since
you can't encode a partial bit, at least not in this context). You've
now got 4.7 Shannons per 5 bits, or about 0.94 Shannons per bit.
That's a loss, because a bit should be able to convey 1 Shannon
(that's the base-2 log of 2, because the binary alphabet has a size of
2). Hence, depending on the source of the random value, different
bases may be more appropriate to represent those values.
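Here's the same arithmetic spelled out (a sketch of the fixed-width,
per-character encoding described above, not of block encodings that
could do slightly better):

    # A fixed-width binary encoding of a 26-symbol alphabet needs
    # ceil(log2(26)) = 5 bits per character, so each bit carries less
    # than its potential 1 Shannon.
    from math import ceil, log2

    entropy_per_char = log2(26)                    # ~4.70 Shannons
    bits_per_char = ceil(entropy_per_char)         # 5 (no partial bits)
    efficiency = entropy_per_char / bits_per_char  # Shannons per bit

    print(round(entropy_per_char, 4), "Shannons /", bits_per_char,
          "bits =", round(efficiency, 4), "Shannons per bit")  # ~0.94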

Whether or not that's relevant to your original point, I can't say.
Actually, I doubt it is, since AES is defined to work on binary
values. It should be noted, though, that most plaintext is compressed
before being encrypted, so even if the plaintext has only 1 bit of
entropy per octet, the input to the cipher will likely have much more
than that.
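For instance (a rough sketch using Python's zlib; the sample text is
made up, and a real pipeline would compress the actual message before
handing it to the cipher):

    # Compressing low-entropy plaintext packs more entropy into each
    # octet that the cipher actually sees.
    import zlib

    plaintext = b"the quick brown fox jumps over the lazy dog " * 200
    compressed = zlib.compress(plaintext, 9)

    print("plaintext: ", len(plaintext), "octets")
    print("compressed:", len(compressed), "octets")
    # The same information now occupies far fewer octets, so the entropy
    # per octet of the cipher's input is correspondingly higher.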

-Brian
From: bmearns on
On Feb 9, 1:58 am, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote:
> unruh wrote:
> >> Just an observation: If one uses a good block cipher like AES to
> >> encrypt, it is common that a single key is used to process a fairly
> >> long plaintext stream. But the key has at most 128 bits of entropy.
> >> Isn't it a miracle that the resulting ciphertext stream (a result
> >> of combination) has very high entropy? Or is it rather the case that
> >> the ciphertext stream doesn't possess much higher entropy per bit
> >> "after all" in comparison with the plaintext stream (the enhancement
> >> of entropy per bit being at most 128 divided by the (commonly
> >> relatively large) total number of bits being processed) and thus the
> >> achieved security, in which one "believes", is actually an illusion
> >> (cf. optical illusions)?
>
> > Of course it is. We know the attack. Try every one of the 2^128 keys and
> > see which one works. That is the 128 bits of entropy. The problem is
> > that "try all 2^128 keys" is really, really tedious. I.e., this indicates
> > that if done properly, 128 bits of entropy is sufficient to hide any
> > text you wish, for all practical purposes. (at least for now). 128 bits
> > of entropy is really a lot of different states.
>
> My point is this: if one considers that the input natural-language
> text has, say, 1 bit of entropy per character (this is the lowest
> estimate I have seen in any textbook), then the ciphertext from
> encryption using AES has not much more than 1.0001 bits of entropy per
> character. I am not sure that people are commonly conscious of that.
>
> M. K. Shen

I'm also not sure that people commonly care about it. In my admittedly
limited experience, I've never come across any discussion of
encryption that relates the security to the entropy of the ciphertext.
Security comes from the inability to reverse the encryption process
without the key.

Notice that the entropy of the ciphertext isn't the limiting factor.
In fact, the ciphertext can't have less total entropy than the
plaintext, or it would be destroying information (that's strictly
possible, but it would make for a lossy encryption scheme: you
couldn't guarantee that it would decrypt to the correct plaintext).
Therefore, the entropy of the plaintext will always be the limiting
factor. In other words, a brute force attack would make more sense
against the plaintext than against the ciphertext.

To take your example, imagine a plaintext of 1000000 characters, with
1 bit of entropy per character. That's 1000000 bits of total entropy
in the plaintext. Assuming a perfectly random key and a good cipher,
the ciphertext will have 1000128 bits of total entropy. Not a big
improvement, but it's still more than the plaintext, so if you really
think a brute force attack on 1000128 bits of entropy is plausible,
then a brute force attack against the 1000000 bits of plaintext
entropy is even more plausible.
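The arithmetic, for what it's worth (assuming, as above, that a
perfectly random key simply adds its 128 bits to the total):

    # 1,000,000 characters at 1 bit of entropy each, plus a 128-bit key.
    plaintext_entropy = 1_000_000
    key_entropy = 128
    ciphertext_entropy = plaintext_entropy + key_entropy

    print(ciphertext_entropy, "bits total")
    print(ciphertext_entropy / 1_000_000, "bits per character")  # 1.000128
    # A negligible increase per character, but still never *less* total
    # entropy than the plaintext.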

Fortunately, a brute force attack against 1000000 bits of entropy is
not plausible, nor even a brute force attack against 128 bits. In case
you're unfamiliar with the context, the estimated age of the universe
is less than 2^59 seconds. If you could try 2^40 different variants
per second (more than 1 trillion), it would still take more than half
a billion times the current age of the universe to try all variants
(2^128/2^40/2^59 = 2^(128-40-59) = 2^29 = 536870912).
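If you want to check that, here's the exponent arithmetic as a quick
sketch:

    # Time to try all 2^128 keys at a hypothetical 2^40 keys/second,
    # measured in multiples of the universe's age (~2^59 seconds).
    AGE_OF_UNIVERSE = 2 ** 59  # seconds, rough upper bound
    RATE = 2 ** 40             # hypothetical keys tried per second

    seconds_needed = 2 ** 128 // RATE       # 2^88 seconds
    universes = seconds_needed // AGE_OF_UNIVERSE

    print(universes)  # 2^29 = 536870912 ages of the universe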

And of course, neither of those is the route a brute force attacker
would take, anyway. Except for short messages, the key will typically
have the least total entropy of anything in the system. Instead of
trying to enumerate all the 2^1000000 different possible plaintexts,
or 2^1000128 possible ciphertexts (I don't know what good that would
do you, anyway), a brute force attacker would enumerate all the 2^128
possible keys. But, as I said, that's really not remotely plausible.

More generally, the entropy per symbol is irrelevant in terms of brute
force attacks, whether you're talking about the key, the plaintext, or
the ciphertext. Even if all the entropy the ciphertext had was the 128
bits from the key, it wouldn't matter if that was spread out over 128
binary digits, or a million octets; there are still 2^128 different
possible values. The entropy per symbol is only relevant to the
efficiency of the encoding.
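To illustrate (a small sketch; only the rendered lengths matter, the
value itself is random):

    # The same 128-bit value written three ways; the number of possible
    # values, 2^128, is unchanged by the choice of representation.
    import secrets

    value = secrets.randbits(128)          # one of 2^128 possible values

    as_binary = format(value, "0128b")     # 128 binary digits
    as_hex = format(value, "032x")         # 32 hex digits
    as_octets = value.to_bytes(16, "big")  # 16 octets

    print(len(as_binary), len(as_hex), len(as_octets))  # 128 32 16
    # Padding it out to a million octets would change the encoding's
    # efficiency, not the count of values an attacker must try.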

So in summary: the entropy of the ciphertext is inconsequential in
terms of security. If it has low entropy, that is necessarily because
the plaintext has low entropy.

-Brian
From: Mok-Kong Shen on
bmearns wrote:
[snip]
> More generally, the entropy per symbol is irrelevant in terms of brute
> force attacks, whether you're talking about the key, the plaintext, or
> the ciphertext. Even if all the entropy the ciphertext had was the 128
> bits from the key, it wouldn't matter if that was spread out over 128
> binary digits, or a million octets; there are still 2^128 different
> possible values. The entropy per symbol is only relevant to the
> efficiency of the encoding.
>
> So in summary: the entropy of the ciphertext is inconsequential in
> terms of security. If it has low entropy, that is necessarily because
> the plaintext has low entropy.

Your arguments are understandable. However, I have one layman's
problem: isn't the one theoretically "perfect" security, provided by
the OTP, itself based on the concept of entropy?

Thanks,

M. K. Shen
From: Mok-Kong Shen on
Spinner wrote:

> Silly statement. The fact that a totally provably random OTP has never
> been created is going to be a helluva surprise to a lot of people who
> have been using them for many, many years. Not to mention there are
> websites where you can go get a list of quantum-generated (radioactive
> decay) random numbers.

The problem is "proving" that what one has at hand "is" an OTP. I
can't conceive how one could deliver a proof in the sense of proofs in
math. This is analogous to the case of a point in elementary geometry:
one simply can't have in the real world a point that has no "size",
though one could have an arbitrarily good approximation of it. On the
matter of OTPs in "practice", see also the story of Venona.

M. K. Shen
From: bmearns on
On Feb 9, 3:20 pm, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote:
> Spinner wrote:
> > Silly statement. The fact that a totally provably random OTP has
> > never been created is going to be a helluva surprise to a lot of
> > people who have been using them for many, many years. Not to mention
> > there are websites where you can go get a list of quantum-generated
> > (radioactive decay) random numbers.
>
> The problem is "proving" that what one has at hand "is" an OTP. I
> can't conceive how one could deliver a proof in the sense of proofs in
> math. This is analogous to the case of a point in elementary geometry:
> one simply can't have in the real world a point that has no "size",
> though one could have an arbitrarily good approximation of it. On the
> matter of OTPs in "practice", see also the story of Venona.
>
> M. K. Shen

This in particular is far from my realm of expertise, but as I
understand it, it is the generator itself which can be proven to be
random. It's true that you can't look at the pad itself and prove that
it is truly random, but if the values come from a truly random
physical process, like a quantum effect, then the generator can be
proven random, and therefore so is the data it produces.

-Brian