My BLT [Cryptography]


From: bmearns on 23 Feb 2010 14:14

On Feb 23, 12:36 pm, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote:
> WTShaw wrote:
> > It's a matter of counting, dividing all 27 characters into three major
> > groups, dividing each of these major groups in groups of triads, and
> > dividing each triad into three characters.
>
> > For 332, pick any two of the characters in pool 3 and one in pool 2.
> > for x: mtr zgb dmx all are x
>
> > For "con sid er/ hwt pkm abl yfg jqu vxz," c=111 o=112 n=113 s=121
> > i=122 d=123 e=131 r=132 /=133 for the first nine.
>
> > Pick 1 character, d=123. pick a letter from pool1, pool2, and pool3..
> > any letter from each pool.
> > these would work; eku aiz cwu as all mean "d." For d, there are 9X9X8
> > possibilities.
> > For x, there are 8X7X9 possibilities, remembering that after one use
> > from pool3 which also contains / there is a shrinking pool for choosing
> > the next pool3 ciphertext letter.
>
> Sorry for my cluelessness. You seem to define a way of mapping each
> letter of the alphabet to three numerals. But how does the encryption
> process go after that step? Is your scheme essentially a
> monoalphabetic substitution? Or else a homophonic substitution (but then
> how, i.e. in which manner)? (Excuse me that I haven't yet understood
> much of what you wrote.)
>
> M. K. Shen

MK-

He's taken his key as the 27 character sequence "consider/
hwtpkmablyfgjquvxz". Break this into a three by three square of
trigraphs:
con sid er/
hwt pkm abl
yfg jqu vxz

The first number indicates what row the letter appears in, the second
is the column, and the third is the position in the trigraph at that
cell. So the letter d is in row 1, column 2, position 3: its index is
123. The letter f is row 3, column 1, position 2: index 312.
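
A minimal Python sketch of the indexing just described, assuming the key is stored as one 27-symbol string read row by row. The function name index_of is mine, not Shaw's.

```python
KEY = "consider/hwtpkmablyfgjquvxz"  # 27 symbols, read as a 3x3 grid of trigraphs

def index_of(symbol, key=KEY):
    """Return the three-digit index (row, column, position) as a string."""
    n = key.index(symbol)
    row = n // 9          # nine symbols per row of the square
    col = (n % 9) // 3    # three symbols per trigraph
    pos = n % 3           # position inside the trigraph
    return f"{row + 1}{col + 1}{pos + 1}"

print(index_of("d"))  # 123
print(index_of("f"))  # 312
```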

So for enciphering, he starts by encoding each letter of plaintext
into this three-digit index as just shown. Then, he has to come up
with a way of representing that index. For this he uses the pools, which are
derived in a different manner: Pool 1 is the letters at position 1 of
each trigraph, Pool 2 is all the letters in position 2, and likewise
with Pool 3. So to encode the 3-digit index, he picks 3 letters, where
each letter is chosen arbitrarily from the pool with the corresponding
number. So for instance, as we saw the plaintext letter f has index
312, so it would be encoded with a letter from pool 3, followed by a
letter from pool 1, and then a letter from pool 2. The pools for this
key are:
1: csehpayjv
2: oirwkbfqx
3: nd/tmlguz
So f (312) could be encoded as "nco" or "djq" or "gex".
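
The pool construction and the encoding step can be sketched in Python as follows. This is only an illustration under my own assumptions: the helper names are made up, the plaintext must consist of key symbols (so a space would have to be written as "/"), and the standard random module stands in for whatever "arbitrary" choice the encoder actually makes.

```python
import random

KEY = "consider/hwtpkmablyfgjquvxz"

def pools(key=KEY):
    """Pool k holds the symbols found at position k of every trigraph."""
    return {str(k + 1): key[k::3] for k in range(3)}

def index_of(symbol, key=KEY):
    n = key.index(symbol)
    return f"{n // 9 + 1}{n % 9 // 3 + 1}{n % 3 + 1}"

def encode(plaintext, key=KEY, rng=random):
    """Replace each plaintext symbol by three letters, one per index digit."""
    p = pools(key)
    groups = []
    for symbol in plaintext:
        # Each digit of the index names the pool the next letter is drawn from.
        groups.append("".join(rng.choice(p[d]) for d in index_of(symbol, key)))
    return " ".join(groups)

print(pools())       # {'1': 'csehpayjv', '2': 'oirwkbfqx', '3': 'nd/tmlguz'}
print(encode("f"))   # some letter from pool 3, then pool 1, then pool 2
```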

Decoding is just the reverse: look up each letter of ciphertext to see
which pool it's in, then write down the number of the pool. Divide the
numbers into sets of three, and then each set gives an index into the
table from earlier, which points to a plaintext letter.
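
As a rough sketch of that decoding procedure in Python (the function and variable names are mine, and it assumes the ciphertext length is a whole number of three-letter groups):

```python
KEY = "consider/hwtpkmablyfgjquvxz"

def decode(ciphertext, key=KEY):
    # Map every key symbol to the number of the pool it belongs to.
    pool_of = {ch: str(n % 3 + 1) for n, ch in enumerate(key)}
    # Map every three-digit index back to the symbol stored at that cell.
    symbol_at = {}
    for n, ch in enumerate(key):
        symbol_at[f"{n // 9 + 1}{n % 9 // 3 + 1}{n % 3 + 1}"] = ch
    # Blanks between groups are not key symbols, so they are skipped here.
    digits = "".join(pool_of[ch] for ch in ciphertext if ch in pool_of)
    return "".join(symbol_at[digits[i:i + 3]] for i in range(0, len(digits), 3))

print(decode("nco djq gex"))  # fff
```

All three sample encodings of f collapse to the same digit string 312, which is why they decode identically.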

I'm very curious, though, to see if Shaw can offer any cryptanalysis
on this cipher. It's interesting, but I'm not convinced of its
security.

-Brian

From: Richard Outerbridge on 23 Feb 2010 16:52

In article
<f7a29c3d-3fa8-4b27-8030-b343d57ce885(a)n5g2000vbq.googlegroups.com>,
bmearns <mearns.b(a)gmail.com> wrote:

> I'm very curious, though, to see if Shaw can offer any cryptanalysis
> on this cipher. It's interesting, but I'm not convinced of it's
> security.

Let's take a look at etaion, for English.

e : 131 [csehpayjv] [nd/tmlguz] [csehpayjv] 12.702%
t : 213 [oirwkbfqx] [csehpayjv] [nd/tmlguz] 9.056%
a : 231 [oirwkbfqx] [nd/tmlguz] [csehpayjv] 8.167%
i : 122 [csehpayjv] [oirwkbfqx] [oirwkbfqx] 6.996%
o : 112 [csehpayjv] [csehpayjv] [oirwkbfqx] 7.507%
n : 113 [csehpayjv] [csehpayjv] [nd/tmlguz] 6.749%

Total: 51.177%

Assuming you know the method (and the method is assumed known), even
though there are 729 different ways of encoding each of these letters as
a trigram (thus making it a homophonic), the encodings will not be
absolutely unrelated, as they would be in a true homophonic, or nomenclature.

I suspect the patterns would rapidly reveal themselves to frequency
analysis, and the original square deduced therefrom, even if all 729
different encodings for each letter are "randomly" employed. I have
no estimate for the unicity distance (how much ciphertext is needed
for key recovery).
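
To make the structural relation concrete, here is a small Python sketch (my own illustration, using the key from earlier in the thread and the frequencies listed above). For each of the six letters it prints the index, the three pools the ciphertext letters are drawn from, and the number of possible spellings. Every spelling of a given letter shares the same pool pattern, which is how I read the point about the encodings not being unrelated.

```python
KEY = "consider/hwtpkmablyfgjquvxz"
POOLS = [KEY[k::3] for k in range(3)]
FREQ = {"e": 12.702, "t": 9.056, "a": 8.167, "i": 6.996, "o": 7.507, "n": 6.749}

def index_of(symbol):
    n = KEY.index(symbol)
    return (n // 9 + 1, n % 9 // 3 + 1, n % 3 + 1)

for letter in FREQ:
    idx = index_of(letter)
    choices = [POOLS[d - 1] for d in idx]          # one pool per index digit
    count = len(choices[0]) * len(choices[1]) * len(choices[2])
    print(letter, idx, choices, count)             # count is 9*9*9 = 729

print(round(sum(FREQ.values()), 3))  # 51.177
```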

outer

From: David Eather on 23 Feb 2010 19:57

On 24/02/2010 7:52 AM, Richard Outerbridge wrote:
> In article
> <f7a29c3d-3fa8-4b27-8030-b343d57ce885(a)n5g2000vbq.googlegroups.com>,
> bmearns<mearns.b(a)gmail.com> wrote:
>
>> I'm very curious, though, to see if Shaw can offer any cryptanalysis
>> on this cipher. It's interesting, but I'm not convinced of it's
>> security.
>
> Let's take a look at etaion, for English.
>
> e : 131 [csehpayjv] [nd/tmlguz] [csehpayjv] 12.702%
> t : 213 [oirwkbfqx] [csehpayjv] [nd/tmlguz] 9.056%
> a : 231 [oirwkbfqx] [nd/tmlguz] [csehpayjv] 8.167%
> i : 122 [csehpayjv] [oirwkbfqx] [oirwkbfqx] 6.996%
> o : 112 [csehpayjv] [csehpayjv] [oirwkbfqx] 7.507%
> n : 113 [csehpayjv] [csehpayjv] [nd/tmlguz] 6.749%
>
> Total: 51.177%
>
> Assuming you know the method (and the method is assumed known) even
> though there are 729 different ways of encoding each of these trigrams
> (thus making it a homophonic) they will not be absolutely unrelated, as
> is the case with a true homophonic, or nomenclature.
>
> I suspect the patterns would rapidly reveal themselves to frequency
> analysis, and the original square deduced therefrom, even if all 729
> different encodings for each letter are "randomly" employed. I have
> no estimate for the unicity distance (how much ciphertext is needed
> for key recovery).
>
> outer

I used the following table to calculate the unicity distance for an
alphabet of 27 letters. I included the SPACE character as it would be
the most useful for comprehension of an encrypted message.
The text was from "The Leading Facts of English History" by D. H.
Montgomery (c)1887. So, some small frequency variation is to be expected
when used on other texts. The calculated entropy rate per character is
4.097 bits.

Alphabetical order         Sorted by frequency
Char   Count     Freq      Char   Count     Freq
SPACE   1759   17.59%      SPACE   1759   17.59%
A        633    6.33%      E       1059   10.59%
B        123    1.23%      T        771    7.71%
C        247    2.47%      A        633    6.33%
D        342    3.42%      O        627    6.27%
E       1059   10.59%      N        597    5.97%
F        214    2.14%      I        558    5.58%
G        165    1.65%      R        534    5.34%
H        498    4.98%      S        526    5.26%
I        558    5.58%      H        498    4.98%
J         13    0.13%      D        342    3.42%
K         45    0.45%      L        334    3.34%
L        334    3.34%      C        247    2.47%
M        200    2.00%      F        214    2.14%
N        597    5.97%      M        200    2.00%
O        627    6.27%      U        195    1.95%
P        155    1.55%      G        165    1.65%
Q          9    0.09%      W        162    1.62%
R        534    5.34%      P        155    1.55%
S        526    5.26%      Y        133    1.33%
T        771    7.71%      B        123    1.23%
U        195    1.95%      V         77    0.77%
V         77    0.77%      K         45    0.45%
W        162    1.62%      X         19    0.19%
X         19    0.19%      J         13    0.13%
Y        133    1.33%      Q          9    0.09%
Z          5    0.05%      Z          5    0.05%
Total  10000  100.00%      Total  10000  100.00%

Since each character can be encoded in 729 different ways, the
probability of each character is divided by 729. The entropy of each
encoding is calculated as

H = -1 * P * LOG(P) / LOG(2)

and then summed, remembering that there are 729 encodings for each
character (an even distribution is assumed). So, based on single-character
probabilities, the rate of the encoding is 13.607 bits per encoding (that
is, per 3 digit group).

This gives a redundancy of

D = 13.607 - 4.097 (each 3 char grouping is only carrying the information
of the 27 character alphabet)
= 9.51 bits per encoding

and the unicity distance becomes

U = 13.607/9.51
= 3.32 characters

which is very, very bad. What it means is that because of the high
amount of redundancy in the cipher, any decryption that makes sense and
is over 4 characters long has a high probability of being correct.
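
A minimal Python sketch of the arithmetic above, assuming the per-encoding rate is obtained by spreading each character's probability evenly over its 729 encodings (which works out to the plaintext entropy plus log2(729)). The counts are copied from the table; the variable names are mine. It prints both quotients because 13.607/9.51 on its own evaluates to about 1.43, while 3.32 is what 13.607/4.097 gives, so I am not certain which ratio was intended.

```python
from math import log2

# Character counts from the 10000-character sample in the table above.
COUNTS = {
    " ": 1759, "A": 633, "B": 123, "C": 247, "D": 342, "E": 1059, "F": 214,
    "G": 165, "H": 498, "I": 558, "J": 13, "K": 45, "L": 334, "M": 200,
    "N": 597, "O": 627, "P": 155, "Q": 9, "R": 534, "S": 526, "T": 771,
    "U": 195, "V": 77, "W": 162, "X": 19, "Y": 133, "Z": 5,
}

total = sum(COUNTS.values())
probs = [c / total for c in COUNTS.values()]

# Single-character entropy of the plaintext, in bits per character.
h_plain = -sum(p * log2(p) for p in probs)

# Entropy per three-letter group if all 729 encodings are equally likely.
h_cipher = -sum(729 * (p / 729) * log2(p / 729) for p in probs)

redundancy = h_cipher - h_plain  # equals log2(729)

print(f"plaintext entropy : {h_plain:.3f} bits")
print(f"rate per encoding : {h_cipher:.3f} bits")
print(f"redundancy        : {redundancy:.2f} bits")
print(f"rate / redundancy : {h_cipher / redundancy:.2f}")
print(f"rate / entropy    : {h_cipher / h_plain:.2f}")
```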

From: David Eather on 24 Feb 2010 07:51

On 24/02/2010 10:57 AM, David Eather wrote:
> On 24/02/2010 7:52 AM, Richard Outerbridge wrote:
>> In article
>> <f7a29c3d-3fa8-4b27-8030-b343d57ce885(a)n5g2000vbq.googlegroups.com>,
>> bmearns<mearns.b(a)gmail.com> wrote:
>>
>>> I'm very curious, though, to see if Shaw can offer any cryptanalysis
>>> on this cipher. It's interesting, but I'm not convinced of it's
>>> security.
>>
>> Let's take a look at etaion, for English.
>>
>> e : 131 [csehpayjv] [nd/tmlguz] [csehpayjv] 12.702%
>> t : 213 [oirwkbfqx] [csehpayjv] [nd/tmlguz] 9.056%
>> a : 231 [oirwkbfqx] [nd/tmlguz] [csehpayjv] 8.167%
>> i : 122 [csehpayjv] [oirwkbfqx] [oirwkbfqx] 6.996%
>> o : 112 [csehpayjv] [csehpayjv] [oirwkbfqx] 7.507%
>> n : 113 [csehpayjv] [csehpayjv] [nd/tmlguz] 6.749%
>>
>> Total: 51.177%
>>
>> Assuming you know the method (and the method is assumed known) even
>> though there are 729 different ways of encoding each of these trigrams
>> (thus making it a homophonic) they will not be absolutely unrelated, as
>> is the case with a true homophonic, or nomenclature.
>>
>> I suspect the patterns would rapidly reveal themselves to frequency
>> analysis, and the original square deduced therefrom, even if all 729
>> different encodings for each letter are "randomly" employed.

I suspect it will be a surprisingly large amount of cipher text required
(compared to most pen and paper ciphers) if all 729 encodings for each
letter are "randomly" employed. My back-of-the-envelope estimate suggests
1458 three letter groupings before a repeat of 2 three letter
groupings.

>> I have
>> no estimate for the unicity distance (how much ciphertext is needed
>> for key recovery).
>>
>> outer
>
> I used the following table to calculate the unicity distance for an
> alphabet of 27 letters. I included the SPACE character as it would be
> the most useful for comprehension of an encrypted message.
> The text was from "The Leading Facts of English History" by D. H.
> Montgomery (c)1887. So, some small frequency variation is to be expected
> when used on other texts. The calculated entropy rate per character is
> 4.097 bits.
>
I'll try to reformat the table

> Alphabetical order         Sorted by frequency
> Char   Count     Freq      Char   Count     Freq
> SPACE   1759   17.59%      SPACE   1759   17.59%
> A        633    6.33%      E       1059   10.59%
> B        123    1.23%      T        771    7.71%
> C        247    2.47%      A        633    6.33%
> D        342    3.42%      O        627    6.27%
> E       1059   10.59%      N        597    5.97%
> F        214    2.14%      I        558    5.58%
> G        165    1.65%      R        534    5.34%
> H        498    4.98%      S        526    5.26%
> I        558    5.58%      H        498    4.98%
> J         13    0.13%      D        342    3.42%
> K         45    0.45%      L        334    3.34%
> L        334    3.34%      C        247    2.47%
> M        200    2.00%      F        214    2.14%
> N        597    5.97%      M        200    2.00%
> O        627    6.27%      U        195    1.95%
> P        155    1.55%      G        165    1.65%
> Q          9    0.09%      W        162    1.62%
> R        534    5.34%      P        155    1.55%
> S        526    5.26%      Y        133    1.33%
> T        771    7.71%      B        123    1.23%
> U        195    1.95%      V         77    0.77%
> V         77    0.77%      K         45    0.45%
> W        162    1.62%      X         19    0.19%
> X         19    0.19%      J         13    0.13%
> Y        133    1.33%      Q          9    0.09%
> Z          5    0.05%      Z          5    0.05%
> Total  10000  100.00%      Total  10000  100.00%
>
> since each character can be encoded in 729 different ways the
> probability of each character is divided by 729. Entropy of each
> character is calculated
>
> H = P*LOG(p)/LOG(2) * -1
>
> and then summed remembering that there are 729 encodings for each
> character (an even distribution is assumed)so based on single character
> probabilities the rate of the encoding is 13.607 bit per encoding (that
> is per 3 digit group)
>
> This gives a redundancy of
>
> D = 13.607 - 4.097 (each 3 char grouping is only carrying to information
> of the 27 character alphabet)
> = 9.51 bits per encoding
>
> and the unicity distance becomes
>
> U = 13.607/9.51
> = 3.32 characters
>
> which is very, very bad. What it means is that because of the high
> amount of redundancy in the cipher, any decryption that makes sense and
> is over 4 characters long has a high probability to be correct.
>
>
>
>
>
I should add, on the plus side, that frequency analysis would require
a large amount of ciphertext, which may be impossible to come by in a
pen and paper system. It also looks fairly hard to brute-force naively,
and two encryptions of the same message (with the 3 digit groups
chosen randomly) would not be immediately catastrophic.

As a clarification, I have sometimes used the term 3 digit group when
referring to the three alphabetic character groups. My bad for my poor
terminology.

I think a simple improvement would be to use a separate password to
generate the 3 pools.

From: bmearns on 24 Feb 2010 15:09

Thanks for the interesting analysis, Richard and David. I'm wondering
if our host has any to provide?

I'm very much an amateur at cryptanalysis, but I was able to do a
partially successful known-plain-text attack against the cipher by
hand over the course of a few hours, ending up with the pools fully
reconstructed, and 15 of the 27 table entries filled in. The reason I
wasn't able to reconstruct the entire table, despite having the full
pools, is because the order matters. There may be a clever way to
complete it from the tables without additional data, but I wasn't able
to come up with anything.

The attack was against the known plaintext "i have a card for you
later", encoded as "coi hdm fvp igc ute pzs ynl ouj cdm epa wdy ezo
hwg cuz def acr vto pgm lep yjk zin hmu ond qlj rpg cdv jlk" (note
that the encoding was done off the top of my head, so it's likely not
as good as a true random encoding). I came up with the following pools
(each in no particular order):
1: h y c j p a e v s
2: d n u g m l t z /
3: i o w q k r f b x

and the following partial table:
     1    2    3
1   c?o  e/r  ?di
2   y?f  v??  ?u?
3   ht?  al?  ???

My technique was just deductive reasoning. I started of course with
identical plaintext letters. For instance, 'a' is encoded 4 different
times, as 'igc', 'ouj', 'wdy', and 'qlj'. Therefore i, o, w, and q are
all in the same pool, as are {g u d l} and {c j y}. Two of the
spaces are encoded as 'ynl' and 'cdm', so d and n are in the same
pool, and I can add n to the existing pool: {g u d l n}.
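
This first kind of deduction is mechanical enough to sketch in Python. The following is only an illustration of the idea, not a record of how I actually worked it by hand: it uses a small union-find (the names are made up for the example) and merges ciphertext letters that sit in the same position of two groups encoding the same plaintext character.

```python
PLAIN = "i have a card for you later"
CIPHER = ("coi hdm fvp igc ute pzs ynl ouj cdm epa wdy ezo hwg cuz "
          "def acr vto pgm lep yjk zin hmu ond qlj rpg cdv jlk").split()

parent = {}

def find(x):
    """Return the representative of x's class, registering x if it is new."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

first_seen = {}
for symbol, group in zip(PLAIN, CIPHER):
    if symbol in first_seen:
        # Same plaintext symbol, so letters in matching positions share a pool.
        for a, b in zip(first_seen[symbol], group):
            union(a, b)
    else:
        first_seen[symbol] = group
        for ch in group:
            find(ch)

classes = {}
for ch in parent:
    classes.setdefault(find(ch), set()).add(ch)

for members in sorted(classes.values(), key=len, reverse=True):
    print("".join(sorted(members)))
# acehjpsvy
# dglmntuz
# ikoqrw
# f
```

On this crib the repeats alone already fill one pool completely; the letter f is left over, and b, x and / never occur in the ciphertext at all.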

Another type of reasoning I used was when two different plaintext
characters had two corresponding triplet-values the same (as in, from
the same pool): I knew the characters in the remaining triplet-position
had to be in different pools. For instance, if two different letters
were encoded as 'abz' and 'abq', then I knew q and z must be in
different pools from each other, or the triplet would encode the same
plaintext character. I don't think there are any of these in the
original encoded version, but once I was able to deduce that certain
characters were in the same pool, this type of situation started
popping up.

Finally, once I started getting fairly large partial pools, I could
reason that those two pools were distinct from one another. For
instance, if a partial pool contained {g u d l n m} and another had {h
y c j}, those must be different pools, because combined it would have
10 characters (which is too many, all pools have 9).

The mutually-exclusive type of deductions in the last two paragraphs
weren't nearly as helpful as the deductions from the first paragraph,
but they were eventually useful. After a while, I ended up with 3
pools which were all too large to be combined with one another, and
then I was able to determine that a particular character was not in
either of the first two pools, so there was only one place left for it
to go.
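
Here is a hedged Python sketch of that elimination step. The three partial pools hard-coded below are an assumption for the example (they are what merging repeated plaintext letters gives on this crib, with arbitrary labels A, B and C); the code then tries the unplaced letter f in each pool and rejects a placement if a pool would exceed 9 letters or if two different plaintext characters would end up with the same pool pattern.

```python
PLAIN = "i have a card for you later"
CIPHER = ("coi hdm fvp igc ute pzs ynl ouj cdm epa wdy ezo hwg cuz "
          "def acr vto pgm lep yjk zin hmu ond qlj rpg cdv jlk").split()

# Assumed starting point: partial pools with arbitrary labels.
PARTIAL = {"A": set("hycjpaevs"), "B": set("dnugmlzt"), "C": set("iowqkr")}

def pattern(group, pools):
    """Pool label of each letter in a group, or None if a letter is unplaced."""
    labels = []
    for ch in group:
        hits = [name for name, members in pools.items() if ch in members]
        if not hits:
            return None
        labels.append(hits[0])
    return tuple(labels)

def consistent(pools):
    if any(len(members) > 9 for members in pools.values()):
        return False  # every pool has exactly 9 letters
    seen = {}
    for symbol, group in zip(PLAIN, CIPHER):
        pat = pattern(group, pools)
        if pat is None:
            continue
        if seen.setdefault(pat, symbol) != symbol:
            return False  # two different plaintext symbols share one index
    return True

candidates = []
for name in PARTIAL:
    trial = {k: set(v) for k, v in PARTIAL.items()}
    trial[name].add("f")
    if consistent(trial):
        candidates.append(name)

print(candidates)  # ['C']
```

Pool A is already full, and putting f in pool B would make 'fvp' (h) and 'lep' (y) look like the same character, so only one place is left for it.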

Anyway, I know that known-plain-text is generally the easiest type of
attack, but it took a relatively small amount of plaintext to
reconstruct all three pools and more than half the table. If a
sufficient crib was available, this sort of partial reconstruction
could be performed. Any characters that were able to be filled in to
the table could then be deciphered from the remainder of the unknown
message. For undeciphered letters (those which didn't make it into the
table from the crib), having the pools (even partial) would allow a
large number of possibilities to be removed so that the remainder
could be evaluated to see what makes sense, and of course each letter
that is deciphered will improve further analysis.

A final note about the analysis: the exact order of the table doesn't
necessarily have to match the original. The table I came up with
through the analysis is actually different from the one that was used
to encode the message originally, but as long as it is self consistent
and consistent with the crib/ciphertext, it should work.
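
A short Python sketch of why a differently ordered table still works. It assumes, as my reading of the partial table above, that the recovered table differs from the original only in which pool is called 2 and which is called 3; the helper names are invented for the example. Renaming the pools changes every index consistently, so decoding is unaffected.

```python
KEY = "consider/hwtpkmablyfgjquvxz"
CIPHER = ("coi hdm fvp igc ute pzs ynl ouj cdm epa wdy ezo hwg cuz "
          "def acr vto pgm lep yjk zin hmu ond qlj rpg cdv jlk")

def tables(key, relabel):
    """Build (pool_of, symbol_at) with the pool numbers renamed by relabel."""
    pool_of = {ch: relabel[str(n % 3 + 1)] for n, ch in enumerate(key)}
    symbol_at = {}
    for n, ch in enumerate(key):
        index = f"{n // 9 + 1}{n % 9 // 3 + 1}{n % 3 + 1}"
        symbol_at["".join(relabel[d] for d in index)] = ch
    return pool_of, symbol_at

def decode(ciphertext, pool_of, symbol_at):
    out = []
    for group in ciphertext.split():
        out.append(symbol_at["".join(pool_of[ch] for ch in group)])
    return "".join(out)

original = tables(KEY, {"1": "1", "2": "2", "3": "3"})
swapped = tables(KEY, {"1": "1", "2": "3", "3": "2"})

print(decode(CIPHER, *original))  # i/have/a/card/for/you/later
print(decode(CIPHER, *swapped))   # i/have/a/card/for/you/later
print(swapped[1]["132"])          # d
```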

All in all, it was kind of a fun game, somewhat similar to sudoku.
Maybe they should start publishing these in the Sunday paper.

-Brian
