From: Tran Ngoc Duong on 26 Jun 2010 07:44 On Jun 25, 7:04 pm, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote: > In the i-th round, one does the following: > > W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; > W4 = fi_34(W3) ^ W4; W3 = fi_43(W4) ^ W3; > W3 = fi_13(W1) ^ W3; W1 = fi_31(W3) ^ W1; > W4 = fi_24(W2) ^ W4; W2 = fi_42(W4) ^ W2; > The step 1 W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; is (in general) not a multipermutation. So the uncertainty of W1 does not propagate completely to W3 in step 3. This statistical weakness indicates that maybe more rounds is needed. Thus the round structure is less efficient than it should be.
From: Mok-Kong Shen on 26 Jun 2010 08:52 Tran Ngoc Duong wrote: > Mok-Kong Shen wrote: >> In the i-th round, one does the following: >> >> W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; >> W4 = fi_34(W3) ^ W4; W3 = fi_43(W4) ^ W3; >> W3 = fi_13(W1) ^ W3; W1 = fi_31(W3) ^ W1; >> W4 = fi_24(W2) ^ W4; W2 = fi_42(W4) ^ W2; >> > The step 1 > > W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; > > is (in general) not a multipermutation. So the uncertainty of W1 does > not propagate completely to W3 in step 3. > > This statistical weakness indicates that maybe more rounds is needed. > Thus the round structure is less efficient than it should be. Certainly with this scheme more than one round will be required for achieving any reasonable propagation of the influence of individual input bits (avalanche). Do you have ideas of improving the given round structure, while keeping the matter (coding etc.) as simple as possible? Thanks. M. K. Shen
From: Maaartin on 26 Jun 2010 09:53 On Jun 26, 2:52 pm, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote: > Tran Ngoc Duong wrote: > > Mok-Kong Shen wrote: > >> In the i-th round, one does the following: > > >> W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; > >> W4 = fi_34(W3) ^ W4; W3 = fi_43(W4) ^ W3; > >> W3 = fi_13(W1) ^ W3; W1 = fi_31(W3) ^ W1; > >> W4 = fi_24(W2) ^ W4; W2 = fi_42(W4) ^ W2; > > > The step 1 > > > W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; > > > is (in general) not a multipermutation. So the uncertainty of W1 does > > not propagate completely to W3 in step 3. > > > This statistical weakness indicates that maybe more rounds is needed. > > Thus the round structure is less efficient than it should be. I'm not sure, if this matters. The operations are simple and I have no idea how many rounds are necessary. However, better mixing should be preferred. > Certainly with this scheme more than one round will be required for > achieving any reasonable propagation of the influence of individual > input bits (avalanche). Do you have ideas of improving the given round > structure, while keeping the matter (coding etc.) as simple as possible? What you do (simplified; all functions are different and key dependent): W2 ^= f(W1); W1 ^= f(W2); W4 ^= f(W3); W3 ^= f(W4); W3 ^= f(W1); W1 ^= f(W3); W4 ^= f(W2); W2 ^= f(W4); You mixed W1 into W2 and back again. Intuitively, something like W2 ^= f(W1); W3 ^= f(W2); W4 ^= f(W3); W1 ^= f(W4); should be more efficient. Neither of the schemata allows any parallelism, so I'd prefer W2 ^= f(W1); W4 ^= f(W3); W1 ^= f(W4); W3 ^= f(W2); which executes twice as much rounds per time unit on my computer. I wonder if there's a theory behind. Finally, since all the functions are bijective, you can do something like for (i=1; i<=4; ++i) W[i] = f(W[i]); W = M * W where W is the vector of all W1...4 and M is an MDS matrix. No idea how good this can be. The function is obviously bijective, but inverting would be inefficient (which doesn't matter here).
From: Mok-Kong Shen on 26 Jun 2010 10:20 Maaartin wrote: > What you do (simplified; all functions are different and key > dependent): > > W2 ^= f(W1); W1 ^= f(W2); > W4 ^= f(W3); W3 ^= f(W4); > W3 ^= f(W1); W1 ^= f(W3); > W4 ^= f(W2); W2 ^= f(W4); > > You mixed W1 into W2 and back again. Intuitively, something like > > W2 ^= f(W1); > W3 ^= f(W2); > W4 ^= f(W3); > W1 ^= f(W4); > > should be more efficient. I suppose you are comparing the first with 2 rounds of the second. It would be very fine, if one could theoretically show that the second case always result in a better 'mixing' than the first. (I was not very sure, and thus had not used the second scheme.) M. K. Shen > Neither of the schemata allows any > parallelism, so I'd prefer > > W2 ^= f(W1); W4 ^= f(W3); > W1 ^= f(W4); W3 ^= f(W2); > > which executes twice as much rounds per time unit on my computer. I > wonder if there's a theory behind. > > Finally, since all the functions are bijective, you can do something > like > > for (i=1; i<=4; ++i) W[i] = f(W[i]); > W = M * W > > where W is the vector of all W1...4 and M is an MDS matrix. No idea > how good this can be. The function is obviously bijective, but > inverting would be inefficient (which doesn't matter here).
From: Tran Ngoc Duong on 26 Jun 2010 21:44
On Jun 26, 8:53 pm, Maaartin <grajc...(a)seznam.cz> wrote: > On Jun 26, 2:52 pm, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote: > > > Tran Ngoc Duong wrote: > > > Mok-Kong Shen wrote: > > >> In the i-th round, one does the following: > > > >> W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; > > >> W4 = fi_34(W3) ^ W4; W3 = fi_43(W4) ^ W3; > > >> W3 = fi_13(W1) ^ W3; W1 = fi_31(W3) ^ W1; > > >> W4 = fi_24(W2) ^ W4; W2 = fi_42(W4) ^ W2; > > > > The step 1 > > > > W2 = fi_12(W1) ^ W2; W1 = fi_21(W2) ^ W1; > > > > is (in general) not a multipermutation. So the uncertainty of W1 does > > > not propagate completely to W3 in step 3. > > > > This statistical weakness indicates that maybe more rounds is needed. > > > Thus the round structure is less efficient than it should be. > > I'm not sure, if this matters. The operations are simple and I have no > idea how many rounds are necessary. > So am I. There are examples and counter-examples of block ciphers regarding the "complete diffusion" approach. a) DES and CAST are counter-examples. The designers opted for non- bijective f-functions, which effectively means that the uncertainty of the first data half doesn't propagate to the second half completely. (Note: I am silent about the propagation of the key bits.) b) GOST and Skipjack are examples, as the f-functions used in these Feistel structures are bijective. In Skipjack, furthermore, every data word completely propagates to the remaining three. c) There is also stronger examples (I don't remember reference) where each round is a [classical, 2-block] Feistel structure whose f- function is not only bijective but also orthomorphic. The orthomorphic function, itself, may be implemented by a 2-round Feistel structure. It is worth to note that this is effectively 4-block 'mixing' structure which may fit Mok-Kong Shen's needs. For the classical (2-block) Feistel structure there is an argument speaking for the "complete diffusion" approach: the number of active rounds in an iterative differential characteristics. For an N-round cipher, an iterative differential characteristics may consist of about N/2 active rounds if the cipher is of type a), whereas the respective number is 2N/3 for type b), and 3N/4 for type c). > However, better mixing should be > preferred. > ... > What you do (simplified; all functions are different and key > dependent): > > W2 ^= f(W1); W1 ^= f(W2); > W4 ^= f(W3); W3 ^= f(W4); > W3 ^= f(W1); W1 ^= f(W3); > W4 ^= f(W2); W2 ^= f(W4); > > You mixed W1 into W2 and back again. Intuitively, something like > > W2 ^= f(W1); > W3 ^= f(W2); > W4 ^= f(W3); > W1 ^= f(W4); > > should be more efficient. Neither of the schemata allows any > parallelism, so I'd prefer > > W2 ^= f(W1); W4 ^= f(W3); > W1 ^= f(W4); W3 ^= f(W2); > > which executes twice as much rounds per time unit on my computer. > Mok-Kong Shen's algorithm allows the same parallelism as your preferred algorithm, namely two parallel f-computations. If it was slower on your computer that might be compiler's or programmer's problem. So performance is not an issue there. > I wonder if there's a theory behind. > Your non-preferred algorithm, its inverse, as well as your preferred algorithm may be based on the model on 4-bit circular shift feedback registers, with the generating polynomial 1 ^ x ^ x**4, 1 ^ x**3 ^ x**4, and 1 ^ x**2 ^ x**4, respectively. In the open literature there are some articles of Knudsen, Wagner, Robshaw and Reichardt (on the structure of Skipjack cipher). I believe the articles give enough information to deduce the optimal number of rounds, i.e. the maximum number of rounds that still prohibits the existence of iterative (truncated) differential characteristics. |