Prev: Podcast interview w/ Dr. Jay Kennedy re Plato's secret codes
Next: Prime Number SUMS model the Riemann Zeta function
From: Tran Ngoc Duong on 6 Aug 2010 09:55 This is the updated version of the article originally posted on 26-27 July. Changes include completely re-written subsections 3.1 "Data order" and 3.5 "Bias E", more notes explaining simplicity and efficiency, clarified description, added illustrations and examples (as suggested by Tom St. Denis and others), minor wording improvements. The algorithm itself remains unchanged. Many thanks for your consideration. For those who are not interested, I'm sorry for long post. Again. Tran Ngoc Duong. ======================================================================== THE BLOCK CIPHER NSABC (Public domain) Alice Nguyenova-Stepanikova Tran Ngoc Duong <tranngocduong at gmail dot com> (corresponding author) 27th July, 2010 A b s t r a c t. We introduce NSABC/w (Near-Skipjack Algebraic Block Cipher operating on w-bit word arithmetics, w even), a 4w-bit analogous of Skipjack [NSA98] with 5w-bit key. The Skipjack's internal 4-round Feistel structure is replaced with a w-bit, 2-round cascade of the binary operation (x,z) |-> (2*x*z + (1-2*e)*(x-z+e)) <<< (w/2), partially encrypting a data word x under control of a key word z and a round-dependent constant word e. Similar to the multiplication in IDEA [LM91, LMM91], the present operation bases on an algebraic group over w-bit words, so it is also capable of decrypting by means of the inverse element of z in the group. The cipher may utilize an optional secret 4w-bit "tweak", i.e. an easily changeable parameter with unique value for every block encrypted under the same key [LRW02], that is in turn derived from the block index and a 4w-bit [additional] key, to facilitate applications such as disk encryption. With w=64, on a modern 64-bit processor, encryption takes about 5 clock cycles per byte and, when the tweak is omitted, only 4.4 clock cycles per byte. 1. Introduction In the today's world full of cryptographic algorithms, one may wonder what makes a block cipher attractive. The authors' answer to the question is one word: elegance. If something looks nice, then there is a big chance that it is also good. An elegant specification makes it easier to memorize. Memorizability makes it easier to analyze, that allows for fruitful cryptanalytic results, leading to deeper understanding which, in turn, makes greater confidence in the algorithm. The elegance comprises the following features: -- Few algebraic operations, best use [quasi-]group ones. Using of many operations, such as tables of constants, S-boxes, data-dependent transpositions or operator selections results in "wild", hardly-tractable and possibly undesirable interactions between operations. [Quasi-]group operations are preferable because of the "perfect secrecy": if x is product of y and z in a quasigroup, knowledge about either variable gives no information about any of the other two. -- Simple and regular key schedule. A complex key schedule with, say, bit transpositions or non-linear operations, which effectively adds another, unrelated, function to the cipher, results in hardly-tractable and possibly undesirable interactions between the functions. From the pragmatic viewpoint, moreover, a simple and regular key schedule is very fast, which makes the cipher much more useful as a universal primitive to construct others, such as hash functions. IDEA, a secure block cipher designed by Xuejia Lai and James L. Massey [LM91, LMM91] is an example of elegance. Besides being elegant with an efficient choice and arrangement of algebraic operations, it is elegant for some more features: -- The use of "incompatible" group operations, where "incompatible" means there are no simple relations (such as distributivity) between them. The incompatibility excludes any exploitable algebraic property and makes it impossible to solve the cipher algebraically. -- The use of modular multiplication. Multiplication is apparently more powerful than addition and thus greatly contributes to security and efficiency of the cipher. However, modular multiplication in IDEA uses the Fermat prime modulus 2**w+1 which does not exist for w=32 or w=64, making it not extendable to machine word lengths nowaday. Furthermore, it rotates the [primary] key in a way that some key bits are re-used much more slowly than others, making the key schedule irregular. Skipjack, a secure block cipher designed by cryptographers of the U.S. National Security Agency [NSA98], is another example of elegant design. Besides being elegant with an efficient, simple and regular key schedule, it is elegant for one more feature: -- The use of two ciphers, an "outer" cipher, or *wrapper*, consisting of first and last rounds, and an "inner" cipher, or *core*, consisting of middle rounds. The terms are introduced in the design rationale of MARS [IBM98], a block cipher structurally similar to Skipjack. The MARS's designers justifies this two-layer structure that it breaks any repetitious property, it makes any iterative characteristic impossible, and it disallows any propagation of eventual vulnerabilities in either layer to the other one, thus making attacks more difficult. The wrapper is primarily aimed at fast diffusion and the core primarily at strong confusion. As Claude E. Shannon termed in his pioneer work [Sha49], *diffusion* here refers to the process of letting one input bit affect many output bits in very different ways, and *confusion* here refers to the process of letting one output bit be affected by many input bits in very different ways. Skipjack (as opposed to MARS) was sought elegant as the wrapper here is, in essence, the inverse function of the core. However, Skipjack uses an S-box, making it rather slow, hard to program in a secure manner, and not extendable to large machine word lengths, as such. This article describes an atempt to combine the ideas of elegance of IDEA with the nice structure of Skipjack into a scalable and tweakable block cipher called NSABC -- Near-Skipjack Algebraic Block Cipher. NSABC uses the entire overall structure of Skipjack, including the key schedule (except word re-numbering which is merely cosmetic), and only replaces the intereal 4-round Feistel structure of Skipjack with another structure. The new structure consists of two rounds of an operation, denoted "(o)", which is essentially multiplication of odd integers modulo 2**(w+1) that are represented by integers modulo 2**w. Each instance of "(o)" is followed by a swap of high and low half-words. Thus NSABC uses only two algebraic operations, XOR and "o", which are mutually incompatible. NSABC is scalable. It is defined for every even word length w, which allows scaling up with 8-bit increment in block length and 10-bit increment in key length. NSABC is tweakable. It can use an easily changeable parameter, called *tweak* [LRW02], to make a unique version of the cipher for every block encrypted under the same key. In algorithm design there is always a trade-off between security and efficiency, and designers always have to ask "What do we want, a very strong and fairly fast cipher, or fairly strong but very fast?" NSABC reflects the authors' view on the dilemma. If Skipjack is regarded as very secure but not very fast, then NSABC may be regarded as a design with stress on the second aspect: make it very fast, abeit not very secure. For 64-bit word length, NSABC key length is 320 bits, optionally plus 256 bits more, but the true level of security is yet to be determined. On the other hand, on a modern 64-bit processor it takes only 5 clock cycles per byte and only 4.4 clock cycles per byte if tweaking is omitted. NSABC is not patented and is put in public domain. As it bases on Skipjack, eventual users should be aware of Skipjack patent(s) which may be possibly held by the U.S. Government and take steps to make sure the use is free of legal issues. The authors are not aware of any patent related to other parts of the design. The rest of the article is organized as follows. Section 2 defines operations over machine words. Section 3 specifies the algorithm. Section 4 gives notes for efficient implementation. Section 5 gives numerical examples. Section 6 concludes the article. 2. Operations over words This section defines operations over words, mainly a multiplication-like group operation, denoted ".", a family containing it, parameterized by the group unit e, denoted "(.)", and quasi-group operations based on "." and "(.)", denoted "o" and "(o)" respectively. We write a**n to denote n-th power of a. We use the symbol +, -, *, ~, ^ to denote two-complement, w-bit (unless otherwise said), arithmetic/logic operations with the C programming languague meanings. Let's define binary operations "." and "o" as follows. x . y = 2*x*y + x + y x o y = 2*x*y + x - y Note that the bivariate polynomials x . y and x o y are permutation polynomials in either variable for every fixed value of the other variable [Riv99]. In other words, "." and "o" are quasi-group operations. Furthermore, "." is a group operation over the set of w-bit numbers. (This fact, although simple and straightforward, does not seem to have been mentioned in the literature.) The operation "." is very similar to [usual] modular multiplication. Indeed, it can be done (at least theoretically) by dropping the rightmost bit, which is always "1", of the product modulo 2**(w+1) of the operands each appended with an "1" bit. Symbolically, x . y = ((2*x+1)*(2*y+1) - 1) mod 2**(w+1) / 2 In other words, the group defined by "." is isomorphic to the multiplicative group of odd integers modulo 2**(w+1), via the isomorphism x |-> 2*x + 1 (mod 2**(w+1)) The unit of the group is 0. The inverse element of x, denoted x', is x' = - x/(2*x + 1) where "/" denotes the inverse operation of "*", i.e. modular multiplication by the modular multiplicative inverse of the denominator, which indeed exists for every x. The following relations hold. (~x) . y = ~(x . y) (1-x) o y = 1 - (x o y) x . y = -((-x) o y) x o y = -((-x) . y) (x . a) . a' = x (x o a) o a' = x Let's define trivariate polynomials p and q as follows. p(x,y,e) = (x-e).(y-e) + e = 2*x*y + (1 - 2*e)*(x + y - e) q(x,y,e) = -p(-x,y,e) = 2*x*y + (1 - 2*e)*(x - y + e) Then, over the set of w-bit numbers, p and q are permutation polynomials in either variable while keeping the other two fixed [Riv99]. Let e be fixed. The binary operations "(.)" and "(o)" defined by x (.) y = p(x,y,e) x (o) y = q(x,y,e) are quasigroup operations. Furthermore, "(.)" defines a group over the set of w-bit numbers that is isomorphic to the group defined by "." above. The isomorphism is x |-> x - e The unit of this group is e. The inverse element of x, denoted x('), is x(') = (x - e)' + e = ((2*e - 1)*x - 2*e*(e-1)) / (2*(x-e) + 1) The following relations hold. x (.) y = -((-x) (o) y) x (o) y = -((-x) (.) y) x (.) y = (x-e) . (y-e) + e x (o) y = (x+e) o (y-e) - e (x (.) a) (.) a(') = x (x (o) a) (o) a(') = x The operation "(o)" will be used for encryption and, due to the last relation, also for decryption. 3. Specification This section provides details of NSABC/w. From now on w, the word length, must be even. Throughout this article, X denotes a 4w-bit plaintext block, Y a 4w-bit ciphertext block, Z a 5w-bit key, T a 4w-bit secret *tweak*, i.e., a value that is used to encrypt only one block under the key. NSABC is specified in terms of two functions: ENCRYPT(X,Z,T), which encrypts X under control of Z and T, DECRYPT(Y,Z,T), which decrypts Y under control of Z and T. Relation: DECRYPT( ENCRYPT(X,Z,T),Z,T ) = X Tweaking is optional and can be disabled by letting T constantly zero. With tweaking disabled, NSABC becomes a conventional, non-tweakable, block cipher. 3.1. Data order We write a multi-part data values in string (number) notation or tuple (vector) notation. In *string notation*, the value is written as a sequence of symbols, possibly separated by space(s) that are insignificant. In *tuple notation*, the value is written as a sequence, in parentheses, of comma-separated symbols. For examples, z y x and 43 210 are in string notation, (x,y,z) and (0,1,2,3,4) are in tuple notation. The string notation indicates *high-first* order. That is, the first (i.e. leftmost) symbol denotes the most significant part of the value when it is intepreted as a number. Conversely, the tuple notation indicates *low-first* order. That is, the first symbol denotes the least significant part of the value when it is interpreted as a number. For examples, if x0, x1, x2 are words, then: -- In x2 x1 x0, the symbol x2 denotes the most significant word when the value x2 x1 x0 is interpreted as a 3-word number. -- In (x0,x1,x2), the symbol x0 denotes the least significant word when the value (x0,x1,x2) is interpreted as a 3-word number. For convenience, the same value may be written in string notation as well as in tuple notation. Thus, for examples, for every a, b, c, d, a b c d = (d,c,b,a) The term "part", used above, usually refers to "word", but it may also refer to "digit" (of a number), "component" (of a tuple or vector), as well as group thereof. For example, if x, y, z, t are known to be 1-digit, 2-digit, 3-digit and 4-digit values respectively, then (x,y,z,t) = 9876543210 means x=0, y=21, z=543 and t=9876. 3.2. Operations on word strings Let R denote the transposition that reverses the word order of a word string. For example, for 8-bit words, 0x0123AB R = 0xAB2301 Let S denote the transposition that swaps the high order and low order halves of every word of a word string. For example, for 8-bit words, 0x0123AB S = 0x1032BA Let ' denote the word-wise application of the inverse operator "'" on a word string. For example, (a,b,c)' = (a',b',c'). The following relations hold. xRR = x xSS = x x'' = x xRS = xSR xR' = x'R The operator "^" on word strings denote word-wise application of "^". For example, (a0,a1,a2) ^ (b0,b1,b2) = (a0^b0, a1^b1, a2^b2) The following relations hold. (x ^ y)S = xS ^ yS (x ^ y)R = xR ^ yR (x ^ y)RS = xRS ^ yRS Unless otherwise specified, operators "+" and "-" on word strings denote word-wise modular addition and subtraction, respectively, i.e. operands are regarded as vectors of words. For example, (a0,a1,a2) + (b0,b1,b2) = (a0+b0, a1+b1, a2+b2) (a0,a1,a2) - (b0,b1,b2) = (a0-b0, a1-b1, a2-b2) The following relations hold. (x + y)R = xR + yR (x - y)R = xR - yR 3.3. Functions ENCRYPT This function is specified in terms of three functions: DE, a data encryption function; KE, a key expansion function; and TE, a tweak expansion function. ENCRYPT uses a 64w-bit constant E, called *bias*, as a parameter to DE. Constant: E...64w-bit bias Input: X...4w-bit plaintext block Z...5w-bit key T...4w-bit tweak Output: Y...4w-bit ciphertext block Relation: ENCRYPT(X,Z,T) = DE(X, KE(Z), TE(T), E) 3.4. Functions DE, KE and TE These functions operate on a 4-word *data register* (x0,x1,x2,x3) and use a 5-word *key register* (z0,z1,z2,z3,z4) and a 4-word *tweak register* (t0,t1,t2,t3). The key register is initially loaded with the key Z. The tweak register is initially loaded with the tweak T. The data register is initially loaded with the plaintext block X, and after encipherment it contains the ciphertext block Y. [The concrete, vector, notation here specifies the order of words so, for example, x0 is initially loaded with the least significant word of X and after encipherment it contains the least significant word of Y.] These functions consist of 32 rounds of operations. In each round, a permutation G applies on the word x0, an exclusive-or (XOR) operation "mixes" x0 and t0 into another word of the data register, that is either x1 or x3, depends on the round *type* that is either *A* or *B*: -- In an A-typed round, the "mixing" takes place after G and targets x1 (see Fig. 1A); -- In a B-typed round, the "mixing" takes place before G and targets x3 (see Fig. 1B). The permutation G is key-dependent and round-dependent. In each round, the pair of key words (z0,z3) and a pair of words from the bias E enter the G-box. At the end of every round, there is a transposition on the data register that circularly exchanges contents of the four words, and other transpositions that do similarly for the key register and the tweak register. Encryption runs in four *passes*: firstly eight rounds of type A, then eight rounds of type B, then eight rounds of type A again, finally eight rounds of type B again. t3(0) t1(0) x3(0) x1(0) z4(0) z2(0) z0(0) | t2(0) | t0(0) | x2(0) | x0(0) | z3(0) | z1(0) | | | | | | | | | | | | | | | | | | | | | +--+--+ | | | | | | | | | | |E(0)--|> G <|---------------------o K(0) | | | | | |E(1)--|> <|------o K(1) | | | | | | | | | +--+--+ | | | | | | | | | | | | | | | | | | | | |L(0)o------------->+<---o | | | | | | | | | | | | | | | | | | \ \ \ | \ \ \ | \ \ \ \ | \ \ \ | \ \ \ | \ \ \ \ | +--\----\----\-+ +--\----\----\-+ +--\----\----\----\-+ | \ \ \ | \ \ \ | \ \ \ \ | | | | | | | | | | | | | | t2(1) | t0(1) | x2(1) | x0(1) | z3(1) | z1(1) | t3(1) t1(1) x3(1) x1(1) z4(1) z2(1) z0(1) Fig. 1A. Round 0 of DE, KE and TE. t3(8) t1(8) x3(8) x1(8) z4(8) z2(8) z0(8) | t2(8) | t0(8) | x2(8) | x0(8) | z3(8) | z1(8) | | | | | | | | | | | | | | | | |L(8)o--->+<-------------o | | | | | | | | | | | | | | | | | | | | | | | | | +--+--+ | | | | | | | | | | |E(16)-|> G <|---------------------o K(16) | | | | | |E(17)-|> <|------o K(17) | | | | | | | | | +--+--+ | | | | | | | | | | | | | | | | | | \ \ \ | \ \ \ | \ \ \ \ | \ \ \ | \ \ \ | \ \ \ \ | +--\----\----\-+ +--\----\----\-+ +--\----\----\----\-+ | \ \ \ | \ \ \ | \ \ \ \ | | | | | | | | | | | | | | t2(9) | t0(9) | x2(9) | x0(9) | z3(9) | z1(9) | t3(9) t1(9) x3(9) x1(9) z4(9) z2(9) z0(9) Fig. 1B. Round 8 of DE, KE and TE. KE (key expansion) Input: Z...5w-bit key Output: K...64w-bit expanded key Pseudo-code: (z0,z1,z2,z3,z4) := Z for k := 0, 1, 2,..., 31 loop K(2*k) := z0 K(2*k+1) := z3 (z0,z1,z2,z3,z4) := (z1,z2,z3,z4,z0) end loop K := (K(0),K(1),K(2),...,K(63)) TE (tweak expansion) Input: T...4w-bit tweak Output: L...32w-bit expanded tweak Pseudo-code: (t0,t1,t2,t3) := T for k := 0, 1, 2,..., 31 loop L(k) := t0 (t0,t1,t2,t3) := (t1,t2,t3,t0) end loop L := (L(0),L(1),L(2),...,L(31)) DE (data encryption) Input: X...4w-bit plaintext block K...64w-bit expanded key L...32w-bit expanded tweak E...64w-bit bias Output: Y...4w-bit ciphertext block Pseudo-code: (x0,x1,x2,x3) := X for k := 0, 1, 2,..., 31 loop if 0 <= k <= 7 or 16 <= k <= 23 then x0 := G( x0, (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)) ) x1 := x1 ^ x0 ^ L(k) elsif 8 <= k <= 15 or 24 <= k <= 31 then x3 := x3 ^ x0 ^ L(k) x0 := G( x0, (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)) ) end if (x0,x1,x2,x3) := (x1,x2,x3,x0) end loop Y := (x0,x1,x2,x3) Relations: Y = (x0(32),x1(32),x2(32),x3(32)) for 0 <= k <= 7 or 0 <= k <= 23: x0(k+1) = x1(k) ^ g(k) ^ L(k) x1(k+1) = x2(k) x2(k+1) = x3(k) x3(k+1) = g(k) for 8 <= k <= 15 or 24 <= k <= 31: x0(k+1) = x1(k) x1(k+1) = x2(k) x2(k+1) = x3(k) ^ x0(k) ^ L(k) x3(k+1) = g(k) for 0 <= k <= 31: g(k) = G(x0(k), (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1))) K(2*k) = z0(k) K(2*k+1) = z3(k) z0(k+1) = z1(k) z1(k+1) = z2(k) z2(k+1) = z3(k) z3(k+1) = z4(k) z4(k+1) = z0(k) L(k) = t0(k) t0(k+1) = t1(k) t1(k+1) = t2(k) t2(k+1) = t3(k) t3(k+1) = t0(k) K = (K(0),K(1),K(2),...,K(63)) L = (L(0),L(1),L(2),...,L(31)) (x0(0),x1(0),x2(0),x3(0)) = X (z0(0),z1(0),z2(0),z3(0),z4(0)) = Z (t0(0),t1(0),t2(0),t3(0)) = T (E(0),E(1),E(2),...,E(63)) = E NOTES (1) The bias E is declared as a parameter to DE (although actually it is a well-specified constant in the algorithm) to indicate the possibility of further parameterization. One may choose his/her own E to create an other, "non-standard", variant of ENCRYPT [and DECRYPT]. (2) The full algorithm is illustrated in Fig. 2, where (X0,X1,X2,X3)=X, (Y0,Y1,Y2,Y3)=Y, (Z0,Z1,Z2,Z3,Z4)=Z, (T0,T1,T2,T3)=T and the constant E is omitted. The figure is obtained by "unrolling" [i.e. eliminating all circular transpositions of] the dataflow graph of the full cipher that would be resulted from cascading the individual rounds as in Fig. 1A and Fig. 1B. X3 X2 X1 X0 | | | | | | | 0 G<-Z0,Z3 | | T0->+<---------o | | 1 G<-Z1,Z4 | | T1->+<---------o | | 2 G<-Z2,Z0 | | T2->+<---------o | | 3 G<-Z3,Z1 | | | o------------------------------->+<-T3 | | | 4 G<-Z4,Z2 | | T0->+<---------o | | 5 G<-Z0,Z3 | | T1->+<---------o | | 6 G<-Z1,Z4 | | T2->+<---------o | | 7 G<-Z2,Z0 | | | | ----------------------------o | / | | | o--/---------------------------->+<-T3 / | | 8 G<-Z3,Z1 / | o--------->+<-T1 T0->+ | 9 G<-Z4,Z2 | | o--------->+<-T2 | | 10 G<-Z0,Z3 | | o--------->+<-T3 | | 11 G<-Z1,Z4 | | | T0->+<-------------------------------o | | | 12 G<-Z2,Z0 | | o--------->+<-T1 | | 13 G<-Z3,Z1 | | o--------->+<-T2 | | 14 G<-Z4,Z2 | | o--------->+<-T3 | | 15 G<-Z0,Z3 | | | | | | | | | | 16 G<-Z1,Z4 | | T0->+<---------o | | 17 G<-Z2,Z0 | | T1->+<---------o | | 18 G<-Z3,Z1 | | T2->+<---------o | | 19 G<-Z4,Z2 | | | o------------------------------->+<-T3 | | | 20 G<-Z0,Z3 | | T0->+<---------o | | 21 G<-Z1,Z4 | | T1->+<---------o | | 22 G<-Z2,Z0 | | T2->+<---------o | | 23 G<-Z3,Z1 | | | | ----------------------------o | / | | | o--/---------------------------->+<-T3 / | | 24 G<-Z4,Z2 / | o--------->+<-T1 T0->+ | 25 G<-Z0,Z3 | | o--------->+<-T2 | | 26 G<-Z1,Z4 | | o--------->+<-T3 | | 27 G<-Z2,Z0 | | | T0->+<-------------------------------o | | | 28 G<-Z3,Z1 | | o--------->+<-T1 | | 29 G<-Z4,Z2 | | o--------->+<-T2 | | 30 G<-Z0,Z3 | | o--------->+<-T3 | | 31 G<-Z1,Z4 | | | | | | | Y3 Y2 Y1 Y0 Fig. 2. The full cipher, by "unrolling" the graphs in Fig. 1A and Fig. 1B. (3) The round structure is up to word indexing the same as that of Skipjack. The word re-indexing, which is cryptographically insignificant, was introduced to ease description and illustration. 3.5. Bias E The 64w-bit constant E, which is used by ENCRYPT and DECRYPT and enters as a parameter to DE, is generated in 32 rounds of operation on a 2w-bit *bias register* (f0,f1) that is initialized by the constant value (0,1). In each round, the content of f1 is added to f0, then the content of f0 is added to f1, then the content of the register becomes a pair of words of E (see Fig. 3.) f1(5) f0(5) | | o---->[+] | | [+]<----o | | f1(6) f0(6) E(11) E(10) Fig. 3. Round 5 of E (regarded as a constant function). Input: none. Output: E...64w-bit bias Pseudo-code: (f0,f1) := (0,1) for k := 0,1,2,...,31 loop f0 := f0 + f1 f1 := f1 + f0 E(2*k) := f0 E(2*k+1)) := f1 end loop E := (E(0),E(1),E(2),...,E(63)) Relations: E = (E(0),E(1),E(2),...,E(63)) for k = 0,1,2,...,31: E(2*k) = f0(k+1) E(2*k+1) = f1(k+1) f0(k+1) = f0(k) + f1(k) f1(k+1) = f1(k) + f0(k+1) (f0(0),f1(0)) = (0,1) NOTES (1) It holds E(n) = F(n+2), where F(_) is defined by F(n+2) = F(n+1) + F(n) and (F(0),F(1)) = (0,1), i.e. E = (1, 2, 3, 5, 8, 13,..., 17167680177565) (mod 2**w). In other words, E is the sequence of 64 consecutive Fibonacci numbers immediately following the initial 0 and 1, modulo 2**w. (2) The round structure of E, usually known as *pseudo-Hadamard transform*, was used as a diffusion construct in prior cipher designs [Mas94, Sch+98]. Here it is not aimed at diffusion since it generates a constant, which has zero uncertainty. 3.6. Function G Function G, which takes as argument a w-bit intermediate plaintext word and is parameterized by a pair of w-bit subkey words (k0,k1) and a pair of constant words (e0,e1) to return a w-bit intermediate ciphertext word, is a cascade of two rounds, each consisting of an evaluation of the polynomial q (defined in section 2) followed by a swap of half-words (see Fig. 4). The first evaluation uses k0 and e0, the second one uses k1 and e1. argument | | +------+ e0---|> q <|---k0 +------+ | | \ / \/ /\ / \ | | +------+ e1---|> q <|---k1 +------+ | | \ / \/ /\ / \ | | result Fig. 4. Function G. Relation: G(x,(k0,k1),(e0,e1)) = q( q(x,k0,e0)S, k1, e1 )S NOTES (1) The algorithm uses 64 instances of the quasi-group operation "(o)", each with its own choice of the underlying group defined by the operation "(.)" with unit e. So, to be specific, instead of writing something like x (o) z, we wrote q(x,z,e). (2) Alternatively, it may be seen as using 64 identical instances of the simpler operation "o", or of the simplest, group operation ".", but operands and result of each instance of the operation are "biased" by adding or subtracting the constant e that is specific to the instance, and furthermore, for ".", the left operand enters and the result leaves it in altered sign. That's why E, the sequence of such e, is termed the "bias". 3.7. Function DECRYPT DECRYPT can be easily derived from the relation with ENCRYPT. Here it is given explitly in terms of functions DE, KE and TE. Relations: DE( DE(X,KE(Z),TE(T),E)RS, (KE(ZR)-ER)'+ER, TE(TRS), ER )RS = X DECRYPT(Y,Z,T) = DE( YRS, (KE(ZR)-ER)'+ER, TE(TRS), ER )RS In other words, encrypting the plaintext block Y in reverse half-word order (Y RS) using the tweak T in reverse half-word order (T RS) and the key schedule consisting of inverse words of an expanded key resulted from the key Z in reverse word order (Z R), where the inversions are of the [algebraic] groups defined by the operation "(.)" and each group is specified by its unit that is the corresponding word of the constant E in reverse word order (E R), recovers the plaintext block X in reverse half-word order (X RS). NOTES (1) Like Skipjack, decryption is similar to encryption. To decrypt with Skipjack, one swaps adjacent words, i.e. swaps the first and the second and swaps the third and the fourth words in the cipher block, the key and the plain block. To decrypt with NSABC, one reverses the word order, i.e., swaps the first and the last words as well as the second first and the second last ones. (2) Unlike Skipjack, DECRYPT cannot be expressed explicitly in terms of ENCRYPT. That's why our cipher is termed "near Skipjack" only. 3.8. Tweak derivation The 4w-bit secret tweak T is used to encrypt only one block [under a given key Z]. In order to encrypt multiple blocks the tweak is derived from the block index and a 4w-bit [additional] key, called *tweak key*, as follows. Let T(j) denote the tweak used to encrypt j-th block. For zero-th block, the tweak key is used as the tweak directly: T(0) = tweak key Successive tweak is computed from the current tweak by the recurrent relation: T(j) = T(j-1) + 2*T(0) + 1 or, equivalently, T(j) = T(0) . j where all operators, including ".", are defined on 4w-bit modular arithmetics and all operands, including 1 and j, are regarded as 4w-bit numbers. NOTES (1) The second relation is meant for sequential access and the third is meant for random access to the j-th block. There T(0) conveniently refers to the [unnamed] tweak key. (2) The family of functions {T: j |-> T(0).j}, parameterized by the tweak key T(0), is not epsilon-almost 2-XOR universal w.r.t. definition by [LRW02]. Eventual application of the present tweak derivation method in the Liskov-Rivest-Wagner construction, i.e. by defining ENCRYPT(X,Z,T,j) = DE(X^T(j),KE(Z),0)^T(j), is therefore impossible. 4. Notes on implementation This section provides methods for efficient software implementation for two types of environment: memory-constrained, such as embedded computers, and memory-abundant, such as servers and workstations. 4.1. Memory-constrained environment (1) The function ENCRYPT can be implemented without using any memory on a processor with at least 16 word registers: 4 for (x0,x1,x2,x3) -- data, 5 for (z0,z1,z2,z3,z4) -- key, 4 for (t0,t1,t2,t3) -- tweak, 2 for (f0,f1) -- bias, and 1 for k -- round index. Indeed, all other intermediate word strings (namely the expanded key K, the expanded tweak L and the bias E) can vanish because every word of them, once produced, can be consumed immediately, provided that the functions KE, TE, E (regarded as a constant function) and DE are programmed to run in parallel and in sync with each increment of k. (2) In an extremely memory-constrained environment, modes of operation that avoid DECRYPT (i.e. ones using ENCRYPT to decrypt) are preferable. That's because modular multiplicative inversion is so time-consuming that every practial implementation of DECRYPT must always pre-compute the key schedule. 4.2. Memory-abundant environment (1) The polynomial q in function G can be evaluated using only one multiplication and one addition, with a modified key schedule. Indeed, q(x,z,e) = M*x + N where x is a data word, z key word, e the unit of the [quasi-]group in use, and M = 2*(z - e) + 1 N = (2*e - 1)*(z - e) is a pair of subkeys precomputed in a [modified] key schedule consisting of 64 such pairs. (2) The cipher is partially parallelizable. The following procedure executes all 32 rounds in 20 steps, of which half performing two or three parallel evaluations of the function G. The procedure becomes evident by examining the dataflow graph of the algorithm, shown in Fig. 5, which is obtained by "unrolling" the one in Fig. 2. (Here "unrolling" means introducing a circular transposition so that the G-boxes at round k and k+3 lay on a vertical line.) Recall that g(k) is the result of G in round k. 1. Compute g(0) 2. Compute g(1) 3. Compute g(2) 4. Compute g(3) 5. Compute g(4) 6. Compute g(5), g(11) in parallel 7. Compute g(6), g(9) in parallel 8. Compute g(7), g(10), g(13) in parallel 9. Compute g(8), g(14) in parallel 10. Compute g(12), g(15) in parallel 11. Compute g(16) 12. Compute g(17) 13. Compute g(18) 14. Compute g(19) 15. Compute g(20) 16. Compute g(21), g(27) in parallel 17. Compute g(22), g(25) in parallel 18. Compute g(23), g(26), g(29) in parallel 19. Compute g(24), g(30) in parallel 20. Compute g(28), g(31) in parallel On a modern 64-bit processor, each step should take about 8 clock cycles on average. So, for w=64 i.e. 32-byte blocks it achieves 20*8/32 = 5 clock cycles per byte encrypted. If the tweak T is not used one may remove it from the source code completely to save about 1 clock cycle per byte per step, resulting in 20*7/32 = 4.4 clock cycles per byte. X0 X1 \ \ \0 \1 G-----G \ \ X2 X3 \ \ \ \ 1 \2 \3 \4 G-----G-----G-----G \ \ \ \ \ \ \ \ \ \ \ \ 4 \5 \6 \7 G-----G-----G-----G \ \ \ \ \ \ . . . |\ |\ |\ 7 | \8 | \9 | \10 G--|--G | G | G | \ | \ | | \| \| . + + |\ |\ |\ 10 | \11 | \12 | \13 G | G | G | G \ | \ | \ | \| \| \| + + + |\ |\ \ 13 | \14 | \15 \16 G | G | G G \ | \ | \ \| \| \ + + \ \ \ \ 16 \17 \18 \19 G-----G-----G-----G \ \ \ \ \ \ \ \ \ \ \ \ 19 \20 \21 \22 G-----G-----G-----G \ \ \ \ \ \ \ . . \ |\ |\ 22 \23 | \24 | \25 G-----G--|--G | G \ | \ | \ | \| . . + |\ |\ |\ 25 | \26 | \27 | \28 G | G | G | G \ | \ | \ | \| \| \| + + + |\ |\ |\ 28 | \29 | \30 | \31 G | G | G | G \ | \ | \ | \| \| \| + + + \ \ \ 31 \ \ \ G Y0 Y1 Y2 \ \Y3 Fig. 5. The full cipher, by "unrolling" the graph in Fig. 2. 5. Examples This section provides examples of NSABC/16 and NSABC/64 encryption. Tab. 5.1 shows the contents of all registers in NSABC/16 at the start of round k, for k = 0, 1, 2,..., 32 where k = 32 conveniently denotes the end of round 31, which is that of the algorithm, during the execution of ENCRYPT under key = 0x0AA00BB00CC00DD00EE0, tweak = 0x0001002203334444, and plain = 0x0123456789ABCDEF. Tab. 5.2 represents 26 vectors of ENCRYPT in NSABC/16 under parameters thereby specified. Tab. 5.3 represents a vector for iterated encryption in NSABC/64. Encryption runs under a fixed key Z. Using the tweak T(0), a plain block X(0) is encrypted into a cipher block denoted Y(0). Every Y(j) is then recorded and become X(j+1), the plain block for the next encipherment (if any), using a new tweak T(j+1) that is derived from T(j) by the method specified in subsection 3.8. The vector is printed with Z, T(0), X(0) and X(j) for some other values of j. All data are printed hexadecimally, except the key Z that is in base32-without- padding format specified by the IETF RFC 4648 [Jos06]. Tab. 5.1. Contents of registers during ENCRYPT in NSABC/16. ======= ======== ==================== ================ ================ Round Bias reg Key register Tweak register Data register index k f1 f0 z4 z3 z2 z1 z0 t3 t2 t1 t0 x3 x2 x1 x0 ======= ======== ==================== ================ ================ 0 00010000 0AA00BB00CC00DD00EE0 0001002203334444 0123456789ABCDEF 1 00020001 0EE00AA00BB00CC00DD0 4444000100220333 55BC012345679853 2 00050003 0DD00EE00AA00BB00CC0 0333444400010022 33DB55BC0123758F 3 000D0008 0CC00DD00EE00AA00BB0 0022033344440001 6ADC33DB55BC6BDD 4 00220015 0BB00CC00DD00EE00AA0 0001002203334444 012E6ADC33DB5493 5 00590037 0AA00BB00CC00DD00EE0 4444000100220333 C87D012E6ADCBFE2 6 00E90090 0EE00AA00BB00CC00DD0 0333444400010022 0750C87D012E6EBF 7 02620179 0DD00EE00AA00BB00CC0 0022033344440001 82C80750C87D83C4 8 063D03DB 0CC00DD00EE00AA00BB0 0001002203334444 A62C82C807506E50 9 10550A18 0BB00CC00DD00EE00AA0 4444000100220333 3BA48C3882C80750 10 2AC21A6D 0AA00BB00CC00DD00EE0 0333444400010022 42573FC78C3882C8 11 6FF1452F 0EE00AA00BB00CC00DD0 0022033344440001 C5C1C0BD3FC78C38 12 2511B520 0DD00EE00AA00BB00CC0 0001002203334444 637A49F8C0BD3FC7 13 FF42DA31 0CC00DD00EE00AA00BB0 4444000100220333 0A3318F949F8C0BD 14 D8B5D973 0BB00CC00DD00EE00AA0 0333444400010022 4278C9BD18F949F8 15 8ADDB228 0AA00BB00CC00DD00EE0 0022033344440001 74260BA2C9BD18F9 16 C7E23D05 0EE00AA00BB00CC00DD0 0001002203334444 616C6CDE0BA2C9BD 17 CCC904E7 0DD00EE00AA00BB00CC0 4444000100220333 7041616C6CDE3FA7 18 9E79D1B0 0CC00DD00EE00AA00BB0 0333444400010022 FE487041616C91A5 19 0EA27029 0BB00CC00DD00EE00AA0 0022033344440001 7714FE487041165A 20 8D6D7ECB 0AA00BB00CC00DD00EE0 0001002203334444 B3237714FE48C363 21 99A50C38 0EE00AA00BB00CC00DD0 4444000100220333 3B42B3237714814E 22 3F82A5DD 0DD00EE00AA00BB00CC0 0333444400010022 B0C33B42B323C4E4 23 24E1E55F 0CC00DD00EE00AA00BB0 0022033344440001 316FB0C33B42826E 24 2F210A40 0BB00CC00DD00EE00AA0 0001002203334444 5A0A316FB0C36149 25 68823961 0AA00BB00CC00DD00EE0 4444000100220333 71617F07316FB0C3 26 0A65A1E3 0EE00AA00BB00CC00DD0 0333444400010022 1CF3C2917F07316F 27 B6ADAC48 0DD00EE00AA00BB00CC0 0022033344440001 EBA52DBEC2917F07 28 19A262F5 0CC00DD00EE00AA00BB0 0001002203334444 648794A32DBEC291 29 96397C97 0BB00CC00DD00EE00AA0 4444000100220333 1E37E25294A32DBE 30 A90912D0 0AA00BB00CC00DD00EE0 0333444400010022 079D30BAE25294A3 31 64E2BBD9 0EE00AA00BB00CC00DD0 0022033344440001 BAFA931C30BAE252 32 859D20BB 0DD00EE00AA00BB00CC0 0001002203334444 5CA158A9931C30BA Tab. 5.2. Some vectors for NSABC/16 single block encipherments. ==================== ================ ================ ================ Z (key) T (tweak) X (plain) Y (cipher) ==================== ================ ================ ================ 00000000000000000000 0000000000000000 0000000000000098 1275C2B1DC5E3621 00000000000000000000 0000000000000000 0000000000009800 720620D75549C54D 00000000000000000000 0000000000000000 0000000000980000 1F2E2B943747B1DE 00000000000000000000 0000000000000000 0000000098000000 B1810BC2441D1445 00000000000000000000 0000000000000000 0000009800000000 ED61842661D49429 00000000000000000000 0000000000000000 0000980000000000 EA1C14B378A7B595 00000000000000000000 0000000000000000 0098000000000000 4DCBF3FCA282C4C4 00000000000000000000 0000000000000000 9800000000000000 0609058C12509447 00000000000000000000 0000000000000098 0000000000000000 AE4D0DFF58FFD3B2 00000000000000000000 0000000000009800 0000000000000000 D20388E6B94C96DF 00000000000000000000 0000000000980000 0000000000000000 A64BFCFDB32C5DBF 00000000000000000000 0000000098000000 0000000000000000 650F1DF8371AF343 00000000000000000000 0000009800000000 0000000000000000 BC8FA7FE1C928EAB 00000000000000000000 0000980000000000 0000000000000000 62C00204766B5BCE 00000000000000000000 0098000000000000 0000000000000000 40F6C79331EAD337 00000000000000000000 9800000000000000 0000000000000000 A9A7601DD937CFBA 00000000000000000098 0000000000000000 0000000000000000 2D078508E13FEA31 00000000000000009800 0000000000000000 0000000000000000 75FBFF2A6706DFDB 00000000000000980000 0000000000000000 0000000000000000 70BF79184C06F9AB 00000000000098000000 0000000000000000 0000000000000000 D505CBFAB03913EF 00000000009800000000 0000000000000000 0000000000000000 3EEC55CD065E275B 00000000980000000000 0000000000000000 0000000000000000 4ADA724B4913D4D0 00000098000000000000 0000000000000000 0000000000000000 F492E17C8F066561 00009800000000000000 0000000000000000 0000000000000000 F9060CFF3C7A6FA1 00980000000000000000 0000000000000000 0000000000000000 F7CDC677FF2F087F 98000000000000000000 0000000000000000 0000000000000000 008B4A112C5B2B5E Tab. 5.3. A vector for iterated NSABC/64 encryption. ====== ================================================================ Label Value ====== ================================================================ Z DIFFUZE2HOPEBESTCONFVSE2THROWDICENSAZGOODATLASTQOZSKIPJAXDAMNYCE T(0) 1DEA15C007316AC7EC1A15F00706AC7EFA17ED270CA7EA7CA717774E35A3E7BA X(0) 31ACB074E5A185A590D471305CAD03A0E43A8C9314C8B52C01843A8708C51AA3 X(1) 3ED6DCD4A1E13501493BF8CC306C0130987CF497E67D92F100B10451494A91BD X(2) 8242D4BC301A3FCEC9D751077D5C18A2384508B785BABC6A2446948862B47EE6 X(3) 4DB8C3D9385F21EE22C8D4CD81A9C4ED8FCC040DE2652F22DE5A6F8AAD977AD6 X(5) 5A756A9300A1F94CD073A8F336CDA083DBD445A39F0BF7596472EC7EDDD4816F X(8) 33F7CB09C2F6A07507C56F833D6746F33A54E0CEE66CE1F0FA428DA04B8C23BF X(13) AC1FCD227034D51A047CCBDA5FE27E64F9D3A59D24B382066174F49C63FBFFE9 X(21) 1DBC7DD4E5F3CC89EC726D5B0DDFB1E949DCCD8B2D67038AD998C8B33FF4F6F1 X(34) EA0EFFC95B7921F1A7266C2F902A8D97A3305DC0945F1BB10FED47C743F61E6D X(55) 65C8DEBA8EE3B6730480DF43F8F9B0FC763FEE15E1C93B06208D9A01411C3A17 X(89) 46043207307750E211E66BA40E2C21310CF7769881CBDBCF326E251E99BCD3C9 X(144) 9CF208AF5D94AD1486B255A19261BEBA6F15911CDAA2E339A5A7B590A5E73498 X(233) 903CA8F6BE987A3FE00EB97F94767678CB619A7898FFC871CD19BEADE4E5A95F X(377) D068EEA27849B0EB10101F9D79379DC5EE3D637764C9EFA462178CB467247FDC X(610) 821C6763628A36F2D383F516E550A11E60C7F12B5AB545F66C0B4AA09430C6CD X(987) B6CF160291E3D46CBBA728277466AE1BF9BDED75D3B6289E40EC1BA3287C83B9 6. Conclusion We defined NSABC, a Skipjack-like block cipher with fine scalability, utilizing a group operation over machine words of every size, that is basically modular multiplication, a powerful operation available on many processors. NSABC is meant to be simple. It uses no S-boxes or "magic" constants. It makes use only two well-known word-oriented algebraic operations: exclusive-or and modular multiplication. It re-uses the simple and regular structure of Skipjack that has become publicly known for over a decade -- sufficient time to be truly understood. The simplicity makes it easier to memorize and to analyze, which in turn makes deeper understanding and greater confidence in the algorithm. NSABC bases on some valuable design of a well-reputed agency in the branch. We therefore believe that it is worth analysis and it can withstand rigorous analysis. If this happens to be true, then we may have a strong cipher with 256-bit blocks, allowing to encrypt enormous amount of data under the same key, and with 320-bit keys, allowing to keep encrypted data secret over any imaginable time. NSABC is fast enough to be comparable with modern block ciphers. 7. References [IBM98] Burwick C., Coppersmith D., D'Avignon E., Gennaro G., Halevi S., Jutla C., Matyas S. M. Jr., O'Connor L., Peyravian M., Safford D., Zunic N.: MARS - a candidate cipher for AES. 1998. [LM91] Lai X., Massey J. L.: A proposal for a new block encryption standard. 1991. [LMM91] Lai X., Massey J. L., Murphy S.: Markov ciphers and differential cryptanalysis. 1991. [LRW02] Liskov M., Rivest R. L., Wagner D.: Tweakable block ciphers. 2002. [Mas94] Massey J. L.: A byte-oriented block-ciphering algorithm. 1994. [NSA98] National Security Agency: Skipjack and KEA specification. 1998. [Jos06] Josefsson S.: The Base16, Base32, and Base64 data encodings, RFC4648. 2006. [Riv99] Rivest R. L.: Permutation polynomials modulo 2**w. 1999. [Sha49] Shannon C. E.: Communication theory of secrecy systems. 1949. [Sch+98] Schneier B., Kelsey J., Whiting D., Wagner D., Hall C., Ferguson N.: Twofish -- a 128-bit block cipher. 1998. --- news://freenews.netfront.net/ - complaints: news(a)netfront.net --- |