The block cipher NSABC spec -- updated [Cryptography]

Prev: Podcast interview w/ Dr. Jay Kennedy re Plato's secret codes
Next: Prime Number SUMS model the Riemann Zeta function

From: Tran Ngoc Duong on 6 Aug 2010 09:55

This is the updated version of the article originally posted on 26-27
July. Changes include completely re-written subsections 3.1 "Data order"
and 3.5 "Bias E", more notes explaining simplicity and efficiency,
clarified description, added illustrations and examples (as suggested by
Tom St. Denis and others), minor wording improvements. The algorithm
itself remains unchanged.

Many thanks for your consideration. For those who are not interested,
I'm sorry for long post. Again.

Tran Ngoc Duong.

========================================================================
THE BLOCK CIPHER NSABC
(Public domain)

Alice Nguyenova-Stepanikova

Tran Ngoc Duong <tranngocduong at gmail dot com> (corresponding author)

27th July, 2010

A b s t r a c t. We introduce NSABC/w (Near-Skipjack Algebraic Block
Cipher operating on w-bit word arithmetics, w even), a 4w-bit analogous
of Skipjack [NSA98] with 5w-bit key. The Skipjack's internal 4-round
Feistel structure is replaced with a w-bit, 2-round cascade of the
binary operation (x,z) |-> (2*x*z + (1-2*e)*(x-z+e)) <<< (w/2),
partially encrypting a data word x under control of a key word z and a
round-dependent constant word e. Similar to the multiplication in IDEA
[LM91, LMM91], the present operation bases on an algebraic group over
w-bit words, so it is also capable of decrypting by means of the inverse
element of z in the group. The cipher may utilize an optional secret
4w-bit "tweak", i.e. an easily changeable parameter with unique value
for every block encrypted under the same key [LRW02], that is in turn
derived from the block index and a 4w-bit [additional] key, to
facilitate applications such as disk encryption. With w=64, on a modern
64-bit processor, encryption takes about 5 clock cycles per byte and,
when the tweak is omitted, only 4.4 clock cycles per byte.

1. Introduction

In the today's world full of cryptographic algorithms, one may wonder
what makes a block cipher attractive.

The authors' answer to the question is one word: elegance. If something
looks nice, then there is a big chance that it is also good.

An elegant specification makes it easier to memorize. Memorizability
makes it easier to analyze, that allows for fruitful cryptanalytic
results, leading to deeper understanding which, in turn, makes greater
confidence in the algorithm.

The elegance comprises the following features:

-- Few algebraic operations, best use [quasi-]group ones. Using of many
operations, such as tables of constants, S-boxes, data-dependent
transpositions or operator selections results in "wild",
hardly-tractable and possibly undesirable interactions between
operations. [Quasi-]group operations are preferable because of the
"perfect secrecy": if x is product of y and z in a quasigroup, knowledge
about either variable gives no information about any of the other two.

-- Simple and regular key schedule. A complex key schedule with, say,
bit transpositions or non-linear operations, which effectively adds
another, unrelated, function to the cipher, results in hardly-tractable
and possibly undesirable interactions between the functions. From the
pragmatic viewpoint, moreover, a simple and regular key schedule is very
fast, which makes the cipher much more useful as a universal primitive
to construct others, such as hash functions.

IDEA, a secure block cipher designed by Xuejia Lai and James L. Massey
[LM91, LMM91] is an example of elegance. Besides being elegant with an
efficient choice and arrangement of algebraic operations, it is elegant
for some more features:

-- The use of "incompatible" group operations, where "incompatible"
means there are no simple relations (such as distributivity) between
them. The incompatibility excludes any exploitable algebraic property
and makes it impossible to solve the cipher algebraically.

-- The use of modular multiplication. Multiplication is apparently more
powerful than addition and thus greatly contributes to security and
efficiency of the cipher.

However, modular multiplication in IDEA uses the Fermat prime modulus
2**w+1 which does not exist for w=32 or w=64, making it not extendable
to machine word lengths nowaday. Furthermore, it rotates the [primary]
key in a way that some key bits are re-used much more slowly than
others, making the key schedule irregular.

Skipjack, a secure block cipher designed by cryptographers of the U.S.
National Security Agency [NSA98], is another example of elegant design.
Besides being elegant with an efficient, simple and regular key
schedule, it is elegant for one more feature:

-- The use of two ciphers, an "outer" cipher, or *wrapper*, consisting
of first and last rounds, and an "inner" cipher, or *core*, consisting
of middle rounds. The terms are introduced in the design rationale of
MARS [IBM98], a block cipher structurally similar to Skipjack. The
MARS's designers justifies this two-layer structure that it breaks any
repetitious property, it makes any iterative characteristic impossible,
and it disallows any propagation of eventual vulnerabilities in either
layer to the other one, thus making attacks more difficult. The wrapper
is primarily aimed at fast diffusion and the core primarily at strong
confusion. As Claude E. Shannon termed in his pioneer work [Sha49],
*diffusion* here refers to the process of letting one input bit affect
many output bits in very different ways, and *confusion* here refers to
the process of letting one output bit be affected by many input bits in
very different ways. Skipjack (as opposed to MARS) was sought elegant as
the wrapper here is, in essence, the inverse function of the core.

However, Skipjack uses an S-box, making it rather slow, hard to program
in a secure manner, and not extendable to large machine word lengths, as
such.

This article describes an atempt to combine the ideas of elegance of
IDEA with the nice structure of Skipjack into a scalable and tweakable
block cipher called NSABC -- Near-Skipjack Algebraic Block Cipher.

NSABC uses the entire overall structure of Skipjack, including the key
schedule (except word re-numbering which is merely cosmetic), and only
replaces the intereal 4-round Feistel structure of Skipjack with another
structure. The new structure consists of two rounds of an operation,
denoted "(o)", which is essentially multiplication of odd integers
modulo 2**(w+1) that are represented by integers modulo 2**w. Each
instance of "(o)" is followed by a swap of high and low half-words. Thus
NSABC uses only two algebraic operations, XOR and "o", which are
mutually incompatible.

NSABC is scalable. It is defined for every even word length w, which
allows scaling up with 8-bit increment in block length and 10-bit
increment in key length.

NSABC is tweakable. It can use an easily changeable parameter, called
*tweak* [LRW02], to make a unique version of the cipher for every block
encrypted under the same key.

In algorithm design there is always a trade-off between security and
efficiency, and designers always have to ask "What do we want, a very
strong and fairly fast cipher, or fairly strong but very fast?"

NSABC reflects the authors' view on the dilemma. If Skipjack is regarded
as very secure but not very fast, then NSABC may be regarded as a design
with stress on the second aspect: make it very fast, abeit not very
secure. For 64-bit word length, NSABC key length is 320 bits, optionally
plus 256 bits more, but the true level of security is yet to be
determined. On the other hand, on a modern 64-bit processor it takes
only 5 clock cycles per byte and only 4.4 clock cycles per byte if
tweaking is omitted.

NSABC is not patented and is put in public domain. As it bases on
Skipjack, eventual users should be aware of Skipjack patent(s) which may
be possibly held by the U.S. Government and take steps to make sure the
use is free of legal issues. The authors are not aware of any patent
related to other parts of the design.

The rest of the article is organized as follows. Section 2 defines
operations over machine words. Section 3 specifies the algorithm.
Section 4 gives notes for efficient implementation. Section 5 gives
numerical examples. Section 6 concludes the article.

2. Operations over words

This section defines operations over words, mainly a multiplication-like
group operation, denoted ".", a family containing it, parameterized by
the group unit e, denoted "(.)", and quasi-group operations based on "."
and "(.)", denoted "o" and "(o)" respectively.

We write a**n to denote n-th power of a. We use the symbol +, -, *, ~, ^
to denote two-complement, w-bit (unless otherwise said),
arithmetic/logic operations with the C programming languague meanings.

Let's define binary operations "." and "o" as follows.

x . y = 2*x*y + x + y

x o y = 2*x*y + x - y

Note that the bivariate polynomials x . y and x o y are permutation
polynomials in either variable for every fixed value of the other
variable [Riv99]. In other words, "." and "o" are quasi-group
operations.

Furthermore, "." is a group operation over the set of w-bit numbers.
(This fact, although simple and straightforward, does not seem to have
been mentioned in the literature.) The operation "." is very similar to
[usual] modular multiplication. Indeed, it can be done (at least
theoretically) by dropping the rightmost bit, which is always "1", of
the product modulo 2**(w+1) of the operands each appended with an "1"
bit. Symbolically,

x . y = ((2*x+1)*(2*y+1) - 1) mod 2**(w+1) / 2

In other words, the group defined by "." is isomorphic to the
multiplicative group of odd integers modulo 2**(w+1), via the
isomorphism

x |-> 2*x + 1 (mod 2**(w+1))

The unit of the group is 0. The inverse element of x, denoted x', is

x' = - x/(2*x + 1)

where "/" denotes the inverse operation of "*", i.e. modular
multiplication by the modular multiplicative inverse of the denominator,
which indeed exists for every x.

The following relations hold.

(~x) . y = ~(x . y)

(1-x) o y = 1 - (x o y)

x . y = -((-x) o y)

x o y = -((-x) . y)

(x . a) . a' = x

(x o a) o a' = x

Let's define trivariate polynomials p and q as follows.

p(x,y,e) = (x-e).(y-e) + e

= 2*x*y + (1 - 2*e)*(x + y - e)

q(x,y,e) = -p(-x,y,e)

= 2*x*y + (1 - 2*e)*(x - y + e)

Then, over the set of w-bit numbers, p and q are permutation polynomials
in either variable while keeping the other two fixed [Riv99].

Let e be fixed. The binary operations "(.)" and "(o)" defined by

x (.) y = p(x,y,e)

x (o) y = q(x,y,e)

are quasigroup operations. Furthermore, "(.)" defines a group over the
set of w-bit numbers that is isomorphic to the group defined by "."
above. The isomorphism is

x |-> x - e

The unit of this group is e.

The inverse element of x, denoted x('), is

x(') = (x - e)' + e

= ((2*e - 1)*x - 2*e*(e-1)) / (2*(x-e) + 1)

The following relations hold.

x (.) y = -((-x) (o) y)

x (o) y = -((-x) (.) y)

x (.) y = (x-e) . (y-e) + e

x (o) y = (x+e) o (y-e) - e

(x (.) a) (.) a(') = x

(x (o) a) (o) a(') = x

The operation "(o)" will be used for encryption and, due to the last
relation, also for decryption.

3. Specification

This section provides details of NSABC/w. From now on w, the word
length, must be even.

Throughout this article, X denotes a 4w-bit plaintext block, Y a 4w-bit
ciphertext block, Z a 5w-bit key, T a 4w-bit secret *tweak*, i.e., a
value that is used to encrypt only one block under the key.

NSABC is specified in terms of two functions:

ENCRYPT(X,Z,T), which encrypts X under control of Z and T,

DECRYPT(Y,Z,T), which decrypts Y under control of Z and T.

Relation:
DECRYPT( ENCRYPT(X,Z,T),Z,T ) = X

Tweaking is optional and can be disabled by letting T constantly zero.
With tweaking disabled, NSABC becomes a conventional, non-tweakable,
block cipher.

3.1. Data order

We write a multi-part data values in string (number) notation or tuple
(vector) notation. In *string notation*, the value is written as a
sequence of symbols, possibly separated by space(s) that are
insignificant. In *tuple notation*, the value is written as a sequence,
in parentheses, of comma-separated symbols.

For examples, z y x and 43 210 are in string notation, (x,y,z) and
(0,1,2,3,4) are in tuple notation.

The string notation indicates *high-first* order. That is, the first
(i.e. leftmost) symbol denotes the most significant part of the value
when it is intepreted as a number.

Conversely, the tuple notation indicates *low-first* order. That is, the
first symbol denotes the least significant part of the value when it is
interpreted as a number.

For examples, if x0, x1, x2 are words, then:

-- In x2 x1 x0, the symbol x2 denotes the most significant word when the
value x2 x1 x0 is interpreted as a 3-word number.

-- In (x0,x1,x2), the symbol x0 denotes the least significant word when
the value (x0,x1,x2) is interpreted as a 3-word number.

For convenience, the same value may be written in string notation as
well as in tuple notation. Thus, for examples, for every a, b, c, d,

a b c d = (d,c,b,a)

The term "part", used above, usually refers to "word", but it may also
refer to "digit" (of a number), "component" (of a tuple or vector), as
well as group thereof.

For example, if x, y, z, t are known to be 1-digit, 2-digit, 3-digit and
4-digit values respectively, then (x,y,z,t) = 9876543210 means x=0,
y=21, z=543 and t=9876.

3.2. Operations on word strings

Let R denote the transposition that reverses the word order of a word
string. For example, for 8-bit words,

0x0123AB R = 0xAB2301

Let S denote the transposition that swaps the high order and low order
halves of every word of a word string. For example, for 8-bit words,

0x0123AB S = 0x1032BA

Let ' denote the word-wise application of the inverse operator "'" on a
word string. For example,

(a,b,c)' = (a',b',c').

The following relations hold.

xRR = x

xSS = x

x'' = x

xRS = xSR

xR' = x'R

The operator "^" on word strings denote word-wise application of "^".
For example,

(a0,a1,a2) ^ (b0,b1,b2) = (a0^b0, a1^b1, a2^b2)

The following relations hold.

(x ^ y)S = xS ^ yS

(x ^ y)R = xR ^ yR

(x ^ y)RS = xRS ^ yRS

Unless otherwise specified, operators "+" and "-" on word strings denote
word-wise modular addition and subtraction, respectively, i.e. operands
are regarded as vectors of words. For example,

(a0,a1,a2) + (b0,b1,b2) = (a0+b0, a1+b1, a2+b2)

(a0,a1,a2) - (b0,b1,b2) = (a0-b0, a1-b1, a2-b2)

The following relations hold.

(x + y)R = xR + yR

(x - y)R = xR - yR

3.3. Functions ENCRYPT

This function is specified in terms of three functions: DE, a data
encryption function; KE, a key expansion function; and TE, a tweak
expansion function.

ENCRYPT uses a 64w-bit constant E, called *bias*, as a parameter to DE.

Constant:
E...64w-bit bias

Input:
X...4w-bit plaintext block
Z...5w-bit key
T...4w-bit tweak

Output:
Y...4w-bit ciphertext block

Relation:
ENCRYPT(X,Z,T) = DE(X, KE(Z), TE(T), E)

3.4. Functions DE, KE and TE

These functions operate on a 4-word *data register* (x0,x1,x2,x3) and
use a 5-word *key register* (z0,z1,z2,z3,z4) and a 4-word *tweak
register* (t0,t1,t2,t3). The key register is initially loaded with the
key Z. The tweak register is initially loaded with the tweak T. The data
register is initially loaded with the plaintext block X, and after
encipherment it contains the ciphertext block Y. [The concrete, vector,
notation here specifies the order of words so, for example, x0 is
initially loaded with the least significant word of X and after
encipherment it contains the least significant word of Y.]

These functions consist of 32 rounds of operations. In each round, a
permutation G applies on the word x0, an exclusive-or (XOR) operation
"mixes" x0 and t0 into another word of the data register, that is either
x1 or x3, depends on the round *type* that is either *A* or *B*:

-- In an A-typed round, the "mixing" takes place after G and targets x1
(see Fig. 1A);

-- In a B-typed round, the "mixing" takes place before G and targets x3
(see Fig. 1B).

The permutation G is key-dependent and round-dependent. In each round,
the pair of key words (z0,z3) and a pair of words from the bias E enter
the G-box.

At the end of every round, there is a transposition on the data register
that circularly exchanges contents of the four words, and other
transpositions that do similarly for the key register and the tweak
register.

Encryption runs in four *passes*: firstly eight rounds of type A, then
eight rounds of type B, then eight rounds of type A again, finally eight
rounds of type B again.

t3(0) t1(0) x3(0) x1(0) z4(0) z2(0) z0(0)
| t2(0) | t0(0) | x2(0) | x0(0) | z3(0) | z1(0) |
| | | | | | | | | | | | |
| | | | | | | +--+--+ | | | | |
| | | | | |E(0)--|> G <|---------------------o K(0)
| | | | | |E(1)--|> <|------o K(1) | |
| | | | | | | +--+--+ | | | | |
| | | | | | | | | | | | |
| | |L(0)o------------->+<---o | | | | |
| | | | | | | | | | | | |
\ \ \ | \ \ \ | \ \ \ \ |
\ \ \ | \ \ \ | \ \ \ \ |
+--\----\----\-+ +--\----\----\-+ +--\----\----\----\-+
| \ \ \ | \ \ \ | \ \ \ \
| | | | | | | | | | | | |
| t2(1) | t0(1) | x2(1) | x0(1) | z3(1) | z1(1) |
t3(1) t1(1) x3(1) x1(1) z4(1) z2(1) z0(1)

Fig. 1A. Round 0 of DE, KE and TE.

t3(8) t1(8) x3(8) x1(8) z4(8) z2(8) z0(8)
| t2(8) | t0(8) | x2(8) | x0(8) | z3(8) | z1(8) |
| | | | | | | | | | | | |
| | |L(8)o--->+<-------------o | | | | |
| | | | | | | | | | | | |
| | | | | | | +--+--+ | | | | |
| | | | | |E(16)-|> G <|---------------------o K(16)
| | | | | |E(17)-|> <|------o K(17) | |
| | | | | | | +--+--+ | | | | |
| | | | | | | | | | | | |
\ \ \ | \ \ \ | \ \ \ \ |
\ \ \ | \ \ \ | \ \ \ \ |
+--\----\----\-+ +--\----\----\-+ +--\----\----\----\-+
| \ \ \ | \ \ \ | \ \ \ \
| | | | | | | | | | | | |
| t2(9) | t0(9) | x2(9) | x0(9) | z3(9) | z1(9) |
t3(9) t1(9) x3(9) x1(9) z4(9) z2(9) z0(9)

Fig. 1B. Round 8 of DE, KE and TE.

KE (key expansion)

Input:
Z...5w-bit key

Output:
K...64w-bit expanded key

Pseudo-code:
(z0,z1,z2,z3,z4) := Z
for k := 0, 1, 2,..., 31 loop
K(2*k) := z0
K(2*k+1) := z3
(z0,z1,z2,z3,z4) := (z1,z2,z3,z4,z0)
end loop
K := (K(0),K(1),K(2),...,K(63))

TE (tweak expansion)

Input:
T...4w-bit tweak

Output:
L...32w-bit expanded tweak

Pseudo-code:
(t0,t1,t2,t3) := T
for k := 0, 1, 2,..., 31 loop
L(k) := t0
(t0,t1,t2,t3) := (t1,t2,t3,t0)
end loop
L := (L(0),L(1),L(2),...,L(31))

DE (data encryption)

Input:
X...4w-bit plaintext block
K...64w-bit expanded key
L...32w-bit expanded tweak
E...64w-bit bias

Output:
Y...4w-bit ciphertext block

Pseudo-code:
(x0,x1,x2,x3) := X
for k := 0, 1, 2,..., 31 loop
if 0 <= k <= 7 or 16 <= k <= 23 then
x0 := G( x0, (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)) )
x1 := x1 ^ x0 ^ L(k)
elsif 8 <= k <= 15 or 24 <= k <= 31 then
x3 := x3 ^ x0 ^ L(k)
x0 := G( x0, (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)) )
end if
(x0,x1,x2,x3) := (x1,x2,x3,x0)
end loop
Y := (x0,x1,x2,x3)

Relations:
Y = (x0(32),x1(32),x2(32),x3(32))

for 0 <= k <= 7 or 0 <= k <= 23:
x0(k+1) = x1(k) ^ g(k) ^ L(k)
x1(k+1) = x2(k)
x2(k+1) = x3(k)
x3(k+1) = g(k)

for 8 <= k <= 15 or 24 <= k <= 31:
x0(k+1) = x1(k)
x1(k+1) = x2(k)
x2(k+1) = x3(k) ^ x0(k) ^ L(k)
x3(k+1) = g(k)

for 0 <= k <= 31:
g(k) = G(x0(k), (K(2*k), K(2*k+1)), (E(2*k), E(2*k+1)))
K(2*k) = z0(k)
K(2*k+1) = z3(k)
z0(k+1) = z1(k)
z1(k+1) = z2(k)
z2(k+1) = z3(k)
z3(k+1) = z4(k)
z4(k+1) = z0(k)
L(k) = t0(k)
t0(k+1) = t1(k)
t1(k+1) = t2(k)
t2(k+1) = t3(k)
t3(k+1) = t0(k)

K = (K(0),K(1),K(2),...,K(63))
L = (L(0),L(1),L(2),...,L(31))
(x0(0),x1(0),x2(0),x3(0)) = X
(z0(0),z1(0),z2(0),z3(0),z4(0)) = Z
(t0(0),t1(0),t2(0),t3(0)) = T
(E(0),E(1),E(2),...,E(63)) = E

NOTES

(1) The bias E is declared as a parameter to DE (although actually it is
a well-specified constant in the algorithm) to indicate the possibility
of further parameterization. One may choose his/her own E to create an
other, "non-standard", variant of ENCRYPT [and DECRYPT].

(2) The full algorithm is illustrated in Fig. 2, where (X0,X1,X2,X3)=X,
(Y0,Y1,Y2,Y3)=Y, (Z0,Z1,Z2,Z3,Z4)=Z, (T0,T1,T2,T3)=T and the constant E
is omitted. The figure is obtained by "unrolling" [i.e. eliminating all
circular transpositions of] the dataflow graph of the full cipher that
would be resulted from cascading the individual rounds as in Fig. 1A and
Fig. 1B.

X3 X2 X1 X0
| | | |
| | | 0 G<-Z0,Z3
| | T0->+<---------o
| | 1 G<-Z1,Z4 |
| T1->+<---------o |
| 2 G<-Z2,Z0 | |
T2->+<---------o | |
3 G<-Z3,Z1 | | |
o------------------------------->+<-T3
| | | 4 G<-Z4,Z2
| | T0->+<---------o
| | 5 G<-Z0,Z3 |
| T1->+<---------o |
| 6 G<-Z1,Z4 | |
T2->+<---------o | |
7 G<-Z2,Z0 | | |
| ----------------------------o
| / | | |
o--/---------------------------->+<-T3
/ | | 8 G<-Z3,Z1
/ | o--------->+<-T1
T0->+ | 9 G<-Z4,Z2 |
| o--------->+<-T2 |
| 10 G<-Z0,Z3 | |
o--------->+<-T3 | |
11 G<-Z1,Z4 | | |
T0->+<-------------------------------o
| | | 12 G<-Z2,Z0
| | o--------->+<-T1
| | 13 G<-Z3,Z1 |
| o--------->+<-T2 |
| 14 G<-Z4,Z2 | |
o--------->+<-T3 | |
15 G<-Z0,Z3 | | |
| | | |
| | | 16 G<-Z1,Z4
| | T0->+<---------o
| | 17 G<-Z2,Z0 |
| T1->+<---------o |
| 18 G<-Z3,Z1 | |
T2->+<---------o | |
19 G<-Z4,Z2 | | |
o------------------------------->+<-T3
| | | 20 G<-Z0,Z3
| | T0->+<---------o
| | 21 G<-Z1,Z4 |
| T1->+<---------o |
| 22 G<-Z2,Z0 | |
T2->+<---------o | |
23 G<-Z3,Z1 | | |
| ----------------------------o
| / | | |
o--/---------------------------->+<-T3
/ | | 24 G<-Z4,Z2
/ | o--------->+<-T1
T0->+ | 25 G<-Z0,Z3 |
| o--------->+<-T2 |
| 26 G<-Z1,Z4 | |
o--------->+<-T3 | |
27 G<-Z2,Z0 | | |
T0->+<-------------------------------o
| | | 28 G<-Z3,Z1
| | o--------->+<-T1
| | 29 G<-Z4,Z2 |
| o--------->+<-T2 |
| 30 G<-Z0,Z3 | |
o--------->+<-T3 | |
31 G<-Z1,Z4 | | |
| | | |
Y3 Y2 Y1 Y0

Fig. 2. The full cipher, by "unrolling"
the graphs in Fig. 1A and Fig. 1B.

(3) The round structure is up to word indexing the same as that of
Skipjack. The word re-indexing, which is cryptographically
insignificant, was introduced to ease description and illustration.

3.5. Bias E

The 64w-bit constant E, which is used by ENCRYPT and DECRYPT and enters
as a parameter to DE, is generated in 32 rounds of operation on a 2w-bit
*bias register* (f0,f1) that is initialized by the constant value (0,1).
In each round, the content of f1 is added to f0, then the content of f0
is added to f1, then the content of the register becomes a pair of words
of E (see Fig. 3.)

f1(5) f0(5)
| |
o---->[+]
| |
[+]<----o
| |
f1(6) f0(6)
E(11) E(10)

Fig. 3. Round 5 of E
(regarded as a constant function).

Input:
none.

Output:
E...64w-bit bias

Pseudo-code:
(f0,f1) := (0,1)
for k := 0,1,2,...,31 loop
f0 := f0 + f1
f1 := f1 + f0
E(2*k) := f0
E(2*k+1)) := f1
end loop
E := (E(0),E(1),E(2),...,E(63))

Relations:
E = (E(0),E(1),E(2),...,E(63))
for k = 0,1,2,...,31:
E(2*k) = f0(k+1)
E(2*k+1) = f1(k+1)
f0(k+1) = f0(k) + f1(k)
f1(k+1) = f1(k) + f0(k+1)
(f0(0),f1(0)) = (0,1)

NOTES

(1) It holds E(n) = F(n+2), where F(_) is defined by F(n+2) = F(n+1) +
F(n) and (F(0),F(1)) = (0,1), i.e. E = (1, 2, 3, 5, 8, 13,...,
17167680177565) (mod 2**w). In other words, E is the sequence of 64
consecutive Fibonacci numbers immediately following the initial 0 and 1,
modulo 2**w.

(2) The round structure of E, usually known as *pseudo-Hadamard
transform*, was used as a diffusion construct in prior cipher designs
[Mas94, Sch+98]. Here it is not aimed at diffusion since it generates a
constant, which has zero uncertainty.

3.6. Function G

Function G, which takes as argument a w-bit intermediate plaintext word
and is parameterized by a pair of w-bit subkey words (k0,k1) and a pair
of constant words (e0,e1) to return a w-bit intermediate ciphertext
word, is a cascade of two rounds, each consisting of an evaluation of
the polynomial q (defined in section 2) followed by a swap of half-words
(see Fig. 4). The first evaluation uses k0 and e0, the second one uses
k1 and e1.

argument
| |
+------+
e0---|> q <|---k0
+------+
| |
\ /
\/
/\
/ \
| |
+------+
e1---|> q <|---k1
+------+
| |
\ /
\/
/\
/ \
| |
result

Fig. 4. Function G.

Relation:
G(x,(k0,k1),(e0,e1)) = q( q(x,k0,e0)S, k1, e1 )S

NOTES

(1) The algorithm uses 64 instances of the quasi-group operation "(o)",
each with its own choice of the underlying group defined by the
operation "(.)" with unit e. So, to be specific, instead of writing
something like x (o) z, we wrote q(x,z,e).

(2) Alternatively, it may be seen as using 64 identical instances of the
simpler operation "o", or of the simplest, group operation ".", but
operands and result of each instance of the operation are "biased" by
adding or subtracting the constant e that is specific to the instance,
and furthermore, for ".", the left operand enters and the result leaves
it in altered sign. That's why E, the sequence of such e, is termed the
"bias".

3.7. Function DECRYPT

DECRYPT can be easily derived from the relation with ENCRYPT. Here it is
given explitly in terms of functions DE, KE and TE.

Relations:
DE( DE(X,KE(Z),TE(T),E)RS, (KE(ZR)-ER)'+ER, TE(TRS), ER )RS = X
DECRYPT(Y,Z,T) = DE( YRS, (KE(ZR)-ER)'+ER, TE(TRS), ER )RS

In other words, encrypting the plaintext block Y in reverse half-word
order (Y RS) using the tweak T in reverse half-word order (T RS) and the
key schedule consisting of inverse words of an expanded key resulted
from the key Z in reverse word order (Z R), where the inversions are of
the [algebraic] groups defined by the operation "(.)" and each group is
specified by its unit that is the corresponding word of the constant E
in reverse word order (E R), recovers the plaintext block X in reverse
half-word order (X RS).

NOTES

(1) Like Skipjack, decryption is similar to encryption. To decrypt with
Skipjack, one swaps adjacent words, i.e. swaps the first and the second
and swaps the third and the fourth words in the cipher block, the key
and the plain block. To decrypt with NSABC, one reverses the word order,
i.e., swaps the first and the last words as well as the second first and
the second last ones.

(2) Unlike Skipjack, DECRYPT cannot be expressed explicitly in terms of
ENCRYPT. That's why our cipher is termed "near Skipjack" only.

3.8. Tweak derivation

The 4w-bit secret tweak T is used to encrypt only one block [under a
given key Z]. In order to encrypt multiple blocks the tweak is derived
from the block index and a 4w-bit [additional] key, called *tweak key*,
as follows.

Let T(j) denote the tweak used to encrypt j-th block. For zero-th block,
the tweak key is used as the tweak directly:

T(0) = tweak key

Successive tweak is computed from the current tweak by the recurrent
relation:

T(j) = T(j-1) + 2*T(0) + 1

or, equivalently,

T(j) = T(0) . j

where all operators, including ".", are defined on 4w-bit modular
arithmetics and all operands, including 1 and j, are regarded as 4w-bit
numbers.

NOTES

(1) The second relation is meant for sequential access and the third is
meant for random access to the j-th block. There T(0) conveniently
refers to the [unnamed] tweak key.

(2) The family of functions {T: j |-> T(0).j}, parameterized by the
tweak key T(0), is not epsilon-almost 2-XOR universal w.r.t. definition
by [LRW02]. Eventual application of the present tweak derivation method
in the Liskov-Rivest-Wagner construction, i.e. by defining
ENCRYPT(X,Z,T,j) = DE(X^T(j),KE(Z),0)^T(j), is therefore impossible.

4. Notes on implementation

This section provides methods for efficient software implementation for
two types of environment: memory-constrained, such as embedded
computers, and memory-abundant, such as servers and workstations.

4.1. Memory-constrained environment

(1) The function ENCRYPT can be implemented without using any memory on
a processor with at least 16 word registers:

4 for (x0,x1,x2,x3) -- data,

5 for (z0,z1,z2,z3,z4) -- key,

4 for (t0,t1,t2,t3) -- tweak,

2 for (f0,f1) -- bias, and

1 for k -- round index.

Indeed, all other intermediate word strings (namely the expanded key K,
the expanded tweak L and the bias E) can vanish because every word of
them, once produced, can be consumed immediately, provided that the
functions KE, TE, E (regarded as a constant function) and DE are
programmed to run in parallel and in sync with each increment of k.

(2) In an extremely memory-constrained environment, modes of operation
that avoid DECRYPT (i.e. ones using ENCRYPT to decrypt) are preferable.
That's because modular multiplicative inversion is so time-consuming
that every practial implementation of DECRYPT must always pre-compute
the key schedule.

4.2. Memory-abundant environment

(1) The polynomial q in function G can be evaluated using only one
multiplication and one addition, with a modified key schedule. Indeed,

q(x,z,e) = M*x + N

where x is a data word, z key word, e the unit of the [quasi-]group in
use, and

M = 2*(z - e) + 1

N = (2*e - 1)*(z - e)

is a pair of subkeys precomputed in a [modified] key schedule consisting
of 64 such pairs.

(2) The cipher is partially parallelizable. The following procedure
executes all 32 rounds in 20 steps, of which half performing two or
three parallel evaluations of the function G. The procedure becomes
evident by examining the dataflow graph of the algorithm, shown in Fig.
5, which is obtained by "unrolling" the one in Fig. 2. (Here "unrolling"
means introducing a circular transposition so that the G-boxes at round
k and k+3 lay on a vertical line.) Recall that g(k) is the result of G
in round k.

1. Compute g(0)
2. Compute g(1)
3. Compute g(2)
4. Compute g(3)
5. Compute g(4)
6. Compute g(5), g(11) in parallel
7. Compute g(6), g(9) in parallel
8. Compute g(7), g(10), g(13) in parallel
9. Compute g(8), g(14) in parallel
10. Compute g(12), g(15) in parallel
11. Compute g(16)
12. Compute g(17)
13. Compute g(18)
14. Compute g(19)
15. Compute g(20)
16. Compute g(21), g(27) in parallel
17. Compute g(22), g(25) in parallel
18. Compute g(23), g(26), g(29) in parallel
19. Compute g(24), g(30) in parallel
20. Compute g(28), g(31) in parallel

On a modern 64-bit processor, each step should take about 8 clock cycles
on average. So, for w=64 i.e. 32-byte blocks it achieves 20*8/32 = 5
clock cycles per byte encrypted. If the tweak T is not used one may
remove it from the source code completely to save about 1 clock cycle
per byte per step, resulting in 20*7/32 = 4.4 clock cycles per byte.

X0 X1
\ \
\0 \1
G-----G
\
\
X2 X3 \
\ \ \
1 \2 \3 \4
G-----G-----G-----G
\ \ \
\ \ \
\ \ \
\ \ \
4 \5 \6 \7
G-----G-----G-----G
\ \ \
\ \ \
. . .
|\ |\ |\
7 | \8 | \9 | \10
G--|--G | G | G
| \ | \ |
| \| \|
. + +
|\ |\ |\
10 | \11 | \12 | \13
G | G | G | G
\ | \ | \ |
\| \| \|
+ + +
|\ |\ \
13 | \14 | \15 \16
G | G | G G
\ | \ | \
\| \| \
+ + \
\ \ \
16 \17 \18 \19
G-----G-----G-----G
\ \ \
\ \ \
\ \ \
\ \ \
19 \20 \21 \22
G-----G-----G-----G
\ \ \
\ \ \
\ . .
\ |\ |\
22 \23 | \24 | \25
G-----G--|--G | G
\ | \ |
\ | \|
. . +
|\ |\ |\
25 | \26 | \27 | \28
G | G | G | G
\ | \ | \ |
\| \| \|
+ + +
|\ |\ |\
28 | \29 | \30 | \31
G | G | G | G
\ | \ | \ |
\| \| \|
+ + +
\ \ \
31 \ \ \
G Y0 Y1 Y2
\
\Y3

Fig. 5. The full cipher, by "unrolling" the graph in Fig. 2.

5. Examples

This section provides examples of NSABC/16 and NSABC/64 encryption.

Tab. 5.1 shows the contents of all registers in NSABC/16 at the start of
round k, for k = 0, 1, 2,..., 32 where k = 32 conveniently denotes the
end of round 31, which is that of the algorithm, during the execution of
ENCRYPT under

key = 0x0AA00BB00CC00DD00EE0,

tweak = 0x0001002203334444, and

plain = 0x0123456789ABCDEF.

Tab. 5.2 represents 26 vectors of ENCRYPT in NSABC/16 under parameters
thereby specified.

Tab. 5.3 represents a vector for iterated encryption in NSABC/64.
Encryption runs under a fixed key Z. Using the tweak T(0), a plain block
X(0) is encrypted into a cipher block denoted Y(0). Every Y(j) is then
recorded and become X(j+1), the plain block for the next encipherment
(if any), using a new tweak T(j+1) that is derived from T(j) by the
method specified in subsection 3.8. The vector is printed with Z, T(0),
X(0) and X(j) for some other values of j. All data are printed
hexadecimally, except the key Z that is in base32-without- padding
format specified by the IETF RFC 4648 [Jos06].

Tab. 5.1. Contents of registers during ENCRYPT in NSABC/16.

======= ======== ==================== ================ ================
Round Bias reg Key register Tweak register Data register
index k f1 f0 z4 z3 z2 z1 z0 t3 t2 t1 t0 x3 x2 x1 x0
======= ======== ==================== ================ ================
0 00010000 0AA00BB00CC00DD00EE0 0001002203334444 0123456789ABCDEF
1 00020001 0EE00AA00BB00CC00DD0 4444000100220333 55BC012345679853
2 00050003 0DD00EE00AA00BB00CC0 0333444400010022 33DB55BC0123758F
3 000D0008 0CC00DD00EE00AA00BB0 0022033344440001 6ADC33DB55BC6BDD
4 00220015 0BB00CC00DD00EE00AA0 0001002203334444 012E6ADC33DB5493
5 00590037 0AA00BB00CC00DD00EE0 4444000100220333 C87D012E6ADCBFE2
6 00E90090 0EE00AA00BB00CC00DD0 0333444400010022 0750C87D012E6EBF
7 02620179 0DD00EE00AA00BB00CC0 0022033344440001 82C80750C87D83C4
8 063D03DB 0CC00DD00EE00AA00BB0 0001002203334444 A62C82C807506E50
9 10550A18 0BB00CC00DD00EE00AA0 4444000100220333 3BA48C3882C80750
10 2AC21A6D 0AA00BB00CC00DD00EE0 0333444400010022 42573FC78C3882C8
11 6FF1452F 0EE00AA00BB00CC00DD0 0022033344440001 C5C1C0BD3FC78C38
12 2511B520 0DD00EE00AA00BB00CC0 0001002203334444 637A49F8C0BD3FC7
13 FF42DA31 0CC00DD00EE00AA00BB0 4444000100220333 0A3318F949F8C0BD
14 D8B5D973 0BB00CC00DD00EE00AA0 0333444400010022 4278C9BD18F949F8
15 8ADDB228 0AA00BB00CC00DD00EE0 0022033344440001 74260BA2C9BD18F9
16 C7E23D05 0EE00AA00BB00CC00DD0 0001002203334444 616C6CDE0BA2C9BD
17 CCC904E7 0DD00EE00AA00BB00CC0 4444000100220333 7041616C6CDE3FA7
18 9E79D1B0 0CC00DD00EE00AA00BB0 0333444400010022 FE487041616C91A5
19 0EA27029 0BB00CC00DD00EE00AA0 0022033344440001 7714FE487041165A
20 8D6D7ECB 0AA00BB00CC00DD00EE0 0001002203334444 B3237714FE48C363
21 99A50C38 0EE00AA00BB00CC00DD0 4444000100220333 3B42B3237714814E
22 3F82A5DD 0DD00EE00AA00BB00CC0 0333444400010022 B0C33B42B323C4E4
23 24E1E55F 0CC00DD00EE00AA00BB0 0022033344440001 316FB0C33B42826E
24 2F210A40 0BB00CC00DD00EE00AA0 0001002203334444 5A0A316FB0C36149
25 68823961 0AA00BB00CC00DD00EE0 4444000100220333 71617F07316FB0C3
26 0A65A1E3 0EE00AA00BB00CC00DD0 0333444400010022 1CF3C2917F07316F
27 B6ADAC48 0DD00EE00AA00BB00CC0 0022033344440001 EBA52DBEC2917F07
28 19A262F5 0CC00DD00EE00AA00BB0 0001002203334444 648794A32DBEC291
29 96397C97 0BB00CC00DD00EE00AA0 4444000100220333 1E37E25294A32DBE
30 A90912D0 0AA00BB00CC00DD00EE0 0333444400010022 079D30BAE25294A3
31 64E2BBD9 0EE00AA00BB00CC00DD0 0022033344440001 BAFA931C30BAE252
32 859D20BB 0DD00EE00AA00BB00CC0 0001002203334444 5CA158A9931C30BA

Tab. 5.2. Some vectors for NSABC/16 single block encipherments.

==================== ================ ================ ================
Z (key) T (tweak) X (plain) Y (cipher)
==================== ================ ================ ================
00000000000000000000 0000000000000000 0000000000000098 1275C2B1DC5E3621
00000000000000000000 0000000000000000 0000000000009800 720620D75549C54D
00000000000000000000 0000000000000000 0000000000980000 1F2E2B943747B1DE
00000000000000000000 0000000000000000 0000000098000000 B1810BC2441D1445
00000000000000000000 0000000000000000 0000009800000000 ED61842661D49429
00000000000000000000 0000000000000000 0000980000000000 EA1C14B378A7B595
00000000000000000000 0000000000000000 0098000000000000 4DCBF3FCA282C4C4
00000000000000000000 0000000000000000 9800000000000000 0609058C12509447
00000000000000000000 0000000000000098 0000000000000000 AE4D0DFF58FFD3B2
00000000000000000000 0000000000009800 0000000000000000 D20388E6B94C96DF
00000000000000000000 0000000000980000 0000000000000000 A64BFCFDB32C5DBF
00000000000000000000 0000000098000000 0000000000000000 650F1DF8371AF343
00000000000000000000 0000009800000000 0000000000000000 BC8FA7FE1C928EAB
00000000000000000000 0000980000000000 0000000000000000 62C00204766B5BCE
00000000000000000000 0098000000000000 0000000000000000 40F6C79331EAD337
00000000000000000000 9800000000000000 0000000000000000 A9A7601DD937CFBA
00000000000000000098 0000000000000000 0000000000000000 2D078508E13FEA31
00000000000000009800 0000000000000000 0000000000000000 75FBFF2A6706DFDB
00000000000000980000 0000000000000000 0000000000000000 70BF79184C06F9AB
00000000000098000000 0000000000000000 0000000000000000 D505CBFAB03913EF
00000000009800000000 0000000000000000 0000000000000000 3EEC55CD065E275B
00000000980000000000 0000000000000000 0000000000000000 4ADA724B4913D4D0
00000098000000000000 0000000000000000 0000000000000000 F492E17C8F066561
00009800000000000000 0000000000000000 0000000000000000 F9060CFF3C7A6FA1
00980000000000000000 0000000000000000 0000000000000000 F7CDC677FF2F087F
98000000000000000000 0000000000000000 0000000000000000 008B4A112C5B2B5E

Tab. 5.3. A vector for iterated NSABC/64 encryption.

====== ================================================================
Label Value
====== ================================================================
Z DIFFUZE2HOPEBESTCONFVSE2THROWDICENSAZGOODATLASTQOZSKIPJAXDAMNYCE
T(0) 1DEA15C007316AC7EC1A15F00706AC7EFA17ED270CA7EA7CA717774E35A3E7BA
X(0) 31ACB074E5A185A590D471305CAD03A0E43A8C9314C8B52C01843A8708C51AA3
X(1) 3ED6DCD4A1E13501493BF8CC306C0130987CF497E67D92F100B10451494A91BD
X(2) 8242D4BC301A3FCEC9D751077D5C18A2384508B785BABC6A2446948862B47EE6
X(3) 4DB8C3D9385F21EE22C8D4CD81A9C4ED8FCC040DE2652F22DE5A6F8AAD977AD6
X(5) 5A756A9300A1F94CD073A8F336CDA083DBD445A39F0BF7596472EC7EDDD4816F
X(8) 33F7CB09C2F6A07507C56F833D6746F33A54E0CEE66CE1F0FA428DA04B8C23BF
X(13) AC1FCD227034D51A047CCBDA5FE27E64F9D3A59D24B382066174F49C63FBFFE9
X(21) 1DBC7DD4E5F3CC89EC726D5B0DDFB1E949DCCD8B2D67038AD998C8B33FF4F6F1
X(34) EA0EFFC95B7921F1A7266C2F902A8D97A3305DC0945F1BB10FED47C743F61E6D
X(55) 65C8DEBA8EE3B6730480DF43F8F9B0FC763FEE15E1C93B06208D9A01411C3A17
X(89) 46043207307750E211E66BA40E2C21310CF7769881CBDBCF326E251E99BCD3C9
X(144) 9CF208AF5D94AD1486B255A19261BEBA6F15911CDAA2E339A5A7B590A5E73498
X(233) 903CA8F6BE987A3FE00EB97F94767678CB619A7898FFC871CD19BEADE4E5A95F
X(377) D068EEA27849B0EB10101F9D79379DC5EE3D637764C9EFA462178CB467247FDC
X(610) 821C6763628A36F2D383F516E550A11E60C7F12B5AB545F66C0B4AA09430C6CD
X(987) B6CF160291E3D46CBBA728277466AE1BF9BDED75D3B6289E40EC1BA3287C83B9

6. Conclusion

We defined NSABC, a Skipjack-like block cipher with fine scalability,
utilizing a group operation over machine words of every size, that is
basically modular multiplication, a powerful operation available on many
processors.

NSABC is meant to be simple. It uses no S-boxes or "magic" constants. It
makes use only two well-known word-oriented algebraic operations:
exclusive-or and modular multiplication. It re-uses the simple and
regular structure of Skipjack that has become publicly known for over a
decade -- sufficient time to be truly understood. The simplicity makes
it easier to memorize and to analyze, which in turn makes deeper
understanding and greater confidence in the algorithm.

NSABC bases on some valuable design of a well-reputed agency in the
branch. We therefore believe that it is worth analysis and it can
withstand rigorous analysis. If this happens to be true, then we may
have a strong cipher with 256-bit blocks, allowing to encrypt enormous
amount of data under the same key, and with 320-bit keys, allowing to
keep encrypted data secret over any imaginable time.

NSABC is fast enough to be comparable with modern block ciphers.

7. References

[IBM98] Burwick C., Coppersmith D., D'Avignon E., Gennaro G.,
Halevi S., Jutla C., Matyas S. M. Jr., O'Connor L.,
Peyravian M., Safford D., Zunic N.: MARS - a candidate
cipher for AES. 1998.

[LM91] Lai X., Massey J. L.: A proposal for a new block encryption
standard. 1991.

[LMM91] Lai X., Massey J. L., Murphy S.: Markov ciphers and
differential cryptanalysis. 1991.

[LRW02] Liskov M., Rivest R. L., Wagner D.: Tweakable block ciphers.
2002.

[Mas94] Massey J. L.: A byte-oriented block-ciphering algorithm. 1994.

[NSA98] National Security Agency: Skipjack and KEA specification. 1998.

[Jos06] Josefsson S.: The Base16, Base32, and Base64 data
encodings, RFC4648. 2006.

[Riv99] Rivest R. L.: Permutation polynomials modulo 2**w. 1999.

[Sha49] Shannon C. E.: Communication theory of secrecy systems. 1949.

[Sch+98] Schneier B., Kelsey J., Whiting D., Wagner D., Hall C.,
Ferguson N.: Twofish -- a 128-bit block cipher. 1998.

--- news://freenews.netfront.net/ - complaints: news(a)netfront.net ---

|
Pages: 1
Prev: Podcast interview w/ Dr. Jay Kennedy re Plato's secret codes
Next: Prime Number SUMS model the Riemann Zeta function