From: Will Dickson on
On Wed, 17 Feb 2010 20:55:24 -0800, Paul Rubin wrote:

>
> GCM is probably inefficient to implement in pure Java. You could code
> EAX mode instead, based on the JCE primitives.

I looked at implementing GCM, but decided I didn't understand the
mathematics even remotely well enough to devise an efficient
implementation. I got the impression that a naive implementation of GCM
will be slower than a sophisticated one by a much larger factor than is
commonly the case for other algorithms.

If you don't mind using HMAC-SHA1 (encrypt then authenticate the
ciphertext), Thomas Pornin has written a very fast Java SHA-1
implementation you might find useful; in the tests I did, it thrashed
both the Sun and BC implementations (which were about the same), and my
implementation of EAX (which was marginally slower than Sun HMAC-SHA1). I
can't remember the exact numbers, but I probably have them written down
somewhere if you need them. The recent breaks on SHA-1 do not affect its
suitability for use within HMAC.

http://www.crypto-hash.fr/modules/wfdownloads/singlefile.php?cid=13&lid=10

HTH

Will.
From: Will Dickson on
On Thu, 18 Feb 2010 14:28:54 -0600, Will Dickson wrote:

> If you don't mind using HMAC-SHA1 (encrypt then authenticate the
> ciphertext), Thomas Pornin has written a very fast Java SHA-1
> implementation you might find useful; in the tests I did, it thrashed
> both the Sun and BC implementations (which were about the same), and my
> implementation of EAX (which was marginally slower than Sun HMAC-SHA1).

Oops - I should have said "my implementation of OMAC" rather than "of
EAX". I implemented OMAC in order to implement EAX.

From: Paul Rubin on
Will Dickson <wrd(a)UNROT_THIS.tynhehat.qrzba.pb.hx> writes:
>> GCM is probably inefficient to implement in pure Java.
> I looked at implementing GCM, but decided I didn't understand the
> mathematics even remotely well enough to devise an efficient
> implementation.

The latest x86 processors have hardware support for GCM and AES, and any
pure java implementation will suffer badly by comparison to that. Of
course it will take a while for the new instructions to be pervasive in
deployed systems, and there will always be some that have to fall back
on software.

> If you don't mind using HMAC-SHA1 (encrypt then authenticate the
> ciphertext), Thomas Pornin has written a very fast Java SHA-1

HMAC-SHA1 should have no security problems in practice. Even HMAC-MD5
is probably fine. But SHA1 is being replaced by SHA256, so HMAC-SHA256
is more forward-looking in terms of standards compliance, which I think
makes it a better choice for a long-lasting API. HMAC is rather slow no
matter what you do though. If you have AES hardware support, then
OMAC/CMAC (i.e. EAX mode) should be significantly faster. Software AES
may also be faster but that's less of a sure thing.

GCM may predominate in the future. It's parallelizeable, and uses less
computation per block to begin with, than something like EAX. I wonder
if a Java implementation exists. Maybe this OWASP thing should just
use it with the expectation that future JCE's will have native support.
From: Thomas Pornin on
According to Paul Rubin <no.email(a)nospam.invalid>:
> HMAC is rather slow no matter what you do though. If you have AES
> hardware support, then OMAC/CMAC (i.e. EAX mode) should be
> significantly faster. Software AES may also be faster but that's less
> of a sure thing.

HMAC works at the speed of the underlying hash function and beats
software AES hands down, unless you do something stupid like
HMAC-Whirlpool. On my PC (Intel Q6600, 2.4 GHz, 64-bit mode), the AES
implementation in OpenSSL (supposedly highly optimized) runs at
108 MB/s, while my generic C code in sphlib achieves 264 MB/s for SHA-1,
and even 143 MB/s for SHA-256 (187 MB/s for SHA-512). Some other
functions are faster than that, e.g. Shabal (a SHA-3 candidate)
processes data at 315 MB/s (that's three times faster than software
AES).

So for MACing alone, HMAC with a proper hash function will be much
faster than any AES-based MAC, for software implementations. Shabal is
quite fast but most of the other 13 remaining SHA-3 candidates are also
quite fast, so it seems plausible that this speed difference will remain
true for HMAC/SHA-3.

If we are talking about pure Java implementations then I expect the
difference to be the same or even bigger, because what kills speed in
Java is array accesses (the JVM insists on checking that the index is
not beyond the array limits). A SHA-1, SHA-256 or Shabal implementation
uses relatively few array accesses, and only for data decoding; internal
operations are arithmetic, over 32-bit words held in local variables. On
the other hand, an optimized AES implementation uses precomputed tables,
implying many more array accesses. So I would not be surprised if
HMAC/Shabal turned out to process data _more_ than three times faster
than AES, when using pure Java implementations.

So, no, HMAC is not slow.

However, there are two important points to take into account:

1. Things change quite a bit in presence of hardware support for AES.
Right now, very few processors have such opcodes, and those who have are
typically those for which software AES speed is already very convenient
for most purposes. On my PC, each of the four cores can encrypt data as
fast as the harddisk can eat it; and I have four cores but not four
harddisks. As shown above, the same core could follow a full-speed
gigabit ethernet link. No real speed problem here. Speed is more an
issue on limited hardware. My favourite example is a Linksys router on
which I reflashed OpenWRT. That kind of hardware does network-related
things all day long, and its 200 MHz MiPS-derivative processor is far
from being fast enough to crypto-process a mere 100 Mbits/s bandwidth.
That kind of hardware, also, does not have an hardware-accelerated
AES implementation. This means that while AES opcodes are kind of neat,
they are not (yet) sufficiently common to be industrially significant.
Right now, we need encryption and MAC algorithms for which _software_
implementations are fast, on cheap processors. Hardware-based AES may
become common in ten years or so.

2. While HMAC is faster than AES-based MACs, this discussion was
initially about combining MAC _and_ encryption. So the comparison is not
really "HMAC vs AES-based MAC" but rather "HMAC + AES-based encryption,
vs AES-based encryption+MAC". On the face of it, I expect a proper GCM
implementation to process data somewhat faster than the combination of
AES-CTR and HMAC. Remember, however, that in both cases the AES cost
dominates: using GCM instead of HMAC impacts only a part of the process
which accounts for no more than 25% of the total cost. So do not expect
big differences here. Personally, I would still favour GCM in that
situation, both for mathematical elegance, and to lower the amount of
needed code (think about CPU L1 cache pressure, especially in Java where
the JIT compiler appears to be quite cache-hungry).


--Thomas Pornin
From: Maaartin on
On Feb 19, 2:52 pm, Thomas Pornin <por...(a)bolet.org> wrote:
> According to Paul Rubin  <no.em...(a)nospam.invalid>:
>
> If we are talking about pure Java implementations then I expect the
> difference to be the same or even bigger, because what kills speed in
> Java is array accesses (the JVM insists on checking that the index is
> not beyond the array limits).

It does, but the server JVM can eliminate most of the overhead by
verifying in advance the checks will pass. IIRC, for client JVM
there's a weaker approach to be integrated in Java 7 JVM.

IMHO, more overhead is caused by the inability to access data as
byte[] and int[] at the same time, which is easy (although not
portable) in C using pointer casts. But this is only my weak opinion,
not based on any measurements.

Wouldn't using something like Poly1305 in Java be a good idea? I've
seen no implementation, but I'm quite sure it could be done without
much effort.