Prev: Does entropy ever fall out of a good hash function?
Next: Introducing dynamics into block encryptions
From: Boon on 1 Mar 2010 10:43

Carsten Krueger wrote:

> If it's AES, compare it with Diskcryptor 0.9 (beta)
> It's the fastest AES version I'm aware of.

http://diskcryptor.net/index.php/DiskCryptor_en#Performance

104 MB/s @ 2.4 GHz = 23 cycles per byte

Intel's AES-specific instructions (AES-NI) are 5-10 times faster.

http://en.wikipedia.org/wiki/AES_instruction_set
http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-aes-instructions-set/

Regards.
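The 23 cycles per byte figure is simply clock rate divided by throughput. A minimal sketch of that arithmetic, assuming "MB" means 10^6 bytes (the function name is illustrative and not taken from any library):

```python
def cycles_per_byte(clock_hz: float, bytes_per_second: float) -> float:
    """Average number of CPU cycles spent per byte processed."""
    return clock_hz / bytes_per_second

# DiskCryptor figure quoted above: 104 MB/s at 2.4 GHz
# (assumption: 1 MB = 10**6 bytes).
diskcryptor_cpb = cycles_per_byte(2.4e9, 104e6)
print(round(diskcryptor_cpb, 1))  # 23.1

# "5-10 times faster" would then correspond to roughly this range:
fast_low = diskcryptor_cpb / 10
fast_high = diskcryptor_cpb / 5
print(round(fast_low, 1), round(fast_high, 1))  # 2.3 4.6
```

The same conversion makes it easy to put other quoted throughput figures on a common cycles-per-byte scale, provided the clock rate of the benchmark machine is known.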
From: Harold Johanssen on 1 Mar 2010 18:59

On Mon, 01 Mar 2010 16:43:00 +0100, Boon wrote:

> Carsten Krueger wrote:
>
>> If it's AES, compare it with Diskcryptor 0.9 (beta) It's the fastest
>> AES version I'm aware of.
>
> http://diskcryptor.net/index.php/DiskCryptor_en#Performance
>
> 104 MB/s @ 2.4 GHz = 23 cycles per byte
>
> Intel's AES-specific instructions (AES-NI) are 5-10 times faster.
>
> http://en.wikipedia.org/wiki/AES_instruction_set
> http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-aes-instructions-set/
>
> Regards.

I have heard of an implementation on Core 2, by Kasper and Schwabe, getting 7.59 cycles per byte. That's nowhere near 5 times faster. Are they not using the AES-NI instructions?
From: Paul Rubin on 1 Mar 2010 20:10

Harold Johanssen <noemail(a)please.net> writes:

> I have heard of an implementation on Core 2, by Kasper and
> Schwabe, getting 7.59 cycles per byte. That's nowhere near 5 times
> faster. Are they not using the AES-NI instructions?

That might be a bit-slice implementation; I've never heard of a conventional one that fast. AES-NI is a Clarkdale+ feature (i.e. very recent) and Core 2 doesn't have it.
From: Harold Johanssen on 2 Mar 2010 16:27

On Tue, 02 Mar 2010 16:43:15 +0100, Carsten Krueger wrote:

> On Mon, 1 Mar 2010 23:59:40 +0000 (UTC), Harold Johanssen wrote:
>
>> I have heard of an implementation on Core 2, by Kasper and
>> Schwabe, getting 7.59 cycles per byte.
>
> AES-128 (mode ?)

Counter mode.

> DiskCryptor does AES-256 XTS
>
> greetings
> Carsten
From: Thomas Pornin on 3 Mar 2010 08:52

According to Paul Rubin <no.email(a)nospam.invalid>:

> That might be a bit-slice implementation

It is -- but with only eight parallel instances; i.e. it encrypts data by blocks of 128 bytes, whereas a "naive" bitslice implementation would use 128 parallel AES instances (since XMM registers are 128 bits long), i.e. 2048-byte blocks. Moreover, the announced cost of 7.59 cycles per byte includes the data orthogonalization (bitslicing requires data elements to be interleaved in registers). See the paper here:

http://www.cryptojedi.org/papers/aesbs-20090616.pdf

Note that such cycle counts are obtained in ideal benchmark conditions, i.e. they do not fully capture the influence of caches. In practical conditions, the encryption code is integrated within some application, along a data processing path, and the encryption code competes with the rest of the application for cache usage. That AES implementation uses no table for encryption (one of the versions uses no table for key setup either), so it uses very little L1 cache for data, and that's good (in the article, they present it as a resistance against timing attacks, but using very little data cache is also good for practical performance). On the other hand, the code footprint is a bit more than 12 kB (not counting key setup), which is bearable (a typical x86 Intel has 32 kB L1 cache for code) but may prove to be an issue in some code-cramped situations. By comparison, a typical "normal" AES implementation will compile to about 2 kB of code (not counting key setup) and 4 kB of constant data (the tables).

So while cycle counts in micro-benchmarks are important, nothing really beats actual measurements in a real situation. One can still plausibly predict that AES-NI instructions should rock, because not only do they have great cycle counts (Intel promises about 1.3 cycles/byte on long runs) but they also lead to very compact implementations (less than a hundred bytes of code, and no table).
This should also be good for PRNG.

Also it is my cue to point out that x86 hardware of the Core2-or-more class is the least likely to have actual performance issues on encryption. Limited hardware, such as what is found in cheap mobile phones and in home routers and WiFi access points, is much more starved on CPU, and yet is in a position to do crypto all day long. A $40 home router typically has a MIPS or ARM derivative with little cache (8 kB L1 cache for code on the Linksys router I have beside me), low power (200 MHz, and not super-scalar), no special AES instruction, no big registers (no SSE2, no MMX, no FPU, only 32-bit general purpose registers) and yet is hooked to a high-bandwidth network (54 Mbit/s WiFi, 100 Mbit/s Ethernet). There are millions of such beasts out there. In my view, performance on that kind of small hardware is industrially much more significant than whatever AES-NI provides.

The same remarks can be made about SHA-3 candidates. Many provide good performance but only with a big code footprint (e.g. 20 kB) or with the help of special instructions (such as SSE2 or AES-NI).

--Thomas Pornin
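To put numbers on the router example, here is a rough sketch of how many CPU cycles per byte such a device has available if it must keep up with its links. It assumes the nominal link rate is fully used and that every cycle goes to crypto, which is optimistic on both counts; the function name is illustrative only.

```python
def cycle_budget_per_byte(clock_hz: float, link_bits_per_second: float) -> float:
    """Cycles available per byte if the CPU must keep up with the full link rate."""
    bytes_per_second = link_bits_per_second / 8
    return clock_hz / bytes_per_second

CLOCK_HZ = 200e6  # 200 MHz router CPU, as in the post above

wifi_budget = cycle_budget_per_byte(CLOCK_HZ, 54e6)       # 54 Mbit/s WiFi
ethernet_budget = cycle_budget_per_byte(CLOCK_HZ, 100e6)  # 100 Mbit/s Ethernet

print(round(wifi_budget, 1))      # 29.6
print(round(ethernet_budget, 1))  # 16.0
```

Under these assumptions the whole budget is a few tens of cycles per byte, on a CPU without the wide registers that the fast x86 implementations rely on.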