Performance figures
On this page, we summarize the performance figures of Keccak[r=1024,c=576] as reported by eBASH. For each processor, we select results obtained with recent compilers (preferably GCC ≥ 4.4) and recent SUPERCOP versions (preferably SUPERCOP ≥ 20100509).
For version 2 of the specifications, the number of rounds was increased from 18 to 24. On some machines, figures for the 24-round version are not yet available. In those cases, the figures below are extrapolated from those of Keccak[r=1024,c=576,nr=18] and are marked with (*).
For long messages, the speed is directly proportional to the bitrate r. The figures below are for the nominal value r=1024. To estimate the performance for another bitrate r, multiply the number of cycles per byte by 1024/r. For instance, Keccak[r=1088,c=512] is 6.25% faster than the default Keccak[] (i.e., Keccak[r=1024,c=576]), since 1088/1024 = 1.0625. It is also benchmarked on eBASH under the name keccakc512.
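The scaling rule above can be written as a small helper. This is a minimal sketch: the function names are hypothetical, and the only input taken from this page is the long-message figure of the Intel Core 2 Duo E8400 row.

```python
NOMINAL_RATE = 1024  # bitrate r used for all figures in the tables


def estimate_cycles_per_byte(cpb_nominal: float, r: int) -> float:
    """Scale a long-message cycles/byte figure from r=1024 to another bitrate r.

    Long-message speed is proportional to r, so cycles/byte scale by 1024/r.
    """
    if r <= 0:
        raise ValueError("bitrate r must be positive")
    return cpb_nominal * NOMINAL_RATE / r


def relative_speedup(r: int) -> float:
    """Relative speed gain of bitrate r over the nominal r=1024 (0.0625 = 6.25%)."""
    return r / NOMINAL_RATE - 1


# Intel Core 2 Duo E8400, long messages: 12.60 c/b at r=1024 (from the table)
print(f"{estimate_cycles_per_byte(12.60, 1088):.2f} c/b")  # 11.86 c/b
print(f"{relative_speedup(1088):.2%} faster")  # 6.25% faster
```

This estimate only extrapolates the proportionality stated above. It is not a measurement, and the actual keccakc512 results on eBASH may differ slightly.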
For short messages, the speed is determined by the number of calls to Keccak-f. A single call suffices when the message length is at most r-25 bits.
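The number of Keccak-f calls can be estimated with the sketch below. It rests on an assumption that this page does not state: that padding always adds 25 bits to a byte-aligned message. This is inferred from the "at most r-25 bits" threshold. The function name is hypothetical.

```python
# Assumption (not stated on this page): the padding adds a constant 25 bits
# to a byte-aligned message, which is what the "at most r-25 bits" threshold implies.
PADDING_OVERHEAD_BITS = 25


def keccak_f_calls(message_bytes: int, r: int = 1024) -> int:
    """Estimate how many Keccak-f calls absorb a byte-aligned message.

    Each call absorbs r bits, so the count is ceil((8 * length + 25) / r).
    """
    if message_bytes < 0 or r <= 0:
        raise ValueError("message length must be >= 0 and r must be > 0")
    padded_bits = 8 * message_bytes + PADDING_OVERHEAD_BITS
    return (padded_bits + r - 1) // r  # integer ceiling division


if __name__ == "__main__":
    for n in (0, 124, 125, 252, 253):
        print(n, "bytes ->", keccak_f_calls(n), "call(s)")
```

With r=1024, this gives one call for messages of up to 124 bytes and two calls from 125 bytes. That is consistent with the 124-byte figure used for the "Short" column.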
64-bit platforms
| Processor | Compiler(s) | SUPERCOP version | Implementation | Short (cycles) | Long (c/b) | SHA-256 (c/b) | SHA-512 (c/b) |
|---|---|---|---|---|---|---|---|
| AMD Athlon 64 X2 | GCC 4.3.3 | 20090715 | plain 64-bit, LC | (*)2037 | (*)13.88 | 14.88 | 13.12 |
| AMD Phenom 9550 | GCC 4.4.1 | 20100610 | plain 64-bit, LC | 3990 | 12.99 | 15.13 | 9.92 |
| AMD Phenom II X4 955 | GCC 4.4.1 | 20100120 | plain 64-bit, LC | 2026 | 13.07 | 15.04 | 11.83 |
| HP Itanium II | GCC 3.4.4 | 20100611 | plain 64-bit, LC | 1398 | 6.22 | 21.12 | 9.30 |
| HP Itanium II | GCC 3.3.3 | 20100611 | plain 64-bit, LC | 1490 | 7.19 | 23.46 | 11.42 |
| IBM POWER4 | GCC 4.0.0 | 20100611 | plain 64-bit, LC | 3472 | 20.86 | 27.52 | 15.36 |
| IBM POWER5 | GCC 4.3.3 | 20090223 | plain 64-bit, LC | (*)3051 | (*)19.09 | 21.85 | 13.59 |
| Intel Core 2 Duo | GCC 4.4.1, ICC 11.10 | 20100702 | plain 64-bit, LC | 2000 | 12.61 | 15.36 | 11.60 |
| Intel Core 2 Duo E4600 | GCC 4.4.3, ICC 11.10 | 20100610 | plain 64-bit, LC | 1956 | 12.63 | 15.57 | 10.27 |
| Intel Core 2 Duo E8400 | GCC 4.4.3, ICC 11.10 | 20100509 | plain 64-bit, LC | 1908 | 12.60 | 15.51 | 10.26 |
| Intel Core 2 Quad Q9550 | GCC 4.4.1 | 20100712 | plain 64-bit, LC | 1921 | 12.67 | 15.39 | 10.27 |
| Intel Core i5 750 | GCC 4.4.1 | 20100425 | plain 64-bit, LC | 1926 | 10.98 | 14.08 | 10.61 |
| Intel Core i5 M 520 | GCC 4.4.3, ICC 11.10 | 20100120 | plain 64-bit, LC | 1794 | 10.87 | 13.90 | 10.48 |
| Intel Core i7 920 | GCC 4.4.4 | 20100611 | plain 64-bit, LC | 4200 | 13.09 | 16.94 | 11.45 |
| Intel Xeon E5420 | GCC 4.5.0, ICC 11.10 | 20100425 | plain 64-bit, LC | 1980 | 12.57 | 19.35 | 13.31 |
| Intel Xeon E5530 | GCC 4.4.1, ICC 11.10 | 20100702 | plain 64-bit, LC | 2000 | 13.11 | 16.94 | 12.80 |
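All long-message columns are in cycles per byte, where lower is better. The ratio "other function / Keccak" therefore reads directly as Keccak's relative speed. The sketch below shows this on three rows copied from the table above. The row selection and the function name are arbitrary.

```python
# (processor, Keccak long c/b, SHA-256 c/b, SHA-512 c/b), copied from the 64-bit table
ROWS = [
    ("AMD Phenom II X4 955", 13.07, 15.04, 11.83),
    ("HP Itanium II, GCC 3.4.4", 6.22, 21.12, 9.30),
    ("Intel Core 2 Duo E8400", 12.60, 15.51, 10.26),
]


def relative_speed(keccak_cpb: float, other_cpb: float) -> float:
    """Speed of Keccak relative to another hash: > 1 means Keccak is faster.

    Both inputs are cycles per byte (lower is better), so the ratio is other/Keccak.
    """
    return other_cpb / keccak_cpb


for name, keccak, sha256, sha512 in ROWS:
    print(f"{name}: {relative_speed(keccak, sha256):.2f}x vs SHA-256, "
          f"{relative_speed(keccak, sha512):.2f}x vs SHA-512")
```

Only long-message figures are compared here. The "Short" column has no SHA-2 counterpart on this page.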
32-bit platforms
| Processor | Compiler(s) | SUPERCOP version | Implementation | Short (cycles) | Long (c/b) | SHA-256 (c/b) | SHA-512 (c/b) |
|---|---|---|---|---|---|---|---|
| AMD Athlon | GCC 4.4.1 | 20100509 | SIMD64 | 5886 | 40.35 | 19.55 | 69.81 |
| ARM XScale-PXA270 rev 4 | GCC 4.3.1 | 20090226 | plain 32-bit, LC, BI | (*)20339 | (*)127.59 | 37.97 | 2554.25 |
| Intel Pentium 2 | GCC 4.3.3 | 20090408 | SIMD64 | (*)6803 | (*)46.88 | 31.44 | 99.83 |
| Intel Pentium 3 | GCC 4.4.1 | 20100120 | SIMD64 | 5993 | 40.87 | 24.83 | 70.46 |
| Intel Pentium 4 | GCC 4.4.1 | 20100509 | SIMD64 | 7744 | 49.18 | 32.69 | 37.87 |
| Intel Pentium M | GCC 4.4.1 | 20100509 | SIMD64 | 5060 | 33.82 | 21.62 | 29.96 |
64-bit platforms used in 32-bit mode
| Processor | Compiler(s) | SUPERCOP version | Implementation | Short (cycles) | Long (c/b) | SHA-256 (c/b) | SHA-512 (c/b) |
|---|---|---|---|---|---|---|---|
| AMD Athlon 64 X2 | GCC 4.3.3 | 20090715 | plain 32-bit, LC, BI(T) | (*)5624 | (*)40.63 | 14.96 | 24.46 |
| AMD Phenom 9550 | GCC 4.4.1 | 20100610 | SIMD64 | 9916 | 34.21 | 14.89 | 21.72 |
| IBM POWER4 | GCC 4.0.0 | 20100611 | plain 32-bit, LC, BI(T) | 6656 | 46.23 | 22.18 | 45.22 |
| IBM POWER5 | GCC 4.3.3 | 20090223 | plain 32-bit, LC, BI(T) | (*)5788 | (*)37.63 | 20.16 | 39.03 |
| Intel Core 2 Duo E4600 | GCC 4.4.3, ICC 11.10 | 20100610 | SIMD128 | 3060 | 20.30 | 15.73 | 18.30 |
| Intel Core 2 Quad Q9550 | GCC 4.4.1 | 20100712 | SIMD128 | 3366 | 21.80 | 15.52 | 18.37 |
| Intel Core i5 750 | GCC 4.4.1 | 20100425 | SIMD128 | 2814 | 18.40 | 18.84 | 47.11 |
| Intel Xeon E5420 | GCC 4.5.0, ICC 11.10 | 20100425 | SIMD128 | 3143 | 19.73 | 18.52 | 97.16 |
| Intel Xeon E5530 | GCC 4.4.1, ICC 11.10 | 20100702 | SIMD128 | 3096 | 19.42 | 17.15 | 16.13 |
Legend
- SUPERCOP version is the version of the SUPERCOP benchmarking software. This also determines the version of the hash function implementations therein.
- Implementation: this characterises the fastest implementation on this machine. It includes the following properties:
  - plain 32-bit: plain C code using 32-bit operations;
  - plain 64-bit: plain C code using 64-bit operations;
  - LC: code using the lane complementing technique (see the Keccak main document for more details);
  - BI: code using the bit interleaving technique to use 32-bit rotations instead of 64-bit ones;
  - BI(T): same, where the bit interleaving is implemented with tables;
  - SIMD64: the code uses the 64-bit SIMD operations of the processor (MMX on the AMD and Intel processors);
  - SIMD128: the code uses the 128-bit SIMD operations of the processor (SSE2 on the AMD and Intel processors).
- Short gives the number of cycles to hash a 1-block message (≤ 124 bytes when r=1024).
- Long gives the number of cycles per byte to hash a long message with Keccak[r=1024,c=576].
- SHA-256 gives, as a comparison point, the number of cycles per byte of the fastest implementation of SHA-256 on this machine.
- SHA-512: same, for SHA-512.
- (*): the number is extrapolated from Keccak[r=1024,c=576,nr=18].
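The bit interleaving (BI) technique from the legend can be illustrated with a minimal Python sketch. The function names are hypothetical, the code favours clarity over speed, and it is not taken from the Keccak reference code.

A 64-bit lane is split into two 32-bit words holding its even- and odd-indexed bits. A 64-bit rotation by an even amount 2k then becomes two 32-bit rotations by k. An odd amount 2k+1 swaps the roles of the two words, which are rotated by k+1 and k. The table-based variant BI(T) is not shown.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotl64(x: int, n: int) -> int:
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def interleave(lane: int) -> Tuple[int, int]:
    """Split a 64-bit lane: bit 2i goes to even bit i, bit 2i+1 to odd bit i."""
    even = odd = 0
    for i in range(32):
        even |= ((lane >> (2 * i)) & 1) << i
        odd |= ((lane >> (2 * i + 1)) & 1) << i
    return even, odd


def deinterleave(even: int, odd: int) -> int:
    """Inverse of interleave()."""
    lane = 0
    for i in range(32):
        lane |= ((even >> i) & 1) << (2 * i)
        lane |= ((odd >> i) & 1) << (2 * i + 1)
    return lane


def rotl64_interleaved(even: int, odd: int, n: int) -> Tuple[int, int]:
    """Rotate an interleaved lane left by n using only 32-bit rotations."""
    k, odd_amount = divmod(n % 64, 2)
    if not odd_amount:
        # Even amount 2k: both words rotate by k.
        return rotl32(even, k), rotl32(odd, k)
    # Odd amount 2k+1: the words swap roles; the old odd word rotates by k+1.
    return rotl32(odd, k + 1), rotl32(even, k)


# Check against a plain 64-bit rotation for every rotation amount.
lane = 0x0123456789ABCDEF
e, o = interleave(lane)
assert all(deinterleave(*rotl64_interleaved(e, o, n)) == rotl64(lane, n)
           for n in range(64))
```

An optimized implementation would typically keep the state in interleaved form across all rounds. The conversion cost is then paid only when absorbing input and extracting output.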