Software performance figures
On this page, we summarize the performance figures of Keccak[r=1024,c=576] as reported in eBASH. For each processor, we select results obtained with recent compilers (preferably GCC ≥ 4.4) and recent SUPERCOP (or XBX) versions.
For long messages, the throughput is directly proportional to the bitrate r. The figures below are for the nominal value r=1024. To estimate the performance for another bitrate, multiply the number of cycles/byte by 1024/r. For instance, Keccak[r=1088,c=512] has a throughput 6.25% higher than Keccak[] (since 1088/1024 = 1.0625); it is also benchmarked on eBASH under the name keccakc512.
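The scaling rule above can be sketched in a few lines of Python. The function names are hypothetical and chosen for illustration only; the 12.64 c/b input is the Intel Core 2 Duo figure from the 64-bit table on this page. This is an estimate under the stated proportionality, not a measurement.

```python
NOMINAL_RATE_BITS = 1024  # bitrate r used for all figures on this page


def scale_cycles_per_byte(cpb_nominal: float, rate_bits: int) -> float:
    """Estimate long-message cycles/byte at bitrate `rate_bits`
    from a figure measured at the nominal r=1024."""
    if rate_bits <= 0:
        raise ValueError("rate_bits must be positive")
    return cpb_nominal * NOMINAL_RATE_BITS / rate_bits


def throughput_gain_percent(rate_bits: int) -> float:
    """Throughput change, in percent, relative to the nominal bitrate."""
    return (rate_bits / NOMINAL_RATE_BITS - 1) * 100


if __name__ == "__main__":
    # Intel Core 2 Duo, 64-bit table: 12.64 c/b at r=1024.
    estimate = scale_cycles_per_byte(12.64, 1088)
    print(f"estimated c/b at r=1088: {estimate:.2f}")
    print(f"throughput gain: {throughput_gain_percent(1088):.2f}%")
```

The estimate only applies to long messages; for short messages the cost is dominated by the number of calls to Keccak-f.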
For short messages, the speed is determined by the number of calls to Keccak-f, i.e., just one when the message length is ≤r-2 bits.
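As a rough illustration, the number of calls to Keccak-f can be counted as follows. This sketch assumes that the padding adds at least 2 bits (inferred from the r-2 threshold above) and that the requested output fits in a single r-bit block, so that no extra calls are needed for squeezing. The function name is hypothetical.

```python
MIN_PADDING_BITS = 2  # assumption: padding appends at least 2 bits


def keccak_f_calls(message_bits: int, rate_bits: int = 1024) -> int:
    """Count the r-bit blocks absorbed, i.e. the calls to Keccak-f,
    for a message of `message_bits` bits (output of at most r bits)."""
    if message_bits < 0 or rate_bits <= 0:
        raise ValueError("lengths must be non-negative and the rate positive")
    padded_bits = message_bits + MIN_PADDING_BITS
    # Ceiling division: the padded message fills whole blocks.
    return -(-padded_bits // rate_bits)


if __name__ == "__main__":
    for n_bytes in (0, 127, 128, 1000):
        print(n_bytes, "bytes ->", keccak_f_calls(8 * n_bytes), "call(s)")
```

With r=1024, a message of up to 127 bytes takes one call, while a 128-byte message already takes two.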
64-bit platforms
| Processor | Compiler(s) | SUPERCOP version | Implementation | Short (cycles) | Long (c/b) | SHA-256 (c/b) | SHA-512 (c/b) |
|---|---|---|---|---|---|---|---|
| AMD Athlon 64 X2 | GCC 4.4.3 | 20110106 | plain 64-bit, LC | 1974 | 13.05 | 14.88 | 9.93 |
| AMD Phenom 9550 | GCC 4.4.1 | 20110106 | plain 64-bit, LC | 2016 | 12.99 | 15.06 | 9.92 |
| AMD Phenom II X4 955 | GCC 4.4.1 | 20100120 | plain 64-bit, LC | 2026 | 13.07 | 15.04 | 11.83 |
| AMD Phenom II X6 1090T | GCC 4.4.3 | 20101204 | plain 64-bit, LC | 2670 | 12.98 | 15.05 | 11.51 |
| HP Itanium II | GCC 3.2.3 | 20101111 | plain 64-bit | 1318 | 6.28 | 20.47 | 9.30 |
| HP Itanium II | GCC 3.3.3 | 20110106 | plain 64-bit | 1428 | 7.03 | 22.39 | 11.36 |
| IBM POWER4 | GCC 4.0.0, XLC 8.0 | 20110106 | plain 64-bit, LC | 3480 | 20.92 | 25.34 | 15.37 |
| IBM POWER5 | GCC 4.4.3 | 20101002 | plain 64-bit, LC | 2984 | 16.91 | 22.19 | 13.52 |
| IBM PowerPC G5 970 | GCC 4.3.2 | 20110106 | plain 64-bit, LC | 3348 | 19.46 | 22.28 | 13.32 |
| ICT Loongson-2 V0.3 | GCC 4.3.3 | 20101002 | plain 64-bit, LC | 5416 | 24.72 | 35.03 | 24.27 |
| Intel Core 2 Duo | GCC 4.4.3 | 20110106 | plain 64-bit, LC | 2008 | 12.64 | 15.34 | 11.73 |
| Intel Core 2 Duo E4600 | GCC 4.4.3, ICC 11.10 | 20110106 | plain 64-bit, LC | 3965 | 12.63 | 15.55 | 10.27 |
| Intel Core 2 Duo E8400 | GCC 4.4.3, ICC 11.10 | 20101204 | plain 64-bit, LC | 1926 | 12.66 | 15.28 | 10.22 |
| Intel Core 2 Quad Q9550 | GCC 4.4.1 | 20110106 | plain 64-bit, LC | 1938 | 12.64 | 15.26 | 10.26 |
| Intel Core i5 750 | GCC 4.4.1 | 20100425 | plain 64-bit, LC | 1926 | 10.98 | 14.08 | 10.61 |
| Intel Core i5 M 520 | GCC 4.4.3, ICC 11.10 | 20100120 | plain 64-bit, LC | 1794 | 10.87 | 13.90 | 10.48 |
| Intel Core i7 920 | GCC 4.4.4 | 20100611 | plain 64-bit, LC | 4200 | 13.09 | 16.94 | 11.45 |
| Intel Xeon E5420 | GCC 4.6.0 | 20110106 | plain 64-bit, LC | 1935 | 12.64 | 15.16 | 11.79 |
| Intel Xeon E5530 | GCC 4.4.1, ICC 11.10 | 20110106 | plain 64-bit, LC | 2012 | 13.12 | 16.92 | 11.82 |
| Sun UltraSPARC IIIi | GCC 3.4.3 | 20101111 | plain 64-bit | 5724 | 37.89 | 27.71 | 20.50 |
| Sun UltraSPARC T1 | GCC 4.3.2 | 20100821 | plain 64-bit, LC | 18852 | 81.96 | 75.00 | 131.26 |
32-bit platforms
| Processor | Compiler(s) | SUPERCOP version | Implementation | Short (cycles) | Long (c/b) | SHA-256 (c/b) | SHA-512 (c/b) |
|---|---|---|---|---|---|---|---|
| AMD Athlon | GCC 4.4.3 | 20110106 | SIMD64 | 5430 | 37.97 | 19.53 | 70.65 |
| Atmel AT91RM9200 | | 20101017 (XBX) | plain 32-bit, BI | 29025 | 115.00 | 47.37 | 122.51 |
| Freescale i.MX515 | GCC 4.4.1 | 20110106 | plain 32-bit, BI | 8064 | 62.88 | 22.31 | 89.50 |
| Intel Pentium 3 | GCC 4.4.1 | 20101204 | SIMD64 | 5995 | 40.86 | 24.80 | 67.47 |
| Intel Pentium 4 | GCC 4.4.1 | 20110106 | plain 32-bit, BI | 7484 | 48.89 | 35.88 | 37.44 |
| Intel Pentium M | GCC 4.4.1 | 20100509 | SIMD64 | 5060 | 33.82 | 21.62 | 29.96 |
| Luminary Micro LM3S811 | | 20101114 (XBX) | ARM assembly | 16466 | 103.19 | 40.64 | 172.77 |
| Motorola PowerPC 750CXe | GCC 4.3.2 | 20110106 | plain 32-bit, BI | 7440 | 46.82 | 21.08 | 54.38 |
| Motorola PowerPC G4 7410 | GCC 4.3.2 | 20110106 | plain 32-bit, BI | 7408 | 46.72 | 21.17 | 54.10 |
| Motorola PowerPC G4 7447a | GCC 3.3 | 20110106 | plain 32-bit, LC, BI(T) | 7514 | 52.59 | 16.59 | 44.99 |
| TI OMAP 2420 | GCC 3.4.4 | 20101204 | plain 32-bit, BI | 14439 | 97.37 | 47.11 | 117.95 |
| TI AR7 (4KEc) | | 20101114 (XBX) | plain 32-bit, BI | 70596 | 148.32 | 84.00 | 140.48 |
64-bit platforms used in 32-bit mode
| Processor | Compiler(s) | SUPERCOP version | Implementation | Short (cycles) | Long (c/b) | SHA-256 (c/b) | SHA-512 (c/b) |
|---|---|---|---|---|---|---|---|
| AMD Athlon 64 X2 | GCC 4.4.3 | 20110106 | SIMD64 | 5093 | 35.71 | 14.92 | 23.55 |
| AMD Phenom 9550 | GCC 4.4.1 | 20110106 | SIMD64 | 4979 | 34.21 | 14.99 | 21.48 |
| IBM POWER4 | GCC 4.0.0 | 20101111 | plain 32-bit, BI | 8080 | 46.88 | 21.25 | 44.03 |
| IBM POWER5 | GCC 4.4.3 | 20101002 | plain 32-bit, LC, BI(T) | 5376 | 35.52 | 19.66 | 39.19 |
| IBM PowerPC G5 970 | GCC 4.3.2 | 20110106 | plain 32-bit, LC, BI(T) | 6480 | 43.72 | 20.65 | 44.88 |
| ICT Loongson-2 V0.3 | GCC 4.3.3 | 20101002 | plain 64-bit, LC | 5410 | 25.29 | 36.75 | 24.37 |
| ICT Loongson-2 V0.3 | GCC 4.3.3 | 20101002 | plain 64-bit, LC | 9428 | 54.49 | 33.46 | 53.54 |
| Intel Core 2 Duo E4600 | GCC 4.4.3, ICC 11.10 | 20110106 | SIMD128 | 6048 | 19.18 | 15.61 | 18.13 |
| Intel Core 2 Quad Q9550 | GCC 4.4.1 | 20110106 | SIMD128 | 3392 | 21.95 | 15.48 | 18.34 |
| Intel Core i5 750 | GCC 4.4.1 | 20100425 | SIMD128 | 2814 | 18.40 | 18.84 | 47.11 |
| Intel Xeon E5420 | GCC 4.6.0 | 20110106 | SIMD128 | 3413 | 22.07 | 17.55 | 56.26 |
| Intel Xeon E5530 | GCC 4.4.1 | 20110106 | SIMD128 | 3336 | 21.63 | 17.26 | 16.05 |
Legend
- SUPERCOP version is the version of the SUPERCOP benchmarking software. This also determines the version of the hash function implementations therein.
- (XBX): the benchmark uses the eXternal Benchmarking eXtension (XBX) to SUPERCOP.
- Implementation: this characterises the fastest implementation on this machine. It includes the following properties:
  - plain 32-bit: plain C code using 32-bit operations;
  - plain 64-bit: plain C code using 64-bit operations;
  - LC: code using the lane complementing technique (see the Keccak implementation overview for more details);
  - BI: code using the bit interleaving technique to use 32-bit rotations instead of 64-bit ones;
  - BI(T): same, where the bit interleaving is implemented with tables;
  - SIMD64: the code uses the 64-bit SIMD operations of the processor (MMX on the AMD and Intel processors);
  - SIMD128: the code uses the 128-bit SIMD operations of the processor (SSE2 on the AMD and Intel processors).
- Short gives the number of cycles to hash a 1-block message (≤ 127 bytes when r=1024).
- Long gives the number of cycles per byte to hash a long message with Keccak[r=1024,c=576].
- SHA-256 gives, as a comparison point, the number of cycles per byte of the fastest implementation of SHA-256 on this machine.
- SHA-512: same, for SHA-512.
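
To read the comparison columns at a glance, one can compute the ratio between the SHA-2 figures and the Long figure. The sketch below does this for three rows copied from the 64-bit table; the helper name `relative_speed` is hypothetical, and the ratio is only a convenience for reading the tables, not an additional measurement.

```python
# (processor, Keccak long c/b, SHA-256 c/b, SHA-512 c/b),
# copied from the 64-bit table on this page.
ROWS = [
    ("Intel Core 2 Duo", 12.64, 15.34, 11.73),
    ("Intel Core i5 750", 10.98, 14.08, 10.61),
    ("Sun UltraSPARC IIIi", 37.89, 27.71, 20.50),
]


def relative_speed(keccak_cpb: float, other_cpb: float) -> float:
    """Ratio of cycles/byte; a value above 1 means Keccak needs
    fewer cycles per byte than the other hash function."""
    if keccak_cpb <= 0 or other_cpb <= 0:
        raise ValueError("cycles/byte figures must be positive")
    return other_cpb / keccak_cpb


if __name__ == "__main__":
    for name, keccak, sha256, sha512 in ROWS:
        print(
            f"{name}: {relative_speed(keccak, sha256):.2f}x vs SHA-256, "
            f"{relative_speed(keccak, sha512):.2f}x vs SHA-512"
        )
```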