poly1305: simplify reference implementation
Reduce code complexity by replacing the floating-point implementation
with a 32-bit implementation.
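As a rough illustration of what a 32-bit implementation works with, the sketch below clamps the 16-byte r half of a Poly1305 key and splits it into five 26-bit limbs held in uint32 values. This is the representation used by the well-known public-domain 32-bit implementations. The helper names (splitR, clampedR, limbsToBig) are hypothetical and are not claimed to match the code in this change.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/big"
)

// splitR (hypothetical name) clamps the r half of a Poly1305 key and
// splits it into five 26-bit limbs. The masks combine the 26-bit limb
// width with the standard Poly1305 clamp.
func splitR(key *[16]byte) [5]uint32 {
	return [5]uint32{
		binary.LittleEndian.Uint32(key[0:]) & 0x3ffffff,
		(binary.LittleEndian.Uint32(key[3:]) >> 2) & 0x3ffff03,
		(binary.LittleEndian.Uint32(key[6:]) >> 4) & 0x3ffc0ff,
		(binary.LittleEndian.Uint32(key[9:]) >> 6) & 0x3f03fff,
		(binary.LittleEndian.Uint32(key[12:]) >> 8) & 0x00fffff,
	}
}

// clampedR computes the clamped r as a big integer, purely as a
// cross-check for splitR.
func clampedR(key *[16]byte) *big.Int {
	c := *key
	for _, i := range []int{3, 7, 11, 15} {
		c[i] &= 0x0f // clear the top four bits
	}
	for _, i := range []int{4, 8, 12} {
		c[i] &= 0xfc // clear the bottom two bits
	}
	// big.Int.SetBytes expects big-endian input, so reverse the bytes.
	for i, j := 0, len(c)-1; i < j; i, j = i+1, j-1 {
		c[i], c[j] = c[j], c[i]
	}
	return new(big.Int).SetBytes(c[:])
}

// limbsToBig recombines the limbs as sum(l[i] << (26*i)).
func limbsToBig(l [5]uint32) *big.Int {
	n := new(big.Int)
	for i := 4; i >= 0; i-- {
		n.Lsh(n, 26)
		n.Or(n, big.NewInt(int64(l[i])))
	}
	return n
}

func main() {
	var key [16]byte
	for i := range key {
		key[i] = byte(17*i + 3) // arbitrary example key bytes
	}
	limbs := splitR(&key)
	fmt.Println(limbs, limbsToBig(limbs).Cmp(clampedR(&key)) == 0)
}
```

With 26-bit limbs, each limb product fits in a uint64, so the multiply and reduce steps need only integer arithmetic.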
This also improves performance on 386:
name           old time/op    new time/op     delta
64-2            972ns ± 2%     350ns ± 1%     -64.04%  (p=0.029 n=4+4)
1K-2           10.9µs ± 3%     4.2µs ± 1%     -61.11%  (p=0.029 n=4+4)
64Unaligned-2   969ns ± 2%     354ns ± 2%     -63.44%  (p=0.029 n=4+4)
1KUnaligned-2  10.8µs ± 3%     4.2µs ± 1%     -61.15%  (p=0.029 n=4+4)

name           old speed      new speed       delta
64-2           65.8MB/s ± 2%  182.9MB/s ± 1%  +177.93%  (p=0.029 n=4+4)
1K-2           94.3MB/s ± 3%  242.3MB/s ± 1%  +157.08%  (p=0.029 n=4+4)
64Unaligned-2  66.0MB/s ± 2%  180.4MB/s ± 2%  +173.32%  (p=0.029 n=4+4)
1KUnaligned-2  94.4MB/s ± 3%  243.0MB/s ± 1%  +157.36%  (p=0.029 n=4+4)
There are already optimized versions for amd64 and arm,
and an optimized version for s390x seems to be planned.
See: https://go-review.googlesource.com/#/c/32812/
Change-Id: I7a5ac62ae33727b0e6060cb966de73a468317e30
Reviewed-on: https://go-review.googlesource.com/35294
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Adam Langley <agl@golang.org>
1 file changed