Optimize floating-point celt_inner_prod() and dual_inner_prod() for ARM NEON

The floating-point optimizations are not bit-exact with the C functions
because they perform the floating-point operations in a different order.
They are, however, bit-exact with the C simulation functions, which
reproduce the order of floating-point operations used in the optimizations.
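
As a rough illustration of why the order matters, here is a minimal
sketch, not the code in this change. Both function names are
hypothetical, and the 4-lane layout and the pairwise final reduction are
assumptions about how a SIMD kernel might accumulate, not a description
of the actual NEON implementation.

```c
/* Reference order: accumulate strictly left to right. */
static float inner_prod_sequential(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical simulation of a 4-lane SIMD kernel (assumed layout):
 * each lane keeps its own partial sum, the lanes are reduced pairwise,
 * and leftover elements are added in scalar order at the end. */
static float inner_prod_lanes4_sim(const float *x, const float *y, int n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float sum;
    int i, k;
    for (i = 0; i + 4 <= n; i += 4)
        for (k = 0; k < 4; k++)
            acc[k] += x[i + k] * y[i + k];
    /* Assumed reduction order: (lane0 + lane1) + (lane2 + lane3). */
    sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With x = {1e8f, 1, -1e8f, 1, 1, 1, 1, 1} and y set to all ones,
single-precision rounding makes the two orders disagree, while inputs
whose partial sums are all exactly representable give identical results.
A vectorized kernel would be compared against a simulation of this kind
rather than against the sequential reference.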

Change-Id: I149fda5b602fd5712b16fc8983df3c6c0c9e76ad

Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
4 files changed