Refactored the FIRFilter and TDStretch hot-spot routines to help the
compiler autovectorize them more efficiently.
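The sketch below illustrates the general kind of restructuring that helps autovectorization. It is hypothetical and is not the actual SoundTouch code. A mono FIR loop is interchanged so that the inner loop runs over output samples. Each output then accumulates independently, which gives the compiler a unit-stride loop with no cross-iteration float reduction. The function name `firMonoVectorFriendly` and its signature are assumptions. `__restrict` is a gcc/clang/MSVC extension that promises the buffers do not overlap.

```cpp
#include <cstddef>

// Hypothetical helper, NOT SoundTouch's FIRFilter API.
// Mono FIR: dest[j] = sum over i of coeffs[i] * src[j + i], for j in [0, numOut).
// src must hold at least numOut + taps - 1 samples.
//
// The tap loop is outermost and the output loop innermost. Every dest[j]
// is updated independently, so the inner loop has unit-stride loads and
// stores and no reduction across iterations. gcc can vectorize it at -O3
// without -ffast-math, which a float "sum += a[i] * b[i]" reduction would
// otherwise need because vectorizing it reorders the additions.
void firMonoVectorFriendly(float* __restrict dest,
                           const float* __restrict src,
                           const float* __restrict coeffs,
                           std::size_t numOut, std::size_t taps)
{
    // Clear the accumulators (one per output sample).
    for (std::size_t j = 0; j < numOut; ++j)
        dest[j] = 0.0f;

    for (std::size_t i = 0; i < taps; ++i)
    {
        const float c = coeffs[i];   // loop-invariant within the inner loop
        const float* s = src + i;    // input window shifted by tap index
        for (std::size_t j = 0; j < numOut; ++j)
            dest[j] += c * s[j];     // independent lanes: vectorizable
    }
}
```

For example, with `src = {1, 2, 3, 4, 5, 6}` and `coeffs = {1, 0.5, 0.25}`, the first output is 1 + 1 + 0.75 = 2.75. Building with `g++ -O3 -fopt-info-vec` reports which loops gcc vectorized. The trade-off is that `dest` is re-read once per tap, so long output blocks may need to be processed in cache-sized chunks.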
Benchmark results:
- 2x (SSE2) and 3x (AVX) faster execution of gcc-autovectorized x86
SIMD code, measured with the hand-tuned SSE intrinsics disabled. The
hand-tuned SSE code is still slightly faster than the gcc-generated
AVX code.
- 2.4x overall speedup from the cumulative ARM NEON tunings compared
to the previous SoundTouch release.
Signed-off-by: Olli Parviainen <oparviai'at'iki.fi>