Segmented loads are slow, so here we use unit-stride loads and narrowing shifts.
c910:
fcmul_add_c: 2179
fcmul_add_rvv_f64: 1652
c908:
fcmul_add_c: 4891.2
fcmul_add_rvv_f64: 2399.5
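The deinterleaving trick can be modelled in portable C. This is a sketch, not the RVV code: it emulates a unit-stride load of interleaved single-precision complex values as 64-bit lanes, followed by two narrowing logical right shifts (RVV vnsrl) by 0 and by 32 that split out the real and imaginary parts. VLMAX, narrow_shift(), deinterleave() and fcmul_add_sketch() are hypothetical names. The complex multiply-add only assumes the interleaved {re, im} layout of the C reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "model needs 32-bit floats");

/* Hypothetical number of 64-bit lanes per "vector register" in this model. */
#define VLMAX 8

/* Scalar model of a narrowing logical right shift (RVV vnsrl):
 * each 64-bit lane is shifted right by `shift`, then truncated to 32 bits. */
static void narrow_shift(uint32_t *dst, const uint64_t *src, size_t n,
                         unsigned shift)
{
    assert(shift == 0 || shift == 32);
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)(src[i] >> shift);
}

static int little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Split n interleaved complex floats {re0, im0, re1, im1, ...} into separate
 * real and imaginary arrays: one unit-stride load of 64-bit lanes, then two
 * narrowing shifts, instead of a segmented load. */
static void deinterleave(float *re, float *im, const float *cplx, size_t n)
{
    /* On little-endian targets (RISC-V included) the real part sits in the
     * low 32 bits of each 64-bit lane. */
    const unsigned re_shift = little_endian() ? 0 : 32;
    uint64_t lanes[VLMAX];
    uint32_t bits[VLMAX];

    while (n > 0) {
        size_t vl = n < VLMAX ? n : VLMAX;          /* strip-mining, like vsetvli */
        memcpy(lanes, cplx, vl * sizeof(lanes[0])); /* unit-stride 64-bit load */

        narrow_shift(bits, lanes, vl, re_shift);      /* real parts */
        memcpy(re, bits, vl * sizeof(bits[0]));
        narrow_shift(bits, lanes, vl, 32 - re_shift); /* imaginary parts */
        memcpy(im, bits, vl * sizeof(bits[0]));

        cplx += 2 * vl;
        re += vl;
        im += vl;
        n -= vl;
    }
}

/* sum[k] += t[k] * c[k] for n interleaved single-precision complex values. */
static void fcmul_add_sketch(float *sum, const float *t, const float *c,
                             size_t n)
{
    float tre[VLMAX], tim[VLMAX], cre[VLMAX], cim[VLMAX];

    while (n > 0) {
        size_t vl = n < VLMAX ? n : VLMAX;

        deinterleave(tre, tim, t, vl);
        deinterleave(cre, cim, c, vl);
        for (size_t i = 0; i < vl; i++) {
            sum[2 * i]     += tre[i] * cre[i] - tim[i] * cim[i];
            sum[2 * i + 1] += tre[i] * cim[i] + tim[i] * cre[i];
        }
        sum += 2 * vl;
        t += 2 * vl;
        c += 2 * vl;
        n -= vl;
    }
}
```

The point of the shifts is that each operand needs only one contiguous load. The two halves of every lane are then extracted with cheap element-wise operations instead of relying on the slower segment-load path.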
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
This is more spec-compliant because it does not rely
on dead-code elimination by the compiler. MSVC in particular
has problems with this, as can be seen in
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
or
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html
This commit does not eliminate every instance where we rely
on dead code elimination: It only tackles branching to
the initialization of arch-specific dsp code, not e.g. all
uses of CONFIG_ and HAVE_ checks. Still, it may already be
enough to compile FFmpeg with MSVC with whole-program optimization
enabled (if one does not disable too many components).
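The difference between the two styles can be shown in a minimal, self-contained C sketch. FooDSPContext, sum_c(), ff_foo_dsp_init() and ff_foo_dsp_init_x86() are hypothetical names. ARCH_X86 is hard-coded here, whereas in FFmpeg such 0/1 macros come from the configure-generated config.h.

```c
#include <assert.h>

/* Hypothetical stand-in for a configure-generated config.h entry. */
#define ARCH_X86 0

typedef struct FooDSPContext {
    int (*sum)(const int *v, int n);
} FooDSPContext;

void ff_foo_dsp_init(FooDSPContext *c);

/* Portable C fallback. */
static int sum_c(const int *v, int n)
{
    int s = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if ARCH_X86
/* Defined in a separate x86 object that is only built when ARCH_X86 is 1. */
void ff_foo_dsp_init_x86(FooDSPContext *c);
#endif

void ff_foo_dsp_init(FooDSPContext *c)
{
    c->sum = sum_c;

    /* Fragile variant (with the prototype declared unconditionally):
     *     if (ARCH_X86)
     *         ff_foo_dsp_init_x86(c);
     * The call still references the symbol, so if the compiler does not
     * remove the dead branch, linking fails when the x86 object is absent. */
#if ARCH_X86
    ff_foo_dsp_init_x86(c); /* removed by the preprocessor when ARCH_X86 is 0 */
#endif
}
```

With the preprocessor version, the call is never seen by the compiler when the architecture is disabled, so correctness no longer depends on any optimization pass.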
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows it to be inlined in af_afir.c (regardless of symbol
interposition); moreover, it removes the checkasm test's dependency
on lavfi/af_afir.o.
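Why a header-defined static inline init function helps can be sketched as follows. All names (foo_dsp.h, FooDSPContext, scale_c, foo_dsp_init) are hypothetical; this is not the contents of af_afirdsp.h. When building position-independent code, compilers typically will not inline a call to an exported function with default visibility, because another definition could be interposed at run time. A static inline function has no exported symbol, and any file that includes the header gets its own copy.

```c
#include <assert.h>
#include <stddef.h>

/* ---- hypothetical foo_dsp.h ---- */
typedef struct FooDSPContext {
    void (*scale)(float *dst, const float *src, float k, ptrdiff_t len);
} FooDSPContext;

/* Portable C implementation, private to every translation unit that
 * includes the header. */
static void scale_c(float *dst, const float *src, float k, ptrdiff_t len)
{
    assert(len >= 0);
    for (ptrdiff_t i = 0; i < len; i++)
        dst[i] = src[i] * k;
}

/* static inline: no exported symbol that could be interposed, so the
 * compiler is free to inline it, and a test program can include the header
 * without linking the filter's object file. */
static inline void foo_dsp_init(FooDSPContext *dsp)
{
    dsp->scale = scale_c;
}
/* ---- end of header ---- */
```

Both the filter and a checkasm-style test would include such a header and call the init function directly.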
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the AudioFIRDSPContext and the functions for its initialization
are needed outside of lavfi/af_afir.c.
Also rename the header to af_afirdsp.h to reflect the change.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>