mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-02 03:06:28 +02:00
FFmpeg/libavutil
Lynne f932b89ea3
lavu/tx: implement aarch64 NEON SIMD FFT
The fastest fast Fourier transform in not just the west, but the world,
now for the most popular toy ISA.

On a high level, it follows the design of the AVX2 version closely,
with the exception that the input is slightly less permuted as we don't have
to do lane switching with the input on double 4pt and 8pt.

On a low level, the lack of subadd/addsub instructions REALLY penalizes
any attempt at writing an FFT. That single register matters a lot,
and reloading it simply takes unacceptably long.
In x86 land, vendors would've noticed developers need this.
In ARM land, you get a badly designed complex multiplication instruction
we cannot use, that's not present on 95% of devices. Because only
compilers matter, right?
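
To make the addsub complaint concrete, here is a minimal scalar C sketch; it is not the NEON assembly from this commit. It assumes interleaved (re, im) floats and uses hypothetical helper names (`addsub_ref`, `addsub_masked`, `cmul_interleaved`). The sign mask stands in for the constant that would have to occupy a vector register when no addsub-style instruction exists; real code may instead flip sign bits, which still needs a constant.

```c
#include <assert.h>
#include <stddef.h>

/* What an addsub-style instruction computes on interleaved data:
 * subtract in even lanes, add in odd lanes (hypothetical reference helper). */
static void addsub_ref(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        dst[i]     = a[i]     - b[i];
        dst[i + 1] = a[i + 1] + b[i + 1];
    }
}

/* Emulation without such an instruction: scale b by a {-1, +1} sign mask,
 * then add. The mask models the extra register the constant would occupy. */
static void addsub_masked(float *dst, const float *a, const float *b, size_t n)
{
    static const float mask[2] = { -1.0f, 1.0f };
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + mask[i & 1] * b[i];
}

/* Complex multiply of interleaved pairs, the core of an FFT butterfly:
 * (a + bi)(c + di) = (ac - bd) + (ad + bc)i, which is exactly an addsub. */
static void cmul_interleaved(float *dst, const float *x, const float *w, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        const float t0[2] = { x[i] * w[i],         x[i] * w[i + 1] }; /* ac, ad */
        const float t1[2] = { x[i + 1] * w[i + 1], x[i + 1] * w[i] }; /* bd, bc */
        addsub_masked(dst + i, t0, t1, 2);
    }
}
```

For example, multiplying x = {1, 2} by w = {3, 4} yields {-5, 10}, i.e. (1 + 2i)(3 + 4i) = -5 + 10i.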

Future optimization options are very few, perhaps better register
management to use more ld1/st1s.

Timings for the A53 are in cycles; the Apple M1 results further down are totals in seconds:
A53:
Length | C           | New (lavu)  | Old (lavc)  | FFTW
------ |-------------|-------------|-------------|-----
4      |         842 | 420         | 1210        | 1460
8      |        1538 | 1020        | 1850        | 2520
16     |        3717 | 1900        | 3700        | 3990
32     |        9156 | 4070        | 8289        | 8860
64     |       21160 | 9931        | 18600       | 19625
128    |       49180 | 23278       | 41922       | 41922
256    |      112073 | 53876       | 93202       | 101092
512    |      252864 | 122884      | 205897      | 207868
1024   |      560512 | 278322      | 458071      | 453053
2048   |     1295402 | 775835      | 1038205     | 1020265
4096   |     3281263 | 2021221     | 2409718     | 2577554
8192   |     8577845 | 4780526     | 5673041     | 6802722

Apple M1:
New  - Total for len 512 reps 2097152 = 1.459141 s
Old  - Total for len 512 reps 2097152 = 2.251344 s
FFTW - Total for len 512 reps 2097152 = 1.868429 s

New  - Total for len 1024 reps 4194304 = 6.490080 s
Old  - Total for len 1024 reps 4194304 = 9.604949 s
FFTW - Total for len 1024 reps 4194304 = 7.889281 s

New  - Total for len 16384 reps 262144 = 10.374001 s
Old  - Total for len 16384 reps 262144 = 15.266713 s
FFTW - Total for len 16384 reps 262144 = 12.341745 s

New  - Total for len 65536 reps 8192 = 1.769812 s
Old  - Total for len 65536 reps 8192 = 4.209413 s
FFTW - Total for len 65536 reps 8192 = 3.012365 s

New  - Total for len 131072 reps 4096 = 1.942836 s
Old  - Segfaults
FFTW - Total for len 131072 reps 4096 = 3.713713 s
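
The totals above are easier to compare as ratios or per-transform times. A small C sketch of that arithmetic follows; the helper names `speedup` and `per_transform_us` are made up for illustration and are not part of lavu/tx.

```c
#include <assert.h>

/* Ratio of a baseline timing to the new timing; both must use the same unit
 * (cycles or seconds). Values above 1.0 mean the new code is faster. */
static double speedup(double baseline, double new_time)
{
    assert(new_time > 0.0);
    return baseline / new_time;
}

/* Average time of a single transform in microseconds, given a total in
 * seconds and the number of repetitions it covers. */
static double per_transform_us(double total_seconds, unsigned long reps)
{
    assert(reps > 0);
    return total_seconds * 1e6 / (double)reps;
}
```

With the M1 numbers for len 512, `speedup(2.251344, 1.459141)` comes out at roughly 1.54 for New against Old, and `per_transform_us(1.459141, 2097152)` at about 0.70 microseconds per transform.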

Thanks to wbs for some simplifications, assembler fixes and a review
and to jannau for giving it a look.
2022-08-25 17:40:28 +02:00
..
aarch64 lavu/tx: implement aarch64 NEON SIMD FFT 2022-08-25 17:40:28 +02:00
arm avutil: use getauxval(3) for CPU capabilities on linux/android ARM 2022-02-07 13:42:40 -08:00
avr32
bfin
loongarch avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx 2022-08-05 02:59:58 +02:00
mips avutil/mips: Use $at as MMI macro temporary register 2021-07-28 23:31:48 +02:00
ppc avutil/ppc/cpu: Use proper header for OpenBSD PPC CPU detection 2022-06-25 12:16:51 +02:00
sh4
tests avutil/test/pixfmt_best: test the VUYA pixel format 2022-08-07 09:33:16 -03:00
tomi
x86 x86/tx_float: save a branch during coefficient deinterleaving 2022-08-09 03:35:12 +02:00
.gitignore
adler32.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
adler32.h avutil: Switch crypto APIs to size_t 2021-04-27 10:43:13 -03:00
aes_ctr.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
aes_ctr.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
aes_internal.h All: update names in copyright headers 2021-01-20 01:02:56 -06:00
aes.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
aes.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
attributes.h avutil/attributes: add support for clang in AV_NOWARN_DEPRECATED 2022-03-16 12:29:37 -03:00
audio_fifo.c avutil/audio_fifo: Avoid avutil.h inclusion 2022-02-24 12:56:49 +01:00
audio_fifo.h avutil/audio_fifo: Avoid avutil.h inclusion 2022-02-24 12:56:49 +01:00
avassert.h avutil/avassert: Don't include avutil.h 2022-02-24 12:56:49 +01:00
avsscanf.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
avstring.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
avstring.h avutil/{avstring,bprint}: add XML escaping from ffprobe to avutil 2021-03-05 19:45:00 +02:00
avutil.h libavutil: Deprecate av_fopen_utf8, provide an avpriv version 2022-05-23 13:52:26 +03:00
avutilres.rc
base64.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
base64.h
blowfish.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
blowfish.h
bprint.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
bprint.h
bswap.h
buffer_internal.h Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
buffer.c avutil/buffer: Never poison returned buffers 2022-08-10 18:49:35 +02:00
buffer.h avutil/buffer: constify some function parameters 2021-09-17 13:28:09 -03:00
camellia.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
camellia.h
cast5.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
cast5.h
channel_layout.c Revert "avutil/channel_layout: av_channel_layout_describe_bprint: Check for buffer end" 2022-07-04 14:04:54 -03:00
channel_layout.h channel_layout: add support for Ambisonic 2022-03-15 09:42:47 -03:00
color_utils.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
color_utils.h
colorspace.h
common.h Remove obsolete version.h inclusions 2022-02-24 12:56:49 +01:00
cpu_internal.h avutil/cpu_internal: Fix check for SSE2SLOW 2022-06-18 19:25:03 +02:00
cpu.c all: Replace if (ARCH_FOO) checks by #if ARCH_FOO 2022-06-15 04:56:37 +02:00
cpu.h avutil/cpu: add AVX512 Icelake flag 2022-03-10 16:45:48 -03:00
crc.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
crc.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
csp.c avutil/csp: create public API for colorspace structs 2022-06-01 13:52:38 -04:00
csp.h avutil/csp: create public API for colorspace structs 2022-06-01 13:52:38 -04:00
cuda_check.h avutil/log: Don't include avutil.h 2022-02-24 12:56:49 +01:00
des.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
des.h
detection_bbox.c avutil/detection_bbox: Fix av_detection_bbox_alloc failed if nb_bboxes == 0 2021-10-08 10:11:59 +08:00
detection_bbox.h lavu/detection_bbox.h: use AV_NUM_DETECTION_BBOX_CLASSIFY to replace AV_NUM_BBOX_CLASSIFY 2021-04-18 10:41:17 +08:00
dict.c
dict.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
display.c avutil/display: Don't include avutil.h 2022-02-24 12:56:49 +01:00
display.h avutil/display: Don't include avutil.h 2022-02-24 12:56:49 +01:00
dovi_meta.c lavu/frame: Add Dolby Vision metadata side data type 2022-01-04 11:59:02 +01:00
dovi_meta.h lavu/frame: Add Dolby Vision metadata side data type 2022-01-04 11:59:02 +01:00
downmix_info.c
downmix_info.h
dynarray.h
encryption_info.c Replace all occurences of av_mallocz_array() by av_calloc() 2021-09-20 01:03:52 +02:00
encryption_info.h
error.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
error.h avutil/error: Include macros.h for MKTAG 2021-07-29 22:02:05 +02:00
eval.c libavutil/eval: Remove CONFIG_TRAPV special handling 2021-02-10 12:28:29 +01:00
eval.h avutil/eval: Don't include avutil.h 2022-02-24 12:56:49 +01:00
ffmath.h
fifo.c avutil/fifo: Don't include avutil.h 2022-02-24 12:56:49 +01:00
fifo.h avutil/fifo: Don't include avutil.h 2022-02-24 12:56:49 +01:00
file_open.c avutil/wchar_filename,file_open: Support long file names on Windows 2022-06-09 13:03:47 +03:00
file.c
file.h avutil/file: Don't include avutil.h 2022-02-24 12:56:49 +01:00
film_grain_params.c libavutil: introduce AVFilmGrainParams side data 2020-11-25 23:06:33 +01:00
film_grain_params.h avcodec/h264_slice: compute and export film grain seed 2021-08-24 09:58:52 -03:00
fixed_dsp.c all: Replace if (ARCH_FOO) checks by #if ARCH_FOO 2022-06-15 04:56:37 +02:00
fixed_dsp.h Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
float2half.c avutil/half2float: use native _Float16 if available 2022-08-19 22:09:36 +02:00
float2half.h avutil/half2float: use native _Float16 if available 2022-08-19 22:09:36 +02:00
float_dsp.c all: Replace if (ARCH_FOO) checks by #if ARCH_FOO 2022-06-15 04:56:37 +02:00
float_dsp.h
frame.c lavu/frame: allow calling av_frame_make_writable() on non-refcounted frames 2022-08-02 10:44:37 +02:00
frame.h lavu/frame: allow calling av_frame_make_writable() on non-refcounted frames 2022-08-02 10:44:37 +02:00
getenv_utf8.h libavutil: Add wchartoutf8(), wchartoansi(), utf8toansi(), getenv_utf8(), freeenv_utf8() and getenv_dup() 2022-06-21 13:27:46 +03:00
half2float.c avutil/half2float: use native _Float16 if available 2022-08-19 22:09:36 +02:00
half2float.h avutil/half2float: use native _Float16 if available 2022-08-19 22:09:36 +02:00
hash.c avutil: Switch crypto APIs to size_t 2021-04-27 10:43:13 -03:00
hash.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
hdr_dynamic_metadata.c
hdr_dynamic_metadata.h
hdr_dynamic_vivid_metadata.c avutil: support for CUVA Vivid HDR metadata 2022-03-01 09:08:43 +08:00
hdr_dynamic_vivid_metadata.h avutil: support for CUVA Vivid HDR metadata 2022-03-01 09:08:43 +08:00
hmac.c Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
hmac.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
hwcontext_cuda_internal.h
hwcontext_cuda.c avutil/hwcontext_cuda: return more useful error codes from init functions 2021-11-22 23:03:21 +01:00
hwcontext_cuda.h
hwcontext_d3d11va.c avutil/hwcontext_d3d11va: add support for rgbaf16 pixel format 2022-08-13 15:21:59 +02:00
hwcontext_d3d11va.h libavutil/hwcontext_d3d11va: adding more texture information to the D3D11 hwcontext API 2021-09-08 17:48:02 -03:00
hwcontext_drm.c hwcontext_drm: make dependency on Linux kernel headers optional 2020-12-30 23:14:46 +01:00
hwcontext_drm.h
hwcontext_dxva2.c avutil/hwcontext_dxva2: add ARGB format 2021-11-13 19:22:57 +01:00
hwcontext_dxva2.h
hwcontext_internal.h Revert "avutils/hwcontext: When deriving a hwdevice, search for existing device in both directions" 2022-01-05 11:56:58 +08:00
hwcontext_mediacodec.c
hwcontext_mediacodec.h
hwcontext_opencl.c qsv: remove mfx/ prefix from mfx headers 2022-08-12 10:43:39 +08:00
hwcontext_opencl.h
hwcontext_qsv.c lavu/hwcontext_qsv: make qsv hwdevice works with oneVPL 2022-08-12 10:43:39 +08:00
hwcontext_qsv.h lavu/hwcontext_qsv: add loader field to AVQSVDeviceContext 2022-08-12 10:43:39 +08:00
hwcontext_stub.c hwcontext: add a stub implementation for Vulkan functions 2022-07-05 15:20:08 +02:00
hwcontext_vaapi.c lavu/hwcontext_vaapi: Map the AYUV format 2022-08-03 14:10:12 -07:00
hwcontext_vaapi.h
hwcontext_vdpau.c avutil/buffer: Switch AVBuffer API to size_t 2021-04-27 10:43:13 -03:00
hwcontext_vdpau.h
hwcontext_videotoolbox.c avutil/hwcontext_videotoolbox: create real buffer pool 2022-04-29 17:27:37 +08:00
hwcontext_videotoolbox.h avutil/hwcontext_videotoolbox: add missing include for AVFrame 2022-08-08 11:08:55 +08:00
hwcontext_vulkan.c avutil/hwcontext_vulkan: fix typo in undef 2022-03-14 17:50:07 +01:00
hwcontext_vulkan.h hwcontext_vulkan: stricter semaphore number requirements 2021-12-10 17:04:22 +01:00
hwcontext.c lavu/hwcontext: clarify behavior on av_hwframe_map() failure 2022-02-17 11:05:44 +01:00
hwcontext.h lavu/hwcontext: clarify behavior on av_hwframe_map() failure 2022-02-17 11:05:44 +01:00
imgutils_internal.h
imgutils.c imgutils: expose av_image_copy_plane_uc_from() 2021-08-14 00:27:43 +02:00
imgutils.h avutil/imgutils: Don't include avutil.h 2022-02-24 12:56:49 +01:00
integer.c avutil/integer: Don't include common.h 2022-02-24 12:56:49 +01:00
integer.h avutil/integer: Don't include common.h 2022-02-24 12:56:49 +01:00
internal.h libavutil: Deprecate av_fopen_utf8, provide an avpriv version 2022-05-23 13:52:26 +03:00
intfloat.h
intmath.c
intmath.h
intreadwrite.h
lfg.c
lfg.h
libavutil.v
libm.h
lls.c all: Replace if (ARCH_FOO) checks by #if ARCH_FOO 2022-06-15 04:56:37 +02:00
lls.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
log2_tab.c
log.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
log.h avutil/log: Don't include avutil.h 2022-02-24 12:56:49 +01:00
lzo.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
lzo.h
macos_kperf.c lavu/kperf: use ff_thread_once() 2021-07-21 16:35:27 +02:00
macos_kperf.h lavu/kperf: use ff_thread_once() 2021-07-21 16:35:27 +02:00
macros.h avutil/common, macros: Move several macros from common.h to macros.h 2021-07-29 22:02:05 +02:00
Makefile configure: always enable gnu_windres if available 2022-08-13 14:42:36 +02:00
mastering_display_metadata.c
mastering_display_metadata.h
mathematics.c avutil/avassert: Don't include avutil.h 2022-02-24 12:56:49 +01:00
mathematics.h avutil/mathematics: Document av_rescale_rnd() behavior on non int64 results 2021-10-21 14:13:03 +02:00
md5.c avutil/md5: Avoid av_unused variable 2021-10-02 17:13:57 +02:00
md5.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
mem_internal.h avutil/mem_internal: Fix headers 2022-08-24 03:43:52 +02:00
mem.c avutil/mem: Handle fast allocations near UINT_MAX properly 2022-07-06 22:53:15 +02:00
mem.h avutil/mem: fix doc for reallocs 2022-05-26 17:18:23 +08:00
motion_vector.h
murmur3.c avutil: Switch crypto APIs to size_t 2021-04-27 10:43:13 -03:00
murmur3.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
objc.h avutil: add obj-c helpers into header-only include 2021-12-18 11:55:47 -08:00
opt.c avutil/opt: Combine multiple av_log statements 2022-08-03 21:09:24 +02:00
opt.h lavu: support AVChannelLayout AVOptions 2022-03-15 09:42:29 -03:00
parseutils.c avutil/parseutils: use quadhd for Quad HD 2022-01-12 13:42:26 +08:00
parseutils.h
pca.c
pca.h
pixdesc.c lavu/pixfmt: add packed RGBA float16 format 2022-08-13 15:21:46 +02:00
pixdesc.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
pixelutils.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
pixelutils.h avutil/pixelutils: Don't include common.h 2022-02-24 12:56:49 +01:00
pixfmt.h lavu/pixfmt: add packed RGBA float16 format 2022-08-13 15:21:46 +02:00
qsort.h Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
random_seed.c
random_seed.h
rational.c
rational.h
rc4.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
rc4.h
replaygain.h
reverse.c
reverse.h
ripemd.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
ripemd.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
samplefmt.c avutil/samplefmt: Don't include attributes.h, avutil.h 2022-02-24 12:56:49 +01:00
samplefmt.h avutil/samplefmt: Don't include attributes.h, avutil.h 2022-02-24 12:56:49 +01:00
sha512.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
sha512.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
sha.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
sha.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
slicethread.c avutil/avassert: Don't include avutil.h 2022-02-24 12:56:49 +01:00
slicethread.h
softfloat_ieee754.h
softfloat_tables.h
softfloat.h
spherical.c avutil/spherical: Use av_strstart instead of strncmp 2021-02-28 17:14:21 +01:00
spherical.h
stereo3d.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
stereo3d.h
tablegen.h
tea.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
tea.h
thread.h avutil/log: Don't include avutil.h 2022-02-24 12:56:49 +01:00
threadmessage.c avutil/fifo: Don't include avutil.h 2022-02-24 12:56:49 +01:00
threadmessage.h
time_internal.h
time.c lavu: use address-of operator checking clock_gettime 2020-12-28 01:12:26 -03:00
time.h
timecode.c avutil/timecode: use timecode fps for number of frame digits 2022-04-22 22:54:58 +02:00
timecode.h avutil/timecode: add av_timecode_init_from_components 2020-12-03 18:32:54 +01:00
timer.h avutil/log: Don't include avutil.h 2022-02-24 12:56:49 +01:00
timestamp.h
tree.c
tree.h Remove obsolete version.h inclusions 2021-07-22 14:34:31 +02:00
twofish.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
twofish.h
tx_double.c
tx_float.c
tx_int32.c
tx_priv.h lavu/tx: implement aarch64 NEON SIMD FFT 2022-08-25 17:40:28 +02:00
tx_template.c lavu/tx: optimize and simplify inverse MDCTs 2022-08-16 01:22:38 +02:00
tx.c lavu/tx: implement aarch64 NEON SIMD FFT 2022-08-25 17:40:28 +02:00
tx.h avutil/tx: Fix documentation of av_tx_uninit() 2022-02-11 19:38:41 +01:00
utils.c lib*/version: Move library version functions into files of their own 2022-05-10 06:49:32 +02:00
uuid.c avutil/uuid: add utility library for manipulating UUIDs as specified in RFC 4122 2022-06-12 18:34:28 +10:00
uuid.h avutil/uuid: add utility library for manipulating UUIDs as specified in RFC 4122 2022-06-12 18:34:28 +10:00
version_major.h Fix libversion.sh for split version headers, to unbreak shared library builds 2022-03-17 11:11:17 +02:00
version.c lib*/version: Move library version functions into files of their own 2022-05-10 06:49:32 +02:00
version.h lavu/pixfmt: add packed RGBA float16 format 2022-08-13 15:21:46 +02:00
video_enc_params.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
video_enc_params.h mpegvideo: use the AVVideoEncParams API for exporting QP tables 2021-01-01 14:23:19 +01:00
vulkan_functions.h hwcontext_vulkan: avoid using 64-bit enums 2022-01-27 10:27:09 +01:00
vulkan_glslang.c avutil/vulkan_glslang: fix compiling failure issue 2021-11-19 16:47:48 +01:00
vulkan_loader.h vulkan_loader: fix typo in error message 2021-11-18 06:40:52 +01:00
vulkan_shaderc.c lavu/vulkan: add support for using libshaderc as a GLSL compiler 2021-11-19 16:47:30 +01:00
vulkan.c lavu/vulkan: avoid using strlen as a loop condition 2022-02-22 06:30:12 +01:00
vulkan.h vulkan: fix checkheaders 2021-11-19 16:47:28 +01:00
wchar_filename.h avutil/wchar_filename: Make the header C++ compatible 2022-06-28 10:59:31 +02:00
xga_font_data.c
xga_font_data.h
xtea.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
xtea.h