mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-02 03:06:28 +02:00
Lynne f932b89ea3
lavu/tx: implement aarch64 NEON SIMD FFT
The fastest fast Fourier transform in not just the west, but the world,
now for the most popular toy ISA.

On a high level, it follows the design of the AVX2 version closely,
with the exception that the input is slightly less permuted as we don't have
to do lane switching with the input on double 4pt and 8pt.

On a low level, the lack of subadd/addsub instructions REALLY penalizes
any attempt at writing an FFT. That single register matters a lot,
and reloading it simply takes unacceptably long.
In x86 land, vendors would've noticed developers need this.
In ARM land, you get a badly designed complex multiplication instruction
we cannot use, that's not present on 95% of devices. Because only
compilers matter, right?

Future optimization options are very few, perhaps better register
management to use more ld1/st1s.

Timings on the A53, in cycles:
Length | C       | New (lavu) | Old (lavc) | FFTW
-------|---------|------------|------------|--------
4      |     842 |        420 |       1210 |    1460
8      |    1538 |       1020 |       1850 |    2520
16     |    3717 |       1900 |       3700 |    3990
32     |    9156 |       4070 |       8289 |    8860
64     |   21160 |       9931 |      18600 |   19625
128    |   49180 |      23278 |      41922 |   41922
256    |  112073 |      53876 |      93202 |  101092
512    |  252864 |     122884 |     205897 |  207868
1024   |  560512 |     278322 |     458071 |  453053
2048   | 1295402 |     775835 |    1038205 | 1020265
4096   | 3281263 |    2021221 |    2409718 | 2577554
8192   | 8577845 |    4780526 |    5673041 | 6802722

Apple M1 (totals in seconds):
New  - Total for len 512 reps 2097152 = 1.459141 s
Old  - Total for len 512 reps 2097152 = 2.251344 s
FFTW - Total for len 512 reps 2097152 = 1.868429 s

New  - Total for len 1024 reps 4194304 = 6.490080 s
Old  - Total for len 1024 reps 4194304 = 9.604949 s
FFTW - Total for len 1024 reps 4194304 = 7.889281 s

New  - Total for len 16384 reps 262144 = 10.374001 s
Old  - Total for len 16384 reps 262144 = 15.266713 s
FFTW - Total for len 16384 reps 262144 = 12.341745 s

New  - Total for len 65536 reps 8192 = 1.769812 s
Old  - Total for len 65536 reps 8192 = 4.209413 s
FFTW - Total for len 65536 reps 8192 = 3.012365 s

New  - Total for len 131072 reps 4096 = 1.942836 s
Old  - Segfaults
FFTW - Total for len 131072 reps 4096 = 3.713713 s

Thanks to wbs for some simplifications, assembler fixes and a review
and to jannau for giving it a look.
2022-08-25 17:40:28 +02:00
compat compat: add msvc windres wrapper 2022-08-13 14:42:52 +02:00
doc qsvenc_{hevc,h264}: add scenario option 2022-08-23 12:42:19 +08:00
ffbuild ffbuild/common: Fix CPPFLAGS applied for compiling C++ files 2022-05-24 21:30:52 +02:00
fftools fftools/ffmpeg_opt: try to propagate the requested output channel layout 2022-08-23 13:03:56 -03:00
libavcodec avcodec/wavpack: fix regression in decoding 2022-08-25 09:12:17 +02:00
libavdevice configure: always enable gnu_windres if available 2022-08-13 14:42:36 +02:00
libavfilter avfilter/af_silenceremove: do not trim non-silence from start 2022-08-23 22:18:02 +02:00
libavformat avformat/riff: add support for ICMV files 2022-08-25 09:12:07 +02:00
libavutil lavu/tx: implement aarch64 NEON SIMD FFT 2022-08-25 17:40:28 +02:00
libpostproc configure: always enable gnu_windres if available 2022-08-13 14:42:36 +02:00
libswresample lswr: take const AVChannelLayout* in swr_alloc_set_opts2() 2022-08-24 18:31:05 -05:00
libswscale swscale/x86/rgb_2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext 2022-08-23 12:21:00 +02:00
presets
tests avutil/half2float: adjust conversion of NaN 2022-08-19 22:09:36 +02:00
tools tools/target_dec_fuzzer: Adjust threshold for ZLIB 2022-08-23 20:03:03 +02:00
.gitattributes
.gitignore gitignore: add config_components.h 2022-03-17 18:35:41 -03:00
.mailmap
.travis.yml
Changelog lavc/vaapi_encode: enable 8bit 4:4:4 encoding for HEVC and VP9 2022-08-09 09:22:49 -07:00
configure configure: enable the av1_frame_split bsf for the av1 decoder 2022-08-23 14:30:28 +02:00
CONTRIBUTING.md
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
INSTALL.md
LICENSE.md
MAINTAINERS MAINTAINERS: Add ED25519 key for signing my commits in the future 2022-08-09 21:57:31 +02:00
Makefile Makefile: Prompt for reconfigure on lavc/hwaccels.h modification 2022-07-01 00:34:38 +02:00
README.md
RELEASE RELEASE: update after 5.1 branch 2022-07-13 00:31:42 +02:00

FFmpeg README

FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

  • libavcodec provides implementations of a wide range of codecs.
  • libavformat implements streaming protocols, container formats and basic I/O access.
  • libavutil includes hashers, decompressors and miscellaneous utility functions.
  • libavfilter provides means to alter decoded audio and video through a directed graph of connected filters.
  • libavdevice provides an abstraction to access capture and playback devices.
  • libswresample implements audio mixing and resampling routines.
  • libswscale implements color conversion and scaling routines.

Tools

  • ffmpeg is a command line toolbox to manipulate, convert and stream multimedia content.
  • ffplay is a minimalistic multimedia player.
  • ffprobe is a simple analysis tool to inspect multimedia content.
  • Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.

Contributing

Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.