mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-13 21:28:01 +02:00
Rémi Denis-Courmont 0183c2c830 lavc/aacpsdsp: use LMUL=2 and amortise strides
The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments with unit stride, then narrow to 16 segments with
   right shifts, we can only get one half-size vector per segment, or just 2
   elements per vector (EMUL=1/2) - at least with 128-bit vectors.
   This ends up unsurprisingly about as fast as the C code.
2) The current approach is to load with strides. We keep that approach,
   but improve it using three 4-segmented loads instead of 12 single-segment
   loads. This divides the number of distinct loaded addresses by 4.
3) A potential third approach would be to avoid segmentation altogether
   and splat the scalar coefficient into vectors. Then we can use a
   unit-stride and maximum EMUL. But the downside then is that we have to
   multiply the 3 (of 16) unused segments with zero as part of the
   multiply-accumulate operations.

In addition, we also reuse vectors mid-loop so as to increase the EMUL
from 1 to 2, which also improves performance a little bit.

Overall the gains are quite small with the device under test, as it does
not deal with segmented loads very well. But at least the code is tidier,
and should enjoy bigger speed-ups on better hardware implementations.
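
For readers unfamiliar with the data layout, the following scalar C model may help show where "13 of 16" comes from. It is only a sketch: the function name is hypothetical, and it assumes that the 16 segments are the 8 complex coefficient pairs stored per output row, with 6 mirrored taps using both parts of their pair and a centre tap using only its real part. It is not the RVV code from this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model (illustrative name, assumed layout).
 * Each filter row holds 8 {re, im} pairs = 16 floats. Only 13 are read:
 *   pairs 0..5 -> 12 floats (both parts)
 *   pair  6    ->  1 float  (real part only)
 *   pair  7    ->  0 floats (never read) */
void hybrid_analysis_sketch(float (*out)[2], float (*in)[2],
                            const float (*filter)[8][2],
                            ptrdiff_t stride, int n)
{
    assert(out && in && filter);

    for (int i = 0; i < n; i++) {
        /* Centre tap: a purely real coefficient applied to in[6]. */
        float sum_re = filter[i][6][0] * in[6][0];
        float sum_im = filter[i][6][0] * in[6][1];

        /* Taps 0..5 combine input j with its mirror image 12 - j. */
        for (int j = 0; j < 6; j++) {
            float a_re = in[j][0],      a_im = in[j][1];
            float b_re = in[12 - j][0], b_im = in[12 - j][1];

            sum_re += filter[i][j][0] * (a_re + b_re)
                    - filter[i][j][1] * (a_im - b_im);
            sum_im += filter[i][j][0] * (a_im + b_im)
                    + filter[i][j][1] * (a_re - b_re);
        }

        /* Outputs are written stride elements apart. */
        out[i * stride][0] = sum_re;
        out[i * stride][1] = sum_im;
    }
}
```

Under these assumptions, the 12 floats of pairs 0..5 correspond to the 12 single-segment loads that approach 2 groups into three 4-segment loads. Approach 3 would instead touch all 16 floats and multiply the 3 unused ones by zero.
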

Before:
ps_hybrid_analysis_c:       1819.2
ps_hybrid_analysis_rvv_f32: 1037.0 (before)
ps_hybrid_analysis_rvv_f32:  990.0 (after)
2023-11-23 18:57:18 +02:00
compat configure: Set WIN32_LEAN_AND_MEAN at configure time 2023-08-14 22:57:28 +03:00
doc doc/git-howto: use less weird username for git URL 2023-11-22 10:21:50 +01:00
ffbuild ffbuild: Add gzip -n flag to fix reproducible builds 2023-11-05 11:30:13 +01:00
fftools fftools/ffplay: add hwaccel decoding support 2023-11-15 01:20:11 +08:00
libavcodec lavc/aacpsdsp: use LMUL=2 and amortise strides 2023-11-23 18:57:18 +02:00
libavdevice avdevice/decklink_dec: add explicit specifier 2023-11-21 08:02:29 +08:00
libavfilter avfilter/asrc_afirsrc: fix by one smaller allocation of buffer 2023-11-23 15:01:55 +01:00
libavformat avformat/rtmpproto: Pass rw_timeout to underlying transport protocol 2023-11-22 21:02:04 +08:00
libavutil lavu/fixed_dsp: R-V V fmul_window_scaled 2023-11-23 18:57:18 +02:00
libpostproc Bump versions after 6.1 2023-10-29 16:19:14 +01:00
libswresample Bump versions after 6.1 2023-10-29 16:19:14 +01:00
libswscale sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p 2023-11-13 18:34:29 +02:00
presets
tests test/checkasm: test llauddsp 2023-11-22 14:22:19 -03:00
tools tools/general_assembly: update to conform to new rules 2023-11-19 12:58:47 +01:00
.gitattributes
.gitignore
.mailmap mailmap: remap my email accounts 2023-11-03 20:57:49 +08:00
.travis.yml
Changelog avcodec: bump version after EVC additions 2023-11-20 11:55:51 -03:00
configure avcodec/evc_decoder: Provided support for EVC decoder 2023-11-20 11:55:51 -03:00
CONTRIBUTING.md
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS Use https for repository links 2023-03-01 21:59:10 +01:00
INSTALL.md
LICENSE.md
MAINTAINERS MAINTAINERS: add myself as a mediacodec and videotoolbox maintainer 2023-11-08 16:15:04 +08:00
Makefile tools: Don't include the direct library names when linking 2023-10-02 22:49:07 +03:00
README.md
RELEASE RELEASE: update after 5.1 branch 2022-07-13 00:31:42 +02:00

FFmpeg README

FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

  • libavcodec provides implementations of a wide range of codecs.
  • libavformat implements streaming protocols, container formats and basic I/O access.
  • libavutil includes hashers, decompressors and miscellaneous utility functions.
  • libavfilter provides means to alter decoded audio and video through a directed graph of connected filters.
  • libavdevice provides an abstraction to access capture and playback devices.
  • libswresample implements audio mixing and resampling routines.
  • libswscale implements color conversion and scaling routines.

Tools

  • ffmpeg is a command line toolbox to manipulate, convert and stream multimedia content.
  • ffplay is a minimalistic multimedia player.
  • ffprobe is a simple analysis tool to inspect multimedia content.
  • Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.

Contributing

Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.