mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-28 20:53:54 +02:00

Mirror of https://git.ffmpeg.org/ffmpeg.git

Martin Storsjö ffbd1d2b00 arm: vp9: Add NEON optimizations of VP9 MC functions	2016-11-03 09:35:38 +02:00

This work is sponsored by, and copyright, Google.

The filter coefficients are signed values, where the product of the multiplication with one individual filter coefficient doesn't overflow a 16 bit signed value (the largest filter coefficient is 127). But when the products are accumulated, the resulting sum can overflow the 16 bit signed range. Instead of accumulating in 32 bit, we accumulate the largest product (either index 3 or 4) last with a saturated addition.

(The VP8 MC asm does something similar, but slightly simpler, by accumulating each half of the filter separately. In the VP9 MC filters, each half of the filter can also overflow though, so the largest component has to be handled individually.)

Examples of relative speedup compared to the C version, from checkasm:

                       Cortex       A7     A8     A9    A53
vp9_avg4_neon:                    1.71   1.15   1.42   1.49
vp9_avg8_neon:                    2.51   3.63   3.14   2.58
vp9_avg16_neon:                   2.95   6.76   3.01   2.84
vp9_avg32_neon:                   3.29   6.64   2.85   3.00
vp9_avg64_neon:                   3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:      3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:     3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:      5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:      6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:     6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:      9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:     7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:    6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:    10.76  14.13   9.46  13.37
vp9_put4_neon:                    1.11   1.47   1.00   1.21
vp9_put8_neon:                    1.23   2.17   1.94   1.48
vp9_put16_neon:                   1.63   4.02   1.73   1.97
vp9_put32_neon:                   1.56   4.92   2.00   1.96
vp9_put64_neon:                   2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:      3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:     3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:      5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:      5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:     6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:      9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:     6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:    6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:    10.69  13.30   9.43  14.34

For the larger 8tap filters, the speedup vs C code is around 5-14x.

This is significantly faster than libvpx's implementation of the same functions, at least when comparing the put_8tap_smooth_64 functions (compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from libvpx).

Absolute runtimes from checkasm:

                          Cortex        A7       A8       A9      A53
vp9_put_8tap_smooth_64h_neon:      20150.3  14489.4  19733.6  10863.7
libvpx vpx_convolve8_horiz_neon:   52623.3  19736.4  21907.7  25027.7

vp9_put_8tap_smooth_64v_neon:      14455.0  12303.9  13746.4   9628.9
libvpx vpx_convolve8_vert_neon:    42090.0  17706.2  17659.9  16941.2

Thus, on the A9, the horizontal filter is only marginally faster than libvpx, while our version is significantly faster on the other cores, and the vertical filter is significantly faster on all cores. The difference is especially large on the A7.

The libvpx implementation does the accumulation in 32 bit, which probably explains most of the differences.

Signed-off-by: Martin Storsjö <martin@martin.st>

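The accumulation trick described in the commit message above can be illustrated in scalar C. This is only a sketch under stated assumptions, not the NEON code from the commit: the coefficients in `example_filter` are hypothetical (chosen to sum to 128, not taken from the VP9 filter tables), and the helper names `sat_add_s16`, `round_shift7_clip_u8`, `filter_px_sat16` and `filter_px_ref32` are made up for this example. The final rounding shift by 7 and clip to 8 bits is likewise assumed from the coefficients summing to 128.

```c
#include <stdint.h>

/* Hypothetical 8-tap filter, sums to 128. NOT from the VP9 tables. */
static const int16_t example_filter[8] = { -2, 10, -24, 100, 60, -20, 6, -2 };

/* Saturating signed 16 bit addition. */
static int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Round, shift right by 7 and clip to the 0..255 pixel range. */
static uint8_t round_shift7_clip_u8(int32_t v)
{
    v += 64;
    if (v < 0)
        return 0;            /* avoids shifting a negative value */
    v >>= 7;
    return v > 255 ? 255 : (uint8_t)v;
}

/* 16 bit accumulation: all taps except the largest are summed first
 * (for example_filter these partial sums always fit in 16 bits), then
 * the largest product is added last with a saturating addition. */
static uint8_t filter_px_sat16(const uint8_t *src, const int16_t *f, int largest)
{
    int16_t acc = 0;
    for (int i = 0; i < 8; i++)
        if (i != largest)
            acc = (int16_t)(acc + f[i] * src[i]);
    acc = sat_add_s16(acc, (int16_t)(f[largest] * src[largest]));
    return round_shift7_clip_u8(acc);
}

/* Reference: plain 32 bit accumulation. */
static uint8_t filter_px_ref32(const uint8_t *src, const int16_t *f)
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += f[i] * src[i];
    return round_shift7_clip_u8(acc);
}
```

With this filter, a sum that saturates at 32767 still rounds to 256 and clips to 255, the same pixel value the 32 bit reference produces, so the narrower accumulator loses nothing in the final output. Calling `filter_px_sat16(src, example_filter, 3)` treats index 3 as the largest tap.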
compat	Add a compat dummy stdatomic.h used when threading is disabled	2016-10-02 18:57:56 +02:00
doc	doc: Turn off noisy deprecation warnings in the option printer	2016-11-02 10:33:39 +01:00
libavcodec	arm: vp9: Add NEON optimizations of VP9 MC functions	2016-11-03 09:35:38 +02:00
libavdevice	bktr: Use memset(0) instead of zero initialization for struct sigaction	2016-10-22 17:34:55 +02:00
libavfilter	lavfi: Always propagate hw_frames_ctx through links	2016-11-02 20:29:05 +00:00
libavformat	rtmpproto: Restructure zlib code to avoid unreachable code warning	2016-11-02 10:33:39 +01:00
libavresample	build: Change structure of the linker version script templates	2016-05-29 16:43:11 +02:00
libavutil	audio_fifo: Drop write-only variable	2016-10-27 12:21:46 +02:00
libswscale	Adjust printf conversion specifiers to match variable signedness	2016-10-28 11:22:21 +02:00
presets
tests	vp9: Flip the order of arguments in MC functions	2016-11-03 09:12:02 +02:00
tools	aviocat: Support avio options	2016-10-25 15:43:56 +02:00
.gitattributes
.gitignore	build: Ignore generated mapfile and remove it on distclean	2016-05-27 11:27:24 +02:00
.travis.yml	travis: Enable OSX integration	2015-11-17 16:51:00 +01:00
arch.mak	ppc: vsx: Implement float_dsp	2015-05-31 12:07:11 +02:00
avconv_dxva2.c	avconv_dxva2: add a profile check for hevc	2016-07-20 16:33:09 +02:00
avconv_filter.c	avconv: make sure the filtergraph is freed on init failure	2016-10-02 11:41:45 +02:00
avconv_opt.c	avconv: support parsing bitstream filter options	2016-11-02 10:08:28 +01:00
avconv_qsv.c	avconv_qsv: use the actual pixel format provided by lavc	2016-07-22 19:08:12 +02:00
avconv_vaapi.c	avconv_vaapi: Convert to use hw_frames_ctx only	2016-08-30 22:16:01 +01:00
avconv_vda.c	avconv: vda: Unlock the pixel buffer once it is accessed	2015-07-09 00:10:13 +02:00
avconv_vdpau.c	avconv_vdpau: use the hwcontext device creation API	2016-05-26 15:40:34 +02:00
avconv.c	lavfi: Always propagate hw_frames_ctx through links	2016-11-02 20:29:05 +00:00
avconv.h	avconv: support parsing bitstream filter options	2016-11-02 10:08:28 +01:00
avplay.c	Use AVFrame.pts instead of deprecated pkt_pts.	2016-06-21 19:54:42 +02:00
avprobe.c	avprobe: Add -show_stream_entry to get a single stream property	2016-11-01 11:27:52 -04:00
Changelog	Changelog: mark the release 12 branch	2016-08-31 08:08:32 +02:00
cmdutils_common_opts.h
cmdutils.c	avconv: switch to the new BSF API	2016-03-20 08:15:01 +01:00
cmdutils.h	avconv: use read_file() for reading the 2pass stats	2015-07-19 09:37:11 +02:00
common.mak	build: Simplify postprocessing of linker version script files	2016-05-29 16:49:16 +02:00
configure	examples/avcodec: split the remaining two examples into separate files	2016-11-02 10:16:04 +01:00
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
INSTALL
library.mak	build: Drop duplicate asm recipe	2016-10-17 16:25:35 +02:00
LICENSE	Remove the legacy X11 screen grabber	2016-07-29 19:03:10 +02:00
Makefile	build: Hardcode avversion.h dependency	2016-10-27 11:54:06 +02:00
README
README.md	doc: Add travis badge	2015-09-14 00:19:08 +02:00
RELEASE	Make the RELEASE file match with the most recent tag	2016-10-14 13:52:51 -04:00
version.sh	build: remove hardcoded name of version header	2016-09-15 21:59:15 +02:00

README.md

Libav

Libav is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

libavcodec provides implementations of a wide range of codecs.
libavformat implements streaming protocols, container formats and basic I/O access.
libavutil includes hashers, decompressors and miscellaneous utility functions.
libavfilter provides a means to alter decoded audio and video through a chain of filters.
libavdevice provides an abstraction to access capture and playback devices.
libavresample implements audio mixing and resampling routines.
libswscale implements color conversion and scaling routines.

Tools

avconv is a command line toolbox to manipulate, convert and stream multimedia content.
avplay is a minimalistic multimedia player.
avprobe is a simple analysis tool to inspect multimedia content.
Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The Libav codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.