mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Martin Storsjö ffbd1d2b00 arm: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

The filter coefficients are signed values, where the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16 bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the resulting sum can
overflow the 16 bit signed range. Instead of accumulating in 32 bit,
we accumulate the largest product (either index 3 or 4) last with a
saturated addition.

(The VP8 MC asm does something similar, but slightly simpler, by
accumulating each half of the filter separately. In the VP9 MC
filters, each half of the filter can also overflow though, so the
largest component has to be handled individually.)
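
As a rough scalar illustration of the accumulation order described above (not the NEON code from this commit), the C sketch below sums every product except the largest tap first, then adds the largest product with a saturating 16 bit addition, and compares the result with a plain 32 bit accumulation. The coefficients in `example_filter`, the helper names, and the final round-and-shift by 7 (which assumes coefficients summing to 128) are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-tap coefficients, chosen only for illustration: they sum
 * to 128 and the largest tap (index 4) is below 127. They are not copied
 * from the VP9 filter tables. */
static const int16_t example_filter[8] = { -2, 6, -13, 37, 115, -20, 9, -4 };

/* Round, shift down by 7 bits and clip to the 8 bit pixel range. */
static uint8_t round_clip(int32_t sum)
{
    int32_t v = sum + 64;
    if (v < 0)
        return 0;
    v >>= 7;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Reference: accumulate all eight products in 32 bit. */
static uint8_t filter8_ref32(const uint8_t *src, const int16_t *f)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += src[i] * f[i];
    return round_clip(sum);
}

/* Saturating signed 16 bit addition. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX)
        return INT16_MAX;
    if (s < INT16_MIN)
        return INT16_MIN;
    return (int16_t)s;
}

/* 16 bit variant: the largest product (index 3 or 4) is added last. */
static uint8_t filter8_sat16(const uint8_t *src, const int16_t *f)
{
    int largest = f[3] > f[4] ? 3 : 4;
    int32_t partial = 0;

    for (int i = 0; i < 8; i++)
        if (i != largest)
            partial += src[i] * f[i];

    /* Assumption of this sketch: without the largest tap, the partial
     * sum fits in a signed 16 bit value. */
    assert(partial >= INT16_MIN && partial <= INT16_MAX);

    /* A single product is at most 255 * 127, so it fits in 16 bit. */
    int16_t acc = sat_add16((int16_t)partial,
                            (int16_t)(src[largest] * f[largest]));
    return round_clip(acc);
}
```

With these example coefficients, an input that is 255 under every positive tap and 0 under every negative one gives a true sum of 42585, which exceeds the signed 16 bit range. The saturated value 32767 still rounds and clips to 255, the same output as the 32 bit reference, because any sum that large is clipped to 255 anyway.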

Examples of relative speedup compared to the C version, from checkasm:
                       Cortex      A7     A8     A9    A53
vp9_avg4_neon:                   1.71   1.15   1.42   1.49
vp9_avg8_neon:                   2.51   3.63   3.14   2.58
vp9_avg16_neon:                  2.95   6.76   3.01   2.84
vp9_avg32_neon:                  3.29   6.64   2.85   3.00
vp9_avg64_neon:                  3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:     3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:    3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:     5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:     6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:    6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:     9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:    7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:   6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13   9.46  13.37
vp9_put4_neon:                   1.11   1.47   1.00   1.21
vp9_put8_neon:                   1.23   2.17   1.94   1.48
vp9_put16_neon:                  1.63   4.02   1.73   1.97
vp9_put32_neon:                  1.56   4.92   2.00   1.96
vp9_put64_neon:                  2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:     3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:    3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:     5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:     5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:    6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:     9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:    6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:   6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30   9.43  14.34

For the larger 8tap filters, the speedup vs C code is around 5-14x.

This is significantly faster than libvpx's implementation of the same
functions, at least when comparing the put_8tap_smooth_64 functions
(compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from
libvpx).

Absolute runtimes from checkasm:
                          Cortex      A7        A8        A9       A53
vp9_put_8tap_smooth_64h_neon:    20150.3   14489.4   19733.6   10863.7
libvpx vpx_convolve8_horiz_neon: 52623.3   19736.4   21907.7   25027.7

vp9_put_8tap_smooth_64v_neon:    14455.0   12303.9   13746.4    9628.9
libvpx vpx_convolve8_vert_neon:  42090.0   17706.2   17659.9   16941.2

Thus, on the A9, the horizontal filter is only marginally faster than
libvpx, while our version is significantly faster on the other cores,
and the vertical filter is significantly faster on all cores. The
difference is especially large on the A7.
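
To make the comparison above concrete, the following small C sketch divides the libvpx runtimes by ours, using the numbers copied from the checkasm table. The array and function names are made up for this example.

```c
#include <stddef.h>

/* Runtimes copied from the table above; column order: A7, A8, A9, A53. */
static const double ours_h[4]   = { 20150.3, 14489.4, 19733.6, 10863.7 };
static const double libvpx_h[4] = { 52623.3, 19736.4, 21907.7, 25027.7 };
static const double ours_v[4]   = { 14455.0, 12303.9, 13746.4,  9628.9 };
static const double libvpx_v[4] = { 42090.0, 17706.2, 17659.9, 16941.2 };

/* How many times longer the other implementation runs on a given core. */
static double runtime_ratio(const double *other, const double *ours,
                            size_t core)
{
    return other[core] / ours[core];
}
```

The ratios come out at roughly 2.61, 1.36, 1.11 and 2.30 for the horizontal filter and roughly 2.91, 1.44, 1.28 and 1.76 for the vertical filter (A7, A8, A9, A53).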

The libvpx implementation does the accumulation in 32 bit, which
probably explains most of the differences.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:35:38 +02:00
compat Add a compat dummy stdatomic.h used when threading is disabled 2016-10-02 18:57:56 +02:00
doc doc: Turn off noisy deprecation warnings in the option printer 2016-11-02 10:33:39 +01:00
libavcodec arm: vp9: Add NEON optimizations of VP9 MC functions 2016-11-03 09:35:38 +02:00
libavdevice bktr: Use memset(0) instead of zero initialization for struct sigaction 2016-10-22 17:34:55 +02:00
libavfilter lavfi: Always propagate hw_frames_ctx through links 2016-11-02 20:29:05 +00:00
libavformat rtmpproto: Restructure zlib code to avoid unreachable code warning 2016-11-02 10:33:39 +01:00
libavresample build: Change structure of the linker version script templates 2016-05-29 16:43:11 +02:00
libavutil audio_fifo: Drop write-only variable 2016-10-27 12:21:46 +02:00
libswscale Adjust printf conversion specifiers to match variable signedness 2016-10-28 11:22:21 +02:00
presets
tests vp9: Flip the order of arguments in MC functions 2016-11-03 09:12:02 +02:00
tools aviocat: Support avio options 2016-10-25 15:43:56 +02:00
.gitattributes
.gitignore build: Ignore generated mapfile and remove it on distclean 2016-05-27 11:27:24 +02:00
.travis.yml travis: Enable OSX integration 2015-11-17 16:51:00 +01:00
arch.mak
avconv_dxva2.c avconv_dxva2: add a profile check for hevc 2016-07-20 16:33:09 +02:00
avconv_filter.c avconv: make sure the filtergraph is freed on init failure 2016-10-02 11:41:45 +02:00
avconv_opt.c avconv: support parsing bitstream filter options 2016-11-02 10:08:28 +01:00
avconv_qsv.c avconv_qsv: use the actual pixel format provided by lavc 2016-07-22 19:08:12 +02:00
avconv_vaapi.c avconv_vaapi: Convert to use hw_frames_ctx only 2016-08-30 22:16:01 +01:00
avconv_vda.c
avconv_vdpau.c avconv_vdpau: use the hwcontext device creation API 2016-05-26 15:40:34 +02:00
avconv.c lavfi: Always propagate hw_frames_ctx through links 2016-11-02 20:29:05 +00:00
avconv.h avconv: support parsing bitstream filter options 2016-11-02 10:08:28 +01:00
avplay.c Use AVFrame.pts instead of deprecated pkt_pts. 2016-06-21 19:54:42 +02:00
avprobe.c avprobe: Add -show_stream_entry to get a single stream property 2016-11-01 11:27:52 -04:00
Changelog Changelog: mark the release 12 branch 2016-08-31 08:08:32 +02:00
cmdutils_common_opts.h
cmdutils.c avconv: switch to the new BSF API 2016-03-20 08:15:01 +01:00
cmdutils.h avconv: use read_file() for reading the 2pass stats 2015-07-19 09:37:11 +02:00
common.mak build: Simplify postprocessing of linker version script files 2016-05-29 16:49:16 +02:00
configure examples/avcodec: split the remaining two examples into separate files 2016-11-02 10:16:04 +01:00
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
INSTALL
library.mak build: Drop duplicate asm recipe 2016-10-17 16:25:35 +02:00
LICENSE Remove the legacy X11 screen grabber 2016-07-29 19:03:10 +02:00
Makefile build: Hardcode avversion.h dependency 2016-10-27 11:54:06 +02:00
README
README.md doc: Add travis badge 2015-09-14 00:19:08 +02:00
RELEASE Make the RELEASE file match with the most recent tag 2016-10-14 13:52:51 -04:00
version.sh build: remove hardcoded name of version header 2016-09-15 21:59:15 +02:00

Libav


Libav is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

  • libavcodec provides implementations of a wide range of codecs.
  • libavformat implements streaming protocols, container formats and basic I/O access.
  • libavutil includes hashers, decompressors and miscellaneous utility functions.
  • libavfilter provides a means to alter decoded audio and video through a chain of filters.
  • libavdevice provides an abstraction to access capture and playback devices.
  • libavresample implements audio mixing and resampling routines.
  • libswscale implements color conversion and scaling routines.

Tools

  • avconv is a command line toolbox to manipulate, convert and stream multimedia content.
  • avplay is a minimalistic multimedia player.
  • avprobe is a simple analysis tool to inspect multimedia content.
  • Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The Libav codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.