mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-28 20:53:54 +02:00
Martin Storsjö 388f6e6715 arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

That is, the full subpartition case generally sees only a very minor
overhead from the additional loads and compares, while the cases where we
only need to process a small part of the input data get a significant speedup.

In a few inspected clips of common VP9 content, 70-90% of the non-DC-only
16x16 and 32x32 IDCTs have nonzero coefficients only in the upper-left
8x8 or 16x16 subpartitions, respectively.
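
The slice-skipping idea can be illustrated with a minimal C sketch. This is
not the NEON code from this commit: transform_1d is a placeholder linear
transform (a prefix sum) rather than the VP9 IDCT, and count_nonzero_slices,
transform_2d, N and SLICE are hypothetical names chosen for illustration.
The sketch also finds empty slices by scanning the coefficients, which is an
assumption made for simplicity. It relies only on the fact that a linear
transform maps an all-zero input to an all-zero output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16  /* block size (16x16), as in the smaller case above */
#define SLICE 4   /* rows handled per first-pass slice */

/* Placeholder linear 1D transform (prefix sum), NOT the VP9 IDCT. */
static void transform_1d(const int32_t *in, int in_stride,
                         int32_t *out, int out_stride)
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i * in_stride];
        out[i * out_stride] = acc;
    }
}

/* Hypothetical helper: number of leading 4-row slices that must be
 * processed, found by scanning the row-major coefficient block. */
static int count_nonzero_slices(const int32_t *coef)
{
    int last = 0;
    for (int i = 0; i < N * N; i++)
        if (coef[i])
            last = i / (N * SLICE) + 1;
    return last;
}

/* Two-pass separable transform; the first pass only touches the first
 * nonzero_slices slices and zero-fills the rest of the intermediate. */
static void transform_2d(const int32_t *coef, int32_t *dst,
                         int nonzero_slices)
{
    int32_t tmp[N * N];
    int rows;

    assert(nonzero_slices >= 0 && nonzero_slices <= N / SLICE);
    rows = nonzero_slices * SLICE;

    /* First pass over rows: skipped slices would only produce zeros. */
    for (int r = 0; r < rows; r++)
        transform_1d(coef + r * N, 1, tmp + r * N, 1);
    memset(tmp + rows * N, 0, sizeof(int32_t) * (size_t)(N - rows) * N);

    /* Second pass over columns: every column is still processed. */
    for (int c = 0; c < N; c++)
        transform_1d(tmp + c, N, dst + c, N);
}
```

Calling transform_2d(coef, dst, count_nonzero_slices(coef)) on a block whose
nonzero coefficients all lie in the upper-left 8x8 runs only 8 of the 16
first-pass row transforms, and gives the same output as processing all four
slices.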

This is cherry-picked from libav commit 9c8bc74c2b.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-14 21:13:30 +01:00
compat compat/atomics: rename header guards 2016-12-02 20:08:54 -03:00
doc lavfi/buffersink: add accessors for the stream properties. 2017-01-12 14:06:16 +01:00
libavcodec arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 2017-01-14 21:13:30 +01:00
libavdevice lavd/lavfi: use buffersink accessors. 2017-01-12 14:06:16 +01:00
libavfilter libavfilter/af_biquads: warn about clipping only after frame with clipping 2017-01-12 19:52:29 +01:00
libavformat Cosmetics: Reindent after last commit. 2017-01-14 06:07:06 +01:00
libavresample Bump minor versions after 3.2 branchpoint to seperate release 2016-10-26 20:52:42 +02:00
libavutil avutil/tests/audio_fifo.c: pass by reference for efficiency and change datatype to const 2017-01-13 00:17:10 +01:00
libpostproc Bump minor versions after 3.2 branchpoint to seperate release 2016-10-26 20:52:42 +02:00
libswresample swresample/arm: cosmetic fixes 2017-01-13 21:24:25 +01:00
libswscale swscale/swscale: Fix dereference of stride array before null check 2016-12-23 21:47:47 +01:00
presets
tests arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 2017-01-14 21:13:30 +01:00
tools tools/zmqsend: Do not truncate fgetc() return 2016-12-24 14:46:25 +01:00
.gitattributes
.gitignore Merge commit '6641819feedb086ebba3d2be89b8d33980f367e1' 2016-06-26 15:43:05 +02:00
.travis.yml Merge commit 'eda183287489b2c705843aa373a19c4e46fb2fec' 2015-11-22 17:12:24 +00:00
arch.mak mips: rename mipsdspr1 to mipsdsp 2015-12-04 02:35:42 +01:00
Changelog avcodec: add Newtek SpeedHQ decoder 2017-01-11 16:02:10 +01:00
cmdutils_common_opts.h cmdutils: add show_demuxers and show_muxers 2016-11-08 01:56:31 +01:00
cmdutils_opencl.c cmdutils_opencl: fix resource_leak cid 1396852 2017-01-13 07:54:49 +08:00
cmdutils.c cmdutils: remove duplicate windows.h include 2016-11-16 15:06:16 +01:00
cmdutils.h cmdutils: add show_demuxers and show_muxers 2016-11-08 01:56:31 +01:00
common.mak Merge commit 'c5fd4b50610f62cbb3baa4f4108139363128dea1' 2016-06-27 19:39:46 +02:00
configure huffyuvencdsp: move shared functions to a new lossless_videoencdsp context 2017-01-12 22:53:04 -03:00
CONTRIBUTING.md Add CONTRIBUTING.md 2016-09-18 10:02:13 +01:00
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
ffmpeg_cuvid.c doc: fix spelling errors 2016-10-21 23:58:47 +02:00
ffmpeg_dxva2.c Merge commit '18c506e9e6e8df8b1d496d093077b8240ea68c28' 2016-06-26 15:34:01 +02:00
ffmpeg_filter.c ffmpeg: use buffersink accessors. 2017-01-12 14:06:16 +01:00
ffmpeg_opt.c ffmpeg: Add -time_base option to hint the time base 2017-01-14 20:03:56 +01:00
ffmpeg_qsv.c ffmpeg: Add an option "qsv_device" to choose proper node for QSV child device (vaapi or dxva2) 2017-01-11 20:21:09 +00:00
ffmpeg_vaapi.c ffmpeg_vaapi: fix choice of decoder_format 2016-09-29 01:23:52 +02:00
ffmpeg_vdpau.c Merge commit 'f72db3f2f3a8c83a4f5dede8fa03434b2bf676c6' 2016-06-26 15:29:39 +02:00
ffmpeg_videotoolbox.c ffmpeg/videotoolbox: protect UTGetOSTypeFromString on both VDA and VT 2015-10-15 10:22:31 +02:00
ffmpeg.c ffmpeg: Add -time_base option to hint the time base 2017-01-14 20:03:56 +01:00
ffmpeg.h ffmpeg: Add -time_base option to hint the time base 2017-01-14 20:03:56 +01:00
ffplay.c ffplay: use buffersink accessors. 2017-01-12 14:06:16 +01:00
ffprobe.c lavc: Add spherical packet side data API 2016-12-07 14:40:06 -05:00
ffserver_config.c ffserver_config: Check for failure to allocate FFServerIPAddressACL 2016-12-22 19:23:08 +01:00
ffserver_config.h ffsrever: Make the status page bitexact if any stream is bitexact 2016-11-29 19:26:26 +01:00
ffserver.c ffserver: local OOB write with custom program name 2017-01-08 03:50:56 +01:00
INSTALL.md
library.mak Merge commit 'c5fd4b50610f62cbb3baa4f4108139363128dea1' 2016-06-27 19:39:46 +02:00
LICENSE.md lavfi/f_ebur128: relicense to LGPL 2016-11-27 20:46:20 +01:00
MAINTAINERS MAINTAINERS: update 2016-12-27 18:24:31 +01:00
Makefile Merge commit '6641819feedb086ebba3d2be89b8d33980f367e1' 2016-06-26 15:43:05 +02:00
README.md Add CONTRIBUTING.md 2016-09-18 10:02:13 +01:00
RELEASE RELEASE: Update for past 3.2 branch 2016-10-26 20:52:43 +02:00
version.sh version.sh: Fix spurious rebuilds. 2016-03-10 09:53:10 +01:00

FFmpeg README

FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

  • libavcodec provides implementations of a wide range of codecs.
  • libavformat implements streaming protocols, container formats and basic I/O access.
  • libavutil includes hashers, decompressors and miscellaneous utility functions.
  • libavfilter provides a means to alter decoded audio and video through a chain of filters.
  • libavdevice provides an abstraction to access capture and playback devices.
  • libswresample implements audio mixing and resampling routines.
  • libswscale implements color conversion and scaling routines.

Tools

  • ffmpeg is a command line toolbox to manipulate, convert and stream multimedia content.
  • ffplay is a minimalistic multimedia player.
  • ffprobe is a simple analysis tool to inspect multimedia content.
  • ffserver is a multimedia streaming server for live broadcasts.
  • Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.

Contributing

Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.