mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-08-10 06:10:52 +02:00

Martin Storsjö 1f7801c2bc aarch64: vp9: Add NEON optimizations of VP9 MC functions

This work is sponsored by, and copyright, Google.

These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not much
to be gained from having more spare registers here - we only
avoid having to clobber callee-saved registers.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                     ARM   AArch64
vp9_avg4_neon:                      27.2      23.7
vp9_avg8_neon:                      56.5      54.7
vp9_avg16_neon:                    169.9     167.4
vp9_avg32_neon:                    585.8     585.2
vp9_avg64_neon:                   2460.3    2294.7
vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
vp9_put4_neon:                      18.0      17.2
vp9_put8_neon:                      40.2      37.7
vp9_put16_neon:                     97.4      99.5
vp9_put32_neon/armv8:              346.0     307.4
vp9_put64_neon/armv8:             1319.0    1107.5
vp9_put_8tap_smooth_4h_neon:       126.7     118.2
vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
vp9_put_8tap_smooth_4v_neon:       113.0      86.5
vp9_put_8tap_smooth_8h_neon:       229.7     221.6
vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
vp9_put_8tap_smooth_8v_neon:       215.0     187.5
vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4

These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.
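
As a rough illustration of how the two columns compare, the ratio of the ARM runtime to the AArch64 runtime can be computed per function. The sketch below copies four rows from the table above; the `timing` struct and the helper names `speedup` and `count_faster` are hypothetical and not part of FFmpeg.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the benchmark table: runtimes for the 32-bit ARM and
 * the AArch64 version of the same function (hypothetical struct). */
struct timing {
    const char *name;
    double arm;
    double aarch64;
};

/* Four rows copied verbatim from the table in the commit message. */
static const struct timing samples[] = {
    { "vp9_put16_neon",                  97.4,    99.5 },
    { "vp9_put64_neon/armv8",          1319.0,  1107.5 },
    { "vp9_avg_8tap_smooth_4v_neon",    126.0,    93.7 },
    { "vp9_put_8tap_smooth_64h_neon", 10636.7, 10627.8 },
};

/* Ratio of ARM runtime to AArch64 runtime; a value above 1.0 means
 * the AArch64 version needed less time. */
static double speedup(const struct timing *t)
{
    assert(t->aarch64 > 0.0);
    return t->arm / t->aarch64;
}

/* Number of rows where the AArch64 version is strictly faster. */
static size_t count_faster(const struct timing *t, size_t n)
{
    size_t faster = 0;
    for (size_t i = 0; i < n; i++)
        if (speedup(&t[i]) > 1.0)
            faster++;
    return faster;
}
```

For these rows, `vp9_put64_neon/armv8` comes out at a ratio of about 1.19 and `vp9_avg_8tap_smooth_4v_neon` at about 1.34, while `vp9_put16_neon` is slightly below 1.0, which matches the "in most cases marginally faster" summary.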

The speedup vs C code is pretty much the same as for the 32 bit
case; on the A53 it's around 6-13x for the larger 8tap filters.
The exact speedup varies a little, since the C versions generally
don't end up exactly as slow/fast as on 32 bit.
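
To make the function names in the benchmark table concrete, here is a minimal scalar sketch in C of the three kinds of operation they refer to: a plain block copy (`put`), a rounded average with the destination (`avg`), and a horizontal 8-tap filter. It is not the code from this commit or FFmpeg's C reference. The names `put_c`, `avg_c` and `put_8tap_h_c` are hypothetical, the sketch assumes 8-bit pixels, and it assumes filter taps that sum to 128 with the result rounded by adding 64 and shifting right by 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* put: copy a w x h block of 8-bit pixels from src to dst. */
static void put_c(uint8_t *dst, ptrdiff_t dst_stride,
                  const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = src[x];
        dst += dst_stride;
        src += src_stride;
    }
}

/* avg: replace each dst pixel with the rounded average of dst and src. */
static void avg_c(uint8_t *dst, ptrdiff_t dst_stride,
                  const uint8_t *src, ptrdiff_t src_stride, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
        dst += dst_stride;
        src += src_stride;
    }
}

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Horizontal 8-tap filter. For each output pixel x it reads
 * src[x - 3] .. src[x + 4], so the caller must provide 3 valid pixels
 * to the left and 4 to the right of each row. Assumes the taps sum to
 * 128; negative sums rely on an arithmetic right shift, which common
 * compilers provide. */
static void put_8tap_h_c(uint8_t *dst, ptrdiff_t dst_stride,
                         const uint8_t *src, ptrdiff_t src_stride,
                         int w, int h, const int16_t filter[8])
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += filter[k] * src[x + k - 3];
            dst[x] = clip_u8((sum + 64) >> 7);
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

In this sketch the `v` variant would step through `src` by `src_stride` instead of by one pixel, and an `hv` variant would run the horizontal pass into a temporary buffer before filtering vertically. With the identity filter `{0, 0, 0, 128, 0, 0, 0, 0}`, `put_8tap_h_c` reduces to a plain copy.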

This is an adapted cherry-pick from libav commit
383d96aa22.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>

2016-11-15 15:10:03 -05:00

compat

compat/w32dlfcn.h: Add safe win32 dlopen/dlclose/dlsym functions.

2016-11-05 18:08:32 +11:00

doc

doc/filters: add metadata information for blackframe

2016-11-14 11:59:52 -09:00

libavcodec

aarch64: vp9: Add NEON optimizations of VP9 MC functions

2016-11-15 15:10:03 -05:00

libavdevice

lavd/xcbgrab: do not try to create refcounted packets.

2016-11-03 21:23:55 +01:00

libavfilter

lavfi/ebur128: use ff_ prefix

2016-11-13 19:11:07 -06:00

libavformat

lavf/Makefile: Fix rule for the data muxer.

2016-11-14 13:33:22 +01:00

libavresample

Bump minor versions after 3.2 branchpoint to seperate release

2016-10-26 20:52:42 +02:00

libavutil

aarch64: Add an offset parameter to the movrel macro

2016-11-15 15:10:03 -05:00

libpostproc

Bump minor versions after 3.2 branchpoint to seperate release

2016-10-26 20:52:42 +02:00

libswresample

Bump minor versions after 3.2 branchpoint to seperate release

2016-10-26 20:52:42 +02:00

libswscale

lsws: Add GRAY10 conversion.

2016-11-14 10:35:06 +01:00

presets

…

tests

Merge commit 'f8d17d53957056c053a46f9320fa7ae6fe1479a5'

2016-11-14 15:29:08 +01:00

tools

tools: add loudnorm script example to use loudnorm

2016-11-11 19:22:52 +01:00

.gitattributes

…

.gitignore

Merge commit '6641819feedb086ebba3d2be89b8d33980f367e1'

2016-06-26 15:43:05 +02:00

.travis.yml

Merge commit 'eda183287489b2c705843aa373a19c4e46fb2fec'

2015-11-22 17:12:24 +00:00

arch.mak

mips: rename mipsdspr1 to mipsdsp

2015-12-04 02:35:42 +01:00

Changelog

avformat: Add Pro-MPEG CoP #3-R2 FEC protocol

2016-11-13 11:38:15 +01:00

cmdutils_common_opts.h

cmdutils: add show_demuxers and show_muxers

2016-11-08 01:56:31 +01:00

cmdutils_opencl.c

all: use FFDIFFSIGN to resolve possible undefined behavior in comparators

2015-11-03 16:28:30 -05:00

cmdutils.c

cmdutils: add show_demuxers and show_muxers

2016-11-08 01:56:31 +01:00

cmdutils.h

cmdutils: add show_demuxers and show_muxers

2016-11-08 01:56:31 +01:00

common.mak

Merge commit 'c5fd4b50610f62cbb3baa4f4108139363128dea1'

2016-06-27 19:39:46 +02:00

configure

Merge commit '8c929037ec75fbe9f367e0a31ee34839e92de481'

2016-11-14 10:09:44 +01:00

CONTRIBUTING.md

Add CONTRIBUTING.md

2016-09-18 10:02:13 +01:00

COPYING.GPLv2

…

COPYING.GPLv3

…

COPYING.LGPLv2.1

…

COPYING.LGPLv3

…

CREDITS

…

ffmpeg_cuvid.c

doc: fix spelling errors

2016-10-21 23:58:47 +02:00

ffmpeg_dxva2.c

Merge commit '18c506e9e6e8df8b1d496d093077b8240ea68c28'

2016-06-26 15:34:01 +02:00

ffmpeg_filter.c

Merge commit '50722b4f0cbc5940e9e6e21d113888436cc89ff5'

2016-11-13 15:33:39 +01:00

ffmpeg_opt.c

Merge commit '50722b4f0cbc5940e9e6e21d113888436cc89ff5'

2016-11-13 15:33:39 +01:00

ffmpeg_qsv.c

ffmpeg_qsv: Fix hwaccel transcoding

2016-11-13 17:49:48 +00:00

ffmpeg_vaapi.c

ffmpeg_vaapi: fix choice of decoder_format

2016-09-29 01:23:52 +02:00

ffmpeg_vdpau.c

Merge commit 'f72db3f2f3a8c83a4f5dede8fa03434b2bf676c6'

2016-06-26 15:29:39 +02:00

ffmpeg_videotoolbox.c

ffmpeg/videotoolbox: protect UTGetOSTypeFromString on both VDA and VT

2015-10-15 10:22:31 +02:00

ffmpeg.c

Merge commit 'b55566db4c51d920a6496455bb30a608e5a50a41'

2016-11-14 14:56:52 +01:00

ffmpeg.h

Merge commit '50722b4f0cbc5940e9e6e21d113888436cc89ff5'

2016-11-13 15:33:39 +01:00

ffplay.c

Merge commit 'beb62dac629603eb074a44c44389c230b5caac7c'

2016-10-07 13:16:36 +02:00

ffprobe.c

lavf: add AV_DISPOSITION_TIMED_THUMBNAILS

2016-10-24 05:47:05 -05:00

ffserver_config.c

ffserver: Throw ffm.h out its not used except for a constant that is part of the format

2016-11-07 19:27:40 +01:00

ffserver_config.h

ffserver: Throw ffm.h out its not used except for a constant that is part of the format

2016-11-07 19:27:40 +01:00

ffserver.c

ffserver: use AVStream.codecpar in open_input_stream()

2016-11-08 12:12:19 +01:00

INSTALL.md

…

library.mak

Merge commit 'c5fd4b50610f62cbb3baa4f4108139363128dea1'

2016-06-27 19:39:46 +02:00

LICENSE.md

lavc: remove libfaac wrapper

2016-10-01 19:58:04 +01:00

MAINTAINERS

MAINTAINERS: Add myself to flvenc

2016-11-09 17:49:19 +01:00

Makefile

Merge commit '6641819feedb086ebba3d2be89b8d33980f367e1'

2016-06-26 15:43:05 +02:00

README.md

Add CONTRIBUTING.md

2016-09-18 10:02:13 +01:00

RELEASE

RELEASE: Update for past 3.2 branch

2016-10-26 20:52:43 +02:00

version.sh

version.sh: Fix spurious rebuilds.

2016-03-10 09:53:10 +01:00

README.md

FFmpeg README

FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

libavcodec provides implementations of a wide range of codecs.
libavformat implements streaming protocols, container formats and basic I/O access.
libavutil includes hashers, decompressors and miscellaneous utility functions.
libavfilter provides a means to alter decoded audio and video through a chain of filters.
libavdevice provides an abstraction to access capture and playback devices.
libswresample implements audio mixing and resampling routines.
libswscale implements color conversion and scaling routines.

Tools

ffmpeg is a command line toolbox to manipulate, convert and stream multimedia content.
ffplay is a minimalistic multimedia player.
ffprobe is a simple analysis tool to inspect multimedia content.
ffserver is a multimedia streaming server for live broadcasts.
Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.

Contributing

Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.

Languages

C 90.1%

Assembly 7.9%

Makefile 1.3%

C++ 0.2%

Objective-C 0.2%

Other 0.1%