mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-08-10 06:10:52 +02:00


Martin Storsjö 3c9546dfaf aarch64: vp9: Add NEON itxfm routines

This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
number of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32 bit version.

The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM  AArch64
vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7

The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                                A57 gcc-5.3   neon
vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8

The asm is around a factor of 3-4 faster than C on the cortex-a57, and it
is around 30-50% faster on the a57 than on the a53.

Signed-off-by: Martin Storsjö <martin@martin.st>

2016-11-14 00:10:13 +02:00

compat

Add a compat dummy stdatomic.h used when threading is disabled

2016-10-02 18:57:56 +02:00

doc

examples/decode_audio: Add missing header for av_free()

2016-11-10 10:33:19 +01:00

libavcodec

aarch64: vp9: Add NEON itxfm routines

2016-11-14 00:10:13 +02:00

libavdevice

Use avpriv_report_missing_feature() where appropriate

2016-11-08 17:54:34 +01:00

libavfilter

vf_drawtext: Drop wrong void* cast

2016-11-12 16:47:07 +01:00

libavformat

Drop pointless void* casts

2016-11-13 18:44:01 +01:00

libavresample

build: Change structure of the linker version script templates

2016-05-29 16:43:11 +02:00

libavutil

arm: Clear the gp register alias at the end of functions

2016-11-10 14:01:04 +02:00

libswscale

swscale: Add GRAY12

2016-11-07 22:42:00 +01:00

presets

presets: spelling error in libvpx 1080p50_60

2011-10-22 00:28:56 +02:00

tests

checkasm: add vp9dsp.itxfm_add tests.

2016-11-11 11:09:05 +02:00

tools

aviocat: Support avio options

2016-10-25 15:43:56 +02:00

.gitattributes

Treat all '*.pnm' files as non-text file

2014-11-28 17:52:43 -05:00

.gitignore

build: Ignore generated mapfile and remove it on distclean

2016-05-27 11:27:24 +02:00

.travis.yml

travis: Enable OSX integration

2015-11-17 16:51:00 +01:00

arch.mak

ppc: vsx: Implement float_dsp

2015-05-31 12:07:11 +02:00

avconv_dxva2.c

avconv_dxva2: add a profile check for hevc

2016-07-20 16:33:09 +02:00

avconv_filter.c

avconv: make sure the filtergraph is freed on init failure

2016-10-02 11:41:45 +02:00

avconv_opt.c

avconv_opt: Consistently iterate through hwaccels array in all cases

2016-11-13 19:06:38 +01:00

avconv_qsv.c

avconv_qsv: use the actual pixel format provided by lavc

2016-07-22 19:08:12 +02:00

avconv_vaapi.c

avconv_vaapi: Convert to use hw_frames_ctx only

2016-08-30 22:16:01 +01:00

avconv_vda.c

avconv: vda: Unlock the pixel buffer once it is accessed

2015-07-09 00:10:13 +02:00

avconv_vdpau.c

avconv_vdpau: use the hwcontext device creation API

2016-05-26 15:40:34 +02:00

avconv.c

avconv: Drop stray leftover debug output

2016-11-09 20:51:55 +01:00

avconv.h

avconv: support parsing bitstream filter options

2016-11-02 10:08:28 +01:00

avplay.c

avplay: Correct function pointer assignments in options array

2016-11-08 17:20:30 +01:00

avprobe.c

avprobe: Add -show_stream_entry to get a single stream property

2016-11-01 11:27:52 -04:00

Changelog

Changelog: mark the release 12 branch

2016-08-31 08:08:32 +02:00

cmdutils_common_opts.h

avplay: Accept cpuflags option

2013-10-22 10:49:31 +02:00

cmdutils.c

avconv: switch to the new BSF API

2016-03-20 08:15:01 +01:00

cmdutils.h

avconv: use read_file() for reading the 2pass stats

2015-07-19 09:37:11 +02:00

common.mak

build: Simplify postprocessing of linker version script files

2016-05-29 16:49:16 +02:00

configure

libxvid: Require availability of mkstemp()

2016-11-11 10:17:07 +01:00

COPYING.GPLv2

…

COPYING.GPLv3

…

COPYING.LGPLv2.1

cosmetics: Delete empty lines at end of file.

2012-02-09 12:26:45 +01:00

COPYING.LGPLv3

…

CREDITS

partially rename FFmpeg to Libav

2011-03-16 21:54:39 +01:00

INSTALL

doc: clarify configure features

2011-04-07 02:54:12 +02:00

library.mak

build: Drop duplicate asm recipe

2016-10-17 16:25:35 +02:00

LICENSE

Remove the legacy X11 screen grabber

2016-07-29 19:03:10 +02:00

Makefile

build: Hardcode avversion.h dependency

2016-10-27 11:54:06 +02:00

README

doc: Add more information in the README

2014-08-16 00:49:22 +02:00

README.md

doc: Add travis badge

2015-09-14 00:19:08 +02:00

RELEASE

Make the RELEASE file match with the most recent tag

2016-10-14 13:52:51 -04:00

version.sh

build: remove hardcoded name of version header

2016-09-15 21:59:15 +02:00

README.md

Libav

Libav is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

libavcodec provides implementations of a wide range of codecs.
libavformat implements streaming protocols, container formats and basic I/O access.
libavutil includes hashers, decompressors and miscellaneous utility functions.
libavfilter provides a means to alter decoded audio and video through a chain of filters.
libavdevice provides an abstraction to access capture and playback devices.
libavresample implements audio mixing and resampling routines.
libswscale implements color conversion and scaling routines.

Tools

avconv is a command line toolbox to manipulate, convert and stream multimedia content.
avplay is a minimalistic multimedia player.
avprobe is a simple analysis tool to inspect multimedia content.
Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The Libav codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.

Languages

C 90.1%

Assembly 7.9%

Makefile 1.3%

C++ 0.2%

Objective-C 0.2%

Other 0.1%