1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00
Commit Graph

43902 Commits

Author SHA1 Message Date
Martin Storsjö
ffbd1d2b00 arm: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

The filter coefficients are signed values, where the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16 bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the resulting sum can
overflow the 16 bit signed range. Instead of accumulating in 32 bit,
we accumulate the largest product (either index 3 or 4) last with a
saturated addition.

(The VP8 MC asm does something similar, but slightly simpler, by
accumulating each half of the filter separately. In the VP9 MC
filters, each half of the filter can also overflow though, so the
largest component has to be handled individually.)

Examples of relative speedup compared to the C version, from checkasm:
                       Cortex      A7     A8     A9    A53
vp9_avg4_neon:                   1.71   1.15   1.42   1.49
vp9_avg8_neon:                   2.51   3.63   3.14   2.58
vp9_avg16_neon:                  2.95   6.76   3.01   2.84
vp9_avg32_neon:                  3.29   6.64   2.85   3.00
vp9_avg64_neon:                  3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:     3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:    3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:     5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:     6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:    6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:     9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:    7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:   6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13   9.46  13.37
vp9_put4_neon:                   1.11   1.47   1.00   1.21
vp9_put8_neon:                   1.23   2.17   1.94   1.48
vp9_put16_neon:                  1.63   4.02   1.73   1.97
vp9_put32_neon:                  1.56   4.92   2.00   1.96
vp9_put64_neon:                  2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:     3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:    3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:     5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:     5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:    6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:     9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:    6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:   6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30   9.43  14.34

For the larger 8tap filters, the speedup vs C code is around 5-14x.

This is significantly faster than libvpx's implementation of the same
functions, at least when comparing the put_8tap_smooth_64 functions
(compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from
libvpx).

Absolute runtimes from checkasm:
                          Cortex      A7        A8        A9       A53
vp9_put_8tap_smooth_64h_neon:    20150.3   14489.4   19733.6   10863.7
libvpx vpx_convolve8_horiz_neon: 52623.3   19736.4   21907.7   25027.7

vp9_put_8tap_smooth_64v_neon:    14455.0   12303.9   13746.4    9628.9
libvpx vpx_convolve8_vert_neon:  42090.0   17706.2   17659.9   16941.2

Thus, on the A9, the horizontal filter is only marginally faster than
libvpx, while our version is significantly faster on the other cores,
and the vertical filter is significantly faster on all cores. The
difference is especially large on the A7.

The libvpx implementation does the accumulation in 32 bit, which
probably explains most of the differences.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:35:38 +02:00
Martin Storsjö
2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Mark Thompson
e3fb74f7f9 lavfi: Always propagate hw_frames_ctx through links
Also adds a new flag to mark filters which are aware of hwframes and
will perform this task themselves, and marks all appropriate filters
with this flag.

This is required to allow software-mapped hardware frames to work,
because we need to have the frames context available for any later
mapping operation in the filter graph.

The output from the filter graph should only propagate further to an
encoder if the hardware format actually matches the visible format
(mapped frames are valid here and have an hw_frames_ctx, but this
should not be given to the encoder as its hardware context).
2016-11-02 20:29:05 +00:00
Mark Thompson
7e2561fa83 lavfi: Use ff_get_video_buffer in all filters using hwframes 2016-11-02 20:07:15 +00:00
Mark Thompson
7433feb82f lavfi: Make default get_video_buffer work with hardware frames 2016-11-02 20:07:15 +00:00
Diego Biurrun
2025d37871 doc: Turn off noisy deprecation warnings in the option printer 2016-11-02 10:33:39 +01:00
Diego Biurrun
f4ca8ea92a rtmpproto: Restructure zlib code to avoid unreachable code warning
libavformat\rtmpproto.c(1165) : warning C4702: unreachable code
2016-11-02 10:33:39 +01:00
Diego Biurrun
baab87c4f3 bink: Have function pointer prototype match implementation
libavcodec/binkdsp.c(156) : warning C4028: formal parameter 1 different from declaration
2016-11-02 10:33:39 +01:00
Diego Biurrun
4cf2ffb7c4 idct: Have function pointer prototype match implementation
libavcodec/idctdsp.c(175) : warning C4028: formal parameter 2 different from declaration
2016-11-02 10:33:39 +01:00
Diego Biurrun
39cea6570c aactab: Move extern keyword to the front of array declarations
libavcodec/aactab.h:49:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]
2016-11-02 10:33:36 +01:00
Diego Biurrun
85baef4ff1 vf_drawtext: Move static keyword to beginning of variable declaration
libavfilter/vf_drawtext.c:226:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
2016-11-02 10:29:00 +01:00
Anton Khirnov
636515c324 examples/decode_video: remove a stray unrelated comment 2016-11-02 10:20:41 +01:00
Anton Khirnov
8191f960a6 examples/decode_video: constify the AVCodec instance 2016-11-02 10:20:25 +01:00
Anton Khirnov
5b4d7ac7ae examples/encode_video: use the AVFrame API for allocating the frame
It is more efficient and so preferred over allocating the buffers
manually.
2016-11-02 10:20:01 +01:00
Anton Khirnov
d0a603a534 examples/encode_video: set the framerate 2016-11-02 10:19:37 +01:00
Anton Khirnov
e02524025b examples/encode_video: constify the AVCodec instance 2016-11-02 10:18:34 +01:00
Anton Khirnov
7b1f03477f examples/avcodec: split the remaining two examples into separate files 2016-11-02 10:16:04 +01:00
Anton Khirnov
90265814f9 examples/decode_audio: constify the AVCodec instance 2016-11-02 10:13:37 +01:00
Anton Khirnov
f5df897c4b examples/avcodec: split audio decoding into a separate example
The four examples (audio/video encoding/decoding) are completely
independent so it makes little sense to have them all in one file.
2016-11-02 10:13:27 +01:00
Anton Khirnov
f76698e759 examples/encode_audio: use the AVFrame API for allocating the data
It is simpler and more efficient.
2016-11-02 10:12:39 +01:00
Anton Khirnov
c00a11ab38 examples/encode_audio: constify AVCodec instances 2016-11-02 10:11:48 +01:00
Anton Khirnov
40aaa8dadf examples/avcodec: split audio encoding into a separate example
The four examples (audio/video encoding/decoding) are completely
independent so it makes little sense to have them all in one file.
2016-11-02 10:11:46 +01:00
James Almer
064f19f39e avconv: support parsing bitstream filter options
Example usage:

avconv -i INPUT -bsf filter[=opt1=val1:opt2=val2] OUTPUT

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-02 10:08:28 +01:00
Vittorio Giovara
ecd2ec69ce mov: Evaluate the movie display matrix
This matrix needs to be applied after all others have (currently only
display matrix from trak), but cannot be handled in movie box, since
streams are not allocated yet. So store it in main context, and apply
it when appropriate, that is after parsing the tkhd one.

Fate tests are updated accordingly.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-01 12:19:00 -04:00
Vittorio Giovara
b90c8a3d08 fate: Add tests for mov display matrix
Rotation, sample/display aspect ratio and pure matrix export.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-01 11:55:54 -04:00
Vittorio Giovara
7d308bf84b avprobe: Add -show_stream_entry to get a single stream property
This is needed for improved fate testing and it is modeled after
-show_format_entry. The main behavioral difference is that when a print
function is called with an empty key, rather than discarding it, the
closes key in the hierarchy is used instead.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-01 11:27:52 -04:00
Mark Thompson
218ed7250c openssl: Allow newer TLS versions than TLSv1
The use of TLSv1_*_method() disallows newer protocol versions; instead
use SSLv23_*_method() and then explicitly disable the deprecated
protocol versions which should not be supported.
2016-10-31 19:34:42 +00:00
Luca Barbato
dad7514f9e xcb: Add all the libraries to the link line explicitly
Avoid an underlink issue on recent distributions.

CC: libav-stable@libav.org
2016-10-30 21:55:03 +01:00
Luca Barbato
c541a44e02 Revert "rtmpproto: Don't include a client version in the unencrypted C1 handshake"
This reverts commit 7d8d726be7.
2016-10-30 21:55:03 +01:00
Luca Barbato
801ac7156d qsv: Be informative when reporting that no data has been consumed 2016-10-30 21:55:03 +01:00
Diego Biurrun
30015305f3 Use avpriv_request_sample() where appropriate 2016-10-29 18:32:21 +02:00
Diego Biurrun
07cac07c0c dash: Use correct ISO C scanf conversion specifier 2016-10-28 13:29:52 +02:00
Diego Biurrun
3ec6f855d0 srt: Adjust signedness of sscanf format strings
Fixes several warnings from -Wformat.
2016-10-28 13:28:36 +02:00
Diego Biurrun
7a2b2b6a92 dxtory: Drop nonsense ISO C printf conversion specifiers for standard types 2016-10-28 13:24:55 +02:00
Diego Biurrun
c454dfcff9 Use ISO C printf conversion specifiers where appropriate 2016-10-28 13:24:44 +02:00
Diego Biurrun
fbe425c8d2 hap: Adjust printf length modifiers to match variable types
libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t {aka unsigned int}’ [-Wformat=]
libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]
2016-10-28 11:22:22 +02:00
Diego Biurrun
1263b2039e Adjust printf conversion specifiers to match variable signedness 2016-10-28 11:22:21 +02:00
Diego Biurrun
ca1e5eea0c Remove some pointless TRACE level debug code
This also kills some warnings with certain compiler options.
2016-10-27 12:54:14 +02:00
Diego Biurrun
07eea5a5de nut: Drop pointless TRACE level debug code
The code has little usefulness and uses the __PRETTY_FUNCTION__ GNU extension.
2016-10-27 12:54:07 +02:00
Diego Biurrun
c3dad1bf3b nsv: Drop unnecessary TRACE level debug code
The output is rather silly and the code uses non-standard __FUNCTION__.
2016-10-27 12:21:46 +02:00
Diego Biurrun
47756f51fe dnxhdenc: Drop pointless, commented-out debug output 2016-10-27 12:21:46 +02:00
Diego Biurrun
0456e68439 audio_fifo: Drop write-only variable 2016-10-27 12:21:46 +02:00
Diego Biurrun
0574780d7a h264_loopfilter: Do not print value of uninitialized variable
libavcodec/h264_loopfilter.c:531:111: warning: variable 'edge' is uninitialized when used here [-Wuninitialized]
2016-10-27 12:21:46 +02:00
Diego Biurrun
2555269985 mpegaudio: Do not print value of uninitialized variable
libavcodec/mpegaudiodec_template.c:885:97: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
2016-10-27 12:21:46 +02:00
Diego Biurrun
14cab426b0 build: Hardcode avversion.h dependency
Since avversion.h is a generated header it must be created before
dependencies can be determined as a side effect of compilation.
Otherwise Make stops and restarts the build process to generate
avversion.h and produces related error messages.
2016-10-27 11:54:06 +02:00
Martin Storsjö
f22363c729 openssl: Avoid double semicolons after the GET_BIO_DATA macro
When the macro is expanded with a semicolon following it and the
macro itself contains a semicolon, we ended up in double semicolons,
which is treated as a statement that disallows further declarations.

This avoids errors about mixed declarations and statements on gcc,
after ee05079766.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-25 21:48:35 +03:00
Luca Barbato
052b97855d aviocat: Support avio options
Useful to test protocols that require options to be used.
2016-10-25 15:43:56 +02:00
Yogender Gupta
99aeae20de scale_npp: fix passthrough mode
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-25 14:48:13 +02:00
Mark Thompson
0aec37e625 vaapi_decode: Remove vestigial unmap code
The buffer map/unmap code was in an early version of this before it
was committed, but the unmap was never removed.  While wrong, this
was harmless (and therefore unnoticed) because the buffers can't be
mapped at this point - all drivers just did nothing with the call.
2016-10-24 20:17:47 +01:00
Mark Thompson
5e879b54a3 vaapi_decode: Clear parameter buffers to fix picture reuse
When decoding interlaced pictures, the structure is reused to render
to the same surface twice.  The parameter buffers were not being
cleared, which caused the i965 driver to error out.
2016-10-24 20:17:47 +01:00