mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-08 13:22:53 +02:00
21213 Commits

Author SHA1 Message Date
Mark Thompson
fd0fae6037 pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts
When decoding with threads enabled, the get_format callback will be
called with one of the per-thread codec contexts rather than with the
outer context.  If a hwaccel is in use too, this will add a reference
to the hardware frames context on that codec context, which will then
propagate to all of the other per-thread contexts for decoding.  Once
the decoder finishes, however, the per-thread contexts are not freed
normally, so these references leak.
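
The leak described above can be illustrated with a toy reference count. This is only a sketch: `RefBuf`, `ThreadCtx`, `propagate_frames_ctx()` and `teardown_threads()` are hypothetical stand-ins invented for illustration, not the library's buffer API or the code changed by this commit.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for a hardware frames context. */
typedef struct RefBuf { int refcount; } RefBuf;

/* Hypothetical per-thread codec context holding one reference. */
typedef struct ThreadCtx { RefBuf *hw_frames_ctx; } ThreadCtx;

static RefBuf *refbuf_ref(RefBuf *b)
{
    b->refcount++;
    return b;
}

static void refbuf_unref(RefBuf **b)
{
    if (*b) {
        (*b)->refcount--;
        *b = NULL;
    }
}

/* Every per-thread context takes its own reference to the frames context. */
static void propagate_frames_ctx(ThreadCtx *threads, int n, RefBuf *frames)
{
    for (int i = 0; i < n; i++) {
        refbuf_unref(&threads[i].hw_frames_ctx);
        threads[i].hw_frames_ctx = refbuf_ref(frames);
    }
}

/* Teardown has to drop each of those references explicitly; skipping
 * this loop is the kind of leak the commit message describes. */
static void teardown_threads(ThreadCtx *threads, int n)
{
    for (int i = 0; i < n; i++)
        refbuf_unref(&threads[i].hw_frames_ctx);
}
```

In this sketch the count only returns to its starting value if the teardown loop visits every per-thread context.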
2016-11-10 20:36:11 +00:00
Martin Storsjö
11623217e3 arm: vp9mc: Use a different helper register for PIC loads
This fixes crashes since 557c1675cf in linux PIC builds.

Previously, movrelx silently used r12 as helper register, which
doesn't work when r12 is the destination register.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 14:01:04 +02:00
Martin Storsjö
6a62795d40 aarch64: h264idct: Use the offset parameter to movrel
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:18:22 +02:00
Martin Storsjö
557c1675cf arm: vp9mc: Minor adjustments from review of the aarch64 version
This work is sponsored by, and copyright, Google.

The speedup for the large horizontal filters is surprisingly
big on A7 and A53, while there's a minor slowdown (almost within
measurement noise) on A8 and A9.

                            Cortex    A7        A8        A9       A53
orig:
vp9_put_8tap_smooth_64h_neon:    20270.0   14447.3   19723.9   10910.9
new:
vp9_put_8tap_smooth_64h_neon:    20165.8   14466.5   19730.2   10668.8

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:18:22 +02:00
Martin Storsjö
383d96aa22 aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not much
to be gained from having more spare registers here - we only
avoid having to clobber callee-saved registers.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                     ARM   AArch64
vp9_avg4_neon:                      27.2      23.7
vp9_avg8_neon:                      56.5      54.7
vp9_avg16_neon:                    169.9     167.4
vp9_avg32_neon:                    585.8     585.2
vp9_avg64_neon:                   2460.3    2294.7
vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
vp9_put4_neon:                      18.0      17.2
vp9_put8_neon:                      40.2      37.7
vp9_put16_neon:                     97.4      99.5
vp9_put32_neon/armv8:              346.0     307.4
vp9_put64_neon/armv8:             1319.0    1107.5
vp9_put_8tap_smooth_4h_neon:       126.7     118.2
vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
vp9_put_8tap_smooth_4v_neon:       113.0      86.5
vp9_put_8tap_smooth_8h_neon:       229.7     221.6
vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
vp9_put_8tap_smooth_8v_neon:       215.0     187.5
vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4

These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.

The speedup vs C code is pretty much the same as for the 32 bit
case; on the A53 it's around 6-13x for the larger 8tap filters.
The exact speedup varies a little, since the C versions generally
don't end up exactly as slow/fast as on 32 bit.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:15:56 +02:00
Martin Storsjö
a4cfcddcb0 vp9: Make the subpel filters non-static
Make them aligned, to allow efficient access to them from simd.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:05:57 +02:00
Anton Khirnov
84f225684c pthread_frame: properly propagate the hw frame context across frame threads 2016-11-10 09:00:11 +01:00
Diego Biurrun
72a19f4013 mpegaudiodsp: aarch64: Adjust function prototype after 2caa93b813 2016-11-10 00:13:48 +01:00
Diego Biurrun
67deba8a41 Use avpriv_report_missing_feature() where appropriate 2016-11-08 17:54:34 +01:00
Vittorio Giovara
47a795727f hevc: Support extradata changes from multiple stsd
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-08 11:22:29 -05:00
Vittorio Giovara
2fe30b4743 hevc: Allow parsing external extradata buffers 2016-11-08 11:22:29 -05:00
Vittorio Giovara
5be2153111 hevc: Move hevc_decode_extradata before frame decoding
Avoids a forward-declaration in the following commit.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-08 11:22:29 -05:00
Vittorio Giovara
bed2c4b265 lavc: Add hevc main10 profile to avconv cli 2016-11-08 11:22:29 -05:00
Vittorio Giovara
17dac56b8f lavu: Rename ycgco color space appropriately
Planes are ordered as the name suggests now.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-08 11:22:29 -05:00
Diego Biurrun
0361e4dcb4 h264_qpel: x86: Move function with only one instance out of template macro
libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
2016-11-08 17:21:02 +01:00
Andreas Cadhalpun
43de8b328b lzf: update pointer p after realloc
This fixes heap-use-after-free detected by AddressSanitizer.
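
The general pattern behind this class of bug can be sketched as follows. The example uses the standard `realloc()` and a hypothetical `GrowBuf` type and `append_bytes()` helper; it is not the lzf code itself, only an illustration of why a write pointer must be re-derived after the buffer may have moved.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer. */
typedef struct GrowBuf {
    uint8_t *data;
    size_t   size;
} GrowBuf;

/* Append len bytes at the write pointer *p, growing the buffer if needed.
 * Returns 0 on success, -1 on allocation failure. */
static int append_bytes(GrowBuf *buf, uint8_t **p, const uint8_t *src, size_t len)
{
    size_t used = (size_t)(*p - buf->data);   /* offset survives a move */

    if (used + len > buf->size) {
        size_t   new_size = (used + len) * 2;
        uint8_t *tmp      = realloc(buf->data, new_size);
        if (!tmp)
            return -1;
        buf->data = tmp;
        buf->size = new_size;
        /* realloc() may have moved the block: the old *p is dangling,
         * so rebuild it from the saved offset before writing. */
        *p = buf->data + used;
    }

    memcpy(*p, src, len);
    *p += len;
    return 0;
}
```

Saving the offset before the reallocation and rebuilding the pointer afterwards keeps every later write inside the new allocation.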

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-11-07 22:42:00 +01:00
Anton Khirnov
4ab61cd983 qsv{enc,dec}: extend the internal frame allocator
Handle the internal frame requests, which is required by the HEVC
encoding plugin.

Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
2016-11-07 12:48:00 +01:00
Anton Khirnov
00aeedd841 qsv{dec,enc}: use a struct as a memory id with internal memory allocator
This will allow implementing the allocator more fully, which is needed
by the HEVC encoder plugin with video memory input.

Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
2016-11-07 12:47:54 +01:00
Anton Khirnov
404e51478e qsv{dec,enc}: always use an internal mfxFrameSurface1
For encoding, this avoids modifying the input surface, which we are not
allowed to do.
This will also be useful in the following commits.

Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
2016-11-07 12:47:46 +01:00
Hendrik Leppkes
fabfbfe571 dxva2: fix surface selection when compiled with both d3d11va and dxva2
Fixes a regression introduced in
be630b1e08

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-07 10:05:12 +01:00
Derek Buitenhuis
db0b3dccb3 libx265: Add option to force IDR frames
This is in the same vein as 380146924e.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-07 10:16:10 +02:00
Diego Biurrun
3cba09e522 x86: Drop stray semicolons after function definitions
libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
2016-11-05 12:41:45 +01:00
Martin Storsjö
392caa65df arm: vp9mc: Insert a literal pool at the middle of the file
This fixes errors like this when building non-pic binaries with armv6
as baseline:

Error: invalid literal constant: pool needs to be closer

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-04 21:37:53 +02:00
Diego Biurrun
67351924fa Drop unreachable break and return statements 2016-11-03 20:17:12 +01:00
Diego Biurrun
6354957a95 dnxhdenc: Have function pointer prototype match implementation
libavcodec/dnxhdenc.c(326) : warning C4028: formal parameter 1 different from declaration
libavcodec/dnxhdenc.c(329) : warning C4028: formal parameter 1 different from declaration
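
As an illustration of what "prototype matches implementation" means, the sketch below declares a function pointer member and an implementation whose parameter types are spelled identically. `PixDSP`, `get_pixels_c()` and `pixdsp_init()` are hypothetical names for this example; the actual mismatch fixed in dnxhdenc.c is not shown in the commit message and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSP context with one function pointer member. */
typedef struct PixDSP {
    void (*get_pixels)(int16_t *block, const uint8_t *pixels, ptrdiff_t stride);
} PixDSP;

/* The implementation repeats the parameter types exactly as written in
 * the function pointer above, so no compiler has a reason to complain
 * about a parameter differing from its declaration. */
static void get_pixels_c(int16_t *block, const uint8_t *pixels, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[x];
        pixels += stride;
    }
}

static void pixdsp_init(PixDSP *c)
{
    c->get_pixels = get_pixels_c;
}
```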
2016-11-03 17:43:55 +01:00
Diego Biurrun
c778eb15b8 pixblockdsp: Have function pointer prototype match implementation
libavcodec/pixblockdsp.c(58) : warning C4028: formal parameter 1 different from declaration
libavcodec/pixblockdsp.c(63) : warning C4028: formal parameter 1 different from declaration
libavcodec/pixblockdsp.c(66) : warning C4028: formal parameter 1 different from declaration
2016-11-03 17:43:55 +01:00
Diego Biurrun
99ddeddc7f ituh263dec: Have function signature match across declaration and definition
libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 1 different from declaration
libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 2 different from declaration
2016-11-03 17:43:55 +01:00
Diego Biurrun
13fcdfb976 svq3: Drop unused function dctcoef_get()
libavcodec/svq3.c:627:29: warning: unused function 'dctcoef_get' [-Wunused-function]
2016-11-03 15:52:12 +01:00
Diego Biurrun
ee59f05408 intrax8: Have function signature match across declaration and definition
libavcodec/intrax8.c(776) : warning C4028: formal parameter 1 different from declaration
2016-11-03 15:50:48 +01:00
Martin Storsjö
1a469a5e42 options_table: Remove a now unnecessary include of config.h
The include of config.h was added in 2012 in 1d9c2dc8, due to
the use of CONFIG_SNOW_ENCODER ifdefs within options_table.h.
When the snow codec was dropped later (in a0c5917f8 in 2013),
this include no longer served any purpose.

options_table.h is included in builds for the host as well, when
building documentation. config.h should not be included in code
that is built for the host, since it can contain workarounds
for the target compiler/environment, such as adding a missing define
of restrict or defining getenv(x) to NULL for environments that lack
getenv.

The seemingly innocent include reordering in 2025d37871 broke
builds that have getenv(x) defined to NULL in config.h (Windows CE
and Windows Phone/RT), since libavcodec/options_table.h includes
config.h, while libavformat/options_table.h ends up bringing in
more system headers, and those system headers can contain a proper
definition of getenv, which clashes with the getenv define in config.h.
This was avoided earlier as long as libavformat/options_table.h (or
avformat.h) was included before libavcodec/options_table.h.

This fixes builds for Windows Phone/RT and CE.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 11:25:50 +02:00
Martin Storsjö
ffbd1d2b00 arm: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

The filter coefficients are signed values, where the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16 bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the resulting sum can
overflow the 16 bit signed range. Instead of accumulating in 32 bit,
we accumulate the largest product (either index 3 or 4) last with a
saturated addition.

(The VP8 MC asm does something similar, but slightly simpler, by
accumulating each half of the filter separately. In the VP9 MC
filters, each half of the filter can also overflow though, so the
largest component has to be handled individually.)
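
The accumulation scheme described above can be sketched in scalar C. This is a rough model under stated assumptions, not the NEON code from the commit: the helper names are hypothetical, the taps are assumed to sum to 128, and the partial sum of the seven smaller products is assumed to stay within the signed 16 bit range.

```c
#include <stdint.h>

/* Saturating signed 16 bit add, the scalar counterpart of a saturating
 * SIMD addition. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Round, shift right by 7 (taps assumed to sum to 128), clip to 0..255. */
static uint8_t round_clip(int32_t sum)
{
    int32_t v = sum + 64;
    if (v < 0)
        return 0;
    v >>= 7;
    return v > 255 ? 255 : (uint8_t)v;
}

/* 16 bit path: plain adds for the seven smaller products, then the
 * largest product (index `largest`) is added last with saturation. */
static uint8_t filter8_sat16(const uint8_t *src, const int16_t *taps, int largest)
{
    int16_t acc = 0;
    for (int i = 0; i < 8; i++) {
        if (i == largest)
            continue;
        /* Assumption: this partial sum never leaves the int16_t range. */
        acc = (int16_t)(acc + src[i] * taps[i]);
    }
    acc = sat_add16(acc, (int16_t)(src[largest] * taps[largest]));
    return round_clip(acc);
}

/* Reference path: accumulate everything in 32 bit. */
static uint8_t filter8_ref32(const uint8_t *src, const int16_t *taps)
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += src[i] * taps[i];
    return round_clip(acc);
}
```

Saturation only triggers when the true sum exceeds the 16 bit range, and any such sum would be clipped to 255 after rounding anyway, which is why both paths agree on the final pixel under the assumptions above.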

Examples of relative speedup compared to the C version, from checkasm:
                       Cortex      A7     A8     A9    A53
vp9_avg4_neon:                   1.71   1.15   1.42   1.49
vp9_avg8_neon:                   2.51   3.63   3.14   2.58
vp9_avg16_neon:                  2.95   6.76   3.01   2.84
vp9_avg32_neon:                  3.29   6.64   2.85   3.00
vp9_avg64_neon:                  3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:     3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:    3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:     5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:     6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:    6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:     9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:    7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:   6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13   9.46  13.37
vp9_put4_neon:                   1.11   1.47   1.00   1.21
vp9_put8_neon:                   1.23   2.17   1.94   1.48
vp9_put16_neon:                  1.63   4.02   1.73   1.97
vp9_put32_neon:                  1.56   4.92   2.00   1.96
vp9_put64_neon:                  2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:     3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:    3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:     5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:     5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:    6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:     9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:    6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:   6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30   9.43  14.34

For the larger 8tap filters, the speedup vs C code is around 5-14x.

This is significantly faster than libvpx's implementation of the same
functions, at least when comparing the put_8tap_smooth_64 functions
(compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from
libvpx).

Absolute runtimes from checkasm:
                          Cortex      A7        A8        A9       A53
vp9_put_8tap_smooth_64h_neon:    20150.3   14489.4   19733.6   10863.7
libvpx vpx_convolve8_horiz_neon: 52623.3   19736.4   21907.7   25027.7

vp9_put_8tap_smooth_64v_neon:    14455.0   12303.9   13746.4    9628.9
libvpx vpx_convolve8_vert_neon:  42090.0   17706.2   17659.9   16941.2

Thus, on the A9, the horizontal filter is only marginally faster than
libvpx, while our version is significantly faster on the other cores,
and the vertical filter is significantly faster on all cores. The
difference is especially large on the A7.

The libvpx implementation does the accumulation in 32 bit, which
probably explains most of the differences.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:35:38 +02:00
Martin Storsjö
2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Diego Biurrun
baab87c4f3 bink: Have function pointer prototype match implementation
libavcodec/binkdsp.c(156) : warning C4028: formal parameter 1 different from declaration
2016-11-02 10:33:39 +01:00
Diego Biurrun
4cf2ffb7c4 idct: Have function pointer prototype match implementation
libavcodec/idctdsp.c(175) : warning C4028: formal parameter 2 different from declaration
2016-11-02 10:33:39 +01:00
Diego Biurrun
39cea6570c aactab: Move extern keyword to the front of array declarations
libavcodec/aactab.h:49:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]
2016-11-02 10:33:36 +01:00
Luca Barbato
801ac7156d qsv: Be informative when reporting that no data has been consumed 2016-10-30 21:55:03 +01:00
Diego Biurrun
30015305f3 Use avpriv_request_sample() where appropriate 2016-10-29 18:32:21 +02:00
Diego Biurrun
3ec6f855d0 srt: Adjust signedness of sscanf format strings
Fixes several warnings from -Wformat.
2016-10-28 13:28:36 +02:00
Diego Biurrun
7a2b2b6a92 dxtory: Drop nonsense ISO C printf conversion specifiers for standard types 2016-10-28 13:24:55 +02:00
Diego Biurrun
c454dfcff9 Use ISO C printf conversion specifiers where appropriate 2016-10-28 13:24:44 +02:00
Diego Biurrun
fbe425c8d2 hap: Adjust printf length modifiers to match variable types
libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t {aka unsigned int}’ [-Wformat=]
libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]
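
A minimal sketch of matching conversion specifiers to argument types, in the spirit of the warnings quoted above. `format_sizes()` is a hypothetical helper, and the choice of `%zu` and the `<inttypes.h>` macros is an assumption about one portable way to do this, not necessarily what the commit itself uses.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* size_t is not "unsigned long" on every target, so "%lu" is only
 * correct by accident; "%zu" and the PRI* macros follow the real type. */
static int format_sizes(char *out, size_t out_size,
                        size_t chunk, int64_t pts, uint32_t flags)
{
    return snprintf(out, out_size,
                    "chunk=%zu pts=%" PRId64 " flags=%" PRIu32,
                    chunk, pts, flags);
}
```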
2016-10-28 11:22:22 +02:00
Diego Biurrun
1263b2039e Adjust printf conversion specifiers to match variable signedness 2016-10-28 11:22:21 +02:00
Diego Biurrun
47756f51fe dnxhdenc: Drop pointless, commented-out debug output 2016-10-27 12:21:46 +02:00
Diego Biurrun
0574780d7a h264_loopfilter: Do not print value of uninitialized variable
libavcodec/h264_loopfilter.c:531:111: warning: variable 'edge' is uninitialized when used here [-Wuninitialized]
2016-10-27 12:21:46 +02:00
Diego Biurrun
2555269985 mpegaudio: Do not print value of uninitialized variable
libavcodec/mpegaudiodec_template.c:885:97: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
2016-10-27 12:21:46 +02:00
Mark Thompson
0aec37e625 vaapi_decode: Remove vestigial unmap code
The buffer map/unmap code was in an early version of this before it
was committed, but the unmap was never removed.  While wrong, this
was harmless (and therefore unnoticed) because the buffers can't be
mapped at this point - all drivers just did nothing with the call.
2016-10-24 20:17:47 +01:00
Mark Thompson
5e879b54a3 vaapi_decode: Clear parameter buffers to fix picture reuse
When decoding interlaced pictures, the structure is reused to render
to the same surface twice.  The parameter buffers were not being
cleared, which caused the i965 driver to error out.
2016-10-24 20:17:47 +01:00
Gwenole Beauchesne
754b20d7eb vaapi_h264: fix RefPicList[] field flags.
Use new H264Ref.reference field to track field picture flags. The
H264Picture.reference flag in DPB is now irrelevant here.

This is a regression from git commit a12d3188, and that affected
multiple interlaced video streams.

Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
2016-10-24 20:17:47 +01:00
Pierre Edouard Lepere
6d5636ad9a hevc: x86: Add add_residual() SIMD optimizations
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-10-22 17:33:35 +02:00
Vittorio Giovara
0d9b9bd37f lavu: Add JEDEC P22 color primaries 2016-10-21 11:46:21 -04:00
Anton Khirnov
59c90097a0 hevc: factor out a repeated condition 2016-10-21 10:11:20 +02:00
Anton Khirnov
0bfdcce4d4 hevc: move the SliceType enum to hevc.h
Those values are decoder-independent and are also used by the VA-API
encoder.
2016-10-21 10:11:20 +02:00
Diego Biurrun
788544ff0e audiodsp: x86: Remove pointless header file
Its single forward declaration can be moved to the only place
it is used, as is done for all other dsp init files.
2016-10-19 15:20:41 +02:00
Diego Biurrun
b89804da9b x86: videodsp: Add parentheses to expression to work around warning
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
2016-10-19 10:13:34 +02:00
Diego Biurrun
58224dc5f3 ppc: avcodec: Drop silly "_ppc" suffixes from files in ppc subdirectories 2016-10-18 00:10:36 +02:00
Mark Thompson
0cf86fabfa vaapi_encode: Write sequence header as extradata
Only works if packed headers are supported, where we can know the
output before generating the first frame.
2016-10-17 21:07:25 +01:00
Mark Thompson
f9bb356e0e vaapi_h265: Include header for slice types
The include was changed correctly in 4abe3b049d
but then mistakenly changed back by c359d624d3
(it's not just the NAL unit types which are used).
2016-10-17 20:53:28 +01:00
Diego Biurrun
6be7944ee2 x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Anton Khirnov
89b35a139e lavc: add a bitstream filter for extracting extradata from packets
This is intended as a replacement for the 'split' function exported by
some parsers.
2016-10-16 20:27:16 +02:00
Anton Khirnov
f6e2f8a9ff hevcdec: move parameter set parsing into a separate header
This code is independent from the decoder, so it makes more sense for it
to have its own header.
2016-10-16 20:26:47 +02:00
Anton Khirnov
150c896a9e hevcdec: split ff_hevc_diag_scan* declarations into a separate header
This will be useful in the following commits.
2016-10-16 20:26:40 +02:00
Anton Khirnov
645c6ff423 hevcdec: drop the prototype of a non-existing function 2016-10-16 20:26:35 +02:00
Anton Khirnov
c359d624d3 hevcdec: move decoder-independent declarations into a separate header
This way they can be reused by other code without including the whole
decoder-specific hevcdec.h
Also, add the HEVC_ prefix to them, since similarly named values exist
for H.264 as well and are sometimes used in the same code.
2016-10-16 20:26:28 +02:00
Anton Khirnov
4abe3b049d hevc: rename hevc.[ch] to hevcdec.[ch]
This is more consistent with the rest of libav and frees up the hevc.h
name for decoder-independent shared declarations.
2016-10-16 20:26:17 +02:00
Kieran Kunhya
81f1f6c3f6 Add GBRAP12 pixel format support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-12 21:33:34 +02:00
Vittorio Giovara
14e7e19a90 lavc: bsf: Document input/output codecparam alloc/init process 2016-10-12 11:06:58 -04:00
Alexandra Hájková
112cee0241 hevc: Add SSE2 and AVX IDCT
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-11 18:21:04 +02:00
Martin Storsjö
9b2ccafb48 aarch64: Add missing sign extension in ff_h264_idct8_add_neon
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-10 14:57:53 +03:00
Yogender Gupta
cbd84b8a51 nvenc: Fix error log
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-10-09 20:58:10 +02:00
Yogender Gupta
da2848375a nvenc: Force high_444 profile for 444 input
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-10-07 10:41:38 +02:00
Anton Khirnov
e4128c08d7 Revert "hevc: x86: Refactor IDCT macro declarations"
This reverts commit d9dccc0389. There were
outstanding objections to this commit.
2016-10-06 15:24:04 +02:00
Diego Biurrun
5801f9ed24 h264_intrapred: x86: Update comments left behind in 95c89da36e 2016-10-06 12:32:34 +02:00
Diego Biurrun
d9dccc0389 hevc: x86: Refactor IDCT macro declarations 2016-10-06 12:32:34 +02:00
Steve Lhomme
be630b1e08 d3d11va: Use the proper decoding slice index
The decoding buffer index expected by D3D11VA is the one from the
ID3D11Texture2D not the one from the ID3D11VideoDecoderOutputView array
in AVD3D11VAContext.

Otherwise, when providing decoder slices that do not start from 0,
pictures appear in bogus order. With an invalid index, crashes and
image corruption can occur.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-05 18:37:27 +02:00
Ronald S. Bultje
715f139c9b vp9lpf/x86: make filter_16_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
8915320db9 vp9lpf/x86: make filter_48/84/88_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
725a216481 vp9lpf/x86: make filter_44_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
5bfa96c4b3 vp9lpf/x86: make filter_16_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
b905e8d2fe vp9lpf/x86: make filter_48/84_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
37637e6590 vp9lpf/x86: make filter_88_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
be10834bd9 vp9lpf/x86: make filter_44_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
7c62891efe vp9lpf/x86: save one register in SIGN_ADD/SUB.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
c6375a83d1 vp9lpf/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
4ce8ba72f9 vp9lpf/x86: move variable assigned inside macro branch.
The value is not used outside the branch.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
e4961035b2 vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
683da2788e vp9lpf/x86: remove unused register from ABSSUB_CMP macro.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6e74e9636b vp9lpf/x86: slightly simplify 44/48/84/88 h stores.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6411c328a2 vp9lpf/x86: make cglobal statement more conservative in register allocation.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
a6e288d624 vp9lpf/x86: save one register in loopfilter surface coverage.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
0ed21bdc9e vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
f2e3d706a1 vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
92d47550ea vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16
Similar gains as the ssse3 version once again

Additional improvements by Clément Bœsch <u@pkh.me>.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
6bea478158 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
1f451eed60 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
Similar gains in performance as the SSSE3 version

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
a692724c58 vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
a451324ddd vp9: ignore reference segmentation map if error_resilience flag is set.
Fixes ffvp9_fails_where_libvpx.succeeds.webm.

Bug-Id: ffmpeg/3849.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:07 +02:00
Carl Eugen Hoyos
c19830aa2c rscc: Support palette format
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-10-02 15:42:03 -04:00
Vittorio Giovara
b8d5070db6 avcodec: Document AV_PKT_DATA_PALETTE side data type 2016-10-02 15:42:03 -04:00
Mark Thompson
5a5df90d9c vaapi_h265: Add main 10 encode support 2016-10-02 20:23:18 +01:00
Mark Thompson
b8cac1e830 vaapi_h265: Fix buffering parameters
A decoder may need this to be set correctly to output frames in the
right order.
2016-10-02 20:23:18 +01:00