1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-19 05:49:09 +02:00

21674 Commits

Author SHA1 Message Date
Luca Barbato
a46a4f722d dca: Refactor dca_filter_channels() a little
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-03-21 12:08:52 +01:00
Luca Barbato
e245d4f45c dca: Validate the channel map
Having a mismatch between the number of channels in the stream and those
in the channel map will lead to a segfault or worse.

Bug-Id: 1016

CC: libav-stable@libav.org

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-03-21 12:08:49 +01:00
Konda Raju
3df77b58e3 nvenc: Allow different const qps for I, P and B frames
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-03-21 12:06:44 +01:00
wm4
1a7ddba576 lavc: vdpau: add support for new hw_frames_ctx and hw_device_ctx API
This supports retrieving the device from a provided hw_frames_ctx, and
automatically creating a hw_frames_ctx if hw_device_ctx is set.

The old API is not deprecated yet. The user can still use
av_vdpau_bind_context() (with or without setting hw_frames_ctx), or use
the API before that by allocating and setting hwaccel_context manually.
2017-03-20 23:15:43 +00:00
wm4
16a163b55a lavc: Add hwaccel_flags field to AVCodecContext
This "reuses" the flags introduced for the av_vdpau_bind_context() API
function, and makes them available to all hwaccels. This does not affect
the current vdpau API, as av_vdpau_bind_context() should obviously
override the AVCodecContext.hwaccel_flags flags for the sake of
compatibility.
2017-03-20 23:15:43 +00:00
Diego Biurrun
cfee5e1a0f build: Add missing object dependency for extract_extradata bitstream filter 2017-03-20 13:16:51 +01:00
Martin Storsjö
7995ebfad1 arm/aarch64: vp9: Fix vertical alignment
Align the second/third operands as they usually are.

Due to the wildly varying sizes of the written out operands
in aarch64 assembly, the column alignment is usually not as clear
as in arm assembly.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-16 23:09:00 +02:00
Diego Biurrun
681a86aba6 x86: fft: Port to cpuflags 2017-03-14 17:23:32 +01:00
Diego Biurrun
e9bb77fb10 x86: h264: Simplify DEQUANT macro with cpuflags 2017-03-14 17:23:32 +01:00
Diego Biurrun
307eb1a8ee x86: vp8dsp: port FILTER_BILINEAR macro to cpuflags 2017-03-14 17:23:32 +01:00
Diego Biurrun
994c4bc107 x86util: Port all macros to cpuflags
Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2
macro name, drop pointless check for MMX support, we always assume MMX is
available in our SIMD code, fix spelling.
2017-03-14 17:23:32 +01:00
Anton Khirnov
522d850e68 h264_cavlc: check the value of run_before
Section 9.2.3.2 of the spec implies that run_before must not be larger
than zeros_left.

Fixes invalid reads with corrupted files.

CC: libav-stable@libav.org
Bug-Id: 1000
Found-By: Kamil Frankowicz
2017-03-12 20:42:13 +01:00
Anton Khirnov
83b2b34d06 h2645_parse: use the bytestream2 API for packet splitting
The code does some nontrivial jumping around in the buffer, so it is
safer to use a checked API rather than do everything manually.

Fixes a bug in nalff parsing, where the length field is currently not
counted in the buffer size check, resulting in possible overreads with
invalid files.

CC: libav-stable@libav.org
Bug-Id: 1002
Found-By: Kamil Frankowicz
2017-03-12 20:42:12 +01:00
Anton Khirnov
b76f6a76c6 h264dec: initialize field_started to 0 on each decode call
It might be incorrectly set to 1 if the previous call exited with an
error.

Bug-Id: 1019
CC: libav-stable@libav.org
2017-03-12 20:42:12 +01:00
Martin Storsjö
3a0d5e206d arm/aarch64: vp9itxfm: Skip loading the min_eob pointer when it won't be used
In the half/quarter cases where we don't use the min_eob array, defer
loading the pointer until we know it will be needed.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-11 22:07:30 +02:00
Martin Storsjö
98ee855ae0 arm: vp9itxfm: Template the quarter/half idct32 function
This reduces the number of lines and reduces the duplication.

Also simplify the eob check for the half case.

If we are in the half case, we know we at least will need to do the
first three slices, we only need to check eob for the fourth one,
so we can hardcode the value to check against instead of loading
from the min_eob array.

Since at most one slice can be skipped in the first pass, we can
unroll the loop for filling zeros completely, as it was done for
the quarter case before.

This allows skipping loading the min_eob pointer when using the
quarter/half cases.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-11 22:07:12 +02:00
Kieran Kunhya
5f794aa165 Add Cineform HD Decoder
Decodes YUV 4:2:2 10-bit and RGB 12-bit files.
Older files with more subbands, skips, Bayer, alpha not supported.

Further fixes and refactorings by Anton Khirnov <anton@khirnov.net>,
Diego Biurrun <diego@biurrun.de>, Vittorio Giovara <vittorio.giovara@gmail.com>

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-03-09 18:37:29 +01:00
Konda Raju
f6790b5e10 add initial QP value options
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-03-09 17:24:00 +01:00
wm4
8a60bba0ae avcodec: clarify some decoding/encoding API details
Make it clear that there is no timing-dependent behavior. In particular,
there is no state in which both input and output are denied, and where
you have to wait for a while yourself to make progress (apparently some
hardware decoders like to do this).

Avoid wording that makes references to time. It shouldn't be mistaken
for some kind of asynchronous API (like POSIX read() can return EAGAIN
if there is no new input yet). It's a state machine, so try to use
appropriate terms.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-03-09 17:07:24 +01:00
Vittorio Giovara
b44bd7ee7f pixlet: Fix architecture-dependent code and values
The constants used in the decoder used floating point precision,
and this caused different values to be generated on different
architectures. Additionally on big endian machines, the fate test
would output bytes in native order, which is different from the one
hardcoded in the test.

So, eradicate floating point numbers and use fixed point (32.32)
arithmetics everywhere, replacing constants with precomputed integer
values, and force the pixel format output to be the same in the fate
test.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2017-03-06 18:15:02 -05:00
Diego Biurrun
6eef263aca x86: Merge align directives into SECTION_RODATA declarations where possible 2017-03-05 14:26:06 +01:00
Ganapathy Kasi
3303f86467 nvenc: Remove qmin and qmax constraints for nvenc vbr
qmin and qmax are not necessary for nvenc vbr.

Also fix for using 2 pass vbr mode for slow preset through ctx->flag NVENC_TWO_PASSES.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-03-04 08:23:28 +01:00
Paul B Mahol
aba5b94859 Add Apple Pixlet decoder
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2017-03-01 11:52:29 -05:00
Diego Biurrun
39e208f4d4 build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.
2017-03-01 10:18:15 +01:00
Diego Biurrun
fde7ee8710 x86: hevc: Add missing colons after assembly labels
This fixes several warnings of the sort
warning: label alone on a line without a colon might be in error
2017-03-01 09:23:42 +01:00
Michael Niedermayer
d7b2bb5391 h264_sei: Check actual presence of picture timing SEI message
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2017-02-28 10:32:50 -05:00
Ben Chang
d8f36a6aa3 nvenc: Fix the preset mapping list
The map is a sparse array and does not need a empty element to terminate
it.

The empty element is stored after the last one inserted in the list,
overwriting whichever element was next with zeros.

Bug-Id: 1029

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-28 11:54:02 +01:00
Anton Khirnov
984736dd9e lavc: make sure not to return EAGAIN from codecs
This error is treated specially by the API.

CC: libav-stable@libav.org
2017-02-25 09:57:44 +01:00
Anton Khirnov
b2788fe934 svq3: fix the slice size check
Currently it incorrectly compares bits with bytes.

Also, move the check right before where it's relevant, so that the
correct number of remaining bits is used.

CC: libav-stable@libav.org
2017-02-25 09:57:43 +01:00
John Stebbins
248dc5c164 h264dec: fix dropped initial SEI recovery point 2017-02-24 08:24:13 -07:00
Martin Storsjö
b8f66c0838 aarch64: vp9itxfm: Reorder iadst16 coeffs
This matches the order they are in the 16 bpp version.

There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.

This makes the 8 bpp version match the 16 bpp version better.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:04:34 +02:00
Martin Storsjö
08074c092d arm: vp9itxfm: Reorder iadst16 coeffs
This matches the order they are in the 16 bpp version.

There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.

This makes the 8 bpp version match the 16 bpp version better.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:04:33 +02:00
Martin Storsjö
09eb88a12e aarch64: vp9itxfm: Reorder the idct coefficients for better pairing
All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.

This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:04:32 +02:00
Martin Storsjö
de06bdfe6c arm: vp9itxfm: Reorder the idct coefficients for better pairing
All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.

This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:04:31 +02:00
Martin Storsjö
65aa002d54 aarch64: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed d8-d15 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

After this, we still can skip pushing d12-d15.

Before:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
After:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:44 +02:00
Martin Storsjö
402546a172 arm: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed q4-q7 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
in the idct16 function), and the lanewise vmul needs a register in
the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
while doing idct16.

While keeping these coefficients in registers, we still can skip pushing
q7.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
After:
vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:43 +02:00
Martin Storsjö
575e31e931 arm: vp9lpf: Implement the mix2_44 function with one single filter pass
For this case, with 8 inputs but only changing 4 of them, we can fit
all 16 input pixels into a q register, and still have enough temporary
registers for doing the loop filter.

The wd=8 filters would require too many temporary registers for
processing all 16 pixels at once though.

Before:                          Cortex A7      A8     A9     A53
vp9_loop_filter_mix2_v_44_16_neon:   289.7   256.2  237.5   181.2
After:
vp9_loop_filter_mix2_v_44_16_neon:   221.2   150.5  177.7   138.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:09 +02:00
Martin Storsjö
3bf9c48320 aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1
This is one cycle faster in total, and three instructions fewer.

Before:
vp9_loop_filter_mix2_v_44_16_neon: 123.2
After:
vp9_loop_filter_mix2_v_44_16_neon: 122.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:00 +02:00
Martin Storsjö
c582cb8537 arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit
The theoretical maximum value of E is 193, so we can just
saturate the addition to 255.

Before:                     Cortex A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   965.2   818.7   731.4   579.0        452.0
After:
vp9_loop_filter_v_4_8_neon:     136.0   125.7   112.6    84.0         83.0
vp9_loop_filter_v_8_8_neon:     234.0   195.5   171.5   136.0        133.7
vp9_loop_filter_v_16_8_neon:    490.0   417.5   377.7   289.0        271.0
vp9_loop_filter_v_16_16_neon:   951.2   814.7   732.3   571.0        446.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:02:36 +02:00
Diego Biurrun
ed6a891c36 Place attribute_deprecated in the right position for struct declarations
libavcodec/vaapi.h:58:1: warning: attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
2017-02-23 12:23:20 +01:00
Diego Biurrun
00b160af11 nvenc: Fix nvec vs. nvenc typo 2017-02-20 09:50:03 +01:00
Mark Thompson
7cb9296db8 webp: Fix alpha decoding
This was broken by 4e528206bc4d968706401206cf54471739250ec7 - the webp
decoder was assuming that it could set the output pixfmt of the vp8
decoder directly, but after that change it no longer could because
ff_get_format() was used instead.  This adds an internal get_format()
callback to webp use of the vp8 decoder to override the pixfmt
appropriately.
2017-02-18 19:53:20 +00:00
Mark Thompson
17aeee5832 vaapi_encode: Discard output buffer if picture submission fails
Previously this was leaking, though it actually hit an assert making
sure that the buffer had already been cleared when freeing the picture.
2017-02-16 20:58:42 +00:00
Martin Storsjö
030de53e9c libopenh264dec: Let the framework use the h264_mp4toannexb bitstream filter
This avoids a lot of boilerplate code within the decoder wrapper itself.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-15 23:05:58 +02:00
Mark Thompson
5dd9a4b88b vaapi: Implement device-only setup
In this case, the user only supplies a device and the frame context
is allocated internally by lavc.
2017-02-13 21:44:43 +00:00
Mark Thompson
44f2eda39f lavc: Add device context field to AVCodecContext
For use by codec implementations which can allocate frames internally.
2017-02-13 20:14:27 +00:00
Martin Storsjö
07b5136c48 aarch64: vp9lpf: Fix broken indentation/vertical alignment
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-12 21:57:23 +02:00
Martin Storsjö
b0806088d3 aarch64: vp9lpf: Interleave the start of flat8in into the calculation above
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 22:54:18 +02:00
Martin Storsjö
e18c39005a arm: vp9lpf: Interleave the start of flat8in into the calculation above
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 22:54:01 +02:00
Luca Barbato
9c2d36fcaf dv: Convert to the new bitstream reader 2017-02-11 20:29:44 +01:00