1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-29 22:00:58 +02:00

21320 Commits

Author SHA1 Message Date
Anton Khirnov
8dfba25ce8 pthread_frame: ensure the threads don't run simultaneously with hwaccel 2016-12-19 08:09:19 +01:00
Anton Khirnov
373fd76b4d hevcdec: do not set decoder-global SPS prematurely
It should only be set after the decoder state has been fully initialized
for using that SPS.
Fixes possible invalid reads on get_format() failure.

CC: libav-stable@libav.org
2016-12-19 08:07:15 +01:00
Janne Grunau
2425d7329f arm64: replace 'bic' with immediate with 'and' with inverted immediate
The former is not an official pseudo instruction although gas and llvm's
internal assembler support it. Fixes a build error with xcode 6.2
reported by Memphiz on github.
2016-12-14 21:53:05 +01:00
Diego Biurrun
ea7ee4b4e3 ppc: Centralize compiler-specific altivec.h #include handling in one place
Also move #includes into canonical order where appropriate.
2016-12-14 14:08:43 +01:00
Diego Biurrun
39929e55eb ppc: hevcdsp: Use shorthands for vector types
This is more consistent and fixes compilation with clang.
2016-12-14 14:08:43 +01:00
Diego Biurrun
554e55bbf0 decode.h: Add missing headers to fix standalone compilation 2016-12-14 14:08:43 +01:00
Wan-Teh Chang
343e283399 pthread_frame: use better memory orders for frame progress
This improves commit 59c70227405c214b29971e6272f3a3ff6fcce3d0.

In ff_thread_report_progress(), the fast code path can load
progress[field] with the relaxed memory order, and the slow code path
can store progress[field] with the release memory order. These changes
are mainly intended to avoid confusion when one inspects the source code.
They are unlikely to have measurable performance improvement.

ff_thread_report_progress() and ff_thread_await_progress() form a pair.
ff_thread_await_progress() reads progress[field] with the acquire memory
order (in the fast code path). Therefore, one expects to see
ff_thread_report_progress() write progress[field] with the matching
release memory order.

In the fast code path in ff_thread_report_progress(), the atomic load of
progress[field] doesn't need the acquire memory order because the
calling thread is trying to make the data it just decoded visible to the
other threads, rather than trying to read the data decoded by other
threads.

In ff_thread_get_buffer(), initialize progress[0] and progress[1] using
atomic_init().

Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-12-14 11:16:51 +01:00
Derek Buitenhuis
5c7f2cf81d h264_slice: Wait for refs to be available before we use them in error concealment
This could happen when there was a frame number gap and frame threading was used.

Debugging-by: Ronald S. Bultje <rsbultje@gmail.com>
Debugging-by: Justin Ruggles <justin.ruggles@gmail.com>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

CC:libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-12-14 10:38:15 +01:00
Anton Khirnov
86157e6db2 hevc: decouple calling get_format() from exporting the SPS parameters
This makes sure ff_get_format() does not get called unnecessarily from
update_thread_context().
2016-12-14 09:06:45 +01:00
Anton Khirnov
730c023260 binkaudio: switch to the new send/receive API
It is more natural for this codec and allows to avoid awkward constructs
like "consuming 0 bytes from input". Also, keep a reference to the input
packet to avoid unnecessary copying.
2016-12-14 09:06:45 +01:00
Anton Khirnov
fa1749dd34 vp9: split superframes in the filtering stage before actual decoding
Significantly increases the efficiency of frame threading, since
individual frames in a superframe can now be decoded in parallel.
2016-12-14 09:06:45 +01:00
Anton Khirnov
03a80925ef lavc: add a bitstream filter for splitting VP9 superframes
Partially based on code by Ronald S. Bultje <rsbultje@gmail.com>.
2016-12-14 09:06:45 +01:00
Anton Khirnov
8fb4210ad8 qsvdec_h2645: switch to the new generic filtering mechanism
Drop the internal manual conversion from the MP4 format to Annex B.
2016-12-14 09:06:45 +01:00
Anton Khirnov
972c71e9cb lavc: add support for filtering packets before decoding 2016-12-14 09:06:45 +01:00
Anton Khirnov
061a0c14bb decode: restructure the core decoding code
Currently, the new decoding API is pretty much just a wrapper around the
old deprecated one. This is problematic, since it interferes with making
full use of the flexibility added by the new API. The old API should
also be removed at some future point.

Reorganize the code so that the new send_packet/receive_frame functions
call the actual decoding directly and change the old deprecated
avcodec_decode_* functions into wrappers around the new API.

The new internal API for decoders is now changing as well. Before this
commit, it mirrors the public API, so the decoders need to implement
send_packet() and receive_frame() callbacks. This turns out to require
awkward constructs in both the decoders and the generic code. After this
commit, the decoders only implement the receive_frame() callback and
call a new internal function, ff_decode_get_packet() to obtain input
data, in the same manner to how the bitstream filters now work.

avcodec will now always make a reference to the input packet, which means
that non-refcounted input packets will be copied. Keeping the previous
behaviour, where this copy could sometimes be avoided, would make the
code significantly more complex and fragile for only dubious gains,
since packets are typically small and everyone who cares about
performance should use refcounted packets anyway.
2016-12-14 09:06:44 +01:00
Anton Khirnov
549d0bdca5 decode: be more explicit about storing the last packet properties
The current code stores a pointer to the packet passed to the decoder,
which is then used during get_buffer() for timestamps and side data
passthrough. However, since this is a pointer to user data which we do
not own, storing it is potentially dangerous. It is also ill defined for
the new decoding API with split input/output.

Fix this problem by making an explicit internally owned copy of the
packet properties.
2016-12-14 09:06:44 +01:00
Anton Khirnov
47e547b321 lavc: add a null bitstream filter
It is useful for testing/debugging and will also be used as the default
filter in the following commit adding pre-decode filtering to avoid
having a separate non-filtered codepath.
2016-12-14 09:06:44 +01:00
Anton Khirnov
0309ddcfb2 lavc: handle MP3 in get_audio_frame_duration() 2016-12-14 09:06:44 +01:00
Diego Biurrun
6aa4ba7131 dxva2: Keep code shared between dxva2 and d3d11va under the correct #if
This partially reverts commit ac648bb835edd3f67bda2267d0e72e5e582eb5a1.
2016-12-12 13:44:25 +01:00
Alexandra Hajkova
b0e6b3f477 hevc: ppc: Add HEVC 4x4 IDCT for PowerPC
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-12-12 09:25:16 +01:00
Diego Biurrun
a6901b9c6b Drop libxvid rate control support for mpegvideo encoding
The feature has outlived is usefulness and complicates the code.
2016-12-11 09:27:40 +01:00
Diego Biurrun
ac648bb835 dxva2: Simplify some ifdefs 2016-12-11 09:27:40 +01:00
Mark Thompson
7d81698b89 vaapi_h265: Fix CFR mode with framerate set in AVCodecContext
Same issue as 17a0f9481cf07af0feb3838ca315b970117e8000.
2016-12-10 16:55:44 +00:00
Diego Biurrun
932cc6496e vdpau: Do not #include vdpau_x11.h from the main vdpau header
That header should only be included in the special bits that use X11 code.
2016-12-09 08:41:53 +01:00
Diego Biurrun
92e6b31c3b dxva2: Adjust multiple inclusion guard names to follow convention 2016-12-09 08:41:52 +01:00
Andreas Cadhalpun
fc85646ad4 libopusdec: fix out-of-bounds read
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-12-08 15:53:58 -05:00
Andreas Cadhalpun
dc2ad09493 libschroedingerdec: fix leaking of framewithpts
Also preserve the return value from ff_get_buffer().

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Andreas Cadhalpun
8c3a643808 libschroedingerdec: don't produce empty frames
They are not valid and can cause problems/crashes for API users.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-12-08 15:53:58 -05:00
Timothy Gu
d3da8a0035 omx: Fix allocation check
Also use av_mallocz_array().

Bug-Id: CID 1396839
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Timothy Gu
d32bdadda8 qsvdec: Fix memory leak on error
Bug-Id: CID 1396851
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Diego Biurrun
d5759701a8 libkvazaar: Add missing header #includes
This fixes compilation after the next version bump.
2016-12-08 21:34:30 +01:00
Diego Biurrun
fbec58daa2 build: Add an internal component for hevc_ps code
This allows expressing dependencies in a more correct way.
2016-12-08 20:12:23 +01:00
Vittorio Giovara
2fb6acd9c2 lavc: Add spherical packet side data API
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-07 14:34:34 -05:00
Diego Biurrun
624aa8ab22 build: Add missing Makefile entries and ifdefs for QSV hwaccels 2016-12-07 15:46:57 +01:00
Diego Biurrun
e1dc5358af build: Create a component for MPEG audio header decoding
Fixes standalone compilation of the libmp3lame encoder.
2016-12-05 16:13:05 +01:00
Diego Biurrun
0fdc9f81a0 build: Add missing hevc_ps dependency for QSV HEVC encoder 2016-12-05 16:13:04 +01:00
Alexandra Hájková
6c916192f3 mimic: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
cdc6727c3e metasound: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
6fad5abcad lagarith: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
c3defda0d8 indeo: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
f5b7bd2a7c imc: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
39ecf0588f webp: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
James Almer
33a2b73b98 mpeg4audio: correctly propagate meaningful error values
Signed-off-by: James Almer <jamrial@gmail.com>
2016-12-02 12:16:30 -05:00
Wan-Teh Chang
d82d5379ca mmaldec: initialize refcount using atomic_init()
This is how we initialize refcount in libavutil/buffer.c.

Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 12:16:26 -05:00
Vittorio Giovara
5168026a05 options_table: Do not rely on enum size as option bound
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 11:36:46 -05:00
Vittorio Giovara
ff9db5cfd1 lavc: Use a stricter check for the color properties values
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 11:36:42 -05:00
Diego Biurrun
0a35f128f3 cabac: x86: Give optimizations header a more meaningful name 2016-12-01 08:23:54 +01:00
Martin Storsjö
cad42fadcd aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8

I.e. in general a very minor overhead for the full subpartition case due
to the additional cmps, but a significant speedup for the cases when we
only need to process a small part of the actual input data.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:57:05 +02:00
Martin Storsjö
9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:54:07 +02:00
Martin Storsjö
3c87039a40 arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination
This avoids reloading them if they haven't been clobbered, if the
first pass also was idct.

This is similar to what was done in the aarch64 version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:53:52 +02:00