mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

21412 Commits

Author SHA1 Message Date
Luca Barbato
f8f7ad758d qsv: Set the correct range for la_depth
Setting an invalid range for it makes the encoder behave inconsistently.
2017-01-13 08:42:10 +01:00
Anton Khirnov
1202b71269 theora: export cropping information instead of handling it internally 2017-01-12 16:29:17 +01:00
Anton Khirnov
c3e84820d6 h264dec: export cropping information instead of handling it internally 2017-01-12 16:29:12 +01:00
Anton Khirnov
4fded0480f h264dec: be more explicit in handling container cropping
The current condition can trigger in cases where it shouldn't, with
unexpected results.
Make sure that:
- container cropping is really based on the original dimensions from the
  caller
- those dimensions are discarded on size change

The code is still quite hacky and eventually should be deprecated and
removed, with the decision about which cropping is used delegated to the
caller.
2017-01-12 16:28:05 +01:00
Anton Khirnov
a02ae1c683 hevcdec: export cropping information instead of handling it internally 2017-01-12 16:27:56 +01:00
Anton Khirnov
019ab88a95 lavc: add an option for exporting cropping information to the caller
Also, add generic code for handling cropping, so the decoders can export
just the cropping size and not bother with the rest.
2017-01-12 16:24:15 +01:00
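A minimal sketch of what generic cropping amounts to for a single 8-bit plane. Once a decoder exports only the crop sizes, applying them is pointer arithmetic with no pixel copies. The `Plane` struct and `apply_crop()` are hypothetical illustrations, not the lavc API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-plane 8-bit picture; not an FFmpeg type. */
typedef struct Plane {
    uint8_t  *data;      /* pointer to the top-left visible pixel */
    ptrdiff_t linesize;  /* bytes per row, including any padding */
    int       width, height;
} Plane;

/* Crop by advancing the data pointer and shrinking the dimensions.
 * Rows keep their original linesize, so nothing is copied.
 * Returns 0 on success, -1 if the crop does not leave a non-empty picture. */
static int apply_crop(Plane *p, int top, int bottom, int left, int right)
{
    if (top < 0 || bottom < 0 || left < 0 || right < 0 ||
        top + bottom >= p->height || left + right >= p->width)
        return -1;
    p->data   += top * p->linesize + left;
    p->width  -= left + right;
    p->height -= top + bottom;
    return 0;
}
```

For subsampled chroma planes, the left/top offsets would additionally be shifted by the subsampling factors; that is omitted here.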
Anton Khirnov
b68e353136 qsvdec: do not sync PIX_FMT_QSV surfaces
Introducing enforced sync points in arbitrary places is bad for
performance. Since the vast majority of receiving code (QSV VPP or
encoders, retrieving frames through hwcontext) will do the syncing, this
change should not be visible to most callers. The micro version is bumped
anyway, just in case.

This is also consistent with what VAAPI hwaccel does.
2017-01-12 16:21:39 +01:00
Steve Lhomme
ac3c3ee678 dxva2: allow an empty array of ID3D11VideoDecoderOutputView
We can pick the correct slice index directly from the ID3D11VideoDecoderOutputView
cast from data[3].

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-01-12 16:19:13 +01:00
Steve Lhomme
f67235a28c dxva2: get the slice number directly from the surface in D3D11VA
No need to loop through the known surfaces; we'll use the requested surface
anyway.

The loop is only done for DXVA2.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-01-12 16:09:41 +01:00
Mark Thompson
89725a8512 vaapi_h264: Scale log2_max_pic_order_cnt_lsb with max_b_frames
Before this change, it was possible to overflow pic_order_cnt_lsb and
generate a stream with invalid POC numbering.  This makes sure that
the field is large enough that a single IDR B* P sequence uses fewer
than half the available POC lsb values.
2017-01-11 23:03:58 +00:00
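A sketch of the sizing rule stated above, assuming POC advances by 2 per frame and reading "fewer than half" as: the POC span of one IDR B* P group must stay below MaxPicOrderCntLsb / 2. `choose_log2_max_poc_lsb()` is a hypothetical helper, not the formula used in vaapi_h264.

```c
/* Hypothetical helper: smallest log2_max_pic_order_cnt_lsb in 4..16 (the
 * range H.264 allows) such that one IDR B* P group spans fewer than half
 * of the available POC lsb values. */
static int choose_log2_max_poc_lsb(int max_b_frames)
{
    /* IDR, max_b_frames B-frames and one P-frame: POC 0 .. 2 * (max_b_frames + 1) */
    int span = 2 * (max_b_frames + 1) + 1;
    int log2_lsb = 4;
    while (log2_lsb < 16 && span >= (1 << (log2_lsb - 1)))
        log2_lsb++;
    return log2_lsb;
}
```

Half is the relevant bound because H.264 POC decoding (section 8.2.1.1) infers an lsb wraparound whenever consecutive pic_order_cnt_lsb values differ by at least MaxPicOrderCntLsb / 2.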
Mark Thompson
a3c3a5eac2 vaapi_encode: Support forcing IDR frames via AVFrame.pict_type 2017-01-11 23:03:58 +00:00
Mark Thompson
37fab0661a vaapi_encode: Fix GOP sizing
This change makes the configured GOP size be respected exactly -
previously the value could be exceeded slightly due to flaws in the
frame type selection logic.
2017-01-11 23:03:58 +00:00
Alexandra Hájková
bd6496fa07 interplayvideo: Convert to the new bitstream reader 2017-01-09 15:21:47 +01:00
Alexandra Hájková
4e25051031 adx: Convert to the new bitstream reader 2017-01-09 15:21:47 +01:00
Alexandra Hájková
9aec009f65 dvbsubdec: Convert to the new bitstream reader 2017-01-09 15:21:47 +01:00
Alexandra Hájková
d7fe11634c motionpixels: Convert to the new bitstream reader 2017-01-09 15:18:16 +01:00
Anton Khirnov
f1af37b510 h264dec: make ff_h264_decode_init() static
It is not called from outside h264dec.c anymore.
2017-01-09 13:21:13 +01:00
Anton Khirnov
e7de05f98f h264dec: drop a redundant check
Cropping parameters are already checked for validity during SPS parsing,
no need to check them again.
2017-01-09 13:21:13 +01:00
Steve Lhomme
2835e9a9fd hevcdec: add P010 support for D3D11VA
Given it's the same API as DXVA2, I don't know why the same output was not
enabled for both.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-01-09 10:48:54 +01:00
Steve Lhomme
0ac2d86c47 dxva2: Factorize DXVA context validity test into a single macro
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-01-08 16:41:24 +01:00
Steve Lhomme
f8a42d4f26 dxva2: Make ff_dxva2_get_surface() static and drop its name prefix
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-01-08 16:41:07 +01:00
Jun Zhao
9b1db2d338 vaapi_h264: Fix POC on IDR frames
In H.264 section 8.2.1, we have that "The bitstream shall not contain
data that result in Min(TopFieldOrderCnt, BottomFieldOrderCnt) not
equal to 0 for a coded IDR frame".  This fixes the encoder to always
conform to this - previously the POC values formed an unbroken
sequence, not resetting to zero on IDR frames.

Signed-off-by: Mark Thompson <sw@jkqxz.net>
2017-01-04 21:52:06 +00:00
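A sketch of the corrected numbering, assuming frame (not field) coding, so both field order counts equal the frame's POC, and a POC step of 2 per frame in display order. `assign_poc()` is hypothetical.

```c
/* Hypothetical model: POC counts up by 2 per frame in display order and
 * restarts at 0 on every IDR frame, so each coded IDR frame has POC 0. */
static void assign_poc(const int *is_idr, int n, int *poc)
{
    int idr_index = 0;                 /* display index of the most recent IDR */
    for (int i = 0; i < n; i++) {
        if (is_idr[i])
            idr_index = i;
        poc[i] = 2 * (i - idr_index);
    }
}
```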
Mark Thompson
d08e02d929 vaapi_h265: Fix build failure with old libva without 10-bit surfaces
10-bit surface support was added in libva 1.6.2; earlier versions
support H.265 encoding in 8-bit only.
2017-01-04 21:49:41 +00:00
Martin Storsjö
85ad5ea72c aarch64: vp9mc: Fix a comment to refer to a register with the right name
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-03 14:16:10 +02:00
Martin Storsjö
65074791e8 aarch64: vp9dsp: Fix vertical alignment in the init file
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-03 14:15:58 +02:00
Martin Storsjö
c536e5e869 arm: vp9mc: Fix vertical alignment of operands
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-03 14:15:45 +02:00
Diego Biurrun
53618054b6 parser: Add missing #include for printing ISO C99 conversion specifiers 2016-12-25 13:22:50 +01:00
Diego Biurrun
0b77a59336 Use correct printf conversion specifiers for POSIX integer types 2016-12-23 19:30:00 +01:00
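A short illustration of the portable way to print fixed-width integer types with the C99 `<inttypes.h>` macros, which is presumably what these two commits switch to. `format_timestamp()` is a made-up example function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The PRI* macros expand to the correct length modifier ("ld", "lld", ...)
 * for the platform, so the format string stays portable. */
static int format_timestamp(char *buf, size_t len, int64_t pts, uint32_t flags)
{
    return snprintf(buf, len, "pts=%" PRId64 " flags=0x%08" PRIx32, pts, flags);
}
```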
Diego Biurrun
92db508307 build: Generate pkg-config files from Make and not from configure
This moves work from the configure to the Make stage where it can
be parallelized and ensures that pkgconfig files are updated when
library versions change.

Bug-Id: 449
2016-12-22 12:30:54 +01:00
Diego Biurrun
f9edc734e0 ratecontrol: Drop xvid-rc-related struct members unused after a6901b9c6 2016-12-21 11:13:20 +01:00
Ruta Gadkari
5b26d3b789 nvenc: Update check for lookahead
By default it is -1.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-12-21 06:16:52 +01:00
Martin Storsjö
a0c443a398 aarch64: vp9itxfm: Use the offset parameter to movrel
This fixes build failures for iOS, broken since cad42fadcd.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-12-19 22:49:51 +02:00
Alexandra Hájková
fc322d6a70 tta: Convert to the new bitstream reader 2016-12-19 13:52:36 +01:00
Alexandra Hájková
00c72a1e01 mlp: Convert to the new bitstream reader 2016-12-19 13:22:29 +01:00
Alexandra Hájková
fa64aea12e unary: Convert to the new bitstream reader 2016-12-19 12:35:05 +01:00
Anton Khirnov
45286a625c h264dec: make sure to only end a field if it has been started
Calling ff_h264_field_end() when the per-field state is not properly
initialized leads to all kinds of undefined behaviour.

CC: libav-stable@libav.org
Bug-Id: 977 978 992
2016-12-19 08:15:58 +01:00
Anton Khirnov
c2fa6bb0e8 mpeg12dec: move setting first_field to mpeg_field_start()
For field pictures, first_field is set based on its previous value.
Before this commit, first_field was set when reading the picture
coding extension. However, corrupted files may contain multiple
picture coding extension headers, so the final value of first_field that
is actually used during decoding can be wrong. That can lead to various
kinds of undefined behaviour, like predicting from a non-existent field.

Fix this problem by setting first_field in mpeg_field_start(), which
should be called exactly once per field.

CC: libav-stable@libav.org
Bug-ID: 999
2016-12-19 08:15:49 +01:00
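A toy model of why the placement matters. `FieldState` and both functions are hypothetical, not mpeg12dec code: toggling once per header lets a duplicated extension header flip the flag twice, while toggling once per field start is immune to duplicates.

```c
#include <stdbool.h>

/* Hypothetical decoder state; first_field is true while decoding the
 * first field of a pair. */
typedef struct FieldState {
    bool first_field;
} FieldState;

/* Old behaviour: the flag flips on every picture coding extension header. */
static bool start_field_old(FieldState *s, int ext_headers)
{
    for (int i = 0; i < ext_headers; i++)
        s->first_field = !s->first_field;
    return s->first_field;
}

/* New behaviour: headers may repeat, but only the field start flips the flag. */
static bool start_field_new(FieldState *s, int ext_headers)
{
    (void)ext_headers;  /* duplicated headers no longer matter */
    s->first_field = !s->first_field;
    return s->first_field;
}
```

With a duplicated header on the second field, the old variant wrongly reports "first field" again, while the new one correctly reports the second field.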
Anton Khirnov
e807491fc6 mpeg12dec: avoid signed overflow in bitrate calculation
CC: libav-stable@libav.org
Bug-Id: 981
Found-By: Agostino Sarubbo
2016-12-19 08:15:42 +01:00
Anton Khirnov
58405de095 mpegvideo_parser: avoid signed overflow in bitrate calculation
CC: libav-stable@libav.org
Bug-Id: 981
Found-By: Agostino Sarubbo
2016-12-19 08:15:07 +01:00
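Both fixes concern the same arithmetic. In MPEG-1/2 the bit rate is coded in units of 400 bit/s, with an 18-bit field in the sequence header and, for MPEG-2, 12 more high-order bits in the sequence extension. A 30-bit value times 400 does not fit in a 32-bit int. The sketch below (a hypothetical helper, not the lavc code) widens to 64 bits before multiplying.

```c
#include <stdint.h>

/* bit_rate_value: 18-bit field from the sequence header.
 * bit_rate_ext:   12-bit field from the MPEG-2 sequence extension.
 * The product is computed in 64 bits, so it cannot overflow. */
static int64_t mpeg_bitrate(uint32_t bit_rate_value, uint32_t bit_rate_ext)
{
    uint32_t units = ((bit_rate_ext & 0xFFF) << 18) | (bit_rate_value & 0x3FFFF);
    return (int64_t)units * 400;
}
```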
Anton Khirnov
cfa4eb4fba vaapi_decode: use the correct logging context 2016-12-19 08:13:28 +01:00
Anton Khirnov
ea8b730d8e hevcdec: add a VAAPI hwaccel
Partially based on a patch by Timo Rothenpieler <timo@rothenpieler.org>.
Additional scaling list handling fix by Jun Zhao <mypopydev@gmail.com>.
2016-12-19 08:13:08 +01:00
Anton Khirnov
d4a91e6534 pthread_frame: do not run hwaccel decoding asynchronously unless it's safe
Certain hardware decoding APIs are not guaranteed to be thread-safe, so
having the user access decoded hardware surfaces while the decoder is
running in another thread can cause failures (this is mainly known to
happen with DXVA2).

For such hwaccels, only allow the decoding thread to run while the user
is inside a lavc decode call (avcodec_send_packet/receive_frame).
2016-12-19 08:10:22 +01:00
Anton Khirnov
8dfba25ce8 pthread_frame: ensure the threads don't run simultaneously with hwaccel 2016-12-19 08:09:19 +01:00
Anton Khirnov
373fd76b4d hevcdec: do not set decoder-global SPS prematurely
It should only be set after the decoder state has been fully initialized
for using that SPS.
Fixes possible invalid reads on get_format() failure.

CC: libav-stable@libav.org
2016-12-19 08:07:15 +01:00
Janne Grunau
2425d7329f arm64: replace 'bic' with immediate with 'and' with inverted immediate
The former is not an official pseudo-instruction, although gas and LLVM's
internal assembler support it. Fixes a build error with Xcode 6.2,
reported by Memphiz on GitHub.
2016-12-14 21:53:05 +01:00
Diego Biurrun
ea7ee4b4e3 ppc: Centralize compiler-specific altivec.h #include handling in one place
Also move #includes into canonical order where appropriate.
2016-12-14 14:08:43 +01:00
Diego Biurrun
39929e55eb ppc: hevcdsp: Use shorthands for vector types
This is more consistent and fixes compilation with clang.
2016-12-14 14:08:43 +01:00
Diego Biurrun
554e55bbf0 decode.h: Add missing headers to fix standalone compilation 2016-12-14 14:08:43 +01:00
Wan-Teh Chang
343e283399 pthread_frame: use better memory orders for frame progress
This improves commit 59c7022740.

In ff_thread_report_progress(), the fast code path can load
progress[field] with the relaxed memory order, and the slow code path
can store progress[field] with the release memory order. These changes
are mainly intended to avoid confusion when one inspects the source code.
They are unlikely to yield a measurable performance improvement.

ff_thread_report_progress() and ff_thread_await_progress() form a pair.
ff_thread_await_progress() reads progress[field] with the acquire memory
order (in the fast code path). Therefore, one expects to see
ff_thread_report_progress() write progress[field] with the matching
release memory order.

In the fast code path in ff_thread_report_progress(), the atomic load of
progress[field] doesn't need the acquire memory order because the
calling thread is trying to make the data it just decoded visible to the
other threads, rather than trying to read the data decoded by other
threads.

In ff_thread_get_buffer(), initialize progress[0] and progress[1] using
atomic_init().

Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-12-14 11:16:51 +01:00
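A simplified sketch of the report/await pair described above, using C11 atomics with a mutex and condition variable for the slow path. It is modeled loosely on ff_thread_report_progress()/ff_thread_await_progress(), but the `Progress` struct and the function names are hypothetical, and it assumes a single thread reports progress for a given frame.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct Progress {
    atomic_int      value;  /* highest row (or unit) reported so far */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} Progress;

static void progress_init(Progress *p)
{
    atomic_init(&p->value, -1);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
}

static void report_progress(Progress *p, int n)
{
    /* Fast path: the reporter only needs to see its own earlier store,
     * so a relaxed load is enough. */
    if (atomic_load_explicit(&p->value, memory_order_relaxed) >= n)
        return;
    pthread_mutex_lock(&p->lock);
    /* Release: the decoded data becomes visible to any thread whose
     * acquire load observes this value. */
    atomic_store_explicit(&p->value, n, memory_order_release);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

static void await_progress(Progress *p, int n)
{
    /* Fast path: acquire pairs with the release store above. */
    if (atomic_load_explicit(&p->value, memory_order_acquire) >= n)
        return;
    pthread_mutex_lock(&p->lock);
    /* Under the mutex, the lock/unlock pair already provides the ordering. */
    while (atomic_load_explicit(&p->value, memory_order_relaxed) < n)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}
```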
Derek Buitenhuis
5c7f2cf81d h264_slice: Wait for refs to be available before we use them in error concealment
This could happen when there was a frame number gap and frame threading was used.

Debugging-by: Ronald S. Bultje <rsbultje@gmail.com>
Debugging-by: Justin Ruggles <justin.ruggles@gmail.com>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-12-14 10:38:15 +01:00
Anton Khirnov
86157e6db2 hevc: decouple calling get_format() from exporting the SPS parameters
This makes sure ff_get_format() does not get called unnecessarily from
update_thread_context().
2016-12-14 09:06:45 +01:00
Anton Khirnov
730c023260 binkaudio: switch to the new send/receive API
It is more natural for this codec and avoids awkward constructs
like "consuming 0 bytes from input". Also, keep a reference to the input
packet to avoid unnecessary copying.
2016-12-14 09:06:45 +01:00
Anton Khirnov
fa1749dd34 vp9: split superframes in the filtering stage before actual decoding
Significantly increases the efficiency of frame threading, since
individual frames in a superframe can now be decoded in parallel.
2016-12-14 09:06:45 +01:00
Anton Khirnov
03a80925ef lavc: add a bitstream filter for splitting VP9 superframes
Partially based on code by Ronald S. Bultje <rsbultje@gmail.com>.
2016-12-14 09:06:45 +01:00
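A sketch of the index parsing such a filter needs. A VP9 superframe ends with an index consisting of a marker byte (top three bits `110`, frame count minus one in bits 0-2, bytes per size minus one in bits 3-4), the little-endian frame sizes, and the same marker byte again. `vp9_superframe_sizes()` is a hypothetical helper, not the lavc filter's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of frames (1..8) and stores their sizes, or 0 if the
 * packet carries no valid superframe index (i.e. a single frame). */
static int vp9_superframe_sizes(const uint8_t *buf, size_t size, size_t sizes[8])
{
    if (size == 0)
        return 0;
    uint8_t marker = buf[size - 1];
    if ((marker & 0xe0) != 0xc0)              /* top three bits must be 110 */
        return 0;
    int nb_frames = (marker & 0x7) + 1;
    int mag       = ((marker >> 3) & 0x3) + 1; /* bytes per frame size */
    size_t idx_size = 2 + (size_t)mag * nb_frames;
    if (size < idx_size || buf[size - idx_size] != marker)
        return 0;                             /* index must start with the same marker */

    const uint8_t *p = buf + size - idx_size + 1;
    size_t total = 0;
    for (int i = 0; i < nb_frames; i++) {
        size_t sz = 0;
        for (int j = 0; j < mag; j++)         /* little-endian size */
            sz |= (size_t)*p++ << (8 * j);
        sizes[i] = sz;
        total += sz;
    }
    if (total > size - idx_size)              /* frames must fit before the index */
        return 0;
    return nb_frames;
}
```

Each frame could then be emitted as its own packet, which is what lets frame threading decode them in parallel.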
Anton Khirnov
8fb4210ad8 qsvdec_h2645: switch to the new generic filtering mechanism
Drop the internal manual conversion from the MP4 format to Annex B.
2016-12-14 09:06:45 +01:00
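For context, a minimal sketch of the kind of MP4-to-Annex B conversion the decoder no longer does by hand, assuming 4-byte NAL length prefixes. `mp4_to_annexb()` is hypothetical; real converters also handle other prefix sizes and insert parameter sets from extradata.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replace each 4-byte big-endian NAL length with a 00 00 00 01 start code.
 * Returns the output size, or 0 for empty/malformed input or too little space. */
static size_t mp4_to_annexb(const uint8_t *in, size_t in_size,
                            uint8_t *out, size_t out_cap)
{
    static const uint8_t start_code[4] = { 0, 0, 0, 1 };
    size_t ip = 0, op = 0;
    while (ip < in_size) {
        if (in_size - ip < 4)
            return 0;
        uint32_t nal_size = (uint32_t)in[ip] << 24 | (uint32_t)in[ip + 1] << 16 |
                            (uint32_t)in[ip + 2] << 8 | in[ip + 3];
        ip += 4;
        if (nal_size > in_size - ip || out_cap - op < 4 + (size_t)nal_size)
            return 0;
        memcpy(out + op, start_code, 4);
        memcpy(out + op + 4, in + ip, nal_size);
        op += 4 + nal_size;
        ip += nal_size;
    }
    return op;
}
```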
Anton Khirnov
972c71e9cb lavc: add support for filtering packets before decoding 2016-12-14 09:06:45 +01:00
Anton Khirnov
061a0c14bb decode: restructure the core decoding code
Currently, the new decoding API is pretty much just a wrapper around the
old deprecated one. This is problematic, since it interferes with making
full use of the flexibility added by the new API. The old API should
also be removed at some future point.

Reorganize the code so that the new send_packet/receive_frame functions
call the actual decoding directly and change the old deprecated
avcodec_decode_* functions into wrappers around the new API.

The new internal API for decoders is changing as well. Before this
commit, it mirrored the public API, so the decoders needed to implement
send_packet() and receive_frame() callbacks. This turned out to require
awkward constructs in both the decoders and the generic code. After this
commit, the decoders only implement the receive_frame() callback and
call a new internal function, ff_decode_get_packet(), to obtain input
data, in the same manner as the bitstream filters now work.

avcodec will now always make a reference to the input packet, which means
that non-refcounted input packets will be copied. Keeping the previous
behaviour, where this copy could sometimes be avoided, would make the
code significantly more complex and fragile for only dubious gains,
since packets are typically small and everyone who cares about
performance should use refcounted packets anyway.
2016-12-14 09:06:44 +01:00
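A toy model of the pull-based shape described above. The generic layer queues packets, and the decoder implements only receive_frame() and pulls input through get_packet(), so a packet that yields several frames needs no "consume 0 bytes" trick. Every name here is hypothetical: `get_packet()` merely stands in for ff_decode_get_packet(), and `-EAGAIN`/`TOY_EOF` mimic AVERROR(EAGAIN)/AVERROR_EOF.

```c
#include <errno.h>
#include <stddef.h>

#define QUEUE_SIZE 8
#define TOY_EOF    (-1000)

typedef struct Packet { int nb_frames; int first_value; } Packet;
typedef struct Frame  { int value; } Frame;

typedef struct Decoder {
    Packet queue[QUEUE_SIZE];  /* circular queue owned by the generic layer */
    int    head, count;
    int    draining;           /* caller has signalled end of stream */
    Packet cur;                /* packet currently being decoded */
    int    cur_pos;            /* frames already emitted from cur */
} Decoder;

/* Public side: queue a packet, or start draining when pkt is NULL. */
static int send_packet(Decoder *d, const Packet *pkt)
{
    if (!pkt) {
        d->draining = 1;
        return 0;
    }
    if (d->count == QUEUE_SIZE)
        return -EAGAIN;
    d->queue[(d->head + d->count++) % QUEUE_SIZE] = *pkt;
    return 0;
}

/* Internal side: the decoder pulls its next input packet. */
static int get_packet(Decoder *d, Packet *pkt)
{
    if (d->count == 0)
        return d->draining ? TOY_EOF : -EAGAIN;
    *pkt = d->queue[d->head];
    d->head = (d->head + 1) % QUEUE_SIZE;
    d->count--;
    return 0;
}

/* The only callback the decoder implements: emit one frame per call,
 * fetching a new packet when the current one is exhausted. */
static int receive_frame(Decoder *d, Frame *frame)
{
    while (d->cur_pos >= d->cur.nb_frames) {
        int ret = get_packet(d, &d->cur);
        if (ret < 0)
            return ret;
        d->cur_pos = 0;
    }
    frame->value = d->cur.first_value + d->cur_pos++;
    return 0;
}
```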
Anton Khirnov
549d0bdca5 decode: be more explicit about storing the last packet properties
The current code stores a pointer to the packet passed to the decoder,
which is then used during get_buffer() for timestamps and side data
passthrough. However, since this is a pointer to user data which we do
not own, storing it is potentially dangerous. It is also ill-defined for
the new decoding API with split input/output.

Fix this problem by making an explicit internally owned copy of the
packet properties.
2016-12-14 09:06:44 +01:00
Anton Khirnov
47e547b321 lavc: add a null bitstream filter
It is useful for testing/debugging and will also be used as the default
filter in the following commit adding pre-decode filtering to avoid
having a separate non-filtered codepath.
2016-12-14 09:06:44 +01:00
Anton Khirnov
0309ddcfb2 lavc: handle MP3 in get_audio_frame_duration() 2016-12-14 09:06:44 +01:00
Diego Biurrun
6aa4ba7131 dxva2: Keep code shared between dxva2 and d3d11va under the correct #if
This partially reverts commit ac648bb835.
2016-12-12 13:44:25 +01:00
Alexandra Hajkova
b0e6b3f477 hevc: ppc: Add HEVC 4x4 IDCT for PowerPC
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-12-12 09:25:16 +01:00
Diego Biurrun
a6901b9c6b Drop libxvid rate control support for mpegvideo encoding
The feature has outlived its usefulness and complicates the code.
2016-12-11 09:27:40 +01:00
Diego Biurrun
ac648bb835 dxva2: Simplify some ifdefs 2016-12-11 09:27:40 +01:00
Mark Thompson
7d81698b89 vaapi_h265: Fix CFR mode with framerate set in AVCodecContext
Same issue as 17a0f9481c.
2016-12-10 16:55:44 +00:00
Diego Biurrun
932cc6496e vdpau: Do not #include vdpau_x11.h from the main vdpau header
That header should only be included in the special bits that use X11 code.
2016-12-09 08:41:53 +01:00
Diego Biurrun
92e6b31c3b dxva2: Adjust multiple inclusion guard names to follow convention 2016-12-09 08:41:52 +01:00
Andreas Cadhalpun
fc85646ad4 libopusdec: fix out-of-bounds read
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-12-08 15:53:58 -05:00
Andreas Cadhalpun
dc2ad09493 libschroedingerdec: fix leaking of framewithpts
Also preserve the return value from ff_get_buffer().

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Andreas Cadhalpun
8c3a643808 libschroedingerdec: don't produce empty frames
They are not valid and can cause problems/crashes for API users.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-12-08 15:53:58 -05:00
Timothy Gu
d3da8a0035 omx: Fix allocation check
Also use av_mallocz_array().

Bug-Id: CID 1396839
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Timothy Gu
d32bdadda8 qsvdec: Fix memory leak on error
Bug-Id: CID 1396851
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Diego Biurrun
d5759701a8 libkvazaar: Add missing header #includes
This fixes compilation after the next version bump.
2016-12-08 21:34:30 +01:00
Diego Biurrun
fbec58daa2 build: Add an internal component for hevc_ps code
This allows expressing dependencies in a more correct way.
2016-12-08 20:12:23 +01:00
Vittorio Giovara
2fb6acd9c2 lavc: Add spherical packet side data API
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-07 14:34:34 -05:00
Diego Biurrun
624aa8ab22 build: Add missing Makefile entries and ifdefs for QSV hwaccels 2016-12-07 15:46:57 +01:00
Diego Biurrun
e1dc5358af build: Create a component for MPEG audio header decoding
Fixes standalone compilation of the libmp3lame encoder.
2016-12-05 16:13:05 +01:00
Diego Biurrun
0fdc9f81a0 build: Add missing hevc_ps dependency for QSV HEVC encoder 2016-12-05 16:13:04 +01:00
Alexandra Hájková
6c916192f3 mimic: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
cdc6727c3e metasound: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
6fad5abcad lagarith: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
c3defda0d8 indeo: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
f5b7bd2a7c imc: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
39ecf0588f webp: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
James Almer
33a2b73b98 mpeg4audio: correctly propagate meaningful error values
Signed-off-by: James Almer <jamrial@gmail.com>
2016-12-02 12:16:30 -05:00
Wan-Teh Chang
d82d5379ca mmaldec: initialize refcount using atomic_init()
This is how we initialize refcount in libavutil/buffer.c.

Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 12:16:26 -05:00
Vittorio Giovara
5168026a05 options_table: Do not rely on enum size as option bound
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 11:36:46 -05:00
Vittorio Giovara
ff9db5cfd1 lavc: Use a stricter check for the color properties values
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 11:36:42 -05:00
Diego Biurrun
0a35f128f3 cabac: x86: Give optimizations header a more meaningful name 2016-12-01 08:23:54 +01:00
Martin Storsjö
cad42fadcd aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8

I.e. in general a very minor overhead for the full subpartition case due
to the additional cmps, but a significant speedup for the cases when we
only need to process a small part of the actual input data.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:57:05 +02:00
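A scalar sketch of the technique, assuming a generic separable transform (`out = M * C * M^T`, computed as a row pass then a column pass). Because each 1D pass is linear, a slice of all-zero input rows yields all-zero output, so the first pass can skip it without changing the result. The matrix `m`, the 4-row slice height, and scanning the coefficients (the assembly appears to decide from eob instead, hence its extra cmps) are simplifications for illustration.

```c
#include <string.h>

#define N     16  /* block size */
#define SLICE 4   /* rows handled together in the first pass */

/* Returns how many first-pass slices were actually transformed. */
static int idct2d(int m[N][N], int coef[N][N], int out[N][N], int skip_empty)
{
    int tmp[N][N];
    int slices_done = 0;

    /* First pass: 1D transform of each coefficient row, tmp = C * M^T. */
    for (int s = 0; s < N; s += SLICE) {
        int nonzero = 0;
        for (int r = s; r < s + SLICE; r++)
            for (int c = 0; c < N; c++)
                nonzero |= coef[r][c] != 0;

        if (skip_empty && !nonzero) {
            /* The transform of all-zero rows is all zeros, so skipping is exact. */
            memset(tmp[s], 0, sizeof(tmp[0]) * SLICE);
            continue;
        }
        slices_done++;
        for (int r = s; r < s + SLICE; r++)
            for (int k = 0; k < N; k++) {
                int acc = 0;
                for (int j = 0; j < N; j++)
                    acc += m[k][j] * coef[r][j];
                tmp[r][k] = acc;
            }
    }

    /* Second pass: 1D transform of each column, out = M * tmp. */
    for (int c = 0; c < N; c++)
        for (int k = 0; k < N; k++) {
            int acc = 0;
            for (int j = 0; j < N; j++)
                acc += m[k][j] * tmp[j][c];
            out[k][c] = acc;
        }
    return slices_done;
}
```

The second pass cannot be skipped this way, since every column generally receives contributions from the nonzero rows.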
Martin Storsjö
9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:54:07 +02:00
Martin Storsjö
3c87039a40 arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination
This avoids reloading them if they haven't been clobbered, if the
first pass also was idct.

This is similar to what was done in the aarch64 version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:53:52 +02:00
Clément Bœsch
c4c5f5386c vp9dsp: add DC only versions for idct/idct.
before:

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m11.125s
user    0m11.059s
sys     0m0.050s

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m10.944s
user    0m10.819s
sys     0m0.064s

after:

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m8.153s
user    0m8.034s
sys     0m0.050s

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m8.038s
user    0m7.980s
sys     0m0.039s

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:48:28 +02:00
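A sketch of why a DC-only path is so much cheaper. When only coefficient [0][0] is nonzero, both 1D passes reduce to one multiply by 11585 / 2^14 (cos(pi/4) in 14-bit fixed point), so every output pixel receives the same offset. The rounding and final shifts below follow our understanding of VP9's fixed-point IDCT and should be treated as an assumption; `idct_dc_add()` is not the vp9dsp function.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* DC-only inverse transform + add for an sz x sz block. bits is the final
 * rounding shift (assumed: 4 for 4x4, 5 for 8x8, 6 for 16x16 and 32x32). */
static void idct_dc_add(uint8_t *dst, ptrdiff_t stride, int16_t *block,
                        int sz, int bits)
{
    /* Two 1D passes, each a multiply by 11585 with 14-bit rounding. */
    int t  = (((block[0] * 11585 + (1 << 13)) >> 14) * 11585 + (1 << 13)) >> 14;
    int dc = (t + (1 << (bits - 1))) >> bits;
    block[0] = 0;  /* leave the coefficient buffer cleared for the next block */
    for (int y = 0; y < sz; y++, dst += stride)
        for (int x = 0; x < sz; x++)
            dst[x] = clip_uint8(dst[x] + dc);
}
```

Compared with a full 2D transform, this replaces two passes of butterflies with one constant and a clipped add per pixel.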
Diego Biurrun
e4382a4ab4 hevc: Eliminate pointless variable indirection 2016-11-30 14:11:44 +01:00
Diego Biurrun
5c89022542 hevc: Drop pointless av_unused attribute 2016-11-30 14:11:43 +01:00
Diego Biurrun
0983f9117f metasound: Drop unused tables 2016-11-30 13:44:05 +01:00
Diego Biurrun
212c6a1d70 mjpegdec: Check return values of functions that may fail 2016-11-29 13:13:35 +01:00
Diego Biurrun
3ee5f25d37 dxva2: Adjust printf length modifiers where appropriate 2016-11-29 13:13:35 +01:00
Anton Khirnov
3fe2a01df7 lavc: move decoding-related code from utils.c to a new file 2016-11-29 10:39:20 +01:00
Anton Khirnov
328cd2b599 lavc: move encoding-related code from utils.c to a new file 2016-11-29 10:39:20 +01:00