1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Commit Graph

21503 Commits

Author SHA1 Message Date
Mark Thompson
89725a8512 vaapi_h264: Scale log2_max_pic_order_cnt_lsb with max_b_frames
Before this change, it was possible to overflow pic_order_cnt_lsb and
generate a stream with invalid POC numbering.  This makes sure that
the field is large enough that a single IDR B* P sequence uses fewer
than half the available POC lsb values.
2017-01-11 23:03:58 +00:00
Mark Thompson
a3c3a5eac2 vaapi_encode: Support forcing IDR frames via AVFrame.pict_type 2017-01-11 23:03:58 +00:00
Mark Thompson
37fab0661a vaapi_encode: Fix GOP sizing
This change makes the configured GOP size be respected exactly -
previously the value could be exceeded slightly due to flaws in the
frame type selection logic.
2017-01-11 23:03:58 +00:00
Alexandra Hájková
bd6496fa07 interplayvideo: Convert to the new bitstream reader 2017-01-09 15:21:47 +01:00
Alexandra Hájková
4e25051031 adx: Convert to the new bitstream reader 2017-01-09 15:21:47 +01:00
Alexandra Hájková
9aec009f65 dvbsubdec: Convert to the new bitstream reader 2017-01-09 15:21:47 +01:00
Alexandra Hájková
d7fe11634c motionpixels: Convert to the new bitstream reader 2017-01-09 15:18:16 +01:00
Anton Khirnov
f1af37b510 h264dec: make ff_h264_decode_init() static
It is not called from outside h264dec.c anymore.
2017-01-09 13:21:13 +01:00
Anton Khirnov
e7de05f98f h264dec: drop a redundant check
Cropping parameters are already checked for validity during SPS parsing,
no need to check them again.
2017-01-09 13:21:13 +01:00
Steve Lhomme
2835e9a9fd hevcdec: add P010 support for D3D11VA
Given it's the same API than DVXA2 I don't know why the same output was not
enabled for both.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-01-09 10:48:54 +01:00
Steve Lhomme
0ac2d86c47 dxva2: Factorize DXVA context validity test into a single macro
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-01-08 16:41:24 +01:00
Steve Lhomme
f8a42d4f26 dxva2: Make ff_dxva2_get_surface() static and drop its name prefix
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-01-08 16:41:07 +01:00
Jun Zhao
9b1db2d338 vaapi_h264: Fix POC on IDR frames
In H.264 section 8.2.1, we have that "The bitstream shall not contain
data that result in Min(TopFieldOrderCnt, BottomFieldOrderCnt) not
equal to 0 for a coded IDR frame".  This fixes the encoder to always
conform to this - previously the POC values formed an unbroken
sequence, not resetting to zero on IDR frames.

Signed-off-by: Mark Thompson <sw@jkqxz.net>
2017-01-04 21:52:06 +00:00
Mark Thompson
d08e02d929 vaapi_h265: Fix build failure with old libva without 10-bit surfaces
10-bit surface support was added in libva 1.6.2, earlier versions
support H.265 encoding in 8-bit only.
2017-01-04 21:49:41 +00:00
Martin Storsjö
85ad5ea72c aarch64: vp9mc: Fix a comment to refer to a register with the right name
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-03 14:16:10 +02:00
Martin Storsjö
65074791e8 aarch64: vp9dsp: Fix vertical alignment in the init file
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-03 14:15:58 +02:00
Martin Storsjö
c536e5e869 arm: vp9mc: Fix vertical alignment of operands
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-03 14:15:45 +02:00
Diego Biurrun
53618054b6 parser: Add missing #include for printing ISO C99 conversion specifiers 2016-12-25 13:22:50 +01:00
Diego Biurrun
0b77a59336 Use correct printf conversion specifiers for POSIX integer types 2016-12-23 19:30:00 +01:00
Diego Biurrun
92db508307 build: Generate pkg-config files from Make and not from configure
This moves work from the configure to the Make stage where it can
be parallelized and ensures that pkgconfig files are updated when
library versions change.

Bug-Id: 449
2016-12-22 12:30:54 +01:00
Diego Biurrun
f9edc734e0 ratecontrol: Drop xvid-rc-related struct members unused after a6901b9c6 2016-12-21 11:13:20 +01:00
Ruta Gadkari
5b26d3b789 nvenc: Update check for lookahead
By default it is -1.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-12-21 06:16:52 +01:00
Martin Storsjö
a0c443a398 aarch64: vp9itxfm: Use the offset parameter to movrel
This fixes build failures for iOS, broken since cad42fadcd.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-12-19 22:49:51 +02:00
Alexandra Hájková
fc322d6a70 tta: Convert to the new bitstream reader 2016-12-19 13:52:36 +01:00
Alexandra Hájková
00c72a1e01 mlp: Convert to the new bitstream reader 2016-12-19 13:22:29 +01:00
Alexandra Hájková
fa64aea12e unary: Convert to the new bitstream reader 2016-12-19 12:35:05 +01:00
Anton Khirnov
45286a625c h264dec: make sure to only end a field if it has been started
Calling ff_h264_field_end() when the per-field state is not properly
initialized leads to all kinds of undefined behaviour.

CC: libav-stable@libav.org
Bug-Id: 977 978 992
2016-12-19 08:15:58 +01:00
Anton Khirnov
c2fa6bb0e8 mpeg12dec: move setting first_field to mpeg_field_start()
For field picture, the first_field is set based on its previous value.
Before this commit, first_field is set when reading the picture
coding extension. However, in corrupted files there may be multiple
picture coding extension headers, so the final value of first_field that
is actually used during decoding can be wrong. That can lead to various
undefined behaviour, like predicting from a non-existing field.

Fix this problem, by setting first_field in mpeg_field_start(), which
should be called exactly once per field.

CC: libav-stable@libav.org
Bug-ID: 999
2016-12-19 08:15:49 +01:00
Anton Khirnov
e807491fc6 mpeg12dec: avoid signed overflow in bitrate calculation
CC: libav-stable@libav.org
Bug-Id: 981
Found-By: Agostino Sarubbo
2016-12-19 08:15:42 +01:00
Anton Khirnov
58405de095 mpegvideo_parser: avoid signed overflow in bitrate calculation
CC: libav-stable@libav.org
Bug-Id: 981
Found-By: Agostino Sarubbo
2016-12-19 08:15:07 +01:00
Anton Khirnov
cfa4eb4fba vaapi_decode: use the correct logging context 2016-12-19 08:13:28 +01:00
Anton Khirnov
ea8b730d8e hevcdec: add a VAAPI hwaccel
Partially based on a patch by Timo Rothenpieler <timo@rothenpieler.org>.
Additional scaling list handling fix by Jun Zhao <mypopydev@gmail.com>.
2016-12-19 08:13:08 +01:00
Anton Khirnov
d4a91e6534 pthread_frame: do not run hwaccel decoding asynchronously unless it's safe
Certain hardware decoding APIs are not guaranteed to be thread-safe, so
having the user access decoded hardware surfaces while the decoder is
running in another thread can cause failures (this is mainly known to
happen with DXVA2).

For such hwaccels, only allow the decoding thread to run while the user
is inside a lavc decode call (avcodec_send_packet/receive_frame).
2016-12-19 08:10:22 +01:00
Anton Khirnov
8dfba25ce8 pthread_frame: ensure the threads don't run simultaneously with hwaccel 2016-12-19 08:09:19 +01:00
Anton Khirnov
373fd76b4d hevcdec: do not set decoder-global SPS prematurely
It should only be set after the decoder state has been fully initialized
for using that SPS.
Fixes possible invalid reads on get_format() failure.

CC: libav-stable@libav.org
2016-12-19 08:07:15 +01:00
Janne Grunau
2425d7329f arm64: replace 'bic' with immediate with 'and' with inverted immediate
The former is not an official pseudo instruction although gas and llvm's
internal assembler support it. Fixes a build error with xcode 6.2
reported by Memphiz on github.
2016-12-14 21:53:05 +01:00
Diego Biurrun
ea7ee4b4e3 ppc: Centralize compiler-specific altivec.h #include handling in one place
Also move #includes into canonical order where appropriate.
2016-12-14 14:08:43 +01:00
Diego Biurrun
39929e55eb ppc: hevcdsp: Use shorthands for vector types
This is more consistent and fixes compilation with clang.
2016-12-14 14:08:43 +01:00
Diego Biurrun
554e55bbf0 decode.h: Add missing headers to fix standalone compilation 2016-12-14 14:08:43 +01:00
Wan-Teh Chang
343e283399 pthread_frame: use better memory orders for frame progress
This improves commit 59c7022740.

In ff_thread_report_progress(), the fast code path can load
progress[field] with the relaxed memory order, and the slow code path
can store progress[field] with the release memory order. These changes
are mainly intended to avoid confusion when one inspects the source code.
They are unlikely to have measurable performance improvement.

ff_thread_report_progress() and ff_thread_await_progress() form a pair.
ff_thread_await_progress() reads progress[field] with the acquire memory
order (in the fast code path). Therefore, one expects to see
ff_thread_report_progress() write progress[field] with the matching
release memory order.

In the fast code path in ff_thread_report_progress(), the atomic load of
progress[field] doesn't need the acquire memory order because the
calling thread is trying to make the data it just decoded visible to the
other threads, rather than trying to read the data decoded by other
threads.

In ff_thread_get_buffer(), initialize progress[0] and progress[1] using
atomic_init().

Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-12-14 11:16:51 +01:00
Derek Buitenhuis
5c7f2cf81d h264_slice: Wait for refs to be available before we use them in error concealment
This could happen when there was a frame number gap and frame threading was used.

Debugging-by: Ronald S. Bultje <rsbultje@gmail.com>
Debugging-by: Justin Ruggles <justin.ruggles@gmail.com>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>

CC:libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-12-14 10:38:15 +01:00
Anton Khirnov
86157e6db2 hevc: decouple calling get_format() from exporting the SPS parameters
This makes sure ff_get_format() does not get called unnecessarily from
update_thread_context().
2016-12-14 09:06:45 +01:00
Anton Khirnov
730c023260 binkaudio: switch to the new send/receive API
It is more natural for this codec and allows to avoid awkward constructs
like "consuming 0 bytes from input". Also, keep a reference to the input
packet to avoid unnecessary copying.
2016-12-14 09:06:45 +01:00
Anton Khirnov
fa1749dd34 vp9: split superframes in the filtering stage before actual decoding
Significantly increases the efficiency of frame threading, since
individual frames in a superframe can now be decoded in parallel.
2016-12-14 09:06:45 +01:00
Anton Khirnov
03a80925ef lavc: add a bitstream filter for splitting VP9 superframes
Partially based on code by Ronald S. Bultje <rsbultje@gmail.com>.
2016-12-14 09:06:45 +01:00
Anton Khirnov
8fb4210ad8 qsvdec_h2645: switch to the new generic filtering mechanism
Drop the internal manual conversion from the MP4 format to Annex B.
2016-12-14 09:06:45 +01:00
Anton Khirnov
972c71e9cb lavc: add support for filtering packets before decoding 2016-12-14 09:06:45 +01:00
Anton Khirnov
061a0c14bb decode: restructure the core decoding code
Currently, the new decoding API is pretty much just a wrapper around the
old deprecated one. This is problematic, since it interferes with making
full use of the flexibility added by the new API. The old API should
also be removed at some future point.

Reorganize the code so that the new send_packet/receive_frame functions
call the actual decoding directly and change the old deprecated
avcodec_decode_* functions into wrappers around the new API.

The new internal API for decoders is now changing as well. Before this
commit, it mirrors the public API, so the decoders need to implement
send_packet() and receive_frame() callbacks. This turns out to require
awkward constructs in both the decoders and the generic code. After this
commit, the decoders only implement the receive_frame() callback and
call a new internal function, ff_decode_get_packet() to obtain input
data, in the same manner to how the bitstream filters now work.

avcodec will now always make a reference to the input packet, which means
that non-refcounted input packets will be copied. Keeping the previous
behaviour, where this copy could sometimes be avoided, would make the
code significantly more complex and fragile for only dubious gains,
since packets are typically small and everyone who cares about
performance should use refcounted packets anyway.
2016-12-14 09:06:44 +01:00
Anton Khirnov
549d0bdca5 decode: be more explicit about storing the last packet properties
The current code stores a pointer to the packet passed to the decoder,
which is then used during get_buffer() for timestamps and side data
passthrough. However, since this is a pointer to user data which we do
not own, storing it is potentially dangerous. It is also ill defined for
the new decoding API with split input/output.

Fix this problem by making an explicit internally owned copy of the
packet properties.
2016-12-14 09:06:44 +01:00
Anton Khirnov
47e547b321 lavc: add a null bitstream filter
It is useful for testing/debugging and will also be used as the default
filter in the following commit adding pre-decode filtering to avoid
having a separate non-filtered codepath.
2016-12-14 09:06:44 +01:00
Anton Khirnov
0309ddcfb2 lavc: handle MP3 in get_audio_frame_duration() 2016-12-14 09:06:44 +01:00
Diego Biurrun
6aa4ba7131 dxva2: Keep code shared between dxva2 and d3d11va under the correct #if
This partially reverts commit ac648bb835.
2016-12-12 13:44:25 +01:00
Alexandra Hajkova
b0e6b3f477 hevc: ppc: Add HEVC 4x4 IDCT for PowerPC
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-12-12 09:25:16 +01:00
Diego Biurrun
a6901b9c6b Drop libxvid rate control support for mpegvideo encoding
The feature has outlived is usefulness and complicates the code.
2016-12-11 09:27:40 +01:00
Diego Biurrun
ac648bb835 dxva2: Simplify some ifdefs 2016-12-11 09:27:40 +01:00
Mark Thompson
7d81698b89 vaapi_h265: Fix CFR mode with framerate set in AVCodecContext
Same issue as 17a0f9481c.
2016-12-10 16:55:44 +00:00
Diego Biurrun
932cc6496e vdpau: Do not #include vdpau_x11.h from the main vdpau header
That header should only be included in the special bits that use X11 code.
2016-12-09 08:41:53 +01:00
Diego Biurrun
92e6b31c3b dxva2: Adjust multiple inclusion guard names to follow convention 2016-12-09 08:41:52 +01:00
Andreas Cadhalpun
fc85646ad4 libopusdec: fix out-of-bounds read
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-12-08 15:53:58 -05:00
Andreas Cadhalpun
dc2ad09493 libschroedingerdec: fix leaking of framewithpts
Also preserve the return value from ff_get_buffer().

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Andreas Cadhalpun
8c3a643808 libschroedingerdec: don't produce empty frames
They are not valid and can cause problems/crashes for API users.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-12-08 15:53:58 -05:00
Timothy Gu
d3da8a0035 omx: Fix allocation check
Also use av_mallocz_array().

Bug-Id: CID 1396839
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Timothy Gu
d32bdadda8 qsvdec: Fix memory leak on error
Bug-Id: CID 1396851
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-08 15:53:58 -05:00
Diego Biurrun
d5759701a8 libkvazaar: Add missing header #includes
This fixes compilation after the next version bump.
2016-12-08 21:34:30 +01:00
Diego Biurrun
fbec58daa2 build: Add an internal component for hevc_ps code
This allows expressing dependencies in a more correct way.
2016-12-08 20:12:23 +01:00
Vittorio Giovara
2fb6acd9c2 lavc: Add spherical packet side data API
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-07 14:34:34 -05:00
Diego Biurrun
624aa8ab22 build: Add missing Makefile entries and ifdefs for QSV hwaccels 2016-12-07 15:46:57 +01:00
Diego Biurrun
e1dc5358af build: Create a component for MPEG audio header decoding
Fixes standalone compilation of the libmp3lame encoder.
2016-12-05 16:13:05 +01:00
Diego Biurrun
0fdc9f81a0 build: Add missing hevc_ps dependency for QSV HEVC encoder 2016-12-05 16:13:04 +01:00
Alexandra Hájková
6c916192f3 mimic: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
cdc6727c3e metasound: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
6fad5abcad lagarith: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
c3defda0d8 indeo: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
f5b7bd2a7c imc: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
Alexandra Hájková
39ecf0588f webp: Convert to the new bitstream reader 2016-12-03 14:36:03 +01:00
James Almer
33a2b73b98 mpeg4audio: correctly propagate meaningful error values
Signed-off-by: James Almer <jamrial@gmail.com>
2016-12-02 12:16:30 -05:00
Wan-Teh Chang
d82d5379ca mmaldec: initialize refcount using atomic_init()
This is how we initialize refcount in libavutil/buffer.c.

Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 12:16:26 -05:00
Vittorio Giovara
5168026a05 options_table: Do not rely on enum size as option bound
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 11:36:46 -05:00
Vittorio Giovara
ff9db5cfd1 lavc: Use a stricter check for the color properties values
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-12-02 11:36:42 -05:00
Diego Biurrun
0a35f128f3 cabac: x86: Give optimizations header a more meaningful name 2016-12-01 08:23:54 +01:00
Martin Storsjö
cad42fadcd aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8

I.e. in general a very minor overhead for the full subpartition case due
to the additional cmps, but a significant speedup for the cases when we
only need to process a small part of the actual input data.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:57:05 +02:00
Martin Storsjö
9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:54:07 +02:00
Martin Storsjö
3c87039a40 arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination
This avoids reloading them if they haven't been clobbered, if the
first pass also was idct.

This is similar to what was done in the aarch64 version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:53:52 +02:00
Clément Bœsch
c4c5f5386c vp9dsp: add DC only versions for idct/idct.
before:

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m11.125s
user    0m11.059s
sys     0m0.050s

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m10.944s
user    0m10.819s
sys     0m0.064s

after:

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m8.153s
user    0m8.034s
sys     0m0.050s

time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
real    0m8.038s
user    0m7.980s
sys     0m0.039s

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:48:28 +02:00
Diego Biurrun
e4382a4ab4 hevc: Eliminate pointless variable indirection 2016-11-30 14:11:44 +01:00
Diego Biurrun
5c89022542 hevc: Drop pointless av_unused attribute 2016-11-30 14:11:43 +01:00
Diego Biurrun
0983f9117f metasound: Drop unused tables 2016-11-30 13:44:05 +01:00
Diego Biurrun
212c6a1d70 mjpegdec: Check return values of functions that may fail 2016-11-29 13:13:35 +01:00
Diego Biurrun
3ee5f25d37 dxva2: Adjust printf length modifiers where appropriate 2016-11-29 13:13:35 +01:00
Anton Khirnov
3fe2a01df7 lavc: move decoding-related code from utils.c to a new file 2016-11-29 10:39:20 +01:00
Anton Khirnov
328cd2b599 lavc: move encoding-related code from utils.c to a new file 2016-11-29 10:39:20 +01:00
James Almer
45d199d5b0 aac_adtstoasc_bsf: validate and forward extradata if the stream is already ASC
Fixes AAC AudioSpecificConfig passthrough.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-29 10:39:20 +01:00
Andreas Cadhalpun
1762a39e09 mss2: only use error correction for matching block counts
This fixes a heap-buffer-overflow in ff_er_frame_end when decoding mss2
with coded_width/coded_height larger than width/height.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-11-29 10:38:01 +01:00
Diego Biurrun
eb135516e6 ac3enc: Avoid unnecessary macro indirections 2016-11-28 17:19:30 +01:00
Diego Biurrun
f0d3e43bd7 ac3enc: Reshuffle functions to avoid forward declarations 2016-11-28 17:19:30 +01:00
Diego Biurrun
e22c63ac74 ac3enc: Reshuffle some float/fixed-mode ifdefs to avoid a dummy function 2016-11-28 17:19:30 +01:00
Anton Khirnov
4adbb44ad1 tta: avoid undefined shifts
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-11-25 21:42:33 +01:00
Anton Khirnov
dc4b625028 tta: use get_unary() instead of a custom implementation
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-11-25 21:42:33 +01:00
Martin Storsjö
2f99117f6f aarch64: vp9itxfm: Don't repeatedly set x9 when nothing overwrites it
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-24 13:39:21 +02:00
Alexandra Hájková
178b4ea5f9 xsubdec: Convert to the new bitstream reader 2016-11-24 11:22:13 +01:00
Alexandra Hájková
be35ef92a4 xan: Convert to the new bitstream reader 2016-11-24 11:22:13 +01:00
Alexandra Hájková
f9c59f26c8 wnv1: Convert to the new bitstream reader 2016-11-24 11:22:13 +01:00
Alexandra Hájková
0536e7d782 vima: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
e5bdfc6790 vble: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
104a4289f9 utvideodec: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
85f760fedd twinvq: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
0bea79afa6 tscc2: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
8e4cadea5d truespeech: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
0ac07d0b8d tiertex: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
9ab1a3e283 truemotion2: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
9f78e3a46d svq1dec: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
6efbc88a5c smacker: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
087bc8d704 sipr: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
f26cbb555b rtjpeg: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
c60cda7cb4 ra288: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
7d8075cf47 ra144: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Martin Storsjö
79566ec8c7 arm: vp9itxfm: Rename a macro parameter to fit better
Since the same parameter is used for both input and output,
the name inout is more fitting.

This matches the naming used below in the dmbutterfly macro.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:56:56 +02:00
Martin Storsjö
721bc37522 arm/aarch64: vp9itxfm: Fix indentation of macro arguments
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:56:16 +02:00
James Almer
aa498c3183 avpacket: fix leak on realloc in av_packet_add_side_data()
If realloc fails, the pointer is overwritten and the previously allocated buffer
is leaked, which goes against the expected functionality of keeping the packet
unchanged in case of error.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-23 13:17:52 +01:00
Andreas Cadhalpun
f92d7bdfdd libopusdec: default to stereo for invalid number of channels
This fixes an out-of-bounds read if avc->channels is 0.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-23 13:03:15 +01:00
Diego Biurrun
b34c6cd57a dvbsub: cosmetics: Group all debug code together 2016-11-23 07:40:46 +01:00
Diego Biurrun
b8cd7a3c8d dvbsub: Check for errors from system()
libavcodec/dvbsubdec.c:145:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]
libavcodec/dvbsubdec.c:148:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]
2016-11-23 07:36:32 +01:00
Diego Biurrun
6427379f23 als: Restructure DEBUG ifdefs to avoid unused function parameter warnings 2016-11-22 17:28:17 +01:00
Diego Biurrun
367f95af55 ac3enc: Restructure DEBUG ifdefs to avoid unused function parameter warnings 2016-11-22 17:28:17 +01:00
Diego Biurrun
81a3c42abe Drop some bogus Doxygen documentation. 2016-11-21 14:29:11 +01:00
Diego Biurrun
a1d9de304f Fix some mismatches between function parameter and doxygen parameter names. 2016-11-21 14:29:10 +01:00
Martin Storsjö
4d960a1185 aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter
The clobbering tests in checkasm are only invoked when testing
correctness, so this bug didn't show up when benchmarking the
dc-only version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-18 23:17:33 +02:00
Janne Grunau
e5b0fc170f arm: vp9itxfm: Simplify the stack alignment code
This is one instruction less for thumb, and only have got
1/2 arm/thumb specific instructions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-18 23:17:26 +02:00
Alexandra Hájková
0b5a26e8bc qdm2: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:22 +01:00
Alexandra Hájková
0dabd329e8 qcelp: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:18 +01:00
Alexandra Hájková
770406d1e8 pcx: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:14 +01:00
Alexandra Hájková
b3441350fa opus: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:11 +01:00
Alexandra Hájková
6f94a64bd6 nellymoser: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:08 +01:00
Alexandra Hájková
15d4dbfd4a jvdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:04 +01:00
Alexandra Hájková
1df549bfa2 hqx: Convert to the new bitstream header
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:43 +01:00
Alexandra Hájková
c5e01d9170 hq_hqa: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:39 +01:00
Alexandra Hájková
b2c56301f9 gsm: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:36 +01:00
Alexandra Hájková
2188d53906 g72x: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:33 +01:00
Alexandra Hájková
799703c3ea g2meet: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:26 +01:00
Alexandra Hájková
b37b681f77 fraps: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:14 +01:00
Alexandra Hájková
692ba4fe64 flashsv: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:10 +01:00
Alexandra Hájková
418ccdd703 faxcompr: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:07 +01:00
Alexandra Hájková
8df1ac6b78 exr: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:04 +01:00
Alexandra Hájková
2906d8dcb3 escape130: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:01 +01:00
Alexandra Hájková
c43eb73172 escape124: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:57 +01:00
Alexandra Hájková
d8618570be dvdsubdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:53 +01:00
Alexandra Hájková
928f8c7ce3 dss_sp: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:38 +01:00
Alexandra Hájková
942e84d2a3 cook: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:32 +01:00
Alexandra Hájková
e561146611 cljrdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:29 +01:00
Alexandra Hájková
b4c0daa83c cdxl: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:24 +01:00
Alexandra Hájková
0977a7c2f6 binkaudio: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:15 +01:00
Alexandra Hájková
9a23b59943 bink: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:10 +01:00
Alexandra Hájková
dae9b0b9c6 avs: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:04 +01:00
Alexandra Hájková
edd4c19a78 atrac3plus: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:59 +01:00
Alexandra Hájková
0272119202 atrac: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:50 +01:00
Alexandra Hájková
41679be1a2 asvdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:45 +01:00
Alexandra Hájková
012c451153 adpcm: Convert to the new bitstream header
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:01 +01:00
Alexandra Hájková
ed006ae4e2 4xm: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:57 +01:00
Alexandra Hájková
b25180801b on2avc: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:54 +01:00
Alexandra Hájková
7d957b3f47 ea: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:45 +01:00
Alexandra Hájková
adb1ebb36c eamad: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:40 +01:00
Alexandra Hájková
d182d8a6d3 cllc: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:31:59 +01:00
Alexandra Hájková
dd3d7ddf2a lavc: add a new bitstream reader to replace get_bits
The new bit reader features a simpler API and an implementation without
stacks of nested macros.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:31:56 +01:00
Luca Barbato
adb0e941c3 avpacket: Mark src pointer as constant 2016-11-17 19:41:12 +01:00
Diego Biurrun
76167140a9 qsvdec: Drop stray extra braces around initializer
libavcodec/qsvdec.c:93:5: warning: braces around scalar initializer
2016-11-17 16:53:48 +01:00
Diego Biurrun
715b824346 qsv: Drop some unused variables 2016-11-17 16:53:48 +01:00
Janne Grunau
e7ae8f7a71 aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};
The latter is 1 cycle faster on a cortex-53 and since the operands are
bytewise (or larger) bitmask (impossible to overflow to zero) both are
equivalent.
2016-11-16 09:05:18 +01:00
Janne Grunau
d7595de0b2 aarch64: vp9: use alternative returns in the core loop filter function
Since aarch64 has enough free general purpose registers use them to
branch to the appropiate storage code. 1-2 cycles faster for the
functions using loop_filter 8/16, ... on a cortex-a53. Mixed results
(up to 2 cycles faster/slower) on a cortex-a57.
2016-11-16 09:05:18 +01:00
Gianluigi Tiesi
e17567a831 libilbc: support for latest git of libilbc
In the latest git commits of libilbc developers removed WebRtc_xxx typedefs.
This commit uses int types instead. It's safe to apply also for previous
versions since WebRtc_Word16 was always a typedef of int16_t and
WebRtc_UWord16 a typedef of uint16_t.

Reviewed-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-11-16 08:21:05 +01:00
Diego Biurrun
f7407f56cb golomb: Replace __PRETTY_FUNCTION__ with __func__ for tracing
The former is a GNU extension while the latter is C99.
2016-11-15 09:41:08 +01:00
Mark Thompson
e0b164576f qsv: Add VP8 decoder 2016-11-14 19:38:20 +00:00
Mark Thompson
182cf170a5 vp8: Return stream format information from parser 2016-11-14 19:38:19 +00:00
Mark Thompson
b6582b2927 qsv: Add VC-1 decoder
It uses the same code as the MPEG-2 decoder, so the file is renamed
to contain all "other" (that is, not H.26[45]) codecs.
2016-11-14 19:38:19 +00:00
Mark Thompson
fea4dc05b4 vc1: Return stream format information from parser 2016-11-14 19:38:19 +00:00
Mark Thompson
0940b748bd qsvdec: Only warn about unconsumed data if it happens more than once 2016-11-14 19:38:19 +00:00
Mark Thompson
030d84fa2e qsvdec: Pass field order information to libmfx
The VC-1 decoder fails to initialise if this is not set.
2016-11-14 19:38:19 +00:00
Mark Thompson
cd1047f391 qsvdec: Pass the correct profile to libmfx
This was correct for H.26[45], because libmfx uses the same values
derived from profile_idc and the constraint_set flags, but it is
wrong for other codecs.

Also avoid passing FF_LEVEL_UNKNOWN (-99) as the level, as this is
certainly invalid.
2016-11-14 19:38:19 +00:00
Mark Thompson
3297577f3e mpegvideo: Return correct coded frame sizes from parser 2016-11-14 19:38:19 +00:00
Janne Grunau
31756abe29 aarch64: vp9: loop_filter: fix typo in skip flatout8 check
The 16_16 loop filter functions could miss an early exit before
flatout8.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 08:51:58 +02:00
Martin Storsjö
9d2afd1eb8 aarch64: vp9: Implement NEON loop filters
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the loop filters with
16 pixels at a time. The implementation is fully templated, with
a single macro which can generate versions for both 8 and
16 pixels wide, for both 4, 8 and 16 pixels loop filters
(and the 4/8 mixed versions as well).

For the 8 pixel wide versions, it is pretty close in speed (the
v_4_8 and v_8_8 filters are the best examples of this; the h_4_8
and h_8_8 filters seem to get some gain in the load/transpose/store
part). For the 16 pixels wide ones, we get a speedup of around
1.2-1.4x compared to the 32 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM AArch64
vp9_loop_filter_h_4_8_neon:          144.0   127.2
vp9_loop_filter_h_8_8_neon:          207.0   182.5
vp9_loop_filter_h_16_8_neon:         415.0   328.7
vp9_loop_filter_h_16_16_neon:        672.0   558.6
vp9_loop_filter_mix2_h_44_16_neon:   302.0   203.5
vp9_loop_filter_mix2_h_48_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_84_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_88_16_neon:   376.0   305.2
vp9_loop_filter_mix2_v_44_16_neon:   193.2   128.2
vp9_loop_filter_mix2_v_48_16_neon:   246.7   218.4
vp9_loop_filter_mix2_v_84_16_neon:   248.0   218.5
vp9_loop_filter_mix2_v_88_16_neon:   302.0   218.2
vp9_loop_filter_v_4_8_neon:           89.0    88.7
vp9_loop_filter_v_8_8_neon:          141.0   137.7
vp9_loop_filter_v_16_8_neon:         295.0   272.7
vp9_loop_filter_v_16_16_neon:        546.0   453.7

The speedup vs C code in checkasm tests is around 2-7x, which is
pretty much the same as for the 32 bit version. Even if these functions
are faster than their 32 bit equivalent, the C version that we compare
to also became around 1.3-1.7x faster than the C version in 32 bit.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-5x.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                         A57 gcc-5.3  neon
loop_filter_h_4_8_neon:        256.6  93.4
loop_filter_h_8_8_neon:        307.3 139.1
loop_filter_h_16_8_neon:       340.1 254.1
loop_filter_h_16_16_neon:      827.0 407.9
loop_filter_mix2_h_44_16_neon: 524.5 155.4
loop_filter_mix2_h_48_16_neon: 644.5 173.3
loop_filter_mix2_h_84_16_neon: 630.5 222.0
loop_filter_mix2_h_88_16_neon: 697.3 222.0
loop_filter_mix2_v_44_16_neon: 598.5 100.6
loop_filter_mix2_v_48_16_neon: 651.5 127.0
loop_filter_mix2_v_84_16_neon: 591.5 167.1
loop_filter_mix2_v_88_16_neon: 855.1 166.7
loop_filter_v_4_8_neon:        271.7  65.3
loop_filter_v_8_8_neon:        312.5 106.9
loop_filter_v_16_8_neon:       473.3 206.5
loop_filter_v_16_16_neon:      976.1 327.8

The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57
is again 30-50% faster than the cortex-a53.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 00:10:13 +02:00
Martin Storsjö
52d196fb30 arm: vp9itxfm: Simplify txfm string comparisons
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 00:10:13 +02:00
Martin Storsjö
3c9546dfaf aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32 bit version.

The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM  AArch64
vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7

The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                                A57 gcc-5.3   neon
vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8

The asm is around factor 3-4 faster than C on the cortex-a57 and the asm
is around 30-50% faster on the a57 compared to the a53.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 00:10:13 +02:00
Diego Biurrun
800d91d348 Drop pointless void* casts 2016-11-13 18:44:01 +01:00
Diego Biurrun
d316f9cefc aac: Drop pointless cast 2016-11-13 18:44:00 +01:00
Diego Biurrun
3b50dbc51f ratecontrol: Use correct function pointer casts instead of void*
libavcodec/ratecontrol.c:120:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]
libavcodec/ratecontrol.c:121:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]
2016-11-12 16:47:06 +01:00
Martin Storsjö
dd299a2d6d arm: vp9: Add NEON loop filters
This work is sponsored by, and copyright, Google.

The implementation tries to have smart handling of cases
where no pixels need the full filtering for the 8/16 width
filters, skipping both calculation and writeback of the
unmodified pixels in those cases. The actual effect of this
is hard to test with checkasm though, since it tests the
full filtering, and the benefit depends on how many filtered
blocks use the shortcut.

Examples of relative speedup compared to the C version, from checkasm:
                          Cortex       A7     A8     A9    A53
vp9_loop_filter_h_4_8_neon:          2.72   2.68   1.78   3.15
vp9_loop_filter_h_8_8_neon:          2.36   2.38   1.70   2.91
vp9_loop_filter_h_16_8_neon:         1.80   1.89   1.45   2.01
vp9_loop_filter_h_16_16_neon:        2.81   2.78   2.18   3.16
vp9_loop_filter_mix2_h_44_16_neon:   2.65   2.67   1.93   3.05
vp9_loop_filter_mix2_h_48_16_neon:   2.46   2.38   1.81   2.85
vp9_loop_filter_mix2_h_84_16_neon:   2.50   2.41   1.73   2.85
vp9_loop_filter_mix2_h_88_16_neon:   2.77   2.66   1.96   3.23
vp9_loop_filter_mix2_v_44_16_neon:   4.28   4.46   3.22   5.70
vp9_loop_filter_mix2_v_48_16_neon:   3.92   4.00   3.03   5.19
vp9_loop_filter_mix2_v_84_16_neon:   3.97   4.31   2.98   5.33
vp9_loop_filter_mix2_v_88_16_neon:   3.91   4.19   3.06   5.18
vp9_loop_filter_v_4_8_neon:          4.53   4.47   3.31   6.05
vp9_loop_filter_v_8_8_neon:          3.58   3.99   2.92   5.17
vp9_loop_filter_v_16_8_neon:         3.40   3.50   2.81   4.68
vp9_loop_filter_v_16_16_neon:        4.66   4.41   3.74   6.02

The speedup vs C code is around 2-6x. The numbers are quite
inconclusive though, since the checkasm test runs multiple filterings
on top of each other, so later rounds might end up with different
codepaths (different decisions on which filter to apply, based
on input pixel differences). Disabling the early-exit in the asm
doesn't give a fair comparison either though, since the C code
only does the necessary calcuations for each row.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-9x.

This is pretty similar in runtime to the corresponding routines
in libvpx. (This is comparing vpx_lpf_vertical_16_neon,
vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon
to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon
and vp9_loop_filter_v_16_16_neon - note that the naming of horizonal
and vertical is flipped between the libraries.)

In order to have stable, comparable numbers, the early exits in both
asm versions were disabled, forcing the full filtering codepath.

                           Cortex           A7      A8      A9     A53
vp9_loop_filter_h_16_8_neon:             597.2   472.0   482.4   415.0
libvpx vpx_lpf_vertical_16_neon:         626.0   464.5   470.7   445.0
vp9_loop_filter_v_16_8_neon:             500.2   422.5   429.7   295.0
libvpx vpx_lpf_horizontal_edge_8_neon:   586.5   414.5   415.6   383.2
vp9_loop_filter_v_16_16_neon:            905.0   784.7   791.5   546.0
libvpx vpx_lpf_horizontal_edge_16_neon: 1060.2   751.7   743.5   685.2

Our version is consistently faster on on A7 and A53, marginally slower on
A8, and sometimes faster, sometimes slower on A9 (marginally slower in all
three tests in this particular test run).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-11 14:16:42 +02:00
Diego Biurrun
f7d183f084 libxvid: Check return value of write() call
libavcodec/libxvid_rc.c:106:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
2016-11-11 10:17:07 +01:00
Diego Biurrun
e5e8a26dcf libxvid: Use proper context in av_log() calls 2016-11-11 10:17:07 +01:00
Diego Biurrun
12db2832e4 libxvid: Require availability of mkstemp()
The replacement code uses tempnam(), which is dangerous.
Such a fringe feature is not worth the trouble.
2016-11-11 10:17:07 +01:00
Martin Storsjö
a67ae67083 arm: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

For the transforms up to 8x8, we can fit all the data (including
temporaries) in registers and just do a straightforward transform
of all the data. For 16x16, we do a transform of 4x16 pixels in
4 slices, using a temporary buffer. For 32x32, we transform 4x32
pixels at a time, in two steps of 4x16 pixels each.

Examples of relative speedup compared to the C version, from checkasm:
                         Cortex       A7     A8     A9    A53
vp9_inv_adst_adst_4x4_add_neon:     3.39   5.83   4.17   4.01
vp9_inv_adst_adst_8x8_add_neon:     3.79   4.86   4.23   3.98
vp9_inv_adst_adst_16x16_add_neon:   3.33   4.36   4.11   4.16
vp9_inv_dct_dct_4x4_add_neon:       4.06   6.16   4.59   4.46
vp9_inv_dct_dct_8x8_add_neon:       4.61   6.01   4.98   4.86
vp9_inv_dct_dct_16x16_add_neon:     3.35   3.44   3.36   3.79
vp9_inv_dct_dct_32x32_add_neon:     3.89   3.50   3.79   4.42
vp9_inv_wht_wht_4x4_add_neon:       3.22   5.13   3.53   3.77

Thus, the speedup vs C code is around 3-6x.

This is mostly marginally faster than the corresponding routines
in libvpx on most cores, tested with their 32x32 idct (compared to
vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's
favour since their version doesn't clear the input buffer like ours
do (although the effect of that on the total runtime probably is
negligible.)

                           Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_32x32_add_neon:    18436.8  16874.1  14235.1  11988.9
libvpx vpx_idct32x32_1024_add_neon 20789.0  13344.3  15049.9  13030.5

Only on the Cortex A8, the libvpx function is faster. On the other cores,
ours is slightly faster even though ours has got source block clearing
integrated.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-11 11:09:05 +02:00
Mark Thompson
fd0fae6037 pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts
When decoding with threads enabled, the get_format callback will be
called with one of the per-thread codec contexts rather than with the
outer context.  If a hwaccel is in use too, this will add a reference
to the hardware frames context on that codec context, which will then
propagate to all of the other per-thread contexts for decoding.  Once
the decoder finishes, however, the per-thread contexts are not freed
normally, so these references leak.
2016-11-10 20:36:11 +00:00
Martin Storsjö
11623217e3 arm: vp9mc: Use a different helper register for PIC loads
This fixes crashes since 557c1675cf in linux PIC builds.

Previously, movrelx silently used r12 as helper register, which
doesn't work when r12 is the destination register.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 14:01:04 +02:00
Martin Storsjö
6a62795d40 aarch64: h264idct: Use the offset parameter to movrel
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:18:22 +02:00
Martin Storsjö
557c1675cf arm: vp9mc: Minor adjustments from review of the aarch64 version
This work is sponsored by, and copyright, Google.

The speedup for the large horizontal filters is surprisingly
big on A7 and A53, while there's a minor slowdown (almost within
measurement noise) on A8 and A9.

                            Cortex    A7        A8        A9       A53
orig:
vp9_put_8tap_smooth_64h_neon:    20270.0   14447.3   19723.9   10910.9
new:
vp9_put_8tap_smooth_64h_neon:    20165.8   14466.5   19730.2   10668.8

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:18:22 +02:00
Martin Storsjö
383d96aa22 aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not much
to be gained from having more spare registers here - we only
avoid having to clobber callee-saved registers.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                     ARM   AArch64
vp9_avg4_neon:                      27.2      23.7
vp9_avg8_neon:                      56.5      54.7
vp9_avg16_neon:                    169.9     167.4
vp9_avg32_neon:                    585.8     585.2
vp9_avg64_neon:                   2460.3    2294.7
vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
vp9_put4_neon:                      18.0      17.2
vp9_put8_neon:                      40.2      37.7
vp9_put16_neon:                     97.4      99.5
vp9_put32_neon/armv8:              346.0     307.4
vp9_put64_neon/armv8:             1319.0    1107.5
vp9_put_8tap_smooth_4h_neon:       126.7     118.2
vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
vp9_put_8tap_smooth_4v_neon:       113.0      86.5
vp9_put_8tap_smooth_8h_neon:       229.7     221.6
vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
vp9_put_8tap_smooth_8v_neon:       215.0     187.5
vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4

These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.

The speedup vs C code is pretty much the same as for the 32 bit
case; on the A53 it's around 6-13x for ther larger 8tap filters.
The exact speedup varies a little, since the C versions generally
don't end up exactly as slow/fast as on 32 bit.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:15:56 +02:00
Martin Storsjö
a4cfcddcb0 vp9: Make the subpel filters non-static
Make them aligned, to allow efficient access to them from simd.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:05:57 +02:00
Anton Khirnov
84f225684c pthread_frame: properly propagate the hw frame context across frame threads 2016-11-10 09:00:11 +01:00
Diego Biurrun
72a19f4013 mpegaudiodsp: aarch64: Adjust function prototype after 2caa93b813 2016-11-10 00:13:48 +01:00
Diego Biurrun
67deba8a41 Use avpriv_report_missing_feature() where appropriate 2016-11-08 17:54:34 +01:00
Vittorio Giovara
47a795727f hevc: Support extradata changes from multiple stsd
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-08 11:22:29 -05:00