1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-08 13:22:53 +02:00
Commit Graph

36755 Commits

Author SHA1 Message Date
Philip Langdale
4e6d1c1f4e avcodec/vdpau_hevc: Fix potential out-of-bounds write
The maximum number of references is 16, so the index value cannot
exceed 15.

Fixes Coverity CID 1348139, 1348140, 1348141
2016-11-30 16:14:39 -08:00
Philip Langdale
5512dbe37f avcodec/crystalhd: Handle errors from av_image_get_linesize
This function can return an error in certain situations.

Fixes Coverity CID 703707.
2016-11-30 16:14:39 -08:00
James Darnley
13d71c28cc avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
Yorkfield:
 - sse2:
   - complex: 4.13x faster (1514 vs. 367 cycles)
   - simple:  4.38x faster (1836 vs. 419 cycles)

Skylake:
 - sse2:
   - complex: 3.61x faster ( 936 vs. 260 cycles)
   - simple:  3.97x faster (1126 vs. 284 cycles)
 - avx (versus sse2):
   - complex: 1.07x faster (260 vs. 244 cycles)
   - simple:  1.03x faster (284 vs. 274 cycles)
2016-11-30 22:58:28 +01:00
James Darnley
1dae7ffa0b avcodec/h264: mmx 4:2:2 idct add8 function
2.87 times faster (1830 vs. 638 cycles)
2016-11-30 22:58:27 +01:00
James Darnley
815ea8c6cc avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter
2.1 times faster (401 vs. 194 cycles)
2016-11-30 22:58:27 +01:00
Timo Rothenpieler
c2f3af57a5 avcodec/nvenc: mark intentional fall through 2016-11-30 12:36:23 +01:00
Miroslav Slugeň
f2dd6aee80 avcodec/nvenc: always reduce DAR width and height
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2016-11-30 12:36:23 +01:00
Philip Langdale
27038693bb avcodec/nvenc: Delay identification of underlying format of cuda frames
When input surfaces are cuda frames, we will not know what the actual
underlying format (nv12, p010, etc) is at surface allocation time.

On the other hand, we will know when the input frames are actually
registered and associated with a surface.

So, let's delay format discovery until registration time, which is
actually how we handle other frame properties, such as dimensions.

By itself, this change doesn't allow for transcoding of 10bit
content from cuvid, but it reduces the problem to the hardcoding of
the sw format in ffmpeg_cuvid.c

Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2016-11-30 12:36:23 +01:00
Michael Niedermayer
2475858889 avcodec/flac_parser: Update nb_headers_buffered
Fixes infinite loop
Fixes: fuzz.flac

Found-by: Frank Liberato <liberato@google.com>
Reviewed-by: Frank Liberato <liberato@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-30 00:58:56 +01:00
Paul B Mahol
d56c7830c0 avcodec/raw: add gray10 support in nut
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2016-11-29 11:23:20 +01:00
Daniel Verkamp
e856ac2373 avcodec/msrledec: implement vertical offset in 4-bit RLE
The delta escape (2) is supposed to work the same in 4-bit RLE as in
8-bit RLE.  This is documented in the MSDN Bitmap Compression page:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd183383(v=vs.85).aspx

The unchecked modification of line is safe, since the loop condition
(line >= 0) will check it before any pixel data is written.

Fixes ticket #5153 (output now matches ImageMagick for the provided sample).

Signed-off-by: Daniel Verkamp <daniel@drv.nu>
2016-11-29 10:57:49 +01:00
Alex Converse
8899057d91 libvpxenc: Report encoded VP9 level
Report the actual level of the encoded output if a level is
targeted or the level is passively tracked with a target of 0.
2016-11-28 12:02:43 -08:00
Andreas Cadhalpun
801b5c18c7 pngdec: check if previous frame exists instead of trusting sequence_number
This fixes a segmentation fault caused by calling memcpy with NULL as
second argument in handle_p_frame_apng.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-27 23:46:30 +01:00
Michael Niedermayer
d9883ded34 avcodec/me_cmp: Fix median_sad size
Fixes out of array read
Fixes: COV1396255

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-27 14:34:57 +01:00
Hendrik Leppkes
99ee8ee093 dxva2_vc1: support multiple slices 2016-11-26 13:11:36 +01:00
Hendrik Leppkes
36e27c87e7 vc1dec: support multiple slices in frame coded images with hwaccel
Based on a patch by Jun Zhao <mypopydev@gmail.com>
2016-11-26 13:11:32 +01:00
James Almer
6e1902bab4 avcodec/aac_adtstoasc_bsf: validate and forward extradata if the stream is already ASC
Fixes ticket #5973

Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-11-25 18:24:56 -03:00
Andreas Cadhalpun
2566ad98b0 mss2: only use error correction for matching block counts
This fixes a heap-buffer-overflow in ff_er_frame_end when decoding mss2
with coded_width/coded_height larger than width/height.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-25 21:05:03 +01:00
Philip Langdale
829db8effd avcodec/nvenc: Remove aspect-ratio decompensation logic
This dubious behaviour in nvenc was finally removed by nvidia, and
as we refuse to run on anything older than 7.0, we don't need to
keep it around for old versions.
2016-11-25 10:13:58 -08:00
James Almer
50b1453915 avcodec/mpeg4audio: correctly propagate meaningful error values
Signed-off-by: James Almer <jamrial@gmail.com>
2016-11-25 10:40:59 -03:00
Martin Vignali
5099c541bb libavcodec/exr: add support for uint32 channel decoding with pxr24
Doesn't decode the uint32 layer, but decodes the half part of the file.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-25 00:57:38 +01:00
Andreas Cadhalpun
8c8f543b81 libopusdec: default to stereo for invalid number of channels
This fixes an out-of-bounds read if avc->channels is 0.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-25 00:36:36 +01:00
Andreas Cadhalpun
995512328e pgssubdec: only set w/h/linesize when allocating data
Rects with positive w/h/linesize but no data are invalid.

Reviewed-by: Petri Hintukainen <phintuka@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-24 01:48:43 +01:00
Jun Zhao
584eea5bf3 lavc/vaapi_hevc: fix scaling list duplicate transfer issue.
scaling list is already transfered to raster scan during head parsing,
so no need to transfer it again.

And after this fix, FATE test SLIST_A_Sony_4/SLIST_B_Sony_8/
SLIST_C_Sony_3/SLIST_D_Sony_9 will pass in i965/Skylake.

Signed-off-by: Wang, Yi A <yi.a.wamg@intel.com>
Signed-off-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
2016-11-23 21:38:10 +00:00
Philip Langdale
dd10e7253a avcodec/cuvid: Restore initialization of pixel format in init()
I moved this into the handle_video_sequence callback because that's
the earliest time you can make an accurate decision as to what the
format should be.

However, transcoding requires that the decision between using
the accelerated PIX_FMT_CUDA vs a normal pix format happen at init()
time. There is enough information available to make that decision
and things work out with the underlying format only being discovered
in the sequence callback.
2016-11-23 13:23:34 -08:00
Michael Niedermayer
69f7dd3524 avcodec/options_table: make channel_layouts uint64
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-23 02:01:05 +01:00
Andreas Cadhalpun
946ecd19ea smacker: limit recursion depth of smacker_decode_bigtree
This fixes segmentation faults due to stack-overflow caused by too deep
recursion.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-23 00:57:10 +01:00
Alex Converse
3ee59939a1 libvpxenc: Support targeting a VP9 level
Levels are specified at https://www.webmproject.org/vp9/levels/
2016-11-22 11:31:48 -08:00
Philip Langdale
81147b5596 avcodec/cuvid: Add support for P010/P016 as an output surface format
The nvidia 375.xx driver introduces support for P016 output surfaces,
for 10bit and 12bit HEVC content (it's also the first driver to support
hardware decoding of 12bit content).

The cuvid api, as far as I can tell, only declares one output format
that they appear to refer to as P016 in the driver strings. Of course,
10bit content in P016 is identical to P010, and it is useful for
compatibility purposes to declare the format to be P010 to work with
other components that only know how to consume P010 (and to avoid
triggering swscale conversions that are lossy when they shouldn't be).

For simplicity, this change does not maintain the previous ability
to output dithered NV12 for 10/12 bit input video - the user will need
to update their driver to decode such videos.
2016-11-22 10:09:30 -08:00
Timo Rothenpieler
5ea8f70623 avcodec/libx264: fix forced_idr logic
Currently, it forces IDR frames for both true and false.
Not entirely sure what the original idea behind the tri-state bool
option is.

Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-11-22 16:35:08 +01:00
Miroslav Slugen
10db40f374 avcodec/cuvid: allow setting number of used surfaces
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2016-11-22 10:34:27 +01:00
Miroslav Slugeň
de2faec2fa avcodec/nvenc: better surface allocation alghoritm, fix rc_lookahead
User selectable surfaces are not working correctly, if you set number of
surfaces on cmdline, it will always use minimum 32 or 48 depends on
selected resolution, but in nvenc it is not necessary to use so many
surfaces.

So from now you can define as low as 1 surface and nvenc will still
work, it will ofcourse lower GPU memory usage by 95% and async_delay to zero

That was the easy part, now littlebit more...

Next part of this patch is to always prefer rc_lookahead to be more
important for number of surfaces, than user defined surfaces value.
Maximum rc_lookahead from nvidia documentation is 32, but could increase
in future generations so there is no limit for this yet. Value
async_depth is still accepted and prefered over rc_lookahead.

There were also bug when you request more than rc_lookahead > 31, it
will always set maximum 31, because surface numbers recalculation was
after setting lookahead, which is now fixed.

Results:
If you set -rc_lookahead 32 and -bf 3 it will now use only 40 surfaces
and lower GPU memory usage by 20%, also it will now increase PSNR by 0.012dB

Two more comments:

1. from my internal test, i don't understand addition of 4 more surfaces
when lookahead is calculated, i didn't used this and everything works as
with those 4 more extra surfaces, does anybody know what is going on
there? I looks like it was used for B frames which are calculated
separately, because B frames maximum is 4.

2. rc_lookahead is defined default to -1, but in test condition if
(ctx->rc_lookahead) which sets lookahead it will be always true, i don't
know if this is intended behavior, so in default behavior is lookahead
always on!

This is default condition when rc_lokkahead is -1 (not defined on
cmdline), whis is maybe something that is not intended:
ctx->encode_config.rcParams.enableLookahead = 1;
ctx->encode_config.rcParams.lookaheadDepth  = 0;
ctx->encode_config.rcParams.disableIadapt   = 0;
ctx->encode_config.rcParams.disableBadapt   = 0;

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2016-11-22 10:34:27 +01:00
Miroslav Slugeň
c4aca65a42 avcodec/nvenc: maximum usable surfaces are limited to maximum registered frames
Maximum usable surfaces is limited to MAX_REGISTERED_FRAMES constant in
nvenc.h

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2016-11-22 10:34:27 +01:00
Timo Rothenpieler
a66835bcb1 avcodec/nvenc: use dynamically loaded CUDA 2016-11-22 10:34:27 +01:00
Timo Rothenpieler
d9ad18f3b4 avcodec/cuvid: use dynamically loaded CUDA/CUVID
And remove the now obsolete compat headers.
2016-11-22 10:34:27 +01:00
Mark Thompson
f242e0a0ff vaapi_encode: Fix format specifier for bitrate logging
Same as e0df56f25d.  This was accidentally
reintroduced while merging c8241e730f.
2016-11-21 22:59:58 +00:00
Jun Zhao
e72662e131 lavc/vaapi_encode_h264: fix poc incorrect issue after meeting idr frame.
when meeting IDR frame, vaapi_encode_h264 poc number don't reset, now fix
this issue based on h264 spec. Some decoder don't care this case, but this
fix will enhance the encoder action. Before this fix, poc number is
negative in some case.

Reviewed-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
2016-11-21 22:37:02 +00:00
Mark Thompson
30ebabca7c vaapi_h265: Fix buffering parameters
A decoder may need this to be set correctly to output frames in the
right order.

(cherry picked from commit b8cac1e830)
2016-11-21 22:13:41 +00:00
Mark Thompson
ae0230cc3e vaapi_h265: Fix slice header writing
This was not observed earlier because the only syntax element which
it normally misses with the current setup is slice_qp_delta, but that
is always going to be zero (in IDR frames QP isn't varied on the
slice) which will always exp-golomb code as a single 1 bit.  The
immediately following part is the byte alignment, which is always a 1
bit followed by 0s which are ignored, so as long as the bitstream is
never aligned at that point we will never notice because the only
difference is that an ignored bit is a 1 instead of a 0.

(cherry picked from commit fc30a90898)
2016-11-21 22:13:41 +00:00
Mark Thompson
6796e6ea84 vaapi_h264: Write bitstream restriction fields
(cherry picked from commit ec17ab381e)
2016-11-21 22:13:41 +00:00
Mark Thompson
658c5afaa0 vaapi_h264: Fix CFR mode with frame_rate set in AVCodecContext
(cherry picked from commit 17a0f9481c)
2016-11-21 22:13:41 +00:00
Mark Thompson
ded1859df1 vaapi_encode: Decide on GOP setup before initialising sequence parameters
This was always too late; several fields related to it have been incorrectly
zero since the encoder was added.

(cherry picked from commit 314b421dd8)
2016-11-21 22:13:41 +00:00
Mark Thompson
ee1d04f970 vaapi_h264: Set max_num_ref_frames to 1 when not using B frames
(cherry picked from commit 956a54129d)
2016-11-21 22:13:41 +00:00
Mark Thompson
94f446c628 vaapi_encode: Sync to input surface rather than output
While outwardly bizarre, this change makes the behaviour consistent
with other VAAPI encoders which sync to the encode /input/ picture in
order to wait for /output/ from the encoder.  It is not harmful on
i965 (because synchronisation already happens in vaRenderPicture(),
so it has no effect there), and it allows the encoder to work on
mesa/gallium which assumes this behaviour.

(cherry picked from commit 086e4b58b5)
2016-11-21 22:13:41 +00:00
Mark Thompson
478a4b7e6d vaapi_encode: Check packed header capabilities
This improves behaviour with drivers which do not support packed
headers, such as AMD VCE on mesa/gallium.

(cherry picked from commit 892bbbcdc1)
2016-11-21 22:13:41 +00:00
Mark Thompson
c8241e730f vaapi_encode: Refactor initialisation
This allows better checking of capabilities and will make it easier
to add more functionality later.

It also commonises some duplicated code around rate control setup
and adds more comments explaining the internals.

(cherry picked from commit 80a5d05108)
2016-11-21 22:13:41 +00:00
Mark Thompson
06d73d002e vaapi_h264: Fix HRD bit_rate/cpb_size scaling
There should be an extra offset of 6 on bit_rate_scale and of 4 on
cpb_size_scale which were not accounted for here.

(cherry picked from commit 3a9662af6c)
2016-11-21 22:13:41 +00:00
Carl Eugen Hoyos
322568c079 lavc/ffv1: Support YUV4xxP12 and GRAY12. 2016-11-20 22:23:01 +01:00
James Almer
574929d8b6 avcodec/avpacket: fix leak on realloc in av_packet_add_side_data()
If realloc fails, the pointer is overwritten and the previously allocated
buffer is leaked, which goes against the expected behavior of keeping the
packet unchanged in case of error.

Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-11-19 20:23:25 -03:00
Andreas Cadhalpun
7289aa2d71 options_table: limit codec parameters to sane values
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-18 22:40:42 +01:00
James Almer
2de1c79b61 x86/vp9itxfm: add missing AVX2 guards
Fixes compilation with Yasm 1.1.0 and older.

Signed-off-by: James Almer <jamrial@gmail.com>
2016-11-18 17:01:11 -03:00
Michael Niedermayer
d1d18de6ad avcodec/ffv1dec: Set packed_at_lsb for 16bit YUV
This avoids unneeded computations

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-18 18:04:28 +01:00
Michael Niedermayer
d7a3bb2088 avcodec/ffv1dec: Support gray 10/12/16 explicitly avoid shifts
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-18 18:04:28 +01:00
James Almer
16c429166d Revert "apngdec: use side data to pass extradata to the decoder"
This reverts commit e0c6b32046.

Said commit changed the behavior of the demuxer and decoder in a non
backwards compatible way.
Demuxers should make extradata available at init if possible, and send
new extradata as side data within a packet if needed.

A better fix for the remuxing crash will follow.

Signed-off-by: James Almer <jamrial@gmail.com>
2016-11-18 12:24:28 -03:00
Hendrik Leppkes
07502e473f Merge commit '7a76371437f9562c3414f985523f883489e3936a'
* commit '7a76371437f9562c3414f985523f883489e3936a':
  libopenh264enc: Simplify init by setting FF_CODEC_CAP_INIT_CLEANUP

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:47:08 +01:00
Hendrik Leppkes
7e9474ca47 Merge commit '2d097c16b833c532ac974a7f1fd05c0a1f3b7675'
* commit '2d097c16b833c532ac974a7f1fd05c0a1f3b7675':
  libopenh264enc: Return a more sensible error code in some init failure paths

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:46:02 +01:00
Hendrik Leppkes
0bd76401d1 Merge commit '36b380dcd52ef47d7ba0559ed51192c88d82a9bd'
* commit '36b380dcd52ef47d7ba0559ed51192c88d82a9bd':
  libopenh264dec: Simplify the init thanks to FF_CODEC_CAP_INIT_CLEANUP being set

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:45:08 +01:00
Michael Niedermayer
ae514b1254 avcodec/ass_split: Change order of operations in ass_split_section()
This matches the other branch
Fixes out of array read
Fixes: 4d142ca76d39fe685effcf5017098723/asan_heap-oob_31ae824_8611_348fdb64f9009b63c8a8eae9a0e497c5.mkv

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-17 18:05:18 +01:00
Hendrik Leppkes
2f1a539d4b Merge commit '61bd0ed781b56eea1e8e851aab34a2ee3b59fbac'
* commit '61bd0ed781b56eea1e8e851aab34a2ee3b59fbac':
  h264: Log more information about invalid NALu size

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:24:25 +01:00
Hendrik Leppkes
cca4fd4778 Merge commit 'a8cbe5a0ccebf60a8a8b0aba5d5716dd54c1595c'
* commit 'a8cbe5a0ccebf60a8a8b0aba5d5716dd54c1595c':
  h264_ps: export actual height in MBs as SPS.mb_height

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:17:21 +01:00
Hendrik Leppkes
e999a4ed6c Merge commit '2866d108c9e9da7baf53ff57a51d470691049a57'
* commit '2866d108c9e9da7baf53ff57a51d470691049a57':
  vp8dsp: Remove the comment saying that the height is equal to the width

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:06:28 +01:00
Hendrik Leppkes
2818aaaba0 Merge commit '5f74bd31a9bd1ac7655103b11743c12d38e0419f'
* commit '5f74bd31a9bd1ac7655103b11743c12d38e0419f':
  vp8/armv6: mc: avoid boolean expression in calculation

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:05:07 +01:00
Carl Eugen Hoyos
55a424c5a8 lavc/ffv1dec: Scale output for msb-packed compression to full 16bit.
2% slowdown for existing decode-line timer.
2016-11-17 13:00:47 +01:00
Carl Eugen Hoyos
f8247c0cce lavc/ffv1enc: Support pix_fmt GRAY10. 2016-11-17 12:47:39 +01:00
Michael Niedermayer
2c9106257f avcodec/mpeg4videodec: Workaround interlaced mpeg4 edge MC bug
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-17 12:21:48 +01:00
Michael Niedermayer
85407c7e63 avcodec/mpegvideo: Fix edge emu buffer overlap with interlaced mpeg4
Fixes Ticket5936
Regression since c5fc8ae126

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-17 12:21:48 +01:00
Martin Vignali
52da3f6f70 libavcodec/exr : fix channel size calculation for uint32 channel
uint32 need 4 bytes not 1.
Fix decoding when there is half/float and uint32 channel.

This fixes crashes due to pointer corruption caused by invalid writes.

The problem was introduced in commit
03152e74df.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 23:45:44 +01:00
Andreas Cadhalpun
ce3147eb19 exr: reindent after previous commit
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 22:37:24 +01:00
Andreas Cadhalpun
ffdc5d09e4 exr: fix out-of-bounds read
channel_index can be -1.

This problem was introduced in commit
2dd7b46132.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 22:37:17 +01:00
Andreas Cadhalpun
3c0328d58d libschroedingerdec: fix leaking of framewithpts
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 19:31:11 +01:00
Andreas Cadhalpun
a86ebbf7f6 libschroedingerdec: don't produce empty frames
They are not valid and can cause problems/crashes for API users.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 19:30:49 +01:00
Andreas Cadhalpun
90ebf3c428 dds: limit 4 bpp handling to AV_PIX_FMT_PAL8
This fixes NULL pointer dereferencing for formats, where frame->data[1]
is not allocated.

The problem was introduced in commit
257fbc3af4.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 19:29:45 +01:00
Thierry Foucu
c512546689 Fix -Werror=parentheses error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-16 02:39:57 +01:00
Michael Niedermayer
1546d487cf avcodec/rv40: Test remaining space in loop of get_dimension()
Fixes infinite loop
Fixes: 178/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_RV40_fuzzer

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-15 23:08:43 +01:00
Andreas Cadhalpun
1abcd972c4 mlz: limit next_code to data buffer size
This fixes a heap-buffer-overflow detected by AddressSanitizer.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-15 22:01:08 +01:00
Martin Storsjö
f1212e472b aarch64: vp9: Implement NEON loop filters
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the loop filters with
16 pixels at a time. The implementation is fully templated, with
a single macro which can generate versions for both 8 and
16 pixels wide, for both 4, 8 and 16 pixels loop filters
(and the 4/8 mixed versions as well).

For the 8 pixel wide versions, it is pretty close in speed (the
v_4_8 and v_8_8 filters are the best examples of this; the h_4_8
and h_8_8 filters seem to get some gain in the load/transpose/store
part). For the 16 pixels wide ones, we get a speedup of around
1.2-1.4x compared to the 32 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM AArch64
vp9_loop_filter_h_4_8_neon:          144.0   127.2
vp9_loop_filter_h_8_8_neon:          207.0   182.5
vp9_loop_filter_h_16_8_neon:         415.0   328.7
vp9_loop_filter_h_16_16_neon:        672.0   558.6
vp9_loop_filter_mix2_h_44_16_neon:   302.0   203.5
vp9_loop_filter_mix2_h_48_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_84_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_88_16_neon:   376.0   305.2
vp9_loop_filter_mix2_v_44_16_neon:   193.2   128.2
vp9_loop_filter_mix2_v_48_16_neon:   246.7   218.4
vp9_loop_filter_mix2_v_84_16_neon:   248.0   218.5
vp9_loop_filter_mix2_v_88_16_neon:   302.0   218.2
vp9_loop_filter_v_4_8_neon:           89.0    88.7
vp9_loop_filter_v_8_8_neon:          141.0   137.7
vp9_loop_filter_v_16_8_neon:         295.0   272.7
vp9_loop_filter_v_16_16_neon:        546.0   453.7

The speedup vs C code in checkasm tests is around 2-7x, which is
pretty much the same as for the 32 bit version. Even if these functions
are faster than their 32 bit equivalent, the C version that we compare
to also became around 1.3-1.7x faster than the C version in 32 bit.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-5x.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                         A57 gcc-5.3  neon
loop_filter_h_4_8_neon:        256.6  93.4
loop_filter_h_8_8_neon:        307.3 139.1
loop_filter_h_16_8_neon:       340.1 254.1
loop_filter_h_16_16_neon:      827.0 407.9
loop_filter_mix2_h_44_16_neon: 524.5 155.4
loop_filter_mix2_h_48_16_neon: 644.5 173.3
loop_filter_mix2_h_84_16_neon: 630.5 222.0
loop_filter_mix2_h_88_16_neon: 697.3 222.0
loop_filter_mix2_v_44_16_neon: 598.5 100.6
loop_filter_mix2_v_48_16_neon: 651.5 127.0
loop_filter_mix2_v_84_16_neon: 591.5 167.1
loop_filter_mix2_v_88_16_neon: 855.1 166.7
loop_filter_v_4_8_neon:        271.7  65.3
loop_filter_v_8_8_neon:        312.5 106.9
loop_filter_v_16_8_neon:       473.3 206.5
loop_filter_v_16_16_neon:      976.1 327.8

The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57
is again 30-50% faster than the cortex-a53.

This is an adapted cherry-pick from libav commits
9d2afd1eb8 and
31756abe29.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
f43079e11c aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32 bit version.

The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM  AArch64
vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7

The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                                A57 gcc-5.3   neon
vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8

The asm is around factor 3-4 faster than C on the cortex-a57 and the asm
is around 30-50% faster on the a57 compared to the a53.

This is an adapted cherry-pick from libav commit
3c9546dfaf.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
1f7801c2bc aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not much
to be gained from having more spare registers here - we only
avoid having to clobber callee-saved registers.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                     ARM   AArch64
vp9_avg4_neon:                      27.2      23.7
vp9_avg8_neon:                      56.5      54.7
vp9_avg16_neon:                    169.9     167.4
vp9_avg32_neon:                    585.8     585.2
vp9_avg64_neon:                   2460.3    2294.7
vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
vp9_put4_neon:                      18.0      17.2
vp9_put8_neon:                      40.2      37.7
vp9_put16_neon:                     97.4      99.5
vp9_put32_neon/armv8:              346.0     307.4
vp9_put64_neon/armv8:             1319.0    1107.5
vp9_put_8tap_smooth_4h_neon:       126.7     118.2
vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
vp9_put_8tap_smooth_4v_neon:       113.0      86.5
vp9_put_8tap_smooth_8h_neon:       229.7     221.6
vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
vp9_put_8tap_smooth_8v_neon:       215.0     187.5
vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4

These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.

The speedup vs C code is pretty much the same as for the 32 bit
case; on the A53 it's around 6-13x for ther larger 8tap filters.
The exact speedup varies a little, since the C versions generally
don't end up exactly as slow/fast as on 32 bit.

This is an adapted cherry-pick from libav commit
383d96aa22.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
6bec60a683 arm: vp9: Add NEON loop filters
This work is sponsored by, and copyright, Google.

The implementation tries to have smart handling of cases
where no pixels need the full filtering for the 8/16 width
filters, skipping both calculation and writeback of the
unmodified pixels in those cases. The actual effect of this
is hard to test with checkasm though, since it tests the
full filtering, and the benefit depends on how many filtered
blocks use the shortcut.

Examples of relative speedup compared to the C version, from checkasm:
                          Cortex       A7     A8     A9    A53
vp9_loop_filter_h_4_8_neon:          2.72   2.68   1.78   3.15
vp9_loop_filter_h_8_8_neon:          2.36   2.38   1.70   2.91
vp9_loop_filter_h_16_8_neon:         1.80   1.89   1.45   2.01
vp9_loop_filter_h_16_16_neon:        2.81   2.78   2.18   3.16
vp9_loop_filter_mix2_h_44_16_neon:   2.65   2.67   1.93   3.05
vp9_loop_filter_mix2_h_48_16_neon:   2.46   2.38   1.81   2.85
vp9_loop_filter_mix2_h_84_16_neon:   2.50   2.41   1.73   2.85
vp9_loop_filter_mix2_h_88_16_neon:   2.77   2.66   1.96   3.23
vp9_loop_filter_mix2_v_44_16_neon:   4.28   4.46   3.22   5.70
vp9_loop_filter_mix2_v_48_16_neon:   3.92   4.00   3.03   5.19
vp9_loop_filter_mix2_v_84_16_neon:   3.97   4.31   2.98   5.33
vp9_loop_filter_mix2_v_88_16_neon:   3.91   4.19   3.06   5.18
vp9_loop_filter_v_4_8_neon:          4.53   4.47   3.31   6.05
vp9_loop_filter_v_8_8_neon:          3.58   3.99   2.92   5.17
vp9_loop_filter_v_16_8_neon:         3.40   3.50   2.81   4.68
vp9_loop_filter_v_16_16_neon:        4.66   4.41   3.74   6.02

The speedup vs C code is around 2-6x. The numbers are quite
inconclusive though, since the checkasm test runs multiple filterings
on top of each other, so later rounds might end up with different
codepaths (different decisions on which filter to apply, based
on input pixel differences). Disabling the early-exit in the asm
doesn't give a fair comparison either though, since the C code
only does the necessary calcuations for each row.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-9x.

This is pretty similar in runtime to the corresponding routines
in libvpx. (This is comparing vpx_lpf_vertical_16_neon,
vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon
to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon
and vp9_loop_filter_v_16_16_neon - note that the naming of horizonal
and vertical is flipped between the libraries.)

In order to have stable, comparable numbers, the early exits in both
asm versions were disabled, forcing the full filtering codepath.

                           Cortex           A7      A8      A9     A53
vp9_loop_filter_h_16_8_neon:             597.2   472.0   482.4   415.0
libvpx vpx_lpf_vertical_16_neon:         626.0   464.5   470.7   445.0
vp9_loop_filter_v_16_8_neon:             500.2   422.5   429.7   295.0
libvpx vpx_lpf_horizontal_edge_8_neon:   586.5   414.5   415.6   383.2
vp9_loop_filter_v_16_16_neon:            905.0   784.7   791.5   546.0
libvpx vpx_lpf_horizontal_edge_16_neon: 1060.2   751.7   743.5   685.2

Our version is consistently faster on on A7 and A53, marginally slower on
A8, and sometimes faster, sometimes slower on A9 (marginally slower in all
three tests in this particular test run).

This is an adapted cherry-pick from libav commit
dd299a2d6d.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
b4dc7c341e arm: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

For the transforms up to 8x8, we can fit all the data (including
temporaries) in registers and just do a straightforward transform
of all the data. For 16x16, we do a transform of 4x16 pixels in
4 slices, using a temporary buffer. For 32x32, we transform 4x32
pixels at a time, in two steps of 4x16 pixels each.

Examples of relative speedup compared to the C version, from checkasm:
                         Cortex       A7     A8     A9    A53
vp9_inv_adst_adst_4x4_add_neon:     3.39   5.83   4.17   4.01
vp9_inv_adst_adst_8x8_add_neon:     3.79   4.86   4.23   3.98
vp9_inv_adst_adst_16x16_add_neon:   3.33   4.36   4.11   4.16
vp9_inv_dct_dct_4x4_add_neon:       4.06   6.16   4.59   4.46
vp9_inv_dct_dct_8x8_add_neon:       4.61   6.01   4.98   4.86
vp9_inv_dct_dct_16x16_add_neon:     3.35   3.44   3.36   3.79
vp9_inv_dct_dct_32x32_add_neon:     3.89   3.50   3.79   4.42
vp9_inv_wht_wht_4x4_add_neon:       3.22   5.13   3.53   3.77

Thus, the speedup vs C code is around 3-6x.

This is mostly marginally faster than the corresponding routines
in libvpx on most cores, tested with their 32x32 idct (compared to
vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's
favour since their version doesn't clear the input buffer like ours
do (although the effect of that on the total runtime probably is
negligible.)

                           Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_32x32_add_neon:    18436.8  16874.1  14235.1  11988.9
libvpx vpx_idct32x32_1024_add_neon 20789.0  13344.3  15049.9  13030.5

Only on the Cortex A8, the libvpx function is faster. On the other cores,
ours is slightly faster even though ours has got source block clearing
integrated.

This is an adapted cherry-pick from libav commits
a67ae67083 and
52d196fb30.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
68caef9d48 arm: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

The filter coefficients are signed values, where the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16 bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the resulting sum can
overflow the 16 bit signed range. Instead of accumulating in 32 bit,
we accumulate the largest product (either index 3 or 4) last with a
saturated addition.

(The VP8 MC asm does something similar, but slightly simpler, by
accumulating each half of the filter separately. In the VP9 MC
filters, each half of the filter can also overflow though, so the
largest component has to be handled individually.)

Examples of relative speedup compared to the C version, from checkasm:
                       Cortex      A7     A8     A9    A53
vp9_avg4_neon:                   1.71   1.15   1.42   1.49
vp9_avg8_neon:                   2.51   3.63   3.14   2.58
vp9_avg16_neon:                  2.95   6.76   3.01   2.84
vp9_avg32_neon:                  3.29   6.64   2.85   3.00
vp9_avg64_neon:                  3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:     3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:    3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:     5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:     6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:    6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:     9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:    7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:   6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13   9.46  13.37
vp9_put4_neon:                   1.11   1.47   1.00   1.21
vp9_put8_neon:                   1.23   2.17   1.94   1.48
vp9_put16_neon:                  1.63   4.02   1.73   1.97
vp9_put32_neon:                  1.56   4.92   2.00   1.96
vp9_put64_neon:                  2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:     3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:    3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:     5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:     5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:    6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:     9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:    6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:   6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30   9.43  14.34

For the larger 8tap filters, the speedup vs C code is around 5-14x.

This is significantly faster than libvpx's implementation of the same
functions, at least when comparing the put_8tap_smooth_64 functions
(compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from
libvpx).

Absolute runtimes from checkasm:
                          Cortex      A7        A8        A9       A53
vp9_put_8tap_smooth_64h_neon:    20150.3   14489.4   19733.6   10863.7
libvpx vpx_convolve8_horiz_neon: 52623.3   19736.4   21907.7   25027.7

vp9_put_8tap_smooth_64v_neon:    14455.0   12303.9   13746.4    9628.9
libvpx vpx_convolve8_vert_neon:  42090.0   17706.2   17659.9   16941.2

Thus, on the A9, the horizontal filter is only marginally faster than
libvpx, while our version is significantly faster on the other cores,
and the vertical filter is significantly faster on all cores. The
difference is especially large on the A7.

The libvpx implementation does the accumulation in 32 bit, which
probably explains most of the differences.

This is an adapted cherry-pick from libav commits
ffbd1d2b00,
392caa65df,
557c1675cf and
11623217e3.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
6409e9b6cc vp9dsp: Deduplicate the subpel filters
Make them aligned, to allow efficient access to them from simd.

This is an adapted cherry-pick from libav commit
a4cfcddcb0.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Michael Niedermayer
2baf36caed avcodec/ituh263dec: Avoid spending a long time in slice sync
Fixes: 177/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_FLV1_fuzzer

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-15 18:27:31 +01:00
Ronald S. Bultje
83a139e3d8 vp9: add avx2 iadst16 implementations.
Also a small cosmetic change to the avx2 idct16 version to make it
explicit that one of the arguments to the write-out macros is unused
for >=avx2 (it uses pmovzxbw instead of punpcklbw).
2016-11-15 11:01:36 -05:00
Michael Niedermayer
0eb3198005 avcodec/movtextdec: Add error message for tsmb_size check
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-15 15:08:20 +01:00
Michael Niedermayer
a609905723 avcodec/movtextdec: Fix tsmb_size check==0 check
Fixes: 173/fuzz-3-ffmpeg_SUBTITLE_AV_CODEC_ID_MOV_TEXT_fuzzer

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-15 15:08:20 +01:00
Michael Niedermayer
6ea2715768 avcodec/movtextdec: Fix potential integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-15 15:08:20 +01:00
Hendrik Leppkes
51f5542c77 Merge commit 'e8b96a77010dd62624c3c65c357d7ae3b397ceaa'
* commit 'e8b96a77010dd62624c3c65c357d7ae3b397ceaa':
  arm: Fix a typo in a comment

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:21:49 +01:00
Hendrik Leppkes
5a447edd47 Merge commit 'dc08bbf63a217c839aa4c143f2a1d0b7e2e6d997'
* commit 'dc08bbf63a217c839aa4c143f2a1d0b7e2e6d997':
  vp8dsp: Clarify the first dimension of the mc function tables

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:21:24 +01:00
Hendrik Leppkes
68b0d7e0be Merge commit '924e2ecd2b7d51cca60c79351ef16b04dd4245c3'
* commit '924e2ecd2b7d51cca60c79351ef16b04dd4245c3':
  qsvdec: when a frames ctx is supplied, use its frame dimensions

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:20:09 +01:00
Hendrik Leppkes
3c81fa9a9c Merge commit '92736c74fb1633e36f7134a880422a9b7db14d3f'
* commit '92736c74fb1633e36f7134a880422a9b7db14d3f':
  qsvdec: add support for P010 (10-bit 420) decoding

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:20:00 +01:00
Hendrik Leppkes
220e773915 Merge commit 'ce320cf1c4daab3e2e3726ed7d2e879d10f7b991'
* commit 'ce320cf1c4daab3e2e3726ed7d2e879d10f7b991':
  qsvdec: use the same mfxFrameInfo for allocating frames that was passed to DECODE_Init

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:19:51 +01:00
Hendrik Leppkes
1bc6cdf2fc Merge commit '536bb17e9659c5ed7576a218d4085cdd6d5742fa'
* commit '536bb17e9659c5ed7576a218d4085cdd6d5742fa':
  qsvdec: make ff_qsv_map_pixfmt() return a MFX fourcc as well

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:19:43 +01:00
Hendrik Leppkes
985bc8b496 Merge commit '6c445990e64124ad64c79423dfd3764520648c89'
* commit '6c445990e64124ad64c79423dfd3764520648c89':
  tiffenc: Check zlib support for deflate option during initialization

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 12:32:08 +01:00
Hendrik Leppkes
bebab21176 Merge commit '9f732e4c996243c1e57c2bbbec6c8b94c37a7a22'
* commit '9f732e4c996243c1e57c2bbbec6c8b94c37a7a22':
  tiffenc: Check av_pix_fmt_desc_get() return value

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 12:30:35 +01:00
Hendrik Leppkes
bbd0ebfd83 Merge commit 'd8f3b0fb584677d4882e3a2d7c28f8b15c7319f5'
* commit 'd8f3b0fb584677d4882e3a2d7c28f8b15c7319f5':
  targaenc: Move size check to initialization function

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 12:16:32 +01:00
Hendrik Leppkes
25004c7e6e Merge commit 'eeb6849cedac099d41feb482da581f4059c63ca7'
* commit 'eeb6849cedac099d41feb482da581f4059c63ca7':
  rle: K&R formatting cosmetics

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 12:03:00 +01:00
Hendrik Leppkes
444e65299b Merge commit '326d9116936ab61d13ac4142b49c7337daf7c4c0'
* commit '326d9116936ab61d13ac4142b49c7337daf7c4c0':
  build: Drop unnecessary libavcodec <-> libavformat object dependencies

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 12:01:17 +01:00
Hendrik Leppkes
a0bc6b51d4 Merge commit 'e72d6fa08a3c1876109149401753a8d2c736d418'
* commit 'e72d6fa08a3c1876109149401753a8d2c736d418':
  build: Move MP2 muxer declaration away from MP3 muxer code

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 11:20:15 +01:00
Hendrik Leppkes
9b4cc0f35c Merge commit 'fe27792fd779ac4cdd5e57be5f6f488483c307b2'
* commit 'fe27792fd779ac4cdd5e57be5f6f488483c307b2':
  build: Move ff_mpeg12_frame_rate_tab to a separate file

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 10:42:36 +01:00
Carl Eugen Hoyos
0674d1938e lavc/hevc_ps: Use correct pix_fmt for 10bit 4:0:0.
Fixes the second sample from ticket #5544.
2016-11-14 10:36:25 +01:00
Hendrik Leppkes
575e8d11f1 Merge commit '8c929037ec75fbe9f367e0a31ee34839e92de481'
* commit '8c929037ec75fbe9f367e0a31ee34839e92de481':
  build: Add a new component for H.264 parsing code

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 10:09:44 +01:00
Dmitry Kalinkin
dc23e359ef lavc/audiotoolboxdec: fix OSX SDK detection
__MAC_10_11 can be present in updated revision of an older SDK so it
can't reliably detect availability of kAudioFormatEnhancedAC3 constant.

Fixes: b4daa2c40f ('lavc/audiotoolboxdec: add eac3 decoder')
Cc: Rodger Combs <rodger.combs@gmail.com>
Signed-off-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>
Previous version reviewed by: Rodger Combs <rodger.combs@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-14 02:35:26 +01:00
Carl Eugen Hoyos
b1367f7e5e lavc/dpx: Support GRAY12 colourspace. 2016-11-14 00:33:12 +01:00
Hendrik Leppkes
bd0db4a32d Merge commit '7a745f014f528d1001394ae4d2f4ed1a20bf7fa2'
* commit '7a745f014f528d1001394ae4d2f4ed1a20bf7fa2':
  options_table: Add aliases for color properties

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-13 22:29:04 +01:00
Mark Thompson
2dee500f4c vaapi_encode: Respect driver quirks around buffer destruction
No longer leaks memory when used with a driver with the "render does
not destroy param buffers" quirk (i.e. Intel i965).

(cherry picked from commit 221ffca631)
Fixes ticket #5871.
2016-11-13 20:39:48 +00:00
Hendrik Leppkes
2d7cf6f72b Merge commit 'f172e22d6aed0bff36e975bafb0183b6779f9444'
* commit 'f172e22d6aed0bff36e975bafb0183b6779f9444':
  pixdesc: Add aliases to SMPTE color properties

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-13 18:35:28 +01:00
Hendrik Leppkes
724a71dced Merge commit '8a62d2c28fbacd1ae20c35887a1eecba2be14371'
* commit '8a62d2c28fbacd1ae20c35887a1eecba2be14371':
  vaapi_encode: Maintain a pool of bitstream output buffers

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-13 17:38:40 +01:00
Hendrik Leppkes
db854c6c4a Merge commit '4a081f224e12f4227ae966bcbdd5384f22121ecf'
* commit '4a081f224e12f4227ae966bcbdd5384f22121ecf':
  libavcodec: fix constness in clobber test avcodec_open2() wrappers

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-13 17:30:33 +01:00
Andreas Cadhalpun
7112b56a34 vp9_mc_template: limit assert to SCALED == 0
The handling of the other block sizes was limited to 'SCALED == 0' in
commit dc96c0f9fc, so this assert should
be disabled, too, as it can now be triggered.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-13 12:38:15 +01:00
Michael Niedermayer
04bd1b38ee avcodec/htmlsubtitles: Fix reading one byte beyond the array
Fixes: fuzz-2-ffmpeg_SUBTITLE_AV_CODEC_ID_SUBRIP_fuzzer

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-12 03:23:03 +01:00
Andreas Cadhalpun
cdb5479c9d pnmdec: make sure v is capped by maxval
Otherwise put_bits can be called with a value that doesn't fit in the
sample_len, causing an assertion failure.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-12 01:36:47 +01:00
Andreas Cadhalpun
484151df7c pnm: limit maxval to UINT16_MAX
From 'man ppm': The maximum color value (Maxval), again in ASCII decimal.
                Must be less than 65536.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-12 01:36:47 +01:00
Andreas Cadhalpun
360bc0d90a smvjpegdec: make sure cur_frame is not negative
This fixes a heap-buffer-overflow detected by AddressSanitizer.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-12 01:36:47 +01:00
Andreas Cadhalpun
c82b8ef0e4 dvbsubdec: fix division by zero in compute_default_clut
This problem was introduced in commit
4b90dcb849.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-10 21:01:59 +01:00
Andreas Cadhalpun
1e33035ee7 proresdec_lgpl: explicitly check coff[3] against slice_data_size
The implicit checks via v_data_size and a_data_size don't work in the case
'(hdr_size > 7) && !ctx->alpha_info'.

This fixes segmentation faults due to invalid reads.

This problem was introduced in commit
547c2f002a.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-10 21:00:44 +01:00
Sasi Inguva
18108f3618 lavc/utils.c: Make sure skip_samples never goes negative.
Signed-off-by: Sasi Inguva <isasi@google.com>
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-10 17:44:47 +01:00
Tom Butterworth
bd6fa80d56 avcodec/hap: add "compressor" option to Hap encoder to disable secondary compression
The secondary compression in Hap is optional, this change exposes that option to
the user as some use-cases favour higher bitrate files to reduce workload
decoding.
Adds "none" or "snappy" as options for "compressor". Selecting "none" disregards
"chunks" option: chunking is only of benefit decompressing Snappy.

Reviewed-by: Martin Vignali <martin.vignali@gmail.com>
Signed-off-by: Tom Butterworth <bangnoise@gmail.com>
2016-11-10 14:27:38 +00:00
Carl Eugen Hoyos
08be65a075 lavc/hevc_ps: Fix an error message. 2016-11-10 08:22:26 +01:00
Carl Eugen Hoyos
edb8af6e92 lavc/hevc_ps: Use correct pix_fmt for 12bit 4:0:0.
Fixes part of ticket #5544.
2016-11-10 08:11:12 +01:00
Michael Niedermayer
2bc66d9e43 nut: add gray12 support
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-10 01:18:43 +01:00
Andreas Cadhalpun
226d35c845 escape124: reject codebook size 0
It causes a cb_depth of 32, leading to assertion failures in get_bits.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-09 21:10:59 +01:00
Tom Butterworth
0a24587588 avcodec/hap: pass texture-compression destination as argument, not in context
This allows a subsequent change to compress directly into the output packet when possible.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Tom Butterworth <bangnoise@gmail.com>
2016-11-08 17:05:27 +00:00
Rostislav Pehlivanov
317be31eaf opus: move the entropy decoding functions to opus_rc.c
The intention is to have both encoding and decoding functions
in opus_rc.c.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2016-11-08 14:18:59 +00:00
Rostislav Pehlivanov
0660a09dd1 opus: move all tables to a separate file
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2016-11-08 14:18:59 +00:00
Rostislav Pehlivanov
0cf6853804 aacenc: quit when the audio queue reaches 0 rather than keeping track of empty frames
The libopus encoder does the same thing and its better than
keeping track of when the empty flush frames appear.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2016-11-08 00:50:51 +00:00
Andreas Cadhalpun
5249706e9d mpegaudio_parser: don't return AVERROR_PATCHWELCOME
The API does not allow returning AVERROR codes.

It triggers an assert in av_parser_parse2.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-07 19:41:17 +01:00
Andreas Cadhalpun
0747754622 mpeg4audio: validate sample_rate
A negative sample rate doesn't make sense and triggers assertions in
av_rescale_rnd.

Also check for errors from avpriv_mpeg4audio_get_config in
ff_mp4_read_dec_config_descr.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-07 00:51:49 +01:00
Andreas Cadhalpun
bb6a7b6f75 lzf: update pointer p after realloc
This fixes heap-use-after-free detected by AddressSanitizer.

Reviewed-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-05 18:56:26 +01:00
Matt Oliver
6ead033bca avcodec/nvenc.c: Use new safe dlopen code.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2016-11-05 18:09:03 +11:00
James Almer
51e329918d avcodec/rawdec: check for side data before checking its size
Fixes valgrind warnings about usage of uninitialized values.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-11-04 23:38:56 -03:00
Andreas Cadhalpun
db79dedb1a diracdec: check return code of get_buffer_with_edge
If it fails, buffers aren't allocated, causing NULL pointer dereferencing.

Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-04 20:35:23 +01:00
Andreas Cadhalpun
24d20496d2 diracdec: clear slice_params_num_buf on allocation failure
Otherwise it can be non-zero next time decode_lowdelay is called, causing
slice_params_buf not to be allocated, leading to a NULL pointer dereference.

The problem was introduced in commit
dcad4677d6.

Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-04 20:34:51 +01:00
Andreas Cadhalpun
8a4ea96448 diracdec: use correct buffer for slice_params_buf realloc
This fixes a double-free detected by AddressSanitizer.

The problem was introduced in commit
dcad4677d6.

Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-04 20:34:38 +01:00
Tom Butterworth
92280f86b4 avcodec/hap: consistent name for codec
"Vidvox Hap", not "Vidvox Hap encoder" or "Vidvox Hap decoder". Fixes
bad name in "ffmpeg -codecs", matches other codec naming.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2016-11-04 11:19:47 -08:00
Anton Khirnov
fb240a6276 qsvenc: do not re-execute encoding on all positive status codes
It should only be done for DEVICE_BUSY/IN_EXECUTION

(cherry picked from commit 0956fd4606)
Fixes ticket #5924.
2016-11-04 18:56:01 +00:00
Derek Buitenhuis
8a8902f221 libx265: Add option to force IDR frames
This is in the same the same vein as c981b1145a.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-04 02:45:51 +01:00
Michael Niedermayer
cee1f4c069 avcodec/ac3dec: Check expacc
this is somewhat a magic number, which can be understood from reading section
"7.1.2 Exponent Strategy" of the ac3 specification, in short:
Three exponents each represented as number 0-4 are grouped together and
base-5 encoded, so the maximal correct value is 25*4 + 5*4 + 4 = 124.

Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-03 22:05:46 +01:00
Vittorio Giovara
067910ed13 hevc: Move hevc_decode_extradata before frame decoding
Avoids a forward-declaration in the following commit.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-03 16:28:04 +01:00
Andreas Cadhalpun
3932ccc472 ppc: pixblockdsp: do unaligned block accesses correctly again
This was broken by the following Libav commit:
4c387c7 ppc: dsputil: do unaligned block accesses correctly

The following tests fail due to this:
fate-checkasm
fate-vsynth1-dnxhd-2k-hr-hq fate-vsynth1-dnxhd-edge1-hr
fate-vsynth1-dnxhd-edge2-hr fate-vsynth1-dnxhd-edge3-hr
fate-vsynth1-dnxhd-hr-sq-mov fate-vsynth1-dnxhd-hr-hq-mov
fate-vsynth2-dnxhd-2k-hr-hq fate-vsynth2-dnxhd-edge1-hr
fate-vsynth2-dnxhd-edge2-hr fate-vsynth2-dnxhd-edge3-hr
fate-vsynth2-dnxhd-hr-sq-mov fate-vsynth2-dnxhd-hr-hq-mov
fate-vsynth3-dnxhd-2k-hr-hq fate-vsynth3-dnxhd-edge1-hr
fate-vsynth3-dnxhd-edge2-hr fate-vsynth3-dnxhd-edge3-hr
fate-vsynth3-dnxhd-hr-sq-mov fate-vsynth3-dnxhd-hr-hq-mov

Fixes trac ticket #5508.

Reviewed-by: Carl Eugen Hoyos <ceffmpeg@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-03 01:23:36 +01:00
Philip Langdale
d0a9af851e crystalhd: Update high level description
We don't need to document the horrible hacks that we removed.
2016-11-02 13:47:57 -07:00
Philip Langdale
a07c07e7aa crystalhd: Simplify output frame handling
The old code had to retain a partial frame across two calls in
the case of separate interlaced fields. Now, we know that we'll
get both fields within the same receive_frame call, and so we
don't need to manage the frame as private state any more.
2016-11-02 13:47:57 -07:00
Philip Langdale
3019b4f648 crystalhd: Loop for a frame internally where possible.
It's not possible to return EAGAIN when we've passed input EOF and are
in draining mode. If do return EAGAIN, we're saying there's no way to
get any more output - which isn't true in many cases.

So let's handled these cases in an internal loop as best we can.
2016-11-02 13:47:57 -07:00
Philip Langdale
0eb836942f crystalhd: Keep NOPTS_VALUE so we know it's not there. 2016-11-02 13:47:57 -07:00
Philip Langdale
13dbf77b81 crystalhd: Remove h.264 parser
Now that we don't need to do ridiculous things to work out if a
frame is interlaced or not, we don't need an extra h.264 parser.
2016-11-02 13:47:57 -07:00
Philip Langdale
89ba55dc0d crystalhd: We don't need the track the last picture number anymore
This was needed to detect an interlaced failure case that doesn't
happen with the new decode api.
2016-11-02 13:47:57 -07:00
Philip Langdale
badce88fdf crystalhd: Remove trust_interlaced heuristic
It seems that without all the other 1:1 heuristics, we don't have
a fundamental problem trusting the interlaced flag on output
pictures. That's a relief.
2016-11-02 13:47:57 -07:00
Philip Langdale
6cc390dd5a crystalhd: Revert back to letting hardware handle packed b-frames
I'm not sure why, but the mpeg4_unpack_bframes bsf is not
interacting well with seeking. Looking at the code, it should be
ok, with possibly one warning shown, but I see it getting stuck
for an extended period of time after a seek where a packed frame
is cached to be shown later.

So, I gave up on that and went back to making the old hardware
based path work. Turns out that it wasn't broken except that some
samples have a 6 byte drop packet which I wasn't accounting for.

Now it works again and seeks are good.
2016-11-02 13:47:57 -07:00
Philip Langdale
b5d714f493 crystalhd: Switch to new decode API and remove the insanity
The new decode API allows for m:n decode patterns, which is what
you need to use this hardware in a sane way. There are so many
situations where 1:1 doesn't happen naturally that it's a miracle
I got it working as well as I did.

With this change, we can throw all of the crazy heuristics and
sleeps(!) out, and things work correctly.
2016-11-02 13:47:30 -07:00
Philip Langdale
234d3cbf46 crystalhd: Fix up the missing first sample
Why on earth the hardware returns garbage for the first sample of
a decoded picture is anyone's guess. The simplest reasonable way
to patch it up is to copy the first sample of the second line. This
should result in the correct chroma values (because the data was
original 4:2:0 upsampled to 4:2:2) even if the luma is isn't.
2016-11-02 13:43:59 -07:00