1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-28 20:53:54 +02:00
Commit Graph

86815 Commits

Author SHA1 Message Date
Michael Niedermayer
850c6db97d avcodec/utvideodec: Factor multiply out of inner loop
0.5% faster loop

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Steven Liu <lingjiujianke@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-28 14:08:21 +02:00
Michael Niedermayer
5eb4701b7d avcodec/utvideodec: bswap directly without memcpy
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-28 14:08:21 +02:00
Michael Niedermayer
676a589c93 avcodec/utvideodec: enable unchecked bitreader
inner reader loop becomes 16% faster

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-28 14:08:21 +02:00
Michael Niedermayer
9c604b34d4 avcodec/utvideodec: hardcode vlc bits
2.5% faster vlc decoding

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-28 14:08:21 +02:00
Michael Niedermayer
1835c5e7a4 avcodec/utvideodec: Move bitstream end check out of inner loop
This is not needed when the buffer is large enough for the worst case of a line

2% faster vlc reading

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-28 14:08:21 +02:00
Clément Bœsch
b12a36170b lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis 2017-06-28 12:22:39 +02:00
Clément Bœsch
ff0ecef624 lavc/aarch64: add a few SIMD functions for AAC PS
☭ tests/checkasm/checkasm --bench --test=aacpsdsp
checkasm: using random seed 3318985180
MMX implied by specified flags
MMX implied by specified flags
NEON:
 - aacpsdsp.add_squares        [OK]
 - aacpsdsp.mul_pair_single    [OK]
 - aacpsdsp.hybrid_analysis    [OK]
 - aacpsdsp.stereo_interpolate [OK]
checkasm: all 5 tests passed
nop: 10.0
ps_add_squares_c: 63221.2
ps_add_squares_neon: 22311.7
ps_hybrid_analysis_c: 2466.6
ps_hybrid_analysis_neon: 1521.9
ps_mul_pair_single_c: 68592.0
ps_mul_pair_single_neon: 17426.6
ps_stereo_interpolate_c: 72344.3
ps_stereo_interpolate_neon: 72308.8
ps_stereo_interpolate_ipdopd_c: 117415.2
ps_stereo_interpolate_ipdopd_neon: 113386.3
2017-06-28 12:22:39 +02:00
Clément Bœsch
9bbb0fbd31 lavc/aacpsdsp: fix a few spaces (cosmetics) 2017-06-28 12:22:39 +02:00
Clément Bœsch
edd041e64c checkasm: add AAC PS tests
This includes various fixes and improvements from James Almer.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-28 12:22:39 +02:00
Clément Bœsch
e4a27e2f2d lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon
The code originally pre-multiply by 2 the steps, causing the running sum
of the h factors to drift away due to the lack of precision. It quickly
causes an inaccuracy > 0.01.

I tried diverse approaches such as multiply by 2.0 (instead of adding
the value itself) without success.

I'm unable to bench the impact of this change, feel free to compare.

This commit fixes the incoming aacpsdsp tests.

Following is an alternative simplified function (matching the incoming
AArch64 code) that may be used:

function ff_ps_stereo_interpolate_neon, export=1
        vld1.32         {q0}, [r2]
        vld1.32         {q1}, [r3]
        ldr             r12, [sp]
        vmov.f32        q8, q0
        vmov.f32        q9, q1
        vzip.32         q8, q0
        vzip.32         q9, q1
1:
        vld1.32         {d4}, [r0,:64]
        vld1.32         {d6}, [r1,:64]
        vadd.f32        q8, q8, q9
        vadd.f32        q0, q0, q1
        vmov.f32        d5, d4
        vmov.f32        d7, d6
        vmul.f32        q2, q2, q8
        vmla.f32        q2, q3, q0
        vst1.32         {d4}, [r0,:64]!
        vst1.32         {d5}, [r1,:64]!
        subs            r12, r12, #1
        bgt             1b
        bx              lr
endfunc
2017-06-28 11:59:34 +02:00
James Almer
d2ef9e6e7f x86/vf_blend: use ABS2 macro 2017-06-27 20:45:55 -03:00
Michael Niedermayer
516c213f08 avcodec/x86/vp9dsp_init_16bpp: Fix linking to missing ff_vp9_ipred_dr_32x32_16_avx2() on 32bit
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-28 00:31:33 +02:00
Hendrik Leppkes
15b00aea41 hwcontext_d3d11va: use correct license header 2017-06-28 00:19:55 +02:00
Michael Niedermayer
c578c9c229 libswresample/swresample: remove obsolete code
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-27 23:21:53 +02:00
Michael Niedermayer
2c874548d6 avcodec/hevcdec: do basic validity check on delta_chroma_weight and offset
Fixes: runtime error: signed integer overflow: 2147483520 + 128 cannot be represented in type 'int'
Fixes: 2385/clusterfuzz-testcase-minimized-6594333576790016

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-27 23:21:12 +02:00
Ilia Valiakhmetov
35a5d9715d avcodec/vp9: add 64-bit ipred_dr_32x32_16 avx2 implementation
vp9_diag_downright_32x32_12bpp_c: 429.7
vp9_diag_downright_32x32_12bpp_sse2: 158.9
vp9_diag_downright_32x32_12bpp_ssse3: 144.6
vp9_diag_downright_32x32_12bpp_avx: 141.0
vp9_diag_downright_32x32_12bpp_avx2: 73.8

Almost 50% faster than avx implementation

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-06-27 16:10:50 -04:00
James Almer
0daa1cf073 x86/vf_blend: optimize difference and negation functions
Process more pixels per loop.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
James Almer
fa50d9360b x86/vf_blend: add sse and ssse3 extremity functions
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
Anton Khirnov
d14179e3d4 hwframe: Allow hwaccel frame allocators to align surface sizes
Hardware accelerated decoding generally uses AVHWFramesContext for pool
allocation of hardware surfaces. These are setup to allocate surfaces
aligned to hardware and hwaccel API requirements. Due to the
architecture, av_hwframe_get_buffer() will return AVFrames with
the dimensions set to the aligned sizes.

This causes some decoders (like hevc) return these aligned size as
final frame size, instead of cropping them to the video's actual
dimensions. To make sure this doesn't happen, crop the frame to the
size the decoder expects when ff_get_buffer() is called.

Merges Libav commit 3fdf50f9e8.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
wm4
f0bcedaf37 dxva: verbose-log decoder GUID list
Helpful for debugging.

Merges Libav commit 068eaa534e.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
wm4
289d387330 hwcontext_d3d11va: add option to enable debug mode
Basically copied from VLC (LGPL):

http://git.videolan.org/?p=vlc.git;a=blob;f=modules/video_output/win32/direct3d11.c;h=e9fcb83dcabfe778f26e63d19f218caf06a7c3ae;hb=HEAD#l1482
http://git.videolan.org/?p=vlc.git;a=blob;f=modules/codec/avcodec/d3d11va.c;h=85e7d25caebc059a9770da2ef4bb8fe90816d76d;hb=HEAD#l599

Merges Libav commit cfc9e7c94e.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
wm4
8d7fdba7b8 dxva: support DXGI_FORMAT_420_OPAQUE decoding
Some devices (some phones, apparently) will support only this opaque
format. Of course this won't work with CLI, because copying data
directly is not supported.

Automatic frame allocation (setting AVCodecContext.hw_device_ctx) does
not support this mode, even if it's the only supported mode. But since
opaque surfaces are generally less useful, that's probably ok.

Merges Libav commit 5030e3856c.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
wm4
6f5ff3269b hwcontext_d3d11va: allocate staging texture lazily
Makes dealing with formats that can not be used for staging textures
easier (DXGI_FORMAT_420_OPAQUE). It also saves memory if the staging
texture is never needed, so this is a good thing.

Merges Libav commit 98d73e4174.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
wm4
1509d739a0 hwcontext_d3d11va: fix crash on frames_init failure
It appears in this case, frames_ininit is called twice (once by
av_hwframe_ctx_init(), and again by unreffing the frames ctx ref).

Merges Libav commit 086321c612.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
wm4
39f201a0ec dxva: fix some warnings
Some existed since forever, some are new.

The cast in get_surface() is silly, but unless we change the av_log
function signature, or all callers of ff_dxva2_get_surface_index(), it's
needed to remove the const warning.

Merges Libav commit 752ddb4556.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
wm4
e2afcc33e0 dxva: add declarative profile checks
Make supported codec profiles part of each dxva_modes entry. Every DXVA2
mode is representative for a codec with a subset of supported profiles,
so reflecting that in dxva_modes seems appropriate.

In practice, this will more strictly check MPEG2 profiles, will stop
relying on the surface format checks for selecting the correct HEVC
profile, and remove the verbose messages for mismatching H264/HEVC
profiles. Instead of the latter, it will now print the more nebulous "No
decoder device for codec found" verbose message.

This also respects AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH. Move the
Main10 HEVC entry before the normal one to make this work better.

Originally inspired by VLC's code.

Merges Libav commit 70e5e7c022.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-06-27 18:05:02 +02:00
Martin Storsjö
3125a4a8a8 d3d11va: Link directly to dxgi.dll and d3d11.dll functions if LoadLibrary is unavailable
When targeting the UWP API subset, the LoadLibrary function is not
available (and the fallback, LoadPackagedLibrary, can't be used to
load system DLLs). In these cases, link directly to the functions
in the DLLs instead of trying to load them dynamically at runtime.

Merges Libav commit fd1ffa1f10.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-06-27 18:05:02 +02:00
wm4
70143a3954 dxva: add support for new dxva2 and d3d11 hwaccel APIs
This also adds support to avconv (which is trivial due to the new
hwaccel API being generic enough).

The new decoder setup code in dxva2.c is significantly based on work by
Steve Lhomme <robux4@gmail.com>, but with heavy changes/rewrites.

Merges Libav commit f9e7a2f95a.
Also adds untested VP9 support.
The check for DXVA2 COBJs is removed. Just update your MinGW to
something newer than a 5 year old release.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-06-27 18:05:02 +02:00
wm4
5659f74047 dxva: move d3d11 locking/unlocking to functions
I want to make it non-mandatory to set a mutex in the D3D11 device
context, and replacing it with user callbacks seems like the best
solution. This is preparation for it. Also makes the code slightly more
readable.

Merges Libav commit 831cfe10b4.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-06-27 18:05:02 +02:00
wm4
ab28108a36 dxva: preparations for new hwaccel API
The actual hwaccel code will need to access an internal context instead
of avctx->hwaccel_context, so add a new DXVA_CONTEXT() macro, that will
dispatch between the "old" external and the new internal context.

Also, the new API requires a new D3D11 pixfmt, so all places which check
for the pixfmt need to be adjusted. Introduce a ff_dxva2_is_d3d11()
function, which does the check.

Merges Libav commit 4dec101acc.
Adds changes to vp9 over the Libav patch.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-06-27 18:05:02 +02:00
wm4
865360ba63 lavc: set avctx->hwaccel before init
So a hwaccel can access avctx->hwaccel in init for whatever reason. This
is for the new d3d hwaccel API. We could create separate entrypoints for
each of the 3 hwaccel types (dxva2, d3d11va, new d3d11va), but this
seems nicer.

Merges Libav commit bd747b9226.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-06-27 18:05:02 +02:00
wm4
3303511f33 lavu: add new D3D11 pixfmt and hwcontext
To be used with the new d3d11 hwaccel decode API.

With the new hwaccel API, we don't want surfaces to depend on the
decoder (other than the required dimension and format). The old D3D11VA
pixfmt uses ID3D11VideoDecoderOutputView pointers, which include the
decoder configuration, and thus is incompatible with the new hwaccel
API. This patch introduces AV_PIX_FMT_D3D11, which uses ID3D11Texture2D
and an index. It's simpler and compatible with the new hwaccel API.

The introduced hwcontext supports only the new pixfmt.

Frame upload code untested.

Significantly based on work by Steve Lhomme <robux4@gmail.com>, but with
heavy changes/rewrites.

Merges Libav commit fff90422d1.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-06-27 18:05:02 +02:00
James Almer
4d62ee6746 x86inc: don't use read-only data sections on COFF targets
Yasm:
src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'

Nasm:
src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here

Tested-by: Clément Bœsch <u@pkh.me>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 12:48:04 -03:00
Paul B Mahol
feab761b73 avcodec/interplayvideo: properly check if there is enough bytes left
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-06-27 15:46:08 +02:00
Hein-Pieter van Braam
c4cbaec6e3 Interplay MVE: Changelog entry for changes
Signed-off-by: Hein-Pieter van Braam <hp@tmm.cx>
2017-06-27 15:10:37 +02:00
Hein-Pieter van Braam
8f96da060a Interplay MVE: Implement frame format 0x10
This implements the 0x10 frame format for Interplay MVE movies. The
format is a variation on the 0x06 format with some changes. In addition
to the decoding map there's also a skip map. This skip map is used to
determine what 8x8 blocks can change in a particular frame.

This format expects to be able to copy an 8x8 block from before the last
time it was changed. This can be an arbitrary time in the past. In order
to implement this this decoder allocates two additional AVFrames where
actual decoding happens. At the end of a frame decoding changed blocks
are copied to a finished frame based on the skip map.

The skip map's encoding is a little convulted, I'll refer to the code
for details.

Values in the decoding map are the same as in format 0x06.

Signed-off-by: Hein-Pieter van Braam <hp@tmm.cx>
2017-06-27 15:09:12 +02:00
Hein-Pieter van Braam
19f6fd199e Interplay MVE: Implement frame format 0x06
This implements the 0x06 frame format for Interplay MVE movies. The
format is relatively simple. The video data consists of two parts:

16 bits per 8x8 block movement data
a number of 8x8 blocks of pixel data

For each 8x8 block of pixel data the movement data is consulted. There
are 3 possible meanings of the movement data:
* zero     : copy the 8x8 block from the pixel data
* negative : copy the 8x8 block from the previous frame from an offset
             determined by the actual value of the entry -0xC000.
* positive : copy the 8x8 block from the current frame from an offset
             determined by the actual value of the entry -0x4000

Decoding happens in two passes, in the fist pass only new pixeldata is
copied, during the second pass data is copied from the previous and
current frames.

The codec expects that the current frame being decoded to still has the
data from 2 frames ago on it when decoding starts.

Signed-off-by: Hein-Pieter van Braam <hp@tmm.cx>
2017-06-27 15:08:39 +02:00
Hein-Pieter van Braam
8f87bfb4b7 Interplay MVE: Refactor IP packet format
Interplay MVE can contain up to three different frame formats. They
require different streams of information to render a frame. This patch
changes the IP packet format to prepare for the extra frame formats.

Signed-off-by: Hein-Pieter van Braam <hp@tmm.cx>
2017-06-27 15:06:14 +02:00
Hein-Pieter van Braam
ba2c385006 Interplay MVE: Implement MVE SEND_BUFFER operation
Interplay MVE movies have a SEND_BUFFER operation. Only after this
command does the current decoding buffer get displayed. This is required
for the other frame formats. They are fixed-size and can't always encode
a full frame worth of pixeldata.

This code prevents half-finished frames from being emitted.

Signed-off-by: Hein-Pieter van Braam <hp@tmm.cx>
2017-06-27 15:03:51 +02:00
Paul B Mahol
bbaca6e867 avcodec/proresenc_kostya: add 4444XQ profile
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-06-27 13:53:51 +02:00
Michael Niedermayer
0f8d3d8a46 avcodec/ffv1enc: compute the max number of slices and limit by that
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-27 13:09:58 +02:00
Andreas Håkon
a29c712729 avformat: Fix Pro-MPEG non-square matrix
Reviewed-by:vtarca@mobibase.com
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-27 12:54:06 +02:00
Michael Niedermayer
430d4f2bb5 avcodec/ffv1enc: Allow less than 2 rows of slices for low vertical resolution
Fixes: Ticket5548

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-27 12:54:06 +02:00
Paul B Mahol
4ed7c2bbc3 avcodec/utvideodec: add SIMD for restore_rgb_planes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-06-27 09:54:10 +02:00
Paul B Mahol
3594788b71 avcodec/utvideodec: decode to GBR(A)P
This is actually internal utvideo format.
Allows to make use of SIMD for median prediction for rgb(a) formats,
thus speeding up decoding.
Simplifies code, eases further developement and maintenance.

Update FATE because of pixel format switch.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-06-26 20:11:58 +02:00
Paul B Mahol
e9510dc032 avfilter: remove usage of empty header
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-06-26 19:43:41 +02:00
Kyle Swanson
c11ca6105e avcodec/g722enc: force mono channel layout
Signed-off-by: Kyle Swanson <k@ylo.ph>
2017-06-26 09:46:58 -05:00
Michael Niedermayer
3c70251780 avcodec/jpeg2000dwt: Fix integer overflows in sr_1d97_int()
Fixes: runtime error: signed integer overflow: 1157259380 + 1157259380 cannot be represented in type 'int'
Fixes: 2365/clusterfuzz-testcase-minimized-6020421927305216

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-26 16:06:25 +02:00
Michael Niedermayer
ea5366670e avcodec/jpeg2000dwt: Fix integer overflow in dwt_decode97_int()
Fixes: runtime error: signed integer overflow: -163654656 * 256 cannot be represented in type 'int'
Fixes: 2367/clusterfuzz-testcase-minimized-4648678897745920

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-26 16:06:25 +02:00
Michael Niedermayer
4147bb9053 avcodec/ffv1enc: Try to choose slice count so that slice packet sizes are within the supported size
Fixes assertion failure

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-26 16:06:25 +02:00