Instead of inlining everything into ff_h264_hl_decode_mb(), use
explicit templating to create versions of the called functions
with constant parameters filled in. This greatly speeds up
compilation of h264.c and reduces the code size without any
measurable impact on performance.
Compilation time for h264.c on an i7 goes from 30s to 5.5s.
Code size is reduced by 430kB.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Using ff_mspel_motion assumes that s (a MpegEncContext
poiinter) really is a Wmv2Context.
This fixes crashes in error resilience on vc1/wmv3 videos.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
The buffers are only allocated once, although it can happen from
any of a few different places, so there is no need to use realloc.
Using av_malloc() ensures they are aligned suitably for SIMD
optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
ff_wma_init is used only by wmadec and wmaenc, and neither of them
can handle more than 2 channels.
This fixes crashes with invalid files.
Based on patch by Piotr Bandurski and Michael Niedermayer.
Signed-off-by: Martin Storsjö <martin@martin.st>
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.
References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The problem is that the ssse3 psign instruction does the wrong
thing here. Commit ea60dfe incorrectly removed a macro emulating
this instruction for pre-ssse3 code. However, the emulation is
incorrect, and the code relies on the behaviour of the macro.
Specifically, the psign sets destination elements to zero where
the corresponding source element is zero, whereas the emulation
only negates destination elements where the source is negative.
Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
which is why the original VC-1 code had an additional right shift
when using it. Since the psign instruction cannot be used here,
skip all the macro hell and use the working instruction sequence
directly.
None of this was noticed due a stray return statement in
ff_vc1dsp_init_mmx() which meant that only the mmx version of the
loop filter was ever used (before being removed in ea60dfe).
Signed-off-by: Mans Rullgard <mans@mansr.com>
The function call was a mess to handle, and memcpy cannot make
the assumptions we do in the new code.
Tested on an IMC sample: 430c -> 370c.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Apparently, some build environments require dxva.h even for dxva2,
while others lack this header entirely. Including it conditionally
allows building in both cases.
Signed-off-by: Martin Storsjö <martin@martin.st>
The MBAFF flag may only be signaled if we're actually dealing with
a full frame, and not singular fields, as it can happen in mixed content.
Signed-off-by: Martin Storsjö <martin@martin.st>