* commit 'fb3b2f5d923a6e19d80f21eb4e081674bceec810':
configure: Set the thread type after resolving dependencies
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '2f02bbcca050936686482453078e83dc25493da0':
build: Let the ffvhuff decoder/encoder depend on the huffyuv decoder/encoder
Conflicts:
configure
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '34150be515cd9c43b0b679806b8d01774960af78':
build: Let the iac decoder depend on the imc decoder
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '8e0cf39faf02536dca08f4fe628a66d1ae022fde':
build: Let all MJPEG-related decoders depend on the MJPEG decoder
Conflicts:
configure
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0a36988e48dd581d29e77f768f987738bdf365f0':
build: Let AMV decoder depend on the SP5X decoder
Conflicts:
configure
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '17a63ff0cd187b9e50e4a47862750295976853b1':
h264: update flag name in ff_h264_decode_ref_pic_list_reordering()
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e70ab7c1f5005041bba0e4efc1165410f83495b2':
h264: add MVCD to the list of High profiles in SPS
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The new function has the ability to allocate the structure, allowing it to grow
without needing major bumps
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This structure is used in the interface between libs and thus cannot have
fields added in the middle without major bump
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
A threading type might be detected originally, but later disabled
if one of its dependencies is unavailable.
This makes sure that the threading support item in the configure
output is right for setups where w32threads are available but
native atomics aren't.
Signed-off-by: Martin Storsjö <martin@martin.st>
These codecs compile all of the MJPEG code anyway, so there is little
point in not enabling the MJPEG decoder directly. This also simplifies
the dependency declarations for the MJPEG codec family.
This codec compiles all of the SP5X code anyway, so there is little
point in not enabling the decoder directly. This also simplifies the
dependency declaration for the AMV decoder.
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '92e598a57a7ce4b8ac9ea56274af39f5fd888311':
prores: Drop DSP infrastructure for prores encoder bits
Conflicts:
libavcodec/Makefile
libavcodec/proresdsp.c
libavcodec/proresenc_kostya.c
Note, these changes only affect one of the 2 prores encoders we have
If someone wants to add optimizations to the affected encoder, or needs/wants
this infrastructure, then iam happy to revert this
Merged-by: Michael Niedermayer <michaelni@gmx.at>
AAC LOAS can have new audio config objects in the stream itself.
Make sure the decoder reconfigures itself when the first one arrives
midstream.
Bug-Id: 644
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.
Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.
The decode_hf implementations have following timings:
For x86 Arrandale:
C SSE SSE2 SSE4
win32: 260 162 119 104
win64: 242 N/A 89 72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
Based on a patch from Christophe Gisquet.
Unrolling of the m == 0 case avoids a possible use of the uninitilized
value sum when s->predictor_history is not set. I failed to find a
sample for it. It also reduced the cycle count from 220 to 150 on
sandy bridge, x86_64 linux, gcc 4.8.2 compared to his patch.
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The scaling factor is constant so it is faster to scale the
FIR coefficients in the tables during compilation.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>