Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This makes the first packet of a track fragment run to get
the keyframe flag set properly if sample_degradation_priority
is nonzero.
This makes the keyframes flag be set properly for ismv files
created by Microsoft.
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously, we've only passed the key string on to the recursive
amf_parse_object for the mixedarray type, not for 'object'. By
passing the key string on, the recursive amf_parse_object can
store the amf objects as metadata.
This kind of data was seen in data from XSplit Broadcaster, received
over RTMP via Wowza. This patch allows reading this metadata.
Signed-off-by: Martin Storsjö <martin@martin.st>
Currently, any samples in the final frame are not decoded because they are
only represented by one frame instead of two. So we encode two final frames to
cover both the analysis delay and the MDCT delay.
Check results for av_malloc() and fix an overflow in one call.
Related to CVE-2011-3940.
Based in part on work from Michael Niedermayer.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Fixes CVE-2011-3940 (Out of bounds read resulting in out of bounds write)
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
(cherry picked from commit 5c011706bc)
Signed-off-by: Alex Converse <alex.converse@gmail.com>
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The functions are already av_ prefixed and intfloat header is already provided.
Install libavutil/intfloat.h
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
-vbsf doesn't exist anymore. It got renamed to -bsf somewhere along the
line. Update print statement accordingly.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Current demuxer recognizes several colorspace formats that begin with 'C420'
but does not yet recognize plain 'C420'. GStreamer's y4menc component
generates .y4m files with a 'C420' colorspace. This new comparison is
placed after the other 'C420' checks so that it doesn't interfere with
them.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
Note: This fixes the following GCC warning :-
libavcodec/sunrast.c:94: warning: comparison of unsigned expression < 0 is always false.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
10l: Forgot to adjust deinterleave for new location of incoming samples in 7946a5a.
This produced incorrect, but surprisingly listenable results.
Thanks to Justin Ruggles for the report.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
Signed-off-by: Mans Rullgard <mans@mansr.com>