inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.
See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
code directly also and remove loop setup. 20% faster in function, 0.8% overall.
See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.
Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
but it used to do it only for h264 codec.
Allow it for other codecs, as mpeg2 and mpeg4 also set this flag.
Originally committed as revision 25156 to svn://svn.ffmpeg.org/ffmpeg/trunk
Apparently Apple platforms do not handle movw/movt relocations
properly, leading to runtime crashes in code using them.
Originally committed as revision 25150 to svn://svn.ffmpeg.org/ffmpeg/trunk
which will hopefully solve the Win64/FATE failures caused by these functions.
Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk
have been accepted before).
Also do not fail if they are invalid but instead override them to 0.
This allows decoding e.g. MPEG video when only the container values are corrupted.
For encoding a value of 0,0 of course makes no sense, but was allowed
through before and will be caught by an extra check in the encode function.
Originally committed as revision 25124 to svn://svn.ffmpeg.org/ffmpeg/trunk
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.
Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.
Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
Original patch by Zhou Zongyi, zhouzy A os pku edu cn, resubmitted by
James Darnley, james.darnley gmail, changes by me.
Originally committed as revision 25115 to svn://svn.ffmpeg.org/ffmpeg/trunk
For a PCE based configuration map the channels solely based on tags.
For an indexed configuration map the channels solely based on position.
This works with all known exotic samples including al17, elem_id0, bad_concat,
and lfe_is_sce.
Originally committed as revision 25098 to svn://svn.ffmpeg.org/ffmpeg/trunk
case the stride must be aligned to a multiple of 4.
The original CSCD encoder just compresses bitmaps it gets via Windows
API functions as-is, thus it uses exactly those alignment rules.
Originally committed as revision 25096 to svn://svn.ffmpeg.org/ffmpeg/trunk
mm_support instead.
Fix compilation if altivec is present and libxvidff.c is compiled.
Originally committed as revision 25075 to svn://svn.ffmpeg.org/ffmpeg/trunk
mm_support() instead.
Reduce complexity and simplify pending move to libavutil.
Originally committed as revision 25074 to svn://svn.ffmpeg.org/ffmpeg/trunk
Fixes a black line in non-swapped, non-mod-16-height Theora videos
when vp3_draw_horiz_band is used.
Originally committed as revision 25073 to svn://svn.ffmpeg.org/ffmpeg/trunk
Using floating-point here can cause erroneous rejection of
parameters due to rounding errors leading to a slightly too
large result.
This fixes the mxf regression test with gcc 4.5 on x86_32.
Originally committed as revision 25060 to svn://svn.ffmpeg.org/ffmpeg/trunk
that result in overdrawing areas again and again if s->flipped_image
is false.
Originally committed as revision 25051 to svn://svn.ffmpeg.org/ffmpeg/trunk
The #ifndef is required to allow for example some automated regression
tests by simply configuring with: --extra-cflags="-DFF_API_FOO=0".
Originally committed as revision 25043 to svn://svn.ffmpeg.org/ffmpeg/trunk
This increases compatibilty with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.
Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
This fixes building with --disable-everything --enable-muxer=rtp, closing
issue 2159.
Originally committed as revision 25036 to svn://svn.ffmpeg.org/ffmpeg/trunk
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.
Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC.
Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.
Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
uint16_fast_t is unsigned int (or long) on Linux, which when compared
with int results in an unsigned compare.
Originally committed as revision 24994 to svn://svn.ffmpeg.org/ffmpeg/trunk
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).
Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
This removes duplicated definitions of 8x8 and 16x16 fullpel MC
functions with various names reducing dsputil.o by 8k on x86_64.
Originally committed as revision 24933 to svn://svn.ffmpeg.org/ffmpeg/trunk
The stride can be negative and must be sign extended before being
used in pointer arithmetic.
Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
Because the order of evaluation of subexpressions is undefined, two
get_bits() calls may not be part of the same expression.
See also r24902.
Originally committed as revision 24906 to svn://svn.ffmpeg.org/ffmpeg/trunk
Because the order of evaluation of subexpressions is undefined, two
get_bits() calls may not be part of the same expression. In this
specific case, using get_bits_long() is simpler.
This fixes msmpeg4v1 decoding with armcc.
Originally committed as revision 24902 to svn://svn.ffmpeg.org/ffmpeg/trunk
This performs quite a bit better than the current 3GPP-inspired window decision
on all the samples I have tested. On the castanets.wav sample it performs very
similar to iTunes window selection, and seems to perform better than Nero.
On fatboy.wav, it seems to perform at least as good as iTunes, if not better.
Nero performs horribly on this sample.
Patch by: Nathan Caldwell <saintdev@gmail.com>
Originally committed as revision 24892 to svn://svn.ffmpeg.org/ffmpeg/trunk
This allows cleaner implementation of other psymodels using the existing
structs. It also will make it easier to interchange individual parts of
the psymodel to create hybrid models.
Patch by: Nathan Caldwell <saintdev@gmail.com>
Originally committed as revision 24890 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is to avoid split asm sections that attempt to preserve some
registers between sections.
Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk
since all of it ends up in a single 32-bit pixel.
This seems likely to be wrong though, since it is different
from the 15 and 16 bit modes and might explain the half-width
issue for 24 bit truemotion.
Originally committed as revision 24864 to svn://svn.ffmpeg.org/ffmpeg/trunk
BGR32 and not BGR24, do not swap red and blue on big-endian
for this format as well.
Originally committed as revision 24863 to svn://svn.ffmpeg.org/ffmpeg/trunk
The relevent extract of the iso 13818-2 about the value of the syntaxical
element top_field_first of the Picture Coding Extension is:
"top_field_first -- The meaning of this element depends upon picture_structure,
progressive_sequence and repeat_first_field.
[...]
In a field picture top_field_first shall have the value '0', and the only field
output by the decoding process is the decoded field picture."
Originally committed as revision 24853 to svn://svn.ffmpeg.org/ffmpeg/trunk
Conversion of an out of range float to int is undefined. Clipping to
the final range first avoids such problems. This fixes decoding on
MIPS, which handles these conversions differently from many other CPUs.
Originally committed as revision 24838 to svn://svn.ffmpeg.org/ffmpeg/trunk
ff_get_plane_bytewidth().
The new implementation is more generic, more compact and more correct.
Originally committed as revision 24786 to svn://svn.ffmpeg.org/ffmpeg/trunk
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.
Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
wrapper of it.
The new function is more generic, and does not depend on the
definition of the AVPicture struct.
Patch by S.N. Hemanth Meenakshisundaram s + "meenakshisundaram".substr(0, 7) + "@ucsd.edu".
Originally committed as revision 24768 to svn://svn.ffmpeg.org/ffmpeg/trunk
correct mb_xy to use mb_width.
tighten allocations.
reduce the amount of zeroing.
Originally committed as revision 24760 to svn://svn.ffmpeg.org/ffmpeg/trunk
two padding zeroes before it. Should fix fate failures on openBSD and crashes
on MacOSX (that I cannot reproduce).
Originally committed as revision 24750 to svn://svn.ffmpeg.org/ffmpeg/trunk
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.
Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
including libavcore/imgutils.h, which was required since the recent
avcodec_check_dimensions() -> av_check_image_size() transition.
Originally committed as revision 24734 to svn://svn.ffmpeg.org/ffmpeg/trunk
When sizeof(uint_fast8_t) >= sizeof(int) there are unintended size effects.
Originally committed as revision 24716 to svn://svn.ffmpeg.org/ffmpeg/trunk
Create a custom table for VP5/6/8's renorm to avoid depending on H.264's.
Saves one instruction in the arithmetic decoder as well.
Originally committed as revision 24701 to svn://svn.ffmpeg.org/ffmpeg/trunk
This reverts rev 24674 - the VP8 decoder actually depends on cabac.o.
vp8.c includes vp56.h, which includes cabac.h, which has inline functions
that reference tables from cabac.c.
This fixes compilation with --disable-everything --enable-decoder=vp8.
Originally committed as revision 24692 to svn://svn.ffmpeg.org/ffmpeg/trunk
Always inline the arithmetic coder, except in the case of header-parsing stuff,
in which case don't inline it at all to save code size.
Originally committed as revision 24677 to svn://svn.ffmpeg.org/ffmpeg/trunk
If sizeof uint_fast8_t > 1 and sizeof size_t <= 4, the expression that mallocs
classifs is susceptible to integer overflow.
Originally committed as revision 24675 to svn://svn.ffmpeg.org/ffmpeg/trunk
A lot of the time the DC block is empty: don't do the WHT in this case.
A lot of the rest of the time, there's only one coefficient: make a special
DC-only transform for that case.
When the block is empty, don't incorrectly mark luma DCT blocks as having DC
coefficients.
Originally committed as revision 24670 to svn://svn.ffmpeg.org/ffmpeg/trunk
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.
Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).
Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
can be known exactly, so larger frame sizes can be safely encoded without buffer
overwrite.
Originally committed as revision 24630 to svn://svn.ffmpeg.org/ffmpeg/trunk