Although a reasonable compiler will probably optimise out the
actual store and load, this operation still implies a truncation
to 16 bits which the compiler will probably not realise is not
necessary here.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Writing the scaled excitation to a scratch buffer (borrowing the
'audio' array) instead of modifying it in place avoids the need
to save and restore the unscaled values.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Use saturating addition functions instead of 64-bit intermediates
and separate clipping. This is much faster when dedicated
instructions are available.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Firstly, nothing in this function can overflow 32 bits so the use
of a 64-bit type is completely unnecessary. Secondly, the scale
is either a power of two or 0x7fff. Doing separate loops for these
cases avoids using multiplications. Finally, since only the number
of bits, not the actual value, of the maximum value is needed, the
bitwise or of all the values serves the purpose while being faster.
It is worth noting that even if overflow could happen, it was not
handled correctly anyway.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The operands in both cases are 16-bit so cannot overflow a 32-bit
destination. In gain_scale() the inputs are reduced to 14-bit,
so even the shift cannot overflow.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Adding instead of subtracting the products in the loop allows the
compiler to generate more efficient multiply-accumulate instructions
when 16-bit multiply-subtract is not available. ARM has only
multiply-accumulate for 16-bit operands. In general, if only one
variant exists, it is usually accumulate rather than subtract.
In the same spirit, using the dedicated saturation function enables
use of any special optimised versions of this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
C++ does not allow to mix different enums, so e.g. code comparing
ACodecID with CodecID would fail to compile with gcc.
This very evil hack should fix this problem.
* qatar/master:
g723.1: fix addition overflow
g723.1: simplify and fix multiplication overflow
g723.1: deobfuscate an expression
g723.1: remove unused #includes
ARM: add missing "cc" clobber in av_clipl_int32_arm()
rtmp: Factorize the code by adding handle_invoke_error
rtmp: Factorize the code by adding handle_invoke_status
rtmp: Factorize the code by adding handle_invoke_result
libavutil: remove unused av_abort() macro
ffmenc: replace if/abort with assert()
libavutil: drop offsetof() fallback definition
libavutil: drop fallback definitions of INTxx_MIN/MAX
configure: Check for a sctp struct instead of just the header
configure: suncc: Add -xc99 to dependency flags, required on Solaris
doxygen: Fix function parameter names to match the code
doc: Drop obsolete shared libs cflags hint to workaround Cygwin gcc bugs
swf: Move shared table out of the header file
swf: Move swf_audio_codec_tags table to the only place it is used
fate: add G.723.1 decoder tests
Conflicts:
configure
doc/platform.texi
libavformat/Makefile
libavutil/arm/intmath.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
In 16-bit arithmetic, x * 0xffffc is simply x * -4 with extra overflows,
(and the constant was probably meant to be 0xfffc). Combined with the
shift, this simplifies to -x >> 1. Finally, clearing the low two bits
with a 32-bit mask and switching to a 32-bit type allows more efficient
code on 32-bit machines.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
motion_est: drop inline from sad_hpel_motion_search()
motion_est: remove unused macros
motion_est: remove useless no_motion_search() function
lagarith: frame multithreading
doxygen: qdm2: Drop documentation for non-existing function parameters
build: add HOSTOBJS to SUBDIR_VARS list
Conflicts:
Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mpegvideo: reduce excessive inlining of mpeg_motion()
mpegvideo: convert mpegvideo_common.h to a .c file
build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
Move MASK_ABS macro to libavcodec/mathops.h
x86: move MANGLE() and related macros to libavutil/x86/asm.h
x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
aacdec: Don't fall back to the old output configuration when no old configuration is present.
rtmp: Add message tracking
rtsp: Support mpegts in raw udp packets
rtsp: Support receiving plain data over UDP without any RTP encapsulation
rtpdec: Remove an unused include
rtpenc: Remove an av_abort() that depends on user-supplied data
vsrc_movie: discourage its use with avconv.
avconv: allow no input files.
avconv: prevent invalid reads in transcode_init()
avconv: rename OutputStream.is_past_recording_time to finished.
Conflicts:
configure
doc/filters.texi
ffmpeg.c
ffmpeg.h
libavcodec/Makefile
libavcodec/aacdec.c
libavcodec/mpegvideo.c
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
At both places this function is called, mb_[xy] == s->mb_[xy]
making the call together with following code equivalent to
simply assigning zeros.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The old code generates a termination packet with the same regions as the
start packet and page_state set to "only what changed"; the result is
that the termination packet is decoded as identical to the start packet.
The new code does as found in some DVB broadcasts: produce a packet with
no regions. This is done by expecting num_rects to be 0 rather than
using a flip-flop. ffmpeg.c is updated accordingly.
The main benefit of inlining this function is from constant
propagation for the 'field_based' argument. Instead of inlining
all calls, create two versions of the function for field_based
values of 0 and 1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This file defines a single, huge function, MPV_motion(), which
although being declared inline is not actually inlined by the
compiler (for good reason). There is thus no sense in defining
this function in a header file, resulting in multiple copies of
it in the final library.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds a hidden config variable for the mpegvideo.o dependency
and selects from the codecs which require it.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This macro is only used in two places, both in libavcodec, so this
is a more sensible place for it.
Two small tweaks to the macro are made:
- removing the trailing semicolon
- dropping unnecessary 'volatile' from the x86 asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master: (23 commits)
build: cosmetics: Reorder some lists in a more logical fashion
x86: pngdsp: Fix assembly for OS/2
fate: add test for RTjpeg in nuv with frameheader
rtmp: send check_bw as notification
g723_1: clip argument for 15-bit version of normalize_bits()
g723_1: use all LPC vectors in formant postfilter
id3v2: Support v2.2 PIC
avplay: fix build with lavfi disabled.
avconv: split configuring filter configuration to a separate file.
avconv: split option parsing into a separate file.
mpc8: do not leave padding after last frame in buffer for the next decode call
mpegaudioenc: list supported channel layouts.
mpegaudiodec: don't print an error on > 1 frame in a packet.
api-example: update to new audio encoding API.
configure: add --enable/disable-random option
doc: cygwin: Update list of FATE package requirements
build: Remove all installed headers and header directories on uninstall
build: change checkheaders to use regular build rules
rtmp: Add a new option 'rtmp_subscribe'
rtmp: Add support for subscribing live streams
...
Conflicts:
Makefile
common.mak
configure
doc/examples/decoding_encoding.c
ffmpeg.c
libavcodec/g723_1.c
libavcodec/mpegaudiodec.c
libavcodec/x86/pngdsp.asm
libavformat/version.h
library.mak
tests/fate/video.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It expects maximum value to be 32767 but calculations in scale_vector()
which uses this function can give it ABS(-32768) which leads to wrong
result and thus clipping is needed.
* qatar/master:
x86: fix build with nasm 2.08
x86: use nop cpu directives only if supported
x86: fix rNmp macros with nasm
build: add trailing / to yasm/nasm -I flags
x86: use 32-bit source registers with movd instruction
x86: add colons after labels
Conflicts:
Makefile
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f5d2c597e99af218b0d4d1cf9737c7e68ee934e4':
build: fix library installation on cygwin
mpc8: add a flush function
mpc8: set packet duration and stream start time instead of tracking frames
Conflicts:
libavformat/mpc8.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
yasm tolerates mismatch between movd/movq and source register size,
adjusting the instruction according to the register. nasm is more
strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The assert can be false with some invalid inputs, the check is
too expensive to always do though for just a warning message.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is possible in various error pathes as well as gap handling
that this has already been allocated. Its not clear why that
would be a problem with the current code, thus disable the
assert to avoid common assert failure when asserts are enabled.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The check is bogus since the nuv frameheader is already skipped
and the (decompressed) RTjpeg header is checked.
This reverts commit f6afacdb3b.
CC: libav-stable@libav.org
* qatar/master:
x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2
rational: add av_inv_q() returning the inverse of an AVRational
dpx: Make start offset unsigned
lavfi: properly signal out-of-memory error in ff_filter_samples
cosmetics: Fix a few switched periods and linebreaks
zerocodec: Fix memleak in decode_frame
zerocodec: Cosmetics
Conflicts:
ffmpeg.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
We're now running some of this code through valgrind for the first
time, and a few warnings showed up stemming from two problems.
1) The ASS code assumes the subtitle header is null terminated, but
it wasn't, and passing the size down doesn't look like fun, so I
added a terminator
2) The code wasn't freeing all of its state.
Signed-off-by: Philip Langdale <philipl@overt.org>
* qatar/master:
lavr: fix handling of custom mix matrices
fate: force pix_fmt in lagarith-rgb32 test
fate: add tests for lagarith lossless video codec.
ARMv6: vp8: fix stack allocation with Apple's assembler
ARM: vp56: allow inline asm to build with clang
fft: 3dnow: fix register name typo in DECL_IMDCT macro
x86: dct32: port to cpuflags
x86: build: replace mmx2 by mmxext
Revert "wmapro: prevent division by zero when sample rate is unspecified"
wmapro: prevent division by zero when sample rate is unspecified
lagarith: fix color plane inversion for YUY2 output.
lagarith: pad RGB buffer by 1 byte.
dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
Conflicts:
doc/APIchanges
libavcodec/lagarith.c
libavfilter/x86/gradfun.c
libavutil/cpu.h
libavutil/version.h
libswscale/utils.c
libswscale/version.h
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
If there was a failure inflating, or reinitializing
the zstream, the current frame's buffer would be lost.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
If there was a failure inflating, or reinitializing
the zstream, the current frame's buffer would be lost.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This change introduces a basic encoder for 3GPP Timed Text subtitles,
also known as TX3G, Quicktime subtitles, or "movtext" in the existing
code.
This initial change doesn't attempt to write styling information,
and just writes the plain text of the subtitles. I intend to add
support for styles eventually, but it's challenging due to a lack
of existing players that support them.
Note that an additional change is required to the mov/mp4 muxer to
write empty subtitle packets to indicate subtitle duration.
Signed-off-by: Philip Langdale <philipl@overt.org>
In the GNU assembler, a relational expression, bizarrely, has the
value -1 if true, whereas in Apple's it is +1. This patch makes
sure the correct expression is used in both cases.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The clang integrated assembler does not support pre-UAL syntax,
while gcc requires pre-UAL syntax for ARM code. A patch[1] for
clang to support the old syntax as well has been ignored since
January.
This patch chooses the syntax appropriate for each compiler,
allowing both to build the code. Notably, this change allows
building for iphone with the latest Apple Xcode update.
[1] http://llvm.org/bugs/show_bug.cgi?id=11855
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
vc1dec: Remove separate scaling function for interlaced field MVs
vc1dec: Invoke edge_emulation regardless of MV precision
x86: Use consistent 3dnowext function and macro name suffixes
g723_1: scale output as supposed for the case with postfilter disabled
g723_1: increase excitation storage by 4
g723_1: fix upper bound parameter from inverse maximum autocorrelation
g723_1: make scale_vector() behave like the reference
g723_1: fix off-by-one error in normalize_bits()
g723_1: save/restore excitation with offset to store LPC history
wmapro: prevent division by zero when sample rate is unspecified
x86: proresdsp: improve SIGNEXTEND macro comments
x86: h264dsp: K&R formatting cosmetics
LICENSE: Document all GPL files
Conflicts:
libavcodec/g723_1.c
libavcodec/wmaprodec.c
libavcodec/x86/h264dsp_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
OpenJPEG doesn't have a particular limit
Signed-off-by: Michael Bradshaw <mbradshaw@sorensonmedia.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
For left HFYU prediction, we predict from the buffer buf+1 using 8- or
16-byte reads. This means that aligning the buffer by 16 bytes is in
itself not sufficient, because if the width itself is 16- or 8-byte
aligned, the buffer will not be padded, and thus a read of size 16 at
buf+1 will overflow boundaries at the right edge. Padding the buffer by
1 byte is sufficient to not overflow its boundaries.
Fixes bug 342.
This makes add_hfyu_left_prediction_sse4() handle sources that are not
16-byte aligned in its own function rather than by proxying the call to
add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the
sse4 version clobberes xmm6, but the ssse3 version (which uses MMX regs)
does not restore it, thus leading to XMM clobbering and RSP being off.
Fixes bug 342.
The scaling process for obtaining direct MVs from co-located field MVs
is the same for interlaced field and progressive pictures.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
In VC-1 interlaced field pictures, chroma motion vectors can extend beyond
picture boundary even if luma vectors are bounded. The problem shows up
only for hpel interpolated MVs, and may be due to the way motion vectors
are scaled / cropped.
Thanks to Konstantin Shishkov for suggesting the fix. This fixes
long-known segfaults in MC-VC1.ts from videolan streams archive.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
Fixed codebook mode in 5300 rate may write up to SUBFRAME_LEN + 4 and
that is considered normal by the reference decoder. Without that additional
padding it might overwrite first elements of LPC history.