* qatar/master:
proresdsp: port x86 assembly to cpuflags.
lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro
lavfi: better channel layout negotiation
alac: check for truncated packets
alac: reverse lpc coeff order, simplify filter
lavr: add x86-optimized mixing functions
x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
tscc2: fix typo in array index
build: use COMPILE template for HOSTOBJS
build: do full flag handling for all compiler-type tools
eval: fix printing of NaN in eval fate test.
build: Rename aandct component to more descriptive aandcttables
mpegaudio: bury inline asm under HAVE_INLINE_ASM.
x86inc: automatically insert vzeroupper for YMM functions.
rtmp: Check the buffer length of ping packets
rtmp: Allow having more unknown data at the end of a chunk size packet without failing
rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets
Conflicts:
Makefile
configure
libavcodec/x86/proresdsp.asm
libavutil/eval.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
libopenjpeg: support YUV and deep RGB pixel formats
Fix typo in v410 decoder.
vf_yadif: unset cur_buf on the input link.
vf_overlay: ensure the overlay frame does not get leaked.
vf_overlay: prevent premature freeing of cur_buf
Support urlencoded http authentication credentials
rtmp: Return an error when the client bandwidth is incorrect
rtmp: Return proper error code in handle_server_bw
rtmp: Return proper error code in handle_client_bw
rtmp: Return proper error codes in handle_chunk_size
lavr: x86: add missing vzeroupper in ff_mix_1_to_2_fltp_flt()
vp8: Replace x*155/100 by x*101581>>16.
vp3: don't use calls to inline asm in yasm code.
x86/dsputil: put inline asm under HAVE_INLINE_ASM.
dsputil_mmx: fix incorrect assembly code
rtmp: Factorize the code by adding handle_invoke
rtmp: Factorize the code by adding handle_chunk_size
rtmp: Factorize the code by adding handle_ping
rtmp: Factorize the code by adding handle_client_bw
rtmp: Factorize the code by adding handle_server_bw
Conflicts:
libavcodec/libopenjpegdec.c
libavcodec/x86/dsputil_mmx.c
libavfilter/vf_overlay.c
libavformat/Makefile
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Mixing yasm and inline asm is a bad idea, since if either yasm or inline
asm is not supported by your toolchain, all of the asm stops working.
Thus, better to use either one or the other alone.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
In the first block (in the unrolled loop), the instructions
"movq 8%3, %%mm1 \n\t", and so forth, have problems.
From above instruction, it is clear what the programmer wants: a load from
p + 8. But this assembly code doesn’t guarantee that. It only works if the
compiler puts p in a register to produce an instruction like this:
"movq 8(%edi), %mm1". During compiler optimization, it is possible that the
compiler will be able to constant propagate into p. Suppose p = &x[10000].
Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction
becomes "movq 810000(%edx)". That is, it will stride by 810000 instead of 8.
This will cause a segmentation fault.
This error was fixed in the second block of the assembly code, but not in
the unrolled loop.
How to reproduce:
This error is exposed when we build using Intel C++ Compiler, with
IPO+PGO optimization enabled. Crashed when decoding an MJPEG video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
In file libavcodec/x86/dsputil_mmx.c, function ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t" etc have problem.
For above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: “movq 8(%edi), %mm1”. During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes “movq 810000(%edx)”. That is, it will stride by 810000 instead of 8.
This will cause the segmentation fault.
This error was fixed in the second block of the assembly code, but not in the unrolled loop.
How to reproduce:
This error is exposed when we build the ffmpeg using Intel C++ Compiler, IPO+PGO optimization. The ffmpeg was crashed when decoding a mjpeg video.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
v410dec: Implement explode mode support
zerocodec: fix direct rendering.
wav: init st to NULL to avoid a false-positive warning.
wavpack: set bits_per_raw_sample for S32 samples to properly identify 24-bit
h264: refactor NAL decode loop
RTMPTE protocol support
RTMPE protocol support
rtmp: Add ff_rtmp_calc_digest_pos()
rtmp: Rename rtmp_calc_digest to ff_rtmp_calc_digest and make it global
swscale: add missing HAVE_INLINE_ASM check.
lavfi: place x86 inline assembly under HAVE_INLINE_ASM.
vc1: Add a test for interlaced field pictures
swscale: Mark all init functions as av_cold
swscale: x86: Drop pointless _mmx suffix from filenames
lavf: use conditional notation for default codec in muxer declarations.
swscale: place inline assembly bilinear scaler under HAVE_INLINE_ASM.
dsputil: ppc: cosmetics: pretty-print
dsputil: x86: add SHUFFLE_MASK_W macro
configure: respect CC_O setting in check_cc
Conflicts:
Changelog
configure
libavcodec/v410dec.c
libavcodec/zerocodec.c
libavformat/asfenc.c
libavformat/version.h
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Print full compiler identification, not only version number
flacdec: reverse lpc coeff order, simplify filter
x86: dsputil: drop some unused CPU flag debug code
Conflicts:
cmdutils.c
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
ppc: fix build with altivec disabled
vp3: move idct and loop filter pointers to new vp3dsp context
build: add CONFIG_VP3DSP, reduce repetition in OBJS lists
tscc2: do not add/subtract 128 bias during DCT
tscc2: fix typo in DCT
configure: clarify external library section of help output
configure: mark libfdk-aac as nonfree
configure: cosmetics: drop some unnecessary backslashes
os_support: K&R formatting cosmetics
Conflicts:
configure
libavcodec/vp3.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context. There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
mxfdec: replace x>>av_log2(sizeof(..)) by x/sizeof(..).
x86: h264_intrapred: Don't add the 'd' suffix to the SPLATB_REG macro
Conflicts:
libavformat/mxfdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The SPLATB_REG macro already adds the 'd' suffix internally.
This fixes building on Win64, which has been broken since 878e66902.
This worked for unix, where r2 happened to be rdx in this case, which
with the first suffix rdxd was mapped to eax, and eaxd is defined back
to eax. On win64 however, r2 happened to be R8 in this case, and
R8d mapps to R8D just fine, but there's no mapping for R8Dd to anything.
Signed-off-by: Martin Storsjö <martin@martin.st>
* qatar/master:
qdm2: remove broken and disabled dump_context() debug function
x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros
x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
x86inc: modify ALIGN to not generate long nops on i586
x86: h264_intrapred: port to cpuflag macros
avplay: update input filter pointer when the filtergraph is reset.
avconv: fix parsing of -force_key_frames option.
h264: use templates to avoid excessive inlining
xtea: Make the count parameter match the documentation
blowfish: Make the count parameter match the documentation
mpegvideo: Don't use ff_mspel_motion() for vc1
xtea: invert branch and loop precedence
blowfish: invert branch and loop precedence
flvdec: optionally trust the metadata
avconv: Set audio filter time base to the sample rate
vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too
Conflicts:
ffmpeg.c
ffplay.c
libavcodec/h264.c
libavcodec/mpegvideo_common.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (29 commits)
lavfi: reclassify showfiltfmts as a TESTPROG
graph2dot: fix printf format specifier
swscale: yuv2planeX 8bit >=sse2 functions need aligned stack on x86-32.
vp8: loopfilter >=sse2 functions need aligned stack on x86-32.
amr: remove shift out of the AMR_BIT() macro.
dsputilenc: group yasm and inline asm function pointer assignment.
mov: use forward declaration of a function instead of a table.
Clarify Doxygen comment for FF_API_* #defines.
configure: simplify get_version()
Create version.h headers for libraries that lack them
gitignore: Use full path instead of relative path to specify patterns
mpegvideo: remove VLAs
Add XTEA encryption support in libavutil
Add Blowfish encryption support in libavutil
eval: Add the isinf() function and tests for it
flacdec: move lpc filter to flacdsp
flacdec: split off channel decorrelation as flacdsp
avplay: Add an option for not limiting the input buffer size
FATE: add a test for WMA cover art.
FATE: add a test for apetag cover art
...
Conflicts:
.gitignore
configure
ffplay.c
libavcodec/Makefile
libavcodec/error_resilience.c
libavcodec/mpegvideo.c
libavcodec/ratecontrol.c
libavdevice/avdevice.h
libavfilter/Makefile
libavfilter/filtfmts.c
libavfilter/version.h
libavformat/mov.c
libavformat/version.h
libavutil/Makefile
libavutil/avutil.h
libavutil/version.h
libswscale/swscale.h
libswscale/x86/swscale_mmx.c
tests/fate/libavutil.mak
tests/lavfi-regression.sh
tools/graph2dot.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
dsputilenc_mmx: split assignment of ff_sse16_sse2 to SSE2 section.
dnxhdenc: add space between function argument type and comment.
x86: fmtconvert: add special asm for float_to_int16_interleave_misc_*
attributes: Add a definition of av_always_inline for MSVC
cmdutils: Pass the actual chosen encoder to filter_codec_opts
os_support: Add fallback definitions for stat flags
os_support: Rename the poll fallback function to ff_poll
network: Check for struct pollfd
os_support: Don't compare a negative number against socket descriptors
os_support: Include all the necessary headers for the win32 open function
x86: vc1: fix and enable optimised loop filter
Conflicts:
cmdutils.c
cmdutils.h
ffmpeg.c
ffplay.c
libavformat/os_support.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The problem is that the ssse3 psign instruction does the wrong
thing here. Commit ea60dfe incorrectly removed a macro emulating
this instruction for pre-ssse3 code. However, the emulation is
incorrect, and the code relies on the behaviour of the macro.
Specifically, the psign sets destination elements to zero where
the corresponding source element is zero, whereas the emulation
only negates destination elements where the source is negative.
Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
which is why the original VC-1 code had an additional right shift
when using it. Since the psign instruction cannot be used here,
skip all the macro hell and use the working instruction sequence
directly.
None of this was noticed due a stray return statement in
ff_vc1dsp_init_mmx() which meant that only the mmx version of the
loop filter was ever used (before being removed in ea60dfe).
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
mss1: validate number of changeable palette entries
mss1: report palette changed when some additional colours were decoded
x86: fft: replace call to memcpy by a loop
udp: Support IGMPv3 source specific multicast and source blocking
dxva2: include dxva.h if found
libm: Provide fallback definitions for isnan() and isinf()
tcp: Pass NULL as hostname to getaddrinfo if the string is empty
tcp: Set AI_PASSIVE when the socket will be used for listening
Conflicts:
configure
libavcodec/mss1.c
libavformat/udp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The function call was a mess to handle, and memcpy cannot make
the assumptions we do in the new code.
Tested on an IMC sample: 430c -> 370c.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
wtv: Check the return value from gmtime
x86: fft: convert sse inline asm to yasm
x86: place some inline asm under #if HAVE_INLINE_ASM
Conflicts:
libavcodec/x86/fft_sse.c
libavformat/wtv.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
log: Only include unistd.h if configure found it
ape: create audio stream before reading tags.
mov: make a length variable larger.
image2: Add "start_number" private option to the demuxer
image2: Add "start_number" private option to the muxer
avconv: remove a forgotten debugging printf.
avconv: use more descriptive names for hardcoded filters.
avconv: remove redundant handling of async.
doc/filters: fix typo.
h264: use asm cabac reader under a generic condition
Conflicts:
ffmpeg.c
libavformat/img2dec.c
libavformat/img2enc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: Only use optimizations with cmov if the CPU supports the instruction
x86: Add CPU flag for the i686 cmov instruction
x86: remove unused inline asm macros from dsputil_mmx.h
x86: move some inline asm macros to the only places they are used
lavfi: Add the af_channelmap audio channel mapping filter.
lavfi: add join audio filter.
lavfi: allow audio filters to request a given number of samples.
lavfi: support automatically inserting the fifo filter when needed.
lavfi/audio: eliminate ff_default_filter_samples().
Conflicts:
Changelog
libavcodec/x86/h264dsp_mmx.c
libavfilter/Makefile
libavfilter/allfilters.c
libavfilter/avfilter.h
libavfilter/avfiltergraph.c
libavfilter/version.h
libavutil/x86/cpu.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This removes a dependency on implementation details from generic
code and allows easy addition of the equivalent optimisation for
other architectures than x86.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
libspeexenc: add supported sample rates and channel layouts.
Replace usleep() calls with av_usleep()
lavu: add av_usleep() function
utvideo: mark interlaced frames as such
utvideo: Fix interlaced prediction for RGB utvideo.
cosmetics: do not use full path for local headers
lavu/file: include unistd.h only when available
configure: check for unistd.h
log: include unistd.h only when needed
lavf: include libavutil/time.h instead of redeclaring av_gettime()
Conflicts:
configure
doc/APIchanges
ffmpeg.c
ffplay.c
libavcodec/utvideo.c
libavutil/avutil.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavr: add x86-optimized functions for mixing 1-to-2 s16p with flt coeffs
lavr: add x86-optimized functions for mixing 1-to-2 fltp with flt coeffs
Add Dolby/DPLII downmix support to libavresample
vorbisdec: replace div/mod in loop with a counter
fate: vorbis: add 5.1 surround test
rtpenc: Allow requesting H264 RTP packetization mode 0
configure: Sort the library listings in the help text alphabetically
dwt: remove variable-length arrays
RTMPT protocol support
http: Properly handle chunked transfer-encoding for replies to post data
http: Fail reading if the connection has gone away
amr: Mark an array const
amr: More space cleanup
rtpenc: Fix memory leaks in the muxer open function
Conflicts:
Changelog
configure
doc/APIchanges
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
float_dsp: ppc: add a separate header for Altivec function prototypes
ARM: fix float_dsp breakage from d5a7229
Add a float DSP framework to libavutil
PPC: Move types_altivec.h and util_altivec.h from libavcodec to libavutil
ARM: Move asm.S from libavcodec to libavutil
vc1dsp: mark put/avg_vc1_mspel_mc() always_inline
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f919cc7df6ab844bc12f89fe7bef4fb915a47725':
fate: fix acodec/vsynth tests for make 3.81
pcm_mpeg: fix number of consumed bytes to include the header.
avfilter: include required header file avfilter.h in video.h
x86: Avoid movs on BUTTERFLYPS when in AVX mode
x86: use new schema for ASM macros
fate: convert codec-regression.sh to makefile rules
fate: allow tests to specify unit size for psnr comparison
fate: teach videogen/rotozoom to output a single raw video stream
http: Add support for reusing the http socket for subsequent requests
http: Add support for using persistent connections
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
os_support: Define SHUT_RD, SHUT_WR and SHUT_RDWR on OS/2
http: Add support for reading http POST reply headers
http: Add http_shutdown() for ending writing of posts
tcp: Allow signalling end of reading/writing
avio: Add a function for signalling end of reading/writing
lavfi: fix comment, audio is supported now.
lavfi: fix incorrect comment.
lavfi: remove avfilter_null_* from public API on next bump.
lavfi: remove avfilter_default_* from public API on next bump.
lavfi: deprecate default config_props() callback and refactor avfilter_config_links()
avfiltergraph: smarter sample format selection.
avconv: rename transcode_audio/video to decode_audio/video.
asyncts: reset delta to 0 when it's not used.
x86: lavc: use %if HAVE_AVX guards around AVX functions in yasm code.
dwt: return errors from ff_slice_buffer_init()
Conflicts:
ffmpeg.c
libavfilter/avfilter.c
libavfilter/avfilter.h
libavfilter/formats.c
libavfilter/version.h
libavfilter/vf_blackframe.c
libavfilter/vf_drawtext.c
libavfilter/vf_fade.c
libavfilter/vf_format.c
libavfilter/vf_showinfo.c
libavfilter/video.c
libavfilter/video.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
dwt: check malloc calls
ppc: Drop unused header regs.h
af_resample: remove an extra space in the log output
Convert vector_fmul range of functions to YASM and add AVX versions
lavfi: add an audio split filter
lavfi: rename vf_split.c to split.c
Conflicts:
doc/filters.texi
libavcodec/ppc/regs.h
libavfilter/Makefile
libavfilter/allfilters.c
libavfilter/f_split.c
libavfilter/split.c
libavfilter/version.h
libavfilter/vf_split.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
fate: Work around non-standard wc implementations at more places
fate: work around non-standard wc implementations
x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions.
ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16
fate: use standard diff options
tta: Fix comment about channel number; TTA supports >2 channels.
avfilter: Move ff_get_ref_perms_string() to where it is used.
build: Add 'check' target to run all compile and test targets.
indeo3: validate new frame size before resetting decoder
indeo3: when freeing buffers, set pointers referencing them to NULL as well
indeo3: initialise pixel planes on allocation
indeo3: ensure that decoded cell data is in 7-bit range as presumed by decoder
fate: rename psx-str-v3-mdec to mdec-v3
fate: convert psx-str to a demuxer test
lavf: add mdec to is_intra_only() list
Conflicts:
doc/developer.texi
libavcodec/indeo3.c
libavfilter/video.c
libavformat/utils.c
tests/fate/demux.mak
tests/fate/video.mak
tests/lavf-regression.sh
tests/ref/vsynth1/cljr
tests/ref/vsynth1/ffvhuff
tests/ref/vsynth2/cljr
tests/ref/vsynth2/ffvhuff
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (25 commits)
vcr1: Add vcr1_ prefixes to all static functions with generic names.
vcr1: Fix return type of common_init to match the function pointer signature.
vcr1enc: Replace obsolete get_bit_count by put_bits_count/flush_put_bits.
motion-test: remove disabled code
gxfenc: remove disabled half-implemented MJPEG tag
x86: use more standard construct for setting ASM functions in FFT code
fate: westwood-aud: disable decoding
fate: caf: disable decoding
fate: film-cvid: drop pcm audio and rename test
fate: d-cinema-demux: drop unnecessary flags
fate: split off dpcm-interplay from interplay-mve tests
fate: rename funcom-iss to adpcm-ima-iss
fate: rename cryo-apc to adpcm-ima-apc
fate: rename adpcm-psx-str-v3 to adpcm-xa
fate: split off adpcm-ms-mono test from dxa-feeble
fate: split off adpcm-ima-ws test from vqa-cc
fate: add adpcm-ima-smjpeg test
fate: split off adpcm-ima-amv from amv test
fate: separate bmv audio and video tests
fate: separate delphine-cin audio and video tests
...
Conflicts:
doc/platform.texi
libavcodec/vcr1.c
tests/fate/audio.mak
tests/fate/demux.mak
tests/fate/video.mak
tests/ref/fate/ea-mad-pcm-planar
tests/ref/fate/interplay-mve-16bit
tests/ref/fate/interplay-mve-8bit
tests/ref/fate/mtv
tests/ref/fate/qtrle-1bit
tests/ref/fate/qtrle-2bit
tests/ref/fate/truemotion1-15
tests/ref/fate/truemotion1-24
tests/ref/fate/vqa-cc
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (25 commits)
rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC
ape: Use unsigned integer maths
arm: dsputil: fix overreads in put/avg_pixels functions
h264: K&R formatting cosmetics for header files (part II/II)
h264: K&R formatting cosmetics for header files (part I/II)
rtmp: Implement check bandwidth notification.
rtmp: Support 'rtmp_swfurl', an option which specifies the URL of the SWF player.
rtmp: Support 'rtmp_flashver', an option which overrides the version of the Flash plugin.
rtmp: Support 'rtmp_tcurl', an option which overrides the URL of the target stream.
cmdutils: Add fallback case to switch in check_stream_specifier().
sctp: be consistent with socket option level
configure: Add _XOPEN_SOURCE=600 to Solaris preprocessor flags.
vcr1enc: drop pointless empty encode_init() wrapper function
vcr1: drop pointless write-only AVCodecContext member from VCR1Context
vcr1: group encoder code together to save #ifdefs
vcr1: cosmetics: K&R prettyprinting, typos, parentheses, dead code, comments
mov: make one comment slightly more specific
lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX
lavfi: move audio-related functions to a separate file.
lavfi: remove some audio-related function from public API.
...
Conflicts:
cmdutils.c
libavcodec/h264.h
libavcodec/h264_mvpred.h
libavcodec/vcr1.c
libavfilter/avfilter.c
libavfilter/avfilter.h
libavfilter/defaults.c
libavfilter/internal.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions
Benchmark (rounded to tens of unit):
V8x8 H8x8 2D8x8 V16x16 H16x16 2D16x16
C 445 358 985 1785 1559 3280
MMX* 219 271 478 714 929 1443
SSE2 131 158 294 425 515 892
SSSE3 120 122 248 387 390 763
End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
4xm: fix invalid array indexing
rv34dsp: factorize a multiplication in the noround inverse transform
rv40: perform bitwise checks in loop filter
rv34: remove inline keyword from rv34_decode_block().
rv40: change a logical test into a bitwise one.
rv34: remove constant parameter
rv40: don't always do the full prev_type search
dsputil x86: revert a test back to its previous value
rv34dsp x86: implement MMX2 inverse transform
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register.
There is a surprisingly large performance improvement over the c version (more
so than the generated assembly seems to suggest) just in get_cabac, I measured
roughly 40% faster for get_cabac on a K8. However, overall the difference is
not that big, I measured roughly 5% on a test clip on a K8 and a Core2.
Hopefully it still compiles on x86 32bit...
Now that only one table is used, there's some chance even darwin as compiles
this (apparently the label arithmetic used previously doesn't work if it
involves symbols defined in a different file, thanks to Ronald S. Bultje for
helping me with this).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The new lowres support is limited to decoders where lowres decoding
is possible in high quality.
I was not able to measure any speed difference, but if one is found
the 2-3 lines that might affect speed can be made compile time conditional
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
ARM: allow runtime masking of CPU features
dsputil: remove unused functions
mov: Treat keyframe indexes as 1-origin if starting at non-zero.
mov: Take stps entries into consideration also about key_off.
Remove lowres video decoding
Conflicts:
ffmpeg.c
ffplay.c
libavcodec/arm/vp8dsp_init_arm.c
libavcodec/libopenjpegdec.c
libavcodec/mjpegdec.c
libavcodec/mpegvideo.c
libavcodec/utils.c
libavformat/mov.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
avcodec: remove AVCodecContext.dsp_mask
avconv: fix a segfault when default encoder for a format doesn't exist.
utvideo: general cosmetics
aac: Handle HE-AACv2 when sniffing a channel order.
movenc: Support high sample rates in isomedia formats by setting the sample rate field in stsd to 0.
xxan: Remove write-only variable in xan_decode_frame_type0().
ivi_common: Initialize a variable at declaration in ff_ivi_decode_blocks().
Conflicts:
ffmpeg.c
libavcodec/utvideo.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump. It has been superseded
by av_set_cpu_flag_mask() which, unlike this field, works everywhere.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register.
There is a surprisingly large performance improvement over the c version (more
so than the generated assembly seems to suggest) just in get_cabac, I measured
roughly 40% faster for get_cabac on a K8. However, overall the difference is
not that big, I measured roughly 5% on a test clip on a K8 and a Core2.
Hopefully it still compiles on x86 32bit...
v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
for every table, and got rid of unnecessary @GOTPCREL.
v3: apply similar fixes to the the decode_significance functions, and use
same macro arguments for non-pic case.
v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
the c code to be faster otherwise since both cmov and sbb suck hard on a
Prescott, even can't construct the mask with a 64bit shift as that's just as
terrible - it's quite difficult to find usable instructions on that chip...).
This is tested to work but not on a P4, in theory it _should_ be fast there.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
indeo3: add parens around some macro arguments
h264: use proper PROLOGUE statement for a function using 8 registers.
doc: Update sample Vim config with suitable (function) indentation settings.
dv: Merge dvquant.h into dvdata.c where all other DV tables reside.
dv: Move static tables only used in one place to where they are used.
graphparser: set next to NULL on an entry extracted from inputs list
doc/filters: update documentation.
avconv: flush decoders immediately after an EOF.
avconv: send EOF to vsrc_buffer.
avconv: reindent.
Conflicts:
doc/filters.texi
ffmpeg.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
vsrc_buffer: fix check from 7ae7c41.
libxvid: Reorder functions to avoid forward declarations; make functions static.
libxvid: drop some pointless dead code
wmal: vertical alignment cosmetics
wmal: Warn about missing bitstream splicing feature and ask for sample.
wmal: Skip seekable_frame_in_packet.
wmal: Drop unused variable num_possible_block_size.
avfiltergraph: make the AVFilterInOut alloc/free API public
graphparser: allow specifying sws flags in the graph description.
graphparser: fix the order of connecting unlabeled links.
graphparser: add avfilter_graph_parse2().
vsrc_buffer: allow using a NULL buffer to signal EOF.
swscale: handle last pixel if lines have an odd width.
qdm2: fix a dubious pointer cast
WMAL: Do not try to read rawpcm coefficients if bits is invalid
mov: Fix detecting there is no sync sample.
tiffdec: K&R cosmetics
avf: has_duration does not check the global one
dsputil: fix optimized emu_edge function on Win64.
Conflicts:
doc/APIchanges
libavcodec/libxvid_rc.c
libavcodec/libxvidff.c
libavcodec/tiff.c
libavcodec/wmalosslessdec.c
libavfilter/avfiltergraph.h
libavfilter/graphparser.c
libavfilter/version.h
libavfilter/vsrc_buffer.c
libswscale/output.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
swscale: K&R formatting cosmetics (part II)
tiffdec: Add a malloc check and refactor another.
faxcompr: Check malloc results and unify return path
configure: escape colons in values written to config.fate
ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE
matroska: Fix leaking memory allocated for laces.
pthread: Fix crash due to fctx->delaying not being cleared.
vp3: Assert on invalid filter_limit values.
h264: fix 10bit biweight functions after recent x86inc.asm fixes.
ffv1: Fix size mismatch in encode_line.
movenc: Remove a dead initialization
git-howto: Explain how to avoid Windows line endings in git checkouts.
build: Move all arch OBJS declarations into arch subdirectory Makefiles.
Conflicts:
configure
libavcodec/vp3.c
libavformat/matroskadec.c
libavutil/Makefile
libswscale/Makefile
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Recent register allocation changes (x86inc.asm update) changed the
register order and thus opcodes for the inner loops. One of them became
>128bytes, which confuses other parts of this function where it jumps
to fixed-offset positions to extend the edge by fixed amounts. A simple
register change fixes this.
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* qatar/master: (22 commits)
rv40dsp x86: use only one register, for both increment and loop counter
rv40dsp: implement prescaled versions for biweight.
avconv: use default channel layouts when they are unknown
avconv: parse channel layout string
nutdec: K&R formatting cosmetics
vda: Signal 4 byte NAL headers to the decoder regardless of what's in the extradata
mem: Consistently return NULL for av_malloc(0)
vf_overlay: implement poll_frame()
vf_scale: support named constants for sws flags.
lavc doxy: add all installed headers to doxy groups.
lavc doxy: add avfft to the main lavc group.
lavc doxy: add remaining avcodec.h functions to a misc doxygen group.
lavc doxy: add AVPicture functions to a doxy group.
lavc doxy: add resampling functions to a doxy group.
lavc doxy: replace \ with /
lavc doxy: add encoding functions to a doxy group.
lavc doxy: add decoding functions to a doxy group.
lavc doxy: fix formatting of AV_PKT_DATA_{PARAM_CHANGE,H263_MB_INFO}
lavc doxy: add AVPacket-related stuff to a separate doxy group.
lavc doxy: add core functions/definitions to a doxy group.
...
Conflicts:
ffmpeg.c
libavcodec/avcodec.h
libavcodec/vda.c
libavcodec/x86/rv40dsp.asm
libavfilter/vf_scale.c
libavformat/nutdec.c
libavutil/mem.c
tests/ref/acodec/pcm_s24daud
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Quite often, the original weights are multiple of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.
The x86 code already used that trick.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* qatar/master:
h264: Factorize declaration of mb_sizes array.
vsrc_buffer: when no frame is available, return an error instead of segfaulting.
configure: add dl to frei0r extralibs.
dsputil x86: use SSE float instruction instead of SSE2 integer equivalent
dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype
vp8dsp x86: perform rounding shift with a single instruction
fate: add BMP tests.
swscale: handle complete dimensions for monoblack/white.
aacenc: Mark deinterleave_input_samples argument as const.
vf_unsharp: Mark readonly variable as const.
h264: fix 4:2:2 PCM-macroblocks decoding
Conflicts:
configure
libavcodec/h264.h
libavcodec/x86/dsputil_mmx.c
libavfilter/vf_unsharp.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
asf: only set index_read if the index contained entries.
cabac: add overread protection to BRANCHLESS_GET_CABAC().
cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().
cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().
cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().
h264: add overread protection to get_cabac_bypass_sign_x86().
h264: reindent get_cabac_bypass_sign_x86().
h264: use struct offsets in get_cabac_bypass_sign_x86().
h264: fix overreads in cabac reader.
wmall: fix seeking.
lagarith: fix buffer overreads.
dvdec: drop unnecessary dv_tablegen.h #include
build: fix doc generation errors in parallel builds
Replace memset(0) by zero initializations.
faandct: Remove FAAN_POSTSCALE define and related code.
dvenc: print allowed profiles if the video doesn't conform to any of them.
avcodec_encode_{audio,video}: only reallocate output packet when it has non-zero size.
FATE: add a test for vp8 with changing frame size.
fate: add kgv1 fate test.
oggdec: calculate correct timestamps in Ogg/FLAC
Conflicts:
libavcodec/4xm.c
libavcodec/cook.c
libavcodec/dvdata.c
libavcodec/dvdsubdec.c
libavcodec/lagarith.c
libavcodec/lagarithrac.c
libavcodec/utils.c
tests/fate/video.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
build: ppc: drop stray leftover backslash
build: Only clean the architecture subdirectory we build for.
build: drop some unnecessary dependencies from the H.264 parser
build: prettyprinting cosmetics
libavutil: Remove pointless rational test program.
libavutil: Remove broken and pointless lzo test program.
lavf doxy: expand AVStream.codec doxy.
lavf doxy: improve AVStream.time_base doxy.
lavf doxy: add some basic documentation about reading from the demuxer.
lavf doxy: document passing options to demuxers.
lavf doxy: clarify that an AVPacket contains encoded data.
mpegtsenc: allow user triggered PES packet flushing
APIchanges: mark the place where 0.7 was cut.
APIchanges: mark the place where 0.8 was cut.
APIchanges: fill in missing dates and hashes.
smacker: convert palette and header reading to bytestream2.
alac: convert extradata reading to bytestream2.
Conflicts:
doc/APIchanges
libavcodec/smacker.c
libavcodec/x86/Makefile
libavfilter/Makefile
libavutil/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
rv34: error out on size changes with frame threading
aacsbr: Add a debug check to sbr_mapping.
aac: Reset some state variables when turning SBR off
aac: Reset PS parameters on header decode failure.
fate: add wmalossless test.
aacsbr: handle m_max values smaller than 4.
Conflicts:
libavcodec/aacsbr.c
tests/fate/lossless-audio.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Prevents a signflip in the counter, and a subsequent crash because of
overreads/overwrites.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
They were moved into code under HAVE_YASM and most of them
even into completely disabled code with no reason given
for that in the commit message.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This is even potentially faster in this use-case.
Should fix AAC SBR decoding on machines with SSE but not
SSE2, fixing track issue #1041.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Yasm creates an implicit unaligned text section if "struc" is used
outside of any section:
http://tortall.lighthouseapp.com/projects/78676-yasm/tickets/247
Since yasm only honors the "align" annotation on the first declaration
of a section, this implicit text section causes all text section
alignments to be ignored. Also fixes a yasm warning about it agnoring
alignment.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
cook: expand dither_tab[], and make sure indexes into it don't overflow.
xxan: reindent xan_unpack_luma().
xxan: protect against chroma LUT overreads.
xxan: convert to bytestream2 API.
xxan: don't read before start of buffer in av_memcpy_backptr().
vp8: convert mbedge loopfilter x86 assembly to use named arguments.
vp8: convert inner loopfilter x86 assembly to use named arguments.
Conflicts:
libavcodec/xxan.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (29 commits)
sbrdsp.asm: convert all instructions to float/SSE ones.
dv: cosmetics.
dv: check buffer size before reading profile.
Revert "AAC SBR: group some writes."
udp: Print an error message if bind fails
cook: extend channel uncoupling tables so the full bit range is covered.
roqvideo: cosmetics.
roqvideo: convert to bytestream2 API.
dca: don't use av_clip_uintp2().
wmall: fix build with -DDEBUG enabled.
smc: port to bytestream2 API.
AAC SBR: group some writes.
dsputil: remove shift parameter from scalarproduct_int16
SBR DSP: unroll sum_square
rv34: remove dead code in intra availability check
rv34: clean a bit availability checks.
v4l2: update documentation
tgq: convert to bytestream2 API.
parser: remove forward declaration of MpegEncContext
dca: prevent accessing static arrays with invalid indexes.
...
Conflicts:
doc/indevs.texi
libavcodec/Makefile
libavcodec/dca.c
libavcodec/dvdata.c
libavcodec/eatgq.c
libavcodec/mmvideo.c
libavcodec/roqvideodec.c
libavcodec/smc.c
libswscale/output.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Since the values are floats, using the float operations
makes sense, improves performance on some CPUs and
makes the code SSE compatible instead of needing SSE2.
Based on suggestion by Jason.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.
The x86 and neon versions are modified accordingly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>