This fixes integer multiplication overflows in RGB48 output
(vertical) scaling as detected by IOC. What happens is that for
certain types of filters (lanczos, spline, bicubic), the
intermediate sum of coefficients in the middle of a filter can
be larger than the fixed-point equivalent of 1.0, even if the
final sum is 1.0. This is fine and we support that.
However, at frame edges, initFilter() will merge the coefficients
for the off-screen pixels into the top or bottom pixel, such as
to emulate edge extension. This means that suddenly, a single
coefficient can be larger than the fixed-point equivalent of
1.0, which the vertical scaling routines do not support.
Therefore, remove the merging of coefficients for edges for
the vertical scaling filter, and instead add edge detection
to the scaler itself so that it copies the pointers (not data)
for the edges (i.e. it uses line[0] for line[-1] as well), so
that a single coefficient is never larger than the fixed-point
equivalent of 1.0.
This fixes the same overflow as in the RGB48/16-bit YUV scaling;
some filters can overflow both negatively and positively (e.g.
spline/lanczos), so we bias a signed integer so it's "half signed"
and "half unsigned", and can cover overflows in both directions
while maintaining full 31-bit depth.
Signed-off-by: Mans Rullgard <mans@mansr.com>
We're shifting individual components (8-bit, unsigned) left by 24,
so making them unsigned should give the same results without the
overflow.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
For certain types of filters where the intermediate sum of coefficients
can go above the fixed-point equivalent of 1.0 in the middle of a filter,
the sum of a 31-bit calculation can overflow in both directions and can
thus not be represented in a 32-bit signed or unsigned integer. To work
around this, we subtract 0x40000000 from a signed integer base, so that
we're halfway signed/unsigned, which makes it fit even if it overflows.
After the filter finishes, we add the scaled bias back after a shift.
We use the same trick for 16-bit bpc YUV output routines.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
get_bits: remove A32 variant
avconv: support stream specifiers in -metadata and -map_metadata
wavpack: Fix 32-bit clipping
wavpack: Clip samples after shifting
h264: don't drop B-frames after next keyframe on POC reset.
get_bits: remove useless pointer casts
configure: refactor lists of tests and components into variables
rv40: NEON optimised weak loop filter
mpegts: replace some magic numbers with the existing define
swscale: add unscaled packed 16 bit per component endianess conversion
Conflicts:
libavcodec/get_bits.h
libavcodec/h264.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
applehttp: Properly clean up if unable to probe a segment
applehttp: Avoid reading uninitialized memory
fate: Replace misleading "aac" in the name of an ADTS test with "adts".
fate: Drop pointless "-an" from pictor test command.
fate: split off image codec FATE tests into their own file
fate: split off WMA codec FATE tests into their own file
fate: split off lossless video and audio FATE tests into their own files
fate: split off qtrle codec FATE tests into their own file
fate: split off Ut Video codec FATE tests into their own file
fate: split off screen codec FATE tests into their own file
fate: split off Real Inc. codec FATE tests into their own file
fate: split off AC-3 codec FATE tests into their own file
mpegvideo: remove abort() in ff_find_unused_picture()
rv40: NEON optimised loop filter strength selection
rv40: rearrange loop filter functions
configure: cosmetics: sort some lists where appropriate
swscale_mmx: drop no longer required parameters from VSCALEX macros
swscale: Mark yuv2planeX_8_mmx as MMX2; it contains MMX2 instructions.
build: conditionally compile x86 H.264 chroma optimizations
v410 encoder and decoder
...
Conflicts:
Changelog
configure
doc/developer.texi
doc/general.texi
libavcodec/arm/asm.S
libavcodec/avcodec.h
libavcodec/v410dec.c
libavcodec/v410enc.c
libavcodec/version.h
libavcodec/x86/Makefile
libavcodec/x86/dsputil_mmx.c
libswscale/x86/swscale_mmx.c
tests/Makefile
tests/fate2.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
ulti: Fix invalid reads
lavf: dealloc private options in av_write_trailer
yadif: support 10bit YUV
vc1: mark with ER_MB_ERROR bits overconsumption
lavc: introduce ER_MB_END and ER_MB_ERROR
error_resilience: use the ER_ namespace
build: move inclusion of subdir.mak to main subdir loop
rv34: NEON optimised 4x4 dequant
rv34: move 4x4 dequant to RV34DSPContext
aacdec: Use intfloat.h rather than local punning union.
Conflicts:
libavcodec/h264.c
libavcodec/vc1dec.c
libavfilter/vf_yadif.c
libavformat/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: cabac: replace explicit memory references with "m" operands
avplay: don't request a stereo downmix
wmapro: use av_float2int()
lavc: avoid invalid memcpy() in avcodec_default_release_buffer()
lavu: replace int/float punning functions
lavfi: install libavfilter/vsrc_buffer.h
Remove extraneous semicolons
sdp: Restore the original mp4 format h264 extradata if converted
rtpenc: Add support for mp4 format h264
rtpenc: Simplify code by introducing a separate end pointer
movenc: Use the actual converted sample for RTP hinting
Fix a bunch of common typos.
Conflicts:
doc/developer.texi
doc/eval.texi
doc/filters.texi
doc/protocols.texi
ffmpeg.c
ffplay.c
libavcodec/mpegvideo.h
libavcodec/x86/cabac.h
libavfilter/Makefile
libavformat/avformat.h
libavformat/cafdec.c
libavformat/flvdec.c
libavformat/flvenc.c
libavformat/gxfenc.c
libavformat/img2.c
libavformat/movenc.c
libavformat/mpegts.c
libavformat/rtpenc_h264.c
libavformat/utils.c
libavformat/wtv.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
libswscale/swscale.c:2744:40: warning: to be safe all intermediate pointers in cast from ‘int16_t **’ to ‘const int16_t **’ must be ‘const’ qualified [-Wcast-qual]
libswscale/swscale.c:2745:41: warning: to be safe all intermediate pointers in cast from ‘int16_t **’ to ‘const int16_t **’ must be ‘const’ qualified [-Wcast-qual]
libswscale/swscale.c:2746:41: warning: to be safe all intermediate pointers in cast from ‘int16_t **’ to ‘const int16_t **’ must be ‘const’ qualified [-Wcast-qual]
libswscale/swscale.c:2747:78: warning: to be safe all intermediate pointers in cast from ‘int16_t **’ to ‘const int16_t **’ must be ‘const’ qualified [-Wcast-qual]
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
libswscale/x86/swscale_mmx.c:131:36: warning: to be safe all intermediate pointers in cast from ‘int16_t **’ to ‘const int16_t **’ must be ‘const’ qualified [-Wcast-qual]
libswscale/x86/swscale_mmx.c:132:37: warning: to be safe all intermediate pointers in cast from ‘int16_t **’ to ‘const int16_t **’ must be ‘const’ qualified [-Wcast-qual]
libswscale/x86/swscale_mmx.c:133:74: warning: to be safe all intermediate pointers in cast from ‘int16_t **’ to ‘const int16_t **’ must be ‘const’ qualified [-Wcast-qual]
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (25 commits)
rtpenc: Add support for G726 audio
rtpdec: Interpret the different G726 names as bits_per_coded_sample
rtpenc: Change rtp_send_samples to handle sample sizes other than even bytes
rtpenc: Cast a rescaling parameter to int64_t
h264: cap max has_b_frames at MAX_DELAYED_PIC_COUNT - 1.
ARM: fix indentation in ff_dsputil_init_neon()
ARM: NEON put/avg_pixels8/16 cosmetics
ARM: add remaining NEON avg_pixels8/16 functions
ARM: clean up NEON put/avg_pixels macros
fate: split acodec-pcm into individual tests
swscale: #include "libavutil/mathematics.h"
pmpdec: don't use deprecated av_set_pts_info.
rv34: align temporary block of "dct" coefs
Add PlayStation Portable PMP format demuxer
proto: Realign struct initializers
proto: Use .priv_data_size to allocate the private context
mmsh: Properly clean up if the second ffurl_alloc failed
rtmp: Clean up properly if the handshake failed
md5proto: Remove the get_file_handle function
applehttpproto: Use the close function if the open function fails
...
Conflicts:
libavcodec/vble.c
libavformat/mmsh.c
libavformat/pmpdec.c
libavformat/udp.c
tests/ref/acodec/pcm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (42 commits)
swscale: fix signed overflow in yuv2mono_X_c_template
snow: fix integer overflows
svq1enc: remove stale altivec-related hack
snow: fix signed overflow in byte to 32-bit replication
adx: rename ff_adx_decode_header() to avpriv_adx_decode_header()
avformat: add CRI ADX format demuxer
adx: add an ADX parser.
adx: move header decoding to ADX common code
adx: calculate the number of blocks in a packet
adx: define and use 2 new macro constants BLOCK_SIZE and BLOCK_SAMPLES
adx: check for unsupported ADX formats
adx: simplify encoding by using put_sbits()
adx: calculate correct LPC coeffs
adx: use 12-bit coefficients instead of 14-bit to avoid integer overflow
adx: simplify adx_decode() by using get_sbits() to read residual samples
adx: fix the data offset parsing in adx_decode_header()
adx: remove unneeded post-decode channel interleaving
adx: validate header values
adx: cosmetics: general pretty-printing and comment clean-up
adx: remove useless comments
...
Conflicts:
Changelog
libavcodec/cook.c
libavcodec/fraps.c
libavcodec/nuv.c
libavcodec/pthread.c
libavcodec/version.h
libavformat/Makefile
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
As old bits are shifted out of the accumulator, they cause signed
overflows when they reach the end. Making the variable unsigned fixes
this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This was removed erroneously in
046f081b46. This define still is
necessary for getting MAP_ANONYMOUS defined on linux/glibc,
despite the define reshuffling done in that commit.
Without MAP_ANONYMOUS defined, the mprotect calls for setting the
generated mmx2 scaler code pages executable are left out, causing
crashes if that codepath is chosen.
This patch fixes scaling from 192x144 to 320x240 with
-sws_flags fast_bilinear, which crashes on linux at the
moment.
Signed-off-by: Martin Storsjö <martin@martin.st>
Although gcc guarantees 16 byte stack alignment, threads under WinXP
don't appear to be guaranteed to start stack aligned. So fix the
alignment.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
libswscale/swscale_unscaled.c:915:9: warning: new qualifiers in middle of multi-level non-const cast are unsafe
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
libswscale/swscale_unscaled.c:805:5: warning: passing argument 1 of ‘check_image_pointers’ from incompatible pointer type
libswscale/swscale_unscaled.c:774:12: note: expected ‘uint8_t **’ but argument is of type ‘const uint8_t * const*’
libswscale/swscale_unscaled.c:809:5: warning: passing argument 1 of ‘check_image_pointers’ discards qualifiers from pointer target type
libswscale/swscale_unscaled.c:774:12: note: expected ‘uint8_t **’ but argument is of type ‘uint8_t * const*’
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
libswscale/utils.c:941:13: warning: passing argument 5 of ‘initMMX2HScaler’ from incompatible pointer type
libswscale/utils.c:524:12: note: expected ‘int32_t *’ but argument is of type ‘int16_t *’
libswscale/utils.c:942:13: warning: passing argument 5 of ‘initMMX2HScaler’ from incompatible pointer type
libswscale/utils.c:524:12: note: expected ‘int32_t *’ but argument is of type ‘int16_t *’
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavf: pass options from AVFormatContext to avio.
avformat: Use avio_open2, pass the AVFormatContext interrupt_callback onwards
avio: add avio_open2, taking an interrupt callback and options
avio: add support for passing options to protocols.
avio: add and use ffurl_protocol_next().
avformat: Pass the interrupt callback on to chained muxers/demuxers
avio: Add an AVIOInterruptCB parameter to ffurl_open/ffurl_alloc
avformat: Use ff_check_interrupt
avio: Add an internal utility function for checking the new interrupt callback
avio: Add AVIOInterruptCB
texi2html: remove stray \n
doc: prettyfy the texi2html documentation
swscale: handle unaligned buffers in yuv2plane1
Conflicts:
libavformat/avformat.h
libavformat/avio.c
libavformat/mov.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Altivec does unaligned reads from this buffer in
hscale_altivec_real(), and can thus read up to 16 bytes beyond
the end of the buffer. Therefore, add an extra 16 bytes of
padding at the end of the conversion buffer.
This fixes fate-lavfi-pixfmts_scale on AltiVec-enabled builds
under valgrind.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* qatar/master: (23 commits)
x86inc: use sse versions of common macros instead of sse2 when applicable
doc/APIchanges: add missing dates and hashes
lavf: don't return from void av_update_cur_dts()
Changelog: add more entries.
Changelog: update ffmpeg/avconv incompatibility list.
avconv: remove some redundant temporary variables.
avconv: fix broken indentation
avconv: move copy_initial_nonkeyframes to the options context.
avconv: use file:stream instead of file.stream in log messages.
doc/avconv: elaborate on basic functionality.
doc/avconv: -sample_fmts, not -help sample_fmts prints the sample formats
openssl: Only use CRYPTO_set_id_callback on OpenSSL < 1.0.0
Call avformat_network_init/deinit in the programs
Remove leftover includes of strings.h
avutil: Don't allow using strcasecmp/strncasecmp
Replace all usage of strcasecmp/strncasecmp
avstring: Add locale independent implementations of strcasecmp/strncasecmp
avstring: Add locale independent implementations of toupper/tolower
cosmetics: insert some spaces in explicit enum value assignments
move 8SVX audio codecs to the audio codec list part on the next bump
...
Conflicts:
avprobe.c
doc/APIchanges
ffplay.c
ffserver.c
libavcodec/avcodec.h
libavdevice/bktr.c
libavdevice/v4l.c
libavdevice/v4l2.c
libavformat/matroskaenc.c
libavformat/wtv.c
libavutil/avstring.c
libavutil/avstring.h
libavutil/avutil.h
libswscale/x86/swscale_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
http: Remove the custom function for disabling chunked posts
rtsp: Disable chunked http post through AVOptions
movdec: Set frame_size for AMR
h264_weight: remove duplication functions.
swscale: align vertical filtersize by 2 on x86.
libavfilter: reindent.
matroskadec: empty blocks are in fact valid.
avfilter: don't abort() on zero-size allocations.
h264: improve calculation of codec delay.
movenc: Set a correct packet size for AMR-NB mode 15, "no data"
avformat: Add functions for doing global network initialization
avformat: Add the https protocol
avformat: Add the tls protocol, using OpenSSL or gnutls
avformat: Initialize gnutls in ff_tls_init()
w32threads: Wrap the mutex functions in inline functions returning int
configure: Allow linking to the gnutls library
avformat: Add ff_tls_init()/deinit() that initialize OpenSSL
configure: Allow linking to openssl
avcodec: Allow locking and unlocking an avformat specific mutex
avformat: Split out functions from network.h to a new file, network.c
Conflicts:
Changelog
configure
doc/APIchanges
libavcodec/internal.h
libavcodec/version.h
libavfilter/formats.c
libavformat/matroskadec.c
libavformat/mov.c
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Move id3v2 tag writing to a separate file.
swscale: add missing colons to x86 assembly yuv2planeX.
g722: split decoder and encoder into separate files
cosmetics: remove extra spaces before end-of-statement semi-colons
vorbisdec: check output buffer size before writing output
wavpack: calculate bpp using av_get_bytes_per_sample()
ac3enc: Set max value for mode options correctly
lavc: move get_b_cbp() from h263.h to mpeg4videoenc.c
mpeg12: move closed_gop from MpegEncContext to Mpeg1Context
mpeg12: move full_pel from MpegEncContext to Mpeg1Context
mpeg12: move Mpeg1Context from mpeg12.c to mpeg12.h
mpegvideo: remove some unused variables from MpegEncContext.
Conflicts:
libavcodec/mpeg12.c
libavformat/mp3enc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
id3v2: fix doxy comment - 'machine byte order' makes no sense on char arrays
VC1: restore mistakenly removed code
twinvq: check output buffer size before decoding
twinvq: return an error when the packet size is too small
lavf: export some forgotten symbols with non-av prefixes.
swscale: update altivec yuv2planeX asm to new per-plane API.
swscale: make yuv2yuvX_10_sse2/avx 8/9/16-bits aware.
yuv2planeX10 SIMD
swscale: decide whether to use yuv2plane1/X on a per-plane basis.
swscale: reintroduce full precision in 16-bit output.
Split up yuv2yuvX functions
Split out yuv2yuv1 luma and chroma in order to make them generic DSP functions
lavc: replace references to deprecated AVCodecContext.error_recognition to use AVCodecContext.err_recognition
lavc: translate non-flag-based er options into flag-based ef options at codec open
add -err_filter AVOptions to access flag-based error recognition
h264_weight: initialize "height" function argument properly.
presets: spelling error in libvpx 1080p50_60
avplay: fix fullscreen behaviour with SDL 1.2.14 on Mac OS X
Conflicts:
ffplay.c
libavformat/libavformat.v
libswscale/swscale.c
libswscale/x86/swscale_template.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (35 commits)
flvdec: Do not call parse_keyframes_index with a NULL stream
libspeexdec: include system headers before local headers
libspeexdec: return meaningful error codes
libspeexdec: cosmetics: reindent
libspeexdec: decode one frame at a time.
swscale: fix signed shift overflows in ff_yuv2rgb_c_init_tables()
Move timefilter code from lavf to lavd.
mov: add support for hdvd and pgapmetadata atoms
mov: rename function _stik, some indentation cosmetics
mov: rename function _int8 to remove ambiguity, some indentation cosmetics
mov: parse the gnre atom
mp3on4: check for allocation failures in decode_init_mp3on4()
mp3on4: create a separate flush function for MP3onMP4.
mp3on4: ensure that the frame channel count does not exceed the codec channel count.
mp3on4: set channel layout
mp3on4: fix the output channel order
mp3on4: allocate temp buffer with av_malloc() instead of on the stack.
mp3on4: copy MPADSPContext from first context to all contexts.
fmtconvert: port float_to_int16_interleave() 2-channel x86 inline asm to yasm
fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm
...
Conflicts:
libavcodec/arm/h264dsp_init_arm.c
libavcodec/h264.c
libavcodec/h264.h
libavcodec/h264_cabac.c
libavcodec/h264_cavlc.c
libavcodec/h264_ps.c
libavcodec/h264dsp_template.c
libavcodec/h264idct_template.c
libavcodec/h264pred.c
libavcodec/h264pred_template.c
libavcodec/x86/h264dsp_mmx.c
libavdevice/Makefile
libavdevice/jack_audio.c
libavformat/Makefile
libavformat/flvdec.c
libavformat/flvenc.c
libavutil/pixfmt.h
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
fix AC3ENC_OPT_MODE_ON/OFF
h264: fix HRD parameters parsing
prores: implement multithreading.
prores: idct sse2/sse4 optimizations.
swscale: use aligned move for storage into temporary buffer.
prores: extract idct into its own dspcontext and merge with put_pixels.
h264: fix invalid shifts in init_cavlc_level_tab()
intfloat_readwrite: fix signed addition overflows
mov: do not misreport empty stts
mov: cosmetics, fix for and if spacing
id3v2: fix NULL pointer dereference
mov: read album_artist atom
mov: fix disc/track numbers and totals
doc: fix references to obsolete presets directories for avconv/ffmpeg
flashsv: return more meaningful error value
flashsv: fix typo in av_log() message
smacker: validate channels and sample format.
smacker: check buffer size before reading output size
smacker: validate number of channels
smacker: Separate audio flags from sample rates in smacker demuxer.
...
Conflicts:
cmdutils.h
doc/ffmpeg.texi
libavcodec/Makefile
libavcodec/motion_est_template.c
libavformat/id3v2.c
libavformat/mov.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
ppc: fix some pointer to integer casts
ppc: fix 32-bit PIC build
vmdaudio: fix decoding of 16-bit audio format.
lavf: do not set codec_tag for rawvideo
h264: check for out of bounds reads in ff_h264_decode_extradata().
flvdec: Check for overflow before allocating arrays
avconv: use correct output stream index when checking max_frames
avconv: remove fake coded_frame on streamcopy hack
Conflicts:
avconv.c
libavcodec/h264.c
libavcodec/ppc/asm.S
libavcodec/vmdav.c
libavformat/flvdec.c
libavformat/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Use uintptr_t instead of plain int. Without this change, the
comparisons will come out wrong for pointers in certain ranges.
Fixes random failures on ppc64. Also fixes some compiler warnings.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Replaces a very hackish hack to fix the same issue (call instruction
overwriting stack variables).
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* qatar/master:
rtp: factorize dynamic payload type fallback
flvdec: Ignore the index if it's from a creator known to be different
cmdutils: move grow_array out of #if CONFIG_AVFILTER
avconv: actually set InputFile.rate_emu
ratecontrol: update last_qscale_for sooner
Fix unnecessary shift with 9/10bit vertical scaling
prores: mark prores as intra-only in libavformat/utils.c:is_intra_only()
prores: return more meaningful error values
prores: improve error message wording
prores: cosmetics: prettyprinting, drop useless parentheses
prores: lowercase AVCodec name entry
Conflicts:
cmdutils.c
libavcodec/proresdec_lgpl.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
gcc 4.6 no longer decrements esp to account for local variables.
Thus using call will end up overwriting some local variable.
So add an extra one it can safely clobber.
This is a huge hack because it's basically pure chance it works,
no idea how this is supposed to be done.
Fixes trac ticket #397.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* qatar/master:
swscale: fix byte overreads in SSE-optimized hscale().
matroskadec: fix typo.
matroskadec: bail on parsing of incorrect seek index segments
Merged-by: Michael Niedermayer <michaelni@gmx.at>
SSE-optimized hScale() scales up to 4 pixels at once, so we need to
allocate up to 3 padding pixels to prevent overreads. This fixes
valgrind errors in various swscale-tests on fate.
* qatar/master:
sws: implement MMX/SSE2/SSSE3/SSE4 versions for horizontal scaling.
include stdint.h in adpcm_data.h
mpeg12: reorder functions to avoid ugly forward declarations
Fixed off by one packet size allocation in the smacker demuxer.
Check for invalid packet size in the smacker demuxer.
ape demuxer: fix segfault on memory allocation failure.
xan: Add some buffer checks
xan: Remove extra trailing newline
Fixed size given to init_get_bits() in xan decoder.
Conflicts:
libavcodec/mpeg12.c
libswscale/x86/swscale_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Speed: from 3.9x to 9.6x speed improvement over C, and some small
(up to 15%) speed improvements over existing MMX code (particularly
for bigger filters).
Author of the fix is ronald, the enabling & commit message are mine.
This fixes
commit 4e3e333a79
Author: Ronald S. Bultje <rsbultje@gmail.com>
Date: Tue Jul 5 12:49:11 2011 -0700
swscale: error dithering for 16/9/10-bit to 8-bit.
Based on a somewhat similar idea in FFmpeg's swscale copy.
The Fix was originally commited in: (and i missed it due to the commit message)
commit 5c391a161a
Author: Ronald S. Bultje <rsbultje@gmail.com>
Date: Fri Jul 8 14:39:04 2011 -0700
swscale: rename uv_off/uv_off2 to uv_off_px/byte.
* qatar/master:
AVOptions: fix av_set_string3() doxy to match reality.
cmdutils: get rid of dummy contexts for examining AVOptions.
lavf,lavc,sws: add {avcodec,avformat,sws}_get_class() functions.
AVOptions: add AV_OPT_SEARCH_FAKE_OBJ flag for av_opt_find().
cpu detection: avoid a signed overflow
Conflicts:
avconv.c
cmdutils.c
doc/APIchanges
ffmpeg.c
libavcodec/options.c
libavcodec/version.h
libavformat/version.h
libavutil/avutil.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (23 commits)
h264: hide reference frame errors unless requested
swscale: split hScale() function pointer into h[cy]Scale().
Move clipd macros to x86util.asm.
avconv: reindent.
avconv: rescue poor abused start_time global.
avconv: rescue poor abused recording_time global.
avconv: merge two loops in output_packet().
avconv: fix broken indentation.
avconv: get rid of the arbitrary MAX_FILES limit.
avconv: get rid of the output_streams_for_file vs. ost_table schizophrenia
avconv: add a wrapper for output AVFormatContexts and merge output_opts into it
avconv: make itsscale syntax consistent with other options.
avconv: factor out adding input streams.
avconv: Factorize combining auto vsync with format.
avconv: Factorize video resampling.
avconv: Don't unnecessarily convert ipts to a double.
ffmpeg: remove unsed variable nopts
RV3/4 parser: remove unused variable 'off'
add XMV demuxer
rmdec: parse FPS in RealMedia properly
...
Conflicts:
avconv.c
libavformat/version.h
libswscale/swscale.c
tests/ref/fate/lmlm4-demux
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This allows using more specific implementations for chroma/luma, e.g.
we can make assumptions on filterSize being constant, thus avoiding
that test at runtime.
* qatar/master:
swscale: add dithering to yuv2yuvX_altivec_real
rv34: free+allocate buffer instead of reallocating it to preserve alignment
h264: add missing brackets.
swscale: use 15-bit intermediates for 9/10-bit scaling.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It just does that part in scalar form, I doubt using a vector store
over 2 array would speed it up particularly.
The function should be written to not use a scratch buffer.
* qatar/master:
lsws: remove optimization debug logs in sws_init_context()
lsws: use array for storing the supported in/out information
Conflicts:
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The logged information is possibly false, and it tends to be outdated
after each change since the logging code needs to be manually updated.
Simplify and prevent confusing wrong debug messages.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Also remove the unnecessary isSupportedIn/Out macros.
Make the code more compact/readable, and simplify the access to
lsws-specific pixel format information.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* qatar/master:
lavc: Deprecate unused FF_ER_VERY_AGGRESSIVE
x11grab: add show_region AVOption.
x11grab: add follow_mouse AVOption.
Do not convert RGB buffer at once when stride does not fit exact samples.
Conflicts:
libswscale/swscale_unscaled.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
When converting RGB format to RGB format with the same bits per sample,
unscaled path performs conversion on the whole buffer at once. For
non-multiple-of-16 BGR24 to RGB24 conversion it means that padding at the
end of line will be converted too. Since it may be of arbitrary length
(e.g. 8 bytes), operating on the whole buffer produces obviously wrong
results.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* qatar/master:
rv30: return AVERROR(EINVAL) instead of EINVAL
build: add -L flags before existing LDFLAGS
simple_idct: whitespace cosmetics
simple_idct: make repeated code a macro
dsputil: remove huge #if 0 block
simple_idct: change 10-bit add/put stride from pixels to bytes
dsputil: allow 9/10-bit functions for non-h264 codecs
dnxhd: rename some data tables
dnxhdenc: remove inline from function only called through pointer
dnxhdenc: whitespace cosmetics
swscale: mark YUV422P10(LE,BE) as supported for output
configure: add -xc99 to LDFLAGS for Sun CC
Remove unused and non-compiling vestigial g729 decoder
Remove unused code under G729_BITEXACT #ifdef.
mpegvideo: fix invalid picture unreferencing.
dsputil: Remove extra blank line at end.
dsputil: Replace a LONG_MAX check with HAVE_FAST_64BIT.
simple_idct: add 10-bit version
Conflicts:
Makefile
libavcodec/g729data.h
libavcodec/g729dec.c
libavcodec/rv30.c
tests/ref/lavfi/pixdesc
tests/ref/lavfi/pixfmts_copy
tests/ref/lavfi/pixfmts_null
tests/ref/lavfi/pixfmts_scale
tests/ref/lavfi/pixfmts_vflip
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
configure: Automatically add more flags required on symbian
mem.h: switch doxygen parameter order to match function prototype
doxygen: replace @sa tag by the more readable but equivalent @see
doxygen: use Doxygen markup for authors and web links where appropriate
doxygen: do not include license boilerplate in Doxygen documentation
ac3enc: Mark AVClasses const
ffserver: Replace two loops with one loop.
ffmpeg: Fix the check for experimental codecs
swscale: extend mmx padding.
swscale: clip unscaled colorspace conversion path.
doxygen: misc consistency cosmetics
doc: remove file name from @file directive in Doxygen usage example
doxygen: consistently place brief description
doxygen: place empty line between brief description and detailed description
avformat_open_input(): Add braces to shut up gcc warning.
Conflicts:
libavcodec/8svx.c
libavcodec/tiff.c
libavcodec/tiff.h
libavcodec/vaapi_h264.c
libavcodec/vorbis.c
libavcodec/vorbisdec.c
libavcodec/vp6.c
libswscale/swscale_unscaled.c
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
APIchanges: fill in missing hashes and dates.
Add an APIChanges entry and bump minor versions for recent changes.
ffmpeg: print the low bitrate warning after the codec is openend.
doxygen: Move function documentation into the macro generating the function.
doxygen: Make sure parameter names match between .c and .h files.
h264: move fill_decode_neighbors()/fill_decode_caches() to h264_mvpred.h
H.264: Add more x86 assembly for 10-bit H.264 predict functions
lavf: fix invalid reads in avformat_find_stream_info()
cmdutils: replace opt_default with opt_default2() and remove set_context_opts
ffmpeg: use new avcodec_open2 and avformat_find_stream_info API.
ffplay: use new avcodec_open2 and avformat_find_stream_info API.
cmdutils: store all codec options in one dict instead of video/audio/sub
ffmpeg: check experimental flag after codec is opened.
ffmpeg: do not set GLOBAL_HEADER flag in the options context
Conflicts:
cmdutils.c
doc/APIchanges
ffmpeg.c
ffplay.c
libavcodec/version.h
libavformat/version.h
libswscale/swscale_unscaled.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '142e76f1055de5dde44696e71a5f63f2cb11dedf':
swscale: fix crash with dithering due incorrect offset calculation.
matroskadec: fix stupid typo (!= -> ==)
build: remove duplicates from order-only directory prerequisite list
build: rework rules for things in the tools dir
configure: fix --cpu=host with gcc 4.6
ARM: use const macro to define constant data in asm
bitdepth: simplify FUNC/FUNCC macros
dsputil: remove ff_emulated_edge_mc macro used in one place
9/10-bit: simplify clipping macros
matroskadec: reindent
matroskadec: defer parsing of cues element until we seek.
lavc: add support for codec-specific defaults.
lavc: make avcodec_alloc_context3 officially public.
lavc: remove a half-working attempt at different defaults for audio/video codecs.
ac3dec: add a drc_scale private option
lavf: add avformat_find_stream_info()
lavc: introduce avcodec_open2() as a replacement for avcodec_open().
Conflicts:
Makefile
libavcodec/utils.c
libavformat/avformat.h
libswscale/swscale_internal.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* sws_32bit_integration:
regtests/sws: update checksums for recent changes
sws: dont mess with XInc when the code needing it isnt used
sws: Fix chroma init for 32bit buffers.
swscale: error dithering for 16/9/10-bit to 8-bit.
swscale: fix overflow in 16-bit vertical scaling.
swscale: fix crash in 8-bpc bilinear output without alpha.
swscale: fix 16-bit scaling when output is 8-bits.
sws: fix non native endian 9-15 bit input with 16bit out
sws: disable scale16 when int32 is used
sws: fix rgb -> 16bit
sws: fix uv overwrite in 32bt
sws: fix gray16_1
sws:ix yuv2rgb48_1_c_template()
sws: fix 16/32 bug from merge
swscale: for >8bit scaling, read in native bit-depth.
swscale: fix another yuv range conversion overflow in 16bit scaling. (cherry picked from commit 81cc7d0bd1)
swscale: fix yuv range correction when using 16-bit scaling. (cherry picked from commit e0b8fff6c7)
swscale: implement >8bit scaling support.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
We operated on 31-bits, but with e.g. lanczos scaling, values can
add up to beyond 0x80000000, thus leading to output of zeroes. Drop
one bit of precision fixes this.
ptrdiff_t can be 4 bytes, which leads to the next element being 4-byte
aligned and thus at a different offset than intended. Forcing 8-byte
alignment forces equal offset of dither16/32 on x86-32 and x86-64.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
We operated on 31-bits, but with e.g. lanczos scaling, values can
add up to beyond 0x80000000, thus leading to output of zeroes. Drop
one bit of precision fixes this.
* qatar/master:
ffserver: remove unused variable.
Remove unused and outdated TODO file.
gitignore: Drop individual .d ignore; it is already covered by a wildcard.
lavf: deprecate AVStream.quality.
bink: pass Bink version to audio decoder through extradata instead of codec_tag.
libpostproc: Remove disabled code.
flashsv: improve some comments and fix some wrong ones
flashsv: Eliminate redundant variable indirection.
flashsv: set reference frame type to full frame
flashsv: replace bitstream description by a link to the specification
flashsv: convert a debug av_log into av_dlog
flashsv: simplify condition
flashsv: return more meaningful error values
flashsv: cosmetics: break some overly long lines
flashsv: cosmetics: drop some unnecessary parentheses
swscale: amend documentation to mention use of native depth for scaling.
eval: add missing comma to tests.
eval: fix memleak.
H.264: make loopfilter bS const where applicable
Conflicts:
libavcodec/binkaudio.c
libavformat/bink.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (36 commits)
ARM: allow unaligned buffer in fixed-point NEON FFT4
fate: test more FFT etc sizes
dca: set AVCodecContext frame_size for DTS audio
YASM: Shut up unused variable compiler warning with --disable-yasm.
x86_32: Fix build on x86_32 with --disable-yasm.
iirfilter: add fate test
doxygen: Add qmul docs.
ogg: propagate return values and return more meaningful error values
H.264: fix overreads of qscale_table
Remove unused static tables and static inline functions.
eval: clear Parser instances before using
dct-test: remove 'ref' function pointer from tables
build: Remove deleted 'check' target from .PHONY list.
oggdec: Abort Ogg header parsing when encountering a data packet.
Add LGPL license boilerplate to files lacking it.
mxfenc: small typo fix
doxygen: Fix documentation for some VP8 functions.
sha: use AV_RB32() instead of assuming buffer can be cast to uint32_t*
des: allow unaligned input and output buffers
aes: allow unaligned input and output buffers
...
Conflicts:
libavcodec/dct-test.c
libavcodec/libvpxenc.c
libavcodec/x86/dsputil_mmx.c
libavcodec/x86/h264_qpel_mmx.c
libavfilter/x86/gradfun.c
libavformat/oggdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (40 commits)
H.264: template left MB handling
H.264: faster fill_decode_caches
H.264: faster write_back_*
H.264: faster fill_filter_caches
H.264: make filter_mb_fast support the case of unavailable top mb
Do not include log.h in avutil.h
Do not include pixfmt.h in avutil.h
Do not include rational.h in avutil.h
Do not include mathematics.h in avutil.h
Do not include intfloat_readwrite.h in avutil.h
Remove return statements following infinite loops without break
RTSP: Doxygen comment cleanup
doxygen: Escape '\' in Doxygen documentation.
md5: cosmetics
md5: use AV_WL32 to write result
md5: add fate test
md5: include correct headers
md5: fix test program
doxygen: Drop array size declarations from Doxygen parameter names.
doxygen: Fix parameter names to match the function prototypes.
...
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavformat/flvenc.c
libavformat/oggenc.c
libavformat/wtv.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
get_bits: remove x86 inline asm in A32 bitstream reader
doc: Remove outdated information about our issue tracker
avidec: Factor out the sync fucntionality.
fate-aac: Expand coverage.
ac3dsp: add x86-optimized versions of ac3dsp.extract_exponents().
ac3dsp: simplify extract_exponents() now that it does not need to do clipping.
ac3enc: clip coefficients after MDCT.
ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.
swscale: for >8bit scaling, read in native bit-depth.
matroskadec: matroska_read_seek after after EBML_STOP leads to failure.
doxygen: fix usage of @file directive in libavutil/{dict,file}.h
doxygen: Help doxygen parser to understand the DECLARE_ALIGNED and offsetof macros
Conflicts:
doc/issue_tracker.txt
libavformat/avidec.c
libavutil/dict.h
libswscale/swscale.c
libswscale/utils.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
cosmetics: fix some then/than typos
doxygen: Include libavcodec and libavformat examples into the documentation
avutil: elaborate documentation for av_get_random_seed
Add support for aac streams in mp4/mov without extradata.
aes: whitespace cosmetics
adler32: whitespace cosmetics
swscale: fix another yuv range conversion overflow in 16bit scaling.
Fix cpu flags test program
opt-test: Add missing braces to silence compiler warnings.
build: Eliminate obsolete test targets.
udp: Fix a compilation warning
swscale: Unbreak build with --enable-small
base64: add fate test
aes: improve test program and add fate test
adler32: make test program more useful and add fate test
swscale: fix yuv range correction when using 16-bit scaling.
aacenc: Make chan_map const correct
Conflicts:
Makefile
doc/examples/muxing-example.c
libavformat/udp.c
libavutil/random_seed.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (21 commits)
swscale: Add Doxygen for hyscale_fast/hScale.
fate: enable lavfi-pixmt tests on big endian systems
PPC: swscale: disable altivec functions for unsupported formats
fate: merge identical pixdesc_be/le tests
swscale: Add Doxygen for yuv2planar*/yuv2packed* functions.
build: call texi2pod.pl with full path instead of symlink
build: include sub-makefiles using full path instead of symlinks
swscale: update big endian reference values after dff5a835.
wavpack: skip blocks with no samples
cosmetics: remove outdated comment that is no longer true
build: replace some addprefix/addsuffix with substitution refs
avutil: Remove unused arbitrary precision integer code.
configure: Drop check for availability of ten assembler operands.
aacenc: Save channel configuration for later use.
aacenc: Fix codebook trellising for zeroed bands.
swscale: change prototypes of scaled YUV output functions.
swscale: re-add support for non-native endianness.
swscale: disentangle yuv2rgbX_c_full() into small functions.
swscale: split yuv2packed[12X]_c() remainders into small functions.
swscale: split yuv2packedX_altivec in smaller functions.
...
Conflicts:
Makefile
configure
libavcodec/x86/dsputil_mmx.c
libavfilter/Makefile
libavformat/Makefile
libavutil/integer.c
libavutil/integer.h
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/x86/swscale_template.c
tests/ref/lavfi/pixdesc_le
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Remove unused variables "flags" and "dstFormat" in yuv2packed1,
merge source rows per plane for yuv2packed[12], and make every
source argument int16_t (some where invalidly set to uint16_t).
This prevents stack pollution and is part of the Great Evil Plan
to simplify swscale.
This will likely lead to a considerable performance boost,
since it removes a branch from the inner loop. Part of the
Great Evil Plan to simplify swscale.
* qatar/master:
build: improve rules for test programs
build: factor out the .c and .S compile commands as a macro
swscale: remove unused xInc/srcW arguments from hScale().
H.264: disable 2tap qpel with CODEC_FLAG2_FAST and >8-bit
H.264: make filter_mb_fast support 4:4:4
mpeg4videoenc: Remove disabled variant of mpeg4_encode_block().
configure: allow post-fixed cpu strings for athlon64, k8, and opteron when setting the -march flag.
Move some variable declarations below the proper #ifdefs.
Conflicts:
Makefile
ffplay.c
libswscale/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>