1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Commit Graph

725 Commits

Author SHA1 Message Date
Michael Niedermayer
ea5dab58e0 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  dwt: check malloc calls
  ppc: Drop unused header regs.h
  af_resample: remove an extra space in the log output
  Convert vector_fmul range of functions to YASM and add AVX versions
  lavfi: add an audio split filter
  lavfi: rename vf_split.c to split.c

Conflicts:
	doc/filters.texi
	libavcodec/ppc/regs.h
	libavfilter/Makefile
	libavfilter/allfilters.c
	libavfilter/f_split.c
	libavfilter/split.c
	libavfilter/version.h
	libavfilter/vf_split.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-22 23:42:17 +02:00
Kieran Kunhya
5ff01259a8 Convert vector_fmul range of functions to YASM and add AVX versions
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-05-21 17:13:05 -04:00
Michael Niedermayer
703e920bb7 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  fate: Work around non-standard wc implementations at more places
  fate: work around non-standard wc implementations
  x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions.
  ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16
  fate: use standard diff options
  tta: Fix comment about channel number; TTA supports >2 channels.
  avfilter: Move ff_get_ref_perms_string() to where it is used.
  build: Add 'check' target to run all compile and test targets.
  indeo3: validate new frame size before resetting decoder
  indeo3: when freeing buffers, set pointers referencing them to NULL as well
  indeo3: initialise pixel planes on allocation
  indeo3: ensure that decoded cell data is in 7-bit range as presumed by decoder
  fate: rename psx-str-v3-mdec to mdec-v3
  fate: convert psx-str to a demuxer test
  lavf: add mdec to is_intra_only() list

Conflicts:
	doc/developer.texi
	libavcodec/indeo3.c
	libavfilter/video.c
	libavformat/utils.c
	tests/fate/demux.mak
	tests/fate/video.mak
	tests/lavf-regression.sh
	tests/ref/vsynth1/cljr
	tests/ref/vsynth1/ffvhuff
	tests/ref/vsynth2/cljr
	tests/ref/vsynth2/ffvhuff

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-16 22:32:05 +02:00
Michael Kostylev
6797d1948b x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions. 2012-05-15 23:54:08 +02:00
Justin Ruggles
95a98ab3f0 ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16
Simplifies the code by using cpuflags and a new macro.
Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only
function.
2012-05-15 15:23:59 -04:00
Michael Niedermayer
7e944159c6 Merge remote-tracking branch 'qatar/master'
* qatar/master: (25 commits)
  vcr1: Add vcr1_ prefixes to all static functions with generic names.
  vcr1: Fix return type of common_init to match the function pointer signature.
  vcr1enc: Replace obsolete get_bit_count by put_bits_count/flush_put_bits.
  motion-test: remove disabled code
  gxfenc: remove disabled half-implemented MJPEG tag
  x86: use more standard construct for setting ASM functions in FFT code
  fate: westwood-aud: disable decoding
  fate: caf: disable decoding
  fate: film-cvid: drop pcm audio and rename test
  fate: d-cinema-demux: drop unnecessary flags
  fate: split off dpcm-interplay from interplay-mve tests
  fate: rename funcom-iss to adpcm-ima-iss
  fate: rename cryo-apc to adpcm-ima-apc
  fate: rename adpcm-psx-str-v3 to adpcm-xa
  fate: split off adpcm-ms-mono test from dxa-feeble
  fate: split off adpcm-ima-ws test from vqa-cc
  fate: add adpcm-ima-smjpeg test
  fate: split off adpcm-ima-amv from amv test
  fate: separate bmv audio and video tests
  fate: separate delphine-cin audio and video tests
  ...

Conflicts:
	doc/platform.texi
	libavcodec/vcr1.c
	tests/fate/audio.mak
	tests/fate/demux.mak
	tests/fate/video.mak
	tests/ref/fate/ea-mad-pcm-planar
	tests/ref/fate/interplay-mve-16bit
	tests/ref/fate/interplay-mve-8bit
	tests/ref/fate/mtv
	tests/ref/fate/qtrle-1bit
	tests/ref/fate/qtrle-2bit
	tests/ref/fate/truemotion1-15
	tests/ref/fate/truemotion1-24
	tests/ref/fate/vqa-cc

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-14 20:17:24 +02:00
Vitor Sessak
fcc456b829 x86: use more standard construct for setting ASM functions in FFT code
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-05-14 15:38:42 +02:00
Michael Niedermayer
1caf614bec Merge remote-tracking branch 'qatar/master'
* qatar/master:
  lavfi: autoinsert resample filter when necessary.
  lavfi: add lavr-based audio resampling filter.
  x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions.

Conflicts:
	configure
	doc/filters.texi
	libavcodec/x86/vc1dsp_mmx.c
	libavfilter/Makefile
	libavfilter/allfilters.c
	libavfilter/avfiltergraph.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 00:13:49 +02:00
Michael Kostylev
ea60dfe284 x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions. 2012-05-12 14:02:45 +02:00
Michael Niedermayer
015903294c Merge remote-tracking branch 'qatar/master'
* qatar/master: (25 commits)
  rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC
  ape: Use unsigned integer maths
  arm: dsputil: fix overreads in put/avg_pixels functions
  h264: K&R formatting cosmetics for header files (part II/II)
  h264: K&R formatting cosmetics for header files (part I/II)
  rtmp: Implement check bandwidth notification.
  rtmp: Support 'rtmp_swfurl', an option which specifies the URL of the SWF player.
  rtmp: Support 'rtmp_flashver', an option which overrides the version of the Flash plugin.
  rtmp: Support 'rtmp_tcurl', an option which overrides the URL of the target stream.
  cmdutils: Add fallback case to switch in check_stream_specifier().
  sctp: be consistent with socket option level
  configure: Add _XOPEN_SOURCE=600 to Solaris preprocessor flags.
  vcr1enc: drop pointless empty encode_init() wrapper function
  vcr1: drop pointless write-only AVCodecContext member from VCR1Context
  vcr1: group encoder code together to save #ifdefs
  vcr1: cosmetics: K&R prettyprinting, typos, parentheses, dead code, comments
  mov: make one comment slightly more specific
  lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX
  lavfi: move audio-related functions to a separate file.
  lavfi: remove some audio-related function from public API.
  ...

Conflicts:
	cmdutils.c
	libavcodec/h264.h
	libavcodec/h264_mvpred.h
	libavcodec/vcr1.c
	libavfilter/avfilter.c
	libavfilter/avfilter.h
	libavfilter/defaults.c
	libavfilter/internal.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-10 23:30:42 +02:00
Christophe Gisquet
110d0cdc9d rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of unit):
        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C       445    358   985    1785    1559    3280
MMX*    219    271   478     714     929    1443
SSE2    131    158   294     425     515     892
SSSE3   120    122   248     387     390     763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-05-10 18:42:43 +02:00
Ronald S. Bultje
bec207f9f9 snowdsp: explicitily state instruction size.
Fixes a compile error with clang at -O0.
2012-05-02 09:57:12 -07:00
Michael Niedermayer
dfa07e8928 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  4xm: fix invalid array indexing
  rv34dsp: factorize a multiplication in the noround inverse transform
  rv40: perform bitwise checks in loop filter
  rv34: remove inline keyword from rv34_decode_block().
  rv40: change a logical test into a bitwise one.
  rv34: remove constant parameter
  rv40: don't always do the full prev_type search
  dsputil x86: revert a test back to its previous value
  rv34dsp x86: implement MMX2 inverse transform

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 21:45:54 +02:00
Roland Scheidegger
82c71913e4 h264: new assembly version of get_cabac for x86_64 with PIC
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register.
There is a surprisingly large performance improvement over the c version (more
so than the generated assembly seems to suggest) just in get_cabac, I measured
roughly 40% faster for get_cabac on a K8. However, overall the difference is
not that big, I measured roughly 5% on a test clip on a K8 and a Core2.
Hopefully it still compiles on x86 32bit...
Now that only one table is used, there's some chance even darwin as compiles
this (apparently the label arithmetic used previously doesn't work if it
involves symbols defined in a different file, thanks to Ronald S. Bultje for
helping me with this).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 20:02:27 +02:00
Roland Scheidegger
7f668cd2b5 h264: use one table instead of several for cabac functions
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.

The assembly uses the new table but still won't work for PIC case.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 20:02:27 +02:00
Roland Scheidegger
5520df6a8f h264: (trivial) remove unneeded macro argument in x86/cabac.h
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 20:02:27 +02:00
Christophe GISQUET
e75d1d4f73 dsputil x86: revert a test back to its previous value
Commit 356ee8d caused the initial inversion.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 11:00:51 -07:00
Christophe Gisquet
fe5ed69dc7 rv34dsp x86: implement MMX2 inverse transform
141 cycles down to 51.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 10:58:47 -07:00
Roland Scheidegger
9b9df1cdff h264: new assembly version of get_cabac for x86_64 with PIC
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 09:43:25 -07:00
Roland Scheidegger
14e9ffc1e4 h264: use one table instead of several for cabac functions
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.

The assembly uses the new table but still won't work for PIC case.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 08:26:12 -07:00
Roland Scheidegger
444f47b55c h264: (trivial) remove unneeded macro argument in x86/cabac.h
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-28 08:24:56 -07:00
Michael Niedermayer
70d54392f5 lowres2 support.
The new lowres support is limited to decoders where lowres decoding
is possible in high quality.
I was not able to measure any speed difference, but if one is found
the 2-3 lines that might affect speed can be made compile time conditional

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-22 22:26:55 +02:00
Michael Niedermayer
92ef4be4ab Merge remote-tracking branch 'qatar/master'
* qatar/master:
  ARM: allow runtime masking of CPU features
  dsputil: remove unused functions
  mov: Treat keyframe indexes as 1-origin if starting at non-zero.
  mov: Take stps entries into consideration also about key_off.
  Remove lowres video decoding

Conflicts:
	ffmpeg.c
	ffplay.c
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/libopenjpegdec.c
	libavcodec/mjpegdec.c
	libavcodec/mpegvideo.c
	libavcodec/utils.c
	libavformat/mov.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-22 22:26:42 +02:00
Michael Niedermayer
c047afb80c Merge remote-tracking branch 'qatar/master'
* qatar/master:
  avcodec: remove AVCodecContext.dsp_mask
  avconv: fix a segfault when default encoder for a format doesn't exist.
  utvideo: general cosmetics
  aac: Handle HE-AACv2 when sniffing a channel order.
  movenc: Support high sample rates in isomedia formats by setting the sample rate field in stsd to 0.
  xxan: Remove write-only variable in xan_decode_frame_type0().
  ivi_common: Initialize a variable at declaration in ff_ivi_decode_blocks().

Conflicts:
	ffmpeg.c
	libavcodec/utvideo.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-21 22:56:07 +02:00
Mans Rullgard
2bcbd98459 Remove lowres video decoding
This feature is complex, of questionable utility, and slows down
normal decoding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-21 18:56:19 +01:00
Mans Rullgard
95510be8c3 avcodec: remove AVCodecContext.dsp_mask
This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump.  It has been superseded
by av_set_cpu_flag_mask() which, unlike this field, works everywhere.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-21 18:30:01 +01:00
Michael Niedermayer
9849515214 Revert "h264: assembly version of get_cabac for x86_64 with PIC (v4)"
This broke compilation on darwin, revert until a better solution is found.

This reverts commit a812b599b5.
2012-04-21 02:09:27 +02:00
Roland Scheidegger
a812b599b5 h264: assembly version of get_cabac for x86_64 with PIC (v4)
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register.
There is a surprisingly large performance improvement over the c version (more
so than the generated assembly seems to suggest) just in get_cabac, I measured
roughly 40% faster for get_cabac on a K8. However, overall the difference is
not that big, I measured roughly 5% on a test clip on a K8 and a Core2.
Hopefully it still compiles on x86 32bit...
v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
for every table, and got rid of unnecessary @GOTPCREL.
v3: apply similar fixes to the the decode_significance functions, and use
same macro arguments for non-pic case.
v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
the c code to be faster otherwise since both cmov and sbb suck hard on a
Prescott, even can't construct the mask with a 64bit shift as that's just as
terrible - it's quite difficult to find usable instructions on that chip...).
This is tested to work but not on a P4, in theory it _should_ be fast there.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-21 00:27:06 +02:00
Michael Niedermayer
15141f939d Merge remote-tracking branch 'qatar/master'
* qatar/master:
  indeo3: add parens around some macro arguments
  h264: use proper PROLOGUE statement for a function using 8 registers.
  doc: Update sample Vim config with suitable (function) indentation settings.
  dv: Merge dvquant.h into dvdata.c where all other DV tables reside.
  dv: Move static tables only used in one place to where they are used.
  graphparser: set next to NULL on an entry extracted from inputs list
  doc/filters: update documentation.
  avconv: flush decoders immediately after an EOF.
  avconv: send EOF to vsrc_buffer.
  avconv: reindent.

Conflicts:
	doc/filters.texi
	ffmpeg.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-17 12:13:22 +02:00
Ronald S. Bultje
87a246341b h264: use proper PROLOGUE statement for a function using 8 registers.
Fixes crashes when using biweight on win64.
2012-04-16 08:07:21 -07:00
Michael Niedermayer
7432bcfe5a Merge remote-tracking branch 'qatar/master'
* qatar/master:
  vsrc_buffer: fix check from 7ae7c41.
  libxvid: Reorder functions to avoid forward declarations; make functions static.
  libxvid: drop some pointless dead code
  wmal: vertical alignment cosmetics
  wmal: Warn about missing bitstream splicing feature and ask for sample.
  wmal: Skip seekable_frame_in_packet.
  wmal: Drop unused variable num_possible_block_size.
  avfiltergraph: make the AVFilterInOut alloc/free API public
  graphparser: allow specifying sws flags in the graph description.
  graphparser: fix the order of connecting unlabeled links.
  graphparser: add avfilter_graph_parse2().
  vsrc_buffer: allow using a NULL buffer to signal EOF.
  swscale: handle last pixel if lines have an odd width.
  qdm2: fix a dubious pointer cast
  WMAL: Do not try to read rawpcm coefficients if bits is invalid
  mov: Fix detecting there is no sync sample.
  tiffdec: K&R cosmetics
  avf: has_duration does not check the global one
  dsputil: fix optimized emu_edge function on Win64.

Conflicts:
	doc/APIchanges
	libavcodec/libxvid_rc.c
	libavcodec/libxvidff.c
	libavcodec/tiff.c
	libavcodec/wmalosslessdec.c
	libavfilter/avfiltergraph.h
	libavfilter/graphparser.c
	libavfilter/version.h
	libavfilter/vsrc_buffer.c
	libswscale/output.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-14 22:37:43 +02:00
Michael Niedermayer
367d9b2957 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  swscale: K&R formatting cosmetics (part II)
  tiffdec: Add a malloc check and refactor another.
  faxcompr: Check malloc results and unify return path
  configure: escape colons in values written to config.fate
  ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE
  matroska: Fix leaking memory allocated for laces.
  pthread: Fix crash due to fctx->delaying not being cleared.
  vp3: Assert on invalid filter_limit values.
  h264: fix 10bit biweight functions after recent x86inc.asm fixes.
  ffv1: Fix size mismatch in encode_line.
  movenc: Remove a dead initialization
  git-howto: Explain how to avoid Windows line endings in git checkouts.
  build: Move all arch OBJS declarations into arch subdirectory Makefiles.

Conflicts:
	configure
	libavcodec/vp3.c
	libavformat/matroskadec.c
	libavutil/Makefile
	libswscale/Makefile
	libswscale/swscale.c
	libswscale/swscale_internal.h
	libswscale/utils.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-13 21:50:37 +02:00
Ronald S. Bultje
b089ca871a dsputil: fix optimized emu_edge function on Win64.
Recent register allocation changes (x86inc.asm update) changed the
register order and thus opcodes for the inner loops. One of them became
>128bytes, which confuses other parts of this function where it jumps
to fixed-offset positions to extend the edge by fixed amounts. A simple
register change fixes this.
2012-04-13 11:28:30 -07:00
Justin Ruggles
de7f22ab0c ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE
Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-12 21:33:04 -07:00
Ronald S. Bultje
76538d7a78 h264: fix 10bit biweight functions after recent x86inc.asm fixes.
This should have been updated in the x86inc.asm update, but was
accidently forgotten.
2012-04-12 21:13:57 -07:00
Michael Niedermayer
ca19862d38 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  libxvid: remove disabled code
  qdm2: make a table static const
  qdm2: simplify bitstream reader setup for some subpacket types
  qdm2: use get_bits_left()
  build: Consistently handle conditional compilation for all optimization OBJS.
  avpacket, bfi, bgmc, rawenc: K&R prettyprinting cosmetics
  msrle: convert MS RLE decoding function to bytestream2.
  x86inc improvements for 64-bit

Conflicts:
	common.mak
	libavcodec/avpacket.c
	libavcodec/bfi.c
	libavcodec/msrledec.c
	libavcodec/qdm2.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-13 00:39:19 +02:00
Diego Biurrun
7bb3a302fe build: Consistently handle conditional compilation for all optimization OBJS. 2012-04-12 09:00:49 +02:00
Henrik Gramner
729f90e268 x86inc improvements for 64-bit
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-11 15:47:00 -04:00
Michael Niedermayer
e387c9d5dd Merge remote-tracking branch 'qatar/master'
* qatar/master: (22 commits)
  rv40dsp x86: use only one register, for both increment and loop counter
  rv40dsp: implement prescaled versions for biweight.
  avconv: use default channel layouts when they are unknown
  avconv: parse channel layout string
  nutdec: K&R formatting cosmetics
  vda: Signal 4 byte NAL headers to the decoder regardless of what's in the extradata
  mem: Consistently return NULL for av_malloc(0)
  vf_overlay: implement poll_frame()
  vf_scale: support named constants for sws flags.
  lavc doxy: add all installed headers to doxy groups.
  lavc doxy: add avfft to the main lavc group.
  lavc doxy: add remaining avcodec.h functions to a misc doxygen group.
  lavc doxy: add AVPicture functions to a doxy group.
  lavc doxy: add resampling functions to a doxy group.
  lavc doxy: replace \ with /
  lavc doxy: add encoding functions to a doxy group.
  lavc doxy: add decoding functions to a doxy group.
  lavc doxy: fix formatting of AV_PKT_DATA_{PARAM_CHANGE,H263_MB_INFO}
  lavc doxy: add AVPacket-related stuff to a separate doxy group.
  lavc doxy: add core functions/definitions to a doxy group.
  ...

Conflicts:
	ffmpeg.c
	libavcodec/avcodec.h
	libavcodec/vda.c
	libavcodec/x86/rv40dsp.asm
	libavfilter/vf_scale.c
	libavformat/nutdec.c
	libavutil/mem.c
	tests/ref/acodec/pcm_s24daud

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-10 22:53:25 +02:00
Christophe GISQUET
2130bd8f5b rv40dsp x86: use only one register, for both increment and loop counter
Around 10 cycles faster for luma.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-10 10:07:09 -07:00
Christophe GISQUET
272b252c01 rv40dsp: implement prescaled versions for biweight.
Quite often, the original weights are multiple of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.

The x86 code already used that trick.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-10 10:06:48 -07:00
Michael Niedermayer
2c5a2958e9 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  h264: Factorize declaration of mb_sizes array.
  vsrc_buffer: when no frame is available, return an error instead of segfaulting.
  configure: add dl to frei0r extralibs.
  dsputil x86: use SSE float instruction instead of SSE2 integer equivalent
  dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype
  vp8dsp x86: perform rounding shift with a single instruction
  fate: add BMP tests.
  swscale: handle complete dimensions for monoblack/white.
  aacenc: Mark deinterleave_input_samples argument as const.
  vf_unsharp: Mark readonly variable as const.
  h264: fix 4:2:2 PCM-macroblocks decoding

Conflicts:
	configure
	libavcodec/h264.h
	libavcodec/x86/dsputil_mmx.c
	libavfilter/vf_unsharp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-05 22:26:50 +02:00
Christophe GISQUET
6b81da2fd0 dsputil x86: use SSE float instruction instead of SSE2 integer equivalent
All the more required since the users are pure SSE functions.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-04 11:24:27 -07:00
Christophe GISQUET
cd88105f6f dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-04 11:24:08 -07:00
Christophe GISQUET
f9888520cc vp8dsp x86: perform rounding shift with a single instruction
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-04 11:23:36 -07:00
Michael Niedermayer
226671ee2f dsputil_mmx: fix scalarproduct prototypes
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-01 22:04:05 +02:00
Michael Niedermayer
d40ff29cac Merge remote-tracking branch 'qatar/master'
* qatar/master:
  asf: only set index_read if the index contained entries.
  cabac: add overread protection to BRANCHLESS_GET_CABAC().
  cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().
  cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().
  cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().
  h264: add overread protection to get_cabac_bypass_sign_x86().
  h264: reindent get_cabac_bypass_sign_x86().
  h264: use struct offsets in get_cabac_bypass_sign_x86().
  h264: fix overreads in cabac reader.
  wmall: fix seeking.
  lagarith: fix buffer overreads.
  dvdec: drop unnecessary dv_tablegen.h #include
  build: fix doc generation errors in parallel builds
  Replace memset(0) by zero initializations.
  faandct: Remove FAAN_POSTSCALE define and related code.
  dvenc: print allowed profiles if the video doesn't conform to any of them.
  avcodec_encode_{audio,video}: only reallocate output packet when it has non-zero size.
  FATE: add a test for vp8 with changing frame size.
  fate: add kgv1 fate test.
  oggdec: calculate correct timestamps in Ogg/FLAC

Conflicts:
	libavcodec/4xm.c
	libavcodec/cook.c
	libavcodec/dvdata.c
	libavcodec/dvdsubdec.c
	libavcodec/lagarith.c
	libavcodec/lagarithrac.c
	libavcodec/utils.c
	tests/fate/video.mak

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-03-29 04:11:10 +02:00
Ronald S. Bultje
a940198130 cabac: add overread protection to BRANCHLESS_GET_CABAC().
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
2012-03-28 08:01:29 -07:00
Ronald S. Bultje
448dc42571 cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC(). 2012-03-28 08:01:29 -07:00
Ronald S. Bultje
16f6e83f74 cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE(). 2012-03-28 08:01:29 -07:00