Christophe Gisquet
08e3ea60ff
x86: synth filter float: implement SSE2 version
...
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-28 13:00:48 +01:00
Christophe Gisquet
ad507d7907
x86: dcadsp: implement SSE lfe_dir
...
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-28 13:00:47 +01:00
Diego Biurrun
b23650491f
prores: Use consistent names for DSP arch initialization functions
2014-02-28 10:34:55 +01:00
Diego Biurrun
017a06a9ee
x86: dsputil: Use correct file name as multiple inclusion guard
2014-02-20 04:16:15 -08:00
Diego Biurrun
b23bc95920
x86: dca: Add missing multiple inclusion guards
2014-02-19 10:19:15 +01:00
Janne Grunau
5c1c6e8226
dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders
2014-02-08 13:38:36 +01:00
Janne Grunau
0cffd6fff5
x86: use the inline int8x8_fmul_int32 only if inline SSE2 is availbale
...
Fixes compilation with MSVC. Also does not rely on on earlier config.h
include but include it directly.
2014-02-08 12:10:56 +01:00
Christophe Gisquet
5b59a9fc61
x86: dcadsp: implement int8x8_fmul_int32
...
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-07 22:52:40 +01:00
Ronald S. Bultje
9ee9c679a7
x86: videodsp: Fix a bug in a %if statement where we used '%%' instead of '&&'.
...
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:33:23 +01:00
Ronald S. Bultje
51daafb02e
x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such.
...
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:30:01 +01:00
Diego Biurrun
aab40bbfd5
x86: dsputil: Simplify xvmc deprecation conditional
2014-01-15 15:23:46 +01:00
Diego Biurrun
46bacb5cc6
x86: Consistently use cpu flag detection macros in places that still miss it
2014-01-14 00:04:58 +01:00
Diego Biurrun
4c642d8d98
x86: hpeldsp: Add missing av_cold attribute to init function
2014-01-09 15:09:07 +01:00
Diego Biurrun
b0be1ae792
x86: avcodec: Add a bunch of missing #includes for av_cold
2014-01-09 15:09:07 +01:00
Anton Khirnov
a03a642d5c
h264: do not use 422 functions for monochrome
...
Fixes invalid memory access.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
2014-01-06 08:25:36 +01:00
Anton Khirnov
dfc50ac85e
x86: mpegvideo: move denoise_dct asm to mpegvideoenc
...
This function is encoding-only.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-12-20 17:16:11 +01:00
Diego Biurrun
4958f35a2e
dsputil: Move apply_window_int16 to ac3dsp
...
The (optimized) functions are used nowhere else.
2013-12-08 17:57:15 +01:00
Diego Biurrun
3d7c84747d
x86: Initialize mmxext after amd3dnow optimizations
...
The mmxext optimizations should be at least equally fast if available and
amd3dnow optimizations are being deprecated. Thus the former should
override the latter, not the other way around.
2013-12-04 18:52:48 +01:00
Diego Biurrun
7ffaa19570
dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo
...
The table is MMX-specific and used nowhere else.
2013-12-02 04:05:18 +01:00
Diego Biurrun
cf7860db60
x86: dsputil: Suppress deprecation warnings for XvMC bits
...
These parts are scheduled for removal on the next version bump.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2013-11-28 16:04:30 +01:00
Ronald S. Bultje
72ca830f51
lavc: VP9 decoder
...
Originally written by Ronald S. Bultje <rsbultje@gmail.com> and
Clément Bœsch <u@pkh.me>
Further contributions by:
Anton Khirnov <anton@khirnov.net>
Diego Biurrun <diego@biurrun.de>
Luca Barbato <lu_zero@gentoo.org>
Martin Storsjö <martin@martin.st>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-11-15 10:16:28 +01:00
Ronald S. Bultje
458446acfa
lavc: Edge emulation with dst/src linesize
...
Allow supporting files for which the image stride is smaller than
the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9
file or a 16x16 VP8 file with -fflags +emu_edge.
2013-11-15 10:16:27 +01:00
Diego Biurrun
19e30a58fc
Deprecate obsolete XvMC hardware decoding support
...
XvMC has long ago been superseded by newer acceleration APIs, such as
VDPAU, and few downstreams still support it. Furthermore XvMC is not
implemented within the hwaccel framework, but requires its own specific
code in the MPEG-1/2 decoder, which is a maintenance burden.
2013-11-13 21:07:45 +01:00
Diego Biurrun
0338c39698
dsputil: Split off H.263 bits into their own H263DSPContext
2013-11-08 12:40:47 +01:00
Diego Biurrun
e2b5b09789
x86: rv40dsp: Use PAVGB instruction macro where appropriate
2013-11-04 21:14:39 +01:00
Mikulas Patocka
694d997afe
x86: hpeldsp: Use PAVGB instruction macro where necessary
...
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-11-04 01:29:23 +01:00
Diego Biurrun
1700b4e678
x86: vp8dsp: Split loopfilter code into a separate file
2013-11-01 22:05:20 +01:00
Diego Biurrun
6405ca7d4a
x86: h264_idct: Update comments to match 8/10-bit depth optimization split
2013-10-07 21:46:46 +02:00
Henrik Gramner
bbe4a6db44
x86inc: Utilize the shadow space on 64-bit Windows
...
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Diego Biurrun
ce1e8045e0
x86: fdct: Employ more specific ifdefs
...
This avoids building mmxext and sse2 code when disabled by configure.
2013-10-06 22:02:25 +02:00
Diego Biurrun
2ddb35b911
x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx
...
The function does not depend on MMX and compilation without MMX enabled
fails if the function is compiled conditional on MMX availability.
2013-10-05 19:21:15 +02:00
Diego Biurrun
258414d077
x86: fdct: Initialize optimized fdct implementations in the standard way
2013-10-05 18:20:52 +02:00
Diego Biurrun
0b8b2ae5e9
x86: xviddct: Employ more specific ifdefs
...
This avoids building mmxext and sse2 code when disabled by configure.
2013-10-05 18:14:58 +02:00
Diego Biurrun
6cc133ec58
x86: fdct: Only build fdct code if encoders have been enabled
...
fdct is only initialized if encoders are enabled.
2013-10-04 10:50:44 +02:00
Martin Storsjö
1daea5232f
x86: Add an xmm clobbering wrapper for avcodec_encode_video2
...
This is required since 187105ff8
when we started trying to
wrap this function as well.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-16 22:22:41 +03:00
Hendrik Leppkes
a06a5b78e2
mathops/x86: work around inline asm miscompilation with GCC 4.8.1
...
The volatile is not required here, and prevents a miscompilation with GCC
4.8.1 when building on x86 with --cpu=i686
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-09-15 11:15:07 -04:00
Diego Biurrun
e998b56362
x86: avcodec: Consistently structure CPU extension initialization
2013-08-29 13:07:37 +02:00
Diego Biurrun
6369ba3c9c
x86: avcodec: Use convenience macros to check for CPU flags
2013-08-29 13:07:37 +02:00
Diego Biurrun
cd52917237
x86: rv40dsp: Move inline assembly optimizations out of YASM init section
2013-08-28 23:59:24 +02:00
Diego Biurrun
a64f6a04ac
dsputil: x86: Hide arch-specific initialization details
...
Also give consistent names to init functions.
2013-08-28 23:59:24 +02:00
Diego Biurrun
8506ff97c9
vp56: Mark VP6-only optimizations as such.
...
Most of our VP56 optimizations are VP6-only and will stay that way.
So avoid compiling them for VP5-only builds.
2013-08-23 14:42:19 +02:00
Diego Biurrun
e7b31844f6
x86: Split DCT and FFT initialization into separate files
2013-08-21 20:15:27 +02:00
Diego Biurrun
0b45269c2d
x86: h264_idct: Remove incorrect comment
2013-08-21 15:09:58 +02:00
Diego Biurrun
3ac7fa81b2
Consistently use "cpu_flags" as variable/parameter name for CPU flags
2013-07-18 00:31:35 +02:00
Christophe Gisquet
b6293e2798
fmtconvert: Explicitly use int32_t instead of int
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-17 11:02:47 +03:00
Luca Barbato
2b379a9251
mlpdsp: x86: Respect cpuflags
2013-07-12 04:34:49 +02:00
Jason Garrett-Glaser
d222f6e39e
cabac: x86 version of get_cabac_bypass
...
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-07-04 16:06:10 +02:00
Diego Biurrun
186599ffe0
build: cosmetics: Place unconditional before conditional OBJS lines
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:31 +03:00
Diego Biurrun
004b81c465
mpegvideo: Remove commented-out PARANOID debug cruft
2013-05-15 23:53:42 +02:00
Diego Biurrun
1399931d07
x86: dsputil: Rename dsputil_mmx.h --> dsputil_x86.h
...
The header is not (anymore) MMX-specific.
2013-05-12 22:28:07 +02:00