Martin Storsjö
676a9ee1d2
x86: Fix constraints for decode_significance*_x86
...
Originally, prior to 8742a4ff8, the caller code was compiled
within this condition:
ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
Since HAVE_7REGS is defined as
(ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal
to HAVE_7REGS (for 32 bit at least). The correct simplification
of the original condition thus is HAVE_7REGS, not
HAVE_EBX_AVAILABLE.
This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
and HAVE_EBX_AVAILABLE = 1.
Signed-off-by: Martin Storsjö <martin@martin.st>
2011-12-27 09:05:14 +02:00
Diego Biurrun
6fdb2ce34a
x86: Tighten register constraints for decode_significance*_x86.
...
On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register
is not available, but required to assemble the functions.
This reverts commit 8742a4f to a simplified version of the original constraints.
2011-12-21 12:06:37 +01:00
Diego Biurrun
30bbd5cbc0
x86: conditionally compile dnxhd encoder optimizations
2011-12-19 13:54:10 +01:00
Diego Biurrun
88b9735753
build: conditionally compile x86 H.264 chroma optimizations
2011-12-14 11:58:45 +01:00
Martin Storsjö
f1dba9e498
x86: Require 7 registers for the cabac asm
...
The change in 599b4c6ef didn't turn out to work properly on
i386 on OS X, where it broke building with PIC enabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
2011-12-12 15:36:20 +02:00
Mans Rullgard
599b4c6efd
x86: cabac: replace explicit memory references with "m" operands
...
This replaces the explicit offset(reg) memory references with
"m" operands for the same locations. As a result, one fewer
register operand is needed for these inline asm statements.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-11 22:29:22 +00:00
Diego Biurrun
da9cea77e3
Fix a bunch of common typos.
2011-12-11 00:32:25 +01:00
Justin Ruggles
0e8fdd41c2
dsputil: use cpuflags in x86 emu_edge_core
...
avoids passing around the extra argument among all the macros it uses
2011-11-22 15:40:51 -05:00
Justin Ruggles
395f2e70dd
dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()
...
This allows emulated_edge_mc_sse() and gmc_sse() to be used under
AV_CPU_FLAG_SSE.
2011-11-22 15:40:51 -05:00
Justin Ruggles
9d06037d48
twinvq: add SSE/AVX optimized sum/difference stereo interleaving
2011-11-11 14:13:58 -05:00
Diego Biurrun
ce33320b30
Remove redundant filename self-references inside files.
...
Filenames are brittle across renames and add no useful information.
2011-11-08 17:52:56 +01:00
Diego Biurrun
276b995d85
x86: drop pointless ARCH_X86 #ifdef from files in x86 subdirectory
2011-11-08 17:52:55 +01:00
Justin Ruggles
b8f02f5b4e
dsputil: use cpuflags in x86 versions of vector_clip_int32()
2011-11-06 20:50:06 -05:00
Ronald S. Bultje
717401aff2
h264_weight: remove duplication functions.
2011-11-05 07:16:30 -07:00
Justin Ruggles
5463e83dbc
fmtconvert: fix int32_to_float_fmul_scalar() for windows x86_64
...
The calling convention only allows 4 non-stack parameter, with each
float or int register being skipped if not used.
fixes Bug 64
2011-11-02 21:44:58 -04:00
Daniel Kang
ded3e9f054
H.264: Cometics to dsputil_mmx.c
...
Add whitespace.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-10-26 06:41:32 -07:00
Ronald S. Bultje
b0b3231074
h264_weight: initialize "height" function argument properly.
...
Right now it's not actually initialized on 32-bit, leading to crashes
on win32.
2011-10-22 00:23:24 -07:00
Justin Ruggles
aad3429d4e
fmtconvert: port float_to_int16_interleave() 2-channel x86 inline asm to yasm
2011-10-21 10:13:05 -04:00
Justin Ruggles
4e8e262476
fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm
2011-10-21 10:13:05 -04:00
Justin Ruggles
185142a5ea
fmtconvert: check compile-time x86 instruction set flags
2011-10-21 10:13:05 -04:00
Justin Ruggles
708ab7dd69
fmtconvert: port float_to_int16() x86 inline asm to yasm
2011-10-21 10:13:05 -04:00
Ronald S. Bultje
c2d337429c
H264: change weight/biweight functions to take a height argument.
...
Neon parts by Mans Rullgard <mans@mansr.com>.
2011-10-21 01:00:45 -07:00
Ronald S. Bultje
229d263cc9
Support for lossless and inter H264 4:2:2.
2011-10-21 01:00:45 -07:00
Baptiste Coudurier
76741b0e56
h264: 4:2:2 intra decoding support
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-10-21 01:00:41 -07:00
Diego Biurrun
265980dabc
x86: Move some variable declarations below the appropriat #ifdef.
...
This avoids some unused variable warnings with YASM disabled.
2011-10-20 16:19:27 +02:00
Diego Biurrun
2cb7c81669
x86: Fix linking of ProRes DSP ASM with YASM disabled.
2011-10-20 16:19:13 +02:00
Ronald S. Bultje
05c8f119cc
proresdsp: fix function prototypes.
...
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2011-10-14 21:34:46 +02:00
Ronald S. Bultje
e3f530feca
prores: idct sse2/sse4 optimizations.
...
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
2011-10-11 07:50:48 -07:00
Sean McGovern
c2d3f56107
fft: avoid a signed overflow
...
As a signed integer, 1<<31 overflows, so force it to unsigned.
Signed-off-by: Alex Converse <alex.converse@gmail.com>
2011-09-23 17:02:58 -07:00
Ronald S. Bultje
38e06c2969
Move clipd macros to x86util.asm.
...
This allows sharing them between multiple .asm files.
2011-08-17 20:56:06 -07:00
Dave Yeo
cc73511e8e
Fix NASM include directive
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-15 11:24:35 -07:00
Alex Converse
48f7163f13
dsputil_mmx: Honor HAVE_AMD3DNOW
2011-08-15 11:20:08 -07:00
Ronald S. Bultje
b2c087871d
Move x86util.asm from libavcodec/ to libavutil/.
...
This allows using it in swscale also.
2011-08-12 11:43:03 -07:00
Ronald S. Bultje
3a39195b1d
Move x86inc.asm to libavutil/.
...
This allows using it in libswscale/ also.
2011-08-12 11:43:02 -07:00
Kostya Shishkov
d241f51e0f
Move RV3/4-specific DSP functions into their own context
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-11 16:07:15 -07:00
Vitor Sessak
18b131de04
dct32: Add SSE2 ASM optimizations
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-02 10:17:29 -07:00
Jason Garrett-Glaser
a3bf7b864a
H.264: tweak some other x86 asm for Atom
2011-07-29 12:24:15 -07:00
Mans Rullgard
3ad1684126
x86: cabac: add operand size suffixes missing from 6c32576
...
This fixes build with clang.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-28 18:59:23 -07:00
Mans Rullgard
f5f004bc5a
x86: cabac: don't load/store context values in asm
...
Inspection of compiled code shows gcc handles these fine on its own.
Benchmarking also shows no measurable speed difference.
Removing the remaining cases in get_cabac_bypass_sign_x86() does
cause more substantial changes to the compiled code with uncertain
impact.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-28 22:25:21 +01:00
Jason Garrett-Glaser
6c32576548
H.264: optimize CABAC x86 asm for Atom
2011-07-28 13:06:13 -07:00
Mans Rullgard
da4c7cce21
x86: fix build with gcc 4.7
...
The upcoming gcc 4.7 has more advanced constant propagation
resulting some inline asm operands becoming constants and thus
emitted as literals, sometimes in contexts where this results
in invalid instructions.
This patch changes the constraints of the relevant operands
to "rm" thus forcing a valid type. While obviously suboptimal,
this is what older gcc versions already did, and there is no
change to the code generated with these.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-26 22:17:43 +01:00
Daniel Kang
406fbd24dc
H.264: Add optimizations to predict x86 assembly.
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-22 14:54:33 -07:00
Joseph Artsimovich
5ab21439fd
dnxhd: 10-bit support
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:44:40 +01:00
Mans Rullgard
a617c6aaa3
dsputil: update per-arch init funcs for non-h264 high bit depth
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
874f1a901d
dsputil: template get_pixels() for different bit depths
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
0a72533e98
jfdctint: add 10-bit version
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
e7a972e113
simple_idct: add 10-bit version
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-20 17:49:48 +01:00
Diego Biurrun
65083b4911
dsputil: remove disabled code
2011-07-18 11:48:35 +02:00
Martin Storsjö
8f62ef0f95
x86: Use LOCAL_ALIGNED in mpegvideo_mmx_template
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2011-07-18 00:10:45 +03:00
Diego Biurrun
e0ae2174db
simple_idct: remove disabled code
2011-07-17 17:32:37 +02:00