Derek Buitenhuis
b056482ef3
Merge commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c'
...
* commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c':
build: Add vc1dsp component for more fine-grained dependencies
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-24 18:21:38 +00:00
James Almer
45d3af9059
x86/dcadec: add ff_lfe_fir1_float_{sse3,avx}
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-22 21:21:34 -03:00
Diego Biurrun
15a24614ae
build: Add vc1dsp component for more fine-grained dependencies
2016-02-19 20:38:18 +01:00
Derek Buitenhuis
04e4166536
Merge commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45'
...
* commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45':
v210: Use separate sample_factors
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-16 17:23:32 +00:00
Derek Buitenhuis
8f8381bf03
Merge commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a'
...
* commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a':
v210: x86: Add the correct guards around the asm code
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-16 17:02:56 +00:00
James Almer
70d685a77f
x86: use the new helper macros where useful
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
Timothy Gu
bcc223523e
x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
2016-02-14 11:11:02 -08:00
Timothy Gu
59ebf32bca
huffyuvencdsp: Undefine "i" macro after each use
2016-02-07 09:19:17 -08:00
James Almer
8ae7447941
x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3}
...
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-06 01:36:55 -03:00
Timothy Gu
9fd6ea933f
dirac_dwt: Make x86 files/functions names consistent
2016-02-05 19:30:23 -08:00
Timothy Gu
17ab8f7e68
diracdsp: Make x86 files/functions names consistent
2016-02-05 19:29:43 -08:00
Henrik Gramner
aa751573fe
avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc
...
Using rNm and x86inc's stack allocation with a negative value at the same
time isn't supported, and caused the original stack pointer to be clobbered
when using a compiler that doesn't support stack alignment.
2016-02-05 22:01:38 +01:00
James Darnley
7042a55c55
avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
...
2.6 times faster (366 vs. 142 cycles)
2016-02-05 17:26:04 +01:00
Timothy Gu
dd57b316c1
diracdsp_mmx: Fix some more indentations
2016-02-01 20:47:56 -08:00
Timothy Gu
f5e2b8de55
diracdsp_mmx: Fix indentation
2016-02-01 20:41:33 -08:00
Timothy Gu
838abfc1d7
x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format
2016-02-01 17:01:11 -08:00
Luca Barbato
e280fe1329
v210: Use separate sample_factors
...
The 10-bit and 8-bit functions can now be implemented to process
different numbers of samples.
While at it, simplify the code a little.
2016-02-01 13:40:07 +01:00
James Darnley
15ec7aa417
v210: Add avx2 version of the 10-bit line encoder
...
Around 25% faster than the ssse3 version.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
James Darnley
d29237e557
v210: Add avx2 version of the 8-bit line encoder
...
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
Timothy Gu
180f9a0958
all: Make header guard names consistent
2016-01-31 15:44:11 -08:00
foo86
ae5b2c5250
avcodec/dca: add new decoder based on libdcadec
2016-01-31 17:09:38 +01:00
foo86
4608996772
avcodec/dca: remove old decoder
...
Remove all files and functions which are not going to be reused,
and temporarily disable all functions and FATE tests which will be.
2016-01-31 17:09:38 +01:00
James Almer
c792528970
x86/imdct36: use extractps inside the STORE macro
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-28 13:35:15 -03:00
Derek Buitenhuis
ea2df33052
Merge commit '4f22b138886e29f7fffa8c715673951e51be9f32'
...
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-01-27 18:23:31 +00:00
Luca Barbato
eafb05fcf3
v210: x86: Add the correct guards around the asm code
...
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-01-26 23:31:57 +01:00
James Almer
209f50e16b
avcodec/synth_filter: split off remaining code from dcadec files
...
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-25 14:57:38 -03:00
Geza Lore
cc602061ee
x86inc: Add debug symbols indicating sizes of compiled functions
...
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:46:28 +01:00
Geza Lore
d39c229e54
x86inc: Add debug symbols indicating sizes of compiled functions
...
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
2016-01-21 23:19:46 +01:00
Ronald S. Bultje
0f88b3f82f
videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations.
...
This can overread the buffer (either before its start or beyond its end)
in Nx1 (i.e. height=1) images.
Fixes mozilla bug 1240080.
2016-01-18 11:12:47 -05:00
Diego Biurrun
03ef89faf2
x86: build: Group all encoder objects together
2016-01-18 14:47:58 +01:00
Diego Biurrun
4f22b13888
x86: ac3dsp: Drop forward declaration for nonexisting function
2016-01-18 11:55:38 +01:00
James Darnley
f59b727e2f
avcodec/v210: guard new avx2 functions from old assemblers
2016-01-17 21:23:58 +01:00
James Darnley
2cba1825f7
avcodec/v210: add avx2 version of the 10-bit line encoder
...
Around 25% faster than the ssse3 version.
2016-01-17 16:03:43 +01:00
James Darnley
3836f404a8
avcodec/v210: add avx2 version of the 8-bit line encoder
...
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
2016-01-17 16:03:43 +01:00
Michael Niedermayer
da6f34516b
avcodec/x86/fmtconvert: Add emms to int32_to_float_fmul_array8_sse()
...
This should fix checkasm on x86_64-archlinux-gcc-valgrind.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-01-15 17:08:37 +01:00
Hendrik Leppkes
2214207d04
Merge commit '8563f9887194b07c972c3475d6b51592d77f73f7'
...
* commit '8563f9887194b07c972c3475d6b51592d77f73f7':
x86: use emms after ff_int32_to_float_fmul_scalar_sse
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:27:11 +01:00
Hendrik Leppkes
a9cd11b212
Merge commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c'
...
* commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c':
x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:23:25 +01:00
Hendrik Leppkes
d03da3e240
Merge commit '2008f76054906e9ff6bf744800af0e5a5bfe61be'
...
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:17:48 +01:00
Hendrik Leppkes
00e91d0676
Merge commit '5dfe4edad63971d669ae456b0bc40ef9364cca80'
...
* commit '5dfe4edad63971d669ae456b0bc40ef9364cca80':
x86_64: int32_to_float_fmul_scalar sign extend integer length
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 10:46:18 +01:00
Janne Grunau
8563f98871
x86: use emms after ff_int32_to_float_fmul_scalar_sse
...
Intel's Instruction Set Reference (as of September 2015) clearly states
that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
source is a memory location. The Instruction Set Reference from 1999
(Order Number 243191) describes this behaviour, but all later versions
I've seen make no distinction whether MMX registers or memory is
used as source.
The documentation for the matching SSE2 instruction to convert to double
(cvtpi2pd) was fixed (see the valgrind bug
https://bugs.kde.org/show_bug.cgi?id=210264 ).
It will take time to get a clarification and fixes in place. In the
meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
be correct according to the documentation. The vast majority of users
will have SSE2 so a change to the SSE version has little effect.
Fixes fate-checkasm on x86 valgrind targets.
Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
2015-12-30 13:37:57 +01:00
Janne Grunau
f4f27e4cf1
x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
...
This reverts commit 5dfe4edad6.
2015-12-29 11:42:51 +01:00
Alexandra Hájková
2008f76054
dca: remove unused decode_hf function and quant_d tables
...
They were superseded by their integer equivalents. Rename integer
decode_hf to decode_hf.
2015-12-24 13:58:18 +01:00
James Almer
d4c47333e1
x86/hevc_sao: add ff_hevc_sao_edge_filter_{8,16}_{10,12}
...
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 17:01:15 -03:00
James Almer
3ff2beff65
x86/hevc_sao: simplify sao_edge_filter 10/12bit
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 16:45:37 -03:00
James Almer
34b2bd03cf
x86/hevc_sao: simplify sao_band_filter 10/12bit
...
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 16:42:36 -03:00
Janne Grunau
5dfe4edad6
x86_64: int32_to_float_fmul_scalar sign extend integer length
2015-12-14 16:42:35 +01:00
Dave Yeo
b0b133b8c0
hevcdsp: use a macro for .rodata section
...
fixes assembling on OS/2
Signed-off-by: Dave Yeo <dave.r.yeo@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-12-11 16:19:30 +01:00
Kieran Kunhya
3f07f12f65
diracdec: Template DSP functions adding 10-bit versions
2015-12-10 18:25:02 +00:00
Anton Khirnov
e7078e842d
hevcdsp: add x86 SIMD for MC
2015-12-05 21:11:52 +01:00
Timothy Gu
4b80b895a9
pixblockdsp: x86: Condense diff_pixels_* to a shared macro
...
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
2015-11-07 14:31:34 -08:00