mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-12 19:18:44 +02:00

2109 Commits

Author SHA1 Message Date
Timothy Gu
bcc223523e x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
2016-02-14 11:11:02 -08:00
Timothy Gu
59ebf32bca huffyuvencdsp: Undefine "i" macro after each use 2016-02-07 09:19:17 -08:00
James Almer
8ae7447941 x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3}
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-06 01:36:55 -03:00
Timothy Gu
9fd6ea933f dirac_dwt: Make x86 files/functions names consistent 2016-02-05 19:30:23 -08:00
Timothy Gu
17ab8f7e68 diracdsp: Make x86 files/functions names consistent 2016-02-05 19:29:43 -08:00
Henrik Gramner
aa751573fe avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc
Using rNm and x86inc's stack allocation with a negative value at the same
time isn't supported, and caused the original stack pointer to be clobbered
when using a compiler that doesn't support stack alignment.
2016-02-05 22:01:38 +01:00
James Darnley
7042a55c55 avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
2.6 times faster (366 vs. 142 cycles)
2016-02-05 17:26:04 +01:00
Timothy Gu
dd57b316c1 diracdsp_mmx: Fix some more indentations 2016-02-01 20:47:56 -08:00
Timothy Gu
f5e2b8de55 diracdsp_mmx: Fix indentation 2016-02-01 20:41:33 -08:00
Timothy Gu
838abfc1d7 x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format 2016-02-01 17:01:11 -08:00
Timothy Gu
180f9a0958 all: Make header guard names consistent 2016-01-31 15:44:11 -08:00
foo86
ae5b2c5250 avcodec/dca: add new decoder based on libdcadec 2016-01-31 17:09:38 +01:00
foo86
4608996772 avcodec/dca: remove old decoder
Remove all files and functions which are not going to be reused,
and temporarily disable all functions and FATE tests which will be.
2016-01-31 17:09:38 +01:00
James Almer
c792528970 x86/imdct36: use extractps inside the STORE macro
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-28 13:35:15 -03:00
Derek Buitenhuis
ea2df33052 Merge commit '4f22b138886e29f7fffa8c715673951e51be9f32'
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-01-27 18:23:31 +00:00
James Almer
209f50e16b avcodec/synth_filter: split off remaining code from dcadec files
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-25 14:57:38 -03:00
Geza Lore
d39c229e54 x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.

Currently only implemented for ELF.
2016-01-21 23:19:46 +01:00
Ronald S. Bultje
0f88b3f82f videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations.
This can overread the buffer (either before its start or beyond its end) in
Nx1 (i.e. height=1) images.

Fixes mozilla bug 1240080.
2016-01-18 11:12:47 -05:00
Diego Biurrun
4f22b13888 x86: ac3dsp: Drop forward declaration for nonexisting function 2016-01-18 11:55:38 +01:00
James Darnley
f59b727e2f avcodec/v210: guard new avx2 functions from old assemblers 2016-01-17 21:23:58 +01:00
James Darnley
2cba1825f7 avcodec/v210: add avx2 version of the 10-bit line encoder
Around 25% faster than the ssse3 version.
2016-01-17 16:03:43 +01:00
James Darnley
3836f404a8 avcodec/v210: add avx2 version of the 8-bit line encoder
Around 35% faster than the avx version.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
2016-01-17 16:03:43 +01:00
Michael Niedermayer
da6f34516b avcodec/x86/fmtconvert: Add emms to int32_to_float_fmul_array8_sse()
this should fix checkasm on x86_64-archlinux-gcc-valgrind

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-01-15 17:08:37 +01:00
Hendrik Leppkes
2214207d04 Merge commit '8563f9887194b07c972c3475d6b51592d77f73f7'
* commit '8563f9887194b07c972c3475d6b51592d77f73f7':
  x86: use emms after ff_int32_to_float_fmul_scalar_sse

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:27:11 +01:00
Hendrik Leppkes
a9cd11b212 Merge commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c'
* commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c':
  x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:23:25 +01:00
Hendrik Leppkes
d03da3e240 Merge commit '2008f76054906e9ff6bf744800af0e5a5bfe61be'
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
  dca: remove unused decode_hf function and quant_d tables

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:17:48 +01:00
Hendrik Leppkes
00e91d0676 Merge commit '5dfe4edad63971d669ae456b0bc40ef9364cca80'
* commit '5dfe4edad63971d669ae456b0bc40ef9364cca80':
  x86_64: int32_to_float_fmul_scalar sign extend integer length

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 10:46:18 +01:00
Janne Grunau
8563f98871 x86: use emms after ff_int32_to_float_fmul_scalar_sse
Intel's Instruction Set Reference (as of September 2015) clearly states
that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
source is a memory location. The Instruction Set Reference from 1999
(Order Number 243191) describes this behaviour but all later versions
I've seen make no distinction whether MMX registers or memory is
used as source.
The documentation for the matching SSE2 instruction to convert to double
(cvtpi2pd) was fixed (see the valgrind bug
https://bugs.kde.org/show_bug.cgi?id=210264).

It will take time to get a clarification and fixes in place. In the
meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
be correct according to the documentation. The vast majority of users
will have SSE2 so a change to the SSE version has little effect.

Fixes fate-checkasm on x86 valgrind targets.

Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
2015-12-30 13:37:57 +01:00
Janne Grunau
f4f27e4cf1 x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
This reverts commit 5dfe4edad6.
2015-12-29 11:42:51 +01:00
Alexandra Hájková
2008f76054 dca: remove unused decode_hf function and quant_d tables
They were superseded with their integer equivalents. Rename integer
decode_hf to decode_hf.
2015-12-24 13:58:18 +01:00
James Almer
d4c47333e1 x86/hevc_sao: add ff_hevc_sao_edge_filter_{8,16}_{10,12}
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 17:01:15 -03:00
James Almer
3ff2beff65 x86/hevc_sao: simplify sao_edge_filter 10/12bit
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 16:45:37 -03:00
James Almer
34b2bd03cf x86/hevc_sao: simplify sao_band_filter 10/12bit
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 16:42:36 -03:00
Janne Grunau
5dfe4edad6 x86_64: int32_to_float_fmul_scalar sign extend integer length 2015-12-14 16:42:35 +01:00
Dave Yeo
b0b133b8c0 hevcdsp: use a macro for .rodata section
fixes assembling on OS/2

Signed-off-by: Dave Yeo <dave.r.yeo@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-12-11 16:19:30 +01:00
Kieran Kunhya
3f07f12f65 diracdec: Template DSP functions adding 10-bit versions 2015-12-10 18:25:02 +00:00
Anton Khirnov
e7078e842d hevcdsp: add x86 SIMD for MC 2015-12-05 21:11:52 +01:00
Timothy Gu
4b80b895a9 pixblockdsp: x86: Condense diff_pixels_* to a shared macro
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
2015-11-07 14:31:34 -08:00
Ganesh Ajjanagadde
38f4e973ef all: fix -Wextra-semi reported on clang
This fixes extra semicolons that clang 3.7 on GNU/Linux warns about.
These were triggered when built under -Wpedantic, which essentially
checks for strict ISO compliance in numerous ways.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-24 17:58:17 -04:00
Ronald S. Bultje
52f84d82bd videodsp: don't overread edges in vfix3 emu_edge.
Fixes trac ticket 3226. Also see Andreas' analysis in
https://bugs.debian.org/801745, which was very helpful.
2015-10-24 14:34:50 -04:00
Michael Niedermayer
ea5a1d1485 avcodec/x86/vc1dsp: Remove unused macro
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-22 21:13:42 +02:00
Carl Eugen Hoyos
775b84e30e lavc/x86/vc1dsp_init: Fix compilation with --disable-yasm. 2015-10-22 11:37:42 +02:00
James Almer
73353af6e5 x86/Makefile: move decoder/encoder objects out of the subsystems section
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-22 03:55:18 -03:00
Timothy Gu
ab5f43e634 vc1dsp: Port ff_vc1_put_ver_16b_shift2_mmx to yasm
This function is only used within other inline asm functions, hence the
HAVE_MMX_INLINE guard. Per recent discussions, we should not worry about
the performance of inline asm-only builds.
2015-10-21 20:01:52 -07:00
Timothy Gu
98da061461 huffyuvencdsp: Cherry pick changes left out in the last commit
Oops.
2015-10-21 12:42:33 -07:00
Timothy Gu
5e586e1bef huffyuvencdsp: Add ff_diff_bytes_{sse2,avx2}
SSE2 version 4%-35% faster than MMX depending on the width.
AVX2 version 1%-13% faster than SSE2 depending on the width.
2015-10-21 12:25:32 -07:00
Timothy Gu
6b41b44149 huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm
Heavily based upon ff_add_bytes by Christophe Gisquet.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2015-10-20 18:24:54 -07:00
Timothy Gu
068e6cb732 huffyuvencdsp: Use intptr_t for width
It is done this way in huffyuvdsp as well.
2015-10-19 16:57:33 -07:00
Timothy Gu
a079cbf458 x86: vc1dsp_mmx: Move yasm initiation steps to vc1dsp_init
That's where all yasm initiation steps are. Also removes the overlap
between the two files.
2015-10-19 16:52:52 -07:00
Timothy Gu
607f820ec7 x86: fpel: Remove erroneous ff_put_pixels8_mmxext prototype
This function does not exist.
2015-10-19 16:52:37 -07:00