1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00

172 Commits

Author SHA1 Message Date
James Almer
224a529b44 x86/vf_w3fdif: use aligned loads in w3fdif_simple_high
Found-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-11 20:07:12 -03:00
James Almer
e8903fbf8e x86/vf_w3fdif: simplify w3fdif_simple_high
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-11 20:04:54 -03:00
James Almer
d2bf2d094e x86/vf_w3fdif: move pxor outside the loop in w3fdif_complex_low
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-11 14:23:21 -03:00
Paul B Mahol
c3d312bb7f avfilter/x86/vf_w3fdif: add colons after labels
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-10 17:55:06 +02:00
Paul B Mahol
5740dc27e1 avfilter/vf_w3fdif: add x86 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-10 17:33:43 +02:00
Andreas Cadhalpun
8d6625642d doc: fix spelling errors
Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-10-09 22:09:08 +02:00
Paul B Mahol
624a1a0e69 avfilter/x86/vf_blend.asm: hardmix: do same with two pxor instructions less
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-07 23:12:09 +02:00
Paul B Mahol
e999210cec avfilter/x86/vf_blend.asm: 11th register is used, update functions
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-07 22:53:54 +02:00
Paul B Mahol
0948ba3204 avfilter/x86/vf_blend.asm: add hardmix and phoenix sse2 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-07 22:50:15 +02:00
Paul B Mahol
ac74e857a2 avfilter/vf_stereo3d: add x86 SIMD for anaglyph outputs
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-06 21:01:24 +02:00
Michael Niedermayer
fd9a528523 avfilter/vf_blend: Fix argument types, fix segfault in asm
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-03 21:59:24 +02:00
Paul B Mahol
9762554dd0 avfilter/vf_blend: add x86 SIMD for some modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-03 21:26:17 +02:00
Paul B Mahol
160556c9ad avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-02 17:40:57 +02:00
Paul B Mahol
0701ff2c32 avfilter/x86/vf_psnr.asm: fix typo
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-01 21:53:13 +02:00
Hendrik Leppkes
5d8e836d0e Replace all remaining occurances of step/depth_minus1 and offset_plus1 2015-09-08 17:10:48 +02:00
Ronald S. Bultje
ad45121d56 options: mark av_get_{int,double,q} as deprecated.
Convert last users to av_opt_get_*() counterparts.
2015-08-18 12:05:17 -04:00
Henrik Gramner
f0b7882ceb x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
James Almer
d9e10af547 x86/vf_interlace: add missing colon to labels
Silences warnings with Nasm

Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:50:50 -03:00
James Almer
e3851169ee x86/vf_ssim: add ff_ssim_4x4_line_xop
~20% faster than ssse3. Also enabled for x86_32

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-20 13:18:05 -03:00
James Almer
e1778fb657 x86/vf_ssim: fix some instruction comments
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-20 13:17:58 -03:00
Paul B Mahol
eea08efc0d avfilter/x86/vf_psnr.asm: split one line of license text into two
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-07-14 23:54:26 +00:00
James Darnley
bff7242608 avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions
Speed of all modes increased by a factor between 7.4 and 19.8 largely depending
on whether bytes are unpacked into words.  Modes 2, 3, and 4 have been sped-up
by a factor of 43 (thanks quick sort!)

All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86 due to the number of SIMD registers used.

With a contribution from James Almer <jamrial@gmail.com>
2015-07-14 23:50:50 +00:00
Ronald S. Bultje
ae4c9ddebc vf_psnr: sse2 optimizations for sum-squared-error.
The internal line accumulator for 16bit can overflow, so I changed that
from int to uint64_t in the C code. The matching assembly looks a little
weird but output looks correct.

(avx2 should be trivial to add later.)

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-14 17:57:14 +02:00
Ronald S. Bultje
dfc58584b4 vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.
Both are 2-2.5x faster than their C counterpart.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-14 05:07:07 +02:00
James Almer
c16e99e3b3 x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
Michael Niedermayer
52fc3e372f avfilter/x86/vf_hqdn3d: Fix register types
Fixes Ticket4301

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-27 05:18:55 +02:00
Michael Niedermayer
5bc2c39527 avfilter/x86/vf_fspp: Fix invalid combination of opcode and operands
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-26 01:43:47 +02:00
Michael Niedermayer
a6f9a5d0f6 avfilter/x86/vf_fspp: Fix loop condition for column_fidct()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-28 17:23:27 +01:00
Michael Niedermayer
f5b3257c50 avfilter/vf_eq: mark src as const
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-27 01:46:08 +01:00
Michael Niedermayer
530bf8ece6 avfilter/vf_eq: Fix clipping code
Found-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-26 23:46:44 +01:00
Arwa Arif
4c38e960d0 avfilter: Port mp=eq/eq2 to lavfi
Code adapted from James Darnley's port
Some fixes from Paul B Mahol <onemda@gmail.com>

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-26 00:14:04 +01:00
James Almer
da02ee127a x86/vf_pp7: port dctB_mmx to yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-09 20:02:27 -03:00
Arwa Arif
a299cd5ab3 lavfi: port mp=pp7 to libavfilter
The only difference with mp=pp7 is that default mode is "medium", as stated
in the MPlayer docs, rather than "hard".

Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
2015-01-09 17:26:31 +01:00
James Almer
a4f876a1a2 x86/vf_fspp: move pxor in store slice functions out of the loop
m7 is not overwritten, so we only need to clear it once.
Found by Christophe Gisquet.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-26 17:15:34 -03:00
James Almer
466e32bf25 x86/vf_fspp: port inline asm to yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-26 15:39:51 -03:00
James Almer
b94e85453e avfilter/vf_fspp: add missing inline asm guards 2014-12-24 15:44:06 -03:00
Arwa Arif
bdc4db0ee3 lavfi: port mp=fspp to a native libavfilter filter
Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
2014-12-24 16:29:18 +01:00
Michael Niedermayer
6706a2986c avfilter/vf_spp: Fix overflow in 8bit store slice
Fixes regression with
ffplay -f lavfi -i testsrc=640x480  -vf format=gray,boxblur=20:10,geq="'mod(lum(X,Y),16)*15'",boxblur=10,geq="'abs(mod(lum(X,Y),15)-7)*32'",spp=4:40

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-21 01:48:19 +01:00
Michael Niedermayer
838aa08d75 avfilter/vf_spp: support 10bit per sample
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-15 18:49:35 +01:00
Michael Niedermayer
30d2ac4bf9 avfilter/vf_spp: change temporary to unsigned
More consistent with uspp and allows for future 10bit support

Found-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-12 13:34:18 +01:00
Michael Niedermayer
ca59b5b6ec avfilter/x86/vf_interlace: remove redundant instructions
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-25 12:37:19 +01:00
Michael Niedermayer
3fe3c8abb1 Merge commit 'ca5c3ff90972a5c97aabda2ace57ba72dcd7d83b'
* commit 'ca5c3ff90972a5c97aabda2ace57ba72dcd7d83b':
  vf_interlace: x86: improve asm performance

Conflicts:
	libavfilter/x86/vf_interlace.asm

See: 05e4b25e9b0a3586033dc21548b03c8e5071efe3
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-25 12:31:45 +01:00
Michael Niedermayer
ca5c3ff909 vf_interlace: x86: improve asm performance
4775 decicycles -> 3688 decicycles
2014-11-25 02:00:06 +00:00
Michael Niedermayer
05e4b25e9b avfilter/x86/vf_interlace: rewrite asm
4775 decicycles -> 3688 decicycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 04:09:03 +01:00
Michael Niedermayer
fb3eb57369 avfilter/tinterlace: add Support for ff_lowpass_line_avx() & ff_lowpass_line_sse2()
Based-on: 2e1704059ae8625beda2ffde847ad22c5ba416dc by Kieran Kunhya

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 04:02:33 +01:00
Michael Niedermayer
6f373d75e8 Merge commit '2e1704059ae8625beda2ffde847ad22c5ba416dc'
* commit '2e1704059ae8625beda2ffde847ad22c5ba416dc':
  vf_interlace: Add SIMD for lowpass filter

Conflicts:
	libavfilter/vf_interlace.c
	libavfilter/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 02:39:49 +01:00
Kieran Kunhya
2e1704059a vf_interlace: Add SIMD for lowpass filter
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2014-11-15 00:35:31 +01:00
James Almer
864f9326fb x86/vf_noise: move asm code to a separate file
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-17 00:44:35 -03:00
Pascal Massimino
649b7a9946 av_filter/x86/idet: use HADDD where appropriate
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-09 19:02:49 -03:00
Pascal Massimino
e3fd6a3a4e av_filter/x86/idet: MMX/SSE2 implementation of 16bits filter_line()
tested on http://ps-auxw.de/10bit-h264-sample/10bit-eldorado.mkv
MMX: ~30% faster decoding overall
SSE2:~40% faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-09 16:47:22 +02:00