Speed of all modes increased by a factor between 7.4 and 19.8 largely depending
on whether bytes are unpacked into words. Modes 2, 3, and 4 have been sped-up
by a factor of 43 (thanks quick sort!)
All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86 due to the number of SIMD registers used.
With a contribution from James Almer <jamrial@gmail.com>
The internal line accumulator for 16bit can overflow, so I changed that
from int to uint64_t in the C code. The matching assembly looks a little
weird but output looks correct.
(avx2 should be trivial to add later.)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Both are 2-2.5x faster than their C counterpart.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The only difference with mp=pp7 is that default mode is "medium", as stated
in the MPlayer docs, rather than "hard".
Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
paddq is an SSE2 instruction so it cannot be used for MMX.
This was probably just a typo because the sums are dwords anyway.
Reviewed-by: Pascal Massimino <pascal.massimino@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
integration by Neil Birkbeck, with help from Vitor Sessak.
core SSE2 loop by Skal (pascal.massimino@gmail.com)
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
And use the x86util ones instead, which are optimized for mmxext/sse2.
About ~1% increase in performance on pre SSSE3 processors.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This makes C and MMX match, no change to fate as the differences where
apparently not sufficient to show up in fate
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '01c5779f56cf708e6cb88b11cfdc248cae7e2ee8':
x86: Drop some unnecessary YASM ifdefs
Conflicts:
libavfilter/x86/vf_yadif_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This reverts commit a87b17f328.
This reduces the amount of non LGPL code, making a relicensing to LGPL
easier
Conflicts:
libavfilter/vf_yadif.c
libavfilter/x86/yadif.c
libavfilter/x86/yadif_template.c
libavfilter/yadif.h
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This reverts commit fc5fe4804f, reversing
changes made to ffe3350098.
The factoring is broken; it's not calling the ssse3 code anymore, and
calling the mmx2 code with bad alignment. It also broke some FATE
instances.
Conflicts:
libavfilter/x86/vf_gradfun_init.c
* commit 'ed1a11ed52bbd1f15bb9b0416d69b7924bee3191':
gradfun: x86: Factor out common code for some gradfun_filter_line() variants
Conflicts:
libavfilter/x86/vf_gradfun_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>