Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2
macro name, drop pointless check for MMX support, we always assume MMX is
available in our SIMD code, fix spelling.
Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL. Original commit message follows.
x86: Avoid some bypass delays and false dependencies
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Up to four instructions less depending on function and instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
Benchmarks on an Intel Core i5-4200U:
idct8x8_dc
SSE2 MMXEXT C
cycles 22 26 57
idct16x16_dc
AVX2 SSE2 C
cycles 27 32 249
idct32x32_dc
AVX2 SSE2 C
cycles 62 126 1375
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Also port relevant AVX2/XOP optimizations from x264 with permission
to relicense to LGPL from the corresponding authors
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is so we can sync to x264's version of FMA4 support.
This partialy reverts commit 79687079a9.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d':
x86inc: Add cvisible macro for C functions with public prefix
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b':
x86inc: Rename "program_name" to "private_prefix"
configure: Run SHFLAGS through ldflags_filter()
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* commit 'dae1d507af94261bafd3b11549884e5d1eca590e':
x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
vf_fps: add final flushed frames to the dropped frame count
rv34_parser: Adjust #if for disabling individual parsers
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '094a7405e5d8463d7d167d893e04934ec1a84ecd':
x86: ABSB: port to cpuflags
sdp: Include SRTP crypto params if using the srtp protocol
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd8c772de53d29afb1bada88afa859fce8489c668':
nutdec: Always return a value from nut_read_timestamp()
configure: Make warnings from -Wreturn-type fatal errors
x86: ABS2: port to cpuflags
vdpau: Remove av_unused attribute from function declaration
h264: fix ff_generate_sliding_window_mmcos() prototype.
Conflicts:
configure
libavformat/nutdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0':
x86: ABS1: port to cpuflags
v210x: cosmetics, reformat
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9d5c62ba5b586c80af508b5914934b1c439f6652':
lavu/opt: do not filter out the initial sign character except for flags
eval: treat dB as decibels instead of decibytes
float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
Conflicts:
libavutil/eval.c
tests/ref/fate/eval
Merged-by: Michael Niedermayer <michaelni@gmx.at>