Rostislav Pehlivanov
29eb1c51d7
mdct15: simplify x86 exptab permutation
...
Removes an unneeded copy and does the 5-point permute in-place.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-07 23:44:40 +01:00
Rostislav Pehlivanov
a72d0fb973
mdct15: simplify the fft15 x86 SIMD
...
Saves 1 gpr and 2 instructions and simplifies the macros a bit.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-07 23:27:41 +01:00
Kieran Kunhya
f9d3841ae6
mpeg4video: Add support for MPEG-4 Simple Studio Profile.
...
This is a profile supporting > 8-bit video and has a higher quality DCT
2018-04-02 13:06:23 +01:00
Aurelien Jacobs
f1e490b1ad
sbcenc: add MMX optimizations
...
This was originally based on libsbc, and was fully integrated into ffmpeg.
Rough speed test:
C version: speed= 592x
MMX version: speed= 785x
2018-03-07 22:26:53 +01:00
Rostislav Pehlivanov
50945482a7
h264_idct: enable unmacro on newer NASM versions
...
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-02-12 10:50:37 +00:00
Martin Vignali
8f9c38b196
avcodec/utvideoenc : add SIMD (avx) for sub_left_prediction
...
asm code by Henrik Gramner
2018-01-28 20:23:11 +01:00
James Almer
6e80079a28
avcodec: increase AV_INPUT_BUFFER_PADDING_SIZE to 64
...
AVX-512 support has been introduced, and even if no functions currently
use zmm registers (able to load as much as 64 bytes of consecutive data
per instruction), they will be added eventually.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2018-01-11 23:46:31 -03:00
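The reason for the larger padding can be sketched in plain C. This is a minimal illustration, not FFmpeg's allocator: `EXAMPLE_PADDING_SIZE` is a hypothetical stand-in for AV_INPUT_BUFFER_PADDING_SIZE, and `example_alloc_padded` is a made-up helper. The point is that a 64-byte load starting at the last payload byte must still land inside the allocation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for AV_INPUT_BUFFER_PADDING_SIZE after this commit. */
#define EXAMPLE_PADDING_SIZE 64

/*
 * Hypothetical helper: copy `size` payload bytes into a fresh buffer that is
 * followed by EXAMPLE_PADDING_SIZE zeroed bytes. Returns NULL on failure.
 * The caller frees the result with free().
 */
static uint8_t *example_alloc_padded(const uint8_t *src, size_t size)
{
    uint8_t *buf;

    if (size > SIZE_MAX - EXAMPLE_PADDING_SIZE)
        return NULL;                      /* size + padding would overflow */

    buf = malloc(size + EXAMPLE_PADDING_SIZE);
    if (!buf)
        return NULL;

    if (size)
        memcpy(buf, src, size);           /* payload */
    memset(buf + size, 0, EXAMPLE_PADDING_SIZE); /* zeroed tail padding */
    return buf;
}
```

With 64 bytes of tail padding, a full-width zmm load that begins anywhere inside the payload reads only allocated memory.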
James Almer
438f884fc4
x86/lossless_videodsp: rename ff_add_left_pred_int16_sse4 to ff_add_left_pred_int16_unaligned_ssse3
...
SSSE3_FAST is the proper check for it.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-10 00:51:01 -03:00
James Almer
a4fc63c0f9
x86/lossless_videodsp: don't overread the dst buffer in ff_add_left_pred_unaligned_avx2
...
Fixes valgrind
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-10 00:38:05 -03:00
Martin Vignali
630967ef63
avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred
2017-12-09 15:19:03 +01:00
Martin Vignali
4353c35067
avcodec/x86/lossless_videodsp : add avx2 version for add_left_pred
2017-12-09 15:16:03 +01:00
Martin Vignali
cfbcea1cca
avcodec/x86/lossless_videodsp.asm : make macro for add_left_pred_unaligned in order to add avx2 version
2017-12-09 15:15:59 +01:00
Martin Vignali
be6d1f9632
avcodec/x86/bswapdsp : use macro for 128 bits constants loading in xmm or ymm
2017-12-02 18:25:25 +01:00
Mikulas Patocka
fbdd78fa3e
avcodec/fft: fix INTERL macro on 3dnow
...
The commit b7c16a3f2c ("x86: fft: Port to cpuflags") breaks the opus decoder
in ffmpeg when compiling for 3dnow. The output is audible, but there's a lot
of noise.
The reason for the breakage is that the commit unintentionally changed the
INTERL macro so that it is empty when compiling for 3dnow. This patch
fixes it.
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-25 13:11:45 -03:00
Martin Vignali
515555af6c
avcodec/x86/exrdsp : use ymm constant for pb_80
...
Speed seems to be similar, but this simplifies the code.
2017-11-23 20:00:13 +01:00
James Almer
beb63baa69
x86/utvideodsp: reuse shared constants
...
Remove the broadcast instructions as well now that they are wide
enough.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-21 10:57:14 -03:00
James Almer
ebf352116b
x86/constants: make pb_80 32 byte wide
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-21 10:57:03 -03:00
Martin Vignali
ba98f8463f
avcodec/huffyuvdspenc : add diff_int16 AVX2 func
2017-11-21 09:42:08 +01:00
Martin Vignali
d189a426fa
avcodec/huffyuvdspenc : reorganize diff_int16
2017-11-21 09:42:03 +01:00
Martin Vignali
e641c94190
avcodec/huffyuvdsp : add add_int16 AVX2 func
2017-11-21 09:41:58 +01:00
Martin Vignali
6955e8842e
avcodec/huffyuvdsp : reorganize add_int16 asm
2017-11-21 09:41:52 +01:00
Martin Vignali
7f9b67bcb6
avcodec/huffyuvdsp(enc) : move duplicate macro to a template file
2017-11-21 09:41:46 +01:00
Martin Vignali
caf51a573d
avcodec/x86/utvideodsp.asm : cosmetic
...
Use a better function separator
and add a comment for the restore rgb planes10 declaration.
2017-11-21 09:00:47 +01:00
Martin Vignali
b5ebe38443
avcodec/utvideodsp : add avx2 version for the dsp
2017-11-21 09:00:42 +01:00
Martin Vignali
48b7c45b0c
avcodec/x86/utvideodsp : make macro for func
2017-11-21 09:00:38 +01:00
James Almer
aea0f06db7
x86/jpeg2000dsp: add ff_ict_float_{fma3,fma4}
...
jpeg2000_ict_float_c: 2296.0
jpeg2000_ict_float_sse: 628.0
jpeg2000_ict_float_avx: 317.0
jpeg2000_ict_float_fma3: 262.0
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-20 18:33:58 -03:00
Michael Niedermayer
58cf31cee7
avcodec/x86/mpegvideodsp: Fix signedness bug in need_emu
...
Fixes: out of array read
Fixes: 3516/attachment-311488.dat
Found-by: Insu Yun, Georgia Tech.
Tested-by: wuninsu@gmail.com
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-14 04:54:31 +01:00
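The class of bug named in the subject can be illustrated with a generic C sketch. This is not the actual patch: the function names, the parameters, and the exact bounds convention are assumptions chosen for illustration. When a difference of two ints is compared as unsigned, a negative result wraps to a huge value and the range check silently passes.

```c
/*
 * Hypothetical check: does a block of width `w` starting at column `ix`
 * need edge emulation inside a plane of width `width`?
 *
 * Buggy variant: if width < w, (width - w) is negative and wraps to a huge
 * unsigned value, so the comparison is false and no emulation is requested.
 */
static int example_need_emu_buggy(int ix, int width, int w)
{
    return (unsigned)ix >= (unsigned)(width - w);
}

/* Fixed variant: handle the width < w case explicitly before comparing. */
static int example_need_emu_fixed(int ix, int width, int w)
{
    return width < w || (unsigned)ix >= (unsigned)(width - w);
}
```

The unsigned cast on `ix` is kept on purpose: it folds the `ix < 0` test into the same comparison, which is a common idiom; only the subtraction needs the extra guard.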
Thomas Köppe
43171a2a73
Fix missing used attribute for inline assembly variables
...
Variables used in inline assembly need to be marked with __attribute__((used)).
Static constants already were, via the define of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are const only used in inline assembly, and therefore
appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks
variables as used.
This change makes FFMPEG work with Clang's ThinLTO.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-13 03:58:34 +01:00
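A minimal sketch of the idea, assuming GCC or Clang. The macro and variable names below are hypothetical and only modeled on the description above; they are not copied from the FFmpeg headers. The `used` attribute tells the compiler to emit the object even when no C code references it, which is what keeps a constant alive when only inline assembly refers to it by name.

```c
#include <stdint.h>

/*
 * Hypothetical macro: declare `v` with type `t`, aligned to `n` bytes, and
 * marked as used so the optimizer (including LTO) does not discard it.
 */
#define EXAMPLE_ASM_ALIGNED(n, t, v) \
    t __attribute__((used)) __attribute__((aligned(n))) v

/* A constant that, in real code, might only be referenced from inline asm. */
EXAMPLE_ASM_ALIGNED(16, static const uint16_t, example_pw_1)[8] = {
    1, 1, 1, 1, 1, 1, 1, 1
};
```

Without the attribute, a constant like this looks dead to the compiler because the reference inside the asm string is invisible to it.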
Martin Vignali
0380b72d35
libavcodec/lossless_video_dsp : cosmetic add better separator for each function, in order to make reading of the asm file easier
2017-11-07 00:56:54 +01:00
Martin Vignali
da62128ea1
libavcodec/lossless_videodsp : add add_bytes avx2 version
2017-11-07 00:56:02 +01:00
James Almer
783535a4cd
x86/bswapdsp: add missing preprocessor wrappers for AVX2 functions
...
Fixes build with old nasm/yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-10-29 22:21:51 -03:00
Martin Vignali
e9930883a2
libavcodec/bswapdsp : add AVX2 func for bswap_buf (swap uint32_t)
2017-10-29 15:21:35 +01:00
James Almer
b7c16a3f2c
Merge commit '681a86aba6cb09b98ad716d986182060c7795d20'
...
* commit '681a86aba6cb09b98ad716d986182060c7795d20':
x86: fft: Port to cpuflags
Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:45:49 -03:00
James Almer
11f5ffd330
Merge commit 'e9bb77fb1012cba1951a82136df7071f71bce8fb'
...
* commit 'e9bb77fb1012cba1951a82136df7071f71bce8fb':
x86: h264: Simplify DEQUANT macro with cpuflags
Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:39:41 -03:00
James Almer
53eea3a569
Merge commit '307eb1a8ee363db1fcf869e427a8deb6d9538881'
...
* commit '307eb1a8ee363db1fcf869e427a8deb6d9538881':
x86: vp8dsp: port FILTER_BILINEAR macro to cpuflags
Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:28:39 -03:00
James Almer
2904db9045
Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2'
...
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
x86util: Port all macros to cpuflags
See d5f8a642f6
Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:15:57 -03:00
James Almer
b78bb51a7c
Merge commit '6eef263aca281fb582e1fa3d841ac20ef747a252'
...
* commit '6eef263aca281fb582e1fa3d841ac20ef747a252':
x86: Merge align directives into SECTION_RODATA declarations where possible
Merged-by: James Almer <jamrial@gmail.com>
2017-10-12 13:48:35 -03:00
James Almer
18279738f9
x86/blockdsp: use three operand form for an instruction
...
Fixes assembling with old yasm.
2017-10-04 23:51:44 -03:00
Michael Niedermayer
26ea142658
avcodec/x86/lossless_videoencdsp: Fix warning: signed dword value exceeds bounds
...
Add () to regsize define
Suggested-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-05 01:22:44 +02:00
Michael Niedermayer
df62b70de8
avcodec/x86/lossless_videoencdsp: Fix handling of small widths
...
Fixes out of array access
Fixes: crash-huf.avi
Regression since: 6b41b44149
This could also be fixed by adding checks in the C code that calls the dsp
Found-by: Zhibin Hu and 连一汉 <lianyihan@360.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-05 01:22:44 +02:00
Martin Vignali
cbbec68847
libavcodec/blockdsp : add AVX version
...
Also modify the required alignment, to 32 instead of 16
for several codecs
Signed-off-by: James Almer <jamrial@gmail.com>
2017-10-03 19:47:37 -03:00
Martin Vignali
ac5908b13f
libavcodec/exr : add x86 SIMD for predictor
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-10-01 17:35:30 -03:00
James Almer
0c005fa86f
Merge commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6'
...
* commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6':
asm: Consistently uppercase SECTION markers
Merged-by: James Almer <jamrial@gmail.com>
2017-09-26 18:48:06 -03:00
James Almer
318778de9e
Merge commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3'
...
* commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3':
Mark some arrays that never change as const.
Merged-by: James Almer <jamrial@gmail.com>
2017-09-26 16:02:40 -03:00
Henrik Gramner
18821e3ba1
x86/exrdsp: optimize ff_reorder_pixels_avx2()
...
Tested with "checkasm --test=exrdsp -bench"
Before:
reorder_pixels_c: 5187.8
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 331.3
After:
reorder_pixels_c: 5181.5
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 313.8
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-18 23:24:55 -03:00
James Almer
98d7ad085e
avcodec/exrdsp: improve the ExrDSPContext->reorder_pixels prototype
...
Make dst be the first parameter and src const. It's more in line with the rest of the codebase.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-17 19:01:40 -03:00
Martin Vignali
9b8c1224d7
libavcodec/exr : add X86 SIMD for reorder_pixels
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-17 17:53:57 -03:00
Michael Niedermayer
bc488ec28a
avcodec/me_cmp: Fix crashes on ARM due to misalignment
...
Adds a diff_pixels_unaligned()
Fixes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=872503
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-08-21 23:19:18 +02:00
Ivan Kalvachev
43dab86bcd
opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling.
...
Using named define properly documents the code paths.
It also avoids passing additional numbered arguments through
multiple levels of macro templates.
The suffix handling is done by concatenation, like in
other asm functions, and avoids having two separate
"cglobal" defines.
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
2017-08-19 22:42:56 +01:00
Rostislav Pehlivanov
3c99523a28
opus_pvq_search: split functions into exactness and only use the exact if it's faster
...
This splits the asm function into exact and non-exact version. The exact
version is as fast or faster on newer CPUs (which EXTERNAL_AVX_FAST describes
well) whilst the non-exact version is faster than the exact on older CPUs.
Also fixes yasm compilation which doesn't accept !cpuflags(avx) syntax.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-08-18 19:32:55 +01:00