mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00

2447 Commits

Author SHA1 Message Date
Michael Niedermayer
24af459d1e avcodec/x86/diracdsp: Fix high bits on Windows x86_64
Found-by: james
2020-01-31 00:04:22 +01:00
Michael Niedermayer
0694b60b7b avcodec/x86/diracdsp: Fix incorrect src addressing in dequant_subband_32()
Fixes: Segfault (not reproducible with asm, which made this hard to debug)
Fixes: decoding errors
Fixes: 19854/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DIRAC_fuzzer-5729372837511168

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-30 18:47:21 +01:00
Peter Ross
fd17218558 vp4: prevent unaligned memory access in loop filter
VP4 applies a loop filter during motion compensation, so the block offset
will often be unaligned. This produces a bus error on some platforms, namely
ARMv7 NEON.

This patch adds an unaligned version of the loop filter function pointer
to VP3DSPContext.

Reported-by: Mike Melanson <mike@multimedia.cx>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-30 10:06:38 +01:00
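As an illustration of the alignment issue described in the entry above, here is a minimal C sketch of alignment-agnostic loads and stores. It is not the code from the patch; the helper names are hypothetical, and `memcpy` is just one portable way to avoid dereferencing a misaligned pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 32 bits from any address.
 * memcpy lets the compiler emit an access that is safe for unaligned
 * pointers, instead of a plain load that may fault on strict platforms. */
static uint32_t example_load32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: write 32 bits to any address. */
static void example_store32_unaligned(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));
}
```

A caller that cannot guarantee alignment, such as a filter working at a motion-compensated block offset, would use helpers like these, while an aligned fast path can stay in use where alignment is known.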
James Almer
1faedb9a11 x86/opusdsp: enable the functions on all FMA3 CPUs
It's not using ymm registers, so limiting it to CPUs with fast AVX
is not necessary.

Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-11 20:50:45 -03:00
James Almer
80444e23ac x86/opusdsp: clear the high bits from some gprs
Fixes checkasm on systems like win64.

Reviewed-by: Lynne
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-11 20:42:31 -03:00
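The "high bits" problem in the entry above can be modelled in plain C. When a 32-bit integer argument arrives in a 64-bit register, the calling convention does not promise that the upper 32 bits are cleared, so assembly that uses the full register as an index must extend the value first. The sketch below only simulates this with a `uint64_t` standing in for the register; the function names are hypothetical and this is not the actual assembly change.

```c
#include <stdint.h>

/* Buggy model: treat the whole 64-bit register as the index,
 * including whatever garbage sits in the upper half. */
static int64_t example_index_buggy(uint64_t reg)
{
    return (int64_t)reg;
}

/* Fixed model: keep only the low 32 bits and sign-extend them,
 * which is what an explicit extension instruction would do for an int. */
static int64_t example_index_fixed(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}
```

For an unsigned argument the equivalent fix would be zero extension rather than sign extension.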
James Almer
58d167bcd5 avcodec/Makefile: add missing pngdsp dependency to the lscr decoder
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-14 16:47:56 -03:00
James Almer
b41d8ab2e6 x86/v210dec: use named registers
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03 01:20:18 -03:00
James Almer
abf1aa87ab x86/v210dec: don't reserve more xmm regs than needed
Prevents pointless register saving on win64 for the sse3 and avx
versions of the function.

Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03 01:09:50 -03:00
James Almer
b0e29357ba x86/v210dec: remove duplicate load instruction
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03 01:08:34 -03:00
James Darnley
46f1718cd9 avcodec/x86/v210: fix operands of vpblendd used in new avx2 code
Assembly failed when using yasm rather than nasm.
2019-05-02 21:20:54 +02:00
Michael Stoner
ebd6fb23c5 libavcodec Adding ff_v210_planar_unpack AVX2
Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck
AVX2 is 1.4x faster than AVX
2019-05-02 19:21:37 +02:00
Lynne
4b7166c9d5 x86/opusdsp: replace loads with shuffles
Has a slight speedup.
Can't be carried over to aarch64, since it has no shufps-like instruction.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-04-26 20:39:38 -03:00
Lynne
b43b8d337d x86/opusdsp: fix WIN64 return value
Signed-off-by: James Almer <jamrial@gmail.com>
2019-04-01 11:06:34 -03:00
Lynne
605e330310 x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis
58893 decicycles in deemphasis_c,  130548 runs,    524 skips
9475 decicycles in deemphasis_fma3,  130686 runs,    386 skips -> 6.21x speedup

24866 decicycles in postfilter_c,   65386 runs,    150 skips
5268 decicycles in postfilter_fma3,   65505 runs,     31 skips -> 4.72x speedup

Total decoder speedup: ~14%

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}
2019-04-01 00:22:00 +02:00
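The unrolling quoted in the entry above can be checked against the plain one-sample recurrence y[i] = x[i] + c1*y[i-1]. The sketch below is a scalar C comparison under that assumption; the function names are hypothetical, the coefficient is passed in rather than taken from CELT_EMPH_COEFF, and no SIMD is involved.

```c
#include <stddef.h>

/* Reference form: one output per step, seeded by the previous output. */
static float example_deemphasis_scalar(float *y, const float *x,
                                       float state, float c1, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        state = x[i] + c1 * state;
        y[i]  = state;
    }
    return state;
}

/* Unrolled form from the commit message: four outputs per iteration,
 * each written in terms of the inputs and the state carried in from
 * the previous group. len must be a multiple of 4. */
static float example_deemphasis_unrolled4(float *y, const float *x,
                                          float state, float c1, size_t len)
{
    const float c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    for (size_t i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];

        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}
```

Both functions should agree up to floating-point rounding. The unrolled form removes the dependency between the four outputs of a group, which is what makes it suitable for a vector implementation.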
Lynne
5468c1d075 celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled
The entire function was defined away before.
2019-03-31 23:36:43 +02:00
Lynne
4a2c651620 x86/opus_dsp: rename to celt_pvq
It's only used in the encoder and in CELT's PVQ.
2019-03-31 23:35:00 +02:00
James Almer
d5d699ab6e avcodec/h264dsp: change loop filter stride argument to ptrdiff_t
2019-02-20 15:27:43 -03:00
Martin Vignali
9a22e6fa1d avcodec/proresdsp indent after prev commit
2018-12-02 12:55:35 +01:00
Martin Vignali
c097a32e93 avcodec/proresdec : rename dsp part for 10b and check dspinit for supported bits per raw sample
based on patch by Kieran Kunhya
2018-12-02 12:55:31 +01:00
Rostislav Pehlivanov
29eb1c51d7 mdct15: simplify x86 exptab permutation
Removes an unneeded copy and does the 5-point permute in-place.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-07 23:44:40 +01:00
Rostislav Pehlivanov
a72d0fb973 mdct15: simplify the fft15 x86 SIMD
Saves 1 gpr and 2 instructions and simplifies the macros a bit.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-07 23:27:41 +01:00
Kieran Kunhya
f9d3841ae6 mpeg4video: Add support for MPEG-4 Simple Studio Profile.
This is a profile supporting > 8-bit video and has a higher quality DCT
2018-04-02 13:06:23 +01:00
Aurelien Jacobs
f1e490b1ad sbcenc: add MMX optimizations
This was originally based on libsbc, and was fully integrated into ffmpeg.

Rough speed test:
C version:    speed= 592x
MMX version:  speed= 785x
2018-03-07 22:26:53 +01:00
Rostislav Pehlivanov
50945482a7 h264_idct: enable unmacro on newer NASM versions
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-02-12 10:50:37 +00:00
Martin Vignali
8f9c38b196 avcodec/utvideoenc : add SIMD (avx) for sub_left_prediction
asm code by Henrik Gramner
2018-01-28 20:23:11 +01:00
James Almer
6e80079a28 avcodec: increase AV_INPUT_BUFFER_PADDING_SIZE to 64
AVX-512 support has been introduced, and even if no functions currently
use zmm registers (able to load as much as 64 bytes of consecutive data
per instruction), they will be added eventually.

Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2018-01-11 23:46:31 -03:00
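The padding requirement in the entry above exists so that a wide load starting near the end of the valid data still reads allocated memory. A minimal sketch of the allocation pattern, assuming plain `malloc` and a hypothetical `EXAMPLE_PADDING` constant that mirrors the value named in the commit title (it does not use FFmpeg's own allocation helpers):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_PADDING 64 /* hypothetical stand-in for the padding size */

/* Copy `size` bytes into a new buffer that has EXAMPLE_PADDING zeroed
 * bytes after the payload, so a 64-byte load starting at the last valid
 * byte stays inside the allocation. Returns NULL on allocation failure. */
static uint8_t *example_alloc_padded(const uint8_t *src, size_t size)
{
    uint8_t *buf = malloc(size + EXAMPLE_PADDING);
    if (!buf)
        return NULL;
    if (size)
        memcpy(buf, src, size);
    memset(buf + size, 0, EXAMPLE_PADDING);
    return buf;
}
```

The caller owns the returned buffer and releases it with `free`.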
James Almer
438f884fc4 x86/lossless_videodsp: rename ff_add_left_pred_int16_sse4 to ff_add_left_pred_int16_unaligned_ssse3
SSSE3_FAST is the proper check for it.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-10 00:51:01 -03:00
James Almer
a4fc63c0f9 x86/lossless_videodsp: don't overread the dst buffer in ff_add_left_pred_unaligned_avx2
Fixes valgrind

Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-10 00:38:05 -03:00
Martin Vignali
630967ef63 avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred
2017-12-09 15:19:03 +01:00
Martin Vignali
4353c35067 avcodec/x86/lossless_videodsp : add avx2 version for add_left_pred
2017-12-09 15:16:03 +01:00
Martin Vignali
cfbcea1cca avcodec/x86/lossless_videodsp.asm : make macro for add_left_pred_unaligned in order to add avx2 version
2017-12-09 15:15:59 +01:00
Martin Vignali
be6d1f9632 avcodec/x86/bswapdsp : use macro for 128 bits constants loading in xmm or ymm
2017-12-02 18:25:25 +01:00
Mikulas Patocka
fbdd78fa3e avcodec/fft: fix INTERL macro on 3dnow
The commit b7c16a3f2c4921f613319938b8ee0e3d6fa83e8d ("x86: fft: Port to
cpuflags") breaks the opus decoder in ffmpeg when compiling for 3dnow. The
output is audible, but there's a lot of noise.

The reason for the breakage is that the commit unintentionally changed the
INTERL macro so that it is empty when compiling for 3dnow. This patch
fixes it.

Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-25 13:11:45 -03:00
Martin Vignali
515555af6c avcodec/x86/exrdsp : use ymm constant for pb_80
Speed seems to be similar, but this simplifies the code.
2017-11-23 20:00:13 +01:00
James Almer
beb63baa69 x86/utvideodsp: reuse shared constants
Remove the broadcast instructions as well now that they are wide
enough.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-21 10:57:14 -03:00
James Almer
ebf352116b x86/constants: make pb_80 32 byte wide
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-21 10:57:03 -03:00
Martin Vignali
ba98f8463f avcodec/huffyuvdspenc : add diff_int16 AVX2 func
2017-11-21 09:42:08 +01:00
Martin Vignali
d189a426fa avcodec/huffyuvdspenc : reorganize diff_int16
2017-11-21 09:42:03 +01:00
Martin Vignali
e641c94190 avcodec/huffyuvdsp : add add_int16 AVX2 func
2017-11-21 09:41:58 +01:00
Martin Vignali
6955e8842e avcodec/huffyuvdsp : reorganize add_int16 asm
2017-11-21 09:41:52 +01:00
Martin Vignali
7f9b67bcb6 avcodec/huffyuvdsp(enc) : move duplicate macro to a template file
2017-11-21 09:41:46 +01:00
Martin Vignali
caf51a573d avcodec/x86/utvideodsp.asm : cosmetic
better func separator
and add comment for the restore rgb planes10 declaration
2017-11-21 09:00:47 +01:00
Martin Vignali
b5ebe38443 avcodec/utvideodsp : add avx2 version for the dsp
2017-11-21 09:00:42 +01:00
Martin Vignali
48b7c45b0c avcodec/x86/utvideodsp : make macro for func
2017-11-21 09:00:38 +01:00
James Almer
aea0f06db7 x86/jpeg2000dsp: add ff_ict_float_{fma3,fma4}
jpeg2000_ict_float_c: 2296.0
jpeg2000_ict_float_sse: 628.0
jpeg2000_ict_float_avx: 317.0
jpeg2000_ict_float_fma3: 262.0

Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-20 18:33:58 -03:00
Michael Niedermayer
58cf31cee7 avcodec/x86/mpegvideodsp: Fix signedness bug in need_emu
Fixes: out of array read
Fixes: 3516/attachment-311488.dat

Found-by: Insu Yun, Georgia Tech.
Tested-by: wuninsu@gmail.com
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-14 04:54:31 +01:00
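A signedness bug in a bounds check, of the kind named in the entry above, typically looks like the following. This is an illustrative C sketch only, with hypothetical function names and a simplified one-dimensional check; it is not the actual need_emu expression or the actual patch.

```c
/* Buggy variant: if the block is wider than the picture, width - w is
 * negative. Converted to unsigned it becomes a huge limit, so the check
 * never fires and the caller reads outside the array. */
static int example_need_emu_buggy(int ix, int width, int w)
{
    return (unsigned)ix >= (unsigned)(width - w);
}

/* Fixed variant: clamp the limit to zero before the unsigned comparison,
 * so an oversized block always requests edge emulation. */
static int example_need_emu_fixed(int ix, int width, int w)
{
    int limit = width - w;
    if (limit < 0)
        limit = 0;
    return (unsigned)ix >= (unsigned)limit;
}
```

The cast of `ix` to unsigned is deliberate in both variants: it folds the `ix < 0` and `ix >= limit` tests into a single comparison.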
Thomas Köppe
43171a2a73 Fix missing used attribute for inline assembly variables
Variables used in inline assembly need to be marked with __attribute__((used)).
Static constants already were, via the define of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are const only used in inline assembly, and therefore
appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks
variables as used.

This change makes FFMPEG work with Clang's ThinLTO.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-13 03:58:34 +01:00
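The idea in the entry above can be sketched with a macro that combines alignment with the `used` attribute. The macro and variable names below are hypothetical, the syntax is GCC/Clang specific, and the definition of the real DECLARE_ASM_ALIGNED macro may differ.

```c
#include <stdint.h>

/* Hypothetical macro: declare variable `v` of type `t`, aligned to `n`
 * bytes and marked as used, so the compiler and link-time optimizer keep
 * it even when the only references are inside inline assembly strings. */
#define EXAMPLE_ASM_ALIGNED(n, t, v) t __attribute__((used, aligned(n))) v

EXAMPLE_ASM_ALIGNED(16, static const uint16_t, example_pw_1)[8] = {
    1, 1, 1, 1, 1, 1, 1, 1
};

/* Portable check that the requested alignment took effect. Real code would
 * reference the constant by symbol name from inline assembly instead; that
 * is omitted here because it is architecture specific. */
static int example_is_aligned16(void)
{
    return ((uintptr_t)example_pw_1 % 16) == 0;
}
```

Without `used`, an optimizer that cannot see into the assembly string may treat such a constant as dead and remove it, which then fails at link time or at run time.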
Martin Vignali
0380b72d35 libavcodec/lossless_video_dsp : cosmetic add better separator for each function, in order to make reading of the asm file easier
2017-11-07 00:56:54 +01:00
Martin Vignali
da62128ea1 libavcodec/lossless_videodsp : add add_bytes avx2 version
2017-11-07 00:56:02 +01:00
James Almer
783535a4cd x86/bswapdsp: add missing preprocessor wrappers for AVX2 functions
Fixes build with old nasm/yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-10-29 22:21:51 -03:00