FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00

Author	SHA1	Message	Date
Michael Niedermayer	24af459d1e	avcodec/x86/diracdsp: Fix high bits on Windows x86_64 Found-by: james	2020-01-31 00:04:22 +01:00
Michael Niedermayer	0694b60b7b	avcodec/x86/diracdsp: Fix incorrect src addressing in dequant_subband_32() Fixes: Segfault (not reproducible with asm, which made this hard to debug) Fixes: decoding errors Fixes: 19854/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DIRAC_fuzzer-5729372837511168 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-01-30 18:47:21 +01:00
Peter Ross	fd17218558	vp4: prevent unaligned memory access in loop filter VP4 applies a loop filter during motion compensation, so the block offset will often be unaligned. This produces a bus error on some platforms, namely ARMv7 NEON. This patch adds an unaligned version of the loop filter function pointer to VP3DSPContext. Reported-by: Mike Melanson <mike@multimedia.cx> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-10-30 10:06:38 +01:00
James Almer	1faedb9a11	x86/opusdsp: enable the functions on all FMA3 CPUs It's not using ymm registers, so limiting it to CPUs with fast AVX is not necessary. Signed-off-by: James Almer <jamrial@gmail.com>	2019-09-11 20:50:45 -03:00
James Almer	80444e23ac	x86/opusdsp: clear the high bits from some gprs Fixes checkasm on systems like win64. Reviewed-by: Lynne Signed-off-by: James Almer <jamrial@gmail.com>	2019-09-11 20:42:31 -03:00
James Almer	58d167bcd5	avcodec/Makefile: add missing pngdsp dependency to the lscr decoder Signed-off-by: James Almer <jamrial@gmail.com>	2019-05-14 16:47:56 -03:00
James Almer	b41d8ab2e6	x86/v210dec: use named registers Signed-off-by: James Almer <jamrial@gmail.com>	2019-05-03 01:20:18 -03:00
James Almer	abf1aa87ab	x86/v210dec: don't reserve more xmm regs than needed Prevents pointless register saving on win64 for the sse3 and avx versions of the function. Signed-off-by: James Almer <jamrial@gmail.com>	2019-05-03 01:09:50 -03:00
James Almer	b0e29357ba	x86/v210dec: remove duplicate load instruction Signed-off-by: James Almer <jamrial@gmail.com>	2019-05-03 01:08:34 -03:00
James Darnley	46f1718cd9	avcodec/x86/v210: fix operands of vpblendd used in new avx2 code Assembly failed when using yasm rather than nasm.	2019-05-02 21:20:54 +02:00
Michael Stoner	ebd6fb23c5	libavcodec Adding ff_v210_planar_unpack AVX2 Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck AVX2 is 1.4x faster than AVX	2019-05-02 19:21:37 +02:00
Lynne	4b7166c9d5	x86/opusdsp: replace loads with shuffles Has a slight speedup. Can't be carried over to aarch64, since it has no shufps-like instruction. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2019-04-26 20:39:38 -03:00
Lynne	b43b8d337d	x86/opusdsp: fix WIN64 return value Signed-off-by: James Almer <jamrial@gmail.com>	2019-04-01 11:06:34 -03:00
Lynne	605e330310	x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis 58893 decicycles in deemphasis_c, 130548 runs, 524 skips 9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips -> 6.21x speedup 24866 decicycles in postfilter_c, 65386 runs, 150 skips 5268 decicycles in postfilter_fma3, 65505 runs, 31 skips -> 4.72x speedup Total decoder speedup: ~14% Deemphasis SIMD based on the following unrolling: const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1; float state = coeff; for (int i = 0; i < len; i += 4) { y[0] = x[0] + c1*state; y[1] = x[1] + c2*state + c1*x[0]; y[2] = x[2] + c3*state + c1*x[1] + c2*x[0]; y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0]; state = y[3]; y += 4; x += 4; }	2019-04-01 00:22:00 +02:00
Lynne	5468c1d075	celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled The entire function was defined away before.	2019-03-31 23:36:43 +02:00
Lynne	4a2c651620	x86/opus_dsp: rename to celt_pvq It's only used in the encoder and in CELT's PVQ.	2019-03-31 23:35:00 +02:00
James Almer	d5d699ab6e	avcodec/h264dsp: change loop filter stride argument to ptrdiff_t	2019-02-20 15:27:43 -03:00
Martin Vignali	9a22e6fa1d	avcodec/proresdsp indent after prev commit	2018-12-02 12:55:35 +01:00
Martin Vignali	c097a32e93	avcodec/proresdec : rename dsp part for 10b and check dspinit for supported bits per raw sample based on patch by Kieran Kunhya	2018-12-02 12:55:31 +01:00
Rostislav Pehlivanov	29eb1c51d7	mdct15: simplify x86 exptab permutation Removes an unneeded copy and does the 5-point permute in-place. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2018-05-07 23:44:40 +01:00
Rostislav Pehlivanov	a72d0fb973	mdct15: simplify the fft15 x86 SIMD Saves 1 gpr and 2 instructions and simplifies the macros a bit. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2018-05-07 23:27:41 +01:00
Kieran Kunhya	f9d3841ae6	mpeg4video: Add support for MPEG-4 Simple Studio Profile. This is a profile supporting > 8-bit video and has a higher quality DCT	2018-04-02 13:06:23 +01:00
Aurelien Jacobs	f1e490b1ad	sbcenc: add MMX optimizations This was originally based on libsbc, and was fully integrated into ffmpeg. Rough speed test: C version: speed= 592x MMX version: speed= 785x	2018-03-07 22:26:53 +01:00
Rostislav Pehlivanov	50945482a7	h264_idct: enable unmacro on newer NASM versions Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2018-02-12 10:50:37 +00:00
Martin Vignali	8f9c38b196	avcodec/utvideoenc : add SIMD (avx) for sub_left_prediction asm code by Henrik Gramner	2018-01-28 20:23:11 +01:00
James Almer	6e80079a28	avcodec: increase AV_INPUT_BUFFER_PADDING_SIZE to 64 AVX-512 support has been introduced, and even if no functions currently use zmm registers (able to load as much as 64 bytes of consecutive data per instruction), they will be added eventually. Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2018-01-11 23:46:31 -03:00
James Almer	438f884fc4	x86/lossless_videodsp: rename ff_add_left_pred_int16_sse4 to ff_add_left_pred_int16_unaligned_ssse3 SSSE3_FAST is the proper check for it. Signed-off-by: James Almer <jamrial@gmail.com>	2017-12-10 00:51:01 -03:00
James Almer	a4fc63c0f9	x86/lossless_videodsp: don't overread the dst buffer in ff_add_left_pred_unaligned_avx2 Fixes valgrind Signed-off-by: James Almer <jamrial@gmail.com>	2017-12-10 00:38:05 -03:00
Martin Vignali	630967ef63	avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred	2017-12-09 15:19:03 +01:00
Martin Vignali	4353c35067	avcodec/x86/lossless_videodsp : add avx2 version for add_left_pred	2017-12-09 15:16:03 +01:00
Martin Vignali	cfbcea1cca	avcodec/x86/lossless_videodsp.asm : make macro for add_left_pred_unaligned in order to add avx2 version	2017-12-09 15:15:59 +01:00
Martin Vignali	be6d1f9632	avcodec/x86/bswapdsp : use macro for 128 bits constants loading in xmm or ymm	2017-12-02 18:25:25 +01:00
Mikulas Patocka	fbdd78fa3e	avcodec/fft: fix INTERL macro on 3dnow The commit b7c16a3f2c4921f613319938b8ee0e3d6fa83e8d ("x86: fft: Port to cpuflags") breaks the opus decoder in ffmpeg when compiling for 3dnow. The output is audible, but there's a lot of noise. The reason for the breakage is that the commit unintentionally changed the INTERL macro so that it is empty when compiling for 3dnow. This patch fixes it. Signed-off-by: Mikulas Patocka <mikulas@twibright.com> Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-25 13:11:45 -03:00
Martin Vignali	515555af6c	avcodec/x86/exrdsp : use ymm constant for pb_80 speed seems to be similar, but simplify code	2017-11-23 20:00:13 +01:00
James Almer	beb63baa69	x86/utvideodsp: reuse shared constants Remove the broadcast instructions as well now that they are wide enough. Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-21 10:57:14 -03:00
James Almer	ebf352116b	x86/constants: make pb_80 32 byte wide Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-21 10:57:03 -03:00
Martin Vignali	ba98f8463f	avcodec/huffyuvdspenc : add diff_int16 AVX2 func	2017-11-21 09:42:08 +01:00
Martin Vignali	d189a426fa	avcodec/huffyuvdspenc : reorganize diff_int16	2017-11-21 09:42:03 +01:00
Martin Vignali	e641c94190	avcodec/huffyuvdsp : add add_int16 AVX2 func	2017-11-21 09:41:58 +01:00
Martin Vignali	6955e8842e	avcodec/huffyuvdsp : reorganize add_int16 asm	2017-11-21 09:41:52 +01:00
Martin Vignali	7f9b67bcb6	avcodec/huffyuvdsp(enc) : move duplicate macro to a template file	2017-11-21 09:41:46 +01:00
Martin Vignali	caf51a573d	avcodec/x86/utvideodsp.asm : cosmetic better func separator and add comment for the restore rgb planes10 declaration	2017-11-21 09:00:47 +01:00
Martin Vignali	b5ebe38443	avcodec/utvideodsp : add avx2 version for the dsp	2017-11-21 09:00:42 +01:00
Martin Vignali	48b7c45b0c	avcodec/x86/utvideodsp : make macro for func	2017-11-21 09:00:38 +01:00
James Almer	aea0f06db7	x86/jpeg2000dsp: add ff_ict_float_{fma3,fma4} jpeg2000_ict_float_c: 2296.0 jpeg2000_ict_float_sse: 628.0 jpeg2000_ict_float_avx: 317.0 jpeg2000_ict_float_fma3: 262.0 Signed-off-by: James Almer <jamrial@gmail.com>	2017-11-20 18:33:58 -03:00
Michael Niedermayer	58cf31cee7	avcodec/x86/mpegvideodsp: Fix signedness bug in need_emu Fixes: out of array read Fixes: 3516/attachment-311488.dat Found-by: Insu Yun, Georgia Tech. Tested-by: wuninsu@gmail.com Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-11-14 04:54:31 +01:00
Thomas Köppe	43171a2a73	Fix missing used attribute for inline assembly variables Variables used in inline assembly need to be marked with attribute((used)). Static constants already were, via the define of DECLARE_ASM_CONST. But DECLARE_ALIGNED does not add this attribute, and some of the variables defined with it are const only used in inline assembly, and therefore appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks variables as used. This change makes FFMPEG work with Clang's ThinLTO. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-11-13 03:58:34 +01:00
Martin Vignali	0380b72d35	libavcodec/lossless_video_dsp : cosmetic add better separator for each function, in order to make reading of the asm file easier	2017-11-07 00:56:54 +01:00
Martin Vignali	da62128ea1	libavcodec/lossless_videodsp : add add_bytes avx2 version	2017-11-07 00:56:02 +01:00
James Almer	783535a4cd	x86/bswapdsp: add missing preprocessor wrappers for AVX2 functions Fixes build with old nasm/yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2017-10-29 22:21:51 -03:00
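
The deemphasis unrolling quoted in commit 605e330310 can be checked against the plain one-tap recurrence y[i] = x[i] + c1*y[i-1]. The following is a minimal sketch, not the FFmpeg implementation: the function names, the helper max_abs_diff, and the value 0.85f standing in for CELT_EMPH_COEFF are assumptions made for illustration, and the unrolled variant assumes len is a multiple of 4.

```c
#include <stddef.h>
#include <math.h>

/* Placeholder for illustration only; FFmpeg defines CELT_EMPH_COEFF itself. */
#define EMPH_COEFF 0.85f

/* Reference form: y[i] = x[i] + c1 * y[i-1]. Returns the final filter state. */
static float deemphasis_scalar(float *y, const float *x, float state, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        state = x[i] + EMPH_COEFF * state;
        y[i]  = state;
    }
    return state;
}

/* Unrolled by four, following the commit message: each output depends only
 * on the carried-in state and on inputs, not on the previous outputs of the
 * same group. Assumes len is a multiple of 4. */
static float deemphasis_unrolled(float *y, const float *x, float state, size_t len)
{
    const float c1 = EMPH_COEFF, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    for (size_t i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}

/* Largest absolute difference between two buffers of equal length. */
static float max_abs_diff(const float *a, const float *b, size_t len)
{
    float worst = 0.0f;
    for (size_t i = 0; i < len; i++) {
        float d = fabsf(a[i] - b[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Running both functions on the same input and comparing the outputs with max_abs_diff should give agreement up to floating-point rounding. Removing the dependency between the four outputs of a group is what allows them to be computed together in one SIMD register.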

2447 Commits