FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2023-02-01 04:23:55 +01:00
James Darnley	eef763c705	checkasm/v210dec: add extra space to the destination arrays	2022-12-21 00:36:49 +01:00
James Darnley	6af453ca38	avcodec/x86: add avx512icl function for v210dec Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2	2022-12-20 15:02:45 +01:00
James Darnley	cfd1c3c0a1	checkasm/v210enc: test the entire width of 10-bit planar input arrays	2022-12-01 18:19:03 +01:00
bwang30	3ab11dc5bb	libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI This commit enabled assembly code using Intel AVX512 VNNI and added a unit test for the sobel filter. sobel_c: 4537, sobel_avx512icl: 2136. Signed-off-by: bwang30 <bin.wang@intel.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-11-14 10:04:16 +08:00
Lynne	e0661fc805	dca_core: convert to lavu/tx Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the arm32 and aarch64 changes.	2022-11-06 14:39:36 +01:00
James Darnley	1936c06f02	checkasm: add a verbose check function for uint32_t data	2022-11-04 19:37:46 +01:00
Andreas Rheinhardt	37ee36f689	checkasm/idctdsp: Use declare_func_emms only when needed There is no MMX code for (add|put|put_signed)_pixels_clamped since commit `bfb28b5ce8`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	5102b98b7a	checkasm/llviddspenc: Use declare_func_emms only when needed There is no MMX code for diff_bytes since commit `230ea38de1`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	e814569c8d	checkasm/huffyuvdsp: Use declare_func_emms only when needed There is no MMX code for add_int16 since commit `4b6ffc2880`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	cd8a33bcce	checkasm/llviddsp: Be strict about MMX There is no MMX code for llviddsp after commit `fed07efcde`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	b4e2d67636	checkasm/pixblockdsp: Be strict about MMX There is no MMX code for pixblockdsp after commit `92b5800277`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	42921190cb	checkasm/audiodsp: Be strict about MMX There is no MMX code for audiodsp after commit `3d716d38ab`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	18afaa20f1	checkasm/blockdsp: Be strict about MMX There is no MMX code for blockdsp after commit `ee551a21dd`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	f224c195e0	checkasm/vc1dsp: Use declare_func_emms only when needed There is no MMX code for vc1_inv_trans_8x8 or vc1_unescape_buffer, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Rémi Denis-Courmont	c962c78901	checkasm: RISC-V 64-bit assembler test harness	2022-10-10 02:23:18 +02:00
Andreas Rheinhardt	bcfa427c8f	checkasm/vp8dsp: Use declare_func_emms only when needed There is no MMX code for loop filters since commit `6a551f1405`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-08 09:33:36 +02:00
Rémi Denis-Courmont	37d5ddc317	lavu/riscv: CPU flag for the Zbb extension Unfortunately, it is common, and will remain so, that the bit-manipulation extensions are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline, which does not include the B extension or even its Zbb subset. For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems impractical. But at least it can work for the byte-swap DSP functions.	2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont	0c0a3deb18	lavu/cpu: CPU flags for the RISC-V Vector extension RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Zve64f plus double precision floats. - 6 different vector lengths: - Zvl32b (embedded only), - Zvl64b (embedded only), - Zvl128b, - Zvl256b, - Zvl512b, - Zvl1024b, - and the V extension proper: equivalent to Zve64f and Zvl128b. In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type set: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64). When the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b95e2fbd85	lavu/cpu: detect RISC-V base extensions This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.	2022-09-27 13:19:52 +02:00
Lynne	ace42cf581	x86/tx_float: add 15xN PFA FFT AVX SIMD ~4x faster than the C version. The shuffles in the 15pt dim1 are seriously expensive. Not happy with it, but I'm content. Can be easily converted to pure AVX by removing all vpermpd/vpermps instructions.	2022-09-23 12:35:27 +02:00
Lynne	668f43af20	tests/checkasm/lpc: correct arithmetic when randomizing buffers Results weren't signed.	2022-09-23 01:50:59 +02:00
Lynne	6ad39f01df	tests/checkasm/lpc: reduce range and use signed values This is more similar to its regular use, and prevents inaccuracies of huge float*float multiplications from failing the tests.	2022-09-23 01:42:34 +02:00
James Almer	9cbfffa0d4	tests/checkasm/lpc: print mismatching values Will help debugging. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 18:18:52 -03:00
James Almer	a1c6f4b653	tests/checkasm/lpc: randomize buffer length Simplifies the test, while trying more values and preventing pointlessly running benchmarks in a loop. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 18:17:26 -03:00
James Almer	c8c4a162fc	avcodec/lpc: use ptrdiff_t for length parameters Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 18:17:26 -03:00
Lynne	b67776e12f	x86/lpc: fix even scalar loop overreads/writes Passes checkasm with valgrind, tested to sizes of more than 4000 samples.	2022-09-22 04:27:19 +02:00
Andreas Rheinhardt	9beba05311	avcodec/fmtconvert: Remove unused AVCodecContext parameter Unused since `d74a8cb7e4`. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-21 20:26:40 +02:00
Andreas Rheinhardt	fd72d8aea3	avcodec/blockdsp: Remove unused AVCodecContext parameter Possible since `be95df12bb`. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-21 20:24:40 +02:00
Lynne	3ade6a8644	x86/lpc: implement a new Welch windowing function Old one was written with the assumption only even inputs would be given. This very messy replacement supports even and odd inputs, and supports AVX2 for extra speed. The buffers given are usually quite big (4k samples), so the speedup is worth it. The new SSE version is still faster than the old inline asm version by 33%. Also checkasm is provided to make sure this monstrosity works. This fixes some FATE tests.	2022-09-21 07:12:39 +02:00
James Almer	8f119b501e	tests/checkasm: add a test for VorbisDSPContext Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-19 21:28:23 -03:00
Lynne	9a9647af33	checkasm/tx: add checkasm support for the iMDCT	2022-09-06 04:21:49 +02:00
Martin Storsjö	f921c58335	checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX This avoids triggering overflows in the filters, and avoids stray test failures in the approximate functions on x86; due to rounding differences, one implementation might overflow while another one doesn't. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-19 22:54:51 +03:00
Alan Kelly	da0a37bab7	checkasm/sw_scale: hscale does not require a cpuflag test. This is done in ff_shuffle_filter_coefficients. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
Alan Kelly	a38293e444	libswscale: Enable hscale_avx2 for all input sizes. ff_shuffle_filter_coefficients shuffles the tail as required. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
Martin Storsjö	d69d12a5b9	checkasm: motion: Test different h parameters Previously, the checkasm test always passed h=8, so no other cases were tested. Out of the me_cmp functions, in practice, some functions are hardcoded to always assume an 8x8 block (ignoring the h parameter), while others do use the parameter. For those with hardcoded height, both the reference C function and the assembly implementations ignore the parameter similarly. The documentation for the functions indicates that heights between w/2 and 2*w, within the range of 4 to 16, should be supported. This patch just tests random heights in that range, without knowing what width the current function actually uses. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-17 00:00:50 +03:00
Martin Storsjö	21c2c57ba5	checkasm: Provide enough alignment in the new yuv2plane1 test This fixes the checkasm test in some setups on x86. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 23:47:16 +03:00
J. Dekker	ea6ecb12aa	checkasm/hevc_add_res: add 12bit test Also fix the bug where in every other byte only the lower 2 bits were used in the 8bit test. Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-08-16 14:00:34 +02:00
Swinney, Jonathan	4dcd191a50	checkasm: updated tests for sw_scale Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differ slightly). Test with and without SWS_ACCURATE_RND - if this flag isn't set, the output must match the C reference exactly, otherwise it is allowed to be off by 2. Mark a couple of x86 functions as unavailable when SWS_ACCURATE_RND is set - apparently this discrepancy hasn't been noticed in other exact tests before. Add a test for yuv2plane1. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Martin Storsjö	5cdf4c0bed	checkasm: Silence warnings about unused return value from read() This codepath is enabled by default on arm, if the linux perf API is available, unless disabled with --disable-linux-perf. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-08 23:39:13 +03:00
Andreas Rheinhardt	6c4595190e	avcodec/flacdsp: Split encoder-only parts into a ctx of its own Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 03:28:45 +02:00
Andreas Rheinhardt	3a869cd5cd	avcodec/flacdsp: Remove unused function parameter Forgotten in `e609cfd697`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 03:28:45 +02:00
Martin Storsjö	237730f0e0	checkasm: motion: Make the benchmarks more stable Don't use the last random offset, but a static one. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-07-16 17:25:35 +03:00
Martin Storsjö	900424cda9	checkasm: Provide enough alignment in the new motion test This fixes the checkasm test in some setups on x86. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-06-28 18:09:08 +03:00
Swinney, Jonathan	c471cc7474	lavc/aarch64: motion estimation functions in neon - ff_pix_abs16_neon - ff_pix_abs16_xy2_neon In direct micro benchmarks of these ff functions versus their C implementations, these functions performed as follows on AWS Graviton 3. ff_pix_abs16_neon: pix_abs_0_0_c: 141.1 pix_abs_0_0_neon: 19.6 ff_pix_abs16_xy2_neon: pix_abs_0_3_c: 269.1 pix_abs_0_3_neon: 39.3 Tested with: ./tests/checkasm/checkasm --test=motion --bench --disable-linux-perf Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-06-28 00:51:39 +03:00
Michael Goulet	b7f6a933fa	tests/checkasm/sw_scale: Fix alignment for movdqa SSE3 instruction movdqa in ff_yuv2yuvX_sse3() expects a 16-byte-aligned memory operand, or else a segfault is generated. The src_pixels buffer below was not necessarily 16-byte aligned on the stack, so we got segfaults during fate-checkasm-sw_scale. Therefore, 16-byte align all of these local variables; aligning them more than needed shouldn't hurt.	2022-06-20 11:08:43 +02:00
Swinney, Jonathan	92ea8e03df	checkasm: added additional dstW tests for hscale Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-05-28 01:09:00 +03:00
J. Dekker	cc679054c7	checkasm: improve hevc_sao test The HEVC decoder can call these functions with smaller widths than the functions themselves are designed to operate on, so we should only check the relevant output. Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-05-25 08:04:58 +02:00
Andreas Rheinhardt	d496bbe105	avcodec/v210enc: Move ff_v210enc_init into a header This removes a dependency of checkasm on lavc/v210_enc.o and also allows inlining ff_v210enc_init() irrespective of interposition. This dependency pulled basically all of libavcodec into checkasm, in particular all codecs. This also makes checkasm work when using shared Windows builds: On Windows, it needs to be known to the compiler whether a data symbol is external to the library/executable or not; hence the need for av_export_avutil. checkasm needs access to the internals of the libraries it tests and is therefore linked statically to all the libraries. This means that the users of avpriv_cga_font and avpriv_vga16_font in libavcodec (namely ansi.o, bintext.o, tmv.o) end up in the same executable as the symbols, although they have been compiled as if these symbols were external, leading to linker errors. With this commit said files are discarded by the linker, bypassing this problem. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:33:38 +02:00
Andreas Rheinhardt	0c2489fe29	avcodec/v210_dec: Move ff_v210dec_init into a header This removes a dependency of checkasm on lavc/v210_dec.o and also allows inlining ff_v210dec_init() irrespective of interposition. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:50 +02:00
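
Commit 3ade6a8644 notes that the old x86 Welch windowing code assumed only even input lengths, while the replacement handles both even and odd lengths. As a hedged illustration only (this is not the code in FFmpeg's lpc.c), a scalar C reference that covers both parities might look like this. It assumes the common convention w[i] = 1 - ((i - c) / c)^2 with c = (len - 1) / 2; the function name welch_window_ref and the choice to output 0.0 for a length-1 input are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not FFmpeg's implementation):
 * multiply int32 samples by a Welch (parabolic) window
 *   w[i] = 1 - ((i - c) / c)^2,  c = (len - 1) / 2
 * and store the result as doubles. */
static void welch_window_ref(const int32_t *data, ptrdiff_t len, double *out)
{
    if (len <= 0)
        return;
    if (len == 1) {          /* c would be 0; this sketch chooses 0.0 */
        out[0] = 0.0;
        return;
    }

    double c = (len - 1) / 2.0;
    ptrdiff_t half = len / 2;

    /* The window is symmetric, so weight mirrored pairs together. */
    for (ptrdiff_t i = 0; i < half; i++) {
        double x = (i - c) / c;
        double w = 1.0 - x * x;
        out[i]           = data[i] * w;
        out[len - 1 - i] = data[len - 1 - i] * w;
    }

    /* Odd length: the centre sample sits at i == c, so its weight is 1. */
    if (len & 1)
        out[half] = data[half];
}
```

Processing mirrored pairs keeps a single loop for both parities; the only parity-specific step is the centre sample of an odd-length buffer.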


391 Commits