FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00

Author	SHA1	Message	Date
bwang30	3ab11dc5bb	libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI This commit enables assembly code with Intel AVX512 VNNI and adds a unit test for the sobel filter. sobel_c: 4537 sobel_avx512icl: 2136 Signed-off-by: bwang30 <bin.wang@intel.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-11-14 10:04:16 +08:00
Lynne	e0661fc805	dca_core: convert to lavu/tx Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the arm32 and aarch64 changes.	2022-11-06 14:39:36 +01:00
James Darnley	1936c06f02	checkasm: add a verbose check function for uint32_t data	2022-11-04 19:37:46 +01:00
Andreas Rheinhardt	37ee36f689	checkasm/idctdsp: Use declare_func_emms only when needed There is no MMX code for (add\|put\|put_signed)_pixels_clamped since commit `bfb28b5ce8`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	5102b98b7a	checkasm/llviddspenc: Use declare_func_emms only when needed There is no MMX code for diff_bytes since commit `230ea38de1`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	e814569c8d	checkasm/huffyuvdsp: Use declare_func_emms only when needed There is no MMX code for add_int16 since commit `4b6ffc2880`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	cd8a33bcce	checkasm/llviddsp: Be strict about MMX There is no MMX code for llviddsp after commit `fed07efcde`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	b4e2d67636	checkasm/pixblockdsp: Be strict about MMX There is no MMX code for pixblockdsp after commit `92b5800277`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	42921190cb	checkasm/audiodsp: Be strict about MMX There is no MMX code for audiodsp after commit `3d716d38ab`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	18afaa20f1	checkasm/blockdsp: Be strict about MMX There is no MMX code for blockdsp after commit `ee551a21dd`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Andreas Rheinhardt	f224c195e0	checkasm/vc1dsp: Use declare_func_emms only when needed There is no MMX code for vc1_inv_trans_8x8 or vc1_unescape_buffer, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-11 14:18:54 +02:00
Rémi Denis-Courmont	c962c78901	checkasm: RISC-V 64-bit assembler test harness	2022-10-10 02:23:18 +02:00
Andreas Rheinhardt	bcfa427c8f	checkasm/vp8dsp: Use declare_func_emms only when needed There is no MMX code for loop filters since commit `6a551f1405`, so use declare_func instead of declare_func_emms() to also test that we are not in MMX mode after return. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-08 09:33:36 +02:00
Rémi Denis-Courmont	37d5ddc317	lavu/riscv: CPU flag for the Zbb extension Unfortunately, it is common, and will remain so, that the Bit manipulations are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline, which does not include the B extension or even its Zbb subset. For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems impractical. But at least it can work for the byte-swap DSP functions.	2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont	0c0a3deb18	lavu/cpu: CPU flags for the RISC-V Vector extension RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Zve64f plus double precision floats. - 6 different vector lengths: - Zvl32b (embedded only), - Zvl64b (embedded only), - Zvl128b, - Zvl256b, - Zvl512b, - Zvl1024b, - and the V extension proper: equivalent to Zve64f and Zvl128b. In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type set: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64). When the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b95e2fbd85	lavu/cpu: detect RISC-V base extensions This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.	2022-09-27 13:19:52 +02:00
Lynne	ace42cf581	x86/tx_float: add 15xN PFA FFT AVX SIMD ~4x faster than the C version. The shuffles in the 15pt dim1 are seriously expensive. Not happy with it, but I'm content. Can be easily converted to pure AVX by removing all vpermpd/vpermps instructions.	2022-09-23 12:35:27 +02:00
Lynne	668f43af20	tests/checkasm/lpc: correct arithmetic when randomizing buffers Results weren't signed.	2022-09-23 01:50:59 +02:00
Lynne	6ad39f01df	tests/checkasm/lpc: reduce range and use signed values This is more similar to its regular use, and prevents inaccuracies of huge float*float multiplications from failing the tests.	2022-09-23 01:42:34 +02:00
James Almer	9cbfffa0d4	tests/checkasm/lpc: print mismatching values Will help debugging. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 18:18:52 -03:00
James Almer	a1c6f4b653	tests/checkasm/lpc: randomize buffer length Simplifies the test, while trying more values and preventing pointlessly running benchmarks in a loop. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 18:17:26 -03:00
James Almer	c8c4a162fc	avcodec/lpc: use ptrdiff_t for length parameters Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 18:17:26 -03:00
Lynne	b67776e12f	x86/lpc: fix even scalar loop overreads/writes Passes checkasm with valgrind, tested to sizes of more than 4000 samples.	2022-09-22 04:27:19 +02:00
Andreas Rheinhardt	9beba05311	avcodec/fmtconvert: Remove unused AVCodecContext parameter Unused since `d74a8cb7e4`. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-21 20:26:40 +02:00
Andreas Rheinhardt	fd72d8aea3	avcodec/blockdsp: Remove unused AVCodecContext parameter Possible since `be95df12bb`. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-21 20:24:40 +02:00
Lynne	3ade6a8644	x86/lpc: implement a new Welch windowing function Old one was written with the assumption only even inputs would be given. This very messy replacement supports even and odd inputs, and supports AVX2 for extra speed. The buffers given are usually quite big (4k samples), so the speedup is worth it. The new SSE version is still faster than the old inline asm version by 33%. Also checkasm is provided to make sure this monstrosity works. This fixes some FATE tests.	2022-09-21 07:12:39 +02:00
James Almer	8f119b501e	tests/checkasm: add a test for VorbisDSPContext Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-19 21:28:23 -03:00
Lynne	9a9647af33	checkasm/tx: add checkasm support for the iMDCT	2022-09-06 04:21:49 +02:00
Martin Storsjö	f921c58335	checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX This avoids triggering overflows in the filters, and avoids stray test failures in the approximate functions on x86; due to rounding differences, one implementation might overflow while another one doesn't. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-19 22:54:51 +03:00
Alan Kelly	da0a37bab7	checkasm/sw_scale: hscale does not require cpuflag test. This is done in ff_shuffle_filter_coefficients. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
Alan Kelly	a38293e444	libswscale: Enable hscale_avx2 for all input sizes. ff_shuffle_filter_coefficients shuffles the tail as required. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
Martin Storsjö	d69d12a5b9	checkasm: motion: Test different h parameters Previously, the checkasm test always passed h=8, so no other cases were tested. Out of the me_cmp functions, in practice, some functions are hardcoded to always assume an 8x8 block (ignoring the h parameter), while others do use the parameter. For those with hardcoded height, both the reference C function and the assembly implementations ignore the parameter similarly. The documentation for the functions indicates that heights between w/2 and 2*w, within the range of 4 to 16, should be supported. This patch just tests random heights in that range, without knowing what width the current function actually uses. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-17 00:00:50 +03:00
Martin Storsjö	21c2c57ba5	checkasm: Provide enough alignment in the new yuv2plane1 test This fixes the checkasm test in some setups on x86. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 23:47:16 +03:00
J. Dekker	ea6ecb12aa	checkasm/hevc_add_res: add 12bit test Also fix the bug where in every other byte only the lower 2 bits were used in the 8bit test. Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-08-16 14:00:34 +02:00
Swinney, Jonathan	4dcd191a50	checkasm: updated tests for sw_scale Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differs slightly). Test with and without SWS_ACCURATE_RND - if this flag isn't set, the output must match the C reference exactly, otherwise it is allowed to be off by 2. Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND is set - apparently this discrepancy hasn't been noticed in other exact tests before. Add a test for yuv2plane1. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Martin Storsjö	5cdf4c0bed	checkasm: Silence warnings about unused return value from read() This codepath is enabled by default on arm, if the linux perf API is available, unless disabled with --disable-linux-perf. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-08 23:39:13 +03:00
Andreas Rheinhardt	6c4595190e	avcodec/flacdsp: Split encoder-only parts into a ctx of its own Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 03:28:45 +02:00
Andreas Rheinhardt	3a869cd5cd	avcodec/flacdsp: Remove unused function parameter Forgotten in `e609cfd697`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 03:28:45 +02:00
Martin Storsjö	237730f0e0	checkasm: motion: Make the benchmarks more stable Don't use the last random offset, but a static one. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-07-16 17:25:35 +03:00
Martin Storsjö	900424cda9	checkasm: Provide enough alignment in the new motion test This fixes the checkasm test in some setups on x86. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-06-28 18:09:08 +03:00
Swinney, Jonathan	c471cc7474	lavc/aarch64: motion estimation functions in neon - ff_pix_abs16_neon - ff_pix_abs16_xy2_neon In direct micro benchmarks of these ff functions versus their C implementations, these functions performed as follows on AWS Graviton 3. ff_pix_abs16_neon: pix_abs_0_0_c: 141.1 pix_abs_0_0_neon: 19.6 ff_pix_abs16_xy2_neon: pix_abs_0_3_c: 269.1 pix_abs_0_3_neon: 39.3 Tested with: ./tests/checkasm/checkasm --test=motion --bench --disable-linux-perf Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-06-28 00:51:39 +03:00
Michael Goulet	b7f6a933fa	tests/checkasm/sw_scale: Fix alignment for movdqa SSE3 instruction movdqa in ff_yuv2yuvX_sse3() expects a 16-byte aligned address for a memory address, or else a segfault is generated. The src_pixels buffer below was not aligned to 16 bytes on the stack necessarily, so we got segfaults during fate-checkasm-sw_scale. Therefore 16-byte align all of these local variables, aligning them too much shouldn't hurt.	2022-06-20 11:08:43 +02:00
Swinney, Jonathan	92ea8e03df	checkasm: added additional dstW tests for hscale Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-05-28 01:09:00 +03:00
J. Dekker	cc679054c7	checkasm: improve hevc_sao test The HEVC decoder can call these functions with smaller widths than the functions themselves are designed to operate on so we should only check the relevant output Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-05-25 08:04:58 +02:00
Andreas Rheinhardt	d496bbe105	avcodec/v210enc: Move ff_v210enc_init into a header This removes a dependency of checkasm on lavc/v210_enc.o and also allows to inline ff_v210enc_init() irrespectively of interposing. This dependency pulled basically all of libavcodec into checkasm, in particular all codecs. This also makes checkasm work when using shared Windows builds: On Windows, it needs to be known to the compiler whether a data symbol is external to the library/executable or not; hence the need for av_export_avutil. checkasm needs access to the internals of the libraries it tests and is therefore linked statically to all the libraries. This means that the users of avpriv_cga_font and avpriv_vga16_font in libavcodec (namely ansi.o, bintext.o, tmv.o) end up in the same executable as the symbols, although they have been compiled as if these symbols were external, leading to linker errors. With this commit said files are discarded by the linker, bypassing this problem. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:33:38 +02:00
Andreas Rheinhardt	0c2489fe29	avcodec/v210_dec: Move ff_v210dec_init into a header This removes a dependency of checkasm on lavc/v210_dec.o and also allows to inline ff_v210dec_init() irrespectively of interposing. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:50 +02:00
Andreas Rheinhardt	11e37539ee	avfilter/vf_threshold: Move ff_threshold_init into a header This removes a dependency of checkasm on lavfi/vf_threshold.o and also allows to inline ff_threshold_init() irrespectively of interposing. With this patch checkasm no longer pulls all of lavfi and lavf in. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:50 +02:00
Andreas Rheinhardt	c499f9bc38	avfilter/vf_nlmeans: Move ff_nlmeans_init into a header This removes a dependency of checkasm on lavfi/vf_nlmeans.o and also allows to inline ff_nlmeans_init() irrespectively of interposing. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:50 +02:00
Andreas Rheinhardt	fbe4e825d8	avfilter/vf_hflip: Move ff_hflip_init into a header This removes a dependency of checkasm on lavfi/vf_hflip.o and also allows to inline ff_hflip_init() irrespectively of interposing. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:50 +02:00
Andreas Rheinhardt	24936a9fbb	avfilter/vf_gblur: Move ff_gblur_init into a header This removes a dependency of checkasm on lavfi/vf_gblur.o and also allows to inline ff_gblur_init() irrespectively of interposing. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:49 +02:00
Andreas Rheinhardt	364fab1fdc	avfilter/vf_blend: Move ff_blend_init into a header This removes a dependency of checkasm on lavfi/vf_blend.o and also allows to inline ff_blend_init() irrespectively of interposing. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:49 +02:00
Andreas Rheinhardt	0df18f29ae	avfilter/af_afir: Only keep DSP stuff in header Only the AudioFIRDSPContext and the functions for its initialization are needed outside of lavfi/af_afir.c. Also rename the header to af_afirdsp.h to reflect the change. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:49 +02:00
Ben Avison	2e26847780	avcodec/vc1: Introduce fast path for unescaping bitstream buffer Includes a checkasm test. Signed-off-by: Ben Avison <bavison@riscosopen.org> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-01 10:03:33 +03:00
Ben Avison	bd3615a81a	checkasm: Add idctdsp add/put-pixels-clamped tests Signed-off-by: Ben Avison <bavison@riscosopen.org> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-01 10:03:33 +03:00
Ben Avison	2698bfdc93	checkasm: Add vc1dsp inverse transform tests This test deliberately doesn't exercise the full range of inputs described in the committee draft VC-1 standard. It says: input coefficients in frequency domain, D, satisfy -2048 <= D < 2047 intermediate coefficients, E, satisfy -4096 <= E < 4095 fully inverse-transformed coefficients, R, satisfy -512 <= R < 511 For one thing, the inequalities look odd. Did they mean them to go the other way round? That would make more sense because the equations generally both add and subtract coefficients multiplied by constants, including powers of 2. Requiring the most-negative values to be valid extends the number of bits to represent the intermediate values just for the sake of that one case! For another thing, the extreme values don't look to occur in real streams - both in my experience and supported by the following comment in the AArch32 decoder: tNhalf is half of the value of tN (as described in vc1_inv_trans_8x8_c). This is done because sometimes files have input that causes tN + tM to overflow. To avoid this overflow, we compute tNhalf, then compute tNhalf + tM (which doesn't overflow), and then we use vhadd to compute (tNhalf + (tNhalf + tM)) >> 1 which does not overflow because it is one instruction. My AArch64 decoder goes further than this. It calculates tNhalf and tM then does an SRA (essentially a fused halve and add) to compute (tN + tM) >> 1 without ever having to hold (tNhalf + tM) in a 16-bit element without overflowing. It only encounters difficulties if either tNhalf or tM overflow in isolation. I haven't had sight of the final standard, so it's possible that these issues were dealt with during finalisation, which could explain the lack of usage of extreme inputs in real streams. Or a preponderance of decoders that only support 16-bit intermediate values in their inverse transforms might have caused encoders to steer clear of such cases. 
I have effectively followed this approach in the test, and limited the scale of the coefficients sufficiently that the existing AArch32 decoder and my new AArch64 decoder both pass. Signed-off-by: Ben Avison <bavison@riscosopen.org> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-01 10:03:33 +03:00
Ben Avison	20cb43ea8b	checkasm: Add vc1dsp in-loop deblocking filter tests Note that the benchmarking results for these functions are highly dependent upon the input data. Therefore, each function is benchmarked twice, corresponding to the best and worst case complexity of the reference C implementation. The performance of a real stream decode will fall somewhere between these two extremes. Signed-off-by: Ben Avison <bavison@riscosopen.org> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-01 10:03:33 +03:00
Martin Storsjö	a78f136f3f	configure: Use a separate config_components.h header for $ALL_COMPONENTS This avoids unnecessary rebuilds of most source files if only the list of enabled components has changed, but not the other properties of the build, set in config.h. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-16 14:12:49 +02:00
Wu Jianhua	f629ea2e18	avutil/cpu: add AVX512 Icelake flag Signed-off-by: Wu Jianhua <jianhua.wu@intel.com> Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-03-10 16:45:48 -03:00
Anton Khirnov	d552f2535b	lavc/h264: move some shared code from h264dec to h264_parse	2022-01-26 15:23:30 +01:00
Mark Reid	52f7026164	swscale/x86/input.asm: add x86-optimized planar rgb2yuv functions sse2 only operates on 2 lanes per loop for to_y and to_uv functions, due to the lack of pmulld instruction. Emulating pmulld with 2 pmuludq and shuffles proved too costly and made to_uv functions slower than the C implementation. For to_y on sse2 only float functions are generated; I was not able to outperform the C implementation on the integer pixel formats. For to_a on sse4 only the float functions are generated. sse2 and sse4 generated nearly identical performing code on integer pixel formats, so only sse2/avx2 versions are generated. planar_gbrp_to_y_512_c: 1197.5 planar_gbrp_to_y_512_sse4: 444.5 planar_gbrp_to_y_512_avx2: 287.5 planar_gbrap_to_y_512_c: 1204.5 planar_gbrap_to_y_512_sse4: 447.5 planar_gbrap_to_y_512_avx2: 289.5 planar_gbrp9be_to_y_512_c: 1380.0 planar_gbrp9be_to_y_512_sse4: 543.5 planar_gbrp9be_to_y_512_avx2: 340.0 planar_gbrp9le_to_y_512_c: 1200.5 planar_gbrp9le_to_y_512_sse4: 442.0 planar_gbrp9le_to_y_512_avx2: 282.0 planar_gbrp10be_to_y_512_c: 1378.5 planar_gbrp10be_to_y_512_sse4: 544.0 planar_gbrp10be_to_y_512_avx2: 337.5 planar_gbrp10le_to_y_512_c: 1200.0 planar_gbrp10le_to_y_512_sse4: 448.0 planar_gbrp10le_to_y_512_avx2: 285.5 planar_gbrap10be_to_y_512_c: 1380.0 planar_gbrap10be_to_y_512_sse4: 542.0 planar_gbrap10be_to_y_512_avx2: 340.5 planar_gbrap10le_to_y_512_c: 1199.0 planar_gbrap10le_to_y_512_sse4: 446.0 planar_gbrap10le_to_y_512_avx2: 289.5 planar_gbrp12be_to_y_512_c: 10563.0 planar_gbrp12be_to_y_512_sse4: 542.5 planar_gbrp12be_to_y_512_avx2: 339.0 planar_gbrp12le_to_y_512_c: 1201.0 planar_gbrp12le_to_y_512_sse4: 440.5 planar_gbrp12le_to_y_512_avx2: 286.0 planar_gbrap12be_to_y_512_c: 1701.5 planar_gbrap12be_to_y_512_sse4: 917.0 planar_gbrap12be_to_y_512_avx2: 338.5 planar_gbrap12le_to_y_512_c: 1201.0 planar_gbrap12le_to_y_512_sse4: 444.5 planar_gbrap12le_to_y_512_avx2: 288.0 planar_gbrp14be_to_y_512_c: 1370.5 planar_gbrp14be_to_y_512_sse4: 545.0 
planar_gbrp14be_to_y_512_avx2: 338.5 planar_gbrp14le_to_y_512_c: 1199.0 planar_gbrp14le_to_y_512_sse4: 444.0 planar_gbrp14le_to_y_512_avx2: 279.5 planar_gbrp16be_to_y_512_c: 1364.0 planar_gbrp16be_to_y_512_sse4: 544.5 planar_gbrp16be_to_y_512_avx2: 339.5 planar_gbrp16le_to_y_512_c: 1201.0 planar_gbrp16le_to_y_512_sse4: 445.5 planar_gbrp16le_to_y_512_avx2: 280.5 planar_gbrap16be_to_y_512_c: 1377.0 planar_gbrap16be_to_y_512_sse4: 545.0 planar_gbrap16be_to_y_512_avx2: 338.5 planar_gbrap16le_to_y_512_c: 1201.0 planar_gbrap16le_to_y_512_sse4: 442.0 planar_gbrap16le_to_y_512_avx2: 279.0 planar_gbrpf32be_to_y_512_c: 4113.0 planar_gbrpf32be_to_y_512_sse2: 2438.0 planar_gbrpf32be_to_y_512_sse4: 1068.0 planar_gbrpf32be_to_y_512_avx2: 904.5 planar_gbrpf32le_to_y_512_c: 3818.5 planar_gbrpf32le_to_y_512_sse2: 2024.5 planar_gbrpf32le_to_y_512_sse4: 1241.5 planar_gbrpf32le_to_y_512_avx2: 657.0 planar_gbrapf32be_to_y_512_c: 3707.0 planar_gbrapf32be_to_y_512_sse2: 2444.0 planar_gbrapf32be_to_y_512_sse4: 1077.0 planar_gbrapf32be_to_y_512_avx2: 909.0 planar_gbrapf32le_to_y_512_c: 3822.0 planar_gbrapf32le_to_y_512_sse2: 2024.5 planar_gbrapf32le_to_y_512_sse4: 1176.0 planar_gbrapf32le_to_y_512_avx2: 658.5 planar_gbrp_to_uv_512_c: 2325.8 planar_gbrp_to_uv_512_sse2: 1726.8 planar_gbrp_to_uv_512_sse4: 771.8 planar_gbrp_to_uv_512_avx2: 506.8 planar_gbrap_to_uv_512_c: 2281.8 planar_gbrap_to_uv_512_sse2: 1726.3 planar_gbrap_to_uv_512_sse4: 768.3 planar_gbrap_to_uv_512_avx2: 496.3 planar_gbrp9be_to_uv_512_c: 2336.8 planar_gbrp9be_to_uv_512_sse2: 1924.8 planar_gbrp9be_to_uv_512_sse4: 852.3 planar_gbrp9be_to_uv_512_avx2: 552.8 planar_gbrp9le_to_uv_512_c: 2270.3 planar_gbrp9le_to_uv_512_sse2: 1512.3 planar_gbrp9le_to_uv_512_sse4: 764.3 planar_gbrp9le_to_uv_512_avx2: 491.3 planar_gbrp10be_to_uv_512_c: 2281.8 planar_gbrp10be_to_uv_512_sse2: 1917.8 planar_gbrp10be_to_uv_512_sse4: 855.3 planar_gbrp10be_to_uv_512_avx2: 541.3 planar_gbrp10le_to_uv_512_c: 2269.8 planar_gbrp10le_to_uv_512_sse2: 1515.3 
planar_gbrp10le_to_uv_512_sse4: 759.8 planar_gbrp10le_to_uv_512_avx2: 487.8 planar_gbrap10be_to_uv_512_c: 2382.3 planar_gbrap10be_to_uv_512_sse2: 1924.8 planar_gbrap10be_to_uv_512_sse4: 855.3 planar_gbrap10be_to_uv_512_avx2: 540.8 planar_gbrap10le_to_uv_512_c: 2382.3 planar_gbrap10le_to_uv_512_sse2: 1512.3 planar_gbrap10le_to_uv_512_sse4: 759.3 planar_gbrap10le_to_uv_512_avx2: 484.8 planar_gbrp12be_to_uv_512_c: 2283.8 planar_gbrp12be_to_uv_512_sse2: 1936.8 planar_gbrp12be_to_uv_512_sse4: 858.3 planar_gbrp12be_to_uv_512_avx2: 541.3 planar_gbrp12le_to_uv_512_c: 2278.8 planar_gbrp12le_to_uv_512_sse2: 1507.3 planar_gbrp12le_to_uv_512_sse4: 760.3 planar_gbrp12le_to_uv_512_avx2: 485.8 planar_gbrap12be_to_uv_512_c: 2385.3 planar_gbrap12be_to_uv_512_sse2: 1927.8 planar_gbrap12be_to_uv_512_sse4: 855.3 planar_gbrap12be_to_uv_512_avx2: 539.8 planar_gbrap12le_to_uv_512_c: 2377.3 planar_gbrap12le_to_uv_512_sse2: 1516.3 planar_gbrap12le_to_uv_512_sse4: 759.3 planar_gbrap12le_to_uv_512_avx2: 484.8 planar_gbrp14be_to_uv_512_c: 2283.8 planar_gbrp14be_to_uv_512_sse2: 1935.3 planar_gbrp14be_to_uv_512_sse4: 852.3 planar_gbrp14be_to_uv_512_avx2: 540.3 planar_gbrp14le_to_uv_512_c: 2276.8 planar_gbrp14le_to_uv_512_sse2: 1514.8 planar_gbrp14le_to_uv_512_sse4: 762.3 planar_gbrp14le_to_uv_512_avx2: 484.8 planar_gbrp16be_to_uv_512_c: 2383.3 planar_gbrp16be_to_uv_512_sse2: 1881.8 planar_gbrp16be_to_uv_512_sse4: 852.3 planar_gbrp16be_to_uv_512_avx2: 541.8 planar_gbrp16le_to_uv_512_c: 2378.3 planar_gbrp16le_to_uv_512_sse2: 1476.8 planar_gbrp16le_to_uv_512_sse4: 765.3 planar_gbrp16le_to_uv_512_avx2: 485.8 planar_gbrap16be_to_uv_512_c: 2382.3 planar_gbrap16be_to_uv_512_sse2: 1886.3 planar_gbrap16be_to_uv_512_sse4: 853.8 planar_gbrap16be_to_uv_512_avx2: 550.8 planar_gbrap16le_to_uv_512_c: 2381.8 planar_gbrap16le_to_uv_512_sse2: 1488.3 planar_gbrap16le_to_uv_512_sse4: 765.3 planar_gbrap16le_to_uv_512_avx2: 491.8 planar_gbrpf32be_to_uv_512_c: 4863.0 planar_gbrpf32be_to_uv_512_sse2: 3347.5 
planar_gbrpf32be_to_uv_512_sse4: 1800.0 planar_gbrpf32be_to_uv_512_avx2: 1199.0 planar_gbrpf32le_to_uv_512_c: 4725.0 planar_gbrpf32le_to_uv_512_sse2: 2753.0 planar_gbrpf32le_to_uv_512_sse4: 1474.5 planar_gbrpf32le_to_uv_512_avx2: 927.5 planar_gbrapf32be_to_uv_512_c: 4859.0 planar_gbrapf32be_to_uv_512_sse2: 3269.0 planar_gbrapf32be_to_uv_512_sse4: 1802.0 planar_gbrapf32be_to_uv_512_avx2: 1201.5 planar_gbrapf32le_to_uv_512_c: 6338.0 planar_gbrapf32le_to_uv_512_sse2: 2756.5 planar_gbrapf32le_to_uv_512_sse4: 1476.0 planar_gbrapf32le_to_uv_512_avx2: 908.5 planar_gbrap_to_a_512_c: 383.3 planar_gbrap_to_a_512_sse2: 66.8 planar_gbrap_to_a_512_avx2: 43.8 planar_gbrap10be_to_a_512_c: 601.8 planar_gbrap10be_to_a_512_sse2: 86.3 planar_gbrap10be_to_a_512_avx2: 34.8 planar_gbrap10le_to_a_512_c: 602.3 planar_gbrap10le_to_a_512_sse2: 48.8 planar_gbrap10le_to_a_512_avx2: 31.3 planar_gbrap12be_to_a_512_c: 601.8 planar_gbrap12be_to_a_512_sse2: 111.8 planar_gbrap12be_to_a_512_avx2: 41.3 planar_gbrap12le_to_a_512_c: 385.8 planar_gbrap12le_to_a_512_sse2: 75.3 planar_gbrap12le_to_a_512_avx2: 39.8 planar_gbrap16be_to_a_512_c: 386.8 planar_gbrap16be_to_a_512_sse2: 79.8 planar_gbrap16be_to_a_512_avx2: 31.3 planar_gbrap16le_to_a_512_c: 600.3 planar_gbrap16le_to_a_512_sse2: 40.3 planar_gbrap16le_to_a_512_avx2: 30.3 planar_gbrapf32be_to_a_512_c: 1148.8 planar_gbrapf32be_to_a_512_sse2: 611.3 planar_gbrapf32be_to_a_512_sse4: 234.8 planar_gbrapf32be_to_a_512_avx2: 183.3 planar_gbrapf32le_to_a_512_c: 851.3 planar_gbrapf32le_to_a_512_sse2: 263.3 planar_gbrapf32le_to_a_512_sse4: 199.3 planar_gbrapf32le_to_a_512_avx2: 156.8 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-01-11 16:34:33 -03:00
Mark Reid	9e445a5be2	swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions changes since v2: * fixed label changes since v1: * remove vex instruction on sse4 path * some load/pack macros use fewer instructions * fixed some typos yuv2gbrp_full_X_4_512_c: 12757.6 yuv2gbrp_full_X_4_512_sse2: 8946.6 yuv2gbrp_full_X_4_512_sse4: 5138.6 yuv2gbrp_full_X_4_512_avx2: 3889.6 yuv2gbrap_full_X_4_512_c: 15368.6 yuv2gbrap_full_X_4_512_sse2: 11916.1 yuv2gbrap_full_X_4_512_sse4: 6294.6 yuv2gbrap_full_X_4_512_avx2: 3477.1 yuv2gbrp9be_full_X_4_512_c: 14381.6 yuv2gbrp9be_full_X_4_512_sse2: 9139.1 yuv2gbrp9be_full_X_4_512_sse4: 5150.1 yuv2gbrp9be_full_X_4_512_avx2: 2834.6 yuv2gbrp9le_full_X_4_512_c: 12990.1 yuv2gbrp9le_full_X_4_512_sse2: 9118.1 yuv2gbrp9le_full_X_4_512_sse4: 5132.1 yuv2gbrp9le_full_X_4_512_avx2: 2833.1 yuv2gbrp10be_full_X_4_512_c: 14401.6 yuv2gbrp10be_full_X_4_512_sse2: 9133.1 yuv2gbrp10be_full_X_4_512_sse4: 5126.1 yuv2gbrp10be_full_X_4_512_avx2: 2837.6 yuv2gbrp10le_full_X_4_512_c: 12718.1 yuv2gbrp10le_full_X_4_512_sse2: 9106.1 yuv2gbrp10le_full_X_4_512_sse4: 5120.1 yuv2gbrp10le_full_X_4_512_avx2: 2826.1 yuv2gbrap10be_full_X_4_512_c: 18535.6 yuv2gbrap10be_full_X_4_512_sse2: 33617.6 yuv2gbrap10be_full_X_4_512_sse4: 6264.1 yuv2gbrap10be_full_X_4_512_avx2: 3422.1 yuv2gbrap10le_full_X_4_512_c: 16724.1 yuv2gbrap10le_full_X_4_512_sse2: 11787.1 yuv2gbrap10le_full_X_4_512_sse4: 6282.1 yuv2gbrap10le_full_X_4_512_avx2: 3441.6 yuv2gbrp12be_full_X_4_512_c: 13723.6 yuv2gbrp12be_full_X_4_512_sse2: 9128.1 yuv2gbrp12be_full_X_4_512_sse4: 7997.6 yuv2gbrp12be_full_X_4_512_avx2: 2844.1 yuv2gbrp12le_full_X_4_512_c: 12257.1 yuv2gbrp12le_full_X_4_512_sse2: 9107.6 yuv2gbrp12le_full_X_4_512_sse4: 5142.6 yuv2gbrp12le_full_X_4_512_avx2: 2837.6 yuv2gbrap12be_full_X_4_512_c: 18511.1 yuv2gbrap12be_full_X_4_512_sse2: 12156.6 yuv2gbrap12be_full_X_4_512_sse4: 6251.1 yuv2gbrap12be_full_X_4_512_avx2: 3444.6 yuv2gbrap12le_full_X_4_512_c: 16687.1 yuv2gbrap12le_full_X_4_512_sse2: 11785.1 
yuv2gbrap12le_full_X_4_512_sse4: 6243.6 yuv2gbrap12le_full_X_4_512_avx2: 3446.1 yuv2gbrp14be_full_X_4_512_c: 13690.6 yuv2gbrp14be_full_X_4_512_sse2: 9120.6 yuv2gbrp14be_full_X_4_512_sse4: 5138.1 yuv2gbrp14be_full_X_4_512_avx2: 2843.1 yuv2gbrp14le_full_X_4_512_c: 14995.6 yuv2gbrp14le_full_X_4_512_sse2: 9119.1 yuv2gbrp14le_full_X_4_512_sse4: 5126.1 yuv2gbrp14le_full_X_4_512_avx2: 2843.1 yuv2gbrp16be_full_X_4_512_c: 12367.1 yuv2gbrp16be_full_X_4_512_sse2: 8233.6 yuv2gbrp16be_full_X_4_512_sse4: 4820.1 yuv2gbrp16be_full_X_4_512_avx2: 2666.6 yuv2gbrp16le_full_X_4_512_c: 10904.1 yuv2gbrp16le_full_X_4_512_sse2: 8214.1 yuv2gbrp16le_full_X_4_512_sse4: 4824.1 yuv2gbrp16le_full_X_4_512_avx2: 2629.1 yuv2gbrap16be_full_X_4_512_c: 26569.6 yuv2gbrap16be_full_X_4_512_sse2: 10884.1 yuv2gbrap16be_full_X_4_512_sse4: 5488.1 yuv2gbrap16be_full_X_4_512_avx2: 3272.1 yuv2gbrap16le_full_X_4_512_c: 14010.1 yuv2gbrap16le_full_X_4_512_sse2: 10562.1 yuv2gbrap16le_full_X_4_512_sse4: 5463.6 yuv2gbrap16le_full_X_4_512_avx2: 3255.1 yuv2gbrpf32be_full_X_4_512_c: 14524.1 yuv2gbrpf32be_full_X_4_512_sse2: 8552.6 yuv2gbrpf32be_full_X_4_512_sse4: 4636.1 yuv2gbrpf32be_full_X_4_512_avx2: 2474.6 yuv2gbrpf32le_full_X_4_512_c: 13060.6 yuv2gbrpf32le_full_X_4_512_sse2: 9682.6 yuv2gbrpf32le_full_X_4_512_sse4: 4298.1 yuv2gbrpf32le_full_X_4_512_avx2: 2453.1 yuv2gbrapf32be_full_X_4_512_c: 18629.6 yuv2gbrapf32be_full_X_4_512_sse2: 11363.1 yuv2gbrapf32be_full_X_4_512_sse4: 15201.6 yuv2gbrapf32be_full_X_4_512_avx2: 3727.1 yuv2gbrapf32le_full_X_4_512_c: 16677.6 yuv2gbrapf32le_full_X_4_512_sse2: 10221.6 yuv2gbrapf32le_full_X_4_512_sse4: 5693.6 yuv2gbrapf32le_full_X_4_512_avx2: 3656.6 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-01-11 16:33:17 -03:00
Alan Kelly	eebe406c80	libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions. This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are only used where they are faster.	2021-12-21 17:44:53 -03:00
Henrik Gramner	15cfb4eee3	checkasm: Use the correct AVTXContext in av_tx tests Keep a reference to the correct associated context of the reference function and use that context when calling the reference function.	2021-12-20 23:58:05 +01:00
Alan Kelly	86663963e6	x86/swscale: fix minor coding style issues Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 13:16:04 -03:00
Alan Kelly	f900a19fa9	libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes. Fixes so that fate under 64 bit Windows passes. These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-15 20:04:59 -03:00
Shiyou Yin	9a840ffa17	avutil: [loongarch] Add support for loongarch SIMD. LSX and LASX is loongarch SIMD extention. They are enabled by default if compiler support it, and can be disabled with '--disable-lsx' '--disable-lasx'. Change-Id: Ie2608ea61dbd9b7fffadbf0ec2348bad6c124476 Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Reviewed-by: guxiwei <guxiwei-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-12-15 18:37:40 +01:00
Andreas Rheinhardt	09408539f4	checkasm/hevc_pel: Fix stack buffer overreads This patch increases several stack buffers in order to fix stack-buffer-overflows (e.g. in put_hevc_qpel_uni_hv_9 in line 814 of hevcdsp_template.c) detected with ASAN in the hevc_pel checkasm test. The buffers are increased by the minimal amount necessary in order not to mask potential future bugs. Reviewed-by: Martin Storsjö <martin@martin.st> Reviewed-by: "zhilizhao(赵志立)" <quinkblack@foxmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-29 04:35:31 +02:00
Andreas Rheinhardt	1ea3650823	Replace all occurences of av_mallocz_array() by av_calloc() They do the same. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-20 01:03:52 +02:00
Wu Jianhua	133b2767cf	tests/checkasm/vf_gblur.c: update check_horiz_slice for the new ff_horiz_slice_avx2/512 Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Wu Jianhua	0c54ab20c2	tests/checkasm/vf_gblur.c: add check_verti_slice() for unit test Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
J. Dekker	b492cacffd	checkasm: collapse hevc pel tests Also add to `make fate-checkasm' target. Signed-off-by: J. Dekker <jdek@itanimul.li>	2021-08-24 22:12:06 +02:00
Andreas Rheinhardt	4608f7cc6a	Remove unnecessary mem.h inclusions Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-07-22 14:47:57 +02:00
J. Dekker	c866a099b2	lavu/kperf: use ff_thread_once() Signed-off-by: J. Dekker <jdek@itanimul.li>	2021-07-21 16:35:27 +02:00
J. Dekker	9a727235fd	lavu/checkasm: add (private) kperf timing for macOS Signed-off-by: J. Dekker <jdek@itanimul.li>	2021-07-20 19:40:03 +02:00
Anton Khirnov	fe490ec165	sws: separate the calls to scaled vs unscaled conversion Call the scaler function directly rather than through a function pointer. Drop the now-unused return value from ff_getSwsFunc() and rename the function to reflect its new role. This will be useful in the following commits, where it will become important that the amount of output is different for scaled vs unscaled case.	2021-07-03 15:57:13 +02:00
Matthieu Patou	b27ae2c0b7	checkasm/vp9dsp: rename the iszero function to is_zero Suggested-by: ffmpeg@fb.com Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2021-06-08 13:11:22 -03:00
Lynne	1978b143eb	checkasm: add av_tx FFT SIMD testing code This sadly required making changes to the code itself, due to the same context needing to be reused for both versions. The lookup table had to be duplicated for both versions.	2021-04-24 17:19:17 +02:00
Alan Kelly	e1484bc455	tests/checkasm/sw_scale: adds additional tests sizes for yux2yuvX Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-04-01 20:47:52 +02:00
James Almer	d52ceed9fd	tests/checkasm/sw_scale: use memset() to fill dither Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-19 16:19:11 -03:00
Alan Kelly	ee18edb13a	checkasm/sw_scale: properly initialize src_pixer and filter_coeff buffers Fixes valgrind uninitialised value warnings. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-19 11:20:32 -03:00
James Almer	1371647fc3	checkasm/sw_scale: use av_free() instead of free() Fixes crashes on Win64 Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 20:57:33 -03:00
Alan Kelly	554c2bc708	swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop And other small optimizations for ~20% speedup.	2021-02-17 21:21:03 +01:00
James Almer	bea7c51307	checkasm/vf_gblur: add a test for postscale_slice Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 13:39:31 -03:00
James Almer	2df3c2ed9b	checkasm/vf_gblur: split off the horiz_slice test into its own function Will come in handy for the following commit. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 13:39:11 -03:00
Josh Dekker	9c513edb79	checkasm: add hevc_pel tests Co-authored-by: Niklas Haas <git@haasn.xyz> Signed-off-by: Josh Dekker <josh@itanimul.li>	2021-01-25 09:24:11 +01:00
Anton Khirnov	c8c2dfbc37	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h That is a more appropriate place for it.	2021-01-01 14:11:01 +01:00
Limin Wang	c748bd77dc	tests: fix warning ISO C90 forbids mixed declarations and code Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-09-10 20:34:51 +08:00
Carl Eugen Hoyos	b61376bdee	lavfi/hflip: Support Bayer pixel formats. Fixes part of ticket #8819.	2020-08-25 01:29:24 +02:00
Jiaxun Yang	e387fcd01c	libavutil: Detect MMI and MSA flags for MIPS Add MMI & MSA runtime detection for MIPS. Basically there are two code pathes. For systems that natively support CPUCFG instruction or kernel emulated that instruction, we'll sense this feature from HWCAP and report the flags according to values grab from CPUCFG. For systems that have no CPUCFG (or not export it in HWCAP), we'll parse /proc/cpuinfo instead. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-07-23 17:21:58 +02:00
James Almer	55e1bc39cb	checkasm/vf_blend: use the correct depth parameters to initialize the blend modes This effectively enables the tests that until now were just running the C version alone. Signed-off-by: James Almer <jamrial@gmail.com>	2020-07-12 11:30:23 -03:00
Jun Zhao	7f76f20fa0	checkasm: sw_rgb: Fix mixed declaration and code Fix mixed declaration and code. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2020-06-01 23:28:07 +08:00
Andreas Rheinhardt	57e570b508	checkasm/sw_scale: Fix stack-buffer-overflow A buffer whose size is not a multiple of four has been initialized using consecutive writes of 32bits. This results in a stack-buffer-overflow reported by ASAN in the checkasm-sw_scale FATE-test. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2020-05-20 23:18:50 +02:00
Martin Storsjö	9c326af1d0	checkasm: swscale: Fix running the hscale test on 32 bit x86 This function doesn't call emms. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-16 08:16:12 +03:00
Martin Storsjö	eba1ebd9bf	checkasm: sw_rgb: Add a test for interleaveBytes Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 23:38:01 +03:00
Martin Storsjö	5bdffced0a	checkasm: pixblockdsp: Add tests for get_pixels_unaligned and diff_pixels_unaligned Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 23:37:27 +03:00
Martin Storsjö	ed7d73355e	checkasm: aarch64: Check for stack overflows Also fill x8-x17 with garbage before calling the function. Figure out the number of stack parameters and make sure that the value on the stack after those is untouched. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 21:22:36 +03:00
Martin Storsjö	6cb2d4d94b	checkasm: arm: Check for stack overflows Figure out the number of stack parameters and make sure that the value on the stack after those is untouched. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 21:22:34 +03:00
Martin Storsjö	3f266cf49e	checkasm: arm: Don't use blx to call checkasm_fail_func We should just use a normal bl here, and the linker will add the 'x' bit if necessary. This fixes calling the checkasm_fail_func on windows, where the code is built in thumb mode (and the linker doesn't clear the 'x' bit in the blx instruction). Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 21:22:32 +03:00
Martin Storsjö	89cf9e1fb6	checkasm: arm: Make the indentation consistent with other files This makes it easier to share code with e.g. the dav1d implementation of checkasm. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 21:22:27 +03:00
Josh de Kock	5913cd4e6c	checkasm: add hscale test This tests the hscale 8bpp to 14/18bpp functions with different filter sizes. Signed-off-by: Josh de Kock <josh@itanimul.li>	2020-05-15 10:29:30 +01:00


437 Commits