mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-12 19:18:44 +02:00
Commit Graph

391 Commits

Author SHA1 Message Date
Lynne
bbe95f7353 x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
RET in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessarily, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011
(16 and 12 years ago, respectively).

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
James Darnley
eef763c705 checkasm/v210dec: add extra space to the destination arrays 2022-12-21 00:36:49 +01:00
James Darnley
6af453ca38 avcodec/x86: add avx512icl function for v210dec
Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2
2022-12-20 15:02:45 +01:00
James Darnley
cfd1c3c0a1 checkasm/v210enc: test the entire width of 10-bit planar input arrays 2022-12-01 18:19:03 +01:00
bwang30
3ab11dc5bb libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
This commit enabled assembly code with Intel AVX512 VNNI and added a unit test for the sobel filter.

sobel_c: 4537
sobel_avx512icl: 2136

Signed-off-by: bwang30 <bin.wang@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-11-14 10:04:16 +08:00
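The figures above put the AVX512ICL version at roughly 2.1x the speed of the C version (4537 / 2136). The computation itself can be sketched as a plain scalar Sobel filter. This is a generic illustration, not the vf_convolution C code. It assumes 8-bit samples, takes the exact gradient magnitude sqrt(gx*gx + gy*gy) clamped to 255, zeroes the border, and omits any scale/delta parameters. All function names are hypothetical.

```c
#include <stdint.h>

/* Integer square root by linear search; inputs here stay below ~2.1e6,
 * so this avoids a libm dependency in the sketch. */
static int isqrt_small(int v)
{
    int r = 0;
    while ((r + 1) * (r + 1) <= v)
        r++;
    return r;
}

/* Sobel gradient magnitude of one interior pixel of an 8-bit plane. */
static uint8_t sobel_pixel(const uint8_t *src, int stride, int x, int y)
{
    const uint8_t *p = src + y * stride + x;
    /* Horizontal gradient: right column minus left column, weights 1-2-1. */
    int gx = -p[-stride - 1] + p[-stride + 1]
             - 2 * p[-1]     + 2 * p[1]
             - p[stride - 1] + p[stride + 1];
    /* Vertical gradient: bottom row minus top row, weights 1-2-1. */
    int gy = -p[-stride - 1] - 2 * p[-stride] - p[-stride + 1]
             + p[stride - 1] + 2 * p[stride]  + p[stride + 1];
    int mag = isqrt_small(gx * gx + gy * gy);
    return (uint8_t)(mag > 255 ? 255 : mag);
}

/* Filter a whole w x h plane; border pixels are simply set to 0 here. */
static void sobel_plane(uint8_t *dst, const uint8_t *src, int stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * stride + x] =
                (x == 0 || y == 0 || x == w - 1 || y == h - 1)
                    ? 0 : sobel_pixel(src, stride, x, y);
}
```

A flat plane yields zero everywhere, and a sharp vertical edge saturates to 255 along the edge.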
Lynne
e0661fc805 dca_core: convert to lavu/tx
Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the
arm32 and aarch64 changes.
2022-11-06 14:39:36 +01:00
James Darnley
1936c06f02 checkasm: add a verbose check function for uint32_t data 2022-11-04 19:37:46 +01:00
Andreas Rheinhardt
37ee36f689 checkasm/idctdsp: Use declare_func_emms only when needed
There is no MMX code for (add|put|put_signed)_pixels_clamped
since commit bfb28b5ce8, so use
declare_func instead of declare_func_emms() to also test that
we are not in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
5102b98b7a checkasm/llviddspenc: Use declare_func_emms only when needed
There is no MMX code for diff_bytes since commit
230ea38de1, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
e814569c8d checkasm/huffyuvdsp: Use declare_func_emms only when needed
There is no MMX code for add_int16 since commit
4b6ffc2880, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
cd8a33bcce checkasm/llviddsp: Be strict about MMX
There is no MMX code for llviddsp after commit
fed07efcde, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
b4e2d67636 checkasm/pixblockdsp: Be strict about MMX
There is no MMX code for pixblockdsp after commit
92b5800277, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
42921190cb checkasm/audiodsp: Be strict about MMX
There is no MMX code for audiodsp after commit
3d716d38ab, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
18afaa20f1 checkasm/blockdsp: Be strict about MMX
There is no MMX code for blockdsp after commit
ee551a21dd, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
f224c195e0 checkasm/vc1dsp: Use declare_func_emms only when needed
There is no MMX code for vc1_inv_trans_8x8 or
vc1_unescape_buffer, so use declare_func instead of
declare_func_emms() to also test that we are not in MMX
mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Rémi Denis-Courmont
c962c78901 checkasm: RISC-V 64-bit assembler test harness 2022-10-10 02:23:18 +02:00
Andreas Rheinhardt
bcfa427c8f checkasm/vp8dsp: Use declare_func_emms only when needed
There is no MMX code for loop filters since commit
6a551f1405, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-08 09:33:36 +02:00
Rémi Denis-Courmont
37d5ddc317 lavu/riscv: CPU flag for the Zbb extension
Unfortunately, it is common, and will remain so, for the bit-manipulation
extensions not to be enabled at compilation time. This is an official
policy for Debian ports in general (though they do not support RISC-V
officially as of yet): stick to the minimal target baseline, which
does not include the B extension or even its Zbb subset.

For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or
even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
2022-10-05 08:26:19 +02:00
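Run-time detection for the byte-swap DSP functions usually comes down to choosing a function pointer once, at init time, from the CPU flags. The sketch below shows that pattern under explicit assumptions. HYPOTHETICAL_CPU_FLAG_ZBB is a made-up flag bit, not the real libavutil constant. bswap_buf_zbb is a stand-in that reuses the C code, where a real version would use Zbb instructions such as REV8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU flag bit for Zbb; NOT the real libavutil value. */
#define HYPOTHETICAL_CPU_FLAG_ZBB (1 << 0)

typedef void (*bswap_buf_fn)(uint32_t *dst, const uint32_t *src, size_t len);

/* Portable C fallback: byte-swap each 32-bit word. */
static void bswap_buf_c(uint32_t *dst, const uint32_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t v = src[i];
        dst[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
                 ((v << 8) & 0x00FF0000u) | (v << 24);
    }
}

/* Stand-in for a Zbb-optimized version. It reuses the C code so that the
 * sketch runs on any CPU; a real one would be written in RISC-V assembly. */
static void bswap_buf_zbb(uint32_t *dst, const uint32_t *src, size_t len)
{
    bswap_buf_c(dst, src, len);
}

/* Pick the implementation once, from flags detected at run time. */
static bswap_buf_fn select_bswap_buf(int cpu_flags)
{
    if (cpu_flags & HYPOTHETICAL_CPU_FLAG_ZBB)
        return bswap_buf_zbb;
    return bswap_buf_c;
}
```

Because the choice is made per call site through a pointer, binaries built for the minimal baseline can still use Zbb code when the CPU happens to support it.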
Rémi Denis-Courmont
0c0a3deb18 lavu/cpu: CPU flags for the RISC-V Vector extension
RVV defines a total of 12 different extensions, including:

- 5 different instruction subsets:
  - Zve32x: 8-, 16- and 32-bit integers,
  - Zve32f: Zve32x plus single precision floats,
  - Zve64x: Zve32x plus 64-bit integers,
  - Zve64f: Zve32f plus Zve64x,
  - Zve64d: Zve64f plus double precision floats.

- 6 different vector lengths:
  - Zvl32b (embedded only),
  - Zvl64b (embedded only),
  - Zvl128b,
  - Zvl256b,
  - Zvl512b,
  - Zvl1024b,

- and the V extension proper: equivalent to Zve64d and Zvl128b.

In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each element type: up-to-32-bit ints (RVV_I32), floats (RVV_F32),
64-bit ints (RVV_I64) and doubles (RVV_F64).

When the vector size is needed, it can be retrieved by reading the
unprivileged read-only vlenb CSR. This should probably be a separate
helper macro if needed at a later point.
2022-09-27 13:19:52 +02:00
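The mapping from the five Zve* instruction subsets (plus the empty set) onto the four flags named above can be sketched as follows. The numeric flag values, the enums and the function name are hypothetical illustrations, not the actual libavutil definitions. Vector length (the Zvl* extensions) is deliberately not encoded, because the message says it can be read from vlenb instead.

```c
/* Hypothetical flag values for illustration only. */
enum {
    RVV_I32 = 1 << 0, /* 8-, 16- and 32-bit integer vectors */
    RVV_F32 = 1 << 1, /* single-precision float vectors */
    RVV_I64 = 1 << 2, /* 64-bit integer vectors */
    RVV_F64 = 1 << 3, /* double-precision float vectors */
};

/* The six possible instruction sets, including the empty one. */
enum rvv_subset { RVV_NONE, ZVE32X, ZVE32F, ZVE64X, ZVE64F, ZVE64D };

/* Translate the richest supported subset into capability bits. */
static int rvv_flags(enum rvv_subset s)
{
    switch (s) {
    case ZVE64D: return RVV_I32 | RVV_F32 | RVV_I64 | RVV_F64;
    case ZVE64F: return RVV_I32 | RVV_F32 | RVV_I64; /* Zve32f + Zve64x */
    case ZVE64X: return RVV_I32 | RVV_I64;
    case ZVE32F: return RVV_I32 | RVV_F32;
    case ZVE32X: return RVV_I32;
    default:     return 0;
    }
}
```

Four bits can represent sixteen combinations, but only these six ever occur, which is why one bit per element type is a convenient rather than a minimal encoding.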
Rémi Denis-Courmont
b95e2fbd85 lavu/cpu: detect RISC-V base extensions
This introduces compile-time and run-time CPU detection on RISC-V. In
practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of
I, F and D extensions, and if it does, it probably won't have run-time
detection. So the flags are essentially always set.

But as things stand, checkasm wants them that way. Compare the ARMV8
flag on AArch64. We are nowhere near running short on CPU flag bits.
2022-09-27 13:19:52 +02:00
Lynne
ace42cf581 x86/tx_float: add 15xN PFA FFT AVX SIMD
~4x faster than the C version.
The shuffles in the 15pt dim1 are seriously expensive. Not happy with it,
but I'm content.

Can be easily converted to pure AVX by removing all vpermpd/vpermps
instructions.
2022-09-23 12:35:27 +02:00
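PFA refers to the prime-factor (Good-Thomas) FFT algorithm. When a length factors into coprime parts, the DFT can be computed as a multidimensional DFT using index permutations alone, with no twiddle factors between the stages. As a small illustration (not the lavu/tx code), the sketch below splits a 15-point DFT as 3 x 5, using naive inner DFTs, and checks it against a direct O(N^2) DFT.

The input map is n = (5*n1 + 3*n2) mod 15. The output map is the Chinese-remainder solution k = (10*k1 + 6*k2) mod 15. The complex type and all names are hypothetical. The twiddle table is built from hard-coded cos/sin(24 degrees) values to avoid depending on libm.

```c
/* Minimal complex type so the sketch does not depend on <complex.h> or libm. */
typedef struct { double re, im; } cplx;

static cplx c_add(cplx a, cplx b) { cplx r = { a.re + b.re, a.im + b.im }; return r; }
static cplx c_mul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

#define N1 3
#define N2 5
#define N  (N1 * N2) /* 15 = 3 * 5 with gcd(3, 5) = 1 */

/* W[k] = exp(-2*pi*i*k/15), built by repeated multiplication by W[1]. */
static cplx W[N];

static void init_twiddles(void)
{
    const cplx w1 = { 0.9135454576426009, -0.4067366430758002 }; /* cos/-sin(24 deg) */
    W[0].re = 1.0;
    W[0].im = 0.0;
    for (int k = 1; k < N; k++)
        W[k] = c_mul(W[k - 1], w1);
}

/* Reference O(N^2) DFT. */
static void dft15_naive(cplx *out, const cplx *in)
{
    for (int k = 0; k < N; k++) {
        cplx acc = { 0.0, 0.0 };
        for (int n = 0; n < N; n++)
            acc = c_add(acc, c_mul(in[n], W[(n * k) % N]));
        out[k] = acc;
    }
}

/* Good-Thomas 15-point DFT as a 3x5 two-dimensional DFT, no inner twiddles. */
static void dft15_pfa(cplx *out, const cplx *in)
{
    cplx tmp[N1][N2];

    /* Stage 1: 5-point DFTs over n2 for each n1; W_5 = W_15^3. */
    for (int n1 = 0; n1 < N1; n1++)
        for (int k2 = 0; k2 < N2; k2++) {
            cplx acc = { 0.0, 0.0 };
            for (int n2 = 0; n2 < N2; n2++)
                acc = c_add(acc, c_mul(in[(N2 * n1 + N1 * n2) % N],
                                       W[(N1 * n2 * k2) % N]));
            tmp[n1][k2] = acc;
        }

    /* Stage 2: 3-point DFTs over n1; W_3 = W_15^5. */
    for (int k1 = 0; k1 < N1; k1++)
        for (int k2 = 0; k2 < N2; k2++) {
            cplx acc = { 0.0, 0.0 };
            for (int n1 = 0; n1 < N1; n1++)
                acc = c_add(acc, c_mul(tmp[n1][k2], W[(N2 * n1 * k1) % N]));
            /* CRT: 10 = 1 (mod 3), 0 (mod 5); 6 = 0 (mod 3), 1 (mod 5). */
            out[(10 * k1 + 6 * k2) % N] = acc;
        }
}
```

Expanding n*k with both maps gives n*k = 5*n1*k1 + 3*n2*k2 (mod 15), which is why the cross terms, and hence the inter-stage twiddles, vanish.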
Lynne
668f43af20 tests/checkasm/lpc: correct arithmetic when randomizing buffers
Results weren't signed.
2022-09-23 01:50:59 +02:00
Lynne
6ad39f01df tests/checkasm/lpc: reduce range and use signed values
This is more similar to its regular use, and prevents inaccuracies
of huge float*float multiplications from failing the tests.
2022-09-23 01:42:34 +02:00
James Almer
9cbfffa0d4 tests/checkasm/lpc: print mismatching values
Will help debugging.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:18:52 -03:00
James Almer
a1c6f4b653 tests/checkasm/lpc: randomize buffer length
Simplifies the test, while trying more values and preventing pointlessly
running benchmarks in a loop.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:17:26 -03:00
James Almer
c8c4a162fc avcodec/lpc: use ptrdiff_t for length parameters
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:17:26 -03:00
Lynne
b67776e12f x86/lpc: fix even scalar loop overreads/writes
Passes checkasm with valgrind, tested to sizes of more than 4000 samples.
2022-09-22 04:27:19 +02:00
Andreas Rheinhardt
9beba05311 avcodec/fmtconvert: Remove unused AVCodecContext parameter
Unused since d74a8cb7e4.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:26:40 +02:00
Andreas Rheinhardt
fd72d8aea3 avcodec/blockdsp: Remove unused AVCodecContext parameter
Possible since be95df12bb.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:24:40 +02:00
Lynne
3ade6a8644 x86/lpc: implement a new Welch windowing function
The old one was written with the assumption that only even-length inputs would be given.
This very messy replacement supports even and odd inputs, and supports
AVX2 for extra speed. The buffers given are usually quite big (4k samples),
so the speedup is worth it.
The new SSE version is still faster than the old inline asm version by 33%.

A checkasm test is also provided to make sure this monstrosity works.

This fixes some FATE tests.
2022-09-21 07:12:39 +02:00
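The textbook Welch window over len samples is w[n] = 1 - ((n - (len - 1)/2) / ((len - 1)/2))^2, which is defined for even and odd lengths alike. A minimal scalar sketch applying it to 32-bit samples follows. It is an illustration, not FFmpeg's C reference or the new SIMD code. The function name is hypothetical, and treating len == 1 as a zero window is a choice made only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply a Welch window to `data`, writing windowed samples to `w_data`.
 * Works for both even and odd len. */
static void apply_welch_window(const int32_t *data, ptrdiff_t len, double *w_data)
{
    if (len < 2) {
        if (len == 1)
            w_data[0] = 0.0; /* sketch's choice for the degenerate case */
        return;
    }
    const double c = 2.0 / (double)(len - 1);
    for (ptrdiff_t n = 0; n < len; n++) {
        double t = (double)n * c - 1.0; /* maps n = 0..len-1 onto [-1, 1] */
        w_data[n] = data[n] * (1.0 - t * t);
    }
}
```

For odd lengths, the centre sample gets weight exactly 1. For even lengths, the two middle samples share the same weight just below 1. In both cases the endpoints are 0.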
James Almer
8f119b501e tests/checkasm: add a test for VorbisDSPContext
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-19 21:28:23 -03:00
Lynne
9a9647af33 checkasm/tx: add checkasm support for the iMDCT 2022-09-06 04:21:49 +02:00
Martin Storsjö
f921c58335 checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX
This avoids triggering overflows in the filters, and avoids stray
test failures in the approximate functions on x86; due to rounding
differences, one implementation might overflow while another one
doesn't.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-19 22:54:51 +03:00
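One way to produce well-behaved test coefficients is to make them sum to the filter's unity gain, so that the output stays a weighted average of the input. The sketch below assumes a total of 1 << 12, which is an assumption about the vertical filter's fixed-point scale. As a simplification it draws only non-negative taps, whereas real scaling filters can have negative lobes. The function name, MAX_TAPS and the LCG are hypothetical and not taken from the actual test.

```c
#include <stdint.h>

#define MAX_TAPS 16 /* hypothetical upper bound for this sketch */

/* Fill coeffs[0..n-1] with pseudo-random non-negative taps summing exactly
 * to `total` (0 <= total <= INT16_MAX). A tiny LCG keeps it deterministic.
 * Does nothing if n is outside [1, MAX_TAPS]. */
static void make_filter_coeffs(int16_t *coeffs, int n, int total, uint32_t seed)
{
    int raw[MAX_TAPS];
    int raw_sum = 0, acc = 0;

    if (n < 1 || n > MAX_TAPS)
        return;

    /* Draw raw weights in [1, 256]. */
    for (int i = 0; i < n; i++) {
        seed = seed * 1103515245u + 12345u;
        raw[i] = 1 + (int)((seed >> 16) % 256u);
        raw_sum += raw[i];
    }

    /* Scale to `total`, rounding down so the partial sum never exceeds it. */
    for (int i = 0; i < n - 1; i++) {
        coeffs[i] = (int16_t)((int64_t)raw[i] * total / raw_sum);
        acc += coeffs[i];
    }
    /* The last tap absorbs the rounding remainder, so the sum is exact. */
    coeffs[n - 1] = (int16_t)(total - acc);
}
```

With non-negative taps summing to the unity gain, the accumulator can never exceed total times the largest input sample. That keeps every implementation clear of the overflow region where rounding differences would diverge.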
Alan Kelly
da0a37bab7 checkasm/sw_scale: hscale does not require cpuflag test
This is done in ff_shuffle_filter_coefficients.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
Alan Kelly
a38293e444 libswscale: Enable hscale_avx2 for all input sizes.
ff_shuffle_filter_coefficients shuffles the tail as required.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
Martin Storsjö
d69d12a5b9 checkasm: motion: Test different h parameters
Previously, the checkasm test always passed h=8, so no other cases
were tested.

Out of the me_cmp functions, in practice, some functions are hardcoded
to always assume an 8x8 block (ignoring the h parameter), while others
do use the parameter. For those with hardcoded height, both the
reference C function and the assembly implementations ignore the
parameter in the same way.

The documentation for the functions indicates that heights between
w/2 and 2*w, within the range of 4 to 16, should be supported. This
patch just tests random heights in that range, without knowing what
width the current function actually uses.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-17 00:00:50 +03:00
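The documented constraint can be expressed per width as a small helper. This is an illustration only: as the message says, the actual test draws heights without knowing the width. The helper names are hypothetical.

```c
/* Height bounds allowed for block width w: between w/2 and 2*w,
 * clamped to the overall range [4, 16]. */
static int min_height(int w) { int h = w / 2; return h < 4 ? 4 : h; }
static int max_height(int w) { int h = 2 * w; return h > 16 ? 16 : h; }

/* Map an arbitrary random value r onto a valid height for width w. */
static int random_height(int w, unsigned r)
{
    int lo = min_height(w), hi = max_height(w);
    return lo + (int)(r % (unsigned)(hi - lo + 1));
}
```

For w = 8 this yields heights 4 to 16. For w = 16 it yields 8 to 16.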
Martin Storsjö
21c2c57ba5 checkasm: Provide enough alignment in the new yuv2plane1 test
This fixes the checkasm test in some setups on x86.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 23:47:16 +03:00
J. Dekker
ea6ecb12aa checkasm/hevc_add_res: add 12bit test
Also fix the bug where in every other byte only the lower 2 bits were
used in the 8bit test.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-08-16 14:00:34 +02:00
Swinney, Jonathan
4dcd191a50 checkasm: updated tests for sw_scale
Change the reference to exactly match the C reference in swscale,
instead of exactly matching the x86 SIMD implementations (which
differ slightly). Test with and without SWS_ACCURATE_RND: if this
flag isn't set, the output must match the C reference exactly,
otherwise it is allowed to be off by 2.

Mark a couple of x86 functions as unavailable when SWS_ACCURATE_RND
is set; apparently this discrepancy hasn't been noticed in other
exact tests before.

Add a test for yuv2plane1.

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
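The two comparison modes described above, exact match or off by at most 2, fit one helper parameterized by the allowed difference. This is a minimal sketch assuming 8-bit output; first_mismatch is a hypothetical name, not a checkasm function. Per the message, a caller would pass max_diff = 0 without SWS_ACCURATE_RND and 2 with it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return the index of the first sample where |ref - out| > max_diff,
 * or -1 if all samples are within tolerance (max_diff = 0: exact match). */
static int first_mismatch(const uint8_t *ref, const uint8_t *out, int len, int max_diff)
{
    for (int i = 0; i < len; i++) {
        int d = abs((int)ref[i] - (int)out[i]);
        if (d > max_diff)
            return i;
    }
    return -1;
}
```

Returning the index rather than a boolean makes it easy to print the mismatching values when a test fails.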
Martin Storsjö
5cdf4c0bed checkasm: Silence warnings about unused return value from read()
This codepath is enabled by default on arm, if the linux perf API
is available, unless disabled with --disable-linux-perf.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-08 23:39:13 +03:00
Andreas Rheinhardt
6c4595190e avcodec/flacdsp: Split encoder-only parts into a ctx of its own
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:28:45 +02:00
Andreas Rheinhardt
3a869cd5cd avcodec/flacdsp: Remove unused function parameter
Forgotten in e609cfd697.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 03:28:45 +02:00
Martin Storsjö
237730f0e0 checkasm: motion: Make the benchmarks more stable
Don't use the last random offset, but a static one.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-07-16 17:25:35 +03:00
Martin Storsjö
900424cda9 checkasm: Provide enough alignment in the new motion test
This fixes the checkasm test in some setups on x86.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-06-28 18:09:08 +03:00
Swinney, Jonathan
c471cc7474 lavc/aarch64: motion estimation functions in neon
- ff_pix_abs16_neon
- ff_pix_abs16_xy2_neon

In direct micro benchmarks of these ff functions versus their C implementations,
these functions performed as follows on AWS Graviton 3.

ff_pix_abs16_neon:
pix_abs_0_0_c: 141.1
pix_abs_0_0_neon: 19.6

ff_pix_abs16_xy2_neon:
pix_abs_0_3_c: 269.1
pix_abs_0_3_neon: 39.3

Tested with:
./tests/checkasm/checkasm --test=motion --bench --disable-linux-perf

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-06-28 00:51:39 +03:00
Michael Goulet
b7f6a933fa tests/checkasm/sw_scale: Fix alignment for movdqa
The SSE2 instruction movdqa, used in ff_yuv2yuvX_sse3(), expects a 16-byte aligned memory address; otherwise a segfault is generated.
The src_pixels buffer was not necessarily 16-byte aligned on the stack, so we got segfaults during fate-checkasm-sw_scale.

Therefore, 16-byte align all of these local variables; over-aligning them shouldn't hurt.
2022-06-20 11:08:43 +02:00
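The idea behind this fix can be sketched in portable C11, assuming a compiler with <stdalign.h>: declare the locals with alignas(16) and check the address before relying on an aligned load. FFmpeg's own code typically uses its DECLARE_ALIGNED / LOCAL_ALIGNED macros for this instead. The helper names below are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* True if p meets the 16-byte alignment an aligned load like movdqa needs. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Stack buffers declared with alignas(16) are guaranteed to be aligned,
 * unlike plain uint8_t arrays, whose alignment requirement is only 1. */
static int demo_aligned_locals(void)
{
    alignas(16) uint8_t src_pixels[64];
    alignas(16) int16_t filter[16];
    src_pixels[0] = 0; /* touch the buffers so they are not unused */
    filter[0] = 0;
    return is_aligned16(src_pixels) && is_aligned16(filter);
}
```

An unaligned plain array may happen to land on a 16-byte boundary, so this kind of bug tends to show up only in some builds or setups, as the commit observed.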
Swinney, Jonathan
92ea8e03df checkasm: added additional dstW tests for hscale
Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-05-28 01:09:00 +03:00
J. Dekker
cc679054c7 checkasm: improve hevc_sao test
The HEVC decoder can call these functions with smaller widths than the
functions themselves are designed to operate on, so we should only check
the relevant output.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-05-25 08:04:58 +02:00
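Checking only the relevant output amounts to comparing the first w pixels of each of h rows in a strided buffer, while ignoring whatever lies past w. Here is a minimal sketch with a hypothetical helper name, assuming 8-bit pixels and a byte stride:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare only the first `w` pixels of each of `h` rows of two buffers
 * sharing the same stride. Pixels past `w` may be written differently by
 * SIMD code that processes whole blocks, so they are not compared. */
static int region_equal(const uint8_t *a, const uint8_t *b, ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        if (memcmp(a + y * stride, b + y * stride, (size_t)w))
            return 0;
    return 1;
}
```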
Andreas Rheinhardt
d496bbe105 avcodec/v210enc: Move ff_v210enc_init into a header
This removes a dependency of checkasm on lavc/v210_enc.o
and also allows ff_v210enc_init() to be inlined irrespective of
interposing.
This dependency pulled basically all of libavcodec into checkasm,
in particular all codecs.
This also makes checkasm work when using shared Windows builds:
On Windows, it needs to be known to the compiler whether a data
symbol is external to the library/executable or not; hence the
need for av_export_avutil. checkasm needs access to the internals
of the libraries it tests and is therefore linked statically to all
the libraries. This means that the users of avpriv_cga_font and
avpriv_vga16_font in libavcodec (namely ansi.o, bintext.o, tmv.o)
end up in the same executable as the symbols, although they have
been compiled as if these symbols were external, leading to linker
errors. With this commit said files are discarded by the linker,
bypassing this problem.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-05-06 05:33:38 +02:00
Andreas Rheinhardt
0c2489fe29 avcodec/v210_dec: Move ff_v210dec_init into a header
This removes a dependency of checkasm on lavc/v210_dec.o
and also allows ff_v210dec_init() to be inlined irrespective of
interposing.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-05-06 05:19:50 +02:00