Andreas Rheinhardt
0df18f29ae
avfilter/af_afir: Only keep DSP stuff in header
...
Only the AudioFIRDSPContext and the functions for its initialization
are needed outside of lavfi/af_afir.c.
Also rename the header to af_afirdsp.h to reflect the change.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-05-06 05:19:49 +02:00
Ben Avison
2e26847780
avcodec/vc1: Introduce fast path for unescaping bitstream buffer
...
Includes a checkasm test.
Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:33 +03:00
Ben Avison
bd3615a81a
checkasm: Add idctdsp add/put-pixels-clamped tests
...
Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:33 +03:00
Ben Avison
2698bfdc93
checkasm: Add vc1dsp inverse transform tests
...
This test deliberately doesn't exercise the full range of inputs described in
the committee draft VC-1 standard. It says:
input coefficients in frequency domain, D, satisfy -2048 <= D < 2047
intermediate coefficients, E, satisfy -4096 <= E < 4095
fully inverse-transformed coefficients, R, satisfy -512 <= R < 511
For one thing, the inequalities look odd. Did they mean them to go the
other way round? That would make more sense because the equations generally
both add and subtract coefficients multiplied by constants, including powers
of 2. Requiring the most-negative values to be valid extends the number of
bits to represent the intermediate values just for the sake of that one case!
For another thing, the extreme values don't look to occur in real streams -
both in my experience and supported by the following comment in the AArch32
decoder:
tNhalf is half of the value of tN (as described in vc1_inv_trans_8x8_c).
This is done because sometimes files have input that causes tN + tM to
overflow. To avoid this overflow, we compute tNhalf, then compute
tNhalf + tM (which doesn't overflow), and then we use vhadd to compute
(tNhalf + (tNhalf + tM)) >> 1 which does not overflow because it is
one instruction.
My AArch64 decoder goes further than this. It calculates tNhalf and tM
then does an SRA (essentially a fused halve and add) to compute
(tN + tM) >> 1 without ever having to hold (tNhalf + tM) in a 16-bit element
without overflowing. It only encounters difficulties if either tNhalf or
tM overflow in isolation.
I haven't had sight of the final standard, so it's possible that these
issues were dealt with during finalisation, which could explain the lack
of usage of extreme inputs in real streams. Or a preponderance of decoders
that only support 16-bit intermediate values in their inverse transforms
might have caused encoders to steer clear of such cases.
I have effectively followed this approach in the test, and limited the
scale of the coefficients sufficient that both the existing AArch32 decoder
and my new AArch64 decoder both pass.
Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:33 +03:00
Ben Avison
20cb43ea8b
checkasm: Add vc1dsp in-loop deblocking filter tests
...
Note that the benchmarking results for these functions are highly dependent
upon the input data. Therefore, each function is benchmarked twice,
corresponding to the best and worst case complexity of the reference C
implementation. The performance of a real stream decode will fall somewhere
between these two extremes.
Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:33 +03:00
Martin Storsjö
a78f136f3f
configure: Use a separate config_components.h header for $ALL_COMPONENTS
...
This avoids unnecessary rebuilds of most source files if only the
list of enabled components has changed, but not the other properties
of the build, set in config.h.
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-03-16 14:12:49 +02:00
Wu Jianhua
f629ea2e18
avutil/cpu: add AVX512 Icelake flag
...
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-03-10 16:45:48 -03:00
Anton Khirnov
d552f2535b
lavc/h264: move some shared code from h264dec to h264_parse
2022-01-26 15:23:30 +01:00
Mark Reid
52f7026164
swscale/x86/input.asm: add x86-optimized planer rgb2yuv functions
...
sse2 only operates on 2 lanes per loop for to_y and to_uv functions, due
to the lack of pmulld instruction. Emulating pmulld with 2 pmuludq and shuffles
proved too costly and made to_uv functions slower then the c implementation.
For to_y on sse2 only float functions are generated,
I was are not able outperform the c implementation on the integer pixel formats.
For to_a on see4 only the float functions are generated.
sse2 and sse4 generated nearly identical performing code on integer pixel formats,
so only sse2/avx2 versions are generated.
planar_gbrp_to_y_512_c: 1197.5
planar_gbrp_to_y_512_sse4: 444.5
planar_gbrp_to_y_512_avx2: 287.5
planar_gbrap_to_y_512_c: 1204.5
planar_gbrap_to_y_512_sse4: 447.5
planar_gbrap_to_y_512_avx2: 289.5
planar_gbrp9be_to_y_512_c: 1380.0
planar_gbrp9be_to_y_512_sse4: 543.5
planar_gbrp9be_to_y_512_avx2: 340.0
planar_gbrp9le_to_y_512_c: 1200.5
planar_gbrp9le_to_y_512_sse4: 442.0
planar_gbrp9le_to_y_512_avx2: 282.0
planar_gbrp10be_to_y_512_c: 1378.5
planar_gbrp10be_to_y_512_sse4: 544.0
planar_gbrp10be_to_y_512_avx2: 337.5
planar_gbrp10le_to_y_512_c: 1200.0
planar_gbrp10le_to_y_512_sse4: 448.0
planar_gbrp10le_to_y_512_avx2: 285.5
planar_gbrap10be_to_y_512_c: 1380.0
planar_gbrap10be_to_y_512_sse4: 542.0
planar_gbrap10be_to_y_512_avx2: 340.5
planar_gbrap10le_to_y_512_c: 1199.0
planar_gbrap10le_to_y_512_sse4: 446.0
planar_gbrap10le_to_y_512_avx2: 289.5
planar_gbrp12be_to_y_512_c: 10563.0
planar_gbrp12be_to_y_512_sse4: 542.5
planar_gbrp12be_to_y_512_avx2: 339.0
planar_gbrp12le_to_y_512_c: 1201.0
planar_gbrp12le_to_y_512_sse4: 440.5
planar_gbrp12le_to_y_512_avx2: 286.0
planar_gbrap12be_to_y_512_c: 1701.5
planar_gbrap12be_to_y_512_sse4: 917.0
planar_gbrap12be_to_y_512_avx2: 338.5
planar_gbrap12le_to_y_512_c: 1201.0
planar_gbrap12le_to_y_512_sse4: 444.5
planar_gbrap12le_to_y_512_avx2: 288.0
planar_gbrp14be_to_y_512_c: 1370.5
planar_gbrp14be_to_y_512_sse4: 545.0
planar_gbrp14be_to_y_512_avx2: 338.5
planar_gbrp14le_to_y_512_c: 1199.0
planar_gbrp14le_to_y_512_sse4: 444.0
planar_gbrp14le_to_y_512_avx2: 279.5
planar_gbrp16be_to_y_512_c: 1364.0
planar_gbrp16be_to_y_512_sse4: 544.5
planar_gbrp16be_to_y_512_avx2: 339.5
planar_gbrp16le_to_y_512_c: 1201.0
planar_gbrp16le_to_y_512_sse4: 445.5
planar_gbrp16le_to_y_512_avx2: 280.5
planar_gbrap16be_to_y_512_c: 1377.0
planar_gbrap16be_to_y_512_sse4: 545.0
planar_gbrap16be_to_y_512_avx2: 338.5
planar_gbrap16le_to_y_512_c: 1201.0
planar_gbrap16le_to_y_512_sse4: 442.0
planar_gbrap16le_to_y_512_avx2: 279.0
planar_gbrpf32be_to_y_512_c: 4113.0
planar_gbrpf32be_to_y_512_sse2: 2438.0
planar_gbrpf32be_to_y_512_sse4: 1068.0
planar_gbrpf32be_to_y_512_avx2: 904.5
planar_gbrpf32le_to_y_512_c: 3818.5
planar_gbrpf32le_to_y_512_sse2: 2024.5
planar_gbrpf32le_to_y_512_sse4: 1241.5
planar_gbrpf32le_to_y_512_avx2: 657.0
planar_gbrapf32be_to_y_512_c: 3707.0
planar_gbrapf32be_to_y_512_sse2: 2444.0
planar_gbrapf32be_to_y_512_sse4: 1077.0
planar_gbrapf32be_to_y_512_avx2: 909.0
planar_gbrapf32le_to_y_512_c: 3822.0
planar_gbrapf32le_to_y_512_sse2: 2024.5
planar_gbrapf32le_to_y_512_sse4: 1176.0
planar_gbrapf32le_to_y_512_avx2: 658.5
planar_gbrp_to_uv_512_c: 2325.8
planar_gbrp_to_uv_512_sse2: 1726.8
planar_gbrp_to_uv_512_sse4: 771.8
planar_gbrp_to_uv_512_avx2: 506.8
planar_gbrap_to_uv_512_c: 2281.8
planar_gbrap_to_uv_512_sse2: 1726.3
planar_gbrap_to_uv_512_sse4: 768.3
planar_gbrap_to_uv_512_avx2: 496.3
planar_gbrp9be_to_uv_512_c: 2336.8
planar_gbrp9be_to_uv_512_sse2: 1924.8
planar_gbrp9be_to_uv_512_sse4: 852.3
planar_gbrp9be_to_uv_512_avx2: 552.8
planar_gbrp9le_to_uv_512_c: 2270.3
planar_gbrp9le_to_uv_512_sse2: 1512.3
planar_gbrp9le_to_uv_512_sse4: 764.3
planar_gbrp9le_to_uv_512_avx2: 491.3
planar_gbrp10be_to_uv_512_c: 2281.8
planar_gbrp10be_to_uv_512_sse2: 1917.8
planar_gbrp10be_to_uv_512_sse4: 855.3
planar_gbrp10be_to_uv_512_avx2: 541.3
planar_gbrp10le_to_uv_512_c: 2269.8
planar_gbrp10le_to_uv_512_sse2: 1515.3
planar_gbrp10le_to_uv_512_sse4: 759.8
planar_gbrp10le_to_uv_512_avx2: 487.8
planar_gbrap10be_to_uv_512_c: 2382.3
planar_gbrap10be_to_uv_512_sse2: 1924.8
planar_gbrap10be_to_uv_512_sse4: 855.3
planar_gbrap10be_to_uv_512_avx2: 540.8
planar_gbrap10le_to_uv_512_c: 2382.3
planar_gbrap10le_to_uv_512_sse2: 1512.3
planar_gbrap10le_to_uv_512_sse4: 759.3
planar_gbrap10le_to_uv_512_avx2: 484.8
planar_gbrp12be_to_uv_512_c: 2283.8
planar_gbrp12be_to_uv_512_sse2: 1936.8
planar_gbrp12be_to_uv_512_sse4: 858.3
planar_gbrp12be_to_uv_512_avx2: 541.3
planar_gbrp12le_to_uv_512_c: 2278.8
planar_gbrp12le_to_uv_512_sse2: 1507.3
planar_gbrp12le_to_uv_512_sse4: 760.3
planar_gbrp12le_to_uv_512_avx2: 485.8
planar_gbrap12be_to_uv_512_c: 2385.3
planar_gbrap12be_to_uv_512_sse2: 1927.8
planar_gbrap12be_to_uv_512_sse4: 855.3
planar_gbrap12be_to_uv_512_avx2: 539.8
planar_gbrap12le_to_uv_512_c: 2377.3
planar_gbrap12le_to_uv_512_sse2: 1516.3
planar_gbrap12le_to_uv_512_sse4: 759.3
planar_gbrap12le_to_uv_512_avx2: 484.8
planar_gbrp14be_to_uv_512_c: 2283.8
planar_gbrp14be_to_uv_512_sse2: 1935.3
planar_gbrp14be_to_uv_512_sse4: 852.3
planar_gbrp14be_to_uv_512_avx2: 540.3
planar_gbrp14le_to_uv_512_c: 2276.8
planar_gbrp14le_to_uv_512_sse2: 1514.8
planar_gbrp14le_to_uv_512_sse4: 762.3
planar_gbrp14le_to_uv_512_avx2: 484.8
planar_gbrp16be_to_uv_512_c: 2383.3
planar_gbrp16be_to_uv_512_sse2: 1881.8
planar_gbrp16be_to_uv_512_sse4: 852.3
planar_gbrp16be_to_uv_512_avx2: 541.8
planar_gbrp16le_to_uv_512_c: 2378.3
planar_gbrp16le_to_uv_512_sse2: 1476.8
planar_gbrp16le_to_uv_512_sse4: 765.3
planar_gbrp16le_to_uv_512_avx2: 485.8
planar_gbrap16be_to_uv_512_c: 2382.3
planar_gbrap16be_to_uv_512_sse2: 1886.3
planar_gbrap16be_to_uv_512_sse4: 853.8
planar_gbrap16be_to_uv_512_avx2: 550.8
planar_gbrap16le_to_uv_512_c: 2381.8
planar_gbrap16le_to_uv_512_sse2: 1488.3
planar_gbrap16le_to_uv_512_sse4: 765.3
planar_gbrap16le_to_uv_512_avx2: 491.8
planar_gbrpf32be_to_uv_512_c: 4863.0
planar_gbrpf32be_to_uv_512_sse2: 3347.5
planar_gbrpf32be_to_uv_512_sse4: 1800.0
planar_gbrpf32be_to_uv_512_avx2: 1199.0
planar_gbrpf32le_to_uv_512_c: 4725.0
planar_gbrpf32le_to_uv_512_sse2: 2753.0
planar_gbrpf32le_to_uv_512_sse4: 1474.5
planar_gbrpf32le_to_uv_512_avx2: 927.5
planar_gbrapf32be_to_uv_512_c: 4859.0
planar_gbrapf32be_to_uv_512_sse2: 3269.0
planar_gbrapf32be_to_uv_512_sse4: 1802.0
planar_gbrapf32be_to_uv_512_avx2: 1201.5
planar_gbrapf32le_to_uv_512_c: 6338.0
planar_gbrapf32le_to_uv_512_sse2: 2756.5
planar_gbrapf32le_to_uv_512_sse4: 1476.0
planar_gbrapf32le_to_uv_512_avx2: 908.5
planar_gbrap_to_a_512_c: 383.3
planar_gbrap_to_a_512_sse2: 66.8
planar_gbrap_to_a_512_avx2: 43.8
planar_gbrap10be_to_a_512_c: 601.8
planar_gbrap10be_to_a_512_sse2: 86.3
planar_gbrap10be_to_a_512_avx2: 34.8
planar_gbrap10le_to_a_512_c: 602.3
planar_gbrap10le_to_a_512_sse2: 48.8
planar_gbrap10le_to_a_512_avx2: 31.3
planar_gbrap12be_to_a_512_c: 601.8
planar_gbrap12be_to_a_512_sse2: 111.8
planar_gbrap12be_to_a_512_avx2: 41.3
planar_gbrap12le_to_a_512_c: 385.8
planar_gbrap12le_to_a_512_sse2: 75.3
planar_gbrap12le_to_a_512_avx2: 39.8
planar_gbrap16be_to_a_512_c: 386.8
planar_gbrap16be_to_a_512_sse2: 79.8
planar_gbrap16be_to_a_512_avx2: 31.3
planar_gbrap16le_to_a_512_c: 600.3
planar_gbrap16le_to_a_512_sse2: 40.3
planar_gbrap16le_to_a_512_avx2: 30.3
planar_gbrapf32be_to_a_512_c: 1148.8
planar_gbrapf32be_to_a_512_sse2: 611.3
planar_gbrapf32be_to_a_512_sse4: 234.8
planar_gbrapf32be_to_a_512_avx2: 183.3
planar_gbrapf32le_to_a_512_c: 851.3
planar_gbrapf32le_to_a_512_sse2: 263.3
planar_gbrapf32le_to_a_512_sse4: 199.3
planar_gbrapf32le_to_a_512_avx2: 156.8
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:34:33 -03:00
Mark Reid
9e445a5be2
swscale/x86/output.asm: add x86-optimized planer gbr yuv2anyX functions
...
changes since v2:
* fixed label
changes since v1:
* remove vex intruction on sse4 path
* some load/pack marcos use less intructions
* fixed some typos
yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:33:17 -03:00
Alan Kelly
eebe406c80
libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.
...
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
2021-12-21 17:44:53 -03:00
Henrik Gramner
15cfb4eee3
checkasm: Use the correct AVTXContext in av_tx tests
...
Keep a reference to the correct associated context of the reference
function and use that context when calling the reference function.
2021-12-20 23:58:05 +01:00
Alan Kelly
86663963e6
x86/swscale: fix minor coding style issues
...
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:16:04 -03:00
Alan Kelly
f900a19fa9
libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.
...
Fixes so that fate under 64 bit Windows passes.
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-15 20:04:59 -03:00
Shiyou Yin
9a840ffa17
avutil: [loongarch] Add support for loongarch SIMD.
...
LSX and LASX is loongarch SIMD extention.
They are enabled by default if compiler support it, and can be disabled
with '--disable-lsx' '--disable-lasx'.
Change-Id: Ie2608ea61dbd9b7fffadbf0ec2348bad6c124476
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Reviewed-by: guxiwei <guxiwei-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-12-15 18:37:40 +01:00
Andreas Rheinhardt
09408539f4
checkasm/hevc_pel: Fix stack buffer overreads
...
This patch increases several stack buffers in order to fix
stack-buffer-overflows (e.g. in put_hevc_qpel_uni_hv_9 in
line 814 of hevcdsp_template.c) detected with ASAN in the hevc_pel
checkasm test.
The buffers are increased by the minimal amount necessary
in order not to mask potential future bugs.
Reviewed-by: Martin Storsjö <martin@martin.st>
Reviewed-by: "zhilizhao(赵志立)" <quinkblack@foxmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-29 04:35:31 +02:00
Andreas Rheinhardt
1ea3650823
Replace all occurences of av_mallocz_array() by av_calloc()
...
They do the same.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-20 01:03:52 +02:00
Wu Jianhua
133b2767cf
tests/checkasm/vf_gblur.c: update check_horiz_slice for the new ff_horiz_slice_avx2/512
...
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-08-29 19:58:33 +02:00
Wu Jianhua
0c54ab20c2
tests/checkasm/vf_gblur.c: add check_verti_slice() for unit test
...
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-08-29 19:58:33 +02:00
J. Dekker
b492cacffd
checkasm: collapse hevc pel tests
...
Also add to `make fate-checkasm' target.
Signed-off-by: J. Dekker <jdek@itanimul.li>
2021-08-24 22:12:06 +02:00
Andreas Rheinhardt
4608f7cc6a
Remove unnecessary mem.h inclusions
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 14:47:57 +02:00
J. Dekker
c866a099b2
lavu/kperf: use ff_thread_once()
...
Signed-off-by: J. Dekker <jdek@itanimul.li>
2021-07-21 16:35:27 +02:00
J. Dekker
9a727235fd
lavu/checkasm: add (private) kperf timing for macOS
...
Signed-off-by: J. Dekker <jdek@itanimul.li>
2021-07-20 19:40:03 +02:00
Anton Khirnov
fe490ec165
sws: separate the calls to scaled vs unscaled conversion
...
Call the scaler function directly rather than through a function
pointer. Drop the now-unused return value from ff_getSwsFunc() and
rename the function to reflect its new role.
This will be useful in the following commits, where it will become
important that the amount of output is different for scaled vs unscaled
case.
2021-07-03 15:57:13 +02:00
Matthieu Patou
b27ae2c0b7
checkasm/vp9dsp: rename the iszero function to is_zero
...
Suggested-by: ffmpeg@fb.com
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-06-08 13:11:22 -03:00
Lynne
1978b143eb
checkasm: add av_tx FFT SIMD testing code
...
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
2021-04-24 17:19:17 +02:00
Alan Kelly
e1484bc455
tests/checkasm/sw_scale: adds additional tests sizes for yux2yuvX
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-04-01 20:47:52 +02:00
James Almer
d52ceed9fd
tests/checkasm/sw_scale: use memset() to fill dither
...
Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-19 16:19:11 -03:00
Alan Kelly
ee18edb13a
checkasm/sw_scale: properly initialize src_pixer and filter_coeff buffers
...
Fixes valgrind uninitialised value warnings.
Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-19 11:20:32 -03:00
James Almer
1371647fc3
checkasm/sw_scale: use av_free() instead of free()
...
Fixes crashes on Win64
Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 20:57:33 -03:00
Alan Kelly
554c2bc708
swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop
...
And other small optimizations for ~20% speedup.
2021-02-17 21:21:03 +01:00
James Almer
bea7c51307
checkasm/vf_gblur: add a test for postscale_slice
...
Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 13:39:31 -03:00
James Almer
2df3c2ed9b
checkasm/vf_gblur: split off the horiz_slice test into its own function
...
Will come in handy for the following commit.
Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 13:39:11 -03:00
Josh Dekker
9c513edb79
checkasm: add hevc_pel tests
...
Co-authored-by: Niklas Haas <git@haasn.xyz>
Signed-off-by: Josh Dekker <josh@itanimul.li>
2021-01-25 09:24:11 +01:00
Anton Khirnov
c8c2dfbc37
lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
...
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Limin Wang
c748bd77dc
tests: fix warning ISO C90 forbids mixed declarations and code
...
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-09-10 20:34:51 +08:00
Carl Eugen Hoyos
b61376bdee
lavfi/hflip: Support Bayer pixel formats.
...
Fixes part of ticket #8819 .
2020-08-25 01:29:24 +02:00
Jiaxun Yang
e387fcd01c
libavutil: Detect MMI and MSA flags for MIPS
...
Add MMI & MSA runtime detection for MIPS.
Basically there are two code pathes. For systems that
natively support CPUCFG instruction or kernel emulated
that instruction, we'll sense this feature from HWCAP and
report the flags according to values grab from CPUCFG. For
systems that have no CPUCFG (or not export it in HWCAP),
we'll parse /proc/cpuinfo instead.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-23 17:21:58 +02:00
James Almer
55e1bc39cb
checkasm/vf_blend: use the correct depth parameters to initialize the blend modes
...
This effectively enables the tests that until now were just running
the C version alone.
Signed-off-by: James Almer <jamrial@gmail.com>
2020-07-12 11:30:23 -03:00
Jun Zhao
7f76f20fa0
checkasm: sw_rgb: Fix mixed declaration and code
...
Fix mixed declaration and code.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2020-06-01 23:28:07 +08:00
Andreas Rheinhardt
57e570b508
checkasm/sw_scale: Fix stack-buffer-overflow
...
A buffer whose size is not a multiple of four has been initialized using
consecutive writes of 32bits. This results in a stack-buffer-overflow
reported by ASAN in the checkasm-sw_scale FATE-test.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2020-05-20 23:18:50 +02:00
Martin Storsjö
9c326af1d0
checkasm: swscale: Fix running the hscale test on 32 bit x86
...
This function doesn't call emms.
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-16 08:16:12 +03:00
Martin Storsjö
eba1ebd9bf
checkasm: sw_rgb: Add a test for interleaveBytes
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:38:01 +03:00
Martin Storsjö
5bdffced0a
checkasm: pixblockdsp: Add tests for get_pixels_unaligned and diff_pixels_unaligned
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:37:27 +03:00
Martin Storsjö
ed7d73355e
checkasm: aarch64: Check for stack overflows
...
Also fill x8-x17 with garbage before calling the function.
Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 21:22:36 +03:00
Martin Storsjö
6cb2d4d94b
checkasm: arm: Check for stack overflows
...
Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 21:22:34 +03:00
Martin Storsjö
3f266cf49e
checkasm: arm: Don't use blx to call checkasm_fail_func
...
We should just use a normal bl here, and the linker will add the 'x'
bit if necessary.
This fixes calling the checkasm_fail_func on windows, where the
code is built in thumb mode (and the linker doesn't clear the 'x'
bit in the blx instruction).
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 21:22:32 +03:00
Martin Storsjö
89cf9e1fb6
checkasm: arm: Make the indentation consistent with other files
...
This makes it easier to share code with e.g. the dav1d implementation
of checkasm.
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 21:22:27 +03:00
Josh de Kock
5913cd4e6c
checkasm: add hscale test
...
This tests the hscale 8bpp to 14/18bpp functions with different filter
sizes.
Signed-off-by: Josh de Kock <josh@itanimul.li>
2020-05-15 10:29:30 +01:00
Martin Storsjö
3ce1b2bf8d
checkasm: add function to check and diff memory
...
This was ported from dav1d (c950e7101bdf5f7117bfca816984a21e550509f0).
Signed-off-by: Josh de Kock <josh@itanimul.li>
2020-05-15 10:29:30 +01:00