mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-07 11:13:41 +02:00
Commit Graph

2396 Commits

Author SHA1 Message Date
Alan Kelly
e534d98af3 libswscale: Re-factor ff_shuffle_filter_coefficients.
Make the code more readable and follow the style guide.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:22 +01:00
Alan Kelly
f1a5414c97 libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:07 +01:00
Andreas Rheinhardt
71e2825150 swscale/x86/swscale: Remove superfluous and invalid ';'
Inside a function an unnecessary ';' is just a null statement;
yet outside of it it is actually illegal (but compilers happen
to accept it without warning except when using -pedantic).
So modify the macros to always expect the user to add a ';'.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-22 17:00:45 +01:00
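
The rule in the commit above can be sketched with a small, hypothetical declaration macro (`DECLARE_SCALE_FUNC` and the function names are made up for illustration and are not taken from libswscale). The macro body does not end in ';', so every use site supplies exactly one. Had the macro ended in ';' itself, writing `DECLARE_SCALE_FUNC(f);` at file scope would leave a stray ';' outside any function, which is what `-pedantic` complains about.

```c
/* Hypothetical macro, not from libswscale: declares a prototype at file
 * scope. The body deliberately has no trailing ';' so that the user of
 * the macro always writes it. */
#define DECLARE_SCALE_FUNC(name) int name(int x)

DECLARE_SCALE_FUNC(scale_by_two);   /* exactly one ';', added by the user */
DECLARE_SCALE_FUNC(scale_by_four);

/* Definitions matching the prototypes declared through the macro. */
int scale_by_two(int x)  { return x * 2; }
int scale_by_four(int x) { return x * 4; }
```

With this convention the same macro can be used inside and outside functions without producing either a missing or a doubled semicolon.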
Mark Reid
52f7026164 swscale/x86/input.asm: add x86-optimized planar rgb2yuv functions
sse2 only operates on 2 lanes per loop for the to_y and to_uv functions, due
to the lack of a pmulld instruction. Emulating pmulld with 2 pmuludq and shuffles
proved too costly and made the to_uv functions slower than the C implementation.

For to_y on sse2, only the float functions are generated;
I was not able to outperform the C implementation on the integer pixel formats.

For to_a on sse4, only the float functions are generated.
sse2 and sse4 generated nearly identically performing code on integer pixel formats,
so only sse2/avx2 versions are generated.

planar_gbrp_to_y_512_c: 1197.5
planar_gbrp_to_y_512_sse4: 444.5
planar_gbrp_to_y_512_avx2: 287.5
planar_gbrap_to_y_512_c: 1204.5
planar_gbrap_to_y_512_sse4: 447.5
planar_gbrap_to_y_512_avx2: 289.5
planar_gbrp9be_to_y_512_c: 1380.0
planar_gbrp9be_to_y_512_sse4: 543.5
planar_gbrp9be_to_y_512_avx2: 340.0
planar_gbrp9le_to_y_512_c: 1200.5
planar_gbrp9le_to_y_512_sse4: 442.0
planar_gbrp9le_to_y_512_avx2: 282.0
planar_gbrp10be_to_y_512_c: 1378.5
planar_gbrp10be_to_y_512_sse4: 544.0
planar_gbrp10be_to_y_512_avx2: 337.5
planar_gbrp10le_to_y_512_c: 1200.0
planar_gbrp10le_to_y_512_sse4: 448.0
planar_gbrp10le_to_y_512_avx2: 285.5
planar_gbrap10be_to_y_512_c: 1380.0
planar_gbrap10be_to_y_512_sse4: 542.0
planar_gbrap10be_to_y_512_avx2: 340.5
planar_gbrap10le_to_y_512_c: 1199.0
planar_gbrap10le_to_y_512_sse4: 446.0
planar_gbrap10le_to_y_512_avx2: 289.5
planar_gbrp12be_to_y_512_c: 10563.0
planar_gbrp12be_to_y_512_sse4: 542.5
planar_gbrp12be_to_y_512_avx2: 339.0
planar_gbrp12le_to_y_512_c: 1201.0
planar_gbrp12le_to_y_512_sse4: 440.5
planar_gbrp12le_to_y_512_avx2: 286.0
planar_gbrap12be_to_y_512_c: 1701.5
planar_gbrap12be_to_y_512_sse4: 917.0
planar_gbrap12be_to_y_512_avx2: 338.5
planar_gbrap12le_to_y_512_c: 1201.0
planar_gbrap12le_to_y_512_sse4: 444.5
planar_gbrap12le_to_y_512_avx2: 288.0
planar_gbrp14be_to_y_512_c: 1370.5
planar_gbrp14be_to_y_512_sse4: 545.0
planar_gbrp14be_to_y_512_avx2: 338.5
planar_gbrp14le_to_y_512_c: 1199.0
planar_gbrp14le_to_y_512_sse4: 444.0
planar_gbrp14le_to_y_512_avx2: 279.5
planar_gbrp16be_to_y_512_c: 1364.0
planar_gbrp16be_to_y_512_sse4: 544.5
planar_gbrp16be_to_y_512_avx2: 339.5
planar_gbrp16le_to_y_512_c: 1201.0
planar_gbrp16le_to_y_512_sse4: 445.5
planar_gbrp16le_to_y_512_avx2: 280.5
planar_gbrap16be_to_y_512_c: 1377.0
planar_gbrap16be_to_y_512_sse4: 545.0
planar_gbrap16be_to_y_512_avx2: 338.5
planar_gbrap16le_to_y_512_c: 1201.0
planar_gbrap16le_to_y_512_sse4: 442.0
planar_gbrap16le_to_y_512_avx2: 279.0
planar_gbrpf32be_to_y_512_c: 4113.0
planar_gbrpf32be_to_y_512_sse2: 2438.0
planar_gbrpf32be_to_y_512_sse4: 1068.0
planar_gbrpf32be_to_y_512_avx2: 904.5
planar_gbrpf32le_to_y_512_c: 3818.5
planar_gbrpf32le_to_y_512_sse2: 2024.5
planar_gbrpf32le_to_y_512_sse4: 1241.5
planar_gbrpf32le_to_y_512_avx2: 657.0
planar_gbrapf32be_to_y_512_c: 3707.0
planar_gbrapf32be_to_y_512_sse2: 2444.0
planar_gbrapf32be_to_y_512_sse4: 1077.0
planar_gbrapf32be_to_y_512_avx2: 909.0
planar_gbrapf32le_to_y_512_c: 3822.0
planar_gbrapf32le_to_y_512_sse2: 2024.5
planar_gbrapf32le_to_y_512_sse4: 1176.0
planar_gbrapf32le_to_y_512_avx2: 658.5

planar_gbrp_to_uv_512_c: 2325.8
planar_gbrp_to_uv_512_sse2: 1726.8
planar_gbrp_to_uv_512_sse4: 771.8
planar_gbrp_to_uv_512_avx2: 506.8
planar_gbrap_to_uv_512_c: 2281.8
planar_gbrap_to_uv_512_sse2: 1726.3
planar_gbrap_to_uv_512_sse4: 768.3
planar_gbrap_to_uv_512_avx2: 496.3
planar_gbrp9be_to_uv_512_c: 2336.8
planar_gbrp9be_to_uv_512_sse2: 1924.8
planar_gbrp9be_to_uv_512_sse4: 852.3
planar_gbrp9be_to_uv_512_avx2: 552.8
planar_gbrp9le_to_uv_512_c: 2270.3
planar_gbrp9le_to_uv_512_sse2: 1512.3
planar_gbrp9le_to_uv_512_sse4: 764.3
planar_gbrp9le_to_uv_512_avx2: 491.3
planar_gbrp10be_to_uv_512_c: 2281.8
planar_gbrp10be_to_uv_512_sse2: 1917.8
planar_gbrp10be_to_uv_512_sse4: 855.3
planar_gbrp10be_to_uv_512_avx2: 541.3
planar_gbrp10le_to_uv_512_c: 2269.8
planar_gbrp10le_to_uv_512_sse2: 1515.3
planar_gbrp10le_to_uv_512_sse4: 759.8
planar_gbrp10le_to_uv_512_avx2: 487.8
planar_gbrap10be_to_uv_512_c: 2382.3
planar_gbrap10be_to_uv_512_sse2: 1924.8
planar_gbrap10be_to_uv_512_sse4: 855.3
planar_gbrap10be_to_uv_512_avx2: 540.8
planar_gbrap10le_to_uv_512_c: 2382.3
planar_gbrap10le_to_uv_512_sse2: 1512.3
planar_gbrap10le_to_uv_512_sse4: 759.3
planar_gbrap10le_to_uv_512_avx2: 484.8
planar_gbrp12be_to_uv_512_c: 2283.8
planar_gbrp12be_to_uv_512_sse2: 1936.8
planar_gbrp12be_to_uv_512_sse4: 858.3
planar_gbrp12be_to_uv_512_avx2: 541.3
planar_gbrp12le_to_uv_512_c: 2278.8
planar_gbrp12le_to_uv_512_sse2: 1507.3
planar_gbrp12le_to_uv_512_sse4: 760.3
planar_gbrp12le_to_uv_512_avx2: 485.8
planar_gbrap12be_to_uv_512_c: 2385.3
planar_gbrap12be_to_uv_512_sse2: 1927.8
planar_gbrap12be_to_uv_512_sse4: 855.3
planar_gbrap12be_to_uv_512_avx2: 539.8
planar_gbrap12le_to_uv_512_c: 2377.3
planar_gbrap12le_to_uv_512_sse2: 1516.3
planar_gbrap12le_to_uv_512_sse4: 759.3
planar_gbrap12le_to_uv_512_avx2: 484.8
planar_gbrp14be_to_uv_512_c: 2283.8
planar_gbrp14be_to_uv_512_sse2: 1935.3
planar_gbrp14be_to_uv_512_sse4: 852.3
planar_gbrp14be_to_uv_512_avx2: 540.3
planar_gbrp14le_to_uv_512_c: 2276.8
planar_gbrp14le_to_uv_512_sse2: 1514.8
planar_gbrp14le_to_uv_512_sse4: 762.3
planar_gbrp14le_to_uv_512_avx2: 484.8
planar_gbrp16be_to_uv_512_c: 2383.3
planar_gbrp16be_to_uv_512_sse2: 1881.8
planar_gbrp16be_to_uv_512_sse4: 852.3
planar_gbrp16be_to_uv_512_avx2: 541.8
planar_gbrp16le_to_uv_512_c: 2378.3
planar_gbrp16le_to_uv_512_sse2: 1476.8
planar_gbrp16le_to_uv_512_sse4: 765.3
planar_gbrp16le_to_uv_512_avx2: 485.8
planar_gbrap16be_to_uv_512_c: 2382.3
planar_gbrap16be_to_uv_512_sse2: 1886.3
planar_gbrap16be_to_uv_512_sse4: 853.8
planar_gbrap16be_to_uv_512_avx2: 550.8
planar_gbrap16le_to_uv_512_c: 2381.8
planar_gbrap16le_to_uv_512_sse2: 1488.3
planar_gbrap16le_to_uv_512_sse4: 765.3
planar_gbrap16le_to_uv_512_avx2: 491.8
planar_gbrpf32be_to_uv_512_c: 4863.0
planar_gbrpf32be_to_uv_512_sse2: 3347.5
planar_gbrpf32be_to_uv_512_sse4: 1800.0
planar_gbrpf32be_to_uv_512_avx2: 1199.0
planar_gbrpf32le_to_uv_512_c: 4725.0
planar_gbrpf32le_to_uv_512_sse2: 2753.0
planar_gbrpf32le_to_uv_512_sse4: 1474.5
planar_gbrpf32le_to_uv_512_avx2: 927.5
planar_gbrapf32be_to_uv_512_c: 4859.0
planar_gbrapf32be_to_uv_512_sse2: 3269.0
planar_gbrapf32be_to_uv_512_sse4: 1802.0
planar_gbrapf32be_to_uv_512_avx2: 1201.5
planar_gbrapf32le_to_uv_512_c: 6338.0
planar_gbrapf32le_to_uv_512_sse2: 2756.5
planar_gbrapf32le_to_uv_512_sse4: 1476.0
planar_gbrapf32le_to_uv_512_avx2: 908.5

planar_gbrap_to_a_512_c: 383.3
planar_gbrap_to_a_512_sse2: 66.8
planar_gbrap_to_a_512_avx2: 43.8
planar_gbrap10be_to_a_512_c: 601.8
planar_gbrap10be_to_a_512_sse2: 86.3
planar_gbrap10be_to_a_512_avx2: 34.8
planar_gbrap10le_to_a_512_c: 602.3
planar_gbrap10le_to_a_512_sse2: 48.8
planar_gbrap10le_to_a_512_avx2: 31.3
planar_gbrap12be_to_a_512_c: 601.8
planar_gbrap12be_to_a_512_sse2: 111.8
planar_gbrap12be_to_a_512_avx2: 41.3
planar_gbrap12le_to_a_512_c: 385.8
planar_gbrap12le_to_a_512_sse2: 75.3
planar_gbrap12le_to_a_512_avx2: 39.8
planar_gbrap16be_to_a_512_c: 386.8
planar_gbrap16be_to_a_512_sse2: 79.8
planar_gbrap16be_to_a_512_avx2: 31.3
planar_gbrap16le_to_a_512_c: 600.3
planar_gbrap16le_to_a_512_sse2: 40.3
planar_gbrap16le_to_a_512_avx2: 30.3
planar_gbrapf32be_to_a_512_c: 1148.8
planar_gbrapf32be_to_a_512_sse2: 611.3
planar_gbrapf32be_to_a_512_sse4: 234.8
planar_gbrapf32be_to_a_512_avx2: 183.3
planar_gbrapf32le_to_a_512_c: 851.3
planar_gbrapf32le_to_a_512_sse2: 263.3
planar_gbrapf32le_to_a_512_sse4: 199.3
planar_gbrapf32le_to_a_512_avx2: 156.8

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:34:33 -03:00
Mark Reid
9e445a5be2 swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions
changes since v2:
 * fixed label
changes since v1:
 * remove vex instruction on sse4 path
 * some load/pack macros use fewer instructions
 * fixed some typos

yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:33:17 -03:00
rcombs
df9180d8a0 swscale/output: use isSwappedChroma 2022-01-04 19:39:22 -06:00
rcombs
cb3a6cc082 swscale/output: use isSemiPlanarYUV for NV12/21/24/42 case 2022-01-04 19:39:22 -06:00
rcombs
f8e284be69 swscale: introduce isSwappedChroma 2022-01-04 19:39:22 -06:00
rcombs
bb4f19f2a2 swscale/output: use isDataInHighBits for 10-bit case
This code will need fleshing-out (probably templating) if we ever add
e.g. a P012 format.
2022-01-04 19:39:22 -06:00
rcombs
cf9e8cb52f swscale/output: use isSemiPlanarYUV for 16-bit case 2022-01-04 19:39:22 -06:00
rcombs
e5d83463c8 swscale: introduce isDataInHighBits 2022-01-04 19:39:22 -06:00
rcombs
cb87a3b137 swscale/output: template-ize yuv2nv12cX 10-bit and 16-bit cases
Fixes incorrect big-endian output introduced in 88d804b7ff

Avoids making the filter-time BE check more expensive
2022-01-04 19:39:22 -06:00
Andreas Rheinhardt
b189550137 lib*/version.h: Bump Versions after release/5.0 branch
This is done a second time for 5.0 because master was
merged into 5.0 so that it contains the recent DOVI additions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 14:29:06 +01:00
Andreas Rheinhardt
c512be9a90 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 13:40:03 +01:00
Andreas Rheinhardt
20b0d24c2f Makefile: Redo duplicating object files in shared builds
In case of shared builds, some object files containing tables
are currently duplicated into other libraries: log2_tab.c,
golomb.c, reverse.c. The check for whether this is duplicated
is simply whether CONFIG_SHARED is true. Yet this is crude:
E.g. libavdevice includes reverse.c for shared builds, but only
needs it for the decklink input device, which given that decklink
is not enabled by default will be unused in most libavdevice.so.

This commit changes this by making it more explicit about what
to duplicate from other libraries. To do this, two new Makefile
variables were added: SHLIBOBJS and STLIBOBJS. SHLIBOBJS contains
the objects that are duplicated from other libraries in case of
shared builds; STLIBOBJS contains stuff that a library has to
provide for other libraries in case of static builds. These new
variables provide a way to enable/disable with a finer granularity
than just whether shared builds are enabled or not. E.g. lavd's
Makefile now contains: SHLIBOBJS-$(CONFIG_DECKLINK_INDEV) += reverse.o

Another example is provided by the golomb tables. These are provided
by lavc for static builds, even if one uses a build configuration
that makes only lavf use them. Therefore lavc's Makefile contains
STLIBOBJS-$(CONFIG_MXF_MUXER) += golomb.o, whereas lavf's Makefile
has a corresponding SHLIBOBJS-$(CONFIG_MXF_MUXER) += golomb_tab.o.
E.g. in case the MXF muxer is the only component needing these tables
only libavformat.so will contain them for shared builds; currently
libavcodec.so does so, too.
(There is currently a CONFIG_EXTRA group for golomb. But actually
one would need two groups (golomb_avcodec and golomb_avformat) in
order to know when and where to include these tables. Therefore
this commit uses a Makefile-based approach for this and stops
using these groups for the users in libavformat.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 05:01:04 +01:00
Michael Niedermayer
4be85c9331 lib*/version.h: Bump Versions after release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:10:46 +01:00
Michael Niedermayer
f3964a59e1 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:08:31 +01:00
rcombs
3e00b9e395 swscale/x86/init: use isSemiPlanarYUV
Fixes P210/P410 cases introduced (and broken) in 88d804b7ff
2021-12-23 01:41:03 -06:00
rcombs
88d804b7ff swscale: add P210/P410/P216/P416 output 2021-12-22 18:38:40 -06:00
Alan Kelly
eebe406c80 libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
2021-12-21 17:44:53 -03:00
James Almer
eab91c3e2e x86/scale_avx2: don't use $ for hex literals
Fixes compilation with AVX2 enabled yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 17:29:21 -03:00
Alan Kelly
9092e58c44 x86/scale_avx2: Change asm indent from 2 to 4 spaces.
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:42:04 -03:00
Alan Kelly
86663963e6 x86/swscale: fix minor coding style issues
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:16:04 -03:00
James Almer
76a3f961f8 x86/scale_avx2: add missing check for AVX2 assembler support
Should fix compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 09:41:56 -03:00
Alan Kelly
f900a19fa9 libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.
Fixes so that fate under 64 bit Windows passes.

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-15 20:04:59 -03:00
Andreas Rheinhardt
3be6fe9a56 swscale/yuv2rgb: Silence a set-but-unused-variable warning
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-12-03 16:10:51 +01:00
rcombs
f0204de47d swscale: add P210/P410/P216/P416 input 2021-11-28 16:40:43 -06:00
Mark Reid
3f4ce004b8 swscale/input: clip rgbf32 values before lrintf
If the float pixel * 65535.0f > 2147483647.0f,
lrintf may overflow and return negative values, depending on the implementation.
The results for nan and +/-inf values may also be implementation-defined.

Clip the value first so lrintf always works.

values <     0.0f, -inf, nan = 0.0f
values > 65535.0f, +inf      = 65535.0f

old timings
 195960 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 186120 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 188645 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 183625 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 181157 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 177533 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 175689 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 232960 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 221380 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 216640 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 213505 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 211558 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 210596 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 210202 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 161680 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 153540 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 148255 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 140600 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 132935 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128531 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 140933 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 190980 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 176080 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 167980 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 164685 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 162751 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 162404 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 167849 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

new timings
 183320 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 175700 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 179570 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 172932 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 168707 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 165224 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 163423 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 184940 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 185150 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 185790 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 185472 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 185277 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 185813 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 185332 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 145400 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 145100 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 143490 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 136687 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 131271 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128698 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 127170 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 156020 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 146990 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 142020 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 141052 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 138973 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 138027 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 143939 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-11-15 16:50:10 -03:00
Mark Reid
74e49cc583 swscale/input: unify grayf32 funcs with rgbf32 funcs
This is meant to be a cosmetic change

old timings:
  42780 UNITS in grayf32le,       1 runs,      0 skips
  56720 UNITS in grayf32le,       2 runs,      0 skips
  67265 UNITS in grayf32le,       4 runs,      0 skips
  58082 UNITS in grayf32le,       8 runs,      0 skips
  63512 UNITS in grayf32le,      16 runs,      0 skips
  52720 UNITS in grayf32le,      32 runs,      0 skips
  46491 UNITS in grayf32le,      64 runs,      0 skips

  68500 UNITS in grayf32be,       1 runs,      0 skips
  66930 UNITS in grayf32be,       2 runs,      0 skips
  62305 UNITS in grayf32be,       4 runs,      0 skips
  55510 UNITS in grayf32be,       8 runs,      0 skips
  50216 UNITS in grayf32be,      16 runs,      0 skips
  44480 UNITS in grayf32be,      32 runs,      0 skips
  42394 UNITS in grayf32be,      64 runs,      0 skips

new timings:
  46660 UNITS in grayf32le,       1 runs,      0 skips
  51830 UNITS in grayf32le,       2 runs,      0 skips
  53390 UNITS in grayf32le,       4 runs,      0 skips
  50910 UNITS in grayf32le,       8 runs,      0 skips
  44968 UNITS in grayf32le,      16 runs,      0 skips
  40349 UNITS in grayf32le,      32 runs,      0 skips
  38330 UNITS in grayf32le,      64 runs,      0 skips

  39980 UNITS in grayf32be,       1 runs,      0 skips
  49630 UNITS in grayf32be,       2 runs,      0 skips
  53540 UNITS in grayf32be,       4 runs,      0 skips
  59767 UNITS in grayf32be,       8 runs,      0 skips
  51206 UNITS in grayf32be,      16 runs,      0 skips
  44743 UNITS in grayf32be,      32 runs,      0 skips
  41468 UNITS in grayf32be,      64 runs,      0 skips

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-14 17:12:13 +01:00
Soft Works
58dce6f010 swscale/swscale: check SWS_PRINT_INFO flag for printing alignment warnings
This makes the output consistent with a similar warning just a few
lines above, where this flag is checked in the same way.

Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
2021-11-13 19:55:32 +01:00
Mark Reid
d2379bd6a0 swscale/input: fix planar_rgb16_to_a for gbrap10be and gbrap12be formats
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-04 11:52:33 +01:00
Michael Niedermayer
8316b2a15f swscale/swscale: Improve *ColorspaceDetails() doxy
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer
5f3a160b42 swscale/utils: Improve return codes of sws_setColorspaceDetails()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer
c7699f95bb swscale/utils: Set all threads to the same colorspace even on failure
Fixes: ./ffplay dav.y4m -vf "scale=hd1080:threads=4"
Found-by: Paul
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Wu Jianhua
2c734a8496 libswscale/x86/rgb2rgb: add shuffle_bytes avx2
Performance data (less is better):
    shuffle_bytes_ssse3   3.64654
    shuffle_bytes_avx2    0.94288

Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-10-15 10:59:20 +02:00
Michael Niedermayer
f801207568 swscale/swscale: Pass slice location into unscaled code also for dst scaling
Fixes: alphablend=checkerboard

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer
06d6726588 swscale/alphablend: Fix slice handling
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer
9f40b5badb swscale/swscale_internal: Avoid unsigned for slice parameters
Mixing unsigned and signed often leads to unexpected arithmetic results.
Fixes: out of array write
Found-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-30 19:47:15 +02:00
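
To illustrate the remark in the commit above about mixing unsigned and signed values, here is a small sketch of a slice bounds check. Both functions and their parameter names are hypothetical and only serve as an example; they are not the libswscale code. A negative start row that ends up in an unsigned parameter becomes a huge number, and adding the height wraps around to a small one, so the check wrongly passes.

```c
#include <stdbool.h>

/* Hypothetical check using unsigned slice parameters. If y originated from
 * a negative int, it is a huge unsigned value and y + h wraps around. */
static bool slice_fits_unsigned(unsigned y, unsigned h, int dst_h)
{
    return y + h <= (unsigned)dst_h;
}

/* Same check with signed parameters: negative inputs are rejected
 * explicitly, and the comparison is arranged to avoid overflow. */
static bool slice_fits_signed(int y, int h, int dst_h)
{
    return y >= 0 && h >= 0 && y <= dst_h - h;
}
```

For example, a start row of -2 with height 4 in a 10-row destination is accepted by the unsigned variant (the sum wraps to 2) but rejected by the signed one, which is the kind of mistake that can turn into an out-of-array write.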
Manuel Stoeckl
32329397e2 swscale: add input/output support for X2BGR10LE
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
Manuel Stoeckl
ca594df622 swscale/yuv2rgb: fix conversion to X2RGB10
This resolves a problem where conversions from YUV to X2RGB10LE
would produce color values a factor 4 too small, because an 8-bit
value was placed in a 10-bit channel.

Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
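
The factor-4 error described in the commit above can be reproduced with a short sketch. The helper names are hypothetical, the packing assumes the X2RGB10 layout of 2 unused bits followed by 10 bits each of R, G and B, and the bit-replication widening is one common choice for illustration, not necessarily what the actual fix in yuv2rgb does.

```c
#include <stdint.h>

/* Hypothetical packer, assuming the layout (msb) 2X 10R 10G 10B (lsb). */
static uint32_t pack_x2rgb10(unsigned r10, unsigned g10, unsigned b10)
{
    return ((uint32_t)(r10 & 0x3FF) << 20) |
           ((uint32_t)(g10 & 0x3FF) << 10) |
            (uint32_t)(b10 & 0x3FF);
}

/* Widen an 8-bit sample to 10 bits: shift left by 2 and replicate the two
 * top bits into the new low bits, so that 255 maps to 1023. */
static unsigned widen_8_to_10(unsigned v8)
{
    return (v8 << 2) | (v8 >> 6);
}
```

Storing an 8-bit value such as 255 directly in a 10-bit channel yields 255 out of 1023, roughly a quarter of full scale, whereas `widen_8_to_10(255)` gives 1023.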
Andreas Rheinhardt
1ea3650823 Replace all occurrences of av_mallocz_array() by av_calloc()
They do the same.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-20 01:03:52 +02:00
Andreas Rheinhardt
044a7c08dc swscale/swscale: Disable x86-specific code for other arches
SSE2 is x86 specific, yet due to the call to av_get_cpu_flags()
compilers were unable to optimize the checks (and the call) away
on other arches.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Andreas Rheinhardt
f440c422b7 swscale/swscale: Fix races when using unaligned strides/data
In this case the current code tries to warn once; to do so, it uses
ordinary static ints to store whether the warning has already been
emitted. This is both a data race (and therefore undefined behaviour)
as well as a race condition, because it is really possible for multiple
threads to be the one thread to emit the warning. This is actually
common since the introduction of the new multithreaded scaling API.

This commit fixes this by using atomic integers for the state;
furthermore, these are not static anymore, but rather contained
in the user-facing SwsContext (i.e. the parent SwsContext in case
of slice-threading).

Given that these atomic variables are not intended for synchronization
at all (but only for atomicity, i.e. only to output the warning once),
the atomic operations use memory_order_relaxed.

This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and
yuv444 filter-overlay FATE-tests.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
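
The warn-once pattern described in the commit above can be sketched with C11 atomics. `ScalerContext`, its field, and `warn_unaligned_once` are hypothetical stand-ins for illustration, not the actual SwsContext definition. The flag lives in the shared parent context, and an atomic exchange with memory_order_relaxed guarantees that exactly one thread observes the old value 0.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the user-facing (parent) context that all
 * slice threads share. */
typedef struct ScalerContext {
    atomic_int stride_unaligned_warned;
} ScalerContext;

/* Returns 1 for the single caller that emitted the warning, 0 for all
 * others. Relaxed ordering suffices: the flag is used only for atomicity,
 * not to synchronize any other data. */
static int warn_unaligned_once(ScalerContext *parent)
{
    if (atomic_exchange_explicit(&parent->stride_unaligned_warned, 1,
                                 memory_order_relaxed))
        return 0;                 /* another caller already warned */

    fprintf(stderr, "warning: unaligned data, expect a speed loss\n");
    return 1;
}
```

With a plain static int, two threads could both read 0 before either writes 1, so both would print; the exchange makes the read and the write a single indivisible step.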
Andreas Rheinhardt
a1255a350d libswscale/options: Add parent_log_context_offset to AVClass
This allows associating log messages from slice contexts with
the user-visible SwsContext.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
James Almer
5fe648d04a libswscale/swscale: initialize all dst plane pointers in sws_receive_slice()
Fixes valgrind warnings about use of uninitialised values.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-09-07 09:44:58 -03:00
Anton Khirnov
d6fdc78e91 sws: implement slice threading 2021-09-06 09:17:53 +02:00
Anton Khirnov
42cd64c182 sws: add a new scaling API 2021-09-06 09:16:52 +02:00
Andreas Rheinhardt
2c05ee092b avutil/internal, swresample/audioconvert: Remove cpu.h inclusions
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 14:33:45 +02:00
Michael Niedermayer
7874d40f10 swscale/slice: Fix wrong return on error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 15:21:37 +02:00
Michael Niedermayer
fa1e158ef6 swscale/utils: Use full chroma interpolation for rgb4/8 and dither none
Dither none is only implemented in full chroma interpolation for these rgb formats.
It's also an obscure choice (producing less nice images), so implementing it in the
other code-paths makes no sense.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00