1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Commit Graph

2448 Commits

Author SHA1 Message Date
Andreas Rheinhardt
636631d9db Remove unnecessary libavutil/(avutil|common|internal).h inclusions
Some of these were made possible by moving several common macros to
libavutil/macros.h.

While just at it, also improve the other headers a bit.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-24 12:56:49 +01:00
Andreas Rheinhardt
155cd6baa4 Remove obsolete version.h inclusions
Forgotten in e7bd47e657.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-24 12:56:49 +01:00
Alan Kelly
e534d98af3 libswscale: Re-factor ff_shuffle_filter_coefficients.
Make the code more readable and follow the style guide.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:22 +01:00
Alan Kelly
f1a5414c97 libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:07 +01:00
Andreas Rheinhardt
71e2825150 swscale/x86/swscale: Remove superfluous and invalid ';'
Inside a function an unnecessary ';' is just a null statement;
yet outside of it it is actually illegal (but compilers happen
to accept it without warning except when using -pedantic).
So modify the macros to always expect the user to add a ';'.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-22 17:00:45 +01:00
Mark Reid
52f7026164 swscale/x86/input.asm: add x86-optimized planer rgb2yuv functions
sse2 only operates on 2 lanes per loop for to_y and to_uv functions, due
to the lack of pmulld instruction.  Emulating pmulld with 2 pmuludq and shuffles
proved too costly and made to_uv functions slower then the c implementation.

For to_y on sse2 only float functions are generated,
I was are not able outperform the c implementation on the integer pixel formats.

For to_a on see4 only the float functions are generated.
sse2 and sse4 generated nearly identical performing code on integer pixel formats,
so only sse2/avx2 versions are generated.

planar_gbrp_to_y_512_c: 1197.5
planar_gbrp_to_y_512_sse4: 444.5
planar_gbrp_to_y_512_avx2: 287.5
planar_gbrap_to_y_512_c: 1204.5
planar_gbrap_to_y_512_sse4: 447.5
planar_gbrap_to_y_512_avx2: 289.5
planar_gbrp9be_to_y_512_c: 1380.0
planar_gbrp9be_to_y_512_sse4: 543.5
planar_gbrp9be_to_y_512_avx2: 340.0
planar_gbrp9le_to_y_512_c: 1200.5
planar_gbrp9le_to_y_512_sse4: 442.0
planar_gbrp9le_to_y_512_avx2: 282.0
planar_gbrp10be_to_y_512_c: 1378.5
planar_gbrp10be_to_y_512_sse4: 544.0
planar_gbrp10be_to_y_512_avx2: 337.5
planar_gbrp10le_to_y_512_c: 1200.0
planar_gbrp10le_to_y_512_sse4: 448.0
planar_gbrp10le_to_y_512_avx2: 285.5
planar_gbrap10be_to_y_512_c: 1380.0
planar_gbrap10be_to_y_512_sse4: 542.0
planar_gbrap10be_to_y_512_avx2: 340.5
planar_gbrap10le_to_y_512_c: 1199.0
planar_gbrap10le_to_y_512_sse4: 446.0
planar_gbrap10le_to_y_512_avx2: 289.5
planar_gbrp12be_to_y_512_c: 10563.0
planar_gbrp12be_to_y_512_sse4: 542.5
planar_gbrp12be_to_y_512_avx2: 339.0
planar_gbrp12le_to_y_512_c: 1201.0
planar_gbrp12le_to_y_512_sse4: 440.5
planar_gbrp12le_to_y_512_avx2: 286.0
planar_gbrap12be_to_y_512_c: 1701.5
planar_gbrap12be_to_y_512_sse4: 917.0
planar_gbrap12be_to_y_512_avx2: 338.5
planar_gbrap12le_to_y_512_c: 1201.0
planar_gbrap12le_to_y_512_sse4: 444.5
planar_gbrap12le_to_y_512_avx2: 288.0
planar_gbrp14be_to_y_512_c: 1370.5
planar_gbrp14be_to_y_512_sse4: 545.0
planar_gbrp14be_to_y_512_avx2: 338.5
planar_gbrp14le_to_y_512_c: 1199.0
planar_gbrp14le_to_y_512_sse4: 444.0
planar_gbrp14le_to_y_512_avx2: 279.5
planar_gbrp16be_to_y_512_c: 1364.0
planar_gbrp16be_to_y_512_sse4: 544.5
planar_gbrp16be_to_y_512_avx2: 339.5
planar_gbrp16le_to_y_512_c: 1201.0
planar_gbrp16le_to_y_512_sse4: 445.5
planar_gbrp16le_to_y_512_avx2: 280.5
planar_gbrap16be_to_y_512_c: 1377.0
planar_gbrap16be_to_y_512_sse4: 545.0
planar_gbrap16be_to_y_512_avx2: 338.5
planar_gbrap16le_to_y_512_c: 1201.0
planar_gbrap16le_to_y_512_sse4: 442.0
planar_gbrap16le_to_y_512_avx2: 279.0
planar_gbrpf32be_to_y_512_c: 4113.0
planar_gbrpf32be_to_y_512_sse2: 2438.0
planar_gbrpf32be_to_y_512_sse4: 1068.0
planar_gbrpf32be_to_y_512_avx2: 904.5
planar_gbrpf32le_to_y_512_c: 3818.5
planar_gbrpf32le_to_y_512_sse2: 2024.5
planar_gbrpf32le_to_y_512_sse4: 1241.5
planar_gbrpf32le_to_y_512_avx2: 657.0
planar_gbrapf32be_to_y_512_c: 3707.0
planar_gbrapf32be_to_y_512_sse2: 2444.0
planar_gbrapf32be_to_y_512_sse4: 1077.0
planar_gbrapf32be_to_y_512_avx2: 909.0
planar_gbrapf32le_to_y_512_c: 3822.0
planar_gbrapf32le_to_y_512_sse2: 2024.5
planar_gbrapf32le_to_y_512_sse4: 1176.0
planar_gbrapf32le_to_y_512_avx2: 658.5

planar_gbrp_to_uv_512_c: 2325.8
planar_gbrp_to_uv_512_sse2: 1726.8
planar_gbrp_to_uv_512_sse4: 771.8
planar_gbrp_to_uv_512_avx2: 506.8
planar_gbrap_to_uv_512_c: 2281.8
planar_gbrap_to_uv_512_sse2: 1726.3
planar_gbrap_to_uv_512_sse4: 768.3
planar_gbrap_to_uv_512_avx2: 496.3
planar_gbrp9be_to_uv_512_c: 2336.8
planar_gbrp9be_to_uv_512_sse2: 1924.8
planar_gbrp9be_to_uv_512_sse4: 852.3
planar_gbrp9be_to_uv_512_avx2: 552.8
planar_gbrp9le_to_uv_512_c: 2270.3
planar_gbrp9le_to_uv_512_sse2: 1512.3
planar_gbrp9le_to_uv_512_sse4: 764.3
planar_gbrp9le_to_uv_512_avx2: 491.3
planar_gbrp10be_to_uv_512_c: 2281.8
planar_gbrp10be_to_uv_512_sse2: 1917.8
planar_gbrp10be_to_uv_512_sse4: 855.3
planar_gbrp10be_to_uv_512_avx2: 541.3
planar_gbrp10le_to_uv_512_c: 2269.8
planar_gbrp10le_to_uv_512_sse2: 1515.3
planar_gbrp10le_to_uv_512_sse4: 759.8
planar_gbrp10le_to_uv_512_avx2: 487.8
planar_gbrap10be_to_uv_512_c: 2382.3
planar_gbrap10be_to_uv_512_sse2: 1924.8
planar_gbrap10be_to_uv_512_sse4: 855.3
planar_gbrap10be_to_uv_512_avx2: 540.8
planar_gbrap10le_to_uv_512_c: 2382.3
planar_gbrap10le_to_uv_512_sse2: 1512.3
planar_gbrap10le_to_uv_512_sse4: 759.3
planar_gbrap10le_to_uv_512_avx2: 484.8
planar_gbrp12be_to_uv_512_c: 2283.8
planar_gbrp12be_to_uv_512_sse2: 1936.8
planar_gbrp12be_to_uv_512_sse4: 858.3
planar_gbrp12be_to_uv_512_avx2: 541.3
planar_gbrp12le_to_uv_512_c: 2278.8
planar_gbrp12le_to_uv_512_sse2: 1507.3
planar_gbrp12le_to_uv_512_sse4: 760.3
planar_gbrp12le_to_uv_512_avx2: 485.8
planar_gbrap12be_to_uv_512_c: 2385.3
planar_gbrap12be_to_uv_512_sse2: 1927.8
planar_gbrap12be_to_uv_512_sse4: 855.3
planar_gbrap12be_to_uv_512_avx2: 539.8
planar_gbrap12le_to_uv_512_c: 2377.3
planar_gbrap12le_to_uv_512_sse2: 1516.3
planar_gbrap12le_to_uv_512_sse4: 759.3
planar_gbrap12le_to_uv_512_avx2: 484.8
planar_gbrp14be_to_uv_512_c: 2283.8
planar_gbrp14be_to_uv_512_sse2: 1935.3
planar_gbrp14be_to_uv_512_sse4: 852.3
planar_gbrp14be_to_uv_512_avx2: 540.3
planar_gbrp14le_to_uv_512_c: 2276.8
planar_gbrp14le_to_uv_512_sse2: 1514.8
planar_gbrp14le_to_uv_512_sse4: 762.3
planar_gbrp14le_to_uv_512_avx2: 484.8
planar_gbrp16be_to_uv_512_c: 2383.3
planar_gbrp16be_to_uv_512_sse2: 1881.8
planar_gbrp16be_to_uv_512_sse4: 852.3
planar_gbrp16be_to_uv_512_avx2: 541.8
planar_gbrp16le_to_uv_512_c: 2378.3
planar_gbrp16le_to_uv_512_sse2: 1476.8
planar_gbrp16le_to_uv_512_sse4: 765.3
planar_gbrp16le_to_uv_512_avx2: 485.8
planar_gbrap16be_to_uv_512_c: 2382.3
planar_gbrap16be_to_uv_512_sse2: 1886.3
planar_gbrap16be_to_uv_512_sse4: 853.8
planar_gbrap16be_to_uv_512_avx2: 550.8
planar_gbrap16le_to_uv_512_c: 2381.8
planar_gbrap16le_to_uv_512_sse2: 1488.3
planar_gbrap16le_to_uv_512_sse4: 765.3
planar_gbrap16le_to_uv_512_avx2: 491.8
planar_gbrpf32be_to_uv_512_c: 4863.0
planar_gbrpf32be_to_uv_512_sse2: 3347.5
planar_gbrpf32be_to_uv_512_sse4: 1800.0
planar_gbrpf32be_to_uv_512_avx2: 1199.0
planar_gbrpf32le_to_uv_512_c: 4725.0
planar_gbrpf32le_to_uv_512_sse2: 2753.0
planar_gbrpf32le_to_uv_512_sse4: 1474.5
planar_gbrpf32le_to_uv_512_avx2: 927.5
planar_gbrapf32be_to_uv_512_c: 4859.0
planar_gbrapf32be_to_uv_512_sse2: 3269.0
planar_gbrapf32be_to_uv_512_sse4: 1802.0
planar_gbrapf32be_to_uv_512_avx2: 1201.5
planar_gbrapf32le_to_uv_512_c: 6338.0
planar_gbrapf32le_to_uv_512_sse2: 2756.5
planar_gbrapf32le_to_uv_512_sse4: 1476.0
planar_gbrapf32le_to_uv_512_avx2: 908.5

planar_gbrap_to_a_512_c: 383.3
planar_gbrap_to_a_512_sse2: 66.8
planar_gbrap_to_a_512_avx2: 43.8
planar_gbrap10be_to_a_512_c: 601.8
planar_gbrap10be_to_a_512_sse2: 86.3
planar_gbrap10be_to_a_512_avx2: 34.8
planar_gbrap10le_to_a_512_c: 602.3
planar_gbrap10le_to_a_512_sse2: 48.8
planar_gbrap10le_to_a_512_avx2: 31.3
planar_gbrap12be_to_a_512_c: 601.8
planar_gbrap12be_to_a_512_sse2: 111.8
planar_gbrap12be_to_a_512_avx2: 41.3
planar_gbrap12le_to_a_512_c: 385.8
planar_gbrap12le_to_a_512_sse2: 75.3
planar_gbrap12le_to_a_512_avx2: 39.8
planar_gbrap16be_to_a_512_c: 386.8
planar_gbrap16be_to_a_512_sse2: 79.8
planar_gbrap16be_to_a_512_avx2: 31.3
planar_gbrap16le_to_a_512_c: 600.3
planar_gbrap16le_to_a_512_sse2: 40.3
planar_gbrap16le_to_a_512_avx2: 30.3
planar_gbrapf32be_to_a_512_c: 1148.8
planar_gbrapf32be_to_a_512_sse2: 611.3
planar_gbrapf32be_to_a_512_sse4: 234.8
planar_gbrapf32be_to_a_512_avx2: 183.3
planar_gbrapf32le_to_a_512_c: 851.3
planar_gbrapf32le_to_a_512_sse2: 263.3
planar_gbrapf32le_to_a_512_sse4: 199.3
planar_gbrapf32le_to_a_512_avx2: 156.8

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:34:33 -03:00
Mark Reid
9e445a5be2 swscale/x86/output.asm: add x86-optimized planer gbr yuv2anyX functions
changes since v2:
 * fixed label
changes since v1:
 * remove vex intruction on sse4 path
 * some load/pack marcos use less intructions
 * fixed some typos

yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:33:17 -03:00
rcombs
df9180d8a0 swscale/output: use isSwappedChroma 2022-01-04 19:39:22 -06:00
rcombs
cb3a6cc082 swscale/output: use isSemiPlanarYUV for NV12/21/24/42 case 2022-01-04 19:39:22 -06:00
rcombs
f8e284be69 swscale: introduce isSwappedChroma 2022-01-04 19:39:22 -06:00
rcombs
bb4f19f2a2 swscale/output: use isDataInHighBits for 10-bit case
This code will need fleshing-out (probably templating) if we ever add
e.g. a P012 format.
2022-01-04 19:39:22 -06:00
rcombs
cf9e8cb52f swscale/output: use isSemiPlanarYUV for 16-bit case 2022-01-04 19:39:22 -06:00
rcombs
e5d83463c8 swscale: introduce isDataInHighBits 2022-01-04 19:39:22 -06:00
rcombs
cb87a3b137 swscale/output: template-ize yuv2nv12cX 10-bit and 16-bit cases
Fixes incorrect big-endian output introduced in 88d804b7ff

Avoids making the filter-time BE check more expensive
2022-01-04 19:39:22 -06:00
Andreas Rheinhardt
b189550137 lib*/version.h: Bump Versions after release/5.0 branch
This is done a second time for 5.0 because master was
merged into 5.0 so that it contains the recent DOVI additions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 14:29:06 +01:00
Andreas Rheinhardt
c512be9a90 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 13:40:03 +01:00
Andreas Rheinhardt
20b0d24c2f Makefile: Redo duplicating object files in shared builds
In case of shared builds, some object files containing tables
are currently duplicated into other libraries: log2_tab.c,
golomb.c, reverse.c. The check for whether this is duplicated
is simply whether CONFIG_SHARED is true. Yet this is crude:
E.g. libavdevice includes reverse.c for shared builds, but only
needs it for the decklink input device, which given that decklink
is not enabled by default will be unused in most libavdevice.so.

This commit changes this by making it more explicit about what
to duplicate from other libraries. To do this, two new Makefile
variables were added: SHLIBOBJS and STLIBOBJS. SHLIBOBJS contains
the objects that are duplicated from other libraries in case of
shared builds; STLIBOBJS contains stuff that a library has to
provide for other libraries in case of static builds. These new
variables provide a way to enable/disable with a finer granularity
than just whether shared builds are enabled or not. E.g. lavd's
Makefile now contains: SHLIBOBJS-$(CONFIG_DECKLINK_INDEV) += reverse.o

Another example is provided by the golomb tables. These are provided
by lavc for static builds, even if one uses a build configuration
that makes only lavf use them. Therefore lavc's Makefile contains
STLIBOBJS-$(CONFIG_MXF_MUXER) += golomb.o, whereas lavf's Makefile
has a corresponding SHLIBOBJS-$(CONFIG_MXF_MUXER) += golomb_tab.o.
E.g. in case the MXF muxer is the only component needing these tables
only libavformat.so will contain them for shared builds; currently
libavcodec.so does so, too.
(There is currently a CONFIG_EXTRA group for golomb. But actually
one would need two groups (golomb_avcodec and golomb_avformat) in
order to know when and where to include these tables. Therefore
this commit uses a Makefile-based approach for this and stops
using these groups for the users in libavformat.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 05:01:04 +01:00
Michael Niedermayer
4be85c9331 lib*/version.h: Bump Versions after release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:10:46 +01:00
Michael Niedermayer
f3964a59e1 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:08:31 +01:00
rcombs
3e00b9e395 swscale/x86/init: use isSemiPlanarYUV
Fixes P210/P410 cases introduced (and broken) in 88d804b7ff
2021-12-23 01:41:03 -06:00
rcombs
88d804b7ff swscale: add P210/P410/P216/P416 output 2021-12-22 18:38:40 -06:00
Alan Kelly
eebe406c80 libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
2021-12-21 17:44:53 -03:00
James Almer
eab91c3e2e x86/scale_avx2: don't use $ for hex literals
Fixes compilation with AVX2 enabled yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 17:29:21 -03:00
Alan Kelly
9092e58c44 x86/scale_avx2: Change asm indent from 2 to 4 spaces.
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:42:04 -03:00
Alan Kelly
86663963e6 x86/swscale: fix minor coding style issues
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:16:04 -03:00
James Almer
76a3f961f8 x86/scale_avx2: add missing check for AVX2 assembler support
Should fix compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 09:41:56 -03:00
Alan Kelly
f900a19fa9 libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.
Fixes so that fate under 64 bit Windows passes.

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-15 20:04:59 -03:00
Andreas Rheinhardt
3be6fe9a56 swscale/yuv2rgb: Silence a set-but-unused-variable warning
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-12-03 16:10:51 +01:00
rcombs
f0204de47d swscale: add P210/P410/P216/P416 input 2021-11-28 16:40:43 -06:00
Mark Reid
3f4ce004b8 swscale/input: clip rgbf32 values before lrintf
if the float pixel * 65535.0f > 2147483647.0f
lrintf may overfow and return negative values, depending on implementation.
nan and +/-inf values may also be implementation defined

clip the value first so lrintf always works.

values <     0.0f, -inf, nan = 0.0f
values > 65535.0f, +inf      = 65535.0f

old timings
 195960 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 186120 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 188645 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 183625 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 181157 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 177533 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 175689 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 232960 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 221380 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 216640 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 213505 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 211558 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 210596 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 210202 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 161680 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 153540 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 148255 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 140600 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 132935 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128531 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 140933 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 190980 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 176080 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 167980 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 164685 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 162751 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 162404 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 167849 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

new timings
 183320 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 175700 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 179570 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 172932 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 168707 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 165224 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 163423 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 184940 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 185150 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 185790 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 185472 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 185277 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 185813 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 185332 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 145400 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 145100 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 143490 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 136687 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 131271 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128698 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 127170 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 156020 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 146990 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 142020 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 141052 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 138973 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 138027 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 143939 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-11-15 16:50:10 -03:00
Mark Reid
74e49cc583 swscale/input: unify grayf32 funcs with rgbf32 funcs
This is ment to be a cosmetic change

old timings:
  42780 UNITS in grayf32le,       1 runs,      0 skips
  56720 UNITS in grayf32le,       2 runs,      0 skips
  67265 UNITS in grayf32le,       4 runs,      0 skips
  58082 UNITS in grayf32le,       8 runs,      0 skips
  63512 UNITS in grayf32le,      16 runs,      0 skips
  52720 UNITS in grayf32le,      32 runs,      0 skips
  46491 UNITS in grayf32le,      64 runs,      0 skips

  68500 UNITS in grayf32be,       1 runs,      0 skips
  66930 UNITS in grayf32be,       2 runs,      0 skips
  62305 UNITS in grayf32be,       4 runs,      0 skips
  55510 UNITS in grayf32be,       8 runs,      0 skips
  50216 UNITS in grayf32be,      16 runs,      0 skips
  44480 UNITS in grayf32be,      32 runs,      0 skips
  42394 UNITS in grayf32be,      64 runs,      0 skips

new timings:
  46660 UNITS in grayf32le,       1 runs,      0 skips
  51830 UNITS in grayf32le,       2 runs,      0 skips
  53390 UNITS in grayf32le,       4 runs,      0 skips
  50910 UNITS in grayf32le,       8 runs,      0 skips
  44968 UNITS in grayf32le,      16 runs,      0 skips
  40349 UNITS in grayf32le,      32 runs,      0 skips
  38330 UNITS in grayf32le,      64 runs,      0 skips

  39980 UNITS in grayf32be,       1 runs,      0 skips
  49630 UNITS in grayf32be,       2 runs,      0 skips
  53540 UNITS in grayf32be,       4 runs,      0 skips
  59767 UNITS in grayf32be,       8 runs,      0 skips
  51206 UNITS in grayf32be,      16 runs,      0 skips
  44743 UNITS in grayf32be,      32 runs,      0 skips
  41468 UNITS in grayf32be,      64 runs,      0 skips

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-14 17:12:13 +01:00
Soft Works
58dce6f010 swscale/swscale: check SWS_PRINT_INFO flag for printing alignment warnings
This makes output consistent with a similar warning just few
lines above where this flag is checked in the same way.

Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
2021-11-13 19:55:32 +01:00
Mark Reid
d2379bd6a0 swscale/input: fix planar_rgb16_to_a for gbrap10be and gbrap12be formats
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-04 11:52:33 +01:00
Michael Niedermayer
8316b2a15f swscale/swscale: Improve *ColorspaceDetails() doxy
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer
5f3a160b42 swscale/utils: Improve return codes of sws_setColorspaceDetails()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer
c7699f95bb swscale/utils: Set all threads to the same colorspace even on failure
Fixes: ./ffplay dav.y4m -vf "scale=hd1080:threads=4"
Found-by: Paul
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Wu Jianhua
2c734a8496 libswscale/x86/rgb2rgb: add shuffle_bytes avx2
Performance data(Less is better):
    shuffle_bytes_ssse3   3.64654
    shuffle_bytes_avx2    0.94288

Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-10-15 10:59:20 +02:00
Michael Niedermayer
f801207568 swscale/swscale: Pass slice location into unscaled code also for dst scaling
Fixes: alphablend=checkerboard

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer
06d6726588 swscale/alphablend: Fix slice handling
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer
9f40b5badb swscale/swscale_internal: Avoid unsigned for slice parameters
Mixing unsigned and signed often leads to unexpected arithmetic results.
Fixes: out of array write
Found-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-30 19:47:15 +02:00
Manuel Stoeckl
32329397e2 swscale: add input/output support for X2BGR10LE
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
Manuel Stoeckl
ca594df622 swscale/yuv2rgb: fix conversion to X2RGB10
This resolves a problem where conversions from YUV to X2RGB10LE
would produce color values a factor 4 too small, because an 8-bit
value was placed in a 10-bit channel.

Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
Andreas Rheinhardt
1ea3650823 Replace all occurences of av_mallocz_array() by av_calloc()
They do the same.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-20 01:03:52 +02:00
Andreas Rheinhardt
044a7c08dc swscale/swscale: Disable x86-specific code for other arches
SSE2 is x86 specific, yet due to the call to av_get_cpu_flags()
compilers were unable to optimize the checks (and the call) away
on other arches.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Andreas Rheinhardt
f440c422b7 swscale/swscale: Fix races when using unaligned strides/data
In this case the current code tries to warn once; to do so, it uses
ordinary static ints to store whether the warning has already been
emitted. This is both a data race (and therefore undefined behaviour)
as well as a race condition, because it is really possible for multiple
threads to be the one thread to emit the warning. This is actually
common since the introduction of the new multithreaded scaling API.

This commit fixes this by using atomic integers for the state;
furthermore, these are not static anymore, but rather contained
in the user-facing SwsContext (i.e. the parent SwsContext in case
of slice-threading).

Given that these atomic variables are not intended for synchronization
at all (but only for atomicity, i.e. only to output the warning once),
the atomic operations use memory_order_relaxed.

This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and
yuv444 filter-overlay FATE-tests.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Andreas Rheinhardt
a1255a350d libswscale/options: Add parent_log_context_offset to AVClass
This allows to associate log messages from slice contexts to
the user-visible SwsContext.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
James Almer
5fe648d04a libswscale/swscale: initialize all dst plane pointers in sws_receive_slice()
Fixes valgrind warnings about use of uninitialised values.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-09-07 09:44:58 -03:00
Anton Khirnov
d6fdc78e91 sws: implement slice threading 2021-09-06 09:17:53 +02:00
Anton Khirnov
42cd64c182 sws: add a new scaling API 2021-09-06 09:16:52 +02:00
Andreas Rheinhardt
2c05ee092b avutil/internal, swresample/audioconvert: Remove cpu.h inclusions
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 14:33:45 +02:00
Michael Niedermayer
7874d40f10 swscale/slice: Fix wrong return on error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 15:21:37 +02:00
Michael Niedermayer
fa1e158ef6 swscale/utils: Use full chroma interpolation for rgb4/8 and dither none
Dither none is only implemented in full chroma interpolation for these rgb formats
Its also a obscure choice (producing less nice images) that implementing it in the
other code-paths makes no sense

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Michael Niedermayer
7528532550 swscale/output: Implement dither none for yuv2rgb_write_full()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Michael Niedermayer
997f9cfc12 swscale/slice: Check slice for allocation failure
Fixes: null pointer dereference
Fixes: alloc_slice.mp4

Found-by: Rafael Dutra <rafael.dutra@cispa.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Anton Khirnov
37c0fe49b7 sws: move updating the palette higher up
It does not interact in any way with the code setting up the image
pointers/strides, so it should not be intermixed with it.
2021-07-03 16:13:40 +02:00
Anton Khirnov
d6649d9a3b sws: move initializing dither_error higher up
It does not interact in any way with the code setting up the image
pointers/strides, so it should not be intermixed with it.
2021-07-03 16:13:10 +02:00
Anton Khirnov
e188985598 sws: move the early return for zero-sized slices higher up
Place it right after the input parameter validation. There is no point
in performing any setup if the sws_scale() call won't do anything.
2021-07-03 16:09:43 +02:00
Anton Khirnov
a91e6c927e sws: simplify setting sliceDir 2021-07-03 16:09:21 +02:00
Anton Khirnov
ff753f41dd sws: merge handling frame start into a single block
Also, return an error code on failure rather than 0.
2021-07-03 16:09:07 +02:00
Anton Khirnov
1b11a324fe sws: make checking for the start of a new frame more explicit 2021-07-03 16:07:22 +02:00
Anton Khirnov
0fb014b7bb sws: reset sliceDir at the end of sws_scale()
Makes it more clear that resetting it does not interact with the scaling
code that it is currently intermixed with.
2021-07-03 16:05:39 +02:00
Anton Khirnov
1f80789bf7 sws: rename SwsContext.swscale to convert_unscaled
That function pointer is now used only for unscaled conversion.
2021-07-03 15:57:53 +02:00
Anton Khirnov
fe490ec165 sws: separate the calls to scaled vs unscaled conversion
Call the scaler function directly rather than through a function
pointer. Drop the now-unused return value from ff_getSwsFunc() and
rename the function to reflect its new role.

This will be useful in the following commits, where it will become
important that the amount of output is different for scaled vs unscaled
case.
2021-07-03 15:57:13 +02:00
Anton Khirnov
0f8e0957d2 sws: do not reallocate scratch buffers for each slice 2021-07-03 15:56:16 +02:00
Anton Khirnov
2730639259 sws: group the parameters validity checks together
Also, fail with an error code rather than 0.
2021-07-03 15:31:18 +02:00
Anton Khirnov
c05cab34a9 sws: initialize {src,dst}Stride2 consistently with {src,dst}2 2021-07-03 15:31:08 +02:00
Anton Khirnov
d3d8e09640 sws: cosmetics
Reindent after previous commit, rewrap long lines.
2021-07-03 15:30:56 +02:00
Anton Khirnov
f136493d03 sws: factor out cascaded scaling 2021-07-03 15:30:34 +02:00
Anton Khirnov
a2254aedc9 sws: cosmetics
Reindent after previous commit, split long lines.
2021-07-03 15:30:20 +02:00
Anton Khirnov
44f12718bf sws: factor out gamma-correct scaling 2021-07-03 15:29:50 +02:00
Anton Khirnov
e355af9be9 sws: return an error code on invalid parameters to sws_scale() 2021-07-03 15:29:35 +02:00
Anton Khirnov
21a4e48f88 sws: reindent after previous commit 2021-07-03 15:29:22 +02:00
Anton Khirnov
27acca1af0 sws: factor out updating the palette 2021-07-03 15:28:46 +02:00
Anton Khirnov
f8c21ccbfc sws: remove unnecessary braces
There used to be more code inside them, but it was removed in
6de58b4903.
2021-07-03 15:28:36 +02:00
Peter Lundblad
da0abbbb01 libswscale: Make sws_init_context thread safe.
Call ff_sws_rgb2rgb_init via ff_thread_once instead of checking one of the
variables it updates.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-01 23:49:41 +02:00
Limin Wang
43295ae6a9 swscale/swscale_unscaled: don't use the optimized bgr24toYV12 unscaled conversion when width%2
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2021-06-06 12:34:05 +08:00
Anton Khirnov
85ba17f36d Bump major versions of all libraries.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-27 11:48:05 -03:00
Andreas Rheinhardt
ea2d9b7a2e libswscale: Remove unused deprecated functions, make used ones static
Deprecated in 3b905b9fe6.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-04-27 10:43:11 -03:00
Andreas Rheinhardt
f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Alan Kelly
3ce8d09244 libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-04-01 20:47:52 +02:00
Alan Kelly
dc57762cb4 libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-04-01 20:47:52 +02:00
Michael Niedermayer
c361fa9e21 Bump minor versions after release branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-03-20 01:02:11 +01:00
Michael Niedermayer
c67d2a2875 Bump Versions before release/4.4 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-03-20 01:01:12 +01:00
Andreas Rheinhardt
c23a5523b5 swscale/x86/swscale: Remove unused ASM constants
The last user of g15Mask, r15Mask, g16Mask and r16Mask was disabled
in 77a416e8aa and finally removed in
36e8de07ed62609df45d064b56501e3084d25723; b15Mask and b16Mask were
apparently always unused (except for in_asm_used_var_warning_killer,
a function that only existed to make the compiler not optimize ASM
constants away).
w10 is unused since d604bab901, w02
since ef423a6618.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:47:54 +01:00
Andreas Rheinhardt
aad597a93c swscale/x86/rgb2rgb: Remove unused ASM constants
mask24hh etc. are unused since f099fbf5f3,
mask32b and mask32r since 296609f859,
mask32g since b38d487466 and mask32 since
f8a138be52.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:45:17 +01:00
Andreas Rheinhardt
49db6e4b4e swscale/x86/yuv2rgb: Remove unused ASM constants
mmx_grnmask is unused since 531f97b0c3,
the other constants since e934194b6a.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:43:14 +01:00
Chip Kerchner
e7f53d6ac9 lsws/ppc/yuv2rgb_altivec: Fix build in non-VSX environments
Add inline function for vec_xl if VSX is not supported. vec_xl intrinsic
is only available on POWER 7 or higher.

Fixes ticket #8750.

Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>
2021-02-22 23:19:21 -05:00
James Almer
1a555d3c60 swscale/x86/yuv2yuvX: use the movsxdifnidn helper macro
Simplifies code

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer
ebb48d85a0 swscale/x86/yuv2yuvX: use movq to load 8 bytes in all non-AVX2 functions
mova expands to movq on non-XMM functions

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer
d512ebbaed swscale/x86/yuv2yuvX: use the SPLATW helper macro
Simplifies code

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer
c00567647e swscale/x86/swscale: fix mix of inline and external function definitions
This includes removing pointless static function forward declarations.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:42 -03:00
James Almer
c2bf1dcace swscale/x86/swscale: fix compilation with old yasm
Where AVX2 may not be supported.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 21:09:36 -03:00
Alan Kelly
554c2bc708 swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop
And other small optimizations for ~20% speedup.
2021-02-17 21:21:03 +01:00
Carl Eugen Hoyos
2687070d9b lsws/ppc/yuv2rgb: Fix transparency converting from yuv->rgb32.
Based on 68363b69 by Reimar Döffinger.

Fixes ticket #9077.
2021-01-24 17:17:29 +01:00
Anton Khirnov
e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Anton Khirnov
c8c2dfbc37 lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Jeremy Leconte
29cef1bcd6 libswscale: avoid UB nullptr-with-offset.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-12-24 15:27:56 +01:00
Andriy Gelman
1200264fc4 swscale/rgb2rgb_template: use shuffle macro on big-endian arches
Fixes fate-qtrle-32bit on big-endian.

The macro does a simple byte swap on uint8 array without any casts, so
it's valid on big-endian arches.

The mentioned test was failing because the byteswap function
shuffle_bytes_3210_c() is used in the pixel format conversion
(argb->bgra).

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>
2020-12-12 23:07:22 -05:00
Carl Eugen Hoyos
46e362b765 lsws/x86/yuv2rgb: Fix compilation with mmxext or ssse3 disabled.
Fixes ticket #8986.
2020-11-14 15:37:57 +01:00
Marton Balint
993429cfb4 swscale/x86/yuv2rgb: fix crashes when loading alpha from unaligned buffers
Regression since fc6a5883d6 on SSSE3 enabled
CPUs.

Fixes ticket #8955.

Signed-off-by: Marton Balint <cus@passwd.hu>
2020-11-02 00:31:34 +01:00