FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-02 03:06:28 +02:00

Author	SHA1	Message	Date
Andreas Rheinhardt	d2428d80ce	swscale/input: Remove spec-incompliant ';' These macros are definitions, not only declarations and therefore should not contain a semicolon. Such a semicolon is actually spec-incompliant, but compilers happen to accept them. Reviewed-by: Philip Langdale <philipl@overt.org> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-08 19:21:30 +02:00
Philip Langdale	4a59eba227	swscale/input: add support for Y212LE	2022-09-06 12:49:10 -07:00
Philip Langdale	198b5b90d5	swscale/input: add support for XV30LE	2022-09-06 12:49:10 -07:00
Philip Langdale	5bdd726115	swscale/input: add support for P012 As we now have three of these formats, I added macros to generate the conversion functions.	2022-09-06 12:49:10 -07:00
Philip Langdale	8d9462844a	swscale/input: add support for XV36LE	2022-09-06 12:49:10 -07:00
Philip Langdale	45726aa117	libswscale: add support for VUYX format As we already have support for VUYA, I figured I should do the small amount of work to support VUYX as well. That means a little refactoring to share code.	2022-08-25 19:03:49 -07:00
Andreas Rheinhardt	de33506e4b	swscale/x86/rgb_2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext Fixes FATE-failures with the filter-2xbr filter-3xbr filter-4xbr filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x filter-paletteuse-bayer filter-paletteuse-bayer0 filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten when using SSSE3). Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-23 12:21:00 +02:00
Timo Rothenpieler	aca569aad2	swscale/input: add rgbaf16 input support This is by no means perfect, since at least ddagrab will return scRGB data with values outside of 0.0f to 1.0f for HDR values. Its primary purpose is to be able to work with the format at all.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	f2de911818	swscale: add opaque parameter to input functions	2022-08-19 22:09:36 +02:00
Andreas Rheinhardt	8bec225c3c	swscale/x86/yuv2yuvX: Remove unused ff_yuv2yuvX_mmx() Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-19 12:01:34 +02:00
Alan Kelly	a38293e444	libswscale: Enable hscale_avx2 for all input sizes. ff_shuffle_filter_coefficients shuffles the tail as required. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
Alan Kelly	a6724285fd	sws: allow avx2 hscale to process inputs of any size. The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
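The "main loop of 16, tail of 4" structure described in the entry above can be sketched in plain C. This is only an illustration under stated assumptions: `scale_block` and `scale_row` are hypothetical names, the scalar helper stands in for a SIMD kernel and does no filtering, and a remainder smaller than 4 pixels is simply left unprocessed because the commit message does not say how the real code handles it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a SIMD kernel: converts n consecutive
 * 8-bit samples to 15-bit precision (no filtering is applied). */
static void scale_block(int16_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(src[i] << 7);
}

/* Processes a row in blocks of 16, then finishes with blocks of 4.
 * Returns the number of pixels written. */
static size_t scale_row(int16_t *dst, const uint8_t *src, size_t width)
{
    size_t x = 0;

    assert(dst != NULL && src != NULL);

    /* Main loop: full blocks of 16 pixels. */
    for (; x + 16 <= width; x += 16)
        scale_block(dst + x, src + x, 16);

    /* Tail: remaining pixels in blocks of 4. */
    for (; x + 4 <= width; x += 4)
        scale_block(dst + x, src + x, 4);

    return x;
}
```

For a width of 36, the main loop runs twice (32 pixels) and the tail once (4 pixels).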
Alan Kelly	51a34e8525	sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-18 16:19:13 +02:00
Swinney, Jonathan	0d7caa5b09	swscale/aarch64: add vscale specializations This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance. On AWS c7g (Graviton 3, Neoverse V1) instances: before after yuv2yuvX_2_0_512_accurate_neon: 558.8 268.9 yuv2yuvX_4_0_512_accurate_neon: 637.5 434.9 yuv2yuvX_8_0_512_accurate_neon: 1144.8 806.2 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1853.7 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Swinney, Jonathan	3e708722a2	swscale/aarch64: vscale optimization Use scalar times vector multiply accumulate instructions instead of vector times vector to remove the need for replicating load instructions which are slightly slower. On AWS c7g (Graviton 3, Neoverse V1) instances: yuv2yuvX_8_0_512_accurate_neon: 1144.8 987.4 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Swinney, Jonathan	4dcd191a50	checkasm: updated tests for sw_scale Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differs slightly). Test with and without SWS_ACCURATE_RND - if this flag isn't set, the output must match the C reference exactly, otherwise it is allowed to be off by 2. Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND is set - apparently this discrepancy hasn't been noticed in other exact tests before. Add a test for yuv2plane1. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Swinney, Jonathan	75ffca7eef	libswscale/aarch64: add another hscale specialization This specialization handles the case where filtersize is 4 mod 8, e.g. 12, 20, etc. Aarch64 was previously using the c function for this case. This implementation speeds up that case significantly. hscale_8_to_15__fs_12_dstW_512_c: 6234.1 hscale_8_to_15__fs_12_dstW_512_neon: 1505.6 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 12:08:38 +03:00
Timo Rothenpieler	b77fff47d0	configure: always enable gnu_windres if available Use the appropriate Makefile variable to ensure the resource file is only built into shared libraries instead.	2022-08-13 14:42:36 +02:00
James Almer	68e017c487	swscale/output: fix reading chroma values when generating vuya output Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-08 09:39:33 -03:00
James Almer	1974813261	swscale/output: add VUYA output support Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-07 09:33:16 -03:00
James Almer	f0abd07996	swscale/input: add VUYA input support Reviewed-by: Philip Langdale <philipl@overt.org> Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-05 09:39:21 -03:00
Andreas Rheinhardt	da668fa7d2	swscale/rgb2rgb: Don't cast const away Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-31 01:09:52 +02:00
Matthieu Bouron	0a6bb7da55	swscale: add NV16 input/output Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-07-19 12:20:16 +02:00
Michael Niedermayer	fd26b07e8b	Bump versions after 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:29:05 +02:00
Michael Niedermayer	6f1b144358	Bump Versions for 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:27:37 +02:00
Andreas Rheinhardt	81d3472031	swscale/x86/swscale: Simplify macro This is possible now that it is no longer used by MMX. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:36:18 +02:00
Andreas Rheinhardt	a05f22eaf3	swscale/x86/swscale: Remove obsolete and harmful MMX(EXT) functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT, SSE and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2). So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed. Moreover, some of the removed code was buggy/not bitexact and led to failures involving the f32le and f32be versions of gray, gbrp and gbrap on x86-32 when SSE2 was not disabled. See e.g. https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx Notice that yuv2yuvX_mmx is not removed, because it is used by SSE3 and AVX2 as fallback in case of unaligned data and also for tail processing. I don't know why yuv2yuvX_mmxext isn't being used for this; an earlier version [1] of `554c2bc708` used it, but the version that was eventually applied does not. [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:36:04 +02:00
Andreas Rheinhardt	2831837182	swscale/x86/yuv2rgb: Remove obsolete MMX functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:35:50 +02:00
Andreas Rheinhardt	608319a311	swscale/x86/rgb2rgb: Remove obsolete MMX, 3dnow functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:35:38 +02:00
Andreas Rheinhardt	40e6575aa3	all: Replace if (ARCH_FOO) checks by #if ARCH_FOO This is more spec-compliant because it does not rely on dead-code elimination by the compiler. Especially MSVC has problems with this, as can be seen in https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html or https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html This commit does not eliminate every instance where we rely on dead code elimination: It only tackles branching to the initialization of arch-specific dsp code, not e.g. all uses of CONFIG_ and HAVE_ checks. But maybe it is already enough to compile FFmpeg with MSVC with whole-program optimizations enabled (if one does not disable too many components). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-15 04:56:37 +02:00
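The difference between `if (ARCH_FOO)` and `#if ARCH_FOO` described in the entry above can be illustrated with a small sketch. All names here (`ARCH_FOO`, `ARCH_BAR`, `DSPContext`, `dsp_init_foo`, `dsp_init_bar`) are hypothetical and only mimic the pattern; they are not taken from the FFmpeg sources. With `#if`, the call to the never-defined `dsp_init_foo()` is removed by the preprocessor, so neither the compiler nor the linker ever sees it. With a run-time `if (ARCH_FOO)`, the call would remain in the translation unit and the build would depend on the compiler discarding the dead branch.

```c
#include <assert.h>

/* Hypothetical configuration macros; a real build would take these
 * from a generated configuration header. */
#define ARCH_FOO 0
#define ARCH_BAR 1

typedef struct DSPContext {
    int (*add)(int a, int b);
} DSPContext;

/* Portable fallback. */
static int add_c(int a, int b) { return a + b; }

#if ARCH_BAR
/* "Optimized" variant for the enabled architecture. */
static int add_bar(int a, int b) { return a + b; }
static void dsp_init_bar(DSPContext *c) { c->add = add_bar; }
#endif

/* dsp_init_foo() is deliberately never declared or defined, like
 * arch-specific code that is not compiled for this target. */

static void dsp_init(DSPContext *c)
{
    assert(c);
    c->add = add_c;
#if ARCH_FOO
    dsp_init_foo(c);   /* dropped by the preprocessor, never reaches the linker */
#elif ARCH_BAR
    dsp_init_bar(c);
#endif
}
```

After `dsp_init()`, the context points at `add_bar` because only `ARCH_BAR` is enabled in this sketch.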
Vardan Margaryan	73302aa193	swscale/x86/yuv_2_rgb: fix access to memory past the frame data in yuv to rgb conversion Y, U, V data is loaded at the end of the current iteration for the next iteration. It results in memory access past the frame data on the last iteration (that data is never used after the loading). So load data at the start of the iteration, so that only useful data is loaded. Signed-off-by: Vardan Margaryan <v.t.margaryan@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-06-06 09:51:17 +02:00
Swinney, Jonathan	0ea61725b1	swscale/aarch64: add hscale specializations This patch adds code to support specializations of the hscale function and adds a specialization for filterSize == 4. ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck here is loading the data from src, this data is loaded a whole block ahead and stored back to the stack to be loaded again with ld4. This arranges the data for most efficient use of the vector instructions and removes the need for completion adds at the end. The number of iterations of the C per iteration of the assembly is increased from 4 to 8, but because of the prefetching, there must be a special section without prefetching when dstW < 16. This improves speed on Graviton 2 (Neoverse N1) dramatically in the case where previously fs=8 would have been required. before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8 after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-05-28 01:09:05 +03:00
Andreas Rheinhardt	f2b79c5b85	lib*/version: Move library version functions into files of their own This avoids having to rebuild big files every time FFMPEG_VERSION changes (which it does with every commit). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-10 06:49:32 +02:00
Martin Storsjö	70db14376c	swscale: aarch64: Optimize the final summation in the hscale routine Before: Cortex A53 A72 A73 Graviton 2 Graviton 3 hscale_8_to_15_width8_neon: 8273.0 4602.5 4289.5 2429.7 1629.1 hscale_8_to_15_width16_neon: 12405.7 6803.0 6359.0 3549.0 2378.4 hscale_8_to_15_width32_neon: 21258.7 11491.7 11469.2 5797.2 3919.6 hscale_8_to_15_width40_neon: 25652.0 14173.7 12488.2 6893.5 4810.4 After: hscale_8_to_15_width8_neon: 7633.0 3981.5 3350.2 1980.7 1261.1 hscale_8_to_15_width16_neon: 11666.7 5951.0 5512.0 3080.7 2131.4 hscale_8_to_15_width32_neon: 20900.7 10733.2 9481.7 5275.2 3862.1 hscale_8_to_15_width40_neon: 24826.0 13536.2 11502.0 6397.2 4731.9 Thus, this gives overall a 8-29% speedup for the smaller filter sizes, around 1-8% for the larger filter sizes. Inspired by a patch by Jonathan Swinney <jswinney@amazon.com>. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-22 10:49:46 +03:00
Martin Storsjö	2d368392a5	Keep including the full version.h when headers are included externally This avoids unnecessary churn and build breakage for users, by making sure the whole version.h is included like it has been so far, while keeping the benefit of not needing to rebuild most files in the ffmpeg tree on minor/micro bumps. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-19 00:01:57 +02:00
Martin Storsjö	f3a0e2ee2b	doc: Add an entry to APIchanges about changes to version.h and version_major.h Also bump the minor versions of all libraries, to signify the API change of splitting the version.h headers and adding the new version_major.h header. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-16 14:12:46 +02:00
Martin Storsjö	6cd2ac388d	libswscale: Split version.h Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-16 14:05:26 +02:00
Martin Storsjö	c523724c69	swscale: Take the destination range into account for yuv->rgb->yuv conversions The range parameters need to be set up before calling sws_init_context (which selects which fastpaths can be used; this gets called by sws_getContext); solely passing them via sws_setColorspaceDetails isn't enough. This fixes producing full range YUV range output when doing YUV->YUV conversions between different YUV color spaces. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-02-25 11:01:17 +02:00
Andreas Rheinhardt	636631d9db	Remove unnecessary libavutil/(avutil\|common\|internal).h inclusions Some of these were made possible by moving several common macros to libavutil/macros.h. While just at it, also improve the other headers a bit. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-02-24 12:56:49 +01:00
Andreas Rheinhardt	155cd6baa4	Remove obsolete version.h inclusions Forgotten in `e7bd47e657`. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-02-24 12:56:49 +01:00
Alan Kelly	e534d98af3	libswscale: Re-factor ff_shuffle_filter_coefficients. Make the code more readable and follow the style guide. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-02-17 17:17:22 +01:00
Alan Kelly	f1a5414c97	libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-02-17 17:17:07 +01:00
Andreas Rheinhardt	71e2825150	swscale/x86/swscale: Remove superfluous and invalid ';' Inside a function an unnecessary ';' is just a null statement; yet outside of it it is actually illegal (but compilers happen to accept it without warning except when using -pedantic). So modify the macros to always expect the user to add a ';'. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-22 17:00:45 +01:00
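The semicolon rule mentioned in the entry above (and in `d2428d80ce`) can be shown with two hypothetical macros; the names below are made up for illustration and do not come from swscale. A macro that expands to a complete function definition must not be followed by `;` at file scope, whereas a macro that expands to a declaration without its terminator expects the user to supply the `;`.

```c
#include <assert.h>

/* Definition-style macro: expands to a complete function definition,
 * so the invocation is NOT followed by ';' (a stray ';' at file scope
 * is not valid ISO C, even though compilers tend to accept it). */
#define DEFINE_SCALE_FN(name, shift) \
    static int name(int x) { return x << (shift); }

DEFINE_SCALE_FN(scale_by_2, 1)
DEFINE_SCALE_FN(scale_by_8, 3)

/* Declaration-style macro: expands to a declaration without its
 * terminating ';', so the user always adds it at the point of use. */
#define DECLARE_SCALE_FN(name) static int name(int x)

DECLARE_SCALE_FN(scale_by_4);

static int scale_by_4(int x) { return x << 2; }

/* Uses all three functions; left-shifting a negative value is
 * undefined behaviour, hence the assertion. */
static int scale_chain(int x)
{
    assert(x >= 0);
    return scale_by_2(scale_by_4(scale_by_8(x)));
}
```

Compiling with `-pedantic` is one way to have GCC or Clang report an extra `;` after a definition-style macro.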
Mark Reid	52f7026164	swscale/x86/input.asm: add x86-optimized planar rgb2yuv functions sse2 only operates on 2 lanes per loop for to_y and to_uv functions, due to the lack of pmulld instruction. Emulating pmulld with 2 pmuludq and shuffles proved too costly and made to_uv functions slower than the c implementation. For to_y on sse2 only float functions are generated, I was not able to outperform the c implementation on the integer pixel formats. For to_a on sse4 only the float functions are generated. sse2 and sse4 generated nearly identical performing code on integer pixel formats, so only sse2/avx2 versions are generated. planar_gbrp_to_y_512_c: 1197.5 planar_gbrp_to_y_512_sse4: 444.5 planar_gbrp_to_y_512_avx2: 287.5 planar_gbrap_to_y_512_c: 1204.5 planar_gbrap_to_y_512_sse4: 447.5 planar_gbrap_to_y_512_avx2: 289.5 planar_gbrp9be_to_y_512_c: 1380.0 planar_gbrp9be_to_y_512_sse4: 543.5 planar_gbrp9be_to_y_512_avx2: 340.0 planar_gbrp9le_to_y_512_c: 1200.5 planar_gbrp9le_to_y_512_sse4: 442.0 planar_gbrp9le_to_y_512_avx2: 282.0 planar_gbrp10be_to_y_512_c: 1378.5 planar_gbrp10be_to_y_512_sse4: 544.0 planar_gbrp10be_to_y_512_avx2: 337.5 planar_gbrp10le_to_y_512_c: 1200.0 planar_gbrp10le_to_y_512_sse4: 448.0 planar_gbrp10le_to_y_512_avx2: 285.5 planar_gbrap10be_to_y_512_c: 1380.0 planar_gbrap10be_to_y_512_sse4: 542.0 planar_gbrap10be_to_y_512_avx2: 340.5 planar_gbrap10le_to_y_512_c: 1199.0 planar_gbrap10le_to_y_512_sse4: 446.0 planar_gbrap10le_to_y_512_avx2: 289.5 planar_gbrp12be_to_y_512_c: 10563.0 planar_gbrp12be_to_y_512_sse4: 542.5 planar_gbrp12be_to_y_512_avx2: 339.0 planar_gbrp12le_to_y_512_c: 1201.0 planar_gbrp12le_to_y_512_sse4: 440.5 planar_gbrp12le_to_y_512_avx2: 286.0 planar_gbrap12be_to_y_512_c: 1701.5 planar_gbrap12be_to_y_512_sse4: 917.0 planar_gbrap12be_to_y_512_avx2: 338.5 planar_gbrap12le_to_y_512_c: 1201.0 planar_gbrap12le_to_y_512_sse4: 444.5 planar_gbrap12le_to_y_512_avx2: 288.0 planar_gbrp14be_to_y_512_c: 1370.5 planar_gbrp14be_to_y_512_sse4: 545.0 
planar_gbrp14be_to_y_512_avx2: 338.5 planar_gbrp14le_to_y_512_c: 1199.0 planar_gbrp14le_to_y_512_sse4: 444.0 planar_gbrp14le_to_y_512_avx2: 279.5 planar_gbrp16be_to_y_512_c: 1364.0 planar_gbrp16be_to_y_512_sse4: 544.5 planar_gbrp16be_to_y_512_avx2: 339.5 planar_gbrp16le_to_y_512_c: 1201.0 planar_gbrp16le_to_y_512_sse4: 445.5 planar_gbrp16le_to_y_512_avx2: 280.5 planar_gbrap16be_to_y_512_c: 1377.0 planar_gbrap16be_to_y_512_sse4: 545.0 planar_gbrap16be_to_y_512_avx2: 338.5 planar_gbrap16le_to_y_512_c: 1201.0 planar_gbrap16le_to_y_512_sse4: 442.0 planar_gbrap16le_to_y_512_avx2: 279.0 planar_gbrpf32be_to_y_512_c: 4113.0 planar_gbrpf32be_to_y_512_sse2: 2438.0 planar_gbrpf32be_to_y_512_sse4: 1068.0 planar_gbrpf32be_to_y_512_avx2: 904.5 planar_gbrpf32le_to_y_512_c: 3818.5 planar_gbrpf32le_to_y_512_sse2: 2024.5 planar_gbrpf32le_to_y_512_sse4: 1241.5 planar_gbrpf32le_to_y_512_avx2: 657.0 planar_gbrapf32be_to_y_512_c: 3707.0 planar_gbrapf32be_to_y_512_sse2: 2444.0 planar_gbrapf32be_to_y_512_sse4: 1077.0 planar_gbrapf32be_to_y_512_avx2: 909.0 planar_gbrapf32le_to_y_512_c: 3822.0 planar_gbrapf32le_to_y_512_sse2: 2024.5 planar_gbrapf32le_to_y_512_sse4: 1176.0 planar_gbrapf32le_to_y_512_avx2: 658.5 planar_gbrp_to_uv_512_c: 2325.8 planar_gbrp_to_uv_512_sse2: 1726.8 planar_gbrp_to_uv_512_sse4: 771.8 planar_gbrp_to_uv_512_avx2: 506.8 planar_gbrap_to_uv_512_c: 2281.8 planar_gbrap_to_uv_512_sse2: 1726.3 planar_gbrap_to_uv_512_sse4: 768.3 planar_gbrap_to_uv_512_avx2: 496.3 planar_gbrp9be_to_uv_512_c: 2336.8 planar_gbrp9be_to_uv_512_sse2: 1924.8 planar_gbrp9be_to_uv_512_sse4: 852.3 planar_gbrp9be_to_uv_512_avx2: 552.8 planar_gbrp9le_to_uv_512_c: 2270.3 planar_gbrp9le_to_uv_512_sse2: 1512.3 planar_gbrp9le_to_uv_512_sse4: 764.3 planar_gbrp9le_to_uv_512_avx2: 491.3 planar_gbrp10be_to_uv_512_c: 2281.8 planar_gbrp10be_to_uv_512_sse2: 1917.8 planar_gbrp10be_to_uv_512_sse4: 855.3 planar_gbrp10be_to_uv_512_avx2: 541.3 planar_gbrp10le_to_uv_512_c: 2269.8 planar_gbrp10le_to_uv_512_sse2: 1515.3 
planar_gbrp10le_to_uv_512_sse4: 759.8 planar_gbrp10le_to_uv_512_avx2: 487.8 planar_gbrap10be_to_uv_512_c: 2382.3 planar_gbrap10be_to_uv_512_sse2: 1924.8 planar_gbrap10be_to_uv_512_sse4: 855.3 planar_gbrap10be_to_uv_512_avx2: 540.8 planar_gbrap10le_to_uv_512_c: 2382.3 planar_gbrap10le_to_uv_512_sse2: 1512.3 planar_gbrap10le_to_uv_512_sse4: 759.3 planar_gbrap10le_to_uv_512_avx2: 484.8 planar_gbrp12be_to_uv_512_c: 2283.8 planar_gbrp12be_to_uv_512_sse2: 1936.8 planar_gbrp12be_to_uv_512_sse4: 858.3 planar_gbrp12be_to_uv_512_avx2: 541.3 planar_gbrp12le_to_uv_512_c: 2278.8 planar_gbrp12le_to_uv_512_sse2: 1507.3 planar_gbrp12le_to_uv_512_sse4: 760.3 planar_gbrp12le_to_uv_512_avx2: 485.8 planar_gbrap12be_to_uv_512_c: 2385.3 planar_gbrap12be_to_uv_512_sse2: 1927.8 planar_gbrap12be_to_uv_512_sse4: 855.3 planar_gbrap12be_to_uv_512_avx2: 539.8 planar_gbrap12le_to_uv_512_c: 2377.3 planar_gbrap12le_to_uv_512_sse2: 1516.3 planar_gbrap12le_to_uv_512_sse4: 759.3 planar_gbrap12le_to_uv_512_avx2: 484.8 planar_gbrp14be_to_uv_512_c: 2283.8 planar_gbrp14be_to_uv_512_sse2: 1935.3 planar_gbrp14be_to_uv_512_sse4: 852.3 planar_gbrp14be_to_uv_512_avx2: 540.3 planar_gbrp14le_to_uv_512_c: 2276.8 planar_gbrp14le_to_uv_512_sse2: 1514.8 planar_gbrp14le_to_uv_512_sse4: 762.3 planar_gbrp14le_to_uv_512_avx2: 484.8 planar_gbrp16be_to_uv_512_c: 2383.3 planar_gbrp16be_to_uv_512_sse2: 1881.8 planar_gbrp16be_to_uv_512_sse4: 852.3 planar_gbrp16be_to_uv_512_avx2: 541.8 planar_gbrp16le_to_uv_512_c: 2378.3 planar_gbrp16le_to_uv_512_sse2: 1476.8 planar_gbrp16le_to_uv_512_sse4: 765.3 planar_gbrp16le_to_uv_512_avx2: 485.8 planar_gbrap16be_to_uv_512_c: 2382.3 planar_gbrap16be_to_uv_512_sse2: 1886.3 planar_gbrap16be_to_uv_512_sse4: 853.8 planar_gbrap16be_to_uv_512_avx2: 550.8 planar_gbrap16le_to_uv_512_c: 2381.8 planar_gbrap16le_to_uv_512_sse2: 1488.3 planar_gbrap16le_to_uv_512_sse4: 765.3 planar_gbrap16le_to_uv_512_avx2: 491.8 planar_gbrpf32be_to_uv_512_c: 4863.0 planar_gbrpf32be_to_uv_512_sse2: 3347.5 
planar_gbrpf32be_to_uv_512_sse4: 1800.0 planar_gbrpf32be_to_uv_512_avx2: 1199.0 planar_gbrpf32le_to_uv_512_c: 4725.0 planar_gbrpf32le_to_uv_512_sse2: 2753.0 planar_gbrpf32le_to_uv_512_sse4: 1474.5 planar_gbrpf32le_to_uv_512_avx2: 927.5 planar_gbrapf32be_to_uv_512_c: 4859.0 planar_gbrapf32be_to_uv_512_sse2: 3269.0 planar_gbrapf32be_to_uv_512_sse4: 1802.0 planar_gbrapf32be_to_uv_512_avx2: 1201.5 planar_gbrapf32le_to_uv_512_c: 6338.0 planar_gbrapf32le_to_uv_512_sse2: 2756.5 planar_gbrapf32le_to_uv_512_sse4: 1476.0 planar_gbrapf32le_to_uv_512_avx2: 908.5 planar_gbrap_to_a_512_c: 383.3 planar_gbrap_to_a_512_sse2: 66.8 planar_gbrap_to_a_512_avx2: 43.8 planar_gbrap10be_to_a_512_c: 601.8 planar_gbrap10be_to_a_512_sse2: 86.3 planar_gbrap10be_to_a_512_avx2: 34.8 planar_gbrap10le_to_a_512_c: 602.3 planar_gbrap10le_to_a_512_sse2: 48.8 planar_gbrap10le_to_a_512_avx2: 31.3 planar_gbrap12be_to_a_512_c: 601.8 planar_gbrap12be_to_a_512_sse2: 111.8 planar_gbrap12be_to_a_512_avx2: 41.3 planar_gbrap12le_to_a_512_c: 385.8 planar_gbrap12le_to_a_512_sse2: 75.3 planar_gbrap12le_to_a_512_avx2: 39.8 planar_gbrap16be_to_a_512_c: 386.8 planar_gbrap16be_to_a_512_sse2: 79.8 planar_gbrap16be_to_a_512_avx2: 31.3 planar_gbrap16le_to_a_512_c: 600.3 planar_gbrap16le_to_a_512_sse2: 40.3 planar_gbrap16le_to_a_512_avx2: 30.3 planar_gbrapf32be_to_a_512_c: 1148.8 planar_gbrapf32be_to_a_512_sse2: 611.3 planar_gbrapf32be_to_a_512_sse4: 234.8 planar_gbrapf32be_to_a_512_avx2: 183.3 planar_gbrapf32le_to_a_512_c: 851.3 planar_gbrapf32le_to_a_512_sse2: 263.3 planar_gbrapf32le_to_a_512_sse4: 199.3 planar_gbrapf32le_to_a_512_avx2: 156.8 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-01-11 16:34:33 -03:00
Mark Reid	9e445a5be2	swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions changes since v2: * fixed label changes since v1: * remove vex instruction on sse4 path * some load/pack macros use fewer instructions * fixed some typos yuv2gbrp_full_X_4_512_c: 12757.6 yuv2gbrp_full_X_4_512_sse2: 8946.6 yuv2gbrp_full_X_4_512_sse4: 5138.6 yuv2gbrp_full_X_4_512_avx2: 3889.6 yuv2gbrap_full_X_4_512_c: 15368.6 yuv2gbrap_full_X_4_512_sse2: 11916.1 yuv2gbrap_full_X_4_512_sse4: 6294.6 yuv2gbrap_full_X_4_512_avx2: 3477.1 yuv2gbrp9be_full_X_4_512_c: 14381.6 yuv2gbrp9be_full_X_4_512_sse2: 9139.1 yuv2gbrp9be_full_X_4_512_sse4: 5150.1 yuv2gbrp9be_full_X_4_512_avx2: 2834.6 yuv2gbrp9le_full_X_4_512_c: 12990.1 yuv2gbrp9le_full_X_4_512_sse2: 9118.1 yuv2gbrp9le_full_X_4_512_sse4: 5132.1 yuv2gbrp9le_full_X_4_512_avx2: 2833.1 yuv2gbrp10be_full_X_4_512_c: 14401.6 yuv2gbrp10be_full_X_4_512_sse2: 9133.1 yuv2gbrp10be_full_X_4_512_sse4: 5126.1 yuv2gbrp10be_full_X_4_512_avx2: 2837.6 yuv2gbrp10le_full_X_4_512_c: 12718.1 yuv2gbrp10le_full_X_4_512_sse2: 9106.1 yuv2gbrp10le_full_X_4_512_sse4: 5120.1 yuv2gbrp10le_full_X_4_512_avx2: 2826.1 yuv2gbrap10be_full_X_4_512_c: 18535.6 yuv2gbrap10be_full_X_4_512_sse2: 33617.6 yuv2gbrap10be_full_X_4_512_sse4: 6264.1 yuv2gbrap10be_full_X_4_512_avx2: 3422.1 yuv2gbrap10le_full_X_4_512_c: 16724.1 yuv2gbrap10le_full_X_4_512_sse2: 11787.1 yuv2gbrap10le_full_X_4_512_sse4: 6282.1 yuv2gbrap10le_full_X_4_512_avx2: 3441.6 yuv2gbrp12be_full_X_4_512_c: 13723.6 yuv2gbrp12be_full_X_4_512_sse2: 9128.1 yuv2gbrp12be_full_X_4_512_sse4: 7997.6 yuv2gbrp12be_full_X_4_512_avx2: 2844.1 yuv2gbrp12le_full_X_4_512_c: 12257.1 yuv2gbrp12le_full_X_4_512_sse2: 9107.6 yuv2gbrp12le_full_X_4_512_sse4: 5142.6 yuv2gbrp12le_full_X_4_512_avx2: 2837.6 yuv2gbrap12be_full_X_4_512_c: 18511.1 yuv2gbrap12be_full_X_4_512_sse2: 12156.6 yuv2gbrap12be_full_X_4_512_sse4: 6251.1 yuv2gbrap12be_full_X_4_512_avx2: 3444.6 yuv2gbrap12le_full_X_4_512_c: 16687.1 yuv2gbrap12le_full_X_4_512_sse2: 11785.1 
yuv2gbrap12le_full_X_4_512_sse4: 6243.6 yuv2gbrap12le_full_X_4_512_avx2: 3446.1 yuv2gbrp14be_full_X_4_512_c: 13690.6 yuv2gbrp14be_full_X_4_512_sse2: 9120.6 yuv2gbrp14be_full_X_4_512_sse4: 5138.1 yuv2gbrp14be_full_X_4_512_avx2: 2843.1 yuv2gbrp14le_full_X_4_512_c: 14995.6 yuv2gbrp14le_full_X_4_512_sse2: 9119.1 yuv2gbrp14le_full_X_4_512_sse4: 5126.1 yuv2gbrp14le_full_X_4_512_avx2: 2843.1 yuv2gbrp16be_full_X_4_512_c: 12367.1 yuv2gbrp16be_full_X_4_512_sse2: 8233.6 yuv2gbrp16be_full_X_4_512_sse4: 4820.1 yuv2gbrp16be_full_X_4_512_avx2: 2666.6 yuv2gbrp16le_full_X_4_512_c: 10904.1 yuv2gbrp16le_full_X_4_512_sse2: 8214.1 yuv2gbrp16le_full_X_4_512_sse4: 4824.1 yuv2gbrp16le_full_X_4_512_avx2: 2629.1 yuv2gbrap16be_full_X_4_512_c: 26569.6 yuv2gbrap16be_full_X_4_512_sse2: 10884.1 yuv2gbrap16be_full_X_4_512_sse4: 5488.1 yuv2gbrap16be_full_X_4_512_avx2: 3272.1 yuv2gbrap16le_full_X_4_512_c: 14010.1 yuv2gbrap16le_full_X_4_512_sse2: 10562.1 yuv2gbrap16le_full_X_4_512_sse4: 5463.6 yuv2gbrap16le_full_X_4_512_avx2: 3255.1 yuv2gbrpf32be_full_X_4_512_c: 14524.1 yuv2gbrpf32be_full_X_4_512_sse2: 8552.6 yuv2gbrpf32be_full_X_4_512_sse4: 4636.1 yuv2gbrpf32be_full_X_4_512_avx2: 2474.6 yuv2gbrpf32le_full_X_4_512_c: 13060.6 yuv2gbrpf32le_full_X_4_512_sse2: 9682.6 yuv2gbrpf32le_full_X_4_512_sse4: 4298.1 yuv2gbrpf32le_full_X_4_512_avx2: 2453.1 yuv2gbrapf32be_full_X_4_512_c: 18629.6 yuv2gbrapf32be_full_X_4_512_sse2: 11363.1 yuv2gbrapf32be_full_X_4_512_sse4: 15201.6 yuv2gbrapf32be_full_X_4_512_avx2: 3727.1 yuv2gbrapf32le_full_X_4_512_c: 16677.6 yuv2gbrapf32le_full_X_4_512_sse2: 10221.6 yuv2gbrapf32le_full_X_4_512_sse4: 5693.6 yuv2gbrapf32le_full_X_4_512_avx2: 3656.6 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-01-11 16:33:17 -03:00
rcombs	df9180d8a0	swscale/output: use isSwappedChroma	2022-01-04 19:39:22 -06:00
rcombs	cb3a6cc082	swscale/output: use isSemiPlanarYUV for NV12/21/24/42 case	2022-01-04 19:39:22 -06:00
rcombs	f8e284be69	swscale: introduce isSwappedChroma	2022-01-04 19:39:22 -06:00
rcombs	bb4f19f2a2	swscale/output: use isDataInHighBits for 10-bit case This code will need fleshing-out (probably templating) if we ever add e.g. a P012 format.	2022-01-04 19:39:22 -06:00
rcombs	cf9e8cb52f	swscale/output: use isSemiPlanarYUV for 16-bit case	2022-01-04 19:39:22 -06:00
rcombs	e5d83463c8	swscale: introduce isDataInHighBits	2022-01-04 19:39:22 -06:00
rcombs	cb87a3b137	swscale/output: template-ize yuv2nv12cX 10-bit and 16-bit cases Fixes incorrect big-endian output introduced in `88d804b7ff` Avoids making the filter-time BE check more expensive	2022-01-04 19:39:22 -06:00
Andreas Rheinhardt	b189550137	lib*/version.h: Bump Versions after release/5.0 branch This is done a second time for 5.0 because master was merged into 5.0 so that it contains the recent DOVI additions. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-04 14:29:06 +01:00
Andreas Rheinhardt	c512be9a90	lib*/version.h: Bump Versions before release/5.0 branch Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-04 13:40:03 +01:00
Andreas Rheinhardt	20b0d24c2f	Makefile: Redo duplicating object files in shared builds In case of shared builds, some object files containing tables are currently duplicated into other libraries: log2_tab.c, golomb.c, reverse.c. The check for whether this is duplicated is simply whether CONFIG_SHARED is true. Yet this is crude: E.g. libavdevice includes reverse.c for shared builds, but only needs it for the decklink input device, which given that decklink is not enabled by default will be unused in most libavdevice.so. This commit changes this by making it more explicit about what to duplicate from other libraries. To do this, two new Makefile variables were added: SHLIBOBJS and STLIBOBJS. SHLIBOBJS contains the objects that are duplicated from other libraries in case of shared builds; STLIBOBJS contains stuff that a library has to provide for other libraries in case of static builds. These new variables provide a way to enable/disable with a finer granularity than just whether shared builds are enabled or not. E.g. lavd's Makefile now contains: SHLIBOBJS-$(CONFIG_DECKLINK_INDEV) += reverse.o Another example is provided by the golomb tables. These are provided by lavc for static builds, even if one uses a build configuration that makes only lavf use them. Therefore lavc's Makefile contains STLIBOBJS-$(CONFIG_MXF_MUXER) += golomb.o, whereas lavf's Makefile has a corresponding SHLIBOBJS-$(CONFIG_MXF_MUXER) += golomb_tab.o. E.g. in case the MXF muxer is the only component needing these tables only libavformat.so will contain them for shared builds; currently libavcodec.so does so, too. (There is currently a CONFIG_EXTRA group for golomb. But actually one would need two groups (golomb_avcodec and golomb_avformat) in order to know when and where to include these tables. Therefore this commit uses a Makefile-based approach for this and stops using these groups for the users in libavformat.) 
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-04 05:01:04 +01:00
Michael Niedermayer	4be85c9331	lib*/version.h: Bump Versions after release/5.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-01-03 22:10:46 +01:00
Michael Niedermayer	f3964a59e1	lib*/version.h: Bump Versions before release/5.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-01-03 22:08:31 +01:00
rcombs	3e00b9e395	swscale/x86/init: use isSemiPlanarYUV Fixes P210/P410 cases introduced (and broken) in `88d804b7ff`	2021-12-23 01:41:03 -06:00
rcombs	88d804b7ff	swscale: add P210/P410/P216/P416 output	2021-12-22 18:38:40 -06:00
Alan Kelly	eebe406c80	libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions. This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are only used where they are faster.	2021-12-21 17:44:53 -03:00
James Almer	eab91c3e2e	x86/scale_avx2: don't use $ for hex literals Fixes compilation with AVX2 enabled yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 17:29:21 -03:00
Alan Kelly	9092e58c44	x86/scale_avx2: Change asm indent from 2 to 4 spaces. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 13:42:04 -03:00
Alan Kelly	86663963e6	x86/swscale: fix minor coding style issues Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 13:16:04 -03:00
James Almer	76a3f961f8	x86/scale_avx2: add missing check for AVX2 assembler support Should fix compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 09:41:56 -03:00
Alan Kelly	f900a19fa9	libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes. Fixes so that fate under 64 bit Windows passes. These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-15 20:04:59 -03:00
Andreas Rheinhardt	3be6fe9a56	swscale/yuv2rgb: Silence a set-but-unused-variable warning Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-12-03 16:10:51 +01:00
rcombs	f0204de47d	swscale: add P210/P410/P216/P416 input	2021-11-28 16:40:43 -06:00
Mark Reid	3f4ce004b8	swscale/input: clip rgbf32 values before lrintf If the float pixel * 65535.0f > 2147483647.0f, lrintf may overflow and return negative values, depending on implementation. The results for nan and +/-inf values may also be implementation defined. Clip the value first so lrintf always works: values < 0.0f, -inf, nan = 0.0f; values > 65535.0f, +inf = 65535.0f. old timings 195960 decicycles in planar_rgbf32le_to_uv, 1 runs, 0 skips 186120 decicycles in planar_rgbf32le_to_uv, 2 runs, 0 skips 188645 decicycles in planar_rgbf32le_to_uv, 4 runs, 0 skips 183625 decicycles in planar_rgbf32le_to_uv, 8 runs, 0 skips 181157 decicycles in planar_rgbf32le_to_uv, 16 runs, 0 skips 177533 decicycles in planar_rgbf32le_to_uv, 32 runs, 0 skips 175689 decicycles in planar_rgbf32le_to_uv, 64 runs, 0 skips 232960 decicycles in planar_rgbf32be_to_uv, 1 runs, 0 skips 221380 decicycles in planar_rgbf32be_to_uv, 2 runs, 0 skips 216640 decicycles in planar_rgbf32be_to_uv, 4 runs, 0 skips 213505 decicycles in planar_rgbf32be_to_uv, 8 runs, 0 skips 211558 decicycles in planar_rgbf32be_to_uv, 16 runs, 0 skips 210596 decicycles in planar_rgbf32be_to_uv, 32 runs, 0 skips 210202 decicycles in planar_rgbf32be_to_uv, 64 runs, 0 skips 161680 decicycles in planar_rgbf32le_to_y, 1 runs, 0 skips 153540 decicycles in planar_rgbf32le_to_y, 2 runs, 0 skips 148255 decicycles in planar_rgbf32le_to_y, 4 runs, 0 skips 140600 decicycles in planar_rgbf32le_to_y, 8 runs, 0 skips 132935 decicycles in planar_rgbf32le_to_y, 16 runs, 0 skips 128531 decicycles in planar_rgbf32le_to_y, 32 runs, 0 skips 140933 decicycles in planar_rgbf32le_to_y, 64 runs, 0 skips 190980 decicycles in planar_rgbf32be_to_y, 1 runs, 0 skips 176080 decicycles in planar_rgbf32be_to_y, 2 runs, 0 skips 167980 decicycles in planar_rgbf32be_to_y, 4 runs, 0 skips 164685 decicycles in planar_rgbf32be_to_y, 8 runs, 0 skips 162751 decicycles in planar_rgbf32be_to_y, 16 runs, 0 skips 162404 decicycles in planar_rgbf32be_to_y, 32 runs, 0 skips 167849 decicycles in planar_rgbf32be_to_y, 64 runs, 0 skips new timings 183320 decicycles in planar_rgbf32le_to_uv, 1 runs, 0 skips 175700 decicycles in planar_rgbf32le_to_uv, 2 runs, 0 skips 179570 decicycles in planar_rgbf32le_to_uv, 4 runs, 0 skips 172932 decicycles in planar_rgbf32le_to_uv, 8 runs, 0 skips 168707 decicycles in planar_rgbf32le_to_uv, 16 runs, 0 skips 165224 decicycles in planar_rgbf32le_to_uv, 32 runs, 0 skips 163423 decicycles in planar_rgbf32le_to_uv, 64 runs, 0 skips 184940 decicycles in planar_rgbf32be_to_uv, 1 runs, 0 skips 185150 decicycles in planar_rgbf32be_to_uv, 2 runs, 0 skips 185790 decicycles in planar_rgbf32be_to_uv, 4 runs, 0 skips 185472 decicycles in planar_rgbf32be_to_uv, 8 runs, 0 skips 185277 decicycles in planar_rgbf32be_to_uv, 16 runs, 0 skips 185813 decicycles in planar_rgbf32be_to_uv, 32 runs, 0 skips 185332 decicycles in planar_rgbf32be_to_uv, 64 runs, 0 skips 145400 decicycles in planar_rgbf32le_to_y, 1 runs, 0 skips 145100 decicycles in planar_rgbf32le_to_y, 2 runs, 0 skips 143490 decicycles in planar_rgbf32le_to_y, 4 runs, 0 skips 136687 decicycles in planar_rgbf32le_to_y, 8 runs, 0 skips 131271 decicycles in planar_rgbf32le_to_y, 16 runs, 0 skips 128698 decicycles in planar_rgbf32le_to_y, 32 runs, 0 skips 127170 decicycles in planar_rgbf32le_to_y, 64 runs, 0 skips 156020 decicycles in planar_rgbf32be_to_y, 1 runs, 0 skips 146990 decicycles in planar_rgbf32be_to_y, 2 runs, 0 skips 142020 decicycles in planar_rgbf32be_to_y, 4 runs, 0 skips 141052 decicycles in planar_rgbf32be_to_y, 8 runs, 0 skips 138973 decicycles in planar_rgbf32be_to_y, 16 runs, 0 skips 138027 decicycles in planar_rgbf32be_to_y, 32 runs, 0 skips 143939 decicycles in planar_rgbf32be_to_y, 64 runs, 0 skips Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2021-11-15 16:50:10 -03:00
Mark Reid	74e49cc583	swscale/input: unify grayf32 funcs with rgbf32 funcs This is meant to be a cosmetic change. old timings: 42780 UNITS in grayf32le, 1 runs, 0 skips 56720 UNITS in grayf32le, 2 runs, 0 skips 67265 UNITS in grayf32le, 4 runs, 0 skips 58082 UNITS in grayf32le, 8 runs, 0 skips 63512 UNITS in grayf32le, 16 runs, 0 skips 52720 UNITS in grayf32le, 32 runs, 0 skips 46491 UNITS in grayf32le, 64 runs, 0 skips 68500 UNITS in grayf32be, 1 runs, 0 skips 66930 UNITS in grayf32be, 2 runs, 0 skips 62305 UNITS in grayf32be, 4 runs, 0 skips 55510 UNITS in grayf32be, 8 runs, 0 skips 50216 UNITS in grayf32be, 16 runs, 0 skips 44480 UNITS in grayf32be, 32 runs, 0 skips 42394 UNITS in grayf32be, 64 runs, 0 skips new timings: 46660 UNITS in grayf32le, 1 runs, 0 skips 51830 UNITS in grayf32le, 2 runs, 0 skips 53390 UNITS in grayf32le, 4 runs, 0 skips 50910 UNITS in grayf32le, 8 runs, 0 skips 44968 UNITS in grayf32le, 16 runs, 0 skips 40349 UNITS in grayf32le, 32 runs, 0 skips 38330 UNITS in grayf32le, 64 runs, 0 skips 39980 UNITS in grayf32be, 1 runs, 0 skips 49630 UNITS in grayf32be, 2 runs, 0 skips 53540 UNITS in grayf32be, 4 runs, 0 skips 59767 UNITS in grayf32be, 8 runs, 0 skips 51206 UNITS in grayf32be, 16 runs, 0 skips 44743 UNITS in grayf32be, 32 runs, 0 skips 41468 UNITS in grayf32be, 64 runs, 0 skips Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-11-14 17:12:13 +01:00
Soft Works	58dce6f010	swscale/swscale: check SWS_PRINT_INFO flag for printing alignment warnings This makes output consistent with a similar warning just a few lines above where this flag is checked in the same way. Signed-off-by: softworkz <softworkz@hotmail.com> Signed-off-by: Marton Balint <cus@passwd.hu>	2021-11-13 19:55:32 +01:00
Mark Reid	d2379bd6a0	swscale/input: fix planar_rgb16_to_a for gbrap10be and gbrap12be formats Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-11-04 11:52:33 +01:00
Michael Niedermayer	8316b2a15f	swscale/swscale: Improve *ColorspaceDetails() doxy Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-24 16:54:36 +02:00
Michael Niedermayer	5f3a160b42	swscale/utils: Improve return codes of sws_setColorspaceDetails() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-24 16:54:36 +02:00
Michael Niedermayer	c7699f95bb	swscale/utils: Set all threads to the same colorspace even on failure Fixes: ./ffplay dav.y4m -vf "scale=hd1080:threads=4" Found-by: Paul Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-24 16:54:36 +02:00
Wu Jianhua	2c734a8496	libswscale/x86/rgb2rgb: add shuffle_bytes avx2 Performance data(Less is better): shuffle_bytes_ssse3 3.64654 shuffle_bytes_avx2 0.94288 Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-10-15 10:59:20 +02:00
Michael Niedermayer	f801207568	swscale/swscale: Pass slice location into unscaled code also for dst scaling Fixes: alphablend=checkerboard Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-03 20:38:29 +02:00
Michael Niedermayer	06d6726588	swscale/alphablend: Fix slice handling Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-03 20:38:29 +02:00
Michael Niedermayer	9f40b5badb	swscale/swscale_internal: Avoid unsigned for slice parameters Mixing unsigned and signed often leads to unexpected arithmetic results. Fixes: out of array write Found-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-09-30 19:47:15 +02:00
Manuel Stoeckl	32329397e2	swscale: add input/output support for X2BGR10LE Signed-off-by: Manuel Stoeckl <code@mstoeckl.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-09-26 16:26:10 +02:00
Manuel Stoeckl	ca594df622	swscale/yuv2rgb: fix conversion to X2RGB10 This resolves a problem where conversions from YUV to X2RGB10LE would produce color values a factor 4 too small, because an 8-bit value was placed in a 10-bit channel. Signed-off-by: Manuel Stoeckl <code@mstoeckl.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-09-26 16:26:10 +02:00
Andreas Rheinhardt	1ea3650823	Replace all occurrences of av_mallocz_array() by av_calloc() They do the same. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-20 01:03:52 +02:00
Andreas Rheinhardt	044a7c08dc	swscale/swscale: Disable x86-specific code for other arches SSE2 is x86 specific, yet due to the call to av_get_cpu_flags() compilers were unable to optimize the checks (and the call) away on other arches. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-19 23:52:37 +02:00
Andreas Rheinhardt	f440c422b7	swscale/swscale: Fix races when using unaligned strides/data In this case the current code tries to warn once; to do so, it uses ordinary static ints to store whether the warning has already been emitted. This is both a data race (and therefore undefined behaviour) as well as a race condition, because it is really possible for multiple threads to be the one thread to emit the warning. This is actually common since the introduction of the new multithreaded scaling API. This commit fixes this by using atomic integers for the state; furthermore, these are not static anymore, but rather contained in the user-facing SwsContext (i.e. the parent SwsContext in case of slice-threading). Given that these atomic variables are not intended for synchronization at all (but only for atomicity, i.e. only to output the warning once), the atomic operations use memory_order_relaxed. This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and yuv444 filter-overlay FATE-tests. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-19 23:52:37 +02:00
Andreas Rheinhardt	a1255a350d	libswscale/options: Add parent_log_context_offset to AVClass This allows associating log messages from slice contexts with the user-visible SwsContext. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-19 23:52:37 +02:00
James Almer	5fe648d04a	libswscale/swscale: initialize all dst plane pointers in sws_receive_slice() Fixes valgrind warnings about use of uninitialised values. Signed-off-by: James Almer <jamrial@gmail.com>	2021-09-07 09:44:58 -03:00
Anton Khirnov	d6fdc78e91	sws: implement slice threading	2021-09-06 09:17:53 +02:00
Anton Khirnov	42cd64c182	sws: add a new scaling API	2021-09-06 09:16:52 +02:00
Andreas Rheinhardt	2c05ee092b	avutil/internal, swresample/audioconvert: Remove cpu.h inclusions These inclusions are not necessary, as cpu.h is already included wherever it is needed (via direct inclusion or via the arch-specific headers). Also remove other unnecessary cpu.h inclusions from ordinary non-headers. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-07-22 14:33:45 +02:00
Michael Niedermayer	7874d40f10	swscale/slice: Fix wrong return on error Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 15:21:37 +02:00
Michael Niedermayer	fa1e158ef6	swscale/utils: Use full chroma interpolation for rgb4/8 and dither none Dither none is only implemented in full chroma interpolation for these rgb formats. It is also such an obscure choice (producing less nice images) that implementing it in the other code paths makes no sense. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 12:29:03 +02:00
Michael Niedermayer	7528532550	swscale/output: Implement dither none for yuv2rgb_write_full() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 12:29:03 +02:00
Michael Niedermayer	997f9cfc12	swscale/slice: Check slice for allocation failure Fixes: null pointer dereference Fixes: alloc_slice.mp4 Found-by: Rafael Dutra <rafael.dutra@cispa.de> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 12:29:03 +02:00
Anton Khirnov	37c0fe49b7	sws: move updating the palette higher up It does not interact in any way with the code setting up the image pointers/strides, so it should not be intermixed with it.	2021-07-03 16:13:40 +02:00
Anton Khirnov	d6649d9a3b	sws: move initializing dither_error higher up It does not interact in any way with the code setting up the image pointers/strides, so it should not be intermixed with it.	2021-07-03 16:13:10 +02:00
Anton Khirnov	e188985598	sws: move the early return for zero-sized slices higher up Place it right after the input parameter validation. There is no point in performing any setup if the sws_scale() call won't do anything.	2021-07-03 16:09:43 +02:00
Anton Khirnov	a91e6c927e	sws: simplify setting sliceDir	2021-07-03 16:09:21 +02:00
Anton Khirnov	ff753f41dd	sws: merge handling frame start into a single block Also, return an error code on failure rather than 0.	2021-07-03 16:09:07 +02:00
Anton Khirnov	1b11a324fe	sws: make checking for the start of a new frame more explicit	2021-07-03 16:07:22 +02:00
Anton Khirnov	0fb014b7bb	sws: reset sliceDir at the end of sws_scale() Makes it more clear that resetting it does not interact with the scaling code that it is currently intermixed with.	2021-07-03 16:05:39 +02:00
Anton Khirnov	1f80789bf7	sws: rename SwsContext.swscale to convert_unscaled That function pointer is now used only for unscaled conversion.	2021-07-03 15:57:53 +02:00
Anton Khirnov	fe490ec165	sws: separate the calls to scaled vs unscaled conversion Call the scaler function directly rather than through a function pointer. Drop the now-unused return value from ff_getSwsFunc() and rename the function to reflect its new role. This will be useful in the following commits, where it will become important that the amount of output is different for scaled vs unscaled case.	2021-07-03 15:57:13 +02:00
Anton Khirnov	0f8e0957d2	sws: do not reallocate scratch buffers for each slice	2021-07-03 15:56:16 +02:00
Anton Khirnov	2730639259	sws: group the parameters validity checks together Also, fail with an error code rather than 0.	2021-07-03 15:31:18 +02:00
Anton Khirnov	c05cab34a9	sws: initialize {src,dst}Stride2 consistently with {src,dst}2	2021-07-03 15:31:08 +02:00
Anton Khirnov	d3d8e09640	sws: cosmetics Reindent after previous commit, rewrap long lines.	2021-07-03 15:30:56 +02:00
Anton Khirnov	f136493d03	sws: factor out cascaded scaling	2021-07-03 15:30:34 +02:00
Anton Khirnov	a2254aedc9	sws: cosmetics Reindent after previous commit, split long lines.	2021-07-03 15:30:20 +02:00
Anton Khirnov	44f12718bf	sws: factor out gamma-correct scaling	2021-07-03 15:29:50 +02:00
Anton Khirnov	e355af9be9	sws: return an error code on invalid parameters to sws_scale()	2021-07-03 15:29:35 +02:00
Anton Khirnov	21a4e48f88	sws: reindent after previous commit	2021-07-03 15:29:22 +02:00
Anton Khirnov	27acca1af0	sws: factor out updating the palette	2021-07-03 15:28:46 +02:00
Anton Khirnov	f8c21ccbfc	sws: remove unnecessary braces There used to be more code inside them, but it was removed in `6de58b4903`.	2021-07-03 15:28:36 +02:00
Peter Lundblad	da0abbbb01	libswscale: Make sws_init_context thread safe. Call ff_sws_rgb2rgb_init via ff_thread_once instead of checking one of the variables it updates. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-01 23:49:41 +02:00
Limin Wang	43295ae6a9	swscale/swscale_unscaled: don't use the optimized bgr24toYV12 unscaled conversion when width%2 Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2021-06-06 12:34:05 +08:00
Anton Khirnov	85ba17f36d	Bump major versions of all libraries. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-04-27 11:48:05 -03:00
Andreas Rheinhardt	ea2d9b7a2e	libswscale: Remove unused deprecated functions, make used ones static Deprecated in `3b905b9fe6`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> Signed-off-by: James Almer <jamrial@gmail.com>	2021-04-27 10:43:11 -03:00
Andreas Rheinhardt	f3c197b129	Include attributes.h directly Some files currently rely on libavutil/cpu.h to include it for them; yet said file won't include it any more after the currently deprecated functions are removed, so include attributes.h directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-04-19 14:34:10 +02:00
Alan Kelly	3ce8d09244	libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-04-01 20:47:52 +02:00
Alan Kelly	dc57762cb4	libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-04-01 20:47:52 +02:00
Michael Niedermayer	c361fa9e21	Bump minor versions after release branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-03-20 01:02:11 +01:00
Michael Niedermayer	c67d2a2875	Bump Versions before release/4.4 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-03-20 01:01:12 +01:00
Andreas Rheinhardt	c23a5523b5	swscale/x86/swscale: Remove unused ASM constants The last user of g15Mask, r15Mask, g16Mask and r16Mask was disabled in `77a416e8aa` and finally removed in 36e8de07ed62609df45d064b56501e3084d25723; b15Mask and b16Mask were apparently always unused (except for in_asm_used_var_warning_killer, a function that only existed to make the compiler not optimize ASM constants away). w10 is unused since `d604bab901`, w02 since `ef423a6618`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2021-02-24 09:47:54 +01:00
Andreas Rheinhardt	aad597a93c	swscale/x86/rgb2rgb: Remove unused ASM constants mask24hh etc. are unused since `f099fbf5f3`, mask32b and mask32r since `296609f859`, mask32g since `b38d487466` and mask32 since `f8a138be52`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2021-02-24 09:45:17 +01:00
Andreas Rheinhardt	49db6e4b4e	swscale/x86/yuv2rgb: Remove unused ASM constants mmx_grnmask is unused since `531f97b0c3`, the other constants since `e934194b6a`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2021-02-24 09:43:14 +01:00
Chip Kerchner	e7f53d6ac9	lsws/ppc/yuv2rgb_altivec: Fix build in non-VSX environments Add inline function for vec_xl if VSX is not supported. vec_xl intrinsic is only available on POWER 7 or higher. Fixes ticket #8750. Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>	2021-02-22 23:19:21 -05:00
James Almer	1a555d3c60	swscale/x86/yuv2yuvX: use the movsxdifnidn helper macro Simplifies code Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:43 -03:00
James Almer	ebb48d85a0	swscale/x86/yuv2yuvX: use movq to load 8 bytes in all non-AVX2 functions mova expands to movq on non-XMM functions Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:43 -03:00
James Almer	d512ebbaed	swscale/x86/yuv2yuvX: use the SPLATW helper macro Simplifies code Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:43 -03:00
James Almer	c00567647e	swscale/x86/swscale: fix mix of inline and external function definitions This includes removing pointless static function forward declarations. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:42 -03:00
James Almer	c2bf1dcace	swscale/x86/swscale: fix compilation with old yasm Where AVX2 may not be supported. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 21:09:36 -03:00
Alan Kelly	554c2bc708	swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop And other small optimizations for ~20% speedup.	2021-02-17 21:21:03 +01:00
Carl Eugen Hoyos	2687070d9b	lsws/ppc/yuv2rgb: Fix transparency converting from yuv->rgb32. Based on `68363b69` by Reimar Döffinger. Fixes ticket #9077.	2021-01-24 17:17:29 +01:00
Anton Khirnov	e15371061d	lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump They are not properly namespaced and not intended for public use.	2021-01-01 14:14:57 +01:00
Anton Khirnov	c8c2dfbc37	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h That is a more appropriate place for it.	2021-01-01 14:11:01 +01:00
Jeremy Leconte	29cef1bcd6	libswscale: avoid UB nullptr-with-offset. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-12-24 15:27:56 +01:00
Andriy Gelman	1200264fc4	swscale/rgb2rgb_template: use shuffle macro on big-endian arches Fixes fate-qtrle-32bit on big-endian. The macro does a simple byte swap on uint8 array without any casts, so it's valid on big-endian arches. The mentioned test was failing because the byteswap function shuffle_bytes_3210_c() is used in the pixel format conversion (argb->bgra). Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>	2020-12-12 23:07:22 -05:00
Carl Eugen Hoyos	46e362b765	lsws/x86/yuv2rgb: Fix compilation with mmxext or ssse3 disabled. Fixes ticket #8986.	2020-11-14 15:37:57 +01:00
Marton Balint	993429cfb4	swscale/x86/yuv2rgb: fix crashes when loading alpha from unaligned buffers Regression since `fc6a5883d6` on SSSE3 enabled CPUs. Fixes ticket #8955. Signed-off-by: Marton Balint <cus@passwd.hu>	2020-11-02 00:31:34 +01:00
Jan Ekström	7ea4bcff7b	swscale/utils: override forced-zero formats back to full range Fixes vf_scale outputting RGB AVFrames with limited range flagged in case either input or output specifically sets the range. This is the reverse of the logic utilized for RGB and PAL8 content in sws_setColorspaceDetails.	2020-10-11 12:58:13 +03:00
Jan Ekström	3fe24fe232	swscale/utils: split range override check into its own function	2020-10-11 12:58:13 +03:00
Mark Reid	a48adcd136	libswscale/input: use more accurate planar rgb16 yuv conversions These conversions appear to exhibit the same rounding error as the rgbf32 formats did. I separated the rounding value from the 16 and 128 offsets; I think it makes it a little more clear. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-10-06 17:56:52 +02:00
Mark Reid	453004fde6	libswscale/input: use more accurate rgbf32 yuv conversions Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-10-02 14:59:52 +02:00
Mark Reid	6bf57c6a2a	libswscale/tests: add floatimg_cmp test changes since v1: - made into fate test - fixed c90 warnings - tests more intermediate formats - tested on BE mips too Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-10-02 14:59:52 +02:00
James Almer	621e2625e0	swscale/x86/output: add missing AVX2 support preprocessor wrappers Fixes compilation with old yasm Signed-off-by: James Almer <jamrial@gmail.com>	2020-08-20 15:14:56 -03:00
Paul B Mahol	9d58cdb4ba	swscale: do not drop half of bits from 16bit bayer formats	2020-08-08 12:03:42 +02:00
Limin Wang	7c8ad72f1c	swscale/yuv2rgb: cosmetics Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-07-25 10:20:42 +08:00
Fei Wang	8544783280	swscale/yuv2rgb: consider x2rgb10le on big endian hardware This fixes the FATE failures reported by filter-pixfmts* for x2rgb10le on big endian hardware. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-07-20 21:00:00 +02:00
Michael Niedermayer	663f024415	swscale/tests/swscale: use 1 for indicating errors Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-07-16 17:44:53 +02:00
Michael Niedermayer	24c575e0aa	swscale/tests/swscale: Initialize res to a non random error code Regression since: `3adffab073` -1 is consistent with what other error paths return Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-07-14 22:05:02 +02:00
Michael Niedermayer	ec27c1827c	swscale/tests/swscale: Fix incorrect return code check Regression since: `3adffab073` Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-07-14 22:05:02 +02:00

2536 Commits