FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Rémi Denis-Courmont	6269c4a440	swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	e50f8e861b	swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422 We can make do with callee-clobbered registers only now. As an added bonus, this makes the code XLEN-independent.	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	be37a2e364	swscale/rgb2rgb: rework RISC-V V uyvytoyuv422 This avoids using relatively slow register strides.	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	1a4bd76ea5	swscale/rgb2rgb: remove R-V V shuffle_bytes_3012 This is slower than the Zbb version on real hardware due to register strides. Proper support for vector byte-swap requires the Zvbb extension, but it's much too early for me to worry about it.	2023-10-02 22:28:38 +03:00
Rémi Denis-Courmont	c4a144c29d	swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210	2023-10-02 22:28:25 +03:00
Paul B Mahol	29b673bdcf	swscale: add GBRAP14 format support	2023-09-28 19:37:58 +02:00
Andreas Rheinhardt	f8503b4c33	avutil/internal: Don't auto-include emms.h Instead include emms.h wherever it is needed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-09-04 11:04:45 +02:00
L. E. Segovia	ddc1cd5cdd	configure: Set WIN32_LEAN_AND_MEAN at configure time Including winsock2.h or windows.h without WIN32_LEAN_AND_MEAN causes bzlib.h to parse as nonsense, due to an instance of #define small char in rpcndr.h. See: https://stackoverflow.com/a/27794577 Signed-off-by: L. E. Segovia <amy@amyspark.me> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-08-14 22:57:28 +03:00
Rémi Denis-Courmont	c2b38619c0	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012} This avoids strided loads. Before: shuffle_bytes_1230_rvv_i32: 308.7 shuffle_bytes_3012_rvv_i32: 308.7 After: shuffle_bytes_1230_rvv_i32: 46.7 shuffle_bytes_3012_rvv_i32: 46.7	2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont	15982554e6	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103} This avoids strided loads. Before: shuffle_bytes_0321_rvv_i32: 307.7 shuffle_bytes_2103_rvv_i32: 308.7 After: shuffle_bytes_0321_rvv_i32: 59.7 shuffle_bytes_2103_rvv_i32: 61.5	2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont	d3948e4db5	swscale: inline ff_shuffle_bytes_3210_rvv No functional changes.	2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont	b6585eb04c	lavu: add/use flag for RISC-V Zba extension The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.	2023-07-19 19:29:35 +03:00
Khem Raj	a7b3c0203f	libswscale/riscv: fix syntax of vsetvli Add missing operand which clang complains about but GCC assumes it to be 'm1' if not specified. Works around build failure with Clang: | src/libswscale/riscv/rgb2rgb_rvv.S:88:25: error: operand must be e[8|16|32|64|128|256|512|1024],m[1|2|4|8|f2|f4|f8],[ta|tu],[ma|mu] | vsetvli t4, t3, e8, ta, ma | ^ Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-07-13 22:01:24 +03:00
Lynne	b3fb73af6b	swscale: bump minor for implementing support for the new pixfmts	2023-05-29 00:42:02 +02:00
Lynne	934525eae0	lsws: add in/out support for the new 12-bit 2-plane 422 and 444 pixfmts	2023-05-29 00:41:35 +02:00
Jin Bo	cb4ae8baee	swscale/la: Add following builtin optimized functions yuv420_rgb24_lsx yuv420_bgr24_lsx yuv420_rgba32_lsx yuv420_argb32_lsx yuv420_bgra32_lsx yuv420_abgr32_lsx ./configure --disable-lasx ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -pix_fmt rgb24 -y /dev/null -an before: 184fps after: 207fps Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-05-25 21:05:15 +02:00
Lu Wang	4501b1dfd7	swscale/la: Optimize the functions of the swscale series with lsx. ./configure --disable-lasx ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -pix_fmt bgra -y /dev/null -an before: 91fps after: 160fps Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-05-25 21:05:08 +02:00
Lynne	a62a3930c2	swscale/ppc: remove hScale8To19_vsx Fails checkasm on a Power9 system.	2023-05-20 20:07:18 +02:00
Michael Niedermayer	47ac3e6065	version.h: Bump minor post 6.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-02-19 18:37:36 +01:00
Michael Niedermayer	62efa096af	version.h: Bump minor for 6.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-02-19 18:32:07 +01:00
James Almer	5bad485603	Bump major versions of all libraries Signed-off-by: James Almer <jamrial@gmail.com>	2023-02-09 15:35:14 +01:00
Tomas Härdin	a678b0c252	sws/utils.c: Do not uselessly call initFilter() when unscaling	2023-02-08 15:53:55 +01:00
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2023-02-01 04:23:55 +01:00
Andreas Rheinhardt	1ff9c07fa6	swscale/utils: Fix indentation Forgotten after `c1eb3e7fec`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-11-24 21:02:57 +01:00
Andreas Rheinhardt	b2d1a25816	swscale/utils: Derive range from YUVJ-pix-fmt only once Currently, it is done once per slice-thread, leading to one warning per slice-thread in case a YUVJ pixel format has been originally used. This also fixes the anomaly that said parameters are only updated for the user-facing context (whose values are retrievable via av_opt_get()) if slice-threading is not in use. Fixes ticket #9860. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-11-24 20:59:03 +01:00
Andreas Rheinhardt	ff39dcb129	swscale/utils: Move functions to avoid forward declarations Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-11-24 20:58:21 +01:00
Andreas Rheinhardt	baccc1c541	swscale/utils: Avoid calling ff_thread_once() unnecessarily Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-11-24 20:58:21 +01:00
Andreas Rheinhardt	8ee0711228	swscale/utils: Don't allocate AVFrames for slice contexts Only the parent context's AVFrames are ever used. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-11-24 20:58:21 +01:00
Andreas Rheinhardt	64ed1d40df	swscale/utils: Factor initializing single slice context out Initializing slice threads currently uses the function (sws_init_context()) that is also used for initializing user-facing contexts with the only difference being that nb_threads is set to one before initializing the slice contexts. Yet sws_init_context() also initializes lots of stuff that is not slice-dependent, i.e. (src|dst)Range. This currently only works because the code sets these fields to the same values for all slice contexts. This is not nice; even worse, it entails that log messages are printed once per slice context (and therefore fill the screen). This commit lays the groundwork to fix this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-11-24 20:58:21 +01:00
Michael Niedermayer	ba209e3d51	swscale/input: Use more unsigned intermediates Same principle as previous commit, with sufficiently huge rgb2yuv table values this produces wrong results and undefined behavior. The unsigned produces the same incorrect results. That is probably ok as these cases with huge values seem not to occur in any real use case. Fixes: signed integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-11-20 21:55:06 +01:00
Jeremy Dorfman	ce566281f9	swscale/input: Use unsigned intermediates in rgb64ToUV_c_template Large rgb2yuv tables and high pixel values cause the intermediate int32_t of ru*r + gu*g + bu*b to exceed INT_MAX, which is undefined behavior. This causes libswscale built with LLVM -fsanitize=undefined to assert. Using unsigned integers instead has defined behavior and produces identical results, and makes rgb64ToUV_c_template match rgb64ToY_c_template. Fixes: signed integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-11-20 21:23:57 +01:00
Andreas Rheinhardt	b616b04704	swscale/utils: Remove obsolete 3DNow reference swscale does not use 3DNow any more since commit `608319a311`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-11-09 17:39:00 +01:00
Michael Niedermayer	b74f89caae	swscale/output: Bias 16bps output calculations to improve non overflowing range for GBRP16/GBRPF32 Fixes: integer overflow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-11-04 22:44:16 +01:00
Michael Niedermayer	0f0afc7fb5	swscale/output: Bias 16bps output calculations to improve non overflowing range Fixes: integer overflow Fixes: ./ffmpeg -f rawvideo -video_size 66x64 -pixel_format yuva420p10le -i ~/videos/overflow_input_w66h64.yuva420p10le -filter_complex "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=full:out_range=full,format=rgba64[out]" -pixel_format rgba64 -map '[out]' -y overflow_w66h64.png Found-by: Drew Dunne <asdunne@google.com> Tested-by: Drew Dunne <asdunne@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-11-04 22:44:16 +01:00
Hubert Mazur	2537fdc510	sw_scale: Add specializations for hscale 16 to 19 Provide arm64 neon optimized implementations for hscale16To19 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_19__fs_4_dstW_512_c: 6216.0 hscale_16_to_19__fs_4_dstW_512_neon: 2257.0 hscale_16_to_19__fs_8_dstW_512_c: 10417.7 hscale_16_to_19__fs_8_dstW_512_neon: 3112.5 hscale_16_to_19__fs_12_dstW_512_c: 14890.5 hscale_16_to_19__fs_12_dstW_512_neon: 3899.0 hscale_16_to_19__fs_16_dstW_512_c: 19006.5 hscale_16_to_19__fs_16_dstW_512_neon: 5341.2 hscale_16_to_19__fs_32_dstW_512_c: 36629.5 hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 (Note, the checkasm tests for these functions haven't been merged since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-11-01 15:24:58 +02:00
Hubert Mazur	9ccf8c5bfc	sw_scale: Add specializations for hscale 16 to 15 Add arm64 neon implementations for hscale 16 to 15 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_15__fs_4_dstW_512_c: 6703.5 hscale_16_to_15__fs_4_dstW_512_neon: 2298.0 hscale_16_to_15__fs_8_dstW_512_c: 10983.0 hscale_16_to_15__fs_8_dstW_512_neon: 3216.5 hscale_16_to_15__fs_12_dstW_512_c: 15526.0 hscale_16_to_15__fs_12_dstW_512_neon: 3993.0 hscale_16_to_15__fs_16_dstW_512_c: 20183.5 hscale_16_to_15__fs_16_dstW_512_neon: 5369.7 hscale_16_to_15__fs_32_dstW_512_c: 39315.2 hscale_16_to_15__fs_32_dstW_512_neon: 9511.2 hscale_16_to_15__fs_40_dstW_512_c: 48995.7 hscale_16_to_15__fs_40_dstW_512_neon: 11570.0 (Note, the checkasm tests for these functions haven't been merged since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-11-01 15:24:53 +02:00
Hubert Mazur	1e9cfa5bb0	sw_scale: Add specializations for hscale 8 to 19 Add arm64 neon implementations for hscale 8 to 19 with filter sizes 4, 4X and 8. Both implementations are based on very similar ones dedicated to hscale 8 to 15. The major changes refer to saving the data - instead of writing the result as int16_t it is done with int32_t. These functions are heavily inspired by patches provided by J. Swinney and M. Storsjö for hscale8to15 which were slightly adapted for hscale8to19. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_8_to_19__fs_4_dstW_512_c: 5663.2 hscale_8_to_19__fs_4_dstW_512_neon: 1259.7 hscale_8_to_19__fs_8_dstW_512_c: 9306.0 hscale_8_to_19__fs_8_dstW_512_neon: 2020.2 hscale_8_to_19__fs_12_dstW_512_c: 12932.7 hscale_8_to_19__fs_12_dstW_512_neon: 2462.5 hscale_8_to_19__fs_16_dstW_512_c: 16844.2 hscale_8_to_19__fs_16_dstW_512_neon: 4671.2 hscale_8_to_19__fs_32_dstW_512_c: 32803.7 hscale_8_to_19__fs_32_dstW_512_neon: 5474.2 hscale_8_to_19__fs_40_dstW_512_c: 40948.0 hscale_8_to_19__fs_40_dstW_512_neon: 6669.7 Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-11-01 15:24:43 +02:00
Martin Storsjö	cb803a0072	swscale: aarch64: Fix yuv2rgb with negative strides Treat the 32 bit stride registers as signed. Alternatively, we could make the stride arguments ptrdiff_t instead of int, and changing all of the assembly to operate on these registers with their full 64 bit width, but that's a much larger and more intrusive change (and risks missing some operation, which would clamp the intermediates to 32 bit still). Fixes: https://trac.ffmpeg.org/ticket/9985 Signed-off-by: Martin Storsjö <martin@martin.st>	2022-10-27 21:49:26 +03:00
Marvin Scholz	4aa04c255d	swscale: document some missing arguments	2022-10-17 09:56:47 +02:00
Marvin Scholz	aba8cf654f	swscale: Fix bogus doxy comment #ifdefs The intention here was probably to document this as use of conditionals does not make sense in a comment. Fixes doxy warning: warning: explicit link request to 'if' could not be resolved	2022-10-17 09:55:19 +02:00
Chema Gonzalez	bf64a75c5a	libswscale: force a minimum size of the slice for bayer sources Bayer sources are read in groups of 2 lines (e.g. for a BGGR flavor, the first row contains only B and G samples, while the second row contains only G and R samples). They need to be read as a whole. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-10-14 12:19:13 +02:00
Rémi Denis-Courmont	a1bfb5290e	sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2 This is currently 64-bit only because the stack spilling code would not assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory). This could be added later in the unlikely event that someone wants it.	2022-09-30 07:25:44 +02:00
Rémi Denis-Courmont	9181835a24	sws/rgb2rgb: RISC-V V interleaveBytes	2022-09-30 07:24:09 +02:00
Rémi Denis-Courmont	66a03f4053	sws/rgb2rgb: RISC-V V shuffle_bytes_xxxx functions	2022-09-30 07:24:09 +02:00
Andreas Rheinhardt	888a02a126	swscale/output: Don't call av_pix_fmt_desc_get() in a loop Up until now, libswscale/output.c used a macro to write an output pixel which involved a call to av_pix_fmt_desc_get() to find out whether the input pixel format is BE or LE despite this being known at compile-time (there are templates per pixfmt). Even worse, these calls are made in a loop, so that e.g. there are eight calls to av_pix_fmt_desc_get() for every pixel processed in yuv2rgba64_X_c_template() for 64bit RGB formats. This commit modifies these macros to ensure that isBE() is evaluated at compile-time. This saved 41184B of .text for me (GCC 11.2, -O3). Of course, it also improved performance. E.g. ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \ -threads 1 -t 1:00 -f null - (which uses yuv2rgba64le_X_c, which is an invocation of yuv2rgba64_X_c_template() mentioned above), performance improved from 95589 to 41387 decicycles for one call to yuv2packedX; for the be variant the numbers went down from 76087 to 43024 decicycles. Reviewed-by: Anton Khirnov <anton@khirnov.net> Reviewed-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-19 23:40:41 +02:00
Andreas Rheinhardt	4d7a1a4619	swscale/input: Avoid calls to av_pix_fmt_desc_get() Up until now, libswscale/input.c used a macro to read an input pixel which involved a call to av_pix_fmt_desc_get() to find out whether the input pixel format is BE or LE despite this being known at compile-time (there are templates per pixfmt). Even worse, these calls are made in a loop, so that e.g. there are six calls to av_pix_fmt_desc_get() for every pair of UV pixel processed in rgb64ToUV_half_c_template(). This commit modifies these macros to ensure that isBE() is evaluated at compile-time. This saved 9743B of .text for me (GCC 11.2, -O3). For a simple RGB64LE->YUV420P transformation like ffmpeg -f lavfi -i haldclutsrc,format=rgba64le -pix_fmt yuv420p \ -threads 1 -t 1:00 -f null - the amount of decicycles spent in rgb64LEToUV_half_c (which is created via the template mentioned above) decreases from 19751 to 5341; for RGBA64BE the number went down from 11945 to 5393. For shared builds (where the call to av_pix_fmt_desc_get() is indirect) the old numbers are 15230 for RGBA64BE and 27502 for RGBA64LE, whereas the numbers with this patch are indistinguishable from the numbers from a static build. Also make the macros that are touched conform to the usual convention of using uppercase names while just at it. Reviewed-by: Anton Khirnov <anton@khirnov.net> Reviewed-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-19 23:40:41 +02:00
Hao Chen	925ac0da32	swscale/la: Add output_lasx.c file. ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -pix_fmt rgb24 -y /dev/null -an before: 150fps after: 183fps Signed-off-by: Hao Chen <chenhao@loongson.cn> Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-09-10 22:56:39 +02:00
Hao Chen	74d09b068d	swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -pix_fmt rgb24 -y /dev/null -an before: 178fps after: 210fps Signed-off-by: Hao Chen <chenhao@loongson.cn> Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-09-10 22:56:38 +02:00
Hao Chen	38cacce22a	swscale/la: Optimize hscale functions with lasx. ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -y /dev/null -an before: 101fps after: 138fps Signed-off-by: Hao Chen <chenhao@loongson.cn> Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-09-10 22:56:38 +02:00
Philip Langdale	09a8e5debb	swscale/output: add support for Y210LE and Y212LE	2022-09-10 12:29:12 -07:00
Philip Langdale	68181623e9	swscale/output: add support for XV30LE	2022-09-10 12:29:12 -07:00
Philip Langdale	366f073c62	swscale/output: add support for XV36LE	2022-09-10 12:29:12 -07:00
Philip Langdale	caf8d4d256	swscale/output: add support for P012 This generalises the existing P010 support.	2022-09-10 12:29:12 -07:00
Andreas Rheinhardt	d2428d80ce	swscale/input: Remove spec-incompliant ';' These macros are definitions, not only declarations and therefore should not contain a semicolon. Such a semicolon is actually spec-incompliant, but compilers happen to accept them. Reviewed-by: Philip Langdale <philipl@overt.org> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-08 19:21:30 +02:00
Philip Langdale	4a59eba227	swscale/input: add support for Y212LE	2022-09-06 12:49:10 -07:00
Philip Langdale	198b5b90d5	swscale/input: add support for XV30LE	2022-09-06 12:49:10 -07:00
Philip Langdale	5bdd726115	swscale/input: add support for P012 As we now have three of these formats, I added macros to generate the conversion functions.	2022-09-06 12:49:10 -07:00
Philip Langdale	8d9462844a	swscale/input: add support for XV36LE	2022-09-06 12:49:10 -07:00
Philip Langdale	45726aa117	libswscale: add support for VUYX format As we already have support for VUYA, I figured I should do the small amount of work to support VUYX as well. That means a little refactoring to share code.	2022-08-25 19:03:49 -07:00
Andreas Rheinhardt	de33506e4b	swscale/x86/rgb_2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext Fixes FATE-failures with the filter-2xbr filter-3xbr filter-4xbr filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x filter-paletteuse-bayer filter-paletteuse-bayer0 filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten when using SSSE3). Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-23 12:21:00 +02:00
Timo Rothenpieler	aca569aad2	swscale/input: add rgbaf16 input support This is by no means perfect, since at least ddagrab will return scRGB data with values outside of 0.0f to 1.0f for HDR values. Its primary purpose is to be able to work with the format at all.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	f2de911818	swscale: add opaque parameter to input functions	2022-08-19 22:09:36 +02:00
Andreas Rheinhardt	8bec225c3c	swscale/x86/yuv2yuvX: Remove unused ff_yuv2yuvX_mmx() Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-19 12:01:34 +02:00
Alan Kelly	a38293e444	libswscale: Enable hscale_avx2 for all input sizes. ff_shuffle_filter_coefficients shuffles the tail as required. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
Alan Kelly	a6724285fd	sws: allow avx2 hscale to process inputs of any size. The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-08-18 16:24:48 +02:00
Alan Kelly	51a34e8525	sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-18 16:19:13 +02:00
Swinney, Jonathan	0d7caa5b09	swscale/aarch64: add vscale specializations This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance. On AWS c7g (Graviton 3, Neoverse V1) instances: before after yuv2yuvX_2_0_512_accurate_neon: 558.8 268.9 yuv2yuvX_4_0_512_accurate_neon: 637.5 434.9 yuv2yuvX_8_0_512_accurate_neon: 1144.8 806.2 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1853.7 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Swinney, Jonathan	3e708722a2	swscale/aarch64: vscale optimization Use scalar times vector multiply accumulate instructions instead of vector times vector to remove the need for replicating load instructions which are slightly slower. On AWS c7g (Graviton 3, Neoverse V1) instances: yuv2yuvX_8_0_512_accurate_neon: 1144.8 987.4 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Swinney, Jonathan	4dcd191a50	checkasm: updated tests for sw_scale Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differs slightly). Test with and without SWS_ACCURATE_RND - if this flag isn't set, the output must match the C reference exactly, otherwise it is allowed to be off by 2. Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND is set - apparently this discrepancy hasn't been noticed in other exact tests before. Add a test for yuv2plane1. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 13:40:42 +03:00
Swinney, Jonathan	75ffca7eef	libswscale/aarch64: add another hscale specialization This specialization handles the case where filtersize is 4 mod 8, e.g. 12, 20, etc. Aarch64 was previously using the c function for this case. This implementation speeds up that case significantly. hscale_8_to_15__fs_12_dstW_512_c: 6234.1 hscale_8_to_15__fs_12_dstW_512_neon: 1505.6 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-16 12:08:38 +03:00
Timo Rothenpieler	b77fff47d0	configure: always enable gnu_windres if available Use the appropriate Makefile variable to ensure the resource file is only built into shared libraries instead.	2022-08-13 14:42:36 +02:00
James Almer	68e017c487	swscale/output: fix reading chroma values when generating vuya output Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-08 09:39:33 -03:00
James Almer	1974813261	swscale/output: add VUYA output support Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-07 09:33:16 -03:00
James Almer	f0abd07996	swscale/input: add VUYA input support Reviewed-by: Philip Langdale <philipl@overt.org> Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-05 09:39:21 -03:00
Andreas Rheinhardt	da668fa7d2	swscale/rgb2rgb: Don't cast const away Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-31 01:09:52 +02:00
Matthieu Bouron	0a6bb7da55	swscale: add NV16 input/output Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-07-19 12:20:16 +02:00
Michael Niedermayer	fd26b07e8b	Bump versions after 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:29:05 +02:00
Michael Niedermayer	6f1b144358	Bump Versions for 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:27:37 +02:00
Andreas Rheinhardt	81d3472031	swscale/x86/swscale: Simplify macro This is possible now that it is no longer used by MMX. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:36:18 +02:00
Andreas Rheinhardt	a05f22eaf3	swscale/x86/swscale: Remove obsolete and harmful MMX(EXT) functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT, SSE and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2). So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed. Moreover, some of the removed code was buggy/not bitexact and led to failures involving the f32le and f32be versions of gray, gbrp and gbrap on x86-32 when SSE2 was not disabled. See e.g. https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx Notice that yuv2yuvX_mmx is not removed, because it is used by SSE3 and AVX2 as fallback in case of unaligned data and also for tail processing. I don't know why yuv2yuvX_mmxext isn't being used for this; an earlier version [1] of `554c2bc708` used it, but the version that was eventually applied does not. [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:36:04 +02:00
Andreas Rheinhardt	2831837182	swscale/x86/yuv2rgb: Remove obsolete MMX functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:35:50 +02:00
Andreas Rheinhardt	608319a311	swscale/x86/rgb2rgb: Remove obsolete MMX, 3dnow functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:35:38 +02:00
Andreas Rheinhardt	40e6575aa3	all: Replace if (ARCH_FOO) checks by #if ARCH_FOO This is more spec-compliant because it does not rely on dead-code elimination by the compiler. Especially MSVC has problems with this, as can be seen in https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html or https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html This commit does not eliminate every instance where we rely on dead code elimination: It only tackles branching to the initialization of arch-specific dsp code, not e.g. all uses of CONFIG_ and HAVE_ checks. But maybe it is already enough to compile FFmpeg with MSVC with whole-program optimizations enabled (if one does not disable too many components). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-15 04:56:37 +02:00
Vardan Margaryan	73302aa193	swscale/x86/yuv_2_rgb: fix access to memory past the frame data in yuv to rgb conversion Y, U, V data is loaded at the end of the current iteration for the next iteration. It results in memory access past the frame data on the last iteration (that data is never used after the loading). So load data at the start of the iteration, so that only useful data is loaded. Signed-off-by: Vardan Margaryan <v.t.margaryan@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-06-06 09:51:17 +02:00
Swinney, Jonathan	0ea61725b1	swscale/aarch64: add hscale specializations This patch adds code to support specializations of the hscale function and adds a specialization for filterSize == 4. ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck here is loading the data from src, this data is loaded a whole block ahead and stored back to the stack to be loaded again with ld4. This arranges the data for most efficient use of the vector instructions and removes the need for completion adds at the end. The number of iterations of the C per iteration of the assembly is increased from 4 to 8, but because of the prefetching, there must be a special section without prefetching when dstW < 16. This improves speed on Graviton 2 (Neoverse N1) dramatically in the case where previously fs=8 would have been required. before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8 after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-05-28 01:09:05 +03:00
Andreas Rheinhardt	f2b79c5b85	lib*/version: Move library version functions into files of their own This avoids having to rebuild big files every time FFMPEG_VERSION changes (which it does with every commit). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-10 06:49:32 +02:00
Martin Storsjö	70db14376c	swscale: aarch64: Optimize the final summation in the hscale routine Before: Cortex A53 A72 A73 Graviton 2 Graviton 3 hscale_8_to_15_width8_neon: 8273.0 4602.5 4289.5 2429.7 1629.1 hscale_8_to_15_width16_neon: 12405.7 6803.0 6359.0 3549.0 2378.4 hscale_8_to_15_width32_neon: 21258.7 11491.7 11469.2 5797.2 3919.6 hscale_8_to_15_width40_neon: 25652.0 14173.7 12488.2 6893.5 4810.4 After: hscale_8_to_15_width8_neon: 7633.0 3981.5 3350.2 1980.7 1261.1 hscale_8_to_15_width16_neon: 11666.7 5951.0 5512.0 3080.7 2131.4 hscale_8_to_15_width32_neon: 20900.7 10733.2 9481.7 5275.2 3862.1 hscale_8_to_15_width40_neon: 24826.0 13536.2 11502.0 6397.2 4731.9 Thus, this gives overall a 8-29% speedup for the smaller filter sizes, around 1-8% for the larger filter sizes. Inspired by a patch by Jonathan Swinney <jswinney@amazon.com>. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-22 10:49:46 +03:00
Martin Storsjö	2d368392a5	Keep including the full version.h when headers are included externally This avoids unnecessary churn and build breakage for users, by making sure the whole version.h is included like it has been so far, while keeping the benefit of not needing to rebuild most files in the ffmpeg tree on minor/micro bumps. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-19 00:01:57 +02:00
Martin Storsjö	f3a0e2ee2b	doc: Add an entry to APIchanges about changes to version.h and version_major.h Also bump the minor versions of all libraries, to signify the API change of splitting the version.h headers and adding the new version_major.h header. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-16 14:12:46 +02:00
Martin Storsjö	6cd2ac388d	libswscale: Split version.h Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-16 14:05:26 +02:00
Martin Storsjö	c523724c69	swscale: Take the destination range into account for yuv->rgb->yuv conversions The range parameters need to be set up before calling sws_init_context (which selects which fastpaths can be used; this gets called by sws_getContext); solely passing them via sws_setColorspaceDetails isn't enough. This fixes producing full range YUV range output when doing YUV->YUV conversions between different YUV color spaces. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-02-25 11:01:17 +02:00
Andreas Rheinhardt	636631d9db	Remove unnecessary libavutil/(avutil\|common\|internal).h inclusions Some of these were made possible by moving several common macros to libavutil/macros.h. While just at it, also improve the other headers a bit. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-02-24 12:56:49 +01:00
Andreas Rheinhardt	155cd6baa4	Remove obsolete version.h inclusions Forgotten in `e7bd47e657`. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-02-24 12:56:49 +01:00
Alan Kelly	e534d98af3	libswscale: Re-factor ff_shuffle_filter_coefficients. Make the code more readable and follow the style guide. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-02-17 17:17:22 +01:00
Alan Kelly	f1a5414c97	libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-02-17 17:17:07 +01:00
Andreas Rheinhardt	71e2825150	swscale/x86/swscale: Remove superfluous and invalid ';' Inside a function an unnecessary ';' is just a null statement; yet outside of it it is actually illegal (but compilers happen to accept it without warning except when using -pedantic). So modify the macros to always expect the user to add a ';'. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-22 17:00:45 +01:00
Mark Reid	52f7026164	swscale/x86/input.asm: add x86-optimized planar rgb2yuv functions sse2 only operates on 2 lanes per loop for to_y and to_uv functions, due to the lack of pmulld instruction. Emulating pmulld with 2 pmuludq and shuffles proved too costly and made to_uv functions slower than the C implementation. For to_y on sse2 only float functions are generated; I was not able to outperform the C implementation on the integer pixel formats. For to_a on sse4 only the float functions are generated. sse2 and sse4 generated nearly identical performing code on integer pixel formats, so only sse2/avx2 versions are generated. planar_gbrp_to_y_512_c: 1197.5 planar_gbrp_to_y_512_sse4: 444.5 planar_gbrp_to_y_512_avx2: 287.5 planar_gbrap_to_y_512_c: 1204.5 planar_gbrap_to_y_512_sse4: 447.5 planar_gbrap_to_y_512_avx2: 289.5 planar_gbrp9be_to_y_512_c: 1380.0 planar_gbrp9be_to_y_512_sse4: 543.5 planar_gbrp9be_to_y_512_avx2: 340.0 planar_gbrp9le_to_y_512_c: 1200.5 planar_gbrp9le_to_y_512_sse4: 442.0 planar_gbrp9le_to_y_512_avx2: 282.0 planar_gbrp10be_to_y_512_c: 1378.5 planar_gbrp10be_to_y_512_sse4: 544.0 planar_gbrp10be_to_y_512_avx2: 337.5 planar_gbrp10le_to_y_512_c: 1200.0 planar_gbrp10le_to_y_512_sse4: 448.0 planar_gbrp10le_to_y_512_avx2: 285.5 planar_gbrap10be_to_y_512_c: 1380.0 planar_gbrap10be_to_y_512_sse4: 542.0 planar_gbrap10be_to_y_512_avx2: 340.5 planar_gbrap10le_to_y_512_c: 1199.0 planar_gbrap10le_to_y_512_sse4: 446.0 planar_gbrap10le_to_y_512_avx2: 289.5 planar_gbrp12be_to_y_512_c: 10563.0 planar_gbrp12be_to_y_512_sse4: 542.5 planar_gbrp12be_to_y_512_avx2: 339.0 planar_gbrp12le_to_y_512_c: 1201.0 planar_gbrp12le_to_y_512_sse4: 440.5 planar_gbrp12le_to_y_512_avx2: 286.0 planar_gbrap12be_to_y_512_c: 1701.5 planar_gbrap12be_to_y_512_sse4: 917.0 planar_gbrap12be_to_y_512_avx2: 338.5 planar_gbrap12le_to_y_512_c: 1201.0 planar_gbrap12le_to_y_512_sse4: 444.5 planar_gbrap12le_to_y_512_avx2: 288.0 planar_gbrp14be_to_y_512_c: 1370.5 planar_gbrp14be_to_y_512_sse4: 545.0 
planar_gbrp14be_to_y_512_avx2: 338.5 planar_gbrp14le_to_y_512_c: 1199.0 planar_gbrp14le_to_y_512_sse4: 444.0 planar_gbrp14le_to_y_512_avx2: 279.5 planar_gbrp16be_to_y_512_c: 1364.0 planar_gbrp16be_to_y_512_sse4: 544.5 planar_gbrp16be_to_y_512_avx2: 339.5 planar_gbrp16le_to_y_512_c: 1201.0 planar_gbrp16le_to_y_512_sse4: 445.5 planar_gbrp16le_to_y_512_avx2: 280.5 planar_gbrap16be_to_y_512_c: 1377.0 planar_gbrap16be_to_y_512_sse4: 545.0 planar_gbrap16be_to_y_512_avx2: 338.5 planar_gbrap16le_to_y_512_c: 1201.0 planar_gbrap16le_to_y_512_sse4: 442.0 planar_gbrap16le_to_y_512_avx2: 279.0 planar_gbrpf32be_to_y_512_c: 4113.0 planar_gbrpf32be_to_y_512_sse2: 2438.0 planar_gbrpf32be_to_y_512_sse4: 1068.0 planar_gbrpf32be_to_y_512_avx2: 904.5 planar_gbrpf32le_to_y_512_c: 3818.5 planar_gbrpf32le_to_y_512_sse2: 2024.5 planar_gbrpf32le_to_y_512_sse4: 1241.5 planar_gbrpf32le_to_y_512_avx2: 657.0 planar_gbrapf32be_to_y_512_c: 3707.0 planar_gbrapf32be_to_y_512_sse2: 2444.0 planar_gbrapf32be_to_y_512_sse4: 1077.0 planar_gbrapf32be_to_y_512_avx2: 909.0 planar_gbrapf32le_to_y_512_c: 3822.0 planar_gbrapf32le_to_y_512_sse2: 2024.5 planar_gbrapf32le_to_y_512_sse4: 1176.0 planar_gbrapf32le_to_y_512_avx2: 658.5 planar_gbrp_to_uv_512_c: 2325.8 planar_gbrp_to_uv_512_sse2: 1726.8 planar_gbrp_to_uv_512_sse4: 771.8 planar_gbrp_to_uv_512_avx2: 506.8 planar_gbrap_to_uv_512_c: 2281.8 planar_gbrap_to_uv_512_sse2: 1726.3 planar_gbrap_to_uv_512_sse4: 768.3 planar_gbrap_to_uv_512_avx2: 496.3 planar_gbrp9be_to_uv_512_c: 2336.8 planar_gbrp9be_to_uv_512_sse2: 1924.8 planar_gbrp9be_to_uv_512_sse4: 852.3 planar_gbrp9be_to_uv_512_avx2: 552.8 planar_gbrp9le_to_uv_512_c: 2270.3 planar_gbrp9le_to_uv_512_sse2: 1512.3 planar_gbrp9le_to_uv_512_sse4: 764.3 planar_gbrp9le_to_uv_512_avx2: 491.3 planar_gbrp10be_to_uv_512_c: 2281.8 planar_gbrp10be_to_uv_512_sse2: 1917.8 planar_gbrp10be_to_uv_512_sse4: 855.3 planar_gbrp10be_to_uv_512_avx2: 541.3 planar_gbrp10le_to_uv_512_c: 2269.8 planar_gbrp10le_to_uv_512_sse2: 1515.3 
planar_gbrp10le_to_uv_512_sse4: 759.8 planar_gbrp10le_to_uv_512_avx2: 487.8 planar_gbrap10be_to_uv_512_c: 2382.3 planar_gbrap10be_to_uv_512_sse2: 1924.8 planar_gbrap10be_to_uv_512_sse4: 855.3 planar_gbrap10be_to_uv_512_avx2: 540.8 planar_gbrap10le_to_uv_512_c: 2382.3 planar_gbrap10le_to_uv_512_sse2: 1512.3 planar_gbrap10le_to_uv_512_sse4: 759.3 planar_gbrap10le_to_uv_512_avx2: 484.8 planar_gbrp12be_to_uv_512_c: 2283.8 planar_gbrp12be_to_uv_512_sse2: 1936.8 planar_gbrp12be_to_uv_512_sse4: 858.3 planar_gbrp12be_to_uv_512_avx2: 541.3 planar_gbrp12le_to_uv_512_c: 2278.8 planar_gbrp12le_to_uv_512_sse2: 1507.3 planar_gbrp12le_to_uv_512_sse4: 760.3 planar_gbrp12le_to_uv_512_avx2: 485.8 planar_gbrap12be_to_uv_512_c: 2385.3 planar_gbrap12be_to_uv_512_sse2: 1927.8 planar_gbrap12be_to_uv_512_sse4: 855.3 planar_gbrap12be_to_uv_512_avx2: 539.8 planar_gbrap12le_to_uv_512_c: 2377.3 planar_gbrap12le_to_uv_512_sse2: 1516.3 planar_gbrap12le_to_uv_512_sse4: 759.3 planar_gbrap12le_to_uv_512_avx2: 484.8 planar_gbrp14be_to_uv_512_c: 2283.8 planar_gbrp14be_to_uv_512_sse2: 1935.3 planar_gbrp14be_to_uv_512_sse4: 852.3 planar_gbrp14be_to_uv_512_avx2: 540.3 planar_gbrp14le_to_uv_512_c: 2276.8 planar_gbrp14le_to_uv_512_sse2: 1514.8 planar_gbrp14le_to_uv_512_sse4: 762.3 planar_gbrp14le_to_uv_512_avx2: 484.8 planar_gbrp16be_to_uv_512_c: 2383.3 planar_gbrp16be_to_uv_512_sse2: 1881.8 planar_gbrp16be_to_uv_512_sse4: 852.3 planar_gbrp16be_to_uv_512_avx2: 541.8 planar_gbrp16le_to_uv_512_c: 2378.3 planar_gbrp16le_to_uv_512_sse2: 1476.8 planar_gbrp16le_to_uv_512_sse4: 765.3 planar_gbrp16le_to_uv_512_avx2: 485.8 planar_gbrap16be_to_uv_512_c: 2382.3 planar_gbrap16be_to_uv_512_sse2: 1886.3 planar_gbrap16be_to_uv_512_sse4: 853.8 planar_gbrap16be_to_uv_512_avx2: 550.8 planar_gbrap16le_to_uv_512_c: 2381.8 planar_gbrap16le_to_uv_512_sse2: 1488.3 planar_gbrap16le_to_uv_512_sse4: 765.3 planar_gbrap16le_to_uv_512_avx2: 491.8 planar_gbrpf32be_to_uv_512_c: 4863.0 planar_gbrpf32be_to_uv_512_sse2: 3347.5 
planar_gbrpf32be_to_uv_512_sse4: 1800.0 planar_gbrpf32be_to_uv_512_avx2: 1199.0 planar_gbrpf32le_to_uv_512_c: 4725.0 planar_gbrpf32le_to_uv_512_sse2: 2753.0 planar_gbrpf32le_to_uv_512_sse4: 1474.5 planar_gbrpf32le_to_uv_512_avx2: 927.5 planar_gbrapf32be_to_uv_512_c: 4859.0 planar_gbrapf32be_to_uv_512_sse2: 3269.0 planar_gbrapf32be_to_uv_512_sse4: 1802.0 planar_gbrapf32be_to_uv_512_avx2: 1201.5 planar_gbrapf32le_to_uv_512_c: 6338.0 planar_gbrapf32le_to_uv_512_sse2: 2756.5 planar_gbrapf32le_to_uv_512_sse4: 1476.0 planar_gbrapf32le_to_uv_512_avx2: 908.5 planar_gbrap_to_a_512_c: 383.3 planar_gbrap_to_a_512_sse2: 66.8 planar_gbrap_to_a_512_avx2: 43.8 planar_gbrap10be_to_a_512_c: 601.8 planar_gbrap10be_to_a_512_sse2: 86.3 planar_gbrap10be_to_a_512_avx2: 34.8 planar_gbrap10le_to_a_512_c: 602.3 planar_gbrap10le_to_a_512_sse2: 48.8 planar_gbrap10le_to_a_512_avx2: 31.3 planar_gbrap12be_to_a_512_c: 601.8 planar_gbrap12be_to_a_512_sse2: 111.8 planar_gbrap12be_to_a_512_avx2: 41.3 planar_gbrap12le_to_a_512_c: 385.8 planar_gbrap12le_to_a_512_sse2: 75.3 planar_gbrap12le_to_a_512_avx2: 39.8 planar_gbrap16be_to_a_512_c: 386.8 planar_gbrap16be_to_a_512_sse2: 79.8 planar_gbrap16be_to_a_512_avx2: 31.3 planar_gbrap16le_to_a_512_c: 600.3 planar_gbrap16le_to_a_512_sse2: 40.3 planar_gbrap16le_to_a_512_avx2: 30.3 planar_gbrapf32be_to_a_512_c: 1148.8 planar_gbrapf32be_to_a_512_sse2: 611.3 planar_gbrapf32be_to_a_512_sse4: 234.8 planar_gbrapf32be_to_a_512_avx2: 183.3 planar_gbrapf32le_to_a_512_c: 851.3 planar_gbrapf32le_to_a_512_sse2: 263.3 planar_gbrapf32le_to_a_512_sse4: 199.3 planar_gbrapf32le_to_a_512_avx2: 156.8 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-01-11 16:34:33 -03:00
Mark Reid	9e445a5be2	swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions changes since v2: * fixed label changes since v1: * remove vex instruction on sse4 path * some load/pack macros use fewer instructions * fixed some typos yuv2gbrp_full_X_4_512_c: 12757.6 yuv2gbrp_full_X_4_512_sse2: 8946.6 yuv2gbrp_full_X_4_512_sse4: 5138.6 yuv2gbrp_full_X_4_512_avx2: 3889.6 yuv2gbrap_full_X_4_512_c: 15368.6 yuv2gbrap_full_X_4_512_sse2: 11916.1 yuv2gbrap_full_X_4_512_sse4: 6294.6 yuv2gbrap_full_X_4_512_avx2: 3477.1 yuv2gbrp9be_full_X_4_512_c: 14381.6 yuv2gbrp9be_full_X_4_512_sse2: 9139.1 yuv2gbrp9be_full_X_4_512_sse4: 5150.1 yuv2gbrp9be_full_X_4_512_avx2: 2834.6 yuv2gbrp9le_full_X_4_512_c: 12990.1 yuv2gbrp9le_full_X_4_512_sse2: 9118.1 yuv2gbrp9le_full_X_4_512_sse4: 5132.1 yuv2gbrp9le_full_X_4_512_avx2: 2833.1 yuv2gbrp10be_full_X_4_512_c: 14401.6 yuv2gbrp10be_full_X_4_512_sse2: 9133.1 yuv2gbrp10be_full_X_4_512_sse4: 5126.1 yuv2gbrp10be_full_X_4_512_avx2: 2837.6 yuv2gbrp10le_full_X_4_512_c: 12718.1 yuv2gbrp10le_full_X_4_512_sse2: 9106.1 yuv2gbrp10le_full_X_4_512_sse4: 5120.1 yuv2gbrp10le_full_X_4_512_avx2: 2826.1 yuv2gbrap10be_full_X_4_512_c: 18535.6 yuv2gbrap10be_full_X_4_512_sse2: 33617.6 yuv2gbrap10be_full_X_4_512_sse4: 6264.1 yuv2gbrap10be_full_X_4_512_avx2: 3422.1 yuv2gbrap10le_full_X_4_512_c: 16724.1 yuv2gbrap10le_full_X_4_512_sse2: 11787.1 yuv2gbrap10le_full_X_4_512_sse4: 6282.1 yuv2gbrap10le_full_X_4_512_avx2: 3441.6 yuv2gbrp12be_full_X_4_512_c: 13723.6 yuv2gbrp12be_full_X_4_512_sse2: 9128.1 yuv2gbrp12be_full_X_4_512_sse4: 7997.6 yuv2gbrp12be_full_X_4_512_avx2: 2844.1 yuv2gbrp12le_full_X_4_512_c: 12257.1 yuv2gbrp12le_full_X_4_512_sse2: 9107.6 yuv2gbrp12le_full_X_4_512_sse4: 5142.6 yuv2gbrp12le_full_X_4_512_avx2: 2837.6 yuv2gbrap12be_full_X_4_512_c: 18511.1 yuv2gbrap12be_full_X_4_512_sse2: 12156.6 yuv2gbrap12be_full_X_4_512_sse4: 6251.1 yuv2gbrap12be_full_X_4_512_avx2: 3444.6 yuv2gbrap12le_full_X_4_512_c: 16687.1 yuv2gbrap12le_full_X_4_512_sse2: 11785.1 
yuv2gbrap12le_full_X_4_512_sse4: 6243.6 yuv2gbrap12le_full_X_4_512_avx2: 3446.1 yuv2gbrp14be_full_X_4_512_c: 13690.6 yuv2gbrp14be_full_X_4_512_sse2: 9120.6 yuv2gbrp14be_full_X_4_512_sse4: 5138.1 yuv2gbrp14be_full_X_4_512_avx2: 2843.1 yuv2gbrp14le_full_X_4_512_c: 14995.6 yuv2gbrp14le_full_X_4_512_sse2: 9119.1 yuv2gbrp14le_full_X_4_512_sse4: 5126.1 yuv2gbrp14le_full_X_4_512_avx2: 2843.1 yuv2gbrp16be_full_X_4_512_c: 12367.1 yuv2gbrp16be_full_X_4_512_sse2: 8233.6 yuv2gbrp16be_full_X_4_512_sse4: 4820.1 yuv2gbrp16be_full_X_4_512_avx2: 2666.6 yuv2gbrp16le_full_X_4_512_c: 10904.1 yuv2gbrp16le_full_X_4_512_sse2: 8214.1 yuv2gbrp16le_full_X_4_512_sse4: 4824.1 yuv2gbrp16le_full_X_4_512_avx2: 2629.1 yuv2gbrap16be_full_X_4_512_c: 26569.6 yuv2gbrap16be_full_X_4_512_sse2: 10884.1 yuv2gbrap16be_full_X_4_512_sse4: 5488.1 yuv2gbrap16be_full_X_4_512_avx2: 3272.1 yuv2gbrap16le_full_X_4_512_c: 14010.1 yuv2gbrap16le_full_X_4_512_sse2: 10562.1 yuv2gbrap16le_full_X_4_512_sse4: 5463.6 yuv2gbrap16le_full_X_4_512_avx2: 3255.1 yuv2gbrpf32be_full_X_4_512_c: 14524.1 yuv2gbrpf32be_full_X_4_512_sse2: 8552.6 yuv2gbrpf32be_full_X_4_512_sse4: 4636.1 yuv2gbrpf32be_full_X_4_512_avx2: 2474.6 yuv2gbrpf32le_full_X_4_512_c: 13060.6 yuv2gbrpf32le_full_X_4_512_sse2: 9682.6 yuv2gbrpf32le_full_X_4_512_sse4: 4298.1 yuv2gbrpf32le_full_X_4_512_avx2: 2453.1 yuv2gbrapf32be_full_X_4_512_c: 18629.6 yuv2gbrapf32be_full_X_4_512_sse2: 11363.1 yuv2gbrapf32be_full_X_4_512_sse4: 15201.6 yuv2gbrapf32be_full_X_4_512_avx2: 3727.1 yuv2gbrapf32le_full_X_4_512_c: 16677.6 yuv2gbrapf32le_full_X_4_512_sse2: 10221.6 yuv2gbrapf32le_full_X_4_512_sse4: 5693.6 yuv2gbrapf32le_full_X_4_512_avx2: 3656.6 Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-01-11 16:33:17 -03:00
rcombs	df9180d8a0	swscale/output: use isSwappedChroma	2022-01-04 19:39:22 -06:00
rcombs	cb3a6cc082	swscale/output: use isSemiPlanarYUV for NV12/21/24/42 case	2022-01-04 19:39:22 -06:00
rcombs	f8e284be69	swscale: introduce isSwappedChroma	2022-01-04 19:39:22 -06:00
rcombs	bb4f19f2a2	swscale/output: use isDataInHighBits for 10-bit case This code will need fleshing-out (probably templating) if we ever add e.g. a P012 format.	2022-01-04 19:39:22 -06:00
rcombs	cf9e8cb52f	swscale/output: use isSemiPlanarYUV for 16-bit case	2022-01-04 19:39:22 -06:00
rcombs	e5d83463c8	swscale: introduce isDataInHighBits	2022-01-04 19:39:22 -06:00
rcombs	cb87a3b137	swscale/output: template-ize yuv2nv12cX 10-bit and 16-bit cases Fixes incorrect big-endian output introduced in `88d804b7ff` Avoids making the filter-time BE check more expensive	2022-01-04 19:39:22 -06:00
Andreas Rheinhardt	b189550137	lib*/version.h: Bump Versions after release/5.0 branch This is done a second time for 5.0 because master was merged into 5.0 so that it contains the recent DOVI additions. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-04 14:29:06 +01:00
Andreas Rheinhardt	c512be9a90	lib*/version.h: Bump Versions before release/5.0 branch Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-04 13:40:03 +01:00
Andreas Rheinhardt	20b0d24c2f	Makefile: Redo duplicating object files in shared builds In case of shared builds, some object files containing tables are currently duplicated into other libraries: log2_tab.c, golomb.c, reverse.c. The check for whether this is duplicated is simply whether CONFIG_SHARED is true. Yet this is crude: E.g. libavdevice includes reverse.c for shared builds, but only needs it for the decklink input device, which given that decklink is not enabled by default will be unused in most libavdevice.so. This commit changes this by making it more explicit about what to duplicate from other libraries. To do this, two new Makefile variables were added: SHLIBOBJS and STLIBOBJS. SHLIBOBJS contains the objects that are duplicated from other libraries in case of shared builds; STLIBOBJS contains stuff that a library has to provide for other libraries in case of static builds. These new variables provide a way to enable/disable with a finer granularity than just whether shared builds are enabled or not. E.g. lavd's Makefile now contains: SHLIBOBJS-$(CONFIG_DECKLINK_INDEV) += reverse.o Another example is provided by the golomb tables. These are provided by lavc for static builds, even if one uses a build configuration that makes only lavf use them. Therefore lavc's Makefile contains STLIBOBJS-$(CONFIG_MXF_MUXER) += golomb.o, whereas lavf's Makefile has a corresponding SHLIBOBJS-$(CONFIG_MXF_MUXER) += golomb_tab.o. E.g. in case the MXF muxer is the only component needing these tables only libavformat.so will contain them for shared builds; currently libavcodec.so does so, too. (There is currently a CONFIG_EXTRA group for golomb. But actually one would need two groups (golomb_avcodec and golomb_avformat) in order to know when and where to include these tables. Therefore this commit uses a Makefile-based approach for this and stops using these groups for the users in libavformat.) 
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-01-04 05:01:04 +01:00
Michael Niedermayer	4be85c9331	lib*/version.h: Bump Versions after release/5.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-01-03 22:10:46 +01:00
Michael Niedermayer	f3964a59e1	lib*/version.h: Bump Versions before release/5.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-01-03 22:08:31 +01:00
rcombs	3e00b9e395	swscale/x86/init: use isSemiPlanarYUV Fixes P210/P410 cases introduced (and broken) in `88d804b7ff`	2021-12-23 01:41:03 -06:00
rcombs	88d804b7ff	swscale: add P210/P410/P216/P416 output	2021-12-22 18:38:40 -06:00
Alan Kelly	eebe406c80	libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions. This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions are only used where they are faster.	2021-12-21 17:44:53 -03:00
James Almer	eab91c3e2e	x86/scale_avx2: don't use $ for hex literals Fixes compilation with AVX2 enabled yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 17:29:21 -03:00
Alan Kelly	9092e58c44	x86/scale_avx2: Change asm indent from 2 to 4 spaces. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 13:42:04 -03:00
Alan Kelly	86663963e6	x86/swscale: fix minor coding style issues Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 13:16:04 -03:00
James Almer	76a3f961f8	x86/scale_avx2: add missing check for AVX2 assembler support Should fix compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-16 09:41:56 -03:00
Alan Kelly	f900a19fa9	libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes. Fixes so that fate under 64 bit Windows passes. These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available. Signed-off-by: James Almer <jamrial@gmail.com>	2021-12-15 20:04:59 -03:00
Andreas Rheinhardt	3be6fe9a56	swscale/yuv2rgb: Silence a set-but-unused-variable warning Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-12-03 16:10:51 +01:00
rcombs	f0204de47d	swscale: add P210/P410/P216/P416 input	2021-11-28 16:40:43 -06:00
Mark Reid	3f4ce004b8	swscale/input: clip rgbf32 values before lrintf If the float pixel * 65535.0f > 2147483647.0f, lrintf may overflow and return negative values, depending on implementation. The results for nan and +/-inf values may also be implementation defined. Clip the value first so lrintf always works. values < 0.0f, -inf, nan = 0.0f; values > 65535.0f, +inf = 65535.0f old timings 195960 decicycles in planar_rgbf32le_to_uv, 1 runs, 0 skips 186120 decicycles in planar_rgbf32le_to_uv, 2 runs, 0 skips 188645 decicycles in planar_rgbf32le_to_uv, 4 runs, 0 skips 183625 decicycles in planar_rgbf32le_to_uv, 8 runs, 0 skips 181157 decicycles in planar_rgbf32le_to_uv, 16 runs, 0 skips 177533 decicycles in planar_rgbf32le_to_uv, 32 runs, 0 skips 175689 decicycles in planar_rgbf32le_to_uv, 64 runs, 0 skips 232960 decicycles in planar_rgbf32be_to_uv, 1 runs, 0 skips 221380 decicycles in planar_rgbf32be_to_uv, 2 runs, 0 skips 216640 decicycles in planar_rgbf32be_to_uv, 4 runs, 0 skips 213505 decicycles in planar_rgbf32be_to_uv, 8 runs, 0 skips 211558 decicycles in planar_rgbf32be_to_uv, 16 runs, 0 skips 210596 decicycles in planar_rgbf32be_to_uv, 32 runs, 0 skips 210202 decicycles in planar_rgbf32be_to_uv, 64 runs, 0 skips 161680 decicycles in planar_rgbf32le_to_y, 1 runs, 0 skips 153540 decicycles in planar_rgbf32le_to_y, 2 runs, 0 skips 148255 decicycles in planar_rgbf32le_to_y, 4 runs, 0 skips 140600 decicycles in planar_rgbf32le_to_y, 8 runs, 0 skips 132935 decicycles in planar_rgbf32le_to_y, 16 runs, 0 skips 128531 decicycles in planar_rgbf32le_to_y, 32 runs, 0 skips 140933 decicycles in planar_rgbf32le_to_y, 64 runs, 0 skips 190980 decicycles in planar_rgbf32be_to_y, 1 runs, 0 skips 176080 decicycles in planar_rgbf32be_to_y, 2 runs, 0 skips 167980 decicycles in planar_rgbf32be_to_y, 4 runs, 0 skips 164685 decicycles in planar_rgbf32be_to_y, 8 runs, 0 skips 162751 decicycles in planar_rgbf32be_to_y, 16 runs, 0 skips 162404 decicycles in planar_rgbf32be_to_y, 32 runs, 0 skips 167849 
decicycles in planar_rgbf32be_to_y, 64 runs, 0 skips new timings 183320 decicycles in planar_rgbf32le_to_uv, 1 runs, 0 skips 175700 decicycles in planar_rgbf32le_to_uv, 2 runs, 0 skips 179570 decicycles in planar_rgbf32le_to_uv, 4 runs, 0 skips 172932 decicycles in planar_rgbf32le_to_uv, 8 runs, 0 skips 168707 decicycles in planar_rgbf32le_to_uv, 16 runs, 0 skips 165224 decicycles in planar_rgbf32le_to_uv, 32 runs, 0 skips 163423 decicycles in planar_rgbf32le_to_uv, 64 runs, 0 skips 184940 decicycles in planar_rgbf32be_to_uv, 1 runs, 0 skips 185150 decicycles in planar_rgbf32be_to_uv, 2 runs, 0 skips 185790 decicycles in planar_rgbf32be_to_uv, 4 runs, 0 skips 185472 decicycles in planar_rgbf32be_to_uv, 8 runs, 0 skips 185277 decicycles in planar_rgbf32be_to_uv, 16 runs, 0 skips 185813 decicycles in planar_rgbf32be_to_uv, 32 runs, 0 skips 185332 decicycles in planar_rgbf32be_to_uv, 64 runs, 0 skips 145400 decicycles in planar_rgbf32le_to_y, 1 runs, 0 skips 145100 decicycles in planar_rgbf32le_to_y, 2 runs, 0 skips 143490 decicycles in planar_rgbf32le_to_y, 4 runs, 0 skips 136687 decicycles in planar_rgbf32le_to_y, 8 runs, 0 skips 131271 decicycles in planar_rgbf32le_to_y, 16 runs, 0 skips 128698 decicycles in planar_rgbf32le_to_y, 32 runs, 0 skips 127170 decicycles in planar_rgbf32le_to_y, 64 runs, 0 skips 156020 decicycles in planar_rgbf32be_to_y, 1 runs, 0 skips 146990 decicycles in planar_rgbf32be_to_y, 2 runs, 0 skips 142020 decicycles in planar_rgbf32be_to_y, 4 runs, 0 skips 141052 decicycles in planar_rgbf32be_to_y, 8 runs, 0 skips 138973 decicycles in planar_rgbf32be_to_y, 16 runs, 0 skips 138027 decicycles in planar_rgbf32be_to_y, 32 runs, 0 skips 143939 decicycles in planar_rgbf32be_to_y, 64 runs, 0 skips Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2021-11-15 16:50:10 -03:00
Mark Reid	74e49cc583	swscale/input: unify grayf32 funcs with rgbf32 funcs This is meant to be a cosmetic change old timings: 42780 UNITS in grayf32le, 1 runs, 0 skips 56720 UNITS in grayf32le, 2 runs, 0 skips 67265 UNITS in grayf32le, 4 runs, 0 skips 58082 UNITS in grayf32le, 8 runs, 0 skips 63512 UNITS in grayf32le, 16 runs, 0 skips 52720 UNITS in grayf32le, 32 runs, 0 skips 46491 UNITS in grayf32le, 64 runs, 0 skips 68500 UNITS in grayf32be, 1 runs, 0 skips 66930 UNITS in grayf32be, 2 runs, 0 skips 62305 UNITS in grayf32be, 4 runs, 0 skips 55510 UNITS in grayf32be, 8 runs, 0 skips 50216 UNITS in grayf32be, 16 runs, 0 skips 44480 UNITS in grayf32be, 32 runs, 0 skips 42394 UNITS in grayf32be, 64 runs, 0 skips new timings: 46660 UNITS in grayf32le, 1 runs, 0 skips 51830 UNITS in grayf32le, 2 runs, 0 skips 53390 UNITS in grayf32le, 4 runs, 0 skips 50910 UNITS in grayf32le, 8 runs, 0 skips 44968 UNITS in grayf32le, 16 runs, 0 skips 40349 UNITS in grayf32le, 32 runs, 0 skips 38330 UNITS in grayf32le, 64 runs, 0 skips 39980 UNITS in grayf32be, 1 runs, 0 skips 49630 UNITS in grayf32be, 2 runs, 0 skips 53540 UNITS in grayf32be, 4 runs, 0 skips 59767 UNITS in grayf32be, 8 runs, 0 skips 51206 UNITS in grayf32be, 16 runs, 0 skips 44743 UNITS in grayf32be, 32 runs, 0 skips 41468 UNITS in grayf32be, 64 runs, 0 skips Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-11-14 17:12:13 +01:00
Soft Works	58dce6f010	swscale/swscale: check SWS_PRINT_INFO flag for printing alignment warnings This makes output consistent with a similar warning just few lines above where this flag is checked in the same way. Signed-off-by: softworkz <softworkz@hotmail.com> Signed-off-by: Marton Balint <cus@passwd.hu>	2021-11-13 19:55:32 +01:00
Mark Reid	d2379bd6a0	swscale/input: fix planar_rgb16_to_a for gbrap10be and gbrap12be formats Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-11-04 11:52:33 +01:00
Michael Niedermayer	8316b2a15f	swscale/swscale: Improve *ColorspaceDetails() doxy Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-24 16:54:36 +02:00
Michael Niedermayer	5f3a160b42	swscale/utils: Improve return codes of sws_setColorspaceDetails() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-24 16:54:36 +02:00
Michael Niedermayer	c7699f95bb	swscale/utils: Set all threads to the same colorspace even on failure Fixes: ./ffplay dav.y4m -vf "scale=hd1080:threads=4" Found-by: Paul Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-24 16:54:36 +02:00
Wu Jianhua	2c734a8496	libswscale/x86/rgb2rgb: add shuffle_bytes avx2 Performance data(Less is better): shuffle_bytes_ssse3 3.64654 shuffle_bytes_avx2 0.94288 Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-10-15 10:59:20 +02:00
Michael Niedermayer	f801207568	swscale/swscale: Pass slice location into unscaled code also for dst scaling Fixes: alphablend=checkerboard Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-03 20:38:29 +02:00
Michael Niedermayer	06d6726588	swscale/alphablend: Fix slice handling Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-10-03 20:38:29 +02:00
Michael Niedermayer	9f40b5badb	swscale/swscale_internal: Avoid unsigned for slice parameters Mixing unsigned and signed often leads to unexpected arithmetic results. Fixes: out of array write Found-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-09-30 19:47:15 +02:00
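The remark in commit 9f40b5badb above, that mixing unsigned and signed often leads to unexpected arithmetic results, can be illustrated with a small C sketch. Both function names are hypothetical and nothing here is taken from swscale_internal.h; the explicit cast only spells out what C's usual arithmetic conversions already do implicitly when an `int` meets an `unsigned`.

```c
/* Hypothetical illustration of the signed/unsigned pitfall. */

/* Mixed comparison: the int operand is converted to unsigned, so a
 * negative value turns into a very large positive one. The cast makes
 * the implicit conversion visible. */
static int is_less_mixed(int a, unsigned b)
{
    return (unsigned)a < b;
}

/* All-signed comparison: behaves the way the arithmetic reads. */
static int is_less_signed(int a, int b)
{
    return a < b;
}
```

For example, `is_less_mixed(-1, 1)` reports that -1 is not less than 1, while `is_less_signed(-1, 1)` gives the expected answer. A bounds check built on the mixed form can therefore pass or fail in the wrong direction.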
Manuel Stoeckl	32329397e2	swscale: add input/output support for X2BGR10LE Signed-off-by: Manuel Stoeckl <code@mstoeckl.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-09-26 16:26:10 +02:00
Manuel Stoeckl	ca594df622	swscale/yuv2rgb: fix conversion to X2RGB10 This resolves a problem where conversions from YUV to X2RGB10LE would produce color values a factor 4 too small, because an 8-bit value was placed in a 10-bit channel. Signed-off-by: Manuel Stoeckl <code@mstoeckl.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-09-26 16:26:10 +02:00
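The factor-of-4 error described in commit ca594df622 above follows from the bit widths: an 8-bit value peaks at 255, a 10-bit channel at 1023, so an unscaled 8-bit value lands about a factor 4 too low. The sketch below shows one generic way to widen 8-bit samples to 10 bits before packing. It is not the fix applied in the commit, the helper names are hypothetical, and the channel layout (2 unused bits, then 10 bits each of R, G, B from the most significant bit down) is stated here as an assumption about X2RGB10.

```c
#include <stdint.h>

/* Hypothetical helper: widen an 8-bit sample to 10 bits by shifting
 * left by 2 and replicating the top 2 bits, so 0 -> 0 and 255 -> 1023. */
static uint32_t expand_8_to_10(uint8_t v)
{
    return ((uint32_t)v << 2) | (v >> 6);
}

/* Hypothetical packer, assuming (msb) 2X 10R 10G 10B (lsb). */
static uint32_t pack_x2rgb10(uint8_t r, uint8_t g, uint8_t b)
{
    return expand_8_to_10(r) << 20 |
           expand_8_to_10(g) << 10 |
           expand_8_to_10(b);
}
```

Writing the raw 8-bit value into the 10-bit field instead would cap every channel at 255 out of 1023, which matches the symptom the commit message describes.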
Andreas Rheinhardt	1ea3650823	Replace all occurrences of av_mallocz_array() by av_calloc() They do the same. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-20 01:03:52 +02:00
Andreas Rheinhardt	044a7c08dc	swscale/swscale: Disable x86-specific code for other arches SSE2 is x86 specific, yet due to the call to av_get_cpu_flags() compilers were unable to optimize the checks (and the call) away on other arches. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-19 23:52:37 +02:00
Andreas Rheinhardt	f440c422b7	swscale/swscale: Fix races when using unaligned strides/data In this case the current code tries to warn once; to do so, it uses ordinary static ints to store whether the warning has already been emitted. This is both a data race (and therefore undefined behaviour) as well as a race condition, because it is really possible for multiple threads to be the one thread to emit the warning. This is actually common since the introduction of the new multithreaded scaling API. This commit fixes this by using atomic integers for the state; furthermore, these are not static anymore, but rather contained in the user-facing SwsContext (i.e. the parent SwsContext in case of slice-threading). Given that these atomic variables are not intended for synchronization at all (but only for atomicity, i.e. only to output the warning once), the atomic operations use memory_order_relaxed. This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and yuv444 filter-overlay FATE-tests. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-19 23:52:37 +02:00
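The warn-once pattern described in commit f440c422b7 above (an atomic flag stored in the context rather than a static int, accessed with memory_order_relaxed) might look roughly like this in C11. The struct and function names are hypothetical stand-ins, not the fields added to SwsContext, and the use of an atomic exchange is an assumption about one way to guarantee a single winner.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-context warning state. */
typedef struct WarnState {
    atomic_int stride_unaligned_warned;
} WarnState;

/* Emits the message at most once per WarnState, even when called from
 * several threads. Returns 1 for the single caller that printed it.
 * Relaxed ordering suffices: the flag only needs atomicity, it does not
 * synchronize any other data. */
static int warn_once(WarnState *s, const char *msg)
{
    if (atomic_exchange_explicit(&s->stride_unaligned_warned, 1,
                                 memory_order_relaxed))
        return 0; /* another caller already emitted the warning */
    fprintf(stderr, "%s\n", msg);
    return 1;
}
```

Because the exchange returns the previous value atomically, exactly one thread observes 0 and prints. A plain static int would allow several threads to read 0 before any of them writes 1.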
Andreas Rheinhardt	a1255a350d	libswscale/options: Add parent_log_context_offset to AVClass This allows associating log messages from slice contexts with the user-visible SwsContext. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-09-19 23:52:37 +02:00
James Almer	5fe648d04a	libswscale/swscale: initialize all dst plane pointers in sws_receive_slice() Fixes valgrind warnings about use of uninitialised values. Signed-off-by: James Almer <jamrial@gmail.com>	2021-09-07 09:44:58 -03:00
Anton Khirnov	d6fdc78e91	sws: implement slice threading	2021-09-06 09:17:53 +02:00
Anton Khirnov	42cd64c182	sws: add a new scaling API	2021-09-06 09:16:52 +02:00
Andreas Rheinhardt	2c05ee092b	avutil/internal, swresample/audioconvert: Remove cpu.h inclusions These inclusions are not necessary, as cpu.h is already included wherever it is needed (via direct inclusion or via the arch-specific headers). Also remove other unnecessary cpu.h inclusions from ordinary non-headers. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-07-22 14:33:45 +02:00
Michael Niedermayer	7874d40f10	swscale/slice: Fix wrong return on error Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 15:21:37 +02:00
Michael Niedermayer	fa1e158ef6	swscale/utils: Use full chroma interpolation for rgb4/8 and dither none Dither none is only implemented in full chroma interpolation for these rgb formats. It's also an obscure choice (producing less nice images), so implementing it in the other code-paths makes no sense. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 12:29:03 +02:00
Michael Niedermayer	7528532550	swscale/output: Implement dither none for yuv2rgb_write_full() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 12:29:03 +02:00
Michael Niedermayer	997f9cfc12	swscale/slice: Check slice for allocation failure Fixes: null pointer dereference Fixes: alloc_slice.mp4 Found-by: Rafael Dutra <rafael.dutra@cispa.de> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-09 12:29:03 +02:00
Anton Khirnov	37c0fe49b7	sws: move updating the palette higher up It does not interact in any way with the code setting up the image pointers/strides, so it should not be intermixed with it.	2021-07-03 16:13:40 +02:00
Anton Khirnov	d6649d9a3b	sws: move initializing dither_error higher up It does not interact in any way with the code setting up the image pointers/strides, so it should not be intermixed with it.	2021-07-03 16:13:10 +02:00
Anton Khirnov	e188985598	sws: move the early return for zero-sized slices higher up Place it right after the input parameter validation. There is no point in performing any setup if the sws_scale() call won't do anything.	2021-07-03 16:09:43 +02:00
Anton Khirnov	a91e6c927e	sws: simplify setting sliceDir	2021-07-03 16:09:21 +02:00
Anton Khirnov	ff753f41dd	sws: merge handling frame start into a single block Also, return an error code on failure rather than 0.	2021-07-03 16:09:07 +02:00
Anton Khirnov	1b11a324fe	sws: make checking for the start of a new frame more explicit	2021-07-03 16:07:22 +02:00
Anton Khirnov	0fb014b7bb	sws: reset sliceDir at the end of sws_scale() Makes it more clear that resetting it does not interact with the scaling code that it is currently intermixed with.	2021-07-03 16:05:39 +02:00
Anton Khirnov	1f80789bf7	sws: rename SwsContext.swscale to convert_unscaled That function pointer is now used only for unscaled conversion.	2021-07-03 15:57:53 +02:00
Anton Khirnov	fe490ec165	sws: separate the calls to scaled vs unscaled conversion Call the scaler function directly rather than through a function pointer. Drop the now-unused return value from ff_getSwsFunc() and rename the function to reflect its new role. This will be useful in the following commits, where it will become important that the amount of output is different for scaled vs unscaled case.	2021-07-03 15:57:13 +02:00
Anton Khirnov	0f8e0957d2	sws: do not reallocate scratch buffers for each slice	2021-07-03 15:56:16 +02:00
Anton Khirnov	2730639259	sws: group the parameters validity checks together Also, fail with an error code rather than 0.	2021-07-03 15:31:18 +02:00
Anton Khirnov	c05cab34a9	sws: initialize {src,dst}Stride2 consistently with {src,dst}2	2021-07-03 15:31:08 +02:00
Anton Khirnov	d3d8e09640	sws: cosmetics Reindent after previous commit, rewrap long lines.	2021-07-03 15:30:56 +02:00
Anton Khirnov	f136493d03	sws: factor out cascaded scaling	2021-07-03 15:30:34 +02:00
Anton Khirnov	a2254aedc9	sws: cosmetics Reindent after previous commit, split long lines.	2021-07-03 15:30:20 +02:00
Anton Khirnov	44f12718bf	sws: factor out gamma-correct scaling	2021-07-03 15:29:50 +02:00
Anton Khirnov	e355af9be9	sws: return an error code on invalid parameters to sws_scale()	2021-07-03 15:29:35 +02:00
Anton Khirnov	21a4e48f88	sws: reindent after previous commit	2021-07-03 15:29:22 +02:00
Anton Khirnov	27acca1af0	sws: factor out updating the palette	2021-07-03 15:28:46 +02:00
Anton Khirnov	f8c21ccbfc	sws: remove unnecessary braces There used to be more code inside them, but it was removed in `6de58b4903`.	2021-07-03 15:28:36 +02:00
Peter Lundblad	da0abbbb01	libswscale: Make sws_init_context thread safe. Call ff_sws_rgb2rgb_init via ff_thread_once instead of checking one of the variables it updates. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-07-01 23:49:41 +02:00
Limin Wang	43295ae6a9	swscale/swscale_unscaled: don't use the optimized bgr24toYV12 unscaled conversion when width%2 Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2021-06-06 12:34:05 +08:00
Anton Khirnov	85ba17f36d	Bump major versions of all libraries. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-04-27 11:48:05 -03:00
Andreas Rheinhardt	ea2d9b7a2e	libswscale: Remove unused deprecated functions, make used ones static Deprecated in `3b905b9fe6`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> Signed-off-by: James Almer <jamrial@gmail.com>	2021-04-27 10:43:11 -03:00
Andreas Rheinhardt	f3c197b129	Include attributes.h directly Some files currently rely on libavutil/cpu.h to include it for them; yet said file won't include it any more after the currently deprecated functions are removed, so include attributes.h directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-04-19 14:34:10 +02:00
Alan Kelly	3ce8d09244	libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-04-01 20:47:52 +02:00
Alan Kelly	dc57762cb4	libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-04-01 20:47:52 +02:00
Michael Niedermayer	c361fa9e21	Bump minor versions after release branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-03-20 01:02:11 +01:00
Michael Niedermayer	c67d2a2875	Bump Versions before release/4.4 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2021-03-20 01:01:12 +01:00
Andreas Rheinhardt	c23a5523b5	swscale/x86/swscale: Remove unused ASM constants The last user of g15Mask, r15Mask, g16Mask and r16Mask was disabled in `77a416e8aa` and finally removed in 36e8de07ed62609df45d064b56501e3084d25723; b15Mask and b16Mask were apparently always unused (except for in_asm_used_var_warning_killer, a function that only existed to make the compiler not optimize ASM constants away). w10 is unused since `d604bab901`, w02 since `ef423a6618`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2021-02-24 09:47:54 +01:00
Andreas Rheinhardt	aad597a93c	swscale/x86/rgb2rgb: Remove unused ASM constants mask24hh etc. are unused since `f099fbf5f3`, mask32b and mask32r since `296609f859`, mask32g since `b38d487466` and mask32 since `f8a138be52`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2021-02-24 09:45:17 +01:00
Andreas Rheinhardt	49db6e4b4e	swscale/x86/yuv2rgb: Remove unused ASM constants mmx_grnmask is unused since `531f97b0c3`, the other constants since `e934194b6a`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2021-02-24 09:43:14 +01:00
Chip Kerchner	e7f53d6ac9	lsws/ppc/yuv2rgb_altivec: Fix build in non-VSX environments Add inline function for vec_xl if VSX is not supported. vec_xl intrinsic is only available on POWER 7 or higher. Fixes ticket #8750. Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>	2021-02-22 23:19:21 -05:00
James Almer	1a555d3c60	swscale/x86/yuv2yuvX: use the movsxdifnidn helper macro Simplifies code Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:43 -03:00
James Almer	ebb48d85a0	swscale/x86/yuv2yuvX: use movq to load 8 bytes in all non-AVX2 functions mova expands to movq on non-XMM functions Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:43 -03:00
James Almer	d512ebbaed	swscale/x86/yuv2yuvX: use the SPLATW helper macro Simplifies code Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:43 -03:00
James Almer	c00567647e	swscale/x86/swscale: fix mix of inline and external function definitions This includes removing pointless static function forward declarations. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-18 18:47:42 -03:00
James Almer	c2bf1dcace	swscale/x86/swscale: fix compilation with old yasm Where AVX2 may not be supported. Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-17 21:09:36 -03:00
Alan Kelly	554c2bc708	swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop And other small optimizations for ~20% speedup.	2021-02-17 21:21:03 +01:00
Carl Eugen Hoyos	2687070d9b	lsws/ppc/yuv2rgb: Fix transparency converting from yuv->rgb32. Based on `68363b69` by Reimar Döffinger. Fixes ticket #9077.	2021-01-24 17:17:29 +01:00
Anton Khirnov	e15371061d	lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump They are not properly namespaced and not intended for public use.	2021-01-01 14:14:57 +01:00
Anton Khirnov	c8c2dfbc37	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h That is a more appropriate place for it.	2021-01-01 14:11:01 +01:00
Jeremy Leconte	29cef1bcd6	libswscale: avoid UB nullptr-with-offset. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-12-24 15:27:56 +01:00
Andriy Gelman	1200264fc4	swscale/rgb2rgb_template: use shuffle macro on big-endian arches Fixes fate-qtrle-32bit on big-endian. The macro does a simple byte swap on uint8 array without any casts, so it's valid on big-endian arches. The mentioned test was failing because the byteswap function shuffle_bytes_3210_c() is used in the pixel format conversion (argb->bgra). Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>	2020-12-12 23:07:22 -05:00
Carl Eugen Hoyos	46e362b765	lsws/x86/yuv2rgb: Fix compilation with mmxext or ssse3 disabled. Fixes ticket #8986.	2020-11-14 15:37:57 +01:00
Marton Balint	993429cfb4	swscale/x86/yuv2rgb: fix crashes when loading alpha from unaligned buffers Regression since `fc6a5883d6` on SSSE3 enabled CPUs. Fixes ticket #8955. Signed-off-by: Marton Balint <cus@passwd.hu>	2020-11-02 00:31:34 +01:00
Jan Ekström	7ea4bcff7b	swscale/utils: override forced-zero formats back to full range Fixes vf_scale outputting RGB AVFrames with limited range flagged in case either input or output specifically sets the range. This is the reverse of the logic utilized for RGB and PAL8 content in sws_setColorspaceDetails.	2020-10-11 12:58:13 +03:00
Jan Ekström	3fe24fe232	swscale/utils: split range override check into its own function	2020-10-11 12:58:13 +03:00
Mark Reid	a48adcd136	libswcale/input: use more accurate planer rgb16 yuv conversions These conversions appear to exhibit the same rounding error as the rgbf32 formats. I separated the rounding value from the 16 and 128 offsets; I think it makes it a little clearer. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-10-06 17:56:52 +02:00
Mark Reid	453004fde6	libswcale/input: use more accurate rgbf32 yuv conversions Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-10-02 14:59:52 +02:00
Mark Reid	6bf57c6a2a	libswscale/tests: add floatimg_cmp test changes since v1: - made into fate test - fixed c90 warnings - tests more intermediate formats - tested on BE mips too Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-10-02 14:59:52 +02:00
James Almer	621e2625e0	swscale/x86/output: add missing AVX2 support preprocessor wrappers Fixes compilation with old yasm Signed-off-by: James Almer <jamrial@gmail.com>	2020-08-20 15:14:56 -03:00
Paul B Mahol	9d58cdb4ba	swscale: do not drop half of bits from 16bit bayer formats	2020-08-08 12:03:42 +02:00
Limin Wang	7c8ad72f1c	swscale/yuv2rgb: cosmetics Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-07-25 10:20:42 +08:00
Fei Wang	8544783280	swscale/yuv2rgb: consider x2rgb10le on big endian hardware This fixed FATE fail report by filter-pixfmts* for x2rgb10le on big endian hardware. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-07-20 21:00:00 +02:00

2639 Commits