FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Nelson Gomez	7c39c3c1a6	swscale: make yuv2interleavedX more asm-friendly Extracting information from SwsContext in assembly is difficult, and rearranging SwsContext just for asm access didn't look good. These functions only need a couple of fields from it anyway, so just make them parameters in their own right. Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com>	2020-06-14 16:34:07 +01:00
Limin Wang	67a07dc778	swscale/utils: return better error code from initFilter() Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-06-14 21:54:40 +08:00
Limin Wang	8efecc9063	swscale/utils: reindent Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-06-14 21:54:40 +08:00
Limin Wang	a408d03ee6	swscale/utils: remove FF_ALLOC_ARRAY_OR_GOTO macros Signed-off-by: Limin Wang <lance.lmwang@gmail.com>	2020-06-13 06:59:19 +08:00
Fei Wang	c721b45014	swscale: Add swscale input/output support for X2RGB10LE Signed-off-by: Fei Wang <fei.w.wang@intel.com>	2020-06-12 17:56:15 +01:00
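A rough sketch of what the X2RGB10LE layout means for a converter. Each pixel is one little-endian 32-bit word: 2 unused bits, then 10-bit R, G and B from most to least significant. The helper names below are hypothetical, not swscale functions.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating X2RGB10LE packing:
 * bits 31-30 unused, 29-20 R, 19-10 G, 9-0 B, stored little-endian. */
static void pack_x2rgb10le(uint8_t out[4], uint16_t r, uint16_t g, uint16_t b)
{
    uint32_t w = ((uint32_t)(r & 0x3FF) << 20) |
                 ((uint32_t)(g & 0x3FF) << 10) |
                  (uint32_t)(b & 0x3FF);
    /* Write byte by byte so the result is little-endian on any host. */
    out[0] = (uint8_t)(w);
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}

static void unpack_x2rgb10le(const uint8_t in[4],
                             uint16_t *r, uint16_t *g, uint16_t *b)
{
    uint32_t w = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
                 ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    *r = (w >> 20) & 0x3FF;
    *g = (w >> 10) & 0x3FF;
    *b = w & 0x3FF;
}
```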
Michael Niedermayer	c5079bf3bc	Bump minor versions after branching 4.3 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-06-08 22:49:04 +02:00
Michael Niedermayer	0a8a96c251	Bump minor versions to separate 4.3 from master Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-06-08 22:49:04 +02:00
Martin Storsjö	e0604d508e	swscale: aarch64: Add a NEON implementation of interleaveBytes This allows speeding up format conversions from yuv420 to nv12. Cortex A53 A72 A73 interleave_bytes_c: 86077.5 51433.0 66972.0 interleave_bytes_neon: 19701.7 23019.2 15859.2 interleave_bytes_aligned_c: 86603.0 52017.2 67484.2 interleave_bytes_aligned_neon: 9061.0 7623.0 6309.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 23:38:17 +03:00
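The commit speeds up yuv420 to nv12 conversion by interleaving the separate U and V planes into nv12's single UV plane. Below is a scalar sketch of that operation. The function name and parameter order are assumptions, not FFmpeg's exact interleaveBytes prototype. A NEON version performs the same byte interleave many bytes at a time.

```c
#include <stdint.h>

/* Scalar sketch: merge two planes (e.g. U and V of yuv420p) into one
 * interleaved plane (the UV plane of nv12). Strides are in bytes. */
static void interleave_bytes_sketch(const uint8_t *src1, const uint8_t *src2,
                                    uint8_t *dst, int width, int height,
                                    int src1_stride, int src2_stride,
                                    int dst_stride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            dst[2 * x]     = src1[x];   /* even bytes from the first plane */
            dst[2 * x + 1] = src2[x];   /* odd bytes from the second plane */
        }
        src1 += src1_stride;
        src2 += src2_stride;
        dst  += dst_stride;
    }
}
```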
Josh de Kock	70b14cc8d6	swscale: arm: fix NEON hscale init The NEON hscale function only supports X8 filter sizes and should only be selected when these are being used. At the moment filterAlign is set to 8 but in the future when extra NEON assembly for specific sizes is added they will need to have checks here too. The immediate usecase for this change is making the hscale checkasm test easier and without NEON specific edge-cases (x86 already has these guards). This applies the same fix from `718c8f9aa5` on the 32 bit arm version of the function, fixing fate-checkasm-sw_scale there. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 23:33:46 +03:00
Josh de Kock	718c8f9aa5	swscale: fix NEON hscale init The NEON hscale function only supports X8 filter sizes and should only be selected when these are being used. At the moment filterAlign is set to 8 but in the future when extra NEON assembly for specific sizes is added they will need to have checks here too. The immediate usecase for this change is making the hscale checkasm test easier and without NEON specific edge-cases (x86 already has these guards). Signed-off-by: Josh de Kock <josh@itanimul.li>	2020-05-15 10:29:30 +01:00
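The guard these two commits describe amounts to "only pick the NEON kernel when the filter size is one it supports (multiples of 8), otherwise keep the C fallback". A minimal sketch of that selection logic follows. The enum and function are hypothetical; swscale's real init code assigns function pointers in the SwsContext.

```c
#include <stdbool.h>

/* Hypothetical identifiers; not swscale's actual init code. */
enum hscale_impl { HSCALE_C, HSCALE_NEON_X8 };

/* The NEON kernel is assumed to consume filter taps in groups of 8,
 * so any other filter size must fall back to the C implementation. */
static enum hscale_impl select_hscale(bool have_neon, int filter_size)
{
    if (have_neon && filter_size % 8 == 0)
        return HSCALE_NEON_X8;
    return HSCALE_C;
}
```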
Mark Reid	fabeef22d9	libswscale: fix for floating point formats, require full chroma upon more floating point testing, looks like I missed adding this bit. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-05-12 01:00:28 +02:00
Mark Reid	b4967fc71c	libswscale: add output support for AV_PIX_FMT_GBRAPF32 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-05-05 20:06:58 +02:00
Mark Reid	ba5d0515a6	libswscale: add input support AV_PIX_FMT_GBRAPF32 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-05-05 20:06:58 +02:00
Andreas Rheinhardt	2fae000994	swscale/vscale: Increase type strictness libswscale/vscale.c makes extensive use of function pointers and in doing so it converts these function pointers to and from a pointer to void. Yet this is actually against the C standard: C90 only guarantees that one can convert a pointer to any incomplete type or object type to void* and back with the result comparing equal to the original which makes pointers to void generic pointers to incomplete or object type. Yet C90 lacks a generic function pointer type. C99 additionally guarantees that a pointer to a function of one type may be converted to a pointer to a function of another type with the result and the original comparing equal when converting back. This makes any function pointer type a generic function pointer type. Yet even this does not make pointers to void generic function pointers. Both GCC and Clang emit warnings for this when in pedantic mode. This commit fixes this by using a union that can hold one member of any of the required function pointer types to store the function pointer. This works even for C90. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	2020-04-27 23:34:31 +02:00
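The union technique from this commit, as a minimal sketch. Function pointers of different types are stored as members of one union, so nothing is ever converted to or from `void *`. The typedefs and struct below are hypothetical stand-ins, not the actual vscale.c types.

```c
#include <stdint.h>

/* Two unrelated function pointer types (hypothetical names). */
typedef void (*plane1_fn)(const int16_t *src, uint8_t *dst, int w);
typedef void (*planeX_fn)(const int16_t *filter, int filter_size,
                          const int16_t **src, uint8_t *dst, int w);

/* One union can hold either pointer without a detour through void*. */
typedef union {
    plane1_fn one;
    planeX_fn multi;
} plane_fn_union;

struct vscaler_sketch {
    int use_multi;        /* records which union member is valid */
    plane_fn_union fn;
};

static void copy1(const int16_t *src, uint8_t *dst, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(src[i] >> 7);   /* drop 7 fractional bits */
}
```

The caller must read back the same member it stored, which is why the sketch keeps a separate tag.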
Martin Storsjö	9025d5c5ce	swscale: aarch64: Don't clobber callee-saved registers v8-v15 Signed-off-by: Martin Storsjö <martin@martin.st>	2020-04-21 23:41:13 +03:00
Martin Storsjö	872790b1f9	swscale: aarch64: Avoid using the x18 register The x18 is a reserved platform register on Darwin and Windows. x8/w8 seems to be unused in this function though (and same about x10 and x14), so there's really no reason to use x18 here - just change the uses of x18/w18 into x8/w8 instead without any further rewrites. Signed-off-by: Martin Storsjö <martin@martin.st>	2020-04-20 00:09:34 +03:00
Michael Niedermayer	be3c29e379	swscale/yuv2rgb: Fix vertical dither offset with slices Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-04-12 16:36:47 +02:00
Michael Niedermayer	e057e83a4f	swscale/output: Fix integer overflow in yuv2rgb_write_full() with out of range input Fixes: signed integer overflow: 1169365504 + 981452800 cannot be represented in type 'int' Fixes: ticket8293 Found-by: Suhwan Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-04-04 22:09:46 +02:00
Michael Niedermayer	49ba1879ad	swscale/output: Fix integer overflow in alpha computation in yuv2gbrp16_full_X_c() Fixes: signed integer overflow: 524280 * 4432 cannot be represented in type 'int' Fixes: ticket8322 Found-by: Suhwan Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-04-04 22:09:46 +02:00
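The operands quoted in these two reports really do exceed `INT_MAX` (2147483647). The sum 1169365504 + 981452800 is 2150818304, and the product 524280 * 4432 is 2323608960. One generic way to detect such cases is to evaluate in 64 bits, sketched below. This is only an illustration; the actual fixes in output.c may use a different approach, such as clipping or unsigned arithmetic.

```c
#include <stdint.h>
#include <limits.h>

/* Evaluate in int64_t, where these operations cannot overflow, then
 * check whether the result still fits in int. */
static int add_overflows_int(int a, int b)
{
    int64_t s = (int64_t)a + b;
    return s > INT_MAX || s < INT_MIN;
}

static int mul_overflows_int(int a, int b)
{
    int64_t p = (int64_t)a * b;
    return p > INT_MAX || p < INT_MIN;
}
```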
Ruiling Song	4700f7d6fc	swscale/swscale: remove useless code Signed-off-by: Ruiling Song <ruiling.song@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-04-03 00:58:07 +02:00
Carl Eugen Hoyos	5f8c383452	lsws/input: Do not change transparency range. Fixes ticket #8509.	2020-03-11 22:55:49 +01:00
Ting Fu	828f7db5d9	libswscale/x86/yuv2rgb: Fix Segmentation Fault when load unaligned data Fixes ticket #8532 Signed-off-by: Ting Fu <ting.fu@intel.com>	2020-02-26 11:10:46 +01:00
Linjie Fu	d2aa1fbfd4	swscale: Add swscale input support for Y210LE Add swscale input support for Y210LE, output support and fate test could be added later if there is requirement for software CSC to this packed format. Signed-off-by: Linjie Fu <linjie.fu@intel.com>	2020-02-24 00:09:51 +00:00
Ting Fu	fc6a5883d6	libswscale/x86/yuv2rgb: add ssse3 version Tested using this command: /ffmpeg -pix_fmt yuv420p -s 1920x1080 -i ArashRawYuv420.yuv \ -vcodec rawvideo -s 1920x1080 -pix_fmt rgb24 -f null /dev/null The fps increased from 389 to 640 on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz Signed-off-by: Ting Fu <ting.fu@intel.com>	2020-02-10 15:08:33 +01:00
Gautam Ramakrishnan	da399e2135	libswscale/utils.c: Fix bug #8255 Bug #8255 points out a double free error in libswscale/utils.c. The double free occurs because the pointer to the cascaded_context of an sw_context is not set to NULL after freeing it. When the sw_context is later freed, sws_freeContext is called on the cascaded_context, causing a double free. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-02-09 23:33:18 +01:00
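The bug pattern and fix described here, reduced to a minimal sketch: a parent owns a child, the child is freed early, and the pointer must be cleared so the parent's destructor does not free it again. The struct and functions are hypothetical. FFmpeg code often gets the same effect with `av_freep()`, which frees and NULLs in one call.

```c
#include <stdlib.h>

/* Hypothetical context owning a nested ("cascaded") child. */
struct ctx {
    struct ctx *child;
    int *buf;
};

static void ctx_free(struct ctx *c)
{
    if (!c)
        return;
    ctx_free(c->child);   /* recursion stops at a NULL child */
    free(c->buf);
    free(c);
}

/* Release the child early. Clearing the pointer right after freeing it
 * means a later ctx_free(parent) sees NULL instead of freeing it twice. */
static void ctx_drop_child(struct ctx *c)
{
    ctx_free(c->child);
    c->child = NULL;
}
```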
Ting Fu	e934194b6a	libswscale/x86/yuv2rgb: Change inline assembly into nasm code The original inline assembly and nasm code have the same fps when called by command. NASM code almost has no impact on the performance. Signed-off-by: Ting Fu <ting.fu@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-02-05 17:41:59 +01:00
Michael Niedermayer	d48e510124	swscale/input: Fix several invalid shifts related to rgb2yuv constants Fixes: Invalid shifts Fixes: #8140 Fixes: #8146 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-01-22 21:50:49 +01:00
Michael Niedermayer	7b7f97532b	swscale/output: Fix several invalid shifts in yuv2rgb_full_1_c_template() Fixes: Invalid shifts Fixes: #8320 Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-01-22 18:41:46 +01:00
Michael Niedermayer	a6ca22c118	swscale/swscale: Fix several invalid shifts related to vChrDrop Fixes: Invalid shifts Fixes: #8166 Fixes: filter-crop_scale_vflip FATE-test Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-01-22 18:41:46 +01:00
Carl Eugen Hoyos	96fab29e96	Silence "string-plus-int" warning shown by clang. libswscale/utils.c:89:42: warning: adding 'unsigned long' to a string does not append to the string [-Wstring-plus-int]	2020-01-06 22:38:56 +01:00
Sebastian Pop	c3a17ffff6	swscale/aarch64: use multiply accumulate and shift-right narrow This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and horizontal adds by using fused multiply adds. The patch also uses ld1r to load one element and replicate it across all lanes of the vector. The patch also improves the clipping code by removing the shift right instructions and performing the shift with the shift-right narrow instructions. I see 8% difference on an m6g instance with neoverse-n1 CPUs: $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null - before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971 after: t:0.012985 avg:0.013013 max:0.013996 min:0.012818 Tested with `make check` on aarch64-linux. Signed-off-by: Sebastian Pop <spop@amazon.com> Reviewed-by: Clément Bœsch <u@pkh.me> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2020-01-04 20:59:31 +01:00
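A scalar sketch of the vertical filter that ff_yuv2planeX_8_neon vectorizes, modeled on swscale's C reference. It starts from a dither value, does one multiply-accumulate per filter tap across `filter_size` input lines, then shifts down and clips to 8 bits. In the NEON rewrite described above, the per-tap accumulate maps to fused multiply-add, the broadcast of `filter[j]` to `ld1r`, and the final shift/clip to shift-right-narrow instructions. Names here are illustrative.

```c
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Filter coefficients are assumed to sum to 1 << 12; inputs carry 15 bits. */
static void yuv2planeX_8_sketch(const int16_t *filter, int filter_size,
                                const int16_t **src, uint8_t *dest, int dst_w,
                                const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];   /* one multiply-accumulate per tap */
        dest[i] = clip_uint8(val >> 19);
    }
}
```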
Zhao Zhili	1e3e547a5b	swscale/utils: remove access of AV_PIX_FMT_NB Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-12-31 12:37:47 +01:00
Sebastian Pop	bd83191271	swscale/aarch64: use multiply accumulate and increase vector factor to 4 This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate and bumps the vectorization factor from 2 to 4. The speedup is of 25% on Graviton1 A1 instances based on A-72 cpus: $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null - before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214 after: t:0.032168 avg:0.032215 max:0.033081 min:0.032146 The speedup is of 39% on Graviton2 m6g instances based on Neoverse-N1 cpus: $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null - before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181 after: t:0.014015 avg:0.014096 max:0.015018 min:0.013971 Tested with `make check` on aarch64-linux. Signed-off-by: Sebastian Pop <spop@amazon.com> Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-12-17 23:41:47 +01:00
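For reference, a scalar sketch of the horizontal filter that ff_hscale_8_to_15_neon implements, modeled on swscale's C version. Each output sample is a `filter_size`-tap dot product starting at `filter_pos[i]`, scaled from 8-bit input to 15-bit output. Per the commit message, the NEON version raises its vectorization factor from 2 to 4. Names are illustrative.

```c
#include <stdint.h>

/* Filter taps are assumed to sum to 1 << 14 for unity gain. */
static void hscale_8_to_15_sketch(int16_t *dst, int dst_w, const uint8_t *src,
                                  const int16_t *filter,
                                  const int32_t *filter_pos, int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[filter_pos[i] + j] * filter[filter_size * i + j];
        val >>= 7;                                     /* 8-bit in, 15-bit out */
        dst[i] = (int16_t)(val > 32767 ? 32767 : val); /* cap at 15 bits */
    }
}
```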
Limin Wang	8558c231fb	swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion wrapper Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-12-10 16:09:14 +01:00
Ting Fu	039a0ebe6f	libswscale/swscale_unscaled.c: remove redundant code Signed-off-by: Ting Fu <ting.fu@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-12-06 11:25:29 +01:00
Limin Wang	a5e24be52a	swscale/swscale_unscaled: fix gbrap10be md5 different on big endian system You can reproduce it by below command: ./ffmpeg -f lavfi -i "testsrc=duration=1:rate=30" -vf format=gbrap10 -vcodec rawvideo \ -pix_fmt gbrap10le -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact \ -frames:v 1 -f nut md5: little-endian: f91e2edd8098276579c1929e5e160416 big-endian: ba4d011dbbdc78ccbf6cc7d698630929 Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-11-01 14:43:16 +01:00
Michael Niedermayer	d260621089	swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-10-16 19:17:57 +02:00
Michael Niedermayer	3e6682931b	swscale/output: Correct Alpha in yuv2ya16_X_c_template() Untested, no testcase Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-10-16 19:17:57 +02:00
Michael Niedermayer	4f4ca675e5	swscale/output: Implement Luma computation from yuv2ya16_X_c_template() without 64bit This also reverts `21838cad2f` The revert is in this commit to avoid 2 fate updates Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-10-16 19:17:57 +02:00
Daniel Kolesa	e6625ca41f	swscale: Fix AltiVec/VSX build with recent GCC The argument to vec_splat_u16 must be a literal. By making the function always inline and marking the arguments const, gcc can turn those into literals, and avoid build errors like: swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal Fixes #7861. Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Signed-off-by: Lauri Kasanen <cand@gmx.com>	2019-10-04 08:58:17 +03:00
Daniel Kolesa	1bdb47b734	swscale: Replace illegal vector keyword usage in altivec code While this technically compiles in current ffmpeg, this is only because ffmpeg is compiled in strict ISO C mode, which disables the builtin 'vector' keyword for AltiVec/VSX. Instead this gets replaced with a macro inside altivec.h, which defines vector to be actually __vector, which accepts random types. Normally, the vector keyword should be used only with plain scalar non-typedef types, such as unsigned int. But we have the vec_(s|u)(8|16|32) macros, which can be used in a portable manner, in util_altivec.h in libavutil. This is also consistent with other AltiVec/VSX code elsewhere in the tree. Fixes #7861. Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Signed-off-by: Lauri Kasanen <cand@gmx.com>	2019-10-04 08:58:17 +03:00
Andreas Rheinhardt	e2646e23be	swscale/utils: Fix invalid left shifts of negative numbers Affected the FATE-tests vsynth_lena-dv-411, vsynth1-dv-411, vsynth2-dv-411 and hevc-paramchange-yuv420p.yuv420p10. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-09-28 17:24:32 +02:00
Andreas Rheinhardt	736c7c20e7	swscale/x86/swscale: Fix undefined left shifts of negative numbers This affected many FATE-tests: The number of failing tests went down from 663 to 344. (Both numbers exclude tests that failed because of unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-09-28 17:24:32 +02:00
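Left-shifting a negative signed integer is undefined behavior in C, which is what these two commits remove. Below are two common well-defined alternatives, as a sketch. The commits do not say which form they used at each site.

```c
#include <stdint.h>

/* Multiply by a power of two instead of shifting a possibly negative
 * value (assumes the result fits in int and 0 <= n < 31). */
static int shl_mul(int x, int n)
{
    return x * (1 << n);
}

/* Shift as unsigned, where it is well defined. Converting a value above
 * INT32_MAX back to int32_t is implementation-defined, not undefined. */
static int32_t shl_unsigned(int32_t x, int n)
{
    return (int32_t)((uint32_t)x << n);
}
```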
Limin Wang	cde1d70a39	swscale/swscale: cosmetics Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-09-27 10:58:30 +02:00
Paul B Mahol	21838cad2f	swscale/output: fix signed integer overflow for ya16 Fixes #7666.	2019-09-26 15:56:47 +02:00
Limin Wang	29bde4b3b6	swscale/swscale: delete unwanted assignments Signed-off-by: Limin Wang <lance.lmwang@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-09-09 18:16:06 +02:00
Linjie Fu	ef1342650f	swscale/output: fix some code indentations Signed-off-by: Linjie Fu <linjie.fu@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-09-06 22:06:12 +02:00
Chip Kerchner	3a557c5d88	lsws/ppc/yuv2rgb_altivec: Replace vec_lvsl/vec_perm with vec_xl gcc 6.x and 7.x generate wrong code for little endian machines for the vec_lvsl/vec_perm instruction combos in some cases. The bug was fixed in version 8.x If these instructions are replaced with vec_xl, the problem goes away for all versions of the compilers. Fixes ticket #7124.	2019-08-13 02:21:24 +02:00
Michael Niedermayer	80bb65fafa	Bump minor versions again on master to keep 4.2 versions separate from master Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-07-21 18:36:31 +02:00
Michael Niedermayer	22db337a40	Bump minor versions to separate 4.2 from master Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-07-21 18:36:18 +02:00
Michael Niedermayer	9d269301f0	swscale/tests/swscale: Lengthen pixfmt name buffer to 21 bytes Some formats use longer names than 12. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-05-13 13:39:49 +02:00
Adam Richter	b8ed493061	libswcale: Fix possible string overflow in test. In libswscale/tests/swscale.c, the function fileTest() calls sscanf with an argument of "%12s" on the character arrays srcStr[] and dstStr[], which are only 12 bytes long. So, if the input string is 12 characters, a terminating null byte can be written past the end of these arrays. This bug was found by cppcheck. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-05-13 13:39:40 +02:00
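The general rule behind this fix: a `%Ns` conversion stores up to N characters plus a terminating NUL, so the destination needs at least N + 1 bytes. A minimal sketch follows, reusing the 21-byte size from the follow-up commit listed just above. The width 20 and the names `NAME_BUF` and `parse_pair` are illustrative, not taken from the test file.

```c
#include <stdio.h>

#define NAME_BUF 21   /* illustrative buffer size */

/* Width 20 = NAME_BUF - 1, leaving room for the terminating NUL. */
static int parse_pair(const char *line, char src_name[NAME_BUF],
                      char dst_name[NAME_BUF])
{
    return sscanf(line, "%20s %20s", src_name, dst_name);
}
```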
Philip Langdale	4fa4f1d7a9	swscale: Add test for isSemiPlanarYUV to pixdesc_query Lauri had asked me what the semi planar formats were and that reminded me that we could add it to pixdesc_query so we know exactly what the list is.	2019-05-12 07:51:02 -07:00
Philip Langdale	cd48318035	swscale: Add support for NV24 and NV42 The implementation is pretty straight-forward. Most of the existing NV12 codepaths work regardless of subsampling and are re-used as is. Where necessary I wrote the slightly different NV24 versions. Finally, the one thing that confused me for a long time was the asm specific x86 path that did an explicit exclusion check for NV12. I replaced that with a semi-planar check and also updated the equivalent PPC code, which Lauri kindly checked.	2019-05-12 07:51:02 -07:00
Lauri Kasanen	e25bddf5fc	swscale/ppc: Shorten power8 tests via a var	2019-05-07 10:08:16 +03:00
Lauri Kasanen	a2a16206aa	swscale/ppc: VSX-optimize hScale16To* ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \ -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw ./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \ -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw 32-bit mul, power8 only. 2x speedup for hScale8To19_vsx (x86 SSE2 is 2.37): 30896 UNITS in hscale, 8192 runs, 0 skips 63956 UNITS in hscale, 8192 runs, 0 skips 2.06 for hScale16To15_vsx: 30531 UNITS in hscale, 8192 runs, 0 skips 63161 UNITS in hscale, 8192 runs, 0 skips	2019-05-07 10:08:16 +03:00
Lauri Kasanen	3437111f17	swscale/ppc: Indent	2019-05-07 10:08:16 +03:00
Lauri Kasanen	9456adc223	swscale/ppc: VSX-optimize hScale8To19 ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw 2.26 speedup (x86 SSE2 is 2.32): 23772 UNITS in hscale, 4096 runs, 0 skips 53862 UNITS in hscale, 4096 runs, 0 skips	2019-05-07 10:08:16 +03:00
Lauri Kasanen	d0e4d0429e	swscale/ppc: VSX-optimize hscale_fast ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \ -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw 4.27 speedup for hyscale_fast: 24796 UNITS in hyscale_fast, 4096 runs, 0 skips 5797 UNITS in hyscale_fast, 4096 runs, 0 skips 4.48 speedup for hcscale_fast: 19911 UNITS in hcscale_fast, 4095 runs, 1 skips 4437 UNITS in hcscale_fast, 4096 runs, 0 skips	2019-04-30 14:41:28 +03:00
Lauri Kasanen	ce92ee4b4f	swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2 ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. ~2x speedup: rgb24 24431 UNITS in yuv2packed2, 16384 runs, 0 skips 13783 UNITS in yuv2packed2, 16383 runs, 1 skips bgr24 24396 UNITS in yuv2packed2, 16384 runs, 0 skips 14059 UNITS in yuv2packed2, 16384 runs, 0 skips rgba 26815 UNITS in yuv2packed2, 16383 runs, 1 skips 12797 UNITS in yuv2packed2, 16383 runs, 1 skips bgra 27060 UNITS in yuv2packed2, 16384 runs, 0 skips 13138 UNITS in yuv2packed2, 16384 runs, 0 skips argb 26998 UNITS in yuv2packed2, 16384 runs, 0 skips 12728 UNITS in yuv2packed2, 16381 runs, 3 skips bgra 26651 UNITS in yuv2packed2, 16384 runs, 0 skips 13124 UNITS in yuv2packed2, 16384 runs, 0 skips This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version is also heavily inaccurate, while the vsx version has high accuracy.	2019-04-11 09:08:51 +03:00
Lauri Kasanen	8607e29fa3	swscale/ppc: VSX-optimize yuv2rgb_full_X ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. ~6.4x speedup: rgb24 214278 UNITS in yuv2packedX, 16384 runs, 0 skips 33249 UNITS in yuv2packedX, 16384 runs, 0 skips bgr24 214616 UNITS in yuv2packedX, 16384 runs, 0 skips 33233 UNITS in yuv2packedX, 16384 runs, 0 skips rgba 214517 UNITS in yuv2packedX, 16384 runs, 0 skips 33271 UNITS in yuv2packedX, 16384 runs, 0 skips bgra 214973 UNITS in yuv2packedX, 16384 runs, 0 skips 33397 UNITS in yuv2packedX, 16384 runs, 0 skips argb 214613 UNITS in yuv2packedX, 16384 runs, 0 skips 33310 UNITS in yuv2packedX, 16384 runs, 0 skips bgra 214637 UNITS in yuv2packedX, 16384 runs, 0 skips 33330 UNITS in yuv2packedX, 16384 runs, 0 skips	2019-04-07 09:20:34 +03:00
Lauri Kasanen	3256e949be	swscale/ppc: VSX-optimize yuv2rgb_full_2 ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. ~4x speedup: rgb24 52763 UNITS in yuv2packed2, 16384 runs, 0 skips 13453 UNITS in yuv2packed2, 16384 runs, 0 skips bgr24 53144 UNITS in yuv2packed2, 16384 runs, 0 skips 13616 UNITS in yuv2packed2, 16384 runs, 0 skips rgba 52796 UNITS in yuv2packed2, 16384 runs, 0 skips 12904 UNITS in yuv2packed2, 16384 runs, 0 skips bgra 52732 UNITS in yuv2packed2, 16384 runs, 0 skips 13262 UNITS in yuv2packed2, 16384 runs, 0 skips argb 52661 UNITS in yuv2packed2, 16384 runs, 0 skips 12879 UNITS in yuv2packed2, 16384 runs, 0 skips bgra 52662 UNITS in yuv2packed2, 16384 runs, 0 skips 12932 UNITS in yuv2packed2, 16384 runs, 0 skips	2019-04-07 09:20:33 +03:00
Lauri Kasanen	50e672bc54	swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1 ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \ -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 32-bit mul, power8 only. 1.8-2.3x speedup: rgb24 18192 UNITS in yuv2packed1, 32767 runs, 1 skips 9983 UNITS in yuv2packed1, 32760 runs, 8 skips bgr24 18665 UNITS in yuv2packed1, 32766 runs, 2 skips 9925 UNITS in yuv2packed1, 32763 runs, 5 skips rgba 20239 UNITS in yuv2packed1, 32767 runs, 1 skips 8794 UNITS in yuv2packed1, 32759 runs, 9 skips bgra 20354 UNITS in yuv2packed1, 32768 runs, 0 skips 8770 UNITS in yuv2packed1, 32761 runs, 7 skips argb 20185 UNITS in yuv2packed1, 32768 runs, 0 skips 8761 UNITS in yuv2packed1, 32761 runs, 7 skips bgra 20360 UNITS in yuv2packed1, 32766 runs, 2 skips 8759 UNITS in yuv2packed1, 32764 runs, 4 skips This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version is also heavily inaccurate, while the vsx version has high accuracy.	2019-04-07 09:20:31 +03:00
Lauri Kasanen	7adce3e64c	swscale/ppc: VSX-optimize yuv2422_X ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 7.2x speedup: yuyv422 126354 UNITS in yuv2packedX, 16384 runs, 0 skips 16383 UNITS in yuv2packedX, 16382 runs, 2 skips yvyu422 117669 UNITS in yuv2packedX, 16384 runs, 0 skips 16271 UNITS in yuv2packedX, 16379 runs, 5 skips uyvy422 117310 UNITS in yuv2packedX, 16384 runs, 0 skips 16226 UNITS in yuv2packedX, 16382 runs, 2 skips	2019-03-31 12:41:34 +03:00
Lauri Kasanen	9a2db4dc61	swscale/ppc: VSX-optimize yuv2422_2 ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \ -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 5.1x speedup: yuyv422 19339 UNITS in yuv2packed2, 16384 runs, 0 skips 3718 UNITS in yuv2packed2, 16383 runs, 1 skips yvyu422 19438 UNITS in yuv2packed2, 16384 runs, 0 skips 3800 UNITS in yuv2packed2, 16380 runs, 4 skips uyvy422 19128 UNITS in yuv2packed2, 16384 runs, 0 skips 3721 UNITS in yuv2packed2, 16380 runs, 4 skips	2019-03-31 12:41:33 +03:00
Lauri Kasanen	a6a31ca3d9	swscale/ppc: VSX-optimize yuv2422_1 ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - 15.3x speedup: yuyv422 14513 UNITS in yuv2packed1, 32768 runs, 0 skips 949 UNITS in yuv2packed1, 32767 runs, 1 skips yvyu422 14516 UNITS in yuv2packed1, 32767 runs, 1 skips 943 UNITS in yuv2packed1, 32767 runs, 1 skips uyvy422 14530 UNITS in yuv2packed1, 32767 runs, 1 skips 941 UNITS in yuv2packed1, 32766 runs, 2 skips	2019-03-31 12:41:32 +03:00
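The speedup figures in these PPC commits are ratios of two timer readings. This sketch assumes, as the ratios suggest, that the first "UNITS" figure in each pair is the C baseline and the second is the VSX version. It recomputes the yuv2422_1 numbers above. Per-format ratios come out at roughly 15.29, 15.39 and 15.44, consistent with the quoted ~15.3x.

```c
/* Speedup = baseline timer units / optimized timer units. */
static double speedup(double c_units, double vsx_units)
{
    return c_units / vsx_units;
}

/* yuyv422, yvyu422, uyvy422 figures from the yuv2422_1 commit. */
static double mean_speedup_yuv2422_1(void)
{
    const double c[3]   = {14513, 14516, 14530};
    const double vsx[3] = {949, 943, 941};
    double sum = 0;
    for (int i = 0; i < 3; i++)
        sum += speedup(c[i], vsx[i]);
    return sum / 3;
}
```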
Michael Niedermayer	8865ae959b	swscale/swscale_unscaled: Fix chroma slice height Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-03-28 22:47:32 +01:00
Dong, Jerry	c47fada298	swscale/swscale_unscaled: fixed the issue that when width/height is not 2-multiple, transition of nv12 to u/v planes is not completed. Signed-off-by: Dong, Jerry <jerry.dong@intel.com> Signed-off-by: Decai Lin <decai.lin@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-03-28 20:28:43 +01:00
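With 4:2:0 subsampling, the chroma dimensions must be rounded up. For example, a 5x3 luma image has 3x2 chroma samples, so computing them as `w / 2` and `h / 2` silently drops the last column and row. Below is a minimal sketch of splitting an NV12 UV plane into U and V planes with rounded-up chroma sizes. The function name and parameters are hypothetical, not the swscale_unscaled.c code.

```c
#include <stdint.h>

static void nv12_split_uv(const uint8_t *uv, int uv_stride,
                          uint8_t *u, int u_stride,
                          uint8_t *v, int v_stride,
                          int luma_w, int luma_h)
{
    int cw = (luma_w + 1) / 2;   /* ceil(luma_w / 2) */
    int ch = (luma_h + 1) / 2;   /* ceil(luma_h / 2) */
    for (int y = 0; y < ch; y++) {
        for (int x = 0; x < cw; x++) {
            u[y * u_stride + x] = uv[y * uv_stride + 2 * x];
            v[y * v_stride + x] = uv[y * uv_stride + 2 * x + 1];
        }
    }
}
```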
Lauri Kasanen	681957b88d	swscale/ppc: VSX-optimize yuv2rgb_full ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \ -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \ -cpuflags 0 -v error - This uses 32-bit mul, so POWER8 only. The following output formats get about 4.5x speedup: rgb24 39980 UNITS in yuv2packed1, 32768 runs, 0 skips 8774 UNITS in yuv2packed1, 32768 runs, 0 skips bgr24 40069 UNITS in yuv2packed1, 32768 runs, 0 skips 8772 UNITS in yuv2packed1, 32766 runs, 2 skips rgba 39759 UNITS in yuv2packed1, 32768 runs, 0 skips 8681 UNITS in yuv2packed1, 32767 runs, 1 skips bgra 39729 UNITS in yuv2packed1, 32768 runs, 0 skips 8696 UNITS in yuv2packed1, 32766 runs, 2 skips argb 39766 UNITS in yuv2packed1, 32768 runs, 0 skips 8672 UNITS in yuv2packed1, 32766 runs, 2 skips bgra 39784 UNITS in yuv2packed1, 32768 runs, 0 skips 8659 UNITS in yuv2packed1, 32767 runs, 1 skips	2019-03-27 09:05:08 +02:00
Lauri Kasanen	81a4719d8e	swscale: Remove duplicated code In this function, the exact same clamping happens both in the if and unconditionally.	2019-03-27 09:00:06 +02:00
Lauri Kasanen	6b5ea90eac	swscale/ppc: Add av_unused to template vars only used in one includer	2019-03-20 10:21:55 +02:00
Lauri Kasanen	ac3062f1a4	swscale/ppc: Clean up some mixed decl warnings	2019-03-20 10:21:53 +02:00
Lauri Kasanen	8522d219ce	libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \ -s 1920x1728 -f null -vframes 100 -v error -nostats - 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x. Fate passes, each format tested with an image to video conversion. Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out of the 16-bit function. This includes the vec_mulo/mule functions too, not just vmuluwm. With TIMER_REPORT skips disabled: yuv420p9le 12412 UNITS in planarX, 131072 runs, 0 skips 73136 UNITS in planarX, 131072 runs, 0 skips yuv420p9be 12481 UNITS in planarX, 131072 runs, 0 skips 73410 UNITS in planarX, 131072 runs, 0 skips yuv420p10le 12322 UNITS in planarX, 131072 runs, 0 skips 72546 UNITS in planarX, 131072 runs, 0 skips yuv420p10be 12291 UNITS in planarX, 131072 runs, 0 skips 72935 UNITS in planarX, 131072 runs, 0 skips yuv420p12le 12316 UNITS in planarX, 131072 runs, 0 skips 72708 UNITS in planarX, 131072 runs, 0 skips yuv420p12be 12319 UNITS in planarX, 131072 runs, 0 skips 72577 UNITS in planarX, 131072 runs, 0 skips yuv420p14le 12259 UNITS in planarX, 131072 runs, 0 skips 72516 UNITS in planarX, 131072 runs, 0 skips yuv420p14be 12440 UNITS in planarX, 131072 runs, 0 skips 72962 UNITS in planarX, 131072 runs, 0 skips yuv420p16le 10548 UNITS in planarX, 131072 runs, 0 skips 73429 UNITS in planarX, 131072 runs, 0 skips yuv420p16be 10634 UNITS in planarX, 131072 runs, 0 skips 150959 UNITS in planarX, 131072 runs, 0 skips Signed-off-by: Lauri Kasanen <cand@gmx.com>	2019-02-05 09:34:53 +02:00
Michael Niedermayer	fe17f9b956	swscale/yuv2rgb: Return a more specific error code from ff_yuv2rgb_c_init_tables() Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2019-01-01 21:11:47 +01:00
Lauri Kasanen	8dd9df9ecd	swscale/output: Altivec-optimize float yuv2plane1 This function wouldn't benefit from VSX instructions, so I put it under altivec. ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \ -f null -vframes 100 -v error -nostats - 3743 UNITS in planar1, 65495 runs, 41 skips -cpuflags 0 23511 UNITS in planar1, 65530 runs, 6 skips grayf32be 4647 UNITS in planar1, 65449 runs, 87 skips -cpuflags 0 28608 UNITS in planar1, 65530 runs, 6 skips The native speedup is 6.28133, and the bswapping one 6.15623. Fate passes, each format tested with an image to video conversion. Signed-off-by: Lauri Kasanen <cand@gmx.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-12-26 20:28:58 +01:00
Lauri Kasanen	b4c8c03b00	swscale/output: VSX-optimize 16-bit yuv2plane1 ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \ -f null -vframes 100 -v error -nostats - 2120 UNITS in planar1, 65393 runs, 143 skips -cpuflags 0 19157 UNITS in planar1, 65512 runs, 24 skips 9.03632 speedup, 16be similarly. Fate passes, each format tested with an image to video conversion. Signed-off-by: Lauri Kasanen <cand@gmx.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-12-14 19:09:11 +01:00
Lauri Kasanen	1046cba24b	swscale/output: VSX-optimize nbps yuv2plane1 ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \ -f null -vframes 100 -v error -nostats - Speedups: yuv2plane1_9BE_vsx 11.2042 yuv2plane1_9LE_vsx 11.156 yuv2plane1_10BE_vsx 9.89428 yuv2plane1_10LE_vsx 10.3637 yuv2plane1_12BE_vsx 9.71923 yuv2plane1_12LE_vsx 11.0404 yuv2plane1_14BE_vsx 10.1763 yuv2plane1_14LE_vsx 11.2728 Fate passes, each format tested with an image to video conversion. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-12-12 01:56:57 +01:00
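A scalar sketch of what the 9-14 bit yuv2plane1 functions compute, modeled on swscale's C template. Intermediate samples carry 15 bits, so each value is rounded, shifted down by `15 - output_bits` and clipped to the output depth. The BE/LE byte-order handling that distinguishes the `_BE`/`_LE` variants is left out, and the name is illustrative.

```c
#include <stdint.h>

/* Valid for output_bits in 9..14, so that shift >= 1. */
static void yuv2plane1_nbps_sketch(const int16_t *src, uint16_t *dest,
                                   int dst_w, int output_bits)
{
    int shift = 15 - output_bits;
    int max = (1 << output_bits) - 1;
    for (int i = 0; i < dst_w; i++) {
        int val = (src[i] + (1 << (shift - 1))) >> shift;  /* round, shift */
        dest[i] = (uint16_t)(val < 0 ? 0 : val > max ? max : val);
    }
}
```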
Lauri Kasanen	78c7ff7d25	swscale/ppc: Move VSX-using code to its own file Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied). Signed-off-by: Lauri Kasanen <cand@gmx.com> Tested-by: Michael Kostylev on BE Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-12-04 02:59:07 +01:00
Lauri Kasanen	46c5693ea3	swscale/output: Altivec-optimize yuv2plane1_8 ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \ -f null -vframes 100 -v error -nostats - 1158 UNITS in planar1, 65528 runs, 8 skips -cpuflags 0 19082 UNITS in planar1, 65533 runs, 3 skips 16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version takes as many cycles as the x86 SSE2 version, yikes it's fast. Note that this function uses VSX instructions, but is not marked so. This is because several existing functions also make that mistake. I'll submit a patch moving them once this is reviewed. Signed-off-by: Lauri Kasanen <cand@gmx.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-11-26 02:56:25 +01:00
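For comparison, the 8-bit yuv2plane1 case that this Altivec commit optimizes, as a scalar sketch modeled on swscale's C reference. It adds an ordered-dither value, drops 7 fractional bits and clips to 0..255. The name is illustrative.

```c
#include <stdint.h>

static void yuv2plane1_8_sketch(const int16_t *src, uint8_t *dest, int dst_w,
                                const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i++) {
        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
        dest[i] = (uint8_t)(val < 0 ? 0 : val > 255 ? 255 : val);
    }
}
```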
Martin Vignali	86e6f0dbc7	swscale : add support for YUVA444P12 and YUVA422P12	2018-11-24 16:24:47 +01:00
Michael Niedermayer	517573a670	Bump minor version for master after 4.1 branchpoint Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-11-02 00:53:07 +01:00
Michael Niedermayer	780d5e30a0	Bump minor versions for branching 4.1 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-11-02 00:15:32 +01:00
Martin Vignali	156120fcf8	swscale/swscale_unscaled : rename packed_16bpc_bswap is used for packed and planar format	2018-10-24 21:21:20 +02:00
Martin Vignali	26bf4a4050	swscale/unscaled : add grayf32 le to be	2018-10-24 21:21:14 +02:00
Martin Vignali	3db33b446f	swscale/utils : simplify unscaled initial test for float pixfmt	2018-10-24 21:21:10 +02:00
Martin Vignali	db4771af81	swscale : add YA16 LE/BE output	2018-10-18 21:43:24 +02:00
Martin Vignali	658bbc0060	swscale/x86/rgb2rgb.asm : add Ivo Van Poorten name to the top of the file suggested by Carl Eugen Hoyos	2018-10-18 21:43:19 +02:00
Martin Vignali	296609f859	swscale/x86/rgb2rgb : port shuffle 2103 mmxext to external asm and remove inline asm version	2018-10-13 14:12:41 +02:00
Martin Vignali	04afdbb560	swscale/x86/rgb2rgb : remove mmx version for shuffle2103	2018-10-13 14:12:36 +02:00
Paul B Mahol	931e7c050e	swscale/swscale_unscaled: add gbrap -> packed rgb path	2018-09-09 22:58:26 +02:00
Martin Vignali	bdd6754648	swscale/swscale : small cosmetic	2018-08-22 11:36:15 +02:00
Martin Vignali	3af1c4ea7d	swscale : treat float input data as uint 16bpc Currently, float input is converted to 16-bit uint in the input stage, but hScale16To19 and hScale16to15 use the source depth (32 bits), which makes an invalid shift for the data. So shift the value for float input the same way as for 16 bpc uint.	2018-08-22 11:36:09 +02:00
Sergey Lavrushkin	582bc5a348	libswscale: Adds conversions from/to float gray format. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-08-14 18:22:39 +02:00
Carl Eugen Hoyos	3a56ade1f3	lsws/rgb2rgb_template: Do not compile unneeded shuffle functions on big-endian. Fixes the following warnings: In file included from libswscale/rgb2rgb.c:128:0: libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3210_c' defined but not used libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3012_c' defined but not used libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_1230_c' defined but not used	2018-06-10 03:22:59 +02:00
Paul B Mahol	b9dd058f7a	swscale: add gray14 support Signed-off-by: Paul B Mahol <onemda@gmail.com>	2018-05-05 21:35:31 +02:00
Martin Vignali	07a566e7d6	swscale/swscale_unscaled : add X86_64 (SSE2 and AVX) for uyvyto422 and checkasm test	2018-04-22 19:15:32 +02:00
Michael Niedermayer	3c1ecb057d	Bump minor versions after release/4.0 branching Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-04-16 12:35:12 +02:00
Michael Niedermayer	7e3a070d9a	Bump minor versions for branching release/4.0 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-04-16 12:35:12 +02:00
wm4	d6fc031caf	avutil/pixdesc: deprecate AV_PIX_FMT_FLAG_PSEUDOPAL PSEUDOPAL pixel formats are not paletted, but carried a palette with the intention of allowing code to treat unpaletted formats as paletted. The palette simply mapped the byte values to the resulting RGB values, making it some sort of LUT for RGB conversion. It was used for 1 byte formats only: RGB4_BYTE, BGR4_BYTE, RGB8, BGR8, GRAY8. The first 4 are awfully obscure, used only by some ancient bitmap formats. The last one, GRAY8, is more common, but its treatment is grossly incorrect. It considers full range GRAY8 only, so GRAY8 coming from typical Y video planes was not mapped to the correct RGB values. This cannot be fixed, because AVFrame.color_range can be freely changed at runtime, and there is nothing to ensure the pseudo palette is updated. Also, nothing actually used the PSEUDOPAL palette data, except xwdenc (trivially changed in the previous commit). All other code had to treat it as a special case, just to ignore or to propagate palette data. In conclusion, this was just a very strange old mechanism that has no real justification to exist anymore (although it may have been nice and useful in the past). Now it's an artifact that makes the API harder to use: API users who allocate their own pixel data have to be aware that they need to allocate the palette, or FFmpeg will crash on them in _some_ situations. On top of this, there was no API to allocate the pseudo palette outside of av_frame_get_buffer(). This patch not only deprecates AV_PIX_FMT_FLAG_PSEUDOPAL, but also makes the pseudo palette optional. Nothing accesses it anymore, though if it's set, it's propagated. It's still allocated and initialized for compatibility with API users that rely on this feature. But new API users do not need to allocate it. This was an explicit goal of this patch. Most changes replace AV_PIX_FMT_FLAG_PSEUDOPAL with FF_PSEUDOPAL. I first tried #ifdefing all code, but it was a mess. The FF_PSEUDOPAL macro reduces the mess, and still allows defining FF_API_PSEUDOPAL to 0. Passes FATE with FF_API_PSEUDOPAL enabled and disabled. In addition, FATE passes with FF_API_PSEUDOPAL set to 1, but with allocation functions manually changed to not allocate a palette.	2018-04-03 17:53:00 +02:00
Martin Storsjö	f33f728470	arm: swscale: Only compile the rgb2yuv asm if .dn aliases are supported Vanilla clang supports altmacro since clang 5.0, and thus doesn't require gas-preprocessor for building the arm assembly any longer. However, the built-in assembler doesn't support .dn directives. This readds checks that were removed in `d7320ca3ed`, when the last usages of .dn directives within libav were removed. Alternatively, the assembly could be rewritten to not use the .dn directive, making it available to clang users. Signed-off-by: Martin Storsjö <martin@martin.st>	2018-03-31 21:54:56 +03:00

2331 Commits