1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Commit Graph

2639 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
6269c4a440 swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422 2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont
e50f8e861b swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422
We can make do with callee-clobbered registers only now.
As an added bonus, this makes the code XLEN-independent.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont
be37a2e364 swscale/rgb2rgb: rework RISC-V V uyvytoyuv422
This avoids using relatively slow register strides.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont
1a4bd76ea5 swscale/rgb2rgb: remove R-V V shuffle_bytes_3012
This is slower than the Zbb version on real hardware due to register
strides. Proper support for vector byte-swap requires the Zvbb
extension, but it's much too early for me to worry about it.
2023-10-02 22:28:38 +03:00
Rémi Denis-Courmont
c4a144c29d swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210 2023-10-02 22:28:25 +03:00
Paul B Mahol
29b673bdcf swscale: add GBRAP14 format support 2023-09-28 19:37:58 +02:00
Andreas Rheinhardt
f8503b4c33 avutil/internal: Don't auto-include emms.h
Instead include emms.h wherever it is needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
L. E. Segovia
ddc1cd5cdd configure: Set WIN32_LEAN_AND_MEAN at configure time
Including winsock2.h or windows.h without WIN32_LEAN_AND_MEAN cause
bzlib.h to parse as nonsense, due to an instance of #define char small
in rpcndr.h.

See:

https://stackoverflow.com/a/27794577

Signed-off-by: L. E. Segovia <amy@amyspark.me>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-08-14 22:57:28 +03:00
Rémi Denis-Courmont
c2b38619c0 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012}
This avoids strided loads.

Before:
shuffle_bytes_1230_rvv_i32: 308.7
shuffle_bytes_3012_rvv_i32: 308.7

After:
shuffle_bytes_1230_rvv_i32: 46.7
shuffle_bytes_3012_rvv_i32: 46.7
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont
15982554e6 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103}
This avoids strided loads.

Before:
shuffle_bytes_0321_rvv_i32: 307.7
shuffle_bytes_2103_rvv_i32: 308.7

After:
shuffle_bytes_0321_rvv_i32: 59.7
shuffle_bytes_2103_rvv_i32: 61.5
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont
d3948e4db5 swscale: inline ff_shuffle_bytes_3210_rvv
No functional changes.
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont
b6585eb04c lavu: add/use flag for RISC-V Zba extension
The code was blindly assuming that Zbb or V implied Zba. While the
earlier is practically always true, the later broke some QEMU setups,
as V was introduced earlier than Zba.
2023-07-19 19:29:35 +03:00
Khem Raj
a7b3c0203f libswscale/riscv: fix syntax of vsetvli
Add missing operand which clang complains about but GCC assumes it to be
'm1' if not specified.

Works around build failure with Clang:
| src/libswscale/riscv/rgb2rgb_rvv.S:88:25: error: operand must be e[8|16|32|64|128|256|512|1024],m[1|2|4|8|f2|f4|f8],[ta|tu],[ma|mu]
|         vsetvli t4, t3, e8, ta, ma
|                         ^

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-07-13 22:01:24 +03:00
Lynne
b3fb73af6b
swscale: bump minor for implementing support for the new pixfmts 2023-05-29 00:42:02 +02:00
Lynne
934525eae0
lsws: add in/out support for the new 12-bit 2-plane 422 and 444 pixfmts 2023-05-29 00:41:35 +02:00
Jin Bo
cb4ae8baee
swscale/la: Add following builtin optimized functions
yuv420_rgb24_lsx
yuv420_bgr24_lsx
yuv420_rgba32_lsx
yuv420_argb32_lsx
yuv420_bgra32_lsx
yuv420_abgr32_lsx
./configure --disable-lasx
ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo
-pix_fmt rgb24 -y /dev/null -an
before: 184fps
after:  207fps

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-05-25 21:05:15 +02:00
Lu Wang
4501b1dfd7
swscale/la: Optimize the functions of the swscale series with lsx.
./configure --disable-lasx
ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480
-pix_fmt bgra -y /dev/null -an
before: 91fps
after:  160fps

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-05-25 21:05:08 +02:00
Lynne
a62a3930c2
swscale/ppc: remove hScale8To19_vsx
Fails checkasm on a Power9 system.
2023-05-20 20:07:18 +02:00
Michael Niedermayer
47ac3e6065
version.h: Bump minor post 6.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-02-19 18:37:36 +01:00
Michael Niedermayer
62efa096af
version.h: Bump minor for 6.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-02-19 18:32:07 +01:00
James Almer
5bad485603 Bump major versions of all libraries
Signed-off-by: James Almer <jamrial@gmail.com>
2023-02-09 15:35:14 +01:00
Tomas Härdin
a678b0c252 sws/utils.c: Do not uselessly call initFilter() when unscaling 2023-02-08 15:53:55 +01:00
Lynne
bbe95f7353
x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
Andreas Rheinhardt
1ff9c07fa6 swscale/utils: Fix indentation
Forgotten after c1eb3e7fec.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 21:02:57 +01:00
Andreas Rheinhardt
b2d1a25816 swscale/utils: Derive range from YUVJ-pix-fmt only once
Currently, it is done once per slice-thread, leading to
one warning per slice-thread in case a YUVJ pixel format
has been originally used.

This also fixes the anomaly that said parameter are only
updated for the user-facing context (whose values are retrievable
via av_opt_get()) if slice-threading is not in use.

Fixes ticket #9860.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:59:03 +01:00
Andreas Rheinhardt
ff39dcb129 swscale/utils: Move functions to avoid forward declarations
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Andreas Rheinhardt
baccc1c541 swscale/utils: Avoid calling ff_thread_once() unnecessarily
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Andreas Rheinhardt
8ee0711228 swscale/utils: Don't allocate AVFrames for slice contexts
Only the parent context's AVFrames are ever used.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Andreas Rheinhardt
64ed1d40df swscale/utils: Factor initializing single slice context out
Initializing slice threads currently uses the function
(sws_init_context()) that is also used for initializing
user-facing contexts with the only difference being that
nb_threads is set to one before initializing the slice contexts.

Yet sws_init_context() also initializes lots of stuff
that is not slice-dependent, i.e. (src|dst)Range. This
currently only works because the code sets these fields
to the same values for all slice contexts. This is not
nice; even worse, it entails that log messages are printed
once per slice context (and therefore fill the screen).

This commit lays the groundwork to fix this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-24 20:58:21 +01:00
Michael Niedermayer
ba209e3d51
swscale/input: Use more unsigned intermediates
Same principle as previous commit, with sufficiently huge rgb2yuv table
values this produces wrong results and undefined behavior.
The unsigned produces the same incorrect results. That is probably
ok as these cases with huge values seem not to occur in any real
use case.

Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-20 21:55:06 +01:00
Jeremy Dorfman
ce566281f9
swscale/input: Use unsigned intermediates in rgb64ToUV_c_template
Large rgb2yuv tables and high pixel values cause the intermediate
int32_t of ru*r + gu*g + bu*b to exceed INT_MAX, which is undefined
behavior. This causes libswscale built with LLVM -fsanitize=undefined to
assert. Using unsigned integers instead has defined behavior and
produces identical results, and makes rgb64ToUV_c_template match
rgb64ToY_c_template.

Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-20 21:23:57 +01:00
Andreas Rheinhardt
b616b04704 swscale/utils: Remove obsolete 3DNow reference
swscale does not use 3DNow any more since commit
608319a311.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-09 17:39:00 +01:00
Michael Niedermayer
b74f89caae
swscale/output: Bias 16bps output calculations to improve non overflowing range for GBRP16/GBRPF32
Fixes: integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-04 22:44:16 +01:00
Michael Niedermayer
0f0afc7fb5
swscale/output: Bias 16bps output calculations to improve non overflowing range
Fixes: integer overflow
Fixes: ./ffmpeg   -f rawvideo -video_size 66x64 -pixel_format yuva420p10le   -i ~/videos/overflow_input_w66h64.yuva420p10le   -filter_complex "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=full:out_range=full,format=rgba64[out]"   -pixel_format rgba64 -map '[out]'   -y overflow_w66h64.png

Found-by: Drew Dunne <asdunne@google.com>
Tested-by: Drew Dunne <asdunne@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-04 22:44:16 +01:00
Hubert Mazur
2537fdc510 sw_scale: Add specializations for hscale 16 to 19
Provide arm64 neon optimized implementations for hscale16To19 with
filter sizes 4, 8 and X4.

The tests and benchmarks run on AWS Graviton 2 instances.
The results from a checkasm tool are shown below.

hscale_16_to_19__fs_4_dstW_512_c: 6216.0
hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
hscale_16_to_19__fs_8_dstW_512_c: 10417.7
hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
hscale_16_to_19__fs_12_dstW_512_c: 14890.5
hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
hscale_16_to_19__fs_16_dstW_512_c: 19006.5
hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
hscale_16_to_19__fs_32_dstW_512_c: 36629.5
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
hscale_16_to_19__fs_40_dstW_512_c: 45477.5
hscale_16_to_19__fs_40_dstW_512_neon: 11552.0

(Note, the checkasm tests for these functions haven't been
merged since they fail on x86.)

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-11-01 15:24:58 +02:00
Hubert Mazur
9ccf8c5bfc sw_scale: Add specializations for hscale 16 to 15
Add arm64 neon implementations for hscale 16 to 15 with filter
sizes 4, 8 and X4.

The tests and benchmarks run on AWS Graviton 2 instances.
The results from a checkasm tool are shown below.

hscale_16_to_15__fs_4_dstW_512_c: 6703.5
hscale_16_to_15__fs_4_dstW_512_neon: 2298.0
hscale_16_to_15__fs_8_dstW_512_c: 10983.0
hscale_16_to_15__fs_8_dstW_512_neon: 3216.5
hscale_16_to_15__fs_12_dstW_512_c: 15526.0
hscale_16_to_15__fs_12_dstW_512_neon: 3993.0
hscale_16_to_15__fs_16_dstW_512_c: 20183.5
hscale_16_to_15__fs_16_dstW_512_neon: 5369.7
hscale_16_to_15__fs_32_dstW_512_c: 39315.2
hscale_16_to_15__fs_32_dstW_512_neon: 9511.2
hscale_16_to_15__fs_40_dstW_512_c: 48995.7
hscale_16_to_15__fs_40_dstW_512_neon: 11570.0

(Note, the checkasm tests for these functions haven't been
merged since they fail on x86.)

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-11-01 15:24:53 +02:00
Hubert Mazur
1e9cfa5bb0 sw_scale: Add specializations for hscale 8 to 19
Add arm64 neon implementations for hscale 8 to 19 with filter
sizes 4, 4X and 8. Both implementations are based on very similar ones
dedicated to hscale 8 to 15. The major changes refer to saving
the data - instead of writing the result as int16_t it is done
with int32_t.

These functions are heavily inspired on patches provided by J. Swinney
and M. Storsjö for hscale8to15 which were slightly adapted for
hscale8to19.

The tests and benchmarks run on AWS Graviton 2 instances. The results
from a checkasm tool shown below.

hscale_8_to_19__fs_4_dstW_512_c: 5663.2
hscale_8_to_19__fs_4_dstW_512_neon: 1259.7
hscale_8_to_19__fs_8_dstW_512_c: 9306.0
hscale_8_to_19__fs_8_dstW_512_neon: 2020.2
hscale_8_to_19__fs_12_dstW_512_c: 12932.7
hscale_8_to_19__fs_12_dstW_512_neon: 2462.5
hscale_8_to_19__fs_16_dstW_512_c: 16844.2
hscale_8_to_19__fs_16_dstW_512_neon: 4671.2
hscale_8_to_19__fs_32_dstW_512_c: 32803.7
hscale_8_to_19__fs_32_dstW_512_neon: 5474.2
hscale_8_to_19__fs_40_dstW_512_c: 40948.0
hscale_8_to_19__fs_40_dstW_512_neon: 6669.7

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-11-01 15:24:43 +02:00
Martin Storsjö
cb803a0072 swscale: aarch64: Fix yuv2rgb with negative strides
Treat the 32 bit stride registers as signed.

Alternatively, we could make the stride arguments ptrdiff_t instead
of int, and changing all of the assembly to operate on these
registers with their full 64 bit width, but that's a much larger
and more intrusive change (and risks missing some operation, which
would clamp the intermediates to 32 bit still).

Fixes: https://trac.ffmpeg.org/ticket/9985

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-27 21:49:26 +03:00
Marvin Scholz
4aa04c255d swscale: document some missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
aba8cf654f swscale: Fix bogus doxy comment #ifdefs
The intention here was probably to document this as use of
conditionals does not make sense in a comment.

Fixes doxy warning:
  warning: explicit link request to 'if' could not be resolved
2022-10-17 09:55:19 +02:00
Chema Gonzalez
bf64a75c5a libswscale: force a minimum size of the slide for bayer sources
Bayer sources are read in groups of 2 lines (e.g. for a
BGGR flavor, the first row contains only B and G samples,
while the second row contains only G and R samples). They
need to be read as a whole.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-10-14 12:19:13 +02:00
Rémi Denis-Courmont
a1bfb5290e sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2
This is currently 64-bit only because the stack spilling code would not
assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory).

This could be added later in the unlikely that someone wants it.
2022-09-30 07:25:44 +02:00
Rémi Denis-Courmont
9181835a24 sws/rgb2rgb: RISC-V V interleaveBytes 2022-09-30 07:24:09 +02:00
Rémi Denis-Courmont
66a03f4053 sws/rgb2rgb: RISC-V V shuffle_bytes_xxxx functions 2022-09-30 07:24:09 +02:00
Andreas Rheinhardt
888a02a126 swscale/output: Don't call av_pix_fmt_desc_get() in a loop
Up until now, libswscale/output.c used a macro to write
an output pixel which involved a call to av_pix_fmt_desc_get()
to find out whether the input pixel format is BE or LE
despite this being known at compile-time (there are templates
per pixfmt). Even worse, these calls are made in a loop,
so that e.g. there are eight calls to av_pix_fmt_desc_get()
for every pixel processed in yuv2rgba64_X_c_template()
for 64bit RGB formats.

This commit modifies these macros to ensure that isBE()
is evaluated at compile-time. This saved 41184B of .text
for me (GCC 11.2, -O3). Of course, it also improved performance.
E.g. ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
-threads 1  -t 1:00  -f null - (which uses yuv2rgba64le_X_c,
which is an invocation of yuv2rgba64_X_c_template() mentioned above),
performance improved from 95589 to 41387 decicycles for one call
to yuv2packedX; for the be variant the numbers went down from
76087 to 43024 decicycles.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-19 23:40:41 +02:00
Andreas Rheinhardt
4d7a1a4619 swscale/input: Avoid calls to av_pix_fmt_desc_get()
Up until now, libswscale/input.c used a macro to read
an input pixel which involved a call to av_pix_fmt_desc_get()
to find out whether the input pixel format is BE or LE
despite this being known at compile-time (there are templates
per pixfmt). Even worse, these calls are made in a loop,
so that e.g. there are six calls to av_pix_fmt_desc_get()
for every pair of UV pixel processed in
rgb64ToUV_half_c_template().

This commit modifies these macros to ensure that isBE()
is evaluated at compile-time. This saved 9743B of .text
for me (GCC 11.2, -O3). For a simple RGB64LE->YUV420P
transformation like
ffmpeg -f lavfi -i haldclutsrc,format=rgba64le -pix_fmt yuv420p \
-threads 1  -t 1:00  -f null -
the amount of decicycles spent in rgb64LEToUV_half_c
(which is created via the template mentioned above)
decreases from 19751 to 5341; for RGBA64BE the number
went down from 11945 to 5393. For shared builds (where
the call to av_pix_fmt_desc_get() is indirect) the old numbers
are 15230 for RGBA64BE and 27502 for RGBA64LE, whereas
the numbers with this patch are indistinguishable from
the numbers from a static build.

Also make the macros that are touched conform to the
usual convention of using uppercase names while just at it.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-19 23:40:41 +02:00
Hao Chen
925ac0da32
swscale/la: Add output_lasx.c file.
ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -pix_fmt
rgb24 -y /dev/null -an
before: 150fps
after:  183fps

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 22:56:39 +02:00
Hao Chen
74d09b068d
swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files
ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -pix_fmt rgb24 -y /dev/null -an
before: 178fps
after:  210fps

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 22:56:38 +02:00
Hao Chen
38cacce22a
swscale/la: Optimize hscale functions with lasx.
ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -y /dev/null -an
before: 101fps
after:  138fps

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 22:56:38 +02:00
Philip Langdale
09a8e5debb swscale/output: add support for Y210LE and Y212LE 2022-09-10 12:29:12 -07:00
Philip Langdale
68181623e9 swscale/output: add support for XV30LE 2022-09-10 12:29:12 -07:00
Philip Langdale
366f073c62 swscale/output: add support for XV36LE 2022-09-10 12:29:12 -07:00
Philip Langdale
caf8d4d256 swscale/output: add support for P012
This generalises the existing P010 support.
2022-09-10 12:29:12 -07:00
Andreas Rheinhardt
d2428d80ce swscale/input: Remove spec-incompliant ';'
These macros are definitions, not only declarations and therefore
should not contain a semicolon. Such a semicolon is actually
spec-incompliant, but compilers happen to accept them.

Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-08 19:21:30 +02:00
Philip Langdale
4a59eba227 swscale/input: add support for Y212LE 2022-09-06 12:49:10 -07:00
Philip Langdale
198b5b90d5 swscale/input: add support for XV30LE 2022-09-06 12:49:10 -07:00
Philip Langdale
5bdd726115 swscale/input: add support for P012
As we now have three of these formats, I added macros to generate the
conversion functions.
2022-09-06 12:49:10 -07:00
Philip Langdale
8d9462844a swscale/input: add support for XV36LE 2022-09-06 12:49:10 -07:00
Philip Langdale
45726aa117 libswscale: add support for VUYX format
As we already have support for VUYA, I figured I should do the small
amount of work to support VUYX as well. That means a little refactoring
to share code.
2022-08-25 19:03:49 -07:00
Andreas Rheinhardt
de33506e4b swscale/x86/rgb_2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext
Fixes FATE-failures with the the filter-2xbr filter-3xbr filter-4xbr
filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x
filter-paletteuse-bayer filter-paletteuse-bayer0
filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests
when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to
"mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten
when using SSSE3).

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-23 12:21:00 +02:00
Timo Rothenpieler
aca569aad2 swscale/input: add rgbaf16 input support
This is by no means perfect, since at least ddagrab will return scRGB
data with values outside of 0.0f to 1.0f for HDR values.
Its primary purpose is to be able to work with the format at all.
2022-08-19 22:09:36 +02:00
Timo Rothenpieler
f2de911818 swscale: add opaque parameter to input functions 2022-08-19 22:09:36 +02:00
Andreas Rheinhardt
8bec225c3c swscale/x86/yuv2yuvX: Remove unused ff_yuv2yuvX_mmx()
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-19 12:01:34 +02:00
Alan Kelly
a38293e444 libswscale: Enable hscale_avx2 for all input sizes.
ff_shuffle_filter_coefficients shuffles the tail as required.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
Alan Kelly
a6724285fd sws: allow avx2 hscale to process inputs of any size.
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
Alan Kelly
51a34e8525 sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:19:13 +02:00
Swinney, Jonathan
0d7caa5b09 swscale/aarch64: add vscale specializations
This commit adds new code paths for vscale when filterSize is 2, 4, or
8. By using specialized code with unrolling to match the filterSize we
can improve performance.

On AWS c7g (Graviton 3, Neoverse V1) instances:
                                 before   after
yuv2yuvX_2_0_512_accurate_neon:  558.8    268.9
yuv2yuvX_4_0_512_accurate_neon:  637.5    434.9
yuv2yuvX_8_0_512_accurate_neon:  1144.8   806.2
yuv2yuvX_16_0_512_accurate_neon: 2080.5   1853.7

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
Swinney, Jonathan
3e708722a2 swscale/aarch64: vscale optimization
Use scalar times vector multiply accumlate instructions instead of
vector times vector to remove the need for replicating load instructions
which are slightly slower.

On AWS c7g (Graviton 3, Neoverse V1) instances:
yuv2yuvX_8_0_512_accurate_neon:  1144.8  987.4
yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
Swinney, Jonathan
4dcd191a50 checkasm: updated tests for sw_scale
Change the reference to exactly match the C reference in swscale,
instead of exactly matching the x86 SIMD implementations (which
differs slightly). Test with and without SWS_ACCURATE_RND - if this
flag isn't set, the output must match the C reference exactly,
otherwise it is allowed to be off by 2.

Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND
is set - apparently this discrepancy hasn't been noticed in other
exact tests before.

Add a test for yuv2plane1.

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
Swinney, Jonathan
75ffca7eef libswscale/aarch64: add another hscale specialization
This specialization handles the case where filtersize is 4 mod 8, e.g.
12, 20, etc. Aarch64 was previously using the c function for this case.
This implementation speeds up that case significantly.

hscale_8_to_15__fs_12_dstW_512_c: 6234.1
hscale_8_to_15__fs_12_dstW_512_neon: 1505.6

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 12:08:38 +03:00
Timo Rothenpieler
b77fff47d0 configure: always enable gnu_windres if available
Use the appropiate Makefile variable to ensure the resource file is
only built into shared libraries instead.
2022-08-13 14:42:36 +02:00
James Almer
68e017c487 swscale/output: fix reading chroma values when generating vuya output
Signed-off-by: James Almer <jamrial@gmail.com>
2022-08-08 09:39:33 -03:00
James Almer
1974813261 swscale/output: add VUYA output support
Signed-off-by: James Almer <jamrial@gmail.com>
2022-08-07 09:33:16 -03:00
James Almer
f0abd07996 swscale/input: add VUYA input support
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-08-05 09:39:21 -03:00
Andreas Rheinhardt
da668fa7d2 swscale/rgb2rgb: Don't cast const away
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-31 01:09:52 +02:00
Matthieu Bouron
0a6bb7da55 swscale: add NV16 input/output
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-07-19 12:20:16 +02:00
Michael Niedermayer
fd26b07e8b Bump versions after 5.1 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-07-13 00:29:05 +02:00
Michael Niedermayer
6f1b144358 Bump Versions for 5.1 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-07-13 00:27:37 +02:00
Andreas Rheinhardt
81d3472031 swscale/x86/swscale: Simplify macro
This is possible now that it is no longer used by MMX.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:36:18 +02:00
Andreas Rheinhardt
a05f22eaf3 swscale/x86/swscale: Remove obsolete and harmful MMX(EXT) functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.

Moreover, some of the removed code was buggy/not bitexact
and lead to failures involving the f32le and f32be versions of
gray, gbrp and gbrap on x86-32 when SSE2 was not disabled.
See e.g.
https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx

Notice that yuv2yuvX_mmx is not removed, because it is used
by SSE3 and AVX2 as fallback in case of unaligned data and
also for tail processing. I don't know why yuv2yuvX_mmxext
isn't being used for this; an earlier version [1] of
554c2bc708 used it, but
the version that was eventually applied does not.

[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:36:04 +02:00
Andreas Rheinhardt
2831837182 swscale/x86/yuv2rgb: Remove obsolete MMX functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:35:50 +02:00
Andreas Rheinhardt
608319a311 swscale/x86/rgb2rgb: Remove obsolete MMX, 3dnow functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:35:38 +02:00
Andreas Rheinhardt
40e6575aa3 all: Replace if (ARCH_FOO) checks by #if ARCH_FOO
This is more spec-compliant because it does not rely
on dead-code elimination by the compiler. Especially
MSVC has problems with this, as can be seen in
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
or
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html

This commit does not eliminate every instance where we rely
on dead code elimination: It only tackles branching to
the initialization of arch-specific dsp code, not e.g. all
uses of CONFIG_ and HAVE_ checks. But maybe it is already
enough to compile FFmpeg with MSVC with whole-programm-optimizations
enabled (if one does not disable too many components).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-15 04:56:37 +02:00
Vardan Margaryan
73302aa193 swscale/x86/yuv_2_rgb: fix access to memory past the frame data in yuv to rgb conversion
Y, U, V data is loaded at the end of the current iteration for the next
iteration.
It results in memory access past the frame data on the last iteration
(that data is never used after the loading).

So load data at the start of the iteration, so that only useful data is
loaded.

Signed-off-by: Vardan Margaryan <v.t.margaryan@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-06-06 09:51:17 +02:00
Swinney, Jonathan
0ea61725b1 swscale/aarch64: add hscale specializations
This patch adds code to support specializations of the hscale function
and adds a specialization for filterSize == 4.

ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck
here is loading the data from src, this data is loaded a whole block
ahead and stored back to the stack to be loaded again with ld4. This
arranges the data for most efficient use of the vector instructions and
removes the need for completion adds at the end. The number of
iterations of the C per iteration of the assembly is increased from 4 to
8, but because of the prefetching, there must be a special section
without prefetching when dstW < 16.

This improves speed on Graviton 2 (Neoverse N1) dramatically in the case
where previously fs=8 would have been required.

before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-05-28 01:09:05 +03:00
Andreas Rheinhardt
f2b79c5b85 lib*/version: Move library version functions into files of their own
This avoids having to rebuild big files every time FFMPEG_VERSION
changes (which it does with every commit).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-05-10 06:49:32 +02:00
Martin Storsjö
70db14376c swscale: aarch64: Optimize the final summation in the hscale routine
Before:                     Cortex A53      A72      A73  Graviton 2  Graviton 3
hscale_8_to_15_width8_neon:     8273.0   4602.5   4289.5      2429.7      1629.1
hscale_8_to_15_width16_neon:   12405.7   6803.0   6359.0      3549.0      2378.4
hscale_8_to_15_width32_neon:   21258.7  11491.7  11469.2      5797.2      3919.6
hscale_8_to_15_width40_neon:   25652.0  14173.7  12488.2      6893.5      4810.4

After:
hscale_8_to_15_width8_neon:     7633.0   3981.5   3350.2      1980.7      1261.1
hscale_8_to_15_width16_neon:   11666.7   5951.0   5512.0      3080.7      2131.4
hscale_8_to_15_width32_neon:   20900.7  10733.2   9481.7      5275.2      3862.1
hscale_8_to_15_width40_neon:   24826.0  13536.2  11502.0      6397.2      4731.9

Thus, this gives overall a 8-29% speedup for the smaller filter
sizes, around 1-8% for the larger filter sizes.

Inspired by a patch by Jonathan Swinney <jswinney@amazon.com>.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-22 10:49:46 +03:00
Martin Storsjö
2d368392a5 Keep including the full version.h when headers are included externally
This avoids unnecessary churn and build breakage for users, by
making sure the whole version.h is included like it has been so far,
while keeping the benefit of not needing to rebuild most files in
the ffmpeg tree on minor/micro bumps.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-03-19 00:01:57 +02:00
Martin Storsjö
f3a0e2ee2b doc: Add an entry to APIchanges about changes to version.h and version_major.h
Also bump the minor versions of all libraries, to signify the
API change of splitting the version.h headers and adding the
new version_major.h header.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-03-16 14:12:46 +02:00
Martin Storsjö
6cd2ac388d libswscale: Split version.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-03-16 14:05:26 +02:00
Martin Storsjö
c523724c69 swscale: Take the destination range into account for yuv->rgb->yuv conversions
The range parameters need to be set up before calling
sws_init_context (which selects which fastpaths can be used;
this gets called by sws_getContext); solely passing them via
sws_setColorspaceDetails isn't enough.

This fixes producing full range YUV range output when doing
YUV->YUV conversions between different YUV color spaces.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-02-25 11:01:17 +02:00
Andreas Rheinhardt
636631d9db Remove unnecessary libavutil/(avutil|common|internal).h inclusions
Some of these were made possible by moving several common macros to
libavutil/macros.h.

While just at it, also improve the other headers a bit.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-24 12:56:49 +01:00
Andreas Rheinhardt
155cd6baa4 Remove obsolete version.h inclusions
Forgotten in e7bd47e657.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-24 12:56:49 +01:00
Alan Kelly
e534d98af3 libswscale: Re-factor ff_shuffle_filter_coefficients.
Make the code more readable and follow the style guide.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:22 +01:00
Alan Kelly
f1a5414c97 libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:07 +01:00
Andreas Rheinhardt
71e2825150 swscale/x86/swscale: Remove superfluous and invalid ';'
Inside a function an unnecessary ';' is just a null statement;
yet outside of it it is actually illegal (but compilers happen
to accept it without warning except when using -pedantic).
So modify the macros to always expect the user to add a ';'.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-22 17:00:45 +01:00
Mark Reid
52f7026164 swscale/x86/input.asm: add x86-optimized planer rgb2yuv functions
sse2 only operates on 2 lanes per loop for to_y and to_uv functions, due
to the lack of pmulld instruction.  Emulating pmulld with 2 pmuludq and shuffles
proved too costly and made to_uv functions slower then the c implementation.

For to_y on sse2 only float functions are generated,
I was are not able outperform the c implementation on the integer pixel formats.

For to_a on see4 only the float functions are generated.
sse2 and sse4 generated nearly identical performing code on integer pixel formats,
so only sse2/avx2 versions are generated.

planar_gbrp_to_y_512_c: 1197.5
planar_gbrp_to_y_512_sse4: 444.5
planar_gbrp_to_y_512_avx2: 287.5
planar_gbrap_to_y_512_c: 1204.5
planar_gbrap_to_y_512_sse4: 447.5
planar_gbrap_to_y_512_avx2: 289.5
planar_gbrp9be_to_y_512_c: 1380.0
planar_gbrp9be_to_y_512_sse4: 543.5
planar_gbrp9be_to_y_512_avx2: 340.0
planar_gbrp9le_to_y_512_c: 1200.5
planar_gbrp9le_to_y_512_sse4: 442.0
planar_gbrp9le_to_y_512_avx2: 282.0
planar_gbrp10be_to_y_512_c: 1378.5
planar_gbrp10be_to_y_512_sse4: 544.0
planar_gbrp10be_to_y_512_avx2: 337.5
planar_gbrp10le_to_y_512_c: 1200.0
planar_gbrp10le_to_y_512_sse4: 448.0
planar_gbrp10le_to_y_512_avx2: 285.5
planar_gbrap10be_to_y_512_c: 1380.0
planar_gbrap10be_to_y_512_sse4: 542.0
planar_gbrap10be_to_y_512_avx2: 340.5
planar_gbrap10le_to_y_512_c: 1199.0
planar_gbrap10le_to_y_512_sse4: 446.0
planar_gbrap10le_to_y_512_avx2: 289.5
planar_gbrp12be_to_y_512_c: 10563.0
planar_gbrp12be_to_y_512_sse4: 542.5
planar_gbrp12be_to_y_512_avx2: 339.0
planar_gbrp12le_to_y_512_c: 1201.0
planar_gbrp12le_to_y_512_sse4: 440.5
planar_gbrp12le_to_y_512_avx2: 286.0
planar_gbrap12be_to_y_512_c: 1701.5
planar_gbrap12be_to_y_512_sse4: 917.0
planar_gbrap12be_to_y_512_avx2: 338.5
planar_gbrap12le_to_y_512_c: 1201.0
planar_gbrap12le_to_y_512_sse4: 444.5
planar_gbrap12le_to_y_512_avx2: 288.0
planar_gbrp14be_to_y_512_c: 1370.5
planar_gbrp14be_to_y_512_sse4: 545.0
planar_gbrp14be_to_y_512_avx2: 338.5
planar_gbrp14le_to_y_512_c: 1199.0
planar_gbrp14le_to_y_512_sse4: 444.0
planar_gbrp14le_to_y_512_avx2: 279.5
planar_gbrp16be_to_y_512_c: 1364.0
planar_gbrp16be_to_y_512_sse4: 544.5
planar_gbrp16be_to_y_512_avx2: 339.5
planar_gbrp16le_to_y_512_c: 1201.0
planar_gbrp16le_to_y_512_sse4: 445.5
planar_gbrp16le_to_y_512_avx2: 280.5
planar_gbrap16be_to_y_512_c: 1377.0
planar_gbrap16be_to_y_512_sse4: 545.0
planar_gbrap16be_to_y_512_avx2: 338.5
planar_gbrap16le_to_y_512_c: 1201.0
planar_gbrap16le_to_y_512_sse4: 442.0
planar_gbrap16le_to_y_512_avx2: 279.0
planar_gbrpf32be_to_y_512_c: 4113.0
planar_gbrpf32be_to_y_512_sse2: 2438.0
planar_gbrpf32be_to_y_512_sse4: 1068.0
planar_gbrpf32be_to_y_512_avx2: 904.5
planar_gbrpf32le_to_y_512_c: 3818.5
planar_gbrpf32le_to_y_512_sse2: 2024.5
planar_gbrpf32le_to_y_512_sse4: 1241.5
planar_gbrpf32le_to_y_512_avx2: 657.0
planar_gbrapf32be_to_y_512_c: 3707.0
planar_gbrapf32be_to_y_512_sse2: 2444.0
planar_gbrapf32be_to_y_512_sse4: 1077.0
planar_gbrapf32be_to_y_512_avx2: 909.0
planar_gbrapf32le_to_y_512_c: 3822.0
planar_gbrapf32le_to_y_512_sse2: 2024.5
planar_gbrapf32le_to_y_512_sse4: 1176.0
planar_gbrapf32le_to_y_512_avx2: 658.5

planar_gbrp_to_uv_512_c: 2325.8
planar_gbrp_to_uv_512_sse2: 1726.8
planar_gbrp_to_uv_512_sse4: 771.8
planar_gbrp_to_uv_512_avx2: 506.8
planar_gbrap_to_uv_512_c: 2281.8
planar_gbrap_to_uv_512_sse2: 1726.3
planar_gbrap_to_uv_512_sse4: 768.3
planar_gbrap_to_uv_512_avx2: 496.3
planar_gbrp9be_to_uv_512_c: 2336.8
planar_gbrp9be_to_uv_512_sse2: 1924.8
planar_gbrp9be_to_uv_512_sse4: 852.3
planar_gbrp9be_to_uv_512_avx2: 552.8
planar_gbrp9le_to_uv_512_c: 2270.3
planar_gbrp9le_to_uv_512_sse2: 1512.3
planar_gbrp9le_to_uv_512_sse4: 764.3
planar_gbrp9le_to_uv_512_avx2: 491.3
planar_gbrp10be_to_uv_512_c: 2281.8
planar_gbrp10be_to_uv_512_sse2: 1917.8
planar_gbrp10be_to_uv_512_sse4: 855.3
planar_gbrp10be_to_uv_512_avx2: 541.3
planar_gbrp10le_to_uv_512_c: 2269.8
planar_gbrp10le_to_uv_512_sse2: 1515.3
planar_gbrp10le_to_uv_512_sse4: 759.8
planar_gbrp10le_to_uv_512_avx2: 487.8
planar_gbrap10be_to_uv_512_c: 2382.3
planar_gbrap10be_to_uv_512_sse2: 1924.8
planar_gbrap10be_to_uv_512_sse4: 855.3
planar_gbrap10be_to_uv_512_avx2: 540.8
planar_gbrap10le_to_uv_512_c: 2382.3
planar_gbrap10le_to_uv_512_sse2: 1512.3
planar_gbrap10le_to_uv_512_sse4: 759.3
planar_gbrap10le_to_uv_512_avx2: 484.8
planar_gbrp12be_to_uv_512_c: 2283.8
planar_gbrp12be_to_uv_512_sse2: 1936.8
planar_gbrp12be_to_uv_512_sse4: 858.3
planar_gbrp12be_to_uv_512_avx2: 541.3
planar_gbrp12le_to_uv_512_c: 2278.8
planar_gbrp12le_to_uv_512_sse2: 1507.3
planar_gbrp12le_to_uv_512_sse4: 760.3
planar_gbrp12le_to_uv_512_avx2: 485.8
planar_gbrap12be_to_uv_512_c: 2385.3
planar_gbrap12be_to_uv_512_sse2: 1927.8
planar_gbrap12be_to_uv_512_sse4: 855.3
planar_gbrap12be_to_uv_512_avx2: 539.8
planar_gbrap12le_to_uv_512_c: 2377.3
planar_gbrap12le_to_uv_512_sse2: 1516.3
planar_gbrap12le_to_uv_512_sse4: 759.3
planar_gbrap12le_to_uv_512_avx2: 484.8
planar_gbrp14be_to_uv_512_c: 2283.8
planar_gbrp14be_to_uv_512_sse2: 1935.3
planar_gbrp14be_to_uv_512_sse4: 852.3
planar_gbrp14be_to_uv_512_avx2: 540.3
planar_gbrp14le_to_uv_512_c: 2276.8
planar_gbrp14le_to_uv_512_sse2: 1514.8
planar_gbrp14le_to_uv_512_sse4: 762.3
planar_gbrp14le_to_uv_512_avx2: 484.8
planar_gbrp16be_to_uv_512_c: 2383.3
planar_gbrp16be_to_uv_512_sse2: 1881.8
planar_gbrp16be_to_uv_512_sse4: 852.3
planar_gbrp16be_to_uv_512_avx2: 541.8
planar_gbrp16le_to_uv_512_c: 2378.3
planar_gbrp16le_to_uv_512_sse2: 1476.8
planar_gbrp16le_to_uv_512_sse4: 765.3
planar_gbrp16le_to_uv_512_avx2: 485.8
planar_gbrap16be_to_uv_512_c: 2382.3
planar_gbrap16be_to_uv_512_sse2: 1886.3
planar_gbrap16be_to_uv_512_sse4: 853.8
planar_gbrap16be_to_uv_512_avx2: 550.8
planar_gbrap16le_to_uv_512_c: 2381.8
planar_gbrap16le_to_uv_512_sse2: 1488.3
planar_gbrap16le_to_uv_512_sse4: 765.3
planar_gbrap16le_to_uv_512_avx2: 491.8
planar_gbrpf32be_to_uv_512_c: 4863.0
planar_gbrpf32be_to_uv_512_sse2: 3347.5
planar_gbrpf32be_to_uv_512_sse4: 1800.0
planar_gbrpf32be_to_uv_512_avx2: 1199.0
planar_gbrpf32le_to_uv_512_c: 4725.0
planar_gbrpf32le_to_uv_512_sse2: 2753.0
planar_gbrpf32le_to_uv_512_sse4: 1474.5
planar_gbrpf32le_to_uv_512_avx2: 927.5
planar_gbrapf32be_to_uv_512_c: 4859.0
planar_gbrapf32be_to_uv_512_sse2: 3269.0
planar_gbrapf32be_to_uv_512_sse4: 1802.0
planar_gbrapf32be_to_uv_512_avx2: 1201.5
planar_gbrapf32le_to_uv_512_c: 6338.0
planar_gbrapf32le_to_uv_512_sse2: 2756.5
planar_gbrapf32le_to_uv_512_sse4: 1476.0
planar_gbrapf32le_to_uv_512_avx2: 908.5

planar_gbrap_to_a_512_c: 383.3
planar_gbrap_to_a_512_sse2: 66.8
planar_gbrap_to_a_512_avx2: 43.8
planar_gbrap10be_to_a_512_c: 601.8
planar_gbrap10be_to_a_512_sse2: 86.3
planar_gbrap10be_to_a_512_avx2: 34.8
planar_gbrap10le_to_a_512_c: 602.3
planar_gbrap10le_to_a_512_sse2: 48.8
planar_gbrap10le_to_a_512_avx2: 31.3
planar_gbrap12be_to_a_512_c: 601.8
planar_gbrap12be_to_a_512_sse2: 111.8
planar_gbrap12be_to_a_512_avx2: 41.3
planar_gbrap12le_to_a_512_c: 385.8
planar_gbrap12le_to_a_512_sse2: 75.3
planar_gbrap12le_to_a_512_avx2: 39.8
planar_gbrap16be_to_a_512_c: 386.8
planar_gbrap16be_to_a_512_sse2: 79.8
planar_gbrap16be_to_a_512_avx2: 31.3
planar_gbrap16le_to_a_512_c: 600.3
planar_gbrap16le_to_a_512_sse2: 40.3
planar_gbrap16le_to_a_512_avx2: 30.3
planar_gbrapf32be_to_a_512_c: 1148.8
planar_gbrapf32be_to_a_512_sse2: 611.3
planar_gbrapf32be_to_a_512_sse4: 234.8
planar_gbrapf32be_to_a_512_avx2: 183.3
planar_gbrapf32le_to_a_512_c: 851.3
planar_gbrapf32le_to_a_512_sse2: 263.3
planar_gbrapf32le_to_a_512_sse4: 199.3
planar_gbrapf32le_to_a_512_avx2: 156.8

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:34:33 -03:00
Mark Reid
9e445a5be2 swscale/x86/output.asm: add x86-optimized planer gbr yuv2anyX functions
changes since v2:
 * fixed label
changes since v1:
 * remove vex intruction on sse4 path
 * some load/pack marcos use less intructions
 * fixed some typos

yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:33:17 -03:00
rcombs
df9180d8a0 swscale/output: use isSwappedChroma 2022-01-04 19:39:22 -06:00
rcombs
cb3a6cc082 swscale/output: use isSemiPlanarYUV for NV12/21/24/42 case 2022-01-04 19:39:22 -06:00
rcombs
f8e284be69 swscale: introduce isSwappedChroma 2022-01-04 19:39:22 -06:00
rcombs
bb4f19f2a2 swscale/output: use isDataInHighBits for 10-bit case
This code will need fleshing-out (probably templating) if we ever add
e.g. a P012 format.
2022-01-04 19:39:22 -06:00
rcombs
cf9e8cb52f swscale/output: use isSemiPlanarYUV for 16-bit case 2022-01-04 19:39:22 -06:00
rcombs
e5d83463c8 swscale: introduce isDataInHighBits 2022-01-04 19:39:22 -06:00
rcombs
cb87a3b137 swscale/output: template-ize yuv2nv12cX 10-bit and 16-bit cases
Fixes incorrect big-endian output introduced in 88d804b7ff

Avoids making the filter-time BE check more expensive
2022-01-04 19:39:22 -06:00
Andreas Rheinhardt
b189550137 lib*/version.h: Bump Versions after release/5.0 branch
This is done a second time for 5.0 because master was
merged into 5.0 so that it contains the recent DOVI additions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 14:29:06 +01:00
Andreas Rheinhardt
c512be9a90 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 13:40:03 +01:00
Andreas Rheinhardt
20b0d24c2f Makefile: Redo duplicating object files in shared builds
In case of shared builds, some object files containing tables
are currently duplicated into other libraries: log2_tab.c,
golomb.c, reverse.c. The check for whether this is duplicated
is simply whether CONFIG_SHARED is true. Yet this is crude:
E.g. libavdevice includes reverse.c for shared builds, but only
needs it for the decklink input device, which given that decklink
is not enabled by default will be unused in most libavdevice.so.

This commit changes this by making it more explicit about what
to duplicate from other libraries. To do this, two new Makefile
variables were added: SHLIBOBJS and STLIBOBJS. SHLIBOBJS contains
the objects that are duplicated from other libraries in case of
shared builds; STLIBOBJS contains stuff that a library has to
provide for other libraries in case of static builds. These new
variables provide a way to enable/disable with a finer granularity
than just whether shared builds are enabled or not. E.g. lavd's
Makefile now contains: SHLIBOBJS-$(CONFIG_DECKLINK_INDEV) += reverse.o

Another example is provided by the golomb tables. These are provided
by lavc for static builds, even if one uses a build configuration
that makes only lavf use them. Therefore lavc's Makefile contains
STLIBOBJS-$(CONFIG_MXF_MUXER) += golomb.o, whereas lavf's Makefile
has a corresponding SHLIBOBJS-$(CONFIG_MXF_MUXER) += golomb_tab.o.
E.g. in case the MXF muxer is the only component needing these tables
only libavformat.so will contain them for shared builds; currently
libavcodec.so does so, too.
(There is currently a CONFIG_EXTRA group for golomb. But actually
one would need two groups (golomb_avcodec and golomb_avformat) in
order to know when and where to include these tables. Therefore
this commit uses a Makefile-based approach for this and stops
using these groups for the users in libavformat.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 05:01:04 +01:00
Michael Niedermayer
4be85c9331 lib*/version.h: Bump Versions after release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:10:46 +01:00
Michael Niedermayer
f3964a59e1 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:08:31 +01:00
rcombs
3e00b9e395 swscale/x86/init: use isSemiPlanarYUV
Fixes P210/P410 cases introduced (and broken) in 88d804b7ff
2021-12-23 01:41:03 -06:00
rcombs
88d804b7ff swscale: add P210/P410/P216/P416 output 2021-12-22 18:38:40 -06:00
Alan Kelly
eebe406c80 libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
2021-12-21 17:44:53 -03:00
James Almer
eab91c3e2e x86/scale_avx2: don't use $ for hex literals
Fixes compilation with AVX2 enabled yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 17:29:21 -03:00
Alan Kelly
9092e58c44 x86/scale_avx2: Change asm indent from 2 to 4 spaces.
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:42:04 -03:00
Alan Kelly
86663963e6 x86/swscale: fix minor coding style issues
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:16:04 -03:00
James Almer
76a3f961f8 x86/scale_avx2: add missing check for AVX2 assembler support
Should fix compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 09:41:56 -03:00
Alan Kelly
f900a19fa9 libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.
Fixes so that fate under 64 bit Windows passes.

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-15 20:04:59 -03:00
Andreas Rheinhardt
3be6fe9a56 swscale/yuv2rgb: Silence a set-but-unused-variable warning
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-12-03 16:10:51 +01:00
rcombs
f0204de47d swscale: add P210/P410/P216/P416 input 2021-11-28 16:40:43 -06:00
Mark Reid
3f4ce004b8 swscale/input: clip rgbf32 values before lrintf
if the float pixel * 65535.0f > 2147483647.0f
lrintf may overfow and return negative values, depending on implementation.
nan and +/-inf values may also be implementation defined

clip the value first so lrintf always works.

values <     0.0f, -inf, nan = 0.0f
values > 65535.0f, +inf      = 65535.0f

old timings
 195960 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 186120 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 188645 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 183625 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 181157 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 177533 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 175689 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 232960 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 221380 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 216640 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 213505 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 211558 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 210596 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 210202 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 161680 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 153540 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 148255 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 140600 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 132935 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128531 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 140933 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 190980 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 176080 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 167980 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 164685 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 162751 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 162404 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 167849 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

new timings
 183320 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 175700 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 179570 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 172932 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 168707 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 165224 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 163423 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 184940 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 185150 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 185790 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 185472 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 185277 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 185813 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 185332 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 145400 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 145100 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 143490 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 136687 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 131271 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128698 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 127170 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 156020 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 146990 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 142020 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 141052 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 138973 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 138027 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 143939 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-11-15 16:50:10 -03:00
Mark Reid
74e49cc583 swscale/input: unify grayf32 funcs with rgbf32 funcs
This is ment to be a cosmetic change

old timings:
  42780 UNITS in grayf32le,       1 runs,      0 skips
  56720 UNITS in grayf32le,       2 runs,      0 skips
  67265 UNITS in grayf32le,       4 runs,      0 skips
  58082 UNITS in grayf32le,       8 runs,      0 skips
  63512 UNITS in grayf32le,      16 runs,      0 skips
  52720 UNITS in grayf32le,      32 runs,      0 skips
  46491 UNITS in grayf32le,      64 runs,      0 skips

  68500 UNITS in grayf32be,       1 runs,      0 skips
  66930 UNITS in grayf32be,       2 runs,      0 skips
  62305 UNITS in grayf32be,       4 runs,      0 skips
  55510 UNITS in grayf32be,       8 runs,      0 skips
  50216 UNITS in grayf32be,      16 runs,      0 skips
  44480 UNITS in grayf32be,      32 runs,      0 skips
  42394 UNITS in grayf32be,      64 runs,      0 skips

new timings:
  46660 UNITS in grayf32le,       1 runs,      0 skips
  51830 UNITS in grayf32le,       2 runs,      0 skips
  53390 UNITS in grayf32le,       4 runs,      0 skips
  50910 UNITS in grayf32le,       8 runs,      0 skips
  44968 UNITS in grayf32le,      16 runs,      0 skips
  40349 UNITS in grayf32le,      32 runs,      0 skips
  38330 UNITS in grayf32le,      64 runs,      0 skips

  39980 UNITS in grayf32be,       1 runs,      0 skips
  49630 UNITS in grayf32be,       2 runs,      0 skips
  53540 UNITS in grayf32be,       4 runs,      0 skips
  59767 UNITS in grayf32be,       8 runs,      0 skips
  51206 UNITS in grayf32be,      16 runs,      0 skips
  44743 UNITS in grayf32be,      32 runs,      0 skips
  41468 UNITS in grayf32be,      64 runs,      0 skips

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-14 17:12:13 +01:00
Soft Works
58dce6f010 swscale/swscale: check SWS_PRINT_INFO flag for printing alignment warnings
This makes output consistent with a similar warning just few
lines above where this flag is checked in the same way.

Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
2021-11-13 19:55:32 +01:00
Mark Reid
d2379bd6a0 swscale/input: fix planar_rgb16_to_a for gbrap10be and gbrap12be formats
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-04 11:52:33 +01:00
Michael Niedermayer
8316b2a15f swscale/swscale: Improve *ColorspaceDetails() doxy
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer
5f3a160b42 swscale/utils: Improve return codes of sws_setColorspaceDetails()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer
c7699f95bb swscale/utils: Set all threads to the same colorspace even on failure
Fixes: ./ffplay dav.y4m -vf "scale=hd1080:threads=4"
Found-by: Paul
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Wu Jianhua
2c734a8496 libswscale/x86/rgb2rgb: add shuffle_bytes avx2
Performance data(Less is better):
    shuffle_bytes_ssse3   3.64654
    shuffle_bytes_avx2    0.94288

Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-10-15 10:59:20 +02:00
Michael Niedermayer
f801207568 swscale/swscale: Pass slice location into unscaled code also for dst scaling
Fixes: alphablend=checkerboard

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer
06d6726588 swscale/alphablend: Fix slice handling
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer
9f40b5badb swscale/swscale_internal: Avoid unsigned for slice parameters
Mixing unsigned and signed often leads to unexpected arithmetic results.
Fixes: out of array write
Found-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-30 19:47:15 +02:00
Manuel Stoeckl
32329397e2 swscale: add input/output support for X2BGR10LE
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
Manuel Stoeckl
ca594df622 swscale/yuv2rgb: fix conversion to X2RGB10
This resolves a problem where conversions from YUV to X2RGB10LE
would produce color values a factor 4 too small, because an 8-bit
value was placed in a 10-bit channel.

Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
Andreas Rheinhardt
1ea3650823 Replace all occurences of av_mallocz_array() by av_calloc()
They do the same.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-20 01:03:52 +02:00
Andreas Rheinhardt
044a7c08dc swscale/swscale: Disable x86-specific code for other arches
SSE2 is x86 specific, yet due to the call to av_get_cpu_flags()
compilers were unable to optimize the checks (and the call) away
on other arches.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Andreas Rheinhardt
f440c422b7 swscale/swscale: Fix races when using unaligned strides/data
In this case the current code tries to warn once; to do so, it uses
ordinary static ints to store whether the warning has already been
emitted. This is both a data race (and therefore undefined behaviour)
as well as a race condition, because it is really possible for multiple
threads to be the one thread to emit the warning. This is actually
common since the introduction of the new multithreaded scaling API.

This commit fixes this by using atomic integers for the state;
furthermore, these are not static anymore, but rather contained
in the user-facing SwsContext (i.e. the parent SwsContext in case
of slice-threading).

Given that these atomic variables are not intended for synchronization
at all (but only for atomicity, i.e. only to output the warning once),
the atomic operations use memory_order_relaxed.

This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and
yuv444 filter-overlay FATE-tests.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Andreas Rheinhardt
a1255a350d libswscale/options: Add parent_log_context_offset to AVClass
This allows to associate log messages from slice contexts to
the user-visible SwsContext.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
James Almer
5fe648d04a libswscale/swscale: initialize all dst plane pointers in sws_receive_slice()
Fixes valgrind warnings about use of uninitialised values.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-09-07 09:44:58 -03:00
Anton Khirnov
d6fdc78e91 sws: implement slice threading 2021-09-06 09:17:53 +02:00
Anton Khirnov
42cd64c182 sws: add a new scaling API 2021-09-06 09:16:52 +02:00
Andreas Rheinhardt
2c05ee092b avutil/internal, swresample/audioconvert: Remove cpu.h inclusions
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 14:33:45 +02:00
Michael Niedermayer
7874d40f10 swscale/slice: Fix wrong return on error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 15:21:37 +02:00
Michael Niedermayer
fa1e158ef6 swscale/utils: Use full chroma interpolation for rgb4/8 and dither none
Dither none is only implemented in full chroma interpolation for these rgb formats
Its also a obscure choice (producing less nice images) that implementing it in the
other code-paths makes no sense

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Michael Niedermayer
7528532550 swscale/output: Implement dither none for yuv2rgb_write_full()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Michael Niedermayer
997f9cfc12 swscale/slice: Check slice for allocation failure
Fixes: null pointer dereference
Fixes: alloc_slice.mp4

Found-by: Rafael Dutra <rafael.dutra@cispa.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Anton Khirnov
37c0fe49b7 sws: move updating the palette higher up
It does not interact in any way with the code setting up the image
pointers/strides, so it should not be intermixed with it.
2021-07-03 16:13:40 +02:00
Anton Khirnov
d6649d9a3b sws: move initializing dither_error higher up
It does not interact in any way with the code setting up the image
pointers/strides, so it should not be intermixed with it.
2021-07-03 16:13:10 +02:00
Anton Khirnov
e188985598 sws: move the early return for zero-sized slices higher up
Place it right after the input parameter validation. There is no point
in performing any setup if the sws_scale() call won't do anything.
2021-07-03 16:09:43 +02:00
Anton Khirnov
a91e6c927e sws: simplify setting sliceDir 2021-07-03 16:09:21 +02:00
Anton Khirnov
ff753f41dd sws: merge handling frame start into a single block
Also, return an error code on failure rather than 0.
2021-07-03 16:09:07 +02:00
Anton Khirnov
1b11a324fe sws: make checking for the start of a new frame more explicit 2021-07-03 16:07:22 +02:00
Anton Khirnov
0fb014b7bb sws: reset sliceDir at the end of sws_scale()
Makes it more clear that resetting it does not interact with the scaling
code that it is currently intermixed with.
2021-07-03 16:05:39 +02:00
Anton Khirnov
1f80789bf7 sws: rename SwsContext.swscale to convert_unscaled
That function pointer is now used only for unscaled conversion.
2021-07-03 15:57:53 +02:00
Anton Khirnov
fe490ec165 sws: separate the calls to scaled vs unscaled conversion
Call the scaler function directly rather than through a function
pointer. Drop the now-unused return value from ff_getSwsFunc() and
rename the function to reflect its new role.

This will be useful in the following commits, where it will become
important that the amount of output is different for scaled vs unscaled
case.
2021-07-03 15:57:13 +02:00
Anton Khirnov
0f8e0957d2 sws: do not reallocate scratch buffers for each slice 2021-07-03 15:56:16 +02:00
Anton Khirnov
2730639259 sws: group the parameters validity checks together
Also, fail with an error code rather than 0.
2021-07-03 15:31:18 +02:00
Anton Khirnov
c05cab34a9 sws: initialize {src,dst}Stride2 consistently with {src,dst}2 2021-07-03 15:31:08 +02:00
Anton Khirnov
d3d8e09640 sws: cosmetics
Reindent after previous commit, rewrap long lines.
2021-07-03 15:30:56 +02:00
Anton Khirnov
f136493d03 sws: factor out cascaded scaling 2021-07-03 15:30:34 +02:00
Anton Khirnov
a2254aedc9 sws: cosmetics
Reindent after previous commit, split long lines.
2021-07-03 15:30:20 +02:00
Anton Khirnov
44f12718bf sws: factor out gamma-correct scaling 2021-07-03 15:29:50 +02:00
Anton Khirnov
e355af9be9 sws: return an error code on invalid parameters to sws_scale() 2021-07-03 15:29:35 +02:00
Anton Khirnov
21a4e48f88 sws: reindent after previous commit 2021-07-03 15:29:22 +02:00
Anton Khirnov
27acca1af0 sws: factor out updating the palette 2021-07-03 15:28:46 +02:00
Anton Khirnov
f8c21ccbfc sws: remove unnecessary braces
There used to be more code inside them, but it was removed in
6de58b4903.
2021-07-03 15:28:36 +02:00
Peter Lundblad
da0abbbb01 libswscale: Make sws_init_context thread safe.
Call ff_sws_rgb2rgb_init via ff_thread_once instead of checking one of the
variables it updates.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-01 23:49:41 +02:00
Limin Wang
43295ae6a9 swscale/swscale_unscaled: don't use the optimized bgr24toYV12 unscaled conversion when width%2
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2021-06-06 12:34:05 +08:00
Anton Khirnov
85ba17f36d Bump major versions of all libraries.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-27 11:48:05 -03:00
Andreas Rheinhardt
ea2d9b7a2e libswscale: Remove unused deprecated functions, make used ones static
Deprecated in 3b905b9fe6.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-04-27 10:43:11 -03:00
Andreas Rheinhardt
f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Alan Kelly
3ce8d09244 libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-04-01 20:47:52 +02:00
Alan Kelly
dc57762cb4 libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-04-01 20:47:52 +02:00
Michael Niedermayer
c361fa9e21 Bump minor versions after release branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-03-20 01:02:11 +01:00
Michael Niedermayer
c67d2a2875 Bump Versions before release/4.4 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-03-20 01:01:12 +01:00
Andreas Rheinhardt
c23a5523b5 swscale/x86/swscale: Remove unused ASM constants
The last user of g15Mask, r15Mask, g16Mask and r16Mask was disabled
in 77a416e8aa and finally removed in
36e8de07ed62609df45d064b56501e3084d25723; b15Mask and b16Mask were
apparently always unused (except for in_asm_used_var_warning_killer,
a function that only existed to make the compiler not optimize ASM
constants away).
w10 is unused since d604bab901, w02
since ef423a6618.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:47:54 +01:00
Andreas Rheinhardt
aad597a93c swscale/x86/rgb2rgb: Remove unused ASM constants
mask24hh etc. are unused since f099fbf5f3,
mask32b and mask32r since 296609f859,
mask32g since b38d487466 and mask32 since
f8a138be52.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:45:17 +01:00
Andreas Rheinhardt
49db6e4b4e swscale/x86/yuv2rgb: Remove unused ASM constants
mmx_grnmask is unused since 531f97b0c3,
the other constants since e934194b6a.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:43:14 +01:00
Chip Kerchner
e7f53d6ac9 lsws/ppc/yuv2rgb_altivec: Fix build in non-VSX environments
Add inline function for vec_xl if VSX is not supported. vec_xl intrinsic
is only available on POWER 7 or higher.

Fixes ticket #8750.

Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>
2021-02-22 23:19:21 -05:00
James Almer
1a555d3c60 swscale/x86/yuv2yuvX: use the movsxdifnidn helper macro
Simplifies code

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer
ebb48d85a0 swscale/x86/yuv2yuvX: use movq to load 8 bytes in all non-AVX2 functions
mova expands to movq on non-XMM functions

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer
d512ebbaed swscale/x86/yuv2yuvX: use the SPLATW helper macro
Simplifies code

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer
c00567647e swscale/x86/swscale: fix mix of inline and external function definitions
This includes removing pointless static function forward declarations.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:42 -03:00
James Almer
c2bf1dcace swscale/x86/swscale: fix compilation with old yasm
Where AVX2 may not be supported.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 21:09:36 -03:00
Alan Kelly
554c2bc708 swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop
And other small optimizations for ~20% speedup.
2021-02-17 21:21:03 +01:00
Carl Eugen Hoyos
2687070d9b lsws/ppc/yuv2rgb: Fix transparency converting from yuv->rgb32.
Based on 68363b69 by Reimar Döffinger.

Fixes ticket #9077.
2021-01-24 17:17:29 +01:00
Anton Khirnov
e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Anton Khirnov
c8c2dfbc37 lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Jeremy Leconte
29cef1bcd6 libswscale: avoid UB nullptr-with-offset.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-12-24 15:27:56 +01:00
Andriy Gelman
1200264fc4 swscale/rgb2rgb_template: use shuffle macro on big-endian arches
Fixes fate-qtrle-32bit on big-endian.

The macro does a simple byte swap on uint8 array without any casts, so
it's valid on big-endian arches.

The mentioned test was failing because the byteswap function
shuffle_bytes_3210_c() is used in the pixel format conversion
(argb->bgra).

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>
2020-12-12 23:07:22 -05:00
Carl Eugen Hoyos
46e362b765 lsws/x86/yuv2rgb: Fix compilation with mmxext or ssse3 disabled.
Fixes ticket #8986.
2020-11-14 15:37:57 +01:00
Marton Balint
993429cfb4 swscale/x86/yuv2rgb: fix crashes when loading alpha from unaligned buffers
Regression since fc6a5883d6 on SSSE3 enabled
CPUs.

Fixes ticket #8955.

Signed-off-by: Marton Balint <cus@passwd.hu>
2020-11-02 00:31:34 +01:00
Jan Ekström
7ea4bcff7b swscale/utils: override forced-zero formats back to full range
Fixes vf_scale outputting RGB AVFrames with limited range flagged
in case either input or output specifically sets the range.

This is the reverse of the logic utilized for RGB and PAL8 content
in sws_setColorspaceDetails.
2020-10-11 12:58:13 +03:00
Jan Ekström
3fe24fe232 swscale/utils: split range override check into its own function 2020-10-11 12:58:13 +03:00
Mark Reid
a48adcd136 libswcale/input: use more accurate planer rgb16 yuv conversions
These conversion appears to be exhibiting the same rounding error as the rgbf32 formats where.
I seperated the rounding value from the 16 and 128 offsets, I think it makes it a little more clear.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-10-06 17:56:52 +02:00
Mark Reid
453004fde6 libswcale/input: use more accurate rgbf32 yuv conversions
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-10-02 14:59:52 +02:00
Mark Reid
6bf57c6a2a libswscale/tests: add floatimg_cmp test
changes since v1:
- made into fate test
- fixed c90 warnings
- tests more intermediate formats
- tested on BE mips too

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-10-02 14:59:52 +02:00
James Almer
621e2625e0 swscale/x86/output: add missing AVX2 support preprocessor wrappers
Fixes compilation with old yasm

Signed-off-by: James Almer <jamrial@gmail.com>
2020-08-20 15:14:56 -03:00
Paul B Mahol
9d58cdb4ba swscale: do not drop half of bits from 16bit bayer formats 2020-08-08 12:03:42 +02:00
Limin Wang
7c8ad72f1c swscale/yuv2rgb: cosmetics
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-07-25 10:20:42 +08:00
Fei Wang
8544783280 swscale/yuv2rgb: consider x2rgb10le on big endian hardware
This fixed FATE fail report by filter-pixfmts* for x2rgb10le on big
endian hardware.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-20 21:00:00 +02:00