1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-02 03:06:28 +02:00
Commit Graph

49499 Commits

Author SHA1 Message Date
Anton Khirnov
08bebeb1be Revert "all: Don't set AVClass.item_name to its default value"
Some callers assume that item_name is always set, so this may be
considered an API break.

This reverts commit 0c6203c97a.
2024-01-20 10:34:48 +01:00
James Almer
0a5813fc68 avcodec/vvcdec: allocate and store structs on their own within the table list
Fixes "runtime error: member access within misaligned address 0xf00 for type
'struct bar', which requires # byte alignment" errors under GCC ubsan.

Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-19 08:53:32 -03:00
sunyuechi
8e23ebe6f9 lavc/svq1enc: R-V V ssd_int8_vs_int16
C908
ssd_int8_vs_int16_c: 207.7
ssd_int8_vs_int16_rvv_i32: 14.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-01-17 17:49:54 +02:00
Nuo Mi
d595e0a0b6 avcodec/vvcdec: misc, constify hor_ctu_edge
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-17 10:14:50 -03:00
Nuo Mi
375dcf469e avcodec/vvcdec: deblock, fix uninitialized values
see https://fate.ffmpeg.org/report.cgi?slot=x86_64-archlinux-gcc-valgrind&time=20240105201935
If tc is zero, the max_len_q, max_len_p are uninitialized.

Reported-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-17 10:14:35 -03:00
aybe aybe
36b402f80d
avcodec/mdec: DC reading for STRv1 is like STRv2
As I understand, support for .STR files is broken for almost 10 years now (since 161442ff2c it seems).

Currently, ffmpeg fails with tons of errors like this on version 1 STRs, e.g. Wipeout 1:
[mdec @ 00000000027c72c0] ac-tex damaged at 1 9

What happens is that only the audio is present in the video file.

Anyway, that one character patch fixes the problem, video is now rendered.

Signed-off-by: aybe <aybe@users.noreply.github.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-01-16 01:34:56 +01:00
sunyuechi
0befc1fca7 lvac/svqenc: add ff_svq1enc_init
This is for clarity and use in testing, consistent with other parts of the code

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-01-15 19:03:03 +02:00
Rémi Denis-Courmont
278b4b60d6 lavc/takdsp: R-V V decorrelate_sf
decorrelate_sf_c:      259.2
decorrelate_sf_rvv_i32: 45.5
2024-01-15 19:00:25 +02:00
yuanhecai
a87a52ed0b
avcodec/hevc: Add ff_hevc_idct_32x32_lasx asm opt
tests/checkasm/checkasm:

                          C          LSX       LASX
hevc_idct_32x32_8_c:      1243.0     211.7     101.7

Speedup of decoding H265 4K 30FPS 30Mbps on
3A6000 with 8 threads is 1fps(56fps-->57fps).

Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-01-12 23:35:40 +01:00
jinbo
9239081db3
avcodec/hevc: Add asm opt for the following functions
tests/checkasm/checkasm:           C       LSX     LASX
put_hevc_qpel_uni_h4_8_c:          5.7     1.2
put_hevc_qpel_uni_h6_8_c:          12.2    2.7
put_hevc_qpel_uni_h8_8_c:          21.5    3.2
put_hevc_qpel_uni_h12_8_c:         47.2    9.2     7.2
put_hevc_qpel_uni_h16_8_c:         87.0    11.7    9.0
put_hevc_qpel_uni_h24_8_c:         188.2   27.5    21.0
put_hevc_qpel_uni_h32_8_c:         335.2   46.7    28.5
put_hevc_qpel_uni_h48_8_c:         772.5   104.5   65.2
put_hevc_qpel_uni_h64_8_c:         1383.2  142.2   109.0

put_hevc_epel_uni_w_v4_8_c:        5.0     1.5
put_hevc_epel_uni_w_v6_8_c:        10.7    3.5     2.5
put_hevc_epel_uni_w_v8_8_c:        18.2    3.7     3.0
put_hevc_epel_uni_w_v12_8_c:       40.2    10.7    7.5
put_hevc_epel_uni_w_v16_8_c:       70.2    13.0    9.2
put_hevc_epel_uni_w_v24_8_c:       158.2   30.2    22.5
put_hevc_epel_uni_w_v32_8_c:       281.0   52.0    36.5
put_hevc_epel_uni_w_v48_8_c:       631.7   116.7   82.7
put_hevc_epel_uni_w_v64_8_c:       1108.2  207.5   142.2

put_hevc_epel_uni_w_h4_8_c:        4.7     1.2
put_hevc_epel_uni_w_h6_8_c:        9.7     3.5     2.7
put_hevc_epel_uni_w_h8_8_c:        17.2    4.2     3.5
put_hevc_epel_uni_w_h12_8_c:       38.0    11.5    7.2
put_hevc_epel_uni_w_h16_8_c:       69.2    14.5    9.2
put_hevc_epel_uni_w_h24_8_c:       152.0   34.7    22.5
put_hevc_epel_uni_w_h32_8_c:       271.0   58.0    40.0
put_hevc_epel_uni_w_h48_8_c:       597.5   136.7   95.0
put_hevc_epel_uni_w_h64_8_c:       1074.0  252.2   168.0

put_hevc_epel_bi_h4_8_c:           4.5     0.7
put_hevc_epel_bi_h6_8_c:           9.0     1.5
put_hevc_epel_bi_h8_8_c:           15.2    1.7
put_hevc_epel_bi_h12_8_c:          33.5    4.2     3.7
put_hevc_epel_bi_h16_8_c:          59.7    5.2     4.7
put_hevc_epel_bi_h24_8_c:          132.2   11.0
put_hevc_epel_bi_h32_8_c:          232.7   20.2    13.2
put_hevc_epel_bi_h48_8_c:          521.7   45.2    31.2
put_hevc_epel_bi_h64_8_c:          949.0   71.5    51.0

After this patch, the peformance of decoding H265 4K 30FPS
30Mbps on 3A6000 with 8 threads improves 1fps(55fps-->56fsp).

Change-Id: I8cc1e41daa63ca478039bc55d1ee8934a7423f51
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-01-12 23:35:40 +01:00
jinbo
1f642b99af
avcodec/hevc: Add epel_uni_w_hv4/6/8/12/16/24/32/48/64 asm opt
tests/checkasm/checkasm:           C       LSX     LASX
put_hevc_epel_uni_w_hv4_8_c:       9.5     2.2
put_hevc_epel_uni_w_hv6_8_c:       18.5    5.0     3.7
put_hevc_epel_uni_w_hv8_8_c:       30.7    6.0     4.5
put_hevc_epel_uni_w_hv12_8_c:      63.7    14.0    10.7
put_hevc_epel_uni_w_hv16_8_c:      107.5   22.7    17.0
put_hevc_epel_uni_w_hv24_8_c:      236.7   50.2    31.7
put_hevc_epel_uni_w_hv32_8_c:      414.5   88.0    53.0
put_hevc_epel_uni_w_hv48_8_c:      917.5   197.7   118.5
put_hevc_epel_uni_w_hv64_8_c:      1617.0  349.5   203.0

After this patch, the peformance of decoding H265 4K 30FPS 30Mbps
on 3A6000 with 8 threads improves 3fps (52fps-->55fsp).

Change-Id: If067e394cec4685c62193e7adb829ac93ba4804d
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-01-12 23:35:40 +01:00
jinbo
6c6bf18ce8
avcodec/hevc: Add qpel_uni_w_v|h4/6/8/12/16/24/32/48/64 asm opt
tests/checkasm/checkasm:           C       LSX     LASX
put_hevc_qpel_uni_w_h4_8_c:        6.5     1.7     1.2
put_hevc_qpel_uni_w_h6_8_c:        14.5    4.5     3.7
put_hevc_qpel_uni_w_h8_8_c:        24.5    5.7     4.5
put_hevc_qpel_uni_w_h12_8_c:       54.7    17.5    12.0
put_hevc_qpel_uni_w_h16_8_c:       96.5    22.7    13.2
put_hevc_qpel_uni_w_h24_8_c:       216.0   51.2    33.2
put_hevc_qpel_uni_w_h32_8_c:       385.7   87.0    53.2
put_hevc_qpel_uni_w_h48_8_c:       860.5   192.0   113.2
put_hevc_qpel_uni_w_h64_8_c:       1531.0  334.2   200.0

put_hevc_qpel_uni_w_v4_8_c:        8.0     1.7
put_hevc_qpel_uni_w_v6_8_c:        17.2    4.5
put_hevc_qpel_uni_w_v8_8_c:        29.5    6.0     5.2
put_hevc_qpel_uni_w_v12_8_c:       65.2    16.0    11.7
put_hevc_qpel_uni_w_v16_8_c:       116.5   20.5    14.0
put_hevc_qpel_uni_w_v24_8_c:       259.2   48.5    37.2
put_hevc_qpel_uni_w_v32_8_c:       459.5   80.5    56.0
put_hevc_qpel_uni_w_v48_8_c:       1028.5  180.2   126.5
put_hevc_qpel_uni_w_v64_8_c:       1831.2  319.2   224.2

Speedup of decoding H265 4K 30FPS 30Mbps on
3A6000 with 8 threads is 4fps(48fps-->52fps).

Change-Id: I1178848541d90083869225ba98a02e6aa8bb8c5a
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-01-12 23:35:40 +01:00
jinbo
a28eea2a27
avcodec/hevc: Add pel_uni_w_pixels4/6/8/12/16/24/32/48/64 asm opt
tests/checkasm/checkasm:           C       LSX     LASX
put_hevc_pel_uni_w_pixels4_8_c:    2.7     1.0
put_hevc_pel_uni_w_pixels6_8_c:    6.2     2.0     1.5
put_hevc_pel_uni_w_pixels8_8_c:    10.7    2.5     1.7
put_hevc_pel_uni_w_pixels12_8_c:   23.0    5.5     5.0
put_hevc_pel_uni_w_pixels16_8_c:   41.0    8.2     5.0
put_hevc_pel_uni_w_pixels24_8_c:   91.0    19.7    13.2
put_hevc_pel_uni_w_pixels32_8_c:   161.7   32.5    16.2
put_hevc_pel_uni_w_pixels48_8_c:   354.5   73.7    43.0
put_hevc_pel_uni_w_pixels64_8_c:   641.5   130.0   64.2

Speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 with
8 threads is 1fps(47fps-->48fps).

Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-01-12 23:35:40 +01:00
jinbo
cfbdda607d
avcodec/hevc: Add add_residual_4/8/16/32 asm opt
After this patch, the peformance of decoding H265 4K 30FPS 30Mbps
on 3A6000 with 8 threads improves 2fps (45fps-->47fsp).

Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-01-12 23:35:40 +01:00
Zhao Zhili
13c1fea92f avcodec/videotoolboxenc: fix setting avctx color_range doesn't work
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-01-12 10:49:18 +08:00
Nuo Mi
8d0dda8260 vvcdec: reuse h26x/h2656_deblock_template.c 2024-01-11 22:53:05 +08:00
Nuo Mi
ae0a83477b hevcdec: move deblock template to h26x/h2656_deblock_template.c 2024-01-11 22:53:05 +08:00
Nuo Mi
69e179e8bf vvcdec: reuse h26x/h2656_sao_template.c 2024-01-11 22:53:05 +08:00
Nuo Mi
d2fe23b835 hevcdec: move sao template to h26x/h2656_sao_template.c 2024-01-11 22:53:05 +08:00
Clément Bœsch
af509f9957 avcodec/proresenc_anatoliy: do not write into chroma reserved bitfields
The layout for the frame flags is as follow:

   chroma_format  u(2)
   reserved       u(2)
   interlace_mode u(2)
   reserved       u(2)

chroma_format has 2 allowed values:
   0: reserved
   1: reserved
   2: 4:2:2
   3: 4:4:4

interlace_mode has 3 allowed values:
   0: progressive
   1: tff
   2: bff
   3: reserved

0x80 is what we expect for "422 not interlaced", and the extra 0x2 from
0x82 is actually writing into the reserved bits.
2024-01-10 23:33:02 +01:00
Clément Bœsch
21f7a814ea avcodec/proresenc_anatoliy: do not write into alpha reserved bitfields
This byte represents 4 reserved bits followed by 4 alpha_channel_type bits.

alpha_channel_type currently has 3 differents defined values: 0 (no
alpha), 1 (8b alpha), and 2 (16b alpha), all the other values are
reserved. The 4 initial reserved bits are expected to be 0.
2024-01-10 23:33:02 +01:00
Clément Bœsch
6d35911667 avcodec/proresenc_kostya: do not write into alpha reserved bitfields
This byte represents 4 reserved bits followed by 4 alpha_channel_type bits.

alpha_channel_type currently has 3 differents defined values: 0 (no
alpha), 1 (8b alpha), and 2 (16b alpha), all the other values are
reserved. This part is correctly written (alpha_bits>>3 does the correct
thing), but the 4 initial bits are reserved.
2024-01-10 23:33:02 +01:00
Clément Bœsch
aa7ccd0ce9 avcodec/proresenc_kostya: use a compatible bitstream version
Quoting SMPTE RDD 36:2015:
  A decoder shall abort if it encounters a bitstream with an unsupported
  bitstream_version value. If 0, the value of the chroma_format syntax
  element shall be 2 (4:2:2 sampling) and the value of the
  alpha_channel_type element shall be 0 (no encoded alpha); if 1, any
  permissible value may be used for those syntax elements.

So if we're not in 4:2:2 or if there is alpha, we are not allowed to use
version 0.
2024-01-10 23:33:02 +01:00
Clément Bœsch
85cb1b9b20 avcodec/proresenc_anatoliy: use a compatible bitstream version
Quoting SMPTE RDD 36:2015:
  A decoder shall abort if it encounters a bitstream with an unsupported
  bitstream_version value. If 0, the value of the chroma_format syntax
  element shall be 2 (4:2:2 sampling) and the value of the
  alpha_channel_type element shall be 0 (no encoded alpha); if 1, any
  permissible value may be used for those syntax elements.

So if we're not in 4:2:2 or if there is alpha, we are not allowed to use
version 0.
2024-01-10 23:33:02 +01:00
Clément Bœsch
1081bae94d avcodec/proresenc_kostya: make a few cosmetics in encode_acs()
Unify cosmetics with encode_acs() from proresenc_anatoliy.
2024-01-10 14:08:00 +01:00
Clément Bœsch
cc2206d142 avcodec/proresenc_anatoliy: make a few cosmetics in encode_acs()
This makes the function pretty much identical to the function of the
same name in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
8fb2e96d7e avcodec/proresenc_anatoliy: execute AC run/level FFMIN() at assignment
This matches the logic from the function of the same name in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
096a69ad43 avcodec/proresenc_anatoliy: rework inner loop in encode_acs()
This matches the logic from the function of the same name in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
25f28b9308 avcodec/proresenc_anatoliy: avoid using ff_ prefix in function arguments 2024-01-10 14:08:00 +01:00
Clément Bœsch
29fd3f75fe avcodec/proresenc_anatoliy: rework encode_ac_coeffs() prototype
This makes the prototype closer to the function of the same name in
proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
3543100a05 avcodec/proresenc_anatoliy: replace get_level() with FFABS()
This matches the code from proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
ed8692446c avcodec/proresenc_anatoliy: cosmetics to make encode_dcs() identical to the one in Kostya encoder 2024-01-10 14:08:00 +01:00
Clément Bœsch
e87bc5641c avcodec/proresenc_anatoliy: remove TO_GOLOMB2()
A few cosmetics aside, this makes the function identical to the one with
the same name in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
a026f98f29 avcodec/proresenc_anatoliy: only pass down the first scale to encode_dcs()
This matches encode_dcs() prototype from proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
1aa7d504ec avcodec/proresenc_anatoliy: shuffle declarations around in encode_dcs()
This makes the function closer to the same function in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
87ba89281c avcodec/proresenc_anatoliy: rename TO_GOLOMB() to MAKE_CODE()
This matches the name in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
7af42088d7 avcodec/proresenc_kostya: add Anatoliy copyright
Both encoders share a lot of code from both authors.
2024-01-10 14:08:00 +01:00
Clément Bœsch
d269f84199 avcodec/proresenc_anatoliy: remove IS_NEGATIVE() macro
This makes the function closer to encode_acs() in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
9c7f6d89fd avcodec/proresenc_anatoliy: rename new_dc to dc
This makes the function closer to encode_dcs() in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
9258f4eaf9 avcodec/proresenc_anatoliy: compute sign only once
This makes the function closer to encode_dcs() in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
17392ca84f avcodec/proresenc_anatoliy: import GET_SIGN() macro from Kostya encoder and use it 2024-01-10 14:08:00 +01:00
Clément Bœsch
273f591a3d avcodec/proresenc_anatoliy: directly work with blocks in encode_dcs()
This makes the function closer to encode_dcs() in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
dadc5ac24a avcodec/proresenc_anatoliy: reduce DC encoding function prototype differences with Kostya encoder 2024-01-10 14:08:00 +01:00
Clément Bœsch
8e42d3aba0 avcodec/proresenc_anatoliy: execute codebook FFMIN() at assignment
This makes the function closer to encode_dcs() in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
43baba4647 avcodec/proresenc_anatoliy: rename new_code/code to code/codebook
This makes the function closer to encode_dcs() in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
c44cd371ca avcodec/proresenc_anatoliy: inline QSCALE()
Also replaces 16384 with 0x4000.

This makes the function slightly closer to same function in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
1574475033 avcodec/proresenc_anatoliy: rework encode_codeword() prototype
This matches the function of the same name in proresenc_kostya.
2024-01-10 14:08:00 +01:00
Clément Bœsch
1832bd7838 avcodec/proresenc_anatoliy: shuffle encode_codeword() code to match Kostya encoder
Code is functionally identical, it's just rename of variables, cosmetics
and branch logic shuffling.
2024-01-10 14:08:00 +01:00
Clément Bœsch
3885d2493d avcodec/proresenc_anatoliy: use FRAME_ID defined in proresdata.h 2024-01-10 14:08:00 +01:00
Clément Bœsch
d6e0fb7c92 avcodec/proresenc_kostya: simplify quantization matrix bytestream writing 2024-01-10 14:08:00 +01:00
Clément Bœsch
cbee015867 avcodec/proresenc_kostya: fix chroma quantisation matrix in frame header
Most of the time the quantisation matrices are the same, it only matters
with the proxy profile.
2024-01-10 14:08:00 +01:00
Clément Bœsch
631fa19ee0 avcodec/proresenc_kostya: save a few operations in DC encoding
This matches the logic from proresenc_anatoliy.
2024-01-10 14:08:00 +01:00
Clément Bœsch
f06f2cf16a avcodec/proresenc_anatoliy: move DC codebook LUT to shared proresdata
This is going to be shared with proresenc_kostya in the upcoming commit.
2024-01-10 14:08:00 +01:00
Clément Bœsch
9f547e2f15 avcodec/proresenc_anatoliy: remove duplicated define
This is already defined in proresdata.h
2024-01-10 14:08:00 +01:00
Clément Bœsch
c35733006a avcodec/proresenc_kostya: remove one LUT indirection for run/level to codebook mapping
This is following the same logic as proresenc_anatoliy.
2024-01-10 14:08:00 +01:00
Clément Bœsch
3ba52f18e4 avcodec/proresenc_anatoliy: move run/lev to codebook LUT to shared proresdata
This is going to be shared with proresenc_kostya in the upcoming commit.
2024-01-10 14:08:00 +01:00
Clément Bœsch
e940baa65b avcodec/proresenc_kostya: remove redundant codebook assignments
This is already assigned at declaration.
2024-01-10 14:08:00 +01:00
Clément Bœsch
e453efcfbc avcodec/proresenc_kostya: remove unused plane factor variables 2024-01-10 14:08:00 +01:00
Clément Bœsch
2ac88c1362 avcodec/proresenc_kostya: remove an unnecessary parenthesis level in MAKE_CODE() macro 2024-01-10 14:08:00 +01:00
Marton Balint
363b3ec98a all: use av_channel_layout_describe_bprint instead of av_channel_layout_describe in a few places
Where an AVBPrint buffer is used later anyway.

Signed-off-by: Marton Balint <cus@passwd.hu>
2024-01-07 22:47:22 +01:00
James Almer
b95ccfcada avcodec/vvc_thread: don't use an anonymous union
Should fix compilation with old GCC.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-06 23:28:03 -03:00
Nuo Mi
02d600c568 vvcdec: add TODO for combining transform, lmcs_scale_chroma, and add_residual
Thanks for the suggestion from Lynne.
2024-01-07 09:01:04 +08:00
Nuo Mi
26769024d1 avcodec/vvcdec: decode extradata to support container formats
For example:
wget https://www.elecard.com/storage/video/NovosobornayaSquare_1920x1080.mp4
./ffplay NovosobornayaSquare_1920x1080.mp4
2024-01-07 08:58:43 +08:00
Sam James
2f24f10d9c libavcodec: fix -Wint-conversion in vulkan
FIx warnings (soon to be errors in GCC 14, already so in Clang 15):
```
src/libavcodec/vulkan_av1.c: In function ‘vk_av1_create_params’:
src/libavcodec/vulkan_av1.c:183:43: error: initialization of ‘long long unsigned int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
  183 |         .videoSessionParametersTemplate = NULL,
      |                                           ^~~~
src/libavcodec/vulkan_av1.c:183:43: note: (near initialization for ‘(anonymous).videoSessionParametersTemplate’)
```

Use Vulkan's VK_NULL_HANDLE instead of bare NULL.

Fix Trac ticket #10724.

Was reported downstream in Gentoo at https://bugs.gentoo.org/919067.

Signed-off-by: Sam James <sam@gentoo.org>
2024-01-06 22:38:55 +01:00
Clément Bœsch
9109273e3b avcodec/proresenc: fix alpha plane encoding bitstream
These functions encode a slice of alpha (1 to 8 macroblocks) which are
expected to be encoded as a repeated sequence of "[diff][run-1]", where
diff is the running difference of the alpha value and run is how many
times that value is expected to be duplicated (within the limit of a
grand total of 2048 unpacked samples, corresponding to a slice of 8 MB).

Even when run==0 (the run variable semantic is actually "run minus 1"),
there is always a diff previously encoded that needs a counter of at
least 1. This means we need to call put_alpha_run() unconditionally at
the end of the bitstream to account for the last running diff.

This commit fixes glitchy playbacks on QuickTime with M2 and M3 hardware
(but not M1 for some mysterious reason) with files generated with
commands such as:

  ffmpeg -f lavfi -i testsrc2=d=5:s=912x320,chromakey -c:v prores_aw -profile:v 4    -y aw.mov
  ffmpeg -f lavfi -i testsrc2=d=5:s=912x320,chromakey -c:v prores_ks -profile:v 4444 -y ks.mov

The glitch expresses itself deterministically as blinking black
rectangles on random frames (for example on frame 21, 54, 71, 79, ...).

Even with the proresdec from FFmpeg, overreads actually happens while
reading the run-minus-1 value (around val = get_bits(gb, 4) in
unpack_alpha()). This doesn't seem to cause any particular issue because
it simply overreads into the next slice, and because the decoder is
resilient, but it's still a problem.

The investigation leading to this fix was made possible because of paid
work for Jitter (https://jitter.video).

Fixes ticket #10255.
2024-01-06 17:29:59 +01:00
Clément Bœsch
2142141a16 avcodec/proresenc: make transparency honored in mov/QT
In the mov muxer (in mov_write_video_tag()), bits_per_coded_sample will
be written under certain conditions and is required to be 32 for the
transparency to be honored in QuickTime.

prores_kostya already has this setting but prores_anatoliy and
prores_videotoolbox didn't.
2024-01-06 17:29:59 +01:00
Wu Jianhua
94949d4770 avcodec/d3d12va_decode: don't change the resource state if the referenced frame is the same as the current frame
This commit removes the follow warning and error:

D3D12 WARNING: ID3D12CommandList::ResourceBarrier: Called on the same subresource(s) of
Resource(0x000002236E0E00D0:'Unnamed ID3D12Resource Object') in separate Barrier Descs
which is inefficient and likely unintentional. Desc[0] and Desc[1] on (subresource :
4294967295). [RESOURCE_MANIPULATION WARNING #1008: RESOURCE_BARRIER_DUPLICATE_SUBRESOURCE_TRANSITIONS]

D3D12 ERROR: ID3D12CommandList::ResourceBarrier: Before state (0x0: D3D12_RESOURCE_STATE_[COMMON|PRESENT])
of resource (0x000002236E0E00D0:'Unnamed ID3D12Resource Object') (subresource: 0) specified
by transition barrier does not match with the state (0x20000: D3D12_RESOURCE_STATE_VIDEO_DECODE_WRITE)
specified in the previous call to ResourceBarrier [RESOURCE_MANIPULATION ERROR #527:
RESOURCE_BARRIER_BEFORE_AFTER_MISMATCH]

Tested-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-05 11:08:17 +08:00
Tong Wu
270cd14bbb avcodec/dxva2(h264|mpeg2|vc1): use av_assert0 instead of assert
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-01-05 11:06:57 +08:00
Tong Wu
0f01581ccd avcodec/d3d12va_decode|dxva2: add a warning to replace assertion
Previous assertion was not useful. Now a warning is added to replace it.
For get_surface_index, we should return a zero index in case the index is not found.
But a warning is necessary to notify.

Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-01-05 11:06:57 +08:00
Tong Wu
56c671c3b0 avcodec/d3d12va_h264: replace assert with av_assert0
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-01-05 11:06:57 +08:00
Tong Wu
83e0fcbe03 avcodec/d3d12va_decode: delete an empty line and fix a fuction alignment
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-01-05 11:06:57 +08:00
Tong Wu
d18ed2ddb5 avcodec/d3d12va_vp9: fix vp9 max_num_refs value
Previous max_num_refs was based on pp.frame_refs plus 1 and it could possibly
reaches the size limit. Actually it should be the size of pp.ref_frame_map
plus 1.

Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-01-05 11:06:57 +08:00
Zhao Zhili
33698ef891 avcodec/mpegutils: print axis in debug_info2
For example,

./ffmpeg -nostats -threads 1 -debug qp \
	-export_side_data +venc_params \
	-i reinit-small_420_9-to-small_420_8.h264 \
	-frames 2 \
	-f null -

New frame, type: B
    0       64      128     192
  0 313131313131313131313131313129
 16 292929292929292929292929292929
 32 323232323232323232323232323232
 48 323232323232323232323232323232
 64 323232323232323232323232323232
 80 323232323232323232323232323232
 96 323232323030303030303030303030
112 303030303030303030303030303030
128 303030303030303030303030303028
144 313131312929292929292929292929
160 292929292929292929292929292929
176 292929292929292929292929292931
192 312831312631313131312730283131

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-01-04 17:35:11 +08:00
Zhao Zhili
bd48c08f80 avcodec/mpegutils: make debug_info2 thread safe
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-01-04 17:33:56 +08:00
Zhao Zhili
7f900a737f avcodec/videotoolbox: specify color range for hw frame ctx
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-01-04 17:33:38 +08:00
Zhao Zhili
0f824d792d avcodec/hevc_parser: fix missing zero_byte at frame beginning
The start code is matched against 0x000001, zero_byte was treated
as last byte of last frame rather than the beginning of next frame.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-01-05 01:14:33 +08:00
Zhao Zhili
b7ac1f9856 avcodec/hevc_mp4toannexb_bsf: use HEVCNALUnitType instead of integer literal
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-01-05 01:14:33 +08:00
Nuo Mi
301ed950d1 vvcdec: add vvc decoder
vvc decoder plug-in to avcodec.
split frames into slices/tiles and send them to vvc_thread for further decoding
reorder and wait for the frame decoding to be done and output the frame

Features:
    + Support I, P, B frames
    + Support 8/10/12 bits, chroma 400, 420, 422, and 444 and range extension
    + Support VVC new tools like MIP, CCLM, AFFINE, GPM, DMVR, PROF, BDOF, LMCS, ALF
    + 295 conformace clips passed
    - Not support RPR, IBC, PALETTE, and other minor features yet

Performance:
    C code FPS on an i7-12700K (x86):
        BQTerrace_1920x1080_60_10_420_22_RA.vvc      93.0
        Chimera_8bit_1080P_1000_frames.vvc          184.3
        NovosobornayaSquare_1920x1080.bin           191.3
        RitualDance_1920x1080_60_10_420_32_LD.266   150.7
        RitualDance_1920x1080_60_10_420_37_RA.266   170.0
        Tango2_3840x2160_60_10_420_27_LD.266         33.7

    C code FPS on a M1 Mac Pro (ARM):
        BQTerrace_1920x1080_60_10_420_22_RA.vvc     58.7
        Chimera_8bit_1080P_1000_frames.vvc          153.3
        NovosobornayaSquare_1920x1080.bin           150.3
        RitualDance_1920x1080_60_10_420_32_LD.266   105.0
        RitualDance_1920x1080_60_10_420_37_RA.266   133.0
        Tango2_3840x2160_60_10_420_27_LD.266        21.7

    Asm optimizations still working in progress. please check
    https://github.com/ffvvc/FFmpeg/wiki#performance-data for the latest

Contributors (based on code merge order):
    Nuo Mi <nuomi2021@gmail.com>
    Xu Mu <toxumu@outlook.com>
    Frank Plowman <post@frankplowman.com>
    Shaun Loo <shaunloo10@gmail.com>
    Wu Jianhua <toqsxw@outlook.com>

Thank you for reporting issues and providing performance reports:
    Łukasz Czech <lukaszcz18@wp.pl>
    Xu Fulong <839789740@qq.com>

Thank you for providing review comments:
    Ronald S. Bultje <rsbultje@gmail.com>
    James Almer <jamrial@gmail.com>
    Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:12 +08:00
Nuo Mi
e7ef457d6b vvcdec: add CTU thread logical
This is the main entry point for the CTU (Coding Tree Unit) decoder.
The code will divide the CTU decoder into several stages.
It will check the stage dependencies and run the stage decoder.

Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:12 +08:00
Nuo Mi
07f75d5e02 vvcdec: add CTU parser
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:12 +08:00
Nuo Mi
b49575f4cf vvcdec: add dsp init and inv transform
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:12 +08:00
Nuo Mi
02c1455b44 vvcdec: add LMCS, Deblocking, SAO, and ALF filters
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:12 +08:00
Nuo Mi
c05ba94ce8 vvcdec: add intra prediction
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:12 +08:00
Nuo Mi
2592cc1f96 vvcdec: add inv transform 1d
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:11 +08:00
Nuo Mi
ea49c83bad vvcdec: add inter prediction
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:11 +08:00
Nuo Mi
603d0bd171 vvcdec: add motion vector decoder
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:11 +08:00
Nuo Mi
c1a3d17491 vvcdec: add reference management
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:11 +08:00
Nuo Mi
976d3b7d69 vvcdec: add cabac decoder
add Context-based Adaptive Binary Arithmetic Coding (CABAC) decoder

Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 23:15:06 +08:00
Nuo Mi
e97a5bbb13 vvcdec: add parameter parser for sps, pps, ph, sh
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 16:31:59 +08:00
Nuo Mi
49db9fc171 vvcdec: add vvc_data
Co-authored-by: Xu Mu <toxumu@outlook.com>
Co-authored-by: Frank Plowman <post@frankplowman.com>
Co-authored-by: Shaun Loo <shaunloo10@gmail.com>
Co-authored-by: Wu Jianhua <toqsxw@outlook.com>
2024-01-03 16:31:59 +08:00
James Almer
85b8d59ec7 avcodec/d3d12va_mpeg2: change the type for the ID3D12Resource_Map input data argument
Fixes -Wincompatible-pointer-types warnings.

Reviewed-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-01 09:46:41 -03:00
James Almer
e9722735fa avcodec/d3d12va_mpeg2: remove unused variables
Reviewed-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-01-01 09:46:06 -03:00
Michael Niedermayer
e063c1d079
avcodec/mpegvideo_enc: Use ptrdiff_t for stride
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-30 21:50:05 +01:00
Michael Niedermayer
a066b8a809
avcodec/mpegvideo_enc: Dont copy beyond the image
Fixes: out of array access
Fixes: tickets/10754/poc17ffmpeg

Discovered by Zeng Yunxiang.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-30 21:50:04 +01:00
Michael Niedermayer
bf1159774b
avcodec/vaapi_encode: Avoid double AVERRORS
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-29 21:36:02 +01:00
Michael Niedermayer
850ab8f6da
avcodec/jpegxl_parser: Check get_vlc2()
Fixes: shift exponent -1 is negative
Fixes: 63889/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-6009343056936960

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-29 19:21:26 +01:00
Michael Niedermayer
d909d8e5e0
avcodec/leaddec: Check remaining bits in decode_block()
Fixes: Timeout
Fixes: 64163/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LEAD_fuzzer-6418925835124736

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-29 01:15:42 +01:00
Michael Niedermayer
5f88458bea
avcodec/jpegxl_parser: Add padding to cs_buffer
Fixes: out of array access
Fixes: 64081/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-6151006496620544

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-29 01:15:42 +01:00
Michael Niedermayer
c72a20f01a
avcodec/jpeglsdec: Check Jpeg-LS LSE
Fixes: signed integer overflow: 2147478526 + 33924 cannot be represented in type 'int'
Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int'
Fixes: 64243/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEGLS_fuzzer-5195717848989696

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-29 01:00:48 +01:00
Michael Niedermayer
c75fccd1d4
avcodec/osq: Implement flush()
Fixes: out of array access
Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6227491892887552
Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6268561729126400
Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6414805046788096
Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6538151088488448
Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6608131540779008

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-29 00:45:20 +01:00
jinbo
545686e49e
avcodec/hevc: Add init for sao_edge_filter
Forgot to init c->sao_edge_filter[idx] when idx=0/1/2/3.
After this patch, the speedup of decoding H265 4K 30FPS
30Mbps on 3A6000 is about 7% (42fps==>45fps).

Change-Id: I521999b397fa72b931a23c165cf45f276440cdfb
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-29 00:45:20 +01:00
Devin Heitmueller
b2c82b23b9 avcodec/bitpacked_dec: optimize bitpacked_decode_yuv422p10
Rework the code a bit to speed up the 10-bit bitpacked decoding
routine.  This is probably about as fast as I can get it without
switching to assembly language.

Demonstratable with:

./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv
./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv

On my development system, it went from 80ms for a 2160p frame
down to 20ms (i.e. a 4X speedup).  Good enough for now, I hope...

Comments from Marton:

Originally on my system better performance could be achieved by simply
switching to the cached bitstream reader, but for Devin it was slower than
his direct byte operations.

I changed the order of writing output from u/y/v/y to u/v/y/y, and that made
the code faster than the cached bitstream reader on my system as well.

TIMER measurement of the decode loop on Ryzen 5 3600 with command line:

./ffmpeg -stream_loop 256 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error

Before: 823204127 decicycles in YUV,     256 runs,      0 skips
After:  315070524 decicycles in YUV,     256 runs,      0 skips

Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
2023-12-28 23:56:14 +01:00
Marton Balint
059ea1d6f6 avcodec/mjpegdec: avoid indirection when accessing avctx
Signed-off-by: Marton Balint <cus@passwd.hu>
2023-12-28 23:15:56 +01:00
Marton Balint
e6b9bfaac3 avcodec/mjpegdec: use memset to clear alpha
Signed-off-by: Marton Balint <cus@passwd.hu>
2023-12-28 23:15:56 +01:00
Leo Izen
fb54c89a0d
avcodec/jpegxl_parser: check ANS cluster alphabet size vs bundle size
The specification doesn't mention that clusters cannot have alphabet
sizes greater than 1 << bundle->log_alphabet_size, but the reference
implementation rejects these entropy streams as invalid, so we should
too. Refusing to do so can overflow a stack variable that should be
large enough otherwise.

Fixes #10738.

Found-by: Zeng Yunxiang and Li Zeyuan
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-27 10:10:09 -05:00
James Almer
4fee63b241 x86/takdsp: add missing wrappers to AVX2 functions
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-25 22:31:15 -03:00
James Almer
591dc3b4b8 x86/takdsp: add avx2 versions of all functions
On an Intel Core i7 12700k:

decorrelate_ls_c: 814.3
decorrelate_ls_sse2: 165.8
decorrelate_ls_avx2: 101.3
decorrelate_sf_c: 1602.6
decorrelate_sf_sse4: 640.1
decorrelate_sf_avx2: 324.6
decorrelate_sm_c: 1564.8
decorrelate_sm_sse2: 379.3
decorrelate_sm_avx2: 203.3
decorrelate_sr_c: 785.3
decorrelate_sr_sse2: 176.3
decorrelate_sr_avx2: 99.8

Tested-by: Lynne <dev@lynne.ee>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-23 08:39:22 -03:00
Andreas Rheinhardt
370ce305f4
avcodec/libjxlenc: Set AV_CODEC_CAP_DR1
This encoder uses ff_get_encode_buffer() to allocate the packet buffer.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-12-22 22:08:29 -05:00
Andreas Rheinhardt
577dd7b762
avcodec/libjxlenc: Don't refer to decoder in comments
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-12-22 22:08:29 -05:00
Leo Izen
5942cf46b6
avcodec/libjxldec: emit proper PTS to decoded AVFrame
If a sequence of JXL images is encapsulated in a container that has PTS
information, we should use the PTS information from the container. At
this time there is no container that does this, but if JPEG XL support
is ever added to NUT, AVTransport, or some other container, this commit
should allow the PTS information those containers provide to work as
expected.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-22 22:08:29 -05:00
Leo Izen
42f78925d7
avcodec/libjxlenc: accept rgbf32 and rgbaf32 frames
These pixel formats have always been supported by libjxl, but at the
time this plugin was written, they were not in FFmpeg yet. Now that
they are in FFmpeg, we should support them.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-22 22:08:29 -05:00
Leo Izen
f6ef6a853c
avcodec/libjxldec: produce rgbf32 and rgbaf32 frames
These pixel formats have always been supported by libjxl, but at the
time this plugin was written, they were not in FFmpeg yet. Now that
they are in FFmpeg, we should support them.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-22 22:08:29 -05:00
Leo Izen
4013b8d3f0
avcodec/pngdec: improve handling of bad cICP range tags
FFmpeg doesn't support tv-range RGB throughout most of its pipeline, so
we should keep the warning. However, in case something does support it
we should at least keep it tagged properly. Additionally, the encoder
writes this tag if the space is tagged as such so this makes a round
trip work as it should.

Also, PNG doesn't support nonzero matrices but we only warn and ignore
in that case, so we have no reason to error out for illegal cICP ranges
either (i.e. greater than 1).

Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Kacper Michajłow <kasper93@gmail.com>
2023-12-22 22:07:35 -05:00
sunyuechi
3d39b8d4e7 lavc/takdsp: R-V V decorrelate_sm
C908:
decorrelate_sm_c: 130.0
decorrelate_sm_rvv_i32: 43.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
(with minor changes)
2023-12-22 17:40:00 +02:00
Andreas Rheinhardt
0c6203c97a all: Don't set AVClass.item_name to its default value
Unnecessary since acf63d5350adeae551d412db699f8ca03f7e76b9;
also avoids relocations.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-12-22 15:12:33 +01:00
James Almer
46775e64f8 avcodec/takdsp: fix const correctness
Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-22 09:28:04 -03:00
Andreas Rheinhardt
45b4781e9a avcodec/v4l2_m2m: Remove redundant av_frame_unref()
This frame will be freed in the next line.

Reviewed-by: Zhao Zhili <quinkblack@foxmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-12-21 23:29:02 +01:00
Rémi Denis-Courmont
0f05f9ed3e mlp: move pack_output pointer to decoder context
The current pack_output function pointer is a property of the decoder,
rather than a constant method provided by the DSP code. Indeed, except
for an unused initialisation, the field is never used in DSP code.
2023-12-21 22:42:34 +02:00
sunyuechi
c933ff2779 lavc/takdsp: R-V V decorrelate_sr
C908:
decorrelate_sr_c: 95.5
decorrelate_sr_rvv_i32: 28.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-21 22:42:34 +02:00
sunyuechi
864174dd00 lavc/takdsp: R-V V decorrelate_ls
C908:
decorrelate_ls_c: 69.7
decorrelate_ls_rvv_i32: 27.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-21 22:42:34 +02:00
Rémi Denis-Courmont
cdd38a2ffe lavc/aacpsdsp: fix R-V V stereo interpolate
The penultimate loop iteration could pick any vl such that:
 vlenb/4 < vl <= vlenb/2
Thus if the total length is not a multiple of vlenb/2, the vfadd.vf
on the penultimate iteration would yield corrupt values for the last
iteration.

To avoid this, force vl = vlen/2 until the last iteration. Unfortunately
this latent bug is not reproducible with either hardware or QEMU as of now.
2023-12-21 17:54:23 +02:00
Rémi Denis-Courmont
db32f75c63 lavc/opusdsp: simplify R-V V postfilter
This skips the round-trip to scalar register for the sliding 'x'
coefficients, improving performance by about 5%. The trick here is that
the vector slide-up instruction preserves elements in destination vector
until the slide offset.

The switch from vfslide1up.vf to vslideup.vi also allows the elimination
of data dependencies on consecutive slides. Since the specifications
recommend sticking to power of two offsets, we could slide as follows:

        vslideup.vi v8, v0, 2
        vslideup.vi v4, v0, 1
        vslideup.vi v12, v8, 1
        vslideup.vi v16, v8, 2

However in the device under test, this seems to make performance slightly
worse, so this is left for (in)validation with future better hardware.
2023-12-21 17:54:08 +02:00
Martin Storsjö
327685bafe d3d12va: Add a missing include for the declaration of ff_d3d12va_get_surface_index
This fixes the following build error:

src/libavcodec/d3d12va_decode.c:49:10: error: no previous prototype for function
 'ff_d3d12va_get_surface_index' [-Werror,-Wmissing-prototypes]
   49 | unsigned ff_d3d12va_get_surface_index(const AVCodecContext *avctx,
      |          ^

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-21 13:55:16 +02:00
Tong Wu
6d5fbea289 avcodec/d3d12va_hevc: enable allow_profile_mismatch flag for d3d12va msp profile
Same as d3d11va, this flag enables main still picture profile for
d3d12va. User should add this flag when decoding main still picture
profile.

Signed-off-by: Tong Wu <tong1.wu@intel.com>
2023-12-21 16:15:23 +08:00
Wu Jianhua
ffa158edbd avcodec: add D3D12VA hardware accelerated VC1 decoding
The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2023-12-21 16:15:23 +08:00
Wu Jianhua
c6c05dd34a avcodec: add D3D12VA hardware accelerated MPEG-2 decoding
The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2023-12-21 16:15:23 +08:00
Wu Jianhua
b16fd96c5f avcodec: add D3D12VA hardware accelerated AV1 decoding
The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2023-12-21 16:15:23 +08:00
Wu Jianhua
326288c70a avcodec: add D3D12VA hardware accelerated VP9 decoding
The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2023-12-21 16:15:23 +08:00
Wu Jianhua
cbb93c4ff6 avcodec: add D3D12VA hardware accelerated HEVC decoding
The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2023-12-21 16:15:23 +08:00
Wu Jianhua
349ce30e4e avcodec: add D3D12VA hardware accelerated H264 decoding
The implementation is based on:
https://learn.microsoft.com/en-us/windows/win32/medfound/direct3d-12-video-overview

With the Direct3D 12 video decoding support, we can render or process
the decoded images by the pixel shaders or compute shaders directly
without the extra copy overhead, which is beneficial especially if you
are trying to render or post-process a 4K or 8K video.

The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
2023-12-21 16:15:23 +08:00
Zhao Zhili
287e22f745 avcodec/mediacodecenc: set quality in cq mode
From AOSP doc, these values are device and codec specific, but lower
values generally result in more efficient (smaller-sized) encoding.

For example, global_quality 50 on Pixel 6 results a 1080P 30 FPS
HEVC with 3744 kb/s, while global_quality 80 results 28178 kb/s.

Fix #10689

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-12-21 19:28:47 +08:00
Michael Niedermayer
a6a553ba94
avcodec/cbs_vp8: fix GetBitContext setup
Fixes: abort()
Fixes: 64232/clusterfuzz-testcase-minimized-ffmpeg_BSF_TRACE_HEADERS_fuzzer-5417957987319808

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: James Almer <jamrial@gmail.com>

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-19 00:27:26 +01:00
Kalev Lember
b391fdbf1a lavc/libopenh264: Drop openh264 runtime version checks
With the way the runtime checks are currently set up, every single
openh264 release, no matter how minor, is considered an ABI break and
requires ffmpeg recompilation. This is unnecessarily strict because it
doesn't allow downstream distributions to ship any openh264 bug fix
version updates without breaking ffmpeg's openh264 support.

Years ago, at the time when ffmpeg's openh264 support was merged,
openh264 releases were done without a versioned soname (the library was
just libopenh264.so, unversioned). Since then, starting with version
1.3.0, openh264 has started using versioned sonames and the intent has
been to bump the soname every time there's a new release with an ABI
change.

This patch drops the exact version check and instead adds a minimum
requirement on 1.3.0 to the configure script.

Signed-off-by: Kalev Lember <klember@redhat.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-18 23:59:51 +02:00
James Almer
0cc0d8c0b5 avcodec/get_bits: add get_leb()
Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-18 15:19:36 -03:00
James Almer
12eac23637 avcodec/packet: add IAMF Parameters side data types
Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-18 15:19:30 -03:00
Rémi Denis-Courmont
419145c11b lavc/vc1dsp: fix R-V V vector lengths
The 8x4 and 4x4 use a needlessly large multiplier (unless/until we care
about embedded 64-bit-vector hardware). This is merely suboptimal.

The 8x4 case also uses an incorrect vector length, which leads to incorrect
behaviour on future/hypothetical hardware with 256-bit or larger vectors.

Pointed-out-by: Martin Storsjö <martin@martin.st>
2023-12-17 09:27:52 +02:00
Martin Storsjö
b51d9eb58e riscv: vc1dsp: Don't check vlenb before checking the CPU flags
We can't call ff_get_rv_vlenb() if we don't have RVV available
at all.

Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-16 22:30:26 +02:00
Rémi Denis-Courmont
918b3ed2d5 lavc/lpc: R-V V compute_autocorr
The loop iterates over the length of the vector, not the order. This is
to avoid reloading the same data for each lag value. However this means
the loop only works if the maximum order is no larger than VLENB.

The loop is roughly equivalent to:

    for (size_t j = 0; j < lag; j++)
        autoc[j] = 1.;

    while (len > lag) {
        for (ptrdiff_t j = 0; j < lag; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

    while (len > 0) {
        for (ptrdiff_t j = 0; j < len; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

Since register pressure is only at 50%, it should be possible to implement
the same loop for order up to 2xVLENB. But this is left for future work.

Performance numbers are all over the place from ~1.25x to ~4x speedups,
but at least they are always noticeably better than nothing.
2023-12-16 11:18:01 +02:00
Nuo Mi
ce0c178a40
avcodec/cbs_h266: more restrictive check on pps_tile_idx_delta_val
Fixes: out of array access
Fixes: 62603/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-5837632490569728

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-14 23:53:10 +01:00
Pierre-Anthony Lemieux
a1384b4e86
avcodec/jpeg2000htdec: check if block decoding will exceed internal precision
Intended to replace https://patchwork.ffmpeg.org/project/ffmpeg/patch/20230802000135.26482-3-michael@niedermayer.cc/
with a more accurate block decoding magnitude bound.

Fixes: 62433/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216
Fixes: 58299/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216
Previous-version-reviewed-by: Tomas Härdin <git@haerdin.se>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-14 23:53:10 +01:00
James Almer
34d56e1766 x86/aacencdsp: clear the high bits for size in ff_abs_pow34_sse
Fixes checkasm failures on win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-12 15:24:08 -03:00
sunyuechi
98596f90f4 lavc/aacencdsp: R-V V abs_pow34
C908:
abs_pow34_c: 535.5
abs_pow34_rvv_f32: 337.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-11 18:42:07 +02:00
sunyuechi
e880a97e7c lvac/aacenc: add ff_aac_dsp_init
This is for clarity and use in testing, consistent with other parts of the code.

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-11 18:42:04 +02:00
Rémi Denis-Courmont
272d0c164d lavc/lpc: R-V V apply_welch_window
apply_welch_window_even_c:       617.5
apply_welch_window_even_rvv_f64: 235.0
apply_welch_window_odd_c:        709.0
apply_welch_window_odd_rvv_f64:  256.5
2023-12-11 18:17:43 +02:00
Rémi Denis-Courmont
b3825bbe45 riscv: test for assembler support
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
2023-12-08 17:21:09 +02:00
sunyuechi
0b9d009b4a lavc/vc1dsp: R-V V inv_trans
C908:
vc1dsp.vc1_inv_trans_4x4_dc_c:      125.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5
vc1dsp.vc1_inv_trans_4x8_dc_c:      230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5
vc1dsp.vc1_inv_trans_8x4_dc_c:      228.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5
vc1dsp.vc1_inv_trans_8x8_dc_c:      476.5
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-08 17:20:48 +02:00
Mikhail Nitenko
0f745b74ec lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions
Benchmarks                         A53      A55     A72     A76
avg_h264_qpel_8_mc01_10_c:        936.5    924.0   656.0   504.7
avg_h264_qpel_8_mc01_10_neon:     234.7    202.0   120.7    63.2
avg_h264_qpel_8_mc02_10_c:        921.0    920.0   669.2   493.7
avg_h264_qpel_8_mc02_10_neon:     202.0    173.2   102.7    58.5
avg_h264_qpel_8_mc03_10_c:        936.5    924.0   656.0   509.5
avg_h264_qpel_8_mc03_10_neon:     236.2    203.7   120.0    63.2
avg_h264_qpel_8_mc10_10_c:       1441.0   1437.7   806.7   478.5
avg_h264_qpel_8_mc10_10_neon:     325.7    324.0   153.7    94.2
avg_h264_qpel_8_mc11_10_c:       2160.7   2148.2  1366.7   906.7
avg_h264_qpel_8_mc11_10_neon:     492.0    464.0   242.5   134.5
avg_h264_qpel_8_mc13_10_c:       2157.0   2138.2  1357.0   908.2
avg_h264_qpel_8_mc13_10_neon:     494.0    467.2   242.0   140.0
avg_h264_qpel_8_mc20_10_c:       1433.5   1410.0   785.2   486.0
avg_h264_qpel_8_mc20_10_neon:     293.7    289.7   138.0    91.5
avg_h264_qpel_8_mc30_10_c:       1458.5   1461.7   813.7   483.2
avg_h264_qpel_8_mc30_10_neon:     341.7    339.2   154.0    95.2
avg_h264_qpel_8_mc31_10_c:       2194.7   2197.2  1358.7   928.0
avg_h264_qpel_8_mc31_10_neon:     520.0    495.0   245.5   142.5
avg_h264_qpel_8_mc33_10_c:       2188.0   2205.5  1356.7   910.7
avg_h264_qpel_8_mc33_10_neon:     521.0    494.5   245.7   145.7
avg_h264_qpel_16_mc01_10_c:      3717.2   3595.0  2610.0  2012.0
avg_h264_qpel_16_mc01_10_neon:    920.5    791.5   483.2   240.5
avg_h264_qpel_16_mc02_10_c:      3684.0   3633.0  2659.0  1919.7
avg_h264_qpel_16_mc02_10_neon:    790.7    678.2   409.2   217.0
avg_h264_qpel_16_mc03_10_c:      3726.5   3596.0  2606.7  2010.0
avg_h264_qpel_16_mc03_10_neon:    922.0    792.5   483.2   239.7
avg_h264_qpel_16_mc10_10_c:      5912.0   5803.2  3241.5  1916.7
avg_h264_qpel_16_mc10_10_neon:   1267.5   1277.2   616.5   365.0
avg_h264_qpel_16_mc11_10_c:      8599.2   8482.5  5338.0  3616.2
avg_h264_qpel_16_mc11_10_neon:   1913.0   1827.0   956.2   542.2
avg_h264_qpel_16_mc13_10_c:      8643.7   8488.5  5388.0  3628.5
avg_h264_qpel_16_mc13_10_neon:   1914.7   1828.7   969.2   530.5
avg_h264_qpel_16_mc20_10_c:      5719.5   5641.0  3147.0  1946.2
avg_h264_qpel_16_mc20_10_neon:   1139.5   1150.0   539.5   344.0
avg_h264_qpel_16_mc30_10_c:      5930.0   5872.5  3267.5  1918.0
avg_h264_qpel_16_mc30_10_neon:   1331.5   1341.2   616.5   369.5
avg_h264_qpel_16_mc31_10_c:      8758.7   8697.7  5353.0  3630.7
avg_h264_qpel_16_mc31_10_neon:   2018.7   1941.7   982.2   574.7
avg_h264_qpel_16_mc33_10_c:      8683.2   8675.2  5339.2  3634.7
avg_h264_qpel_16_mc33_10_neon:   2019.7   1940.2   994.5   566.0
put_h264_qpel_8_mc01_10_c:        854.2    843.0   599.2   478.0
put_h264_qpel_8_mc01_10_neon:     192.7    168.0   101.7    56.7
put_h264_qpel_8_mc02_10_c:        766.5    760.0   550.2   441.0
put_h264_qpel_8_mc02_10_neon:     160.0    139.2    88.7    53.0
put_h264_qpel_8_mc03_10_c:        854.2    843.0   599.2   479.0
put_h264_qpel_8_mc03_10_neon:     194.2    169.7   102.0    56.2
put_h264_qpel_8_mc10_10_c:       1352.7   1353.7   749.7   446.7
put_h264_qpel_8_mc10_10_neon:     289.7    294.2   135.5    88.5
put_h264_qpel_8_mc11_10_c:       2080.0   2066.2  1309.5   876.7
put_h264_qpel_8_mc11_10_neon:     450.0    429.7   229.7   131.2
put_h264_qpel_8_mc13_10_c:       2074.7   2060.2  1294.5   870.5
put_h264_qpel_8_mc13_10_neon:     452.5    434.5   226.5   130.0
put_h264_qpel_8_mc20_10_c:       1221.5   1216.0   684.5   399.7
put_h264_qpel_8_mc20_10_neon:     257.7    262.5   121.2    78.7
put_h264_qpel_8_mc30_10_c:       1379.0   1374.7   757.2   449.5
put_h264_qpel_8_mc30_10_neon:     305.7    310.2   135.5    86.5
put_h264_qpel_8_mc31_10_c:       2109.2   2119.7  1299.5   878.0
put_h264_qpel_8_mc31_10_neon:     478.0    458.5   226.0   137.2
put_h264_qpel_8_mc33_10_c:       2101.5   2115.2  1306.5   887.0
put_h264_qpel_8_mc33_10_neon:     479.0    458.7   229.7   141.7
put_h264_qpel_16_mc01_10_c:      3485.7   3396.7  2460.5  1914.5
put_h264_qpel_16_mc01_10_neon:    752.5    665.5   397.0   213.2
put_h264_qpel_16_mc02_10_c:      3103.5   3023.2  2154.7  1720.7
put_h264_qpel_16_mc02_10_neon:    622.7    551.2   347.7   196.2
put_h264_qpel_16_mc03_10_c:      3486.2   3394.0  2436.5  1917.7
put_h264_qpel_16_mc03_10_neon:    754.0    666.5   397.0   215.7
put_h264_qpel_16_mc10_10_c:      5533.0   5488.5  2989.0  1783.0
put_h264_qpel_16_mc10_10_neon:   1123.5   1165.2   535.2   334.7
put_h264_qpel_16_mc11_10_c:      8437.7   8281.2  5209.0  3510.7
put_h264_qpel_16_mc11_10_neon:   1745.0   1697.0   878.5   513.5
put_h264_qpel_16_mc13_10_c:      8567.7   8468.0  5221.5  3528.0
put_h264_qpel_16_mc13_10_neon:   1751.7   1698.2   889.2   507.0
put_h264_qpel_16_mc20_10_c:      4907.5   4885.0  2786.2  1607.5
put_h264_qpel_16_mc20_10_neon:    995.5   1034.5   475.5   307.0
put_h264_qpel_16_mc30_10_c:      5579.7   5537.7  3045.2  1789.5
put_h264_qpel_16_mc30_10_neon:   1187.5   1231.2   532.5   334.5
put_h264_qpel_16_mc31_10_c:      8677.2   8672.5  5204.2  3516.0
put_h264_qpel_16_mc31_10_neon:   1850.7   1813.2   893.0   545.2
put_h264_qpel_16_mc33_10_c:      8688.7   8671.2  5223.2  3512.0
put_h264_qpel_16_mc33_10_neon:   1851.7   1814.2   908.5   535.2

Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-07 23:20:14 +02:00
sunyuechi
8bdb663062 lavc/ac3dsp: R-V V float_to_fixed24
c910
    float_to_fixed24_c: 2207.2
    float_to_fixed24_rvv_f32: 696.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-06 16:04:22 +02:00
Paul B Mahol
7e453dad3c avcodec/qoadec: fix overreads and fix packet size check 2023-12-05 14:50:21 +01:00
Michael Niedermayer
22daf2148f
avcodec/av1dec: Fix resolving zero divisor
Fixes: Out of array read
Fixes: global-buffer-overflow-AV1

Found-by: "Leonelli, Matteo" <matteo.leonelli@cispa.de>
Tested-by: "Wang, Fei W" <fei.w.wang@intel.com>
Reviewed-by: "Wang, Fei W" <fei.w.wang@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-05 12:38:16 +01:00
Leo Izen
c4be080e65
avcodec/jpegxl_parser: fix parsing sequences of extremely small files
This patch allows the JXL parser to parse sequences of extremely small
files concatenated together. (e.g. smaller than the parser buffer)

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-05 05:54:34 -05:00
Leo Izen
019b3ea65a
avcodec/jpegxl_parse{,r}: use correct ISOBMFF extended size location
According to ISO/IEC 14996-12, size == 1 means a 64-bit extended-size
field occurs *after* the 32-bit box type, not before. This fix should
allow correct parsing of JXL files with extended-size boxes.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-05 05:53:32 -05:00
Haihao Xiang
fc73b372cd lavc/qsvdec: reduce info message when more data is required
demote the info to AV_LOG_VERBOSE

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:10:57 +08:00
Haihao Xiang
e233f3e75f lavc/qsvdec: return 0 if more data is required
The type of qsv decoders is FF_CODEC_CB_TYPE_DECODE which must not
return AVERROR(EAGAIN). commit 42b20c9 added an assertion to check the
returned value.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:10:57 +08:00
Lynne
8c117b75af
lavc/Makefile: build vulkan decode code if vulkan_av1 has been enabled
Forgotten.

Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Tested-by: Neal Gompa <ngompa13@gmail.com>
2023-12-04 07:57:27 +01:00
Paul B Mahol
0a13178de8 avcodec/qoadec: add support for midstream sample rate/layout changes 2023-12-02 16:51:00 +01:00
Anton Khirnov
5230257ea1 lavc/dvdsubenc: only check canvas size when it is actually set
Fixes #10650
2023-12-02 11:22:46 +01:00
Logan Lyu
fa0470347e lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_hv
put_hevc_qpel_bi_hv4_8_c: 433.7
put_hevc_qpel_bi_hv4_8_i8mm: 117.9
put_hevc_qpel_bi_hv6_8_c: 803.9
put_hevc_qpel_bi_hv6_8_i8mm: 252.7
put_hevc_qpel_bi_hv8_8_c: 1296.4
put_hevc_qpel_bi_hv8_8_i8mm: 316.2
put_hevc_qpel_bi_hv12_8_c: 2867.4
put_hevc_qpel_bi_hv12_8_i8mm: 669.2
put_hevc_qpel_bi_hv16_8_c: 4709.4
put_hevc_qpel_bi_hv16_8_i8mm: 929.9
put_hevc_qpel_bi_hv24_8_c: 9639.7
put_hevc_qpel_bi_hv24_8_i8mm: 2072.4
put_hevc_qpel_bi_hv32_8_c: 16663.7
put_hevc_qpel_bi_hv32_8_i8mm: 3391.4
put_hevc_qpel_bi_hv48_8_c: 36972.9
put_hevc_qpel_bi_hv48_8_i8mm: 7505.7
put_hevc_qpel_bi_hv64_8_c: 64106.4
put_hevc_qpel_bi_hv64_8_i8mm: 13145.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
595f97028b lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_v
put_hevc_qpel_bi_v4_8_c: 166.1
put_hevc_qpel_bi_v4_8_neon: 61.9
put_hevc_qpel_bi_v6_8_c: 309.4
put_hevc_qpel_bi_v6_8_neon: 75.6
put_hevc_qpel_bi_v8_8_c: 531.1
put_hevc_qpel_bi_v8_8_neon: 78.1
put_hevc_qpel_bi_v12_8_c: 1139.9
put_hevc_qpel_bi_v12_8_neon: 238.1
put_hevc_qpel_bi_v16_8_c: 2063.6
put_hevc_qpel_bi_v16_8_neon: 308.9
put_hevc_qpel_bi_v24_8_c: 4317.1
put_hevc_qpel_bi_v24_8_neon: 629.9
put_hevc_qpel_bi_v32_8_c: 8241.9
put_hevc_qpel_bi_v32_8_neon: 1140.1
put_hevc_qpel_bi_v48_8_c: 18422.9
put_hevc_qpel_bi_v48_8_neon: 2533.9
put_hevc_qpel_bi_v64_8_c: 37508.6
put_hevc_qpel_bi_v64_8_neon: 4520.1

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
00290a64f7 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_hv
put_hevc_epel_bi_hv4_8_c: 242.9
put_hevc_epel_bi_hv4_8_i8mm: 68.6
put_hevc_epel_bi_hv6_8_c: 402.4
put_hevc_epel_bi_hv6_8_i8mm: 135.9
put_hevc_epel_bi_hv8_8_c: 636.4
put_hevc_epel_bi_hv8_8_i8mm: 145.6
put_hevc_epel_bi_hv12_8_c: 1363.1
put_hevc_epel_bi_hv12_8_i8mm: 324.1
put_hevc_epel_bi_hv16_8_c: 2222.1
put_hevc_epel_bi_hv16_8_i8mm: 509.1
put_hevc_epel_bi_hv24_8_c: 4793.4
put_hevc_epel_bi_hv24_8_i8mm: 1091.9
put_hevc_epel_bi_hv32_8_c: 8393.9
put_hevc_epel_bi_hv32_8_i8mm: 1720.6
put_hevc_epel_bi_hv48_8_c: 19526.6
put_hevc_epel_bi_hv48_8_i8mm: 4285.9
put_hevc_epel_bi_hv64_8_c: 33915.4
put_hevc_epel_bi_hv64_8_i8mm: 6783.6

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
0448f27f41 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_v
put_hevc_epel_bi_v4_8_c: 138.4
put_hevc_epel_bi_v4_8_neon: 33.7
put_hevc_epel_bi_v6_8_c: 302.9
put_hevc_epel_bi_v6_8_neon: 46.7
put_hevc_epel_bi_v8_8_c: 408.7
put_hevc_epel_bi_v8_8_neon: 48.7
put_hevc_epel_bi_v12_8_c: 779.4
put_hevc_epel_bi_v12_8_neon: 139.7
put_hevc_epel_bi_v16_8_c: 1344.9
put_hevc_epel_bi_v16_8_neon: 160.2
put_hevc_epel_bi_v24_8_c: 2981.7
put_hevc_epel_bi_v24_8_neon: 344.9
put_hevc_epel_bi_v32_8_c: 5280.9
put_hevc_epel_bi_v32_8_neon: 618.4
put_hevc_epel_bi_v48_8_c: 12494.9
put_hevc_epel_bi_v48_8_neon: 1364.4
put_hevc_epel_bi_v64_8_c: 22127.7
put_hevc_epel_bi_v64_8_neon: 2473.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
216275bd80 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_h
put_hevc_epel_bi_h4_8_c: 96.0
put_hevc_epel_bi_h4_8_neon: 36.3
put_hevc_epel_bi_h6_8_c: 288.3
put_hevc_epel_bi_h6_8_neon: 59.3
put_hevc_epel_bi_h8_8_c: 358.5
put_hevc_epel_bi_h8_8_neon: 61.5
put_hevc_epel_bi_h12_8_c: 759.8
put_hevc_epel_bi_h12_8_neon: 159.5
put_hevc_epel_bi_h16_8_c: 1307.0
put_hevc_epel_bi_h16_8_neon: 182.0
put_hevc_epel_bi_h24_8_c: 2778.3
put_hevc_epel_bi_h24_8_neon: 430.5
put_hevc_epel_bi_h32_8_c: 4952.3
put_hevc_epel_bi_h32_8_neon: 679.5
put_hevc_epel_bi_h48_8_c: 11803.3
put_hevc_epel_bi_h48_8_neon: 1443.5
put_hevc_epel_bi_h64_8_c: 20654.8
put_hevc_epel_bi_h64_8_neon: 2737.0
put_hevc_qpel_bi_h4_8_c: 140.0
put_hevc_qpel_bi_h4_8_neon: 111.5
put_hevc_qpel_bi_h6_8_c: 318.0
put_hevc_qpel_bi_h6_8_neon: 85.8
put_hevc_qpel_bi_h8_8_c: 536.5
put_hevc_qpel_bi_h8_8_neon: 95.3
put_hevc_qpel_bi_h12_8_c: 1188.5
put_hevc_qpel_bi_h12_8_neon: 291.3
put_hevc_qpel_bi_h16_8_c: 2064.3
put_hevc_qpel_bi_h16_8_neon: 365.3
put_hevc_qpel_bi_h24_8_c: 4757.5
put_hevc_qpel_bi_h24_8_neon: 1010.0
put_hevc_qpel_bi_h32_8_c: 8351.8
put_hevc_qpel_bi_h32_8_neon: 2917.8
put_hevc_qpel_bi_h48_8_c: 19299.8
put_hevc_qpel_bi_h48_8_neon: 2976.8
put_hevc_qpel_bi_h64_8_c: 34182.5
put_hevc_qpel_bi_h64_8_neon: 5236.3

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
40cf4a5ca3 lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels
put_hevc_pel_bi_pixels4_8_c: 54.7
put_hevc_pel_bi_pixels4_8_neon: 43.0
put_hevc_pel_bi_pixels6_8_c: 94.7
put_hevc_pel_bi_pixels6_8_neon: 37.0
put_hevc_pel_bi_pixels8_8_c: 171.0
put_hevc_pel_bi_pixels8_8_neon: 24.0
put_hevc_pel_bi_pixels12_8_c: 354.0
put_hevc_pel_bi_pixels12_8_neon: 68.7
put_hevc_pel_bi_pixels16_8_c: 588.2
put_hevc_pel_bi_pixels16_8_neon: 77.5
put_hevc_pel_bi_pixels24_8_c: 1670.7
put_hevc_pel_bi_pixels24_8_neon: 173.0
put_hevc_pel_bi_pixels32_8_c: 2267.7
put_hevc_pel_bi_pixels32_8_neon: 281.2
put_hevc_pel_bi_pixels48_8_c: 5787.5
put_hevc_pel_bi_pixels48_8_neon: 673.5
put_hevc_pel_bi_pixels64_8_c: 9897.0
put_hevc_pel_bi_pixels64_8_neon: 1159.5

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
James Almer
6d19611251 avcodec/ac3dsp: add missing stddef.h include
Should fix make checkheaders

Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-01 12:42:22 -03:00
xufuji456
cc86343b96 lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d
Building iOS platform with arm64, the compiler has a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b"

Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-11-28 15:54:49 +02:00
Zhao Zhili
d526a34c20 avcodec/videotoolboxenc: refactor dump encoder name
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-27 23:49:01 +08:00
Zhao Zhili
cb049d377f avcodec/videotoolboxenc: Fix build failure due to PropertyKey_EncoderID
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-27 23:48:55 +08:00
Paul B Mahol
3609d2b783 avcodec: add QOA decoder 2023-11-26 17:49:09 +01:00
Geoffrey McRae
93b5d9030b libavcodec/mlpdec: add missing correction to ch_layout when downmixing
This fixes corrupted audio for applications relying on ch_layout when
codec downmixing is active.

Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-26 10:18:33 -03:00
Geoffrey McRae
a8677bcc8f libavcodec/dcadec: adjust the ch_layout when downmix is active
Applications making use of this codec with the `downmix` option are
segfaulting unless the `ch_layout` is overridden after `avcodec_open2`
as can be seen in projects like MythTV[1]

This patch fixes this by overriding the ch_layout as done in other
decoders such as AC3.

1: af6f362a14/mythtv/libs/libmythtv/decoders/avformatdecoder.cpp (L4607)

Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-26 10:18:33 -03:00
James Almer
72390dea00 mips/ac3dsp_mips: add missing stddef.h header include
Fixes compilation failures after 567c67c6c8.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-25 21:51:04 -03:00
James Almer
e40ea9f34b x86/ac3dsp: add ff_float_to_fixed24_avx()
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-25 21:50:56 -03:00
James Almer
d8b1a34433 x86/ac3dsp: reduce instruction count inside the float_to_fixed24 loop
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-25 21:50:56 -03:00
Rémi Denis-Courmont
0fa421c8f1 lavc/llvidencdsp: add R-V V diff_bytes
diff_bytes_c:      163.0
diff_bytes_rvv_i32: 52.7
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont
0183c2c830 lavc/aacpsdsp: use LMUL=2 and amortise strides
The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments wit unit stride, then narrow to 16 segments with
   right shifts, we can only get one half-size vector per segment, or just 2
   elements per vector (EMUL=1/2) - at least with 128-bit vectors.
   This ends up unsurprisingly about as fas as the C code.
2) The current approach is to load with strides. We keep that approach,
   but improve it using three 4-segmented loads instead of 12 single-segment
   loads. This divides the number of distinct loaded addresses by 4.
3) A potential third approach would be to avoid segmentation altogether
   and splat the scalar coefficient into vectors. Then we can use a
   unit-stride and maximum EMUL. But the downside then is that we have to
   multiply the 3 (of 16) unused segments with zero as part of the
   multiply-accumulate operations.

In addition, we also reuse vectors mid-loop so as to increase the EMUL
from 1 to 2, which also improves performance a little bit.

Oeverall the gains are quite small with the device under test, as it does
not deal with segmented loads very well. But at least the code is tidier,
and should enjoy bigger speed-ups on better hardware implementation.

Before:
ps_hybrid_analysis_c:       1819.2
ps_hybrid_analysis_rvv_f32: 1037.0 (before)
ps_hybrid_analysis_rvv_f32:  990.0 (after)
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont
b88d4058f9 lavc/g722dsp: optimise R-V V apply_qmf
This stores the constant coefficients deinterleaved, so that they can be
loaded directly with NF=0. Unfortunately, we cannot optimise loading the
input, due to insufficient memory alignment (not 32-bit).

Before:
g722_apply_qmf_c:       82.5
g722_apply_qmf_rvv_i32: 78.2

After:
g722_apply_qmf_c:       82.5
g722_apply_qmf_rvv_i32: 65.2
2023-11-23 18:57:18 +02:00
James Almer
567c67c6c8 avcodec/ac3dsp: make len a size_t in float_to_fixed24
Should simplify asm implementations, and prevent UB on at least win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 18:33:00 -03:00
James Almer
2d9fd814d0 x86/: clear the high bits for order in scalarproduct_and_madd functions
Should fix checkasm failures on win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 14:18:42 -03:00
Zhao Zhili
e8a49b1424 avcodec/mmaldec: Fix build error
Fix #10670.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Zhao Zhili
f27fce0c0c avcodec/mediacodecdec: fix return EAGAIN after EOF
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Dmitry Rogozhkin
e9c93009fc avcodec/decode: validate hw_frames_ctx when AVHWAccel.free_frame_priv is used
Validate that a hw_frames_ctx is available before using it for
the AVHWAccel.free_frame_priv callback, and don't require it to
be present when the callback is not in use by the HWAccel.

v2: check for free_frame_priv (Hendrik)
v3: return EINVAL (Christoph Reiter)
v4: better commit message (Hendrik)
v5: fix typo with missed frames_ctx (Lynne)

See[1]: https://github.com/msys2/MINGW-packages/pull/19050
Fixes: be07145109 ("avcodec: add AVHWAccel.free_frame_priv callback")
CC: Lynne <dev@lynne.ee>
CC: Christoph Reiter <reiter.christoph@gmail.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2023-11-22 05:01:16 +01:00
Zhao Zhili
aa3b857101 avcodec/h264_mp4toannexb_bsf: process new extradata
For fate-h264_mp4toannexb_ticket5927 and
fate-h264_mp4toannexb_ticket5927_2, they work by accident
previously. The sample file has two 'avc1' entries, and video
samples use the second one. It means packets should be decoded with
new extradata in side data. Before this patch, only extradata was
kept in the output, new extradata has been dropped. The output can
be decoded because the two extradata are almost the same, except
level indication. This patch fixed the issue, and add another
fate test.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Zhao Zhili
d3aa0cd16f avcodec/h264_mp4toannexb_bsf: fix missing PS before IDR frames
If there is a single group of SPS/PPS before an IDR frame, but no
SPS/PPS after that, we will miss the chance to reset
idr_sps_seen/idr_pps_seen. No SPS/PPS are inserted afterwards.

This patch saves in-band SPS/PPS and insert them before IDR frames
when necessary.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Zhao Zhili
4c4b833abd avcodec/h264_mp4toannexb_bsf: remove pass padding size as argument
It's a fixed value. There is no use case to change that.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Zhao Zhili
91cbae2f6c avcodec/h264_mp4toannexb_bsf: refactor start_code_size handling
start_code_size depends on whether PS comes from out-of-band or
in-band. Make the code more readable.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Michael Niedermayer
fb52070848
avcodec/h264dec: use BOOL for skip_gray, noref_gray
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-22 01:22:31 +01:00
Jun Zhao
c961ac4b0c vulkan_decode: fix the print format of VkDeviceSize
VkDeviceSize represents device memory size and offset
values as uint64_t in Spec.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2023-11-21 08:02:43 +08:00
James Almer
1258f99978 avcodec: bump version after EVC additions
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-20 11:55:51 -03:00
Dawid Kozinski
cfe2947887 avcodec/evc_decoder: Provided support for EVC decoder
- Added EVC decoder wrapper
- Changes in project configuration file and libavcodec Makefile
- Added documentation for xevd wrapper

Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-20 11:55:51 -03:00
Dawid Kozinski
c59a96fd08 avcodec/evc_encoder: Provided support for EVC encoder
- Added EVC encoder wrapper
- Changes in project configuration file and libavcodec Makefile
- Added documentation for xeve wrapper

Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-20 11:55:51 -03:00
Michael Niedermayer
e56d91f8a8
avcodec/h264dec: Support skipping frames that used gray gap frames
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:19:25 +01:00
Michael Niedermayer
6364fa9e9a
avcodec/h264: Avoid using gray gap frames as references
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:19:25 +01:00
Michael Niedermayer
29f6c9b04d
avcodec/h264: keep track of which frames used gray references
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:19:04 +01:00
Michael Niedermayer
e4337606e1
avcodec/h264dec: More elaborate documentation for frame_recovered
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:12:30 +01:00
Michael Niedermayer
68e1cf204a
avcodec/h264: Use FRAME_RECOVERED_HEURISTIC instead of IDR/SEI
This keeps IDR/SEI and heuristically detected recovery points cleaner seperated

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:12:30 +01:00
Michael Niedermayer
3f4a1a24a5
avcodec/h264: Seperate SEI and IDR recovery handling
This avoids SEI and IDR recovery flags affecting each other

Also eliminate litteral numbers from recovery handling
This should make the code clearer

Improves: tickets/4738/tickets_cut.ts

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:12:29 +01:00
Rémi Denis-Courmont
fbc7adba67 lavc/llviddsp: R-V V add_bytes
add_bytes_c:      2077.2
add_bytes_rvv_i32: 105.0
2023-11-18 22:07:14 +02:00
Rémi Denis-Courmont
ca664f2254 lavc/flacdsp: R-V V LPC16 function
In this case, the inner loop computing the scalar product can be reduced
to just one multiplication and one sum even with 128-bit vectors. The
result is a lot simpler, but also brings more modest performance gains:

flac_lpc_16_13_c:       15241.0
flac_lpc_16_13_rvv_i32: 11230.0
flac_lpc_16_16_c:       17884.0
flac_lpc_16_16_rvv_i32: 12125.7
flac_lpc_16_29_c:       27847.7
flac_lpc_16_29_rvv_i32: 10494.0
flac_lpc_16_32_c:       30051.5
flac_lpc_16_32_rvv_i32: 10355.0
2023-11-18 22:06:57 +02:00
Rémi Denis-Courmont
295092b46d lavc/flacdsp: R-V V LPC32
The entire set of 32 coefficients and corresponding past 32 samples can
fit in a single vector (with LMUL=8) exactly, but... since widening
double the needed vector sizes, we still end up too short with 128-bit
vectors. This adds a very simple version for future 256+-bit hardware,
and for pred_orders values up to 16, and a bit more involved loop for
for 128-bit hardware with pred_orders between 17 and 32.

With 128-bit hardware, the benchmarks look like this:
flac_lpc_32_13_c:       30152.0
flac_lpc_32_13_rvv_i32: 10244.7
flac_lpc_32_16_c:       37314.2
flac_lpc_32_16_rvv_i32: 10126.2
flac_lpc_32_29_c:       61910.0
flac_lpc_32_29_rvv_i32: 14495.2
flac_lpc_32_32_c:       68204.0
flac_lpc_32_32_rvv_i32: 13273.7
2023-11-18 22:05:43 +02:00
Diederik de Haas via ffmpeg-devel
c07ed10b0e apply spelling fixes
Fix spelling issue as reported by Debian's lintian tool:
accomodate -> accommodate
addtional -> additional
auxillary -> auxiliary
bellow -> below
betweeen -> between
Calulate -> Calculate
coefficents -> coefficients
Defalt -> Default
defaul -> default
higer -> higher
neccesary -> necessary
orignal -> original
ouput -> output
precison -> precision
processsing -> processing
substract -> subtract
Transfered -> Transferred
upto -> up to

Also add several of them to the 'common typos' check in patcheck.

Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
2023-11-18 19:55:42 +01:00
Rémi Denis-Courmont
07c303b708 lavc/flacdsp: R-V V decorrelate_indep 16-bit packed
flac_decorrelate_indep2_16_c:        981.7
flac_decorrelate_indep2_16_rvv_i32:  199.2
flac_decorrelate_indep4_16_c:       1749.7
flac_decorrelate_indep4_16_rvv_i32:  401.2
flac_decorrelate_indep6_16_c:       2517.7
flac_decorrelate_indep6_16_rvv_i32:  858.0
flac_decorrelate_indep8_16_c:       3285.7
flac_decorrelate_indep8_16_rvv_i32: 1123.5
2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont
fb0295e5fd lavc/flacdsp: R-V V decorrelate_indep 32-bit packed
flac_decorrelate_indep2_32_c:       981.7
flac_decorrelate_indep2_32_rvv_i32: 183.7
flac_decorrelate_indep4_32_c:      1749.7
flac_decorrelate_indep4_32_rvv_i32: 362.5
flac_decorrelate_indep6_32_c:      2517.7
flac_decorrelate_indep6_32_rvv_i32: 715.2
flac_decorrelate_indep8_32_c:      3285.7
flac_decorrelate_indep8_32_rvv_i32: 909.0
2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont
6183a69c0b lavc/flacdsp: R-V V decorrelate_ms packed
flac_decorrelate_ms_16_c:       585.5
flac_decorrelate_ms_16_rvv_i32: 263.0
flac_decorrelate_ms_32_c:       584.7
flac_decorrelate_ms_32_rvv_i32: 250.0
2023-11-17 23:59:23 +02:00
Rémi Denis-Courmont
636ae0e0bc lavc/flacdsp: R-V V packed decorrelate_{l,r}s
flac_decorrelate_ms_16_c:       457.2
flac_decorrelate_ms_16_rvv_i32: 203.0
flac_decorrelate_ms_32_c:       457.2
flac_decorrelate_ms_32_rvv_i32: 203.5
flac_decorrelate_rs_16_c:       456.2
flac_decorrelate_rs_16_rvv_i32: 207.0
flac_decorrelate_rs_32_c:       456.2
flac_decorrelate_rs_32_rvv_i32: 210.5
2023-11-17 23:59:22 +02:00
Rémi Denis-Courmont
d076517056 lavc/llauddsp: R-V V scalarproduct_and_madd_int32
scalarproduct_and_madd_int32_c:      10899.7
scalarproduct_and_madd_int32_rvv_i32: 1749.0
2023-11-16 16:53:44 +02:00
Rémi Denis-Courmont
45d0eb3f70 lavc/llauddsp: R-V V scalarproduct_and_madd_int16
scalarproduct_and_madd_int16_c:      10355.7
scalarproduct_and_madd_int16_rvv_i32: 1480.0
2023-11-16 16:53:44 +02:00
James Almer
78f55457c9 x86/flacds: clear the high bits from pred_order in lpc_32 functions
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-15 16:10:15 -03:00
Dai, Jianhui J
c9fe9fb863 avcodec/cbs_vp8: Add support for VP8 codec bitstream
This commit adds support for VP8 bitstream read methods to the cbs
codec. This enables the trace_headers bitstream filter to support VP8,
in addition to AV1, H.264, H.265, and VP9. This can be useful for
debugging VP8 stream issues.

The CBS VP8 implements a simple VP8 boolean decoder using GetBitContext
to read the bitstream.

Only the read methods `read_unit` and `split_fragment` are implemented.
The write methods `write_unit` and `assemble_fragment` return the error
code AVERROR_PATCHWELCOME. This is because CBS VP8 write is unlikely to
be used by any applications at the moment. The write methods can be
added later if there is a real need for them.

TESTS: ffmpeg -i fate-suite/vp8/frame_size_change.webm -vcodec copy
-bsf:v trace_headers -f null -

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2023-11-15 10:29:03 -05:00
Dai, Jianhui J
5cb8accd09 avcodec/vp8: Export vp8_token_update_probs variable
This commit exports the `vp8_token_update_probs` variable to internal
library scope to facilitate its reuse within the library.

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2023-11-15 10:29:03 -05:00
Rémi Denis-Courmont
90a779bed6 lavc/huffyuvdsp: basic R-V V add_hfyu_left_pred_bgr32
Better performance can probably be achieved with a more intricate
unrolled loop, but this is a start:

add_hfyu_left_pred_bgr32_c: 15084.0
add_hfyu_left_pred_bgr32_rvv_i32: 10280.2

This would actually be cleaner with the RISC-V P extension, but that is
not ratified yet (I think?) and usually not supported if V is supported.
2023-11-15 16:51:07 +02:00
James Almer
b360c91752 avcodec/codecpar: mention how to allocate coded_side_data
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-14 14:26:42 -03:00
Anton Khirnov
6dbde68cb5 lavc/8bps: fix exporting palette after 63767b79a5
It would be left empty on each frame whose packet does not come with
palette attached.
2023-11-14 18:18:26 +01:00
Rémi Denis-Courmont
ce467421dc lavc/exrdsp: unroll predictor
With explicit unrolling, we can skip half of the sign bit flips, and
the compiler is then better able to optimise the scalar loop:

predictor_c: 31376.0 (before)
predictor_c: 23703.0 (after)
2023-11-14 19:15:51 +02:00
Rémi Denis-Courmont
c536e92207 lavc/sbrdsp: R-V V hf_apply_noise functions
This is restricted to 128-bit vectors as larger vector sizes could read
past the end of the noise array. Support for future hardware with larger
vector sizes is left for some other time.

hf_apply_noise_0_c:       2319.7
hf_apply_noise_0_rvv_f32: 1229.0
hf_apply_noise_1_c:       2539.0
hf_apply_noise_1_rvv_f32: 1244.7
hf_apply_noise_2_c:       2319.7
hf_apply_noise_2_rvv_f32: 1232.7
hf_apply_noise_3_c:       2541.2
hf_apply_noise_3_rvv_f32: 1244.2
2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont
5b33104fca lavc/sbrdsp: R-V V hf_gen
hf_gen_c:      2922.7
hf_gen_rvv_f32: 731.5
2023-11-13 18:33:02 +02:00
Gyan Doshi
67a2571a55 avcodec/libsvtav1: add version guard for external param
Setting of external param 'force_key_frames' was added in 7bcc1b4eb8.
It is available since v1.1.0 but ffmpeg allows linking against v0.9.0.
2023-11-13 13:14:43 +05:30
Evgeny Pavlov
da3ce21f68 libavcodec/amfenc: Fix issue with missing headers in AV1 encoder
This commit fixes issue with missing SPS/PPS headers in video
encoded by AMF AV1 encoder.
Missing headers leads to broken seek in MPV video player.
Default value for property AV1_HEADER_INSERTION_MODE shouldn't be setup
to NONE (no headers insertion). We need to skip definition of this property,
because default value depends on USAGE property.

Signed-off-by: Dmitrii Ovchinnikov <ovchinnikov.dmitrii@gmail.com>
2023-11-12 22:57:17 +01:00
Sebastian Ramacher
250471ea17 avcoded/fft: Fix memory leak if ctx2 is used
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-12 14:47:56 -03:00
Sebastian Ramacher
a562cfee2e avcodec/fft: Use av_mallocz to avoid invalid free/uninit
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-12 14:47:56 -03:00
Rémi Denis-Courmont
cd7b352c53 lavc/sbrdsp: R-V V autocorrelate
With 5 accumulator vectors and 6 inputs, this can only use LMUL=2.
Also the number of vector loop iterations is small, just 5 on 128-bit
vector hardware.

The vector loop is somewhat unusual in that it processes data in
descending memory order, in order to save on vector slides:
in descending order, we can extract elements to carry over to the next
iteration from the bottom of the vectors directly. With ascending order
(see in the Opus postfilter function), there are no ways to get the top
elements directly. On the downside, this requires the use of separate
shift and sub (the would-be SH3SUB instruction does not exist), with
a small pipeline stall on the vector load address.

The edge cases in scalar are done in scalar as this saves on loads
and remains significantly faster than C.

autocorrelate_c: 669.2
autocorrelate_rvv_f32: 421.0
2023-11-12 14:03:09 +02:00
Rémi Denis-Courmont
f576a0835b lavc/aacpsdsp: rework R-V V hybrid_synthesis_deint
Given the size of the data set, strided memory accesses cannot be avoided.
We can still do better than the current code.

ps_hybrid_synthesis_deint_c:       12065.5
ps_hybrid_synthesis_deint_rvv_i32: 13650.2 (before)
ps_hybrid_synthesis_deint_rvv_i64:  8181.0 (after)
2023-11-12 14:03:09 +02:00
Rémi Denis-Courmont
eb508702a8 lavc/aacpsdsp: rework R-V V add_squares
Segmented loads may be slower than not. So this advantageously uses a
unit-strided load and narrowing shifts instead.

Before:
ps_add_squares_c: 60757.7
ps_add_squares_rvv_f32: 22242.5

After:
ps_add_squares_c: 60516.0
ps_add_squares_rvv_i64: 17067.7
2023-11-12 14:03:09 +02:00
Paul B Mahol
10440a489a avcodec/gif_parser: split correctly also bitstreams that do not have extension blocks 2023-11-12 02:19:53 +01:00
Nuo Mi
09f783692e avcodec/cbs_h266: H266RawSliceHeader, expose curr_subpic_idx
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-11 11:53:21 -03:00
Michael Niedermayer
ac4e3e188a
avcodec/evc_parse: Check num_remaining_tiles_in_slice_minus1
Fixes: out of array access
Fixes: 62467/clusterfuzz-testcase-minimized-ffmpeg_BSF_EVC_FRAME_MERGE_fuzzer-6092990982258688

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: "Dawid Kozinski/Multimedia (PLT) /SRPOL/Staff Engineer/Samsung Electronics" <d.kozinski@samsung.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-10 00:15:28 +01:00
Michael Niedermayer
bb0a684d93
avcodec/4xm: Check for cfrm exhaustion
Fixes: index -1 out of bounds for type 'CFrameBuffer [100]'
Fixes: 63877/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FOURXM_fuzzer-5854263397711872

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-10 00:14:02 +01:00
Niklas Haas
96d2a40b9e avcodec/pnm: explicitly tag color range
PGMYUV seems to be always limited range. This was a format originally
invented by FFmpeg at a time when YUVJ distinguished limited from full
range YUV, and this codec never appeared to output YUVJ in any
circumstance, so hard-coding limited range preserves the status quo.

The other formats are explicitly documented to be full range RGB/gray
formats. That said, don't tag them yet, due to outstanding bugs w.r.t
grayscale formats and color range handling.

This change in behavior updates a bunch of FATE tests in trivial ways
(added tagging being the only difference).
2023-11-09 12:53:35 +01:00
Peter Ross
10869cd849 avcodec: LEAD MCMP decoder
Partially fixes ticket #798

Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Peter Ross <pross@xvid.org>
2023-11-08 17:37:58 +11:00
Rémi Denis-Courmont
adc87a5f7c lavc/opusdsp: rewrite R-V V postfilter
This uses a more traditional approach allowing up processing of up to
period minus two elements per iteration. This also allows the algorithm
to work for all and any vector length.

As the T-Head C908 device under test can load 16 elements loop, there is
unsurprisingly a little performance drop when the period is minimal and
the parallelism is capped at 13 elements:

Before:
postfilter_15_c:         21222.2
postfilter_15_rvv_f32:   22007.7
postfilter_512_c:        20189.7
postfilter_512_rvv_f32:  22004.2
postfilter_1022_c:       20189.7
postfilter_1022_rvv_f32: 22004.2

After:
postfilter_15_c:         20189.5
postfilter_15_rvv_f32:    7057.2
postfilter_512_c:        20189.5
postfilter_512_rvv_f32:   5667.2
postfilter_1022_c:       20192.7
postfilter_1022_rvv_f32:  5667.2
2023-11-06 22:09:30 +02:00
Rémi Denis-Courmont
02594c8c01 lavc/pixblockdsp: rework R-V V get_pixels_unaligned
As in the aligned case, we can use VLSE64.V, though the way of doing so
gets more convoluted, so the performance gains are more modest:

get_pixels_unaligned_c:       126.7
get_pixels_unaligned_rvv_i32: 145.5 (before)
get_pixels_unaligned_rvv_i64:  62.2 (after)

For the reference, those are the aligned benchmarks (unchanged) on the
same T-Head C908 hardware:

get_pixels_c:                 126.7
get_pixels_rvi:                85.7
get_pixels_rvv_i64:            33.2
2023-11-06 19:42:49 +02:00
Rémi Denis-Courmont
f68ad5d2de lavc/sbrdsp: R-V V sbr_hf_g_filt
hf_g_filt_c:      1552.5
hf_g_filt_rvv_f32: 679.5
2023-11-06 19:42:49 +02:00
Andreas Rheinhardt
3f890fbfd9 avcodec/cbs_h2645: Fix leak of SPS VUI extension data
Fixes: VUI extension leak
Fixes: 63004/clusterfuzz-testcase-minimized-ffmpeg_BSF_VVC_METADATA_fuzzer-4928832253329408

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-04 01:27:41 +01:00
Andreas Rheinhardt
5935423e1e avcodec/aactab: Deduplicate swb_offset_960 tabs
swb_offset_960_48 and swb_offset_960_32 coincide.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-04 01:24:09 +01:00
Michael Niedermayer
03a4aa9699
avcodec/flicvideo: consider width in copy loops
Fixes: out of array write
Fixes: 63520/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLIC_fuzzer-4876198087622656
Regression since: c7f8d42c12 (was not posted to ffmpeg-devel)

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-03 22:16:33 +01:00
Rémi Denis-Courmont
d06fd18f8f lavc/sbrdsp: R-V V neg_odd_64
With 128-bit vectors, this is mostly pointless but also harmless.
Performance gains should be more noticeable with larger vector sizes.

neg_odd_64_c:       76.2
neg_odd_64_rvv_i64: 74.7
2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont
b0aba7dd0c lavc/sbrdsp: R-V V sum_square
sum_square_c:       803.5
sum_square_rvv_f32: 283.2
2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont
86bee42473 lavc/sbrdsp: R-V V sum64x5
sum64x5_c:       385.0
sum64x5_rvv_f32: 116.0
2023-11-01 22:53:26 +02:00
Andreas Rheinhardt
eba73142ad avcodec/vp9: Join extradata buffer pools
Up until now each thread had its own buffer pool for extradata
buffers when using frame-threading. Each thread can have at most
three references to extradata and in the long run, each thread's
bufferpool seems to fill up with three entries. But given
that at any given time there can be at most 2 + number of threads
entries used (the oldest thread can have two references to preceding
frames that are not currently decoded and each thread has its own
current frame, but there can be no references to any other frames),
this is wasteful. This commit therefore uses a single buffer pool
that is synced across threads.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:16:02 +01:00
Andreas Rheinhardt
0c44f63b02 avcodec/refstruct: Allow to share pools
To do this, make FFRefStructPool itself refcounted according
to the RefStruct API.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:15:54 +01:00
Andreas Rheinhardt
92abc7266b avcodec/vaapi_encode: Use RefStruct pool API, stop abusing AVBuffer API
Up until now, the VAAPI encoder uses fake data with the
AVBuffer-API: The data pointer does not point to real memory,
but is instead just a VABufferID converted to a pointer.
This has probably been copied from the VAAPI-hwcontext-API
(which presumably does it to avoid allocations).

This commit changes this without causing additional allocations
by switching to the RefStruct-pool API. This also fixes an
unchecked av_buffer_ref().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:14:22 +01:00
Andreas Rheinhardt
8c0350f57e avcodec/vp9: Use RefStruct-pool API for extradata
It avoids allocations and corresponding error checks.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:14:06 +01:00
Andreas Rheinhardt
090d9956fd avcodec/refstruct: Allow to always return zeroed pool entries
This is in preparation for the following commit.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:13:40 +01:00
Andreas Rheinhardt
e01e30ede1 avcodec/nvdec: Use RefStruct-pool API for decoder pool
It involves less allocations, in particular no allocations
after the entry has been created. Therefore creating a new
reference from an existing one can't fail and therefore
need not be checked. It also avoids indirections and casts.

Also note that nvdec_decoder_frame_init() (the callback
to initialize new entries from the pool) does not use
atomics to read and replace the number of entries
currently used by the pool. This relies on nvdec (like
most other hwaccels) not being run in a truely frame-threaded
way.

Tested-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:13:01 +01:00
Andreas Rheinhardt
fd2e65871c avcodec/hevcdec: Use RefStruct-pool API instead of AVBufferPool API
It involves less allocations and therefore has the nice property
that deriving a reference from a reference can't fail,
simplifying hevc_ref_frame().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:10:20 +01:00
Andreas Rheinhardt
736b510fcc avcodec/h264dec: Use RefStruct-pool API instead of AVBufferPool API
It involves less allocations and therefore has the nice property
that deriving a reference from a reference can't fail.
This allows for considerable simplifications in
ff_h264_(ref|replace)_picture().
Switching to the RefStruct API also allows to make H264Picture
smaller, because some AVBufferRef* pointers could be removed
without replacement.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:07:56 +01:00
Andreas Rheinhardt
26c0a7321f avcodec/refstruct: Add RefStruct pool API
Very similar to the AVBufferPool API, but with some differences:
1. Reusing an already existing entry does not incur an allocation
at all any more (the AVBufferPool API needs to allocate an AVBufferRef).
2. The tasks done while holding the lock are smaller; e.g.
allocating new entries is now performed without holding the lock.
The same goes for freeing.
3. The entries are freed as soon as possible (the AVBufferPool API
frees them in two batches: The first in av_buffer_pool_uninit() and
the second immediately before the pool is freed when the last
outstanding entry is returned to the pool).
4. The API is designed for objects and not naked buffers and
therefore has a reset callback. This is called whenever an object
is returned to the pool.
5. Just like with the RefStruct API, custom allocators are not
supported.

(If desired, the FFRefStructPool struct itself could be made
reference counted via the RefStruct API; an FFRefStructPool
would then be freed via ff_refstruct_unref().)

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-11-01 20:07:23 +01:00
Rémi Denis-Courmont
92bcc6703a lavc/pixblockdsp: remove R-V V get_pixels_16
In the aligned case, the existing RVI assembler is actually much
faster. In the unaligned case, there is nothing much to gain over C.
2023-11-01 19:27:22 +02:00
Rémi Denis-Courmont
28840cf499 lavc/jpeg2000dsp: R-V V rct_int
jpeg2000_rct_int_c:       2592.2
jpeg2000_rct_int_rvv_i32: 1154.2
2023-11-01 18:52:55 +02:00
Rémi Denis-Courmont
73dea2bb91 lavc/jpeg2000dsp: R-V V ict_float
jpeg2000_ict_float_c:       3112.2
jpeg2000_ict_float_rvv_f32: 1225.0
2023-11-01 18:52:55 +02:00
Rémi Denis-Courmont
b2a441a3be lavc/jpeg2000dsp: make coefficients extern
This is so that they can be loaded from assembler, rather than
duplicated.
2023-11-01 18:52:55 +02:00
Michael Niedermayer
a5259f326b
avcodec/vlc: Pass VLC_MULTI_ELEM directly not by pointer
This makes the code more testable as uninitialized fields are 0
and not random values from the last call

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-01 16:40:22 +01:00
Michael Niedermayer
8516609edd
avcodec/vlc: Replace mysterious max computation code in multi vlc
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-01 16:40:21 +01:00
Michael Niedermayer
356b1ba765
avcodec/vlc: Skip subtable entries in multi VLC
These entries do not correspond to VLC symbols that can be used
they do corrupt various variables like min/max bits

This also no longer assumes that there is a single non subtable
entry
Probably fixes some infinite loops too

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-01 16:40:21 +01:00
Michael Niedermayer
2817efbba3
avcodec/dovi_rpu: Use 64 bit in get_us/se_coeff()
Fixes: shift exponent 32 is too large for 32-bit type 'int'
Fixes: 63151/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5067531154751488

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-01 16:40:20 +01:00
Michael Niedermayer
2def617787
avcodec/apedec: Fix integer overflow in predictor_decode_stereo_3950()
Fixes: signed integer overflow: 1900031961 + 553590817 cannot be represented in type 'int'
Fixes: 63061/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APE_fuzzer-5166188298371072

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-01 16:40:20 +01:00
Michael Niedermayer
68cc1744db
avcodec/evc_parse: Check tid
The check is based on not infinite looping. It is likely
a more strict check can be done

Fixes: Infinite loop
Fixes: 62473/clusterfuzz-testcase-minimized-ffmpeg_BSF_EVC_FRAME_MERGE_fuzzer-5719883750703104
Fixes: 62765/clusterfuzz-testcase-minimized-ffmpeg_dem_EVC_fuzzer-6448531252314112
Fixes: 63378/clusterfuzz-testcase-minimized-ffmpeg_dem_MPEGPS_fuzzer-6504993844494336

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: "Dawid Kozinski/Multimedia (PLT) /SRPOL/Staff Engineer/Samsung Electronics" <d.kozinski@samsung.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-01 16:40:19 +01:00
Michael Niedermayer
d35eecd24f
avcodec/evc_parse: remove pow() and log2()
The use of float based functions is both unneeded and wrong due to unpredictable rounding

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-01 16:40:03 +01:00
Andreas Rheinhardt
f2687a3b69 avcodec/wmavoice: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
5615f9dab4 avcodec/wmaprodec: Avoid superfluous VLC structures
For all VLCs here, the number of bits of the VLC is write-only,
because it is hardcoded at the call site. Therefore one can replace
these VLC structures with the only thing that is actually used:
The pointer to the VLCElem table. And in most cases one can even
avoid this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
7e2120c4d9 avcodec/mpeg12: Avoid unnecessary VLC structures
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying tables directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
c9aa80c313 avcodec/mpegaudiodec_common: Avoid superfluous VLC structures
For some VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
5dc31bc67b avcodec/aacps_common: Apply offset for VLCs during init
This avoids having to apply it later after every get_vlc2().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
40a8cb9e6c avcodec/aacps_common: Combine huffman tabels
This allows to avoid the relocations inherent in an array
to individual tables; it also reduces padding.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
774611a349 avcodec/aacps_common: Switch to ff_vlc_init_tables_from_lengths()
It allows to replace codes of type uint16_t or uint32_t
by symbols of type uint8_t.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
eb422c606a avcodec/aacps_common: Avoid superfluous VLC structures
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
4fe91e3676 avcodec/aacps: Move initializing common stuff to aacdec_common.c
ff_ps_init() initializes some tables for AAC parametric stereo
and some of them are only valid for the fixed- or floating-point
decoder, whereas others (namely VLCs) are valid for both.
The latter are therefore initialized by ff_ps_init_common()
and because the two versions of ff_ps_init() can be run
concurrently, it is guarded by an AVOnce.

Yet now that there is ff_aacdec_common_init_once() there is
a better way to do this: Call ff_ps_init_common()
from ff_aacdec_common_init_once(). That way there is no need
to guard ff_ps_init_common() by an AVOnce any more.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
7f66d9d6c5 avcodec/aacdec_common: Apply offset for SBR VLCs during init
This avoids having to apply it later after every get_vlc2().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
1aca4e7fc5 avcodec/aacdec_common: Combine huffman tabs
This allows to avoid the relocations inherent in a table
to individual tables; it also reduces padding.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
2c131f126d avcodec/aacdec_common: Switch to ff_vlc_init_tables_from_lengths()
It allows to replace code tables of type uint32_t or uint16_t
by symbols of type uint8_t. It is also faster.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
0b4e69cc87 avcodec/aacdec_common: Avoid superfluous VLC structures for SBR VLCs
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
22d60524d8 avcodec/aacsbr_template: Deduplicate VLCs
The VLCs, their init code and the tables used for initialization
are currently duplicated for the floating- and fixed-point decoders.
This commit stops doing so and moves this stuff to aacdec_common.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Andreas Rheinhardt
4d6042e9d7 avcodec/aacdec_common: Avoid superfluous VLC structures
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table. And in some cases one can even
avoid this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 21:44:48 +01:00
Benjamin Cheng
4536de3769 vulkan_h264: fix long-term ref handling
h->long_ref isn't guaranteed to be contiguously filled. Use the approach
from both vaapi_h264 and vdpau_h264 which goes through the 16 frames in
h->long_ref to find the LTR entries.

Fixes MR2_MW_A.264 from JVT-AVC_V1.
2023-10-31 21:35:23 +01:00
Andreas Rheinhardt
1e63e24c76 avcodec/aactab: Improve included headers
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
30deaba97b avcodec/aacdec_template: Don't init unused table for fixed-point decoder
The fixed-point decoder actually does not use the floating-point
tables initialized by ff_aac_tableinit() at all. So don't
initialize them for it; instead merge initializing these tables
into ff_aac_float_common_init() which is already the function
for the common static initializations of the floating-point
AAC decoder and the (also floating-point) AAC encoder.
Doing so saves also one AVOnce.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
3b080fe7af avcodec/aacdec_template: Deduplicate VLCs
They (as well as their init code) are currently duplicated
for the floating- and fixed-point decoders.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
1f15a7e9a8 avcodec/aacdectab: Deduplicate common decoder tables
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
8c1e71a811 avcodec/aacps: Pass logctx as void* instead of AVCodecContext*
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
70b5d9c569 avcodec/aacps: Remove unused AVCodecContext* parameter from ff_ps_apply
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
36b5f71b1f avcodec/msmpeg4dec: Avoid superfluous VLC structures
For all VLCs here, the number of bits of the VLC is write-only,
because it is hardcoded at the call site. Therefore one can replace
these VLC structures with the only thing that is actually used:
The pointer to the VLCElem table. And in some cases one can even
avoid this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
5a694d62c5 avcodec/mpeg4videodec: Avoid superfluous VLC structures
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table. And in some cases one can even
avoid this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
8c39b2bca7 avcodec/mss4: Partially inline max_depth and nb_bits of VLC
It is known at compile-time for the vec_entry_vlcs.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
05577d2c76 avcodec/indeo2: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
25b9ff2780 avcodec/4xm: Avoid unnecessary VLC structures
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
e5dcde620d avcodec/vc1: Avoid superfluous VLC structures
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table. And in some cases one can even
avoid this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
fd4cb6ebee avcodec/speedhqdec: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
0a610e22c1 avcodec/lagarith: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
e3ad5b9784 avcodec/imm4: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
f6c5d04b6d avcodec/mimic: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
827d0325a9 avcodec/mobiclip: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and only VLC.table needs to be retained.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
36e7f9b339 avcodec/vqcdec: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
99ed510d4b avcodec/mv30: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
b60a3f70be avcodec/wnv1: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
1ae750a16e avcodec/rv34: Constify pointer to static object
Said object is only allowed to be modified during its
initialization and is immutable afterwards.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
716ddc8c62 avcodec/rv34: Avoid superfluous VLC structures
For most VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
73fa6d486d avcodec/vp3: Reindent after the previous commits
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
75c6a253a4 avcodec/vp3: Avoid complete VLC struct, only use VLCElem*
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
6c7a344b65 avcodec/vp3: Share coefficient VLCs between threads
These VLCs are very big: The VP3 one have 164382 elements
but due to the overallocation enough memory for 313344 elements
are allocated (1.195 MiB with sizeof(VLCElem) == 4);
for VP4 the numbers are very similar, namely 311296 and 164392
elements. Since 1f4cf92cfb, each
frame thread has its own copy of these VLCs.

This commit fixes this by sharing these VLCs across threads.
The approach used here will also make it easier to support
stream reconfigurations in case of frame-multithreading
in the future.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
7fee90efac avcodec/imc: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
6fb96ef755 avcodec/atrac9dec: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00