FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-02 03:06:28 +02:00

Author	SHA1	Message	Date
Anton Khirnov	08bebeb1be	Revert "all: Don't set AVClass.item_name to its default value" Some callers assume that item_name is always set, so this may be considered an API break. This reverts commit `0c6203c97a`.	2024-01-20 10:34:48 +01:00
James Almer	0a5813fc68	avcodec/vvcdec: allocate and store structs on their own within the table list Fixes "runtime error: member access within misaligned address 0xf00 for type 'struct bar', which requires # byte alignment" errors under GCC ubsan. Reviewed-by: Nuo Mi <nuomi2021@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2024-01-19 08:53:32 -03:00
sunyuechi	8e23ebe6f9	lavc/svq1enc: R-V V ssd_int8_vs_int16 C908 ssd_int8_vs_int16_c: 207.7 ssd_int8_vs_int16_rvv_i32: 14.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-01-17 17:49:54 +02:00
Nuo Mi	d595e0a0b6	avcodec/vvcdec: misc, constify hor_ctu_edge Signed-off-by: James Almer <jamrial@gmail.com>	2024-01-17 10:14:50 -03:00
Nuo Mi	375dcf469e	avcodec/vvcdec: deblock, fix uninitialized values see https://fate.ffmpeg.org/report.cgi?slot=x86_64-archlinux-gcc-valgrind&time=20240105201935 If tc is zero, the max_len_q, max_len_p are uninitialized. Reported-by: James Almer <jamrial@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2024-01-17 10:14:35 -03:00
aybe aybe	36b402f80d	avcodec/mdec: DC reading for STRv1 is like STRv2 As I understand, support for .STR files is broken for almost 10 years now (since `161442ff2c` it seems). Currently, ffmpeg fails with tons of errors like this on version 1 STRs, e.g. Wipeout 1: [mdec @ 00000000027c72c0] ac-tex damaged at 1 9 What happens is that only the audio is present in the video file. Anyway, that one character patch fixes the problem, video is now rendered. Signed-off-by: aybe <aybe@users.noreply.github.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-01-16 01:34:56 +01:00
sunyuechi	0befc1fca7	lvac/svqenc: add ff_svq1enc_init This is for clarity and use in testing, consistent with other parts of the code Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-01-15 19:03:03 +02:00
Rémi Denis-Courmont	278b4b60d6	lavc/takdsp: R-V V decorrelate_sf decorrelate_sf_c: 259.2 decorrelate_sf_rvv_i32: 45.5	2024-01-15 19:00:25 +02:00
yuanhecai	a87a52ed0b	avcodec/hevc: Add ff_hevc_idct_32x32_lasx asm opt tests/checkasm/checkasm: C LSX LASX hevc_idct_32x32_8_c: 1243.0 211.7 101.7 Speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads is 1fps(56fps-->57fps). Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-01-12 23:35:40 +01:00
jinbo	9239081db3	avcodec/hevc: Add asm opt for the following functions tests/checkasm/checkasm: C LSX LASX put_hevc_qpel_uni_h4_8_c: 5.7 1.2 put_hevc_qpel_uni_h6_8_c: 12.2 2.7 put_hevc_qpel_uni_h8_8_c: 21.5 3.2 put_hevc_qpel_uni_h12_8_c: 47.2 9.2 7.2 put_hevc_qpel_uni_h16_8_c: 87.0 11.7 9.0 put_hevc_qpel_uni_h24_8_c: 188.2 27.5 21.0 put_hevc_qpel_uni_h32_8_c: 335.2 46.7 28.5 put_hevc_qpel_uni_h48_8_c: 772.5 104.5 65.2 put_hevc_qpel_uni_h64_8_c: 1383.2 142.2 109.0 put_hevc_epel_uni_w_v4_8_c: 5.0 1.5 put_hevc_epel_uni_w_v6_8_c: 10.7 3.5 2.5 put_hevc_epel_uni_w_v8_8_c: 18.2 3.7 3.0 put_hevc_epel_uni_w_v12_8_c: 40.2 10.7 7.5 put_hevc_epel_uni_w_v16_8_c: 70.2 13.0 9.2 put_hevc_epel_uni_w_v24_8_c: 158.2 30.2 22.5 put_hevc_epel_uni_w_v32_8_c: 281.0 52.0 36.5 put_hevc_epel_uni_w_v48_8_c: 631.7 116.7 82.7 put_hevc_epel_uni_w_v64_8_c: 1108.2 207.5 142.2 put_hevc_epel_uni_w_h4_8_c: 4.7 1.2 put_hevc_epel_uni_w_h6_8_c: 9.7 3.5 2.7 put_hevc_epel_uni_w_h8_8_c: 17.2 4.2 3.5 put_hevc_epel_uni_w_h12_8_c: 38.0 11.5 7.2 put_hevc_epel_uni_w_h16_8_c: 69.2 14.5 9.2 put_hevc_epel_uni_w_h24_8_c: 152.0 34.7 22.5 put_hevc_epel_uni_w_h32_8_c: 271.0 58.0 40.0 put_hevc_epel_uni_w_h48_8_c: 597.5 136.7 95.0 put_hevc_epel_uni_w_h64_8_c: 1074.0 252.2 168.0 put_hevc_epel_bi_h4_8_c: 4.5 0.7 put_hevc_epel_bi_h6_8_c: 9.0 1.5 put_hevc_epel_bi_h8_8_c: 15.2 1.7 put_hevc_epel_bi_h12_8_c: 33.5 4.2 3.7 put_hevc_epel_bi_h16_8_c: 59.7 5.2 4.7 put_hevc_epel_bi_h24_8_c: 132.2 11.0 put_hevc_epel_bi_h32_8_c: 232.7 20.2 13.2 put_hevc_epel_bi_h48_8_c: 521.7 45.2 31.2 put_hevc_epel_bi_h64_8_c: 949.0 71.5 51.0 After this patch, the performance of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads improves by 1fps (55fps-->56fps). Change-Id: I8cc1e41daa63ca478039bc55d1ee8934a7423f51 Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-01-12 23:35:40 +01:00
jinbo	1f642b99af	avcodec/hevc: Add epel_uni_w_hv4/6/8/12/16/24/32/48/64 asm opt tests/checkasm/checkasm: C LSX LASX put_hevc_epel_uni_w_hv4_8_c: 9.5 2.2 put_hevc_epel_uni_w_hv6_8_c: 18.5 5.0 3.7 put_hevc_epel_uni_w_hv8_8_c: 30.7 6.0 4.5 put_hevc_epel_uni_w_hv12_8_c: 63.7 14.0 10.7 put_hevc_epel_uni_w_hv16_8_c: 107.5 22.7 17.0 put_hevc_epel_uni_w_hv24_8_c: 236.7 50.2 31.7 put_hevc_epel_uni_w_hv32_8_c: 414.5 88.0 53.0 put_hevc_epel_uni_w_hv48_8_c: 917.5 197.7 118.5 put_hevc_epel_uni_w_hv64_8_c: 1617.0 349.5 203.0 After this patch, the performance of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads improves by 3fps (52fps-->55fps). Change-Id: If067e394cec4685c62193e7adb829ac93ba4804d Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-01-12 23:35:40 +01:00
jinbo	6c6bf18ce8	avcodec/hevc: Add qpel_uni_w_v\|h4/6/8/12/16/24/32/48/64 asm opt tests/checkasm/checkasm: C LSX LASX put_hevc_qpel_uni_w_h4_8_c: 6.5 1.7 1.2 put_hevc_qpel_uni_w_h6_8_c: 14.5 4.5 3.7 put_hevc_qpel_uni_w_h8_8_c: 24.5 5.7 4.5 put_hevc_qpel_uni_w_h12_8_c: 54.7 17.5 12.0 put_hevc_qpel_uni_w_h16_8_c: 96.5 22.7 13.2 put_hevc_qpel_uni_w_h24_8_c: 216.0 51.2 33.2 put_hevc_qpel_uni_w_h32_8_c: 385.7 87.0 53.2 put_hevc_qpel_uni_w_h48_8_c: 860.5 192.0 113.2 put_hevc_qpel_uni_w_h64_8_c: 1531.0 334.2 200.0 put_hevc_qpel_uni_w_v4_8_c: 8.0 1.7 put_hevc_qpel_uni_w_v6_8_c: 17.2 4.5 put_hevc_qpel_uni_w_v8_8_c: 29.5 6.0 5.2 put_hevc_qpel_uni_w_v12_8_c: 65.2 16.0 11.7 put_hevc_qpel_uni_w_v16_8_c: 116.5 20.5 14.0 put_hevc_qpel_uni_w_v24_8_c: 259.2 48.5 37.2 put_hevc_qpel_uni_w_v32_8_c: 459.5 80.5 56.0 put_hevc_qpel_uni_w_v48_8_c: 1028.5 180.2 126.5 put_hevc_qpel_uni_w_v64_8_c: 1831.2 319.2 224.2 Speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads is 4fps(48fps-->52fps). Change-Id: I1178848541d90083869225ba98a02e6aa8bb8c5a Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-01-12 23:35:40 +01:00
jinbo	a28eea2a27	avcodec/hevc: Add pel_uni_w_pixels4/6/8/12/16/24/32/48/64 asm opt tests/checkasm/checkasm: C LSX LASX put_hevc_pel_uni_w_pixels4_8_c: 2.7 1.0 put_hevc_pel_uni_w_pixels6_8_c: 6.2 2.0 1.5 put_hevc_pel_uni_w_pixels8_8_c: 10.7 2.5 1.7 put_hevc_pel_uni_w_pixels12_8_c: 23.0 5.5 5.0 put_hevc_pel_uni_w_pixels16_8_c: 41.0 8.2 5.0 put_hevc_pel_uni_w_pixels24_8_c: 91.0 19.7 13.2 put_hevc_pel_uni_w_pixels32_8_c: 161.7 32.5 16.2 put_hevc_pel_uni_w_pixels48_8_c: 354.5 73.7 43.0 put_hevc_pel_uni_w_pixels64_8_c: 641.5 130.0 64.2 Speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads is 1fps(47fps-->48fps). Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-01-12 23:35:40 +01:00
jinbo	cfbdda607d	avcodec/hevc: Add add_residual_4/8/16/32 asm opt After this patch, the performance of decoding H265 4K 30FPS 30Mbps on 3A6000 with 8 threads improves by 2fps (45fps-->47fps). Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-01-12 23:35:40 +01:00
Zhao Zhili	13c1fea92f	avcodec/videotoolboxenc: fix setting avctx color_range doesn't work Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-01-12 10:49:18 +08:00
Nuo Mi	8d0dda8260	vvcdec: reuse h26x/h2656_deblock_template.c	2024-01-11 22:53:05 +08:00
Nuo Mi	ae0a83477b	hevcdec: move deblock template to h26x/h2656_deblock_template.c	2024-01-11 22:53:05 +08:00
Nuo Mi	69e179e8bf	vvcdec: reuse h26x/h2656_sao_template.c	2024-01-11 22:53:05 +08:00
Nuo Mi	d2fe23b835	hevcdec: move sao template to h26x/h2656_sao_template.c	2024-01-11 22:53:05 +08:00
Clément Bœsch	af509f9957	avcodec/proresenc_anatoliy: do not write into chroma reserved bitfields The layout for the frame flags is as follows: chroma_format u(2) reserved u(2) interlace_mode u(2) reserved u(2) chroma_format has 2 allowed values: 0: reserved 1: reserved 2: 4:2:2 3: 4:4:4 interlace_mode has 3 allowed values: 0: progressive 1: tff 2: bff 3: reserved 0x80 is what we expect for "422 not interlaced", and the extra 0x2 from 0x82 is actually writing into the reserved bits.	2024-01-10 23:33:02 +01:00
Clément Bœsch	21f7a814ea	avcodec/proresenc_anatoliy: do not write into alpha reserved bitfields This byte represents 4 reserved bits followed by 4 alpha_channel_type bits. alpha_channel_type currently has 3 different defined values: 0 (no alpha), 1 (8b alpha), and 2 (16b alpha); all the other values are reserved. The 4 initial reserved bits are expected to be 0.	2024-01-10 23:33:02 +01:00
Clément Bœsch	6d35911667	avcodec/proresenc_kostya: do not write into alpha reserved bitfields This byte represents 4 reserved bits followed by 4 alpha_channel_type bits. alpha_channel_type currently has 3 different defined values: 0 (no alpha), 1 (8b alpha), and 2 (16b alpha); all the other values are reserved. This part is correctly written (alpha_bits>>3 does the correct thing), but the 4 initial bits are reserved.	2024-01-10 23:33:02 +01:00
Clément Bœsch	aa7ccd0ce9	avcodec/proresenc_kostya: use a compatible bitstream version Quoting SMPTE RDD 36:2015: A decoder shall abort if it encounters a bitstream with an unsupported bitstream_version value. If 0, the value of the chroma_format syntax element shall be 2 (4:2:2 sampling) and the value of the alpha_channel_type element shall be 0 (no encoded alpha); if 1, any permissible value may be used for those syntax elements. So if we're not in 4:2:2 or if there is alpha, we are not allowed to use version 0.	2024-01-10 23:33:02 +01:00
Clément Bœsch	85cb1b9b20	avcodec/proresenc_anatoliy: use a compatible bitstream version Quoting SMPTE RDD 36:2015: A decoder shall abort if it encounters a bitstream with an unsupported bitstream_version value. If 0, the value of the chroma_format syntax element shall be 2 (4:2:2 sampling) and the value of the alpha_channel_type element shall be 0 (no encoded alpha); if 1, any permissible value may be used for those syntax elements. So if we're not in 4:2:2 or if there is alpha, we are not allowed to use version 0.	2024-01-10 23:33:02 +01:00
Clément Bœsch	1081bae94d	avcodec/proresenc_kostya: make a few cosmetics in encode_acs() Unify cosmetics with encode_acs() from proresenc_anatoliy.	2024-01-10 14:08:00 +01:00
Clément Bœsch	cc2206d142	avcodec/proresenc_anatoliy: make a few cosmetics in encode_acs() This makes the function pretty much identical to the function of the same name in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	8fb2e96d7e	avcodec/proresenc_anatoliy: execute AC run/level FFMIN() at assignment This matches the logic from the function of the same name in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	096a69ad43	avcodec/proresenc_anatoliy: rework inner loop in encode_acs() This matches the logic from the function of the same name in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	25f28b9308	avcodec/proresenc_anatoliy: avoid using ff_ prefix in function arguments	2024-01-10 14:08:00 +01:00
Clément Bœsch	29fd3f75fe	avcodec/proresenc_anatoliy: rework encode_ac_coeffs() prototype This makes the prototype closer to the function of the same name in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	3543100a05	avcodec/proresenc_anatoliy: replace get_level() with FFABS() This matches the code from proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	ed8692446c	avcodec/proresenc_anatoliy: cosmetics to make encode_dcs() identical to the one in Kostya encoder	2024-01-10 14:08:00 +01:00
Clément Bœsch	e87bc5641c	avcodec/proresenc_anatoliy: remove TO_GOLOMB2() A few cosmetics aside, this makes the function identical to the one with the same name in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	a026f98f29	avcodec/proresenc_anatoliy: only pass down the first scale to encode_dcs() This matches encode_dcs() prototype from proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	1aa7d504ec	avcodec/proresenc_anatoliy: shuffle declarations around in encode_dcs() This makes the function closer to the same function in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	87ba89281c	avcodec/proresenc_anatoliy: rename TO_GOLOMB() to MAKE_CODE() This matches the name in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	7af42088d7	avcodec/proresenc_kostya: add Anatoliy copyright Both encoders share a lot of code from both authors.	2024-01-10 14:08:00 +01:00
Clément Bœsch	d269f84199	avcodec/proresenc_anatoliy: remove IS_NEGATIVE() macro This makes the function closer to encode_acs() in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	9c7f6d89fd	avcodec/proresenc_anatoliy: rename new_dc to dc This makes the function closer to encode_dcs() in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	9258f4eaf9	avcodec/proresenc_anatoliy: compute sign only once This makes the function closer to encode_dcs() in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	17392ca84f	avcodec/proresenc_anatoliy: import GET_SIGN() macro from Kostya encoder and use it	2024-01-10 14:08:00 +01:00
Clément Bœsch	273f591a3d	avcodec/proresenc_anatoliy: directly work with blocks in encode_dcs() This makes the function closer to encode_dcs() in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	dadc5ac24a	avcodec/proresenc_anatoliy: reduce DC encoding function prototype differences with Kostya encoder	2024-01-10 14:08:00 +01:00
Clément Bœsch	8e42d3aba0	avcodec/proresenc_anatoliy: execute codebook FFMIN() at assignment This makes the function closer to encode_dcs() in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	43baba4647	avcodec/proresenc_anatoliy: rename new_code/code to code/codebook This makes the function closer to encode_dcs() in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	c44cd371ca	avcodec/proresenc_anatoliy: inline QSCALE() Also replaces 16384 with 0x4000. This makes the function slightly closer to same function in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	1574475033	avcodec/proresenc_anatoliy: rework encode_codeword() prototype This matches the function of the same name in proresenc_kostya.	2024-01-10 14:08:00 +01:00
Clément Bœsch	1832bd7838	avcodec/proresenc_anatoliy: shuffle encode_codeword() code to match Kostya encoder Code is functionally identical, it's just rename of variables, cosmetics and branch logic shuffling.	2024-01-10 14:08:00 +01:00
Clément Bœsch	3885d2493d	avcodec/proresenc_anatoliy: use FRAME_ID defined in proresdata.h	2024-01-10 14:08:00 +01:00
Clément Bœsch	d6e0fb7c92	avcodec/proresenc_kostya: simplify quantization matrix bytestream writing	2024-01-10 14:08:00 +01:00
Clément Bœsch	cbee015867	avcodec/proresenc_kostya: fix chroma quantisation matrix in frame header Most of the time the quantisation matrices are the same, it only matters with the proxy profile.	2024-01-10 14:08:00 +01:00
Clément Bœsch	631fa19ee0	avcodec/proresenc_kostya: save a few operations in DC encoding This matches the logic from proresenc_anatoliy.	2024-01-10 14:08:00 +01:00
Clément Bœsch	f06f2cf16a	avcodec/proresenc_anatoliy: move DC codebook LUT to shared proresdata This is going to be shared with proresenc_kostya in the upcoming commit.	2024-01-10 14:08:00 +01:00
Clément Bœsch	9f547e2f15	avcodec/proresenc_anatoliy: remove duplicated define This is already defined in proresdata.h	2024-01-10 14:08:00 +01:00
Clément Bœsch	c35733006a	avcodec/proresenc_kostya: remove one LUT indirection for run/level to codebook mapping This is following the same logic as proresenc_anatoliy.	2024-01-10 14:08:00 +01:00
Clément Bœsch	3ba52f18e4	avcodec/proresenc_anatoliy: move run/lev to codebook LUT to shared proresdata This is going to be shared with proresenc_kostya in the upcoming commit.	2024-01-10 14:08:00 +01:00
Clément Bœsch	e940baa65b	avcodec/proresenc_kostya: remove redundant codebook assignments This is already assigned at declaration.	2024-01-10 14:08:00 +01:00
Clément Bœsch	e453efcfbc	avcodec/proresenc_kostya: remove unused plane factor variables	2024-01-10 14:08:00 +01:00
Clément Bœsch	2ac88c1362	avcodec/proresenc_kostya: remove an unnecessary parenthesis level in MAKE_CODE() macro	2024-01-10 14:08:00 +01:00
Marton Balint	363b3ec98a	all: use av_channel_layout_describe_bprint instead of av_channel_layout_describe in a few places Where an AVBPrint buffer is used later anyway. Signed-off-by: Marton Balint <cus@passwd.hu>	2024-01-07 22:47:22 +01:00
James Almer	b95ccfcada	avcodec/vvc_thread: don't use an anonymous union Should fix compilation with old GCC. Signed-off-by: James Almer <jamrial@gmail.com>	2024-01-06 23:28:03 -03:00
Nuo Mi	02d600c568	vvcdec: add TODO for combining transform, lmcs_scale_chroma, and add_residual Thanks for the suggestion from Lynne.	2024-01-07 09:01:04 +08:00
Nuo Mi	26769024d1	avcodec/vvcdec: decode extradata to support container formats For example: wget https://www.elecard.com/storage/video/NovosobornayaSquare_1920x1080.mp4 ./ffplay NovosobornayaSquare_1920x1080.mp4	2024-01-07 08:58:43 +08:00
Sam James	2f24f10d9c	libavcodec: fix -Wint-conversion in vulkan Fix warnings (soon to be errors in GCC 14, already so in Clang 15): ``` src/libavcodec/vulkan_av1.c: In function ‘vk_av1_create_params’: src/libavcodec/vulkan_av1.c:183:43: error: initialization of ‘long long unsigned int’ from ‘void *’ makes integer from pointer without a cast [-Wint-conversion] 183 \| .videoSessionParametersTemplate = NULL, \| ^~~~ src/libavcodec/vulkan_av1.c:183:43: note: (near initialization for ‘(anonymous).videoSessionParametersTemplate’) ``` Use Vulkan's VK_NULL_HANDLE instead of bare NULL. Fix Trac ticket #10724. Was reported downstream in Gentoo at https://bugs.gentoo.org/919067. Signed-off-by: Sam James <sam@gentoo.org>	2024-01-06 22:38:55 +01:00
Clément Bœsch	9109273e3b	avcodec/proresenc: fix alpha plane encoding bitstream These functions encode a slice of alpha (1 to 8 macroblocks) which are expected to be encoded as a repeated sequence of "[diff][run-1]", where diff is the running difference of the alpha value and run is how many times that value is expected to be duplicated (within the limit of a grand total of 2048 unpacked samples, corresponding to a slice of 8 MB). Even when run==0 (the run variable semantic is actually "run minus 1"), there is always a diff previously encoded that needs a counter of at least 1. This means we need to call put_alpha_run() unconditionally at the end of the bitstream to account for the last running diff. This commit fixes glitchy playbacks on QuickTime with M2 and M3 hardware (but not M1 for some mysterious reason) with files generated with commands such as: ffmpeg -f lavfi -i testsrc2=d=5:s=912x320,chromakey -c:v prores_aw -profile:v 4 -y aw.mov ffmpeg -f lavfi -i testsrc2=d=5:s=912x320,chromakey -c:v prores_ks -profile:v 4444 -y ks.mov The glitch expresses itself deterministically as blinking black rectangles on random frames (for example on frame 21, 54, 71, 79, ...). Even with the proresdec from FFmpeg, overreads actually happens while reading the run-minus-1 value (around val = get_bits(gb, 4) in unpack_alpha()). This doesn't seem to cause any particular issue because it simply overreads into the next slice, and because the decoder is resilient, but it's still a problem. The investigation leading to this fix was made possible because of paid work for Jitter (https://jitter.video). Fixes ticket #10255.	2024-01-06 17:29:59 +01:00
Clément Bœsch	2142141a16	avcodec/proresenc: make transparency honored in mov/QT In the mov muxer (in mov_write_video_tag()), bits_per_coded_sample will be written under certain conditions and is required to be 32 for the transparency to be honored in QuickTime. prores_kostya already has this setting but prores_anatoliy and prores_videotoolbox didn't.	2024-01-06 17:29:59 +01:00
Wu Jianhua	94949d4770	avcodec/d3d12va_decode: don't change the resource state if the referenced frame is the same as the current frame This commit removes the following warning and error: D3D12 WARNING: ID3D12CommandList::ResourceBarrier: Called on the same subresource(s) of Resource(0x000002236E0E00D0:'Unnamed ID3D12Resource Object') in separate Barrier Descs which is inefficient and likely unintentional. Desc[0] and Desc[1] on (subresource : 4294967295). [RESOURCE_MANIPULATION WARNING #1008: RESOURCE_BARRIER_DUPLICATE_SUBRESOURCE_TRANSITIONS] D3D12 ERROR: ID3D12CommandList::ResourceBarrier: Before state (0x0: D3D12_RESOURCE_STATE_[COMMON\|PRESENT]) of resource (0x000002236E0E00D0:'Unnamed ID3D12Resource Object') (subresource: 0) specified by transition barrier does not match with the state (0x20000: D3D12_RESOURCE_STATE_VIDEO_DECODE_WRITE) specified in the previous call to ResourceBarrier [RESOURCE_MANIPULATION ERROR #527: RESOURCE_BARRIER_BEFORE_AFTER_MISMATCH] Tested-by: Tong Wu <tong1.wu@intel.com> Signed-off-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-05 11:08:17 +08:00
Tong Wu	270cd14bbb	avcodec/dxva2(h264\|mpeg2\|vc1): use av_assert0 instead of assert Signed-off-by: Tong Wu <tong1.wu@intel.com>	2024-01-05 11:06:57 +08:00
Tong Wu	0f01581ccd	avcodec/d3d12va_decode\|dxva2: add a warning to replace assertion Previous assertion was not useful. Now a warning is added to replace it. For get_surface_index, we should return a zero index in case the index is not found. But a warning is necessary to notify. Signed-off-by: Tong Wu <tong1.wu@intel.com>	2024-01-05 11:06:57 +08:00
Tong Wu	56c671c3b0	avcodec/d3d12va_h264: replace assert with av_assert0 Signed-off-by: Tong Wu <tong1.wu@intel.com>	2024-01-05 11:06:57 +08:00
Tong Wu	83e0fcbe03	avcodec/d3d12va_decode: delete an empty line and fix a function alignment Signed-off-by: Tong Wu <tong1.wu@intel.com>	2024-01-05 11:06:57 +08:00
Tong Wu	d18ed2ddb5	avcodec/d3d12va_vp9: fix vp9 max_num_refs value Previous max_num_refs was based on pp.frame_refs plus 1 and could possibly reach the size limit. Actually it should be the size of pp.ref_frame_map plus 1. Signed-off-by: Tong Wu <tong1.wu@intel.com>	2024-01-05 11:06:57 +08:00
Zhao Zhili	33698ef891	avcodec/mpegutils: print axis in debug_info2 For example, ./ffmpeg -nostats -threads 1 -debug qp \ -export_side_data +venc_params \ -i reinit-small_420_9-to-small_420_8.h264 \ -frames 2 \ -f null - New frame, type: B 0 64 128 192 0 313131313131313131313131313129 16 292929292929292929292929292929 32 323232323232323232323232323232 48 323232323232323232323232323232 64 323232323232323232323232323232 80 323232323232323232323232323232 96 323232323030303030303030303030 112 303030303030303030303030303030 128 303030303030303030303030303028 144 313131312929292929292929292929 160 292929292929292929292929292929 176 292929292929292929292929292931 192 312831312631313131312730283131 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-01-04 17:35:11 +08:00
Zhao Zhili	bd48c08f80	avcodec/mpegutils: make debug_info2 thread safe Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-01-04 17:33:56 +08:00
Zhao Zhili	7f900a737f	avcodec/videotoolbox: specify color range for hw frame ctx Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-01-04 17:33:38 +08:00
Zhao Zhili	0f824d792d	avcodec/hevc_parser: fix missing zero_byte at frame beginning The start code is matched against 0x000001, zero_byte was treated as last byte of last frame rather than the beginning of next frame. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-01-05 01:14:33 +08:00
Zhao Zhili	b7ac1f9856	avcodec/hevc_mp4toannexb_bsf: use HEVCNALUnitType instead of integer literal Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-01-05 01:14:33 +08:00
Nuo Mi	301ed950d1	vvcdec: add vvc decoder vvc decoder plug-in to avcodec. split frames into slices/tiles and send them to vvc_thread for further decoding reorder and wait for the frame decoding to be done and output the frame Features: + Support I, P, B frames + Support 8/10/12 bits, chroma 400, 420, 422, and 444 and range extension + Support VVC new tools like MIP, CCLM, AFFINE, GPM, DMVR, PROF, BDOF, LMCS, ALF + 295 conformace clips passed - Not support RPR, IBC, PALETTE, and other minor features yet Performance: C code FPS on an i7-12700K (x86): BQTerrace_1920x1080_60_10_420_22_RA.vvc 93.0 Chimera_8bit_1080P_1000_frames.vvc 184.3 NovosobornayaSquare_1920x1080.bin 191.3 RitualDance_1920x1080_60_10_420_32_LD.266 150.7 RitualDance_1920x1080_60_10_420_37_RA.266 170.0 Tango2_3840x2160_60_10_420_27_LD.266 33.7 C code FPS on a M1 Mac Pro (ARM): BQTerrace_1920x1080_60_10_420_22_RA.vvc 58.7 Chimera_8bit_1080P_1000_frames.vvc 153.3 NovosobornayaSquare_1920x1080.bin 150.3 RitualDance_1920x1080_60_10_420_32_LD.266 105.0 RitualDance_1920x1080_60_10_420_37_RA.266 133.0 Tango2_3840x2160_60_10_420_27_LD.266 21.7 Asm optimizations still working in progress. please check https://github.com/ffvvc/FFmpeg/wiki#performance-data for the latest Contributors (based on code merge order): Nuo Mi <nuomi2021@gmail.com> Xu Mu <toxumu@outlook.com> Frank Plowman <post@frankplowman.com> Shaun Loo <shaunloo10@gmail.com> Wu Jianhua <toqsxw@outlook.com> Thank you for reporting issues and providing performance reports: Łukasz Czech <lukaszcz18@wp.pl> Xu Fulong <839789740@qq.com> Thank you for providing review comments: Ronald S. Bultje <rsbultje@gmail.com> James Almer <jamrial@gmail.com> Andreas Rheinhardt <andreas.rheinhardt@outlook.com> Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:12 +08:00
Nuo Mi	e7ef457d6b	vvcdec: add CTU thread logical This is the main entry point for the CTU (Coding Tree Unit) decoder. The code will divide the CTU decoder into several stages. It will check the stage dependencies and run the stage decoder. Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:12 +08:00
Nuo Mi	07f75d5e02	vvcdec: add CTU parser Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:12 +08:00
Nuo Mi	b49575f4cf	vvcdec: add dsp init and inv transform Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:12 +08:00
Nuo Mi	02c1455b44	vvcdec: add LMCS, Deblocking, SAO, and ALF filters Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:12 +08:00
Nuo Mi	c05ba94ce8	vvcdec: add intra prediction Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:12 +08:00
Nuo Mi	2592cc1f96	vvcdec: add inv transform 1d Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:11 +08:00
Nuo Mi	ea49c83bad	vvcdec: add inter prediction Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:11 +08:00
Nuo Mi	603d0bd171	vvcdec: add motion vector decoder Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:11 +08:00
Nuo Mi	c1a3d17491	vvcdec: add reference management Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:11 +08:00
Nuo Mi	976d3b7d69	vvcdec: add cabac decoder add Context-based Adaptive Binary Arithmetic Coding (CABAC) decoder Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 23:15:06 +08:00
Nuo Mi	e97a5bbb13	vvcdec: add parameter parser for sps, pps, ph, sh Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 16:31:59 +08:00
Nuo Mi	49db9fc171	vvcdec: add vvc_data Co-authored-by: Xu Mu <toxumu@outlook.com> Co-authored-by: Frank Plowman <post@frankplowman.com> Co-authored-by: Shaun Loo <shaunloo10@gmail.com> Co-authored-by: Wu Jianhua <toqsxw@outlook.com>	2024-01-03 16:31:59 +08:00
James Almer	85b8d59ec7	avcodec/d3d12va_mpeg2: change the type for the ID3D12Resource_Map input data argument Fixes -Wincompatible-pointer-types warnings. Reviewed-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: James Almer <jamrial@gmail.com>	2024-01-01 09:46:41 -03:00
James Almer	e9722735fa	avcodec/d3d12va_mpeg2: remove unused variables Reviewed-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: James Almer <jamrial@gmail.com>	2024-01-01 09:46:06 -03:00
Michael Niedermayer	e063c1d079	avcodec/mpegvideo_enc: Use ptrdiff_t for stride Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-30 21:50:05 +01:00
Michael Niedermayer	a066b8a809	avcodec/mpegvideo_enc: Dont copy beyond the image Fixes: out of array access Fixes: tickets/10754/poc17ffmpeg Discovered by Zeng Yunxiang. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-30 21:50:04 +01:00
Michael Niedermayer	bf1159774b	avcodec/vaapi_encode: Avoid double AVERRORS Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-29 21:36:02 +01:00
Michael Niedermayer	850ab8f6da	avcodec/jpegxl_parser: Check get_vlc2() Fixes: shift exponent -1 is negative Fixes: 63889/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-6009343056936960 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-29 19:21:26 +01:00
Michael Niedermayer	d909d8e5e0	avcodec/leaddec: Check remaining bits in decode_block() Fixes: Timeout Fixes: 64163/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LEAD_fuzzer-6418925835124736 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-29 01:15:42 +01:00
Michael Niedermayer	5f88458bea	avcodec/jpegxl_parser: Add padding to cs_buffer Fixes: out of array access Fixes: 64081/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-6151006496620544 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-29 01:15:42 +01:00
Michael Niedermayer	c72a20f01a	avcodec/jpeglsdec: Check Jpeg-LS LSE Fixes: signed integer overflow: 2147478526 + 33924 cannot be represented in type 'int' Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int' Fixes: 64243/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEGLS_fuzzer-5195717848989696 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-29 01:00:48 +01:00
Michael Niedermayer	c75fccd1d4	avcodec/osq: Implement flush() Fixes: out of array access Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6227491892887552 Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6268561729126400 Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6414805046788096 Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6538151088488448 Fixes: 62164/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6608131540779008 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-29 00:45:20 +01:00
jinbo	545686e49e	avcodec/hevc: Add init for sao_edge_filter Forgot to init c->sao_edge_filter[idx] when idx=0/1/2/3. After this patch, the speedup of decoding H265 4K 30FPS 30Mbps on 3A6000 is about 7% (42fps==>45fps). Change-Id: I521999b397fa72b931a23c165cf45f276440cdfb Reviewed-by: yinshiyou-hf@loongson.cn Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-29 00:45:20 +01:00
Devin Heitmueller	b2c82b23b9	avcodec/bitpacked_dec: optimize bitpacked_decode_yuv422p10 Rework the code a bit to speed up the 10-bit bitpacked decoding routine. This is probably about as fast as I can get it without switching to assembly language. Demonstrable with: ./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv ./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv On my development system, it went from 80ms for a 2160p frame down to 20ms (i.e. a 4X speedup). Good enough for now, I hope... Comments from Marton: Originally on my system better performance could be achieved by simply switching to the cached bitstream reader, but for Devin it was slower than his direct byte operations. I changed the order of writing output from u/y/v/y to u/v/y/y, and that made the code faster than the cached bitstream reader on my system as well. TIMER measurement of the decode loop on Ryzen 5 3600 with command line: ./ffmpeg -stream_loop 256 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error Before: 823204127 decicycles in YUV, 256 runs, 0 skips After: 315070524 decicycles in YUV, 256 runs, 0 skips Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com> Signed-off-by: Marton Balint <cus@passwd.hu>	2023-12-28 23:56:14 +01:00
Marton Balint	059ea1d6f6	avcodec/mjpegdec: avoid indirection when accessing avctx Signed-off-by: Marton Balint <cus@passwd.hu>	2023-12-28 23:15:56 +01:00
Marton Balint	e6b9bfaac3	avcodec/mjpegdec: use memset to clear alpha Signed-off-by: Marton Balint <cus@passwd.hu>	2023-12-28 23:15:56 +01:00
Leo Izen	fb54c89a0d	avcodec/jpegxl_parser: check ANS cluster alphabet size vs bundle size The specification doesn't mention that clusters cannot have alphabet sizes greater than 1 << bundle->log_alphabet_size, but the reference implementation rejects these entropy streams as invalid, so we should too. Failing to do so can overflow a stack variable that should be large enough otherwise. Fixes #10738. Found-by: Zeng Yunxiang and Li Zeyuan Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-27 10:10:09 -05:00
James Almer	4fee63b241	x86/takdsp: add missing wrappers to AVX2 functions Fixes compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-25 22:31:15 -03:00
James Almer	591dc3b4b8	x86/takdsp: add avx2 versions of all functions On an Intel Core i7 12700k: decorrelate_ls_c: 814.3 decorrelate_ls_sse2: 165.8 decorrelate_ls_avx2: 101.3 decorrelate_sf_c: 1602.6 decorrelate_sf_sse4: 640.1 decorrelate_sf_avx2: 324.6 decorrelate_sm_c: 1564.8 decorrelate_sm_sse2: 379.3 decorrelate_sm_avx2: 203.3 decorrelate_sr_c: 785.3 decorrelate_sr_sse2: 176.3 decorrelate_sr_avx2: 99.8 Tested-by: Lynne <dev@lynne.ee> Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-23 08:39:22 -03:00
Andreas Rheinhardt	370ce305f4	avcodec/libjxlenc: Set AV_CODEC_CAP_DR1 This encoder uses ff_get_encode_buffer() to allocate the packet buffer. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-12-22 22:08:29 -05:00
Andreas Rheinhardt	577dd7b762	avcodec/libjxlenc: Don't refer to decoder in comments Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-12-22 22:08:29 -05:00
Leo Izen	5942cf46b6	avcodec/libjxldec: emit proper PTS to decoded AVFrame If a sequence of JXL images is encapsulated in a container that has PTS information, we should use the PTS information from the container. At this time there is no container that does this, but if JPEG XL support is ever added to NUT, AVTransport, or some other container, this commit should allow the PTS information those containers provide to work as expected. Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-22 22:08:29 -05:00
Leo Izen	42f78925d7	avcodec/libjxlenc: accept rgbf32 and rgbaf32 frames These pixel formats have always been supported by libjxl, but at the time this plugin was written, they were not in FFmpeg yet. Now that they are in FFmpeg, we should support them. Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-22 22:08:29 -05:00
Leo Izen	f6ef6a853c	avcodec/libjxldec: produce rgbf32 and rgbaf32 frames These pixel formats have always been supported by libjxl, but at the time this plugin was written, they were not in FFmpeg yet. Now that they are in FFmpeg, we should support them. Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-22 22:08:29 -05:00
Leo Izen	4013b8d3f0	avcodec/pngdec: improve handling of bad cICP range tags FFmpeg doesn't support tv-range RGB throughout most of its pipeline, so we should keep the warning. However, in case something does support it we should at least keep it tagged properly. Additionally, the encoder writes this tag if the space is tagged as such so this makes a round trip work as it should. Also, PNG doesn't support nonzero matrices but we only warn and ignore in that case, so we have no reason to error out for illegal cICP ranges either (i.e. greater than 1). Signed-off-by: Leo Izen <leo.izen@gmail.com> Reported-by: Kacper Michajłow <kasper93@gmail.com>	2023-12-22 22:07:35 -05:00
sunyuechi	3d39b8d4e7	lavc/takdsp: R-V V decorrelate_sm C908: decorrelate_sm_c: 130.0 decorrelate_sm_rvv_i32: 43.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net> (with minor changes)	2023-12-22 17:40:00 +02:00
Andreas Rheinhardt	0c6203c97a	all: Don't set AVClass.item_name to its default value Unnecessary since acf63d5350adeae551d412db699f8ca03f7e76b9; also avoids relocations. Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-12-22 15:12:33 +01:00
James Almer	46775e64f8	avcodec/takdsp: fix const correctness Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-22 09:28:04 -03:00
Andreas Rheinhardt	45b4781e9a	avcodec/v4l2_m2m: Remove redundant av_frame_unref() This frame will be freed in the next line. Reviewed-by: Zhao Zhili <quinkblack@foxmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-12-21 23:29:02 +01:00
Rémi Denis-Courmont	0f05f9ed3e	mlp: move pack_output pointer to decoder context The current pack_output function pointer is a property of the decoder, rather than a constant method provided by the DSP code. Indeed, except for an unused initialisation, the field is never used in DSP code.	2023-12-21 22:42:34 +02:00
sunyuechi	c933ff2779	lavc/takdsp: R-V V decorrelate_sr C908: decorrelate_sr_c: 95.5 decorrelate_sr_rvv_i32: 28.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-21 22:42:34 +02:00
sunyuechi	864174dd00	lavc/takdsp: R-V V decorrelate_ls C908: decorrelate_ls_c: 69.7 decorrelate_ls_rvv_i32: 27.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-21 22:42:34 +02:00
Rémi Denis-Courmont	cdd38a2ffe	lavc/aacpsdsp: fix R-V V stereo interpolate The penultimate loop iteration could pick any vl such that: vlenb/4 < vl <= vlenb/2 Thus if the total length is not a multiple of vlenb/2, the vfadd.vf on the penultimate iteration would yield corrupt values for the last iteration. To avoid this, force vl = vlen/2 until the last iteration. Unfortunately this latent bug is not reproducible with either hardware or QEMU as of now.	2023-12-21 17:54:23 +02:00
Rémi Denis-Courmont	db32f75c63	lavc/opusdsp: simplify R-V V postfilter This skips the round-trip to scalar register for the sliding 'x' coefficients, improving performance by about 5%. The trick here is that the vector slide-up instruction preserves elements in destination vector until the slide offset. The switch from vfslide1up.vf to vslideup.vi also allows the elimination of data dependencies on consecutive slides. Since the specifications recommend sticking to power of two offsets, we could slide as follows: vslideup.vi v8, v0, 2 vslideup.vi v4, v0, 1 vslideup.vi v12, v8, 1 vslideup.vi v16, v8, 2 However in the device under test, this seems to make performance slightly worse, so this is left for (in)validation with future better hardware.	2023-12-21 17:54:08 +02:00
Martin Storsjö	327685bafe	d3d12va: Add a missing include for the declaration of ff_d3d12va_get_surface_index This fixes the following build error: src/libavcodec/d3d12va_decode.c:49:10: error: no previous prototype for function 'ff_d3d12va_get_surface_index' [-Werror,-Wmissing-prototypes] 49 | unsigned ff_d3d12va_get_surface_index(const AVCodecContext *avctx, | ^ Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-21 13:55:16 +02:00
Tong Wu	6d5fbea289	avcodec/d3d12va_hevc: enable allow_profile_mismatch flag for d3d12va msp profile Same as d3d11va, this flag enables main still picture profile for d3d12va. User should add this flag when decoding main still picture profile. Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-12-21 16:15:23 +08:00
Wu Jianhua	ffa158edbd	avcodec: add D3D12VA hardware accelerated VC1 decoding The command below is how to enable d3d12va: ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4 Signed-off-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-12-21 16:15:23 +08:00
Wu Jianhua	c6c05dd34a	avcodec: add D3D12VA hardware accelerated MPEG-2 decoding The command below is how to enable d3d12va: ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4 Signed-off-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-12-21 16:15:23 +08:00
Wu Jianhua	b16fd96c5f	avcodec: add D3D12VA hardware accelerated AV1 decoding The command below is how to enable d3d12va: ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4 Signed-off-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-12-21 16:15:23 +08:00
Wu Jianhua	326288c70a	avcodec: add D3D12VA hardware accelerated VP9 decoding The command below is how to enable d3d12va: ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4 Signed-off-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-12-21 16:15:23 +08:00
Wu Jianhua	cbb93c4ff6	avcodec: add D3D12VA hardware accelerated HEVC decoding The command below is how to enable d3d12va: ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4 Signed-off-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-12-21 16:15:23 +08:00
Wu Jianhua	349ce30e4e	avcodec: add D3D12VA hardware accelerated H264 decoding The implementation is based on: https://learn.microsoft.com/en-us/windows/win32/medfound/direct3d-12-video-overview With the Direct3D 12 video decoding support, we can render or process the decoded images by the pixel shaders or compute shaders directly without the extra copy overhead, which is beneficial especially if you are trying to render or post-process a 4K or 8K video. The command below is how to enable d3d12va: ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4 Signed-off-by: Wu Jianhua <toqsxw@outlook.com> Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-12-21 16:15:23 +08:00
Zhao Zhili	287e22f745	avcodec/mediacodecenc: set quality in cq mode From AOSP doc, these values are device and codec specific, but lower values generally result in more efficient (smaller-sized) encoding. For example, global_quality 50 on Pixel 6 results in a 1080P 30 FPS HEVC stream at 3744 kb/s, while global_quality 80 results in 28178 kb/s. Fix #10689 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-12-21 19:28:47 +08:00
Michael Niedermayer	a6a553ba94	avcodec/cbs_vp8: fix GetBitContext setup Fixes: abort() Fixes: 64232/clusterfuzz-testcase-minimized-ffmpeg_BSF_TRACE_HEADERS_fuzzer-5417957987319808 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-19 00:27:26 +01:00
Kalev Lember	b391fdbf1a	lavc/libopenh264: Drop openh264 runtime version checks With the way the runtime checks are currently set up, every single openh264 release, no matter how minor, is considered an ABI break and requires ffmpeg recompilation. This is unnecessarily strict because it doesn't allow downstream distributions to ship any openh264 bug fix version updates without breaking ffmpeg's openh264 support. Years ago, at the time when ffmpeg's openh264 support was merged, openh264 releases were done without a versioned soname (the library was just libopenh264.so, unversioned). Since then, starting with version 1.3.0, openh264 has started using versioned sonames and the intent has been to bump the soname every time there's a new release with an ABI change. This patch drops the exact version check and instead adds a minimum requirement on 1.3.0 to the configure script. Signed-off-by: Kalev Lember <klember@redhat.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-18 23:59:51 +02:00
James Almer	0cc0d8c0b5	avcodec/get_bits: add get_leb() Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-18 15:19:36 -03:00
James Almer	12eac23637	avcodec/packet: add IAMF Parameters side data types Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-18 15:19:30 -03:00
Rémi Denis-Courmont	419145c11b	lavc/vc1dsp: fix R-V V vector lengths The 8x4 and 4x4 use a needlessly large multiplier (unless/until we care about embedded 64-bit-vector hardware). This is merely suboptimal. The 8x4 case also uses an incorrect vector length, which leads to incorrect behaviour on future/hypothetical hardware with 256-bit or larger vectors. Pointed-out-by: Martin Storsjö <martin@martin.st>	2023-12-17 09:27:52 +02:00
Martin Storsjö	b51d9eb58e	riscv: vc1dsp: Don't check vlenb before checking the CPU flags We can't call ff_get_rv_vlenb() if we don't have RVV available at all. Acked-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-16 22:30:26 +02:00
Rémi Denis-Courmont	918b3ed2d5	lavc/lpc: R-V V compute_autocorr The loop iterates over the length of the vector, not the order. This is to avoid reloading the same data for each lag value. However this means the loop only works if the maximum order is no larger than VLENB. The loop is roughly equivalent to: for (size_t j = 0; j < lag; j++) autoc[j] = 1.; while (len > lag) { for (ptrdiff_t j = 0; j < lag; j++) autoc[j] += data[j] * *data; data++; len--; } while (len > 0) { for (ptrdiff_t j = 0; j < len; j++) autoc[j] += data[j] * *data; data++; len--; } Since register pressure is only at 50%, it should be possible to implement the same loop for order up to 2xVLENB. But this is left for future work. Performance numbers are all over the place from ~1.25x to ~4x speedups, but at least they are always noticeably better than nothing.	2023-12-16 11:18:01 +02:00
Nuo Mi	ce0c178a40	avcodec/cbs_h266: more restrictive check on pps_tile_idx_delta_val Fixes: out of array access Fixes: 62603/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-5837632490569728 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-14 23:53:10 +01:00
Pierre-Anthony Lemieux	a1384b4e86	avcodec/jpeg2000htdec: check if block decoding will exceed internal precision Intended to replace https://patchwork.ffmpeg.org/project/ffmpeg/patch/20230802000135.26482-3-michael@niedermayer.cc/ with a more accurate block decoding magnitude bound. Fixes: 62433/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216 Fixes: 58299/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216 Previous-version-reviewed-by: Tomas Härdin <git@haerdin.se> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-14 23:53:10 +01:00
James Almer	34d56e1766	x86/aacencdsp: clear the high bits for size in ff_abs_pow34_sse Fixes checkasm failures on win64. Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-12 15:24:08 -03:00
sunyuechi	98596f90f4	lavc/aacencdsp: R-V V abs_pow34 C908: abs_pow34_c: 535.5 abs_pow34_rvv_f32: 337.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-11 18:42:07 +02:00
sunyuechi	e880a97e7c	lvac/aacenc: add ff_aac_dsp_init This is for clarity and use in testing, consistent with other parts of the code. Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-11 18:42:04 +02:00
Rémi Denis-Courmont	272d0c164d	lavc/lpc: R-V V apply_welch_window apply_welch_window_even_c: 617.5 apply_welch_window_even_rvv_f64: 235.0 apply_welch_window_odd_c: 709.0 apply_welch_window_odd_rvv_f64: 256.5	2023-12-11 18:17:43 +02:00
Rémi Denis-Courmont	b3825bbe45	riscv: test for assembler support This should fix the build on LLVM 16 and earlier, at the cost of turning all non-RVV optimisations off.	2023-12-08 17:21:09 +02:00
sunyuechi	0b9d009b4a	lavc/vc1dsp: R-V V inv_trans C908: vc1dsp.vc1_inv_trans_4x4_dc_c: 125.7 vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5 vc1dsp.vc1_inv_trans_4x8_dc_c: 230.7 vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5 vc1dsp.vc1_inv_trans_8x4_dc_c: 228.7 vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5 vc1dsp.vc1_inv_trans_8x8_dc_c: 476.5 vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-08 17:20:48 +02:00
Mikhail Nitenko	0f745b74ec	lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions Benchmarks A53 A55 A72 A76 avg_h264_qpel_8_mc01_10_c: 936.5 924.0 656.0 504.7 avg_h264_qpel_8_mc01_10_neon: 234.7 202.0 120.7 63.2 avg_h264_qpel_8_mc02_10_c: 921.0 920.0 669.2 493.7 avg_h264_qpel_8_mc02_10_neon: 202.0 173.2 102.7 58.5 avg_h264_qpel_8_mc03_10_c: 936.5 924.0 656.0 509.5 avg_h264_qpel_8_mc03_10_neon: 236.2 203.7 120.0 63.2 avg_h264_qpel_8_mc10_10_c: 1441.0 1437.7 806.7 478.5 avg_h264_qpel_8_mc10_10_neon: 325.7 324.0 153.7 94.2 avg_h264_qpel_8_mc11_10_c: 2160.7 2148.2 1366.7 906.7 avg_h264_qpel_8_mc11_10_neon: 492.0 464.0 242.5 134.5 avg_h264_qpel_8_mc13_10_c: 2157.0 2138.2 1357.0 908.2 avg_h264_qpel_8_mc13_10_neon: 494.0 467.2 242.0 140.0 avg_h264_qpel_8_mc20_10_c: 1433.5 1410.0 785.2 486.0 avg_h264_qpel_8_mc20_10_neon: 293.7 289.7 138.0 91.5 avg_h264_qpel_8_mc30_10_c: 1458.5 1461.7 813.7 483.2 avg_h264_qpel_8_mc30_10_neon: 341.7 339.2 154.0 95.2 avg_h264_qpel_8_mc31_10_c: 2194.7 2197.2 1358.7 928.0 avg_h264_qpel_8_mc31_10_neon: 520.0 495.0 245.5 142.5 avg_h264_qpel_8_mc33_10_c: 2188.0 2205.5 1356.7 910.7 avg_h264_qpel_8_mc33_10_neon: 521.0 494.5 245.7 145.7 avg_h264_qpel_16_mc01_10_c: 3717.2 3595.0 2610.0 2012.0 avg_h264_qpel_16_mc01_10_neon: 920.5 791.5 483.2 240.5 avg_h264_qpel_16_mc02_10_c: 3684.0 3633.0 2659.0 1919.7 avg_h264_qpel_16_mc02_10_neon: 790.7 678.2 409.2 217.0 avg_h264_qpel_16_mc03_10_c: 3726.5 3596.0 2606.7 2010.0 avg_h264_qpel_16_mc03_10_neon: 922.0 792.5 483.2 239.7 avg_h264_qpel_16_mc10_10_c: 5912.0 5803.2 3241.5 1916.7 avg_h264_qpel_16_mc10_10_neon: 1267.5 1277.2 616.5 365.0 avg_h264_qpel_16_mc11_10_c: 8599.2 8482.5 5338.0 3616.2 avg_h264_qpel_16_mc11_10_neon: 1913.0 1827.0 956.2 542.2 avg_h264_qpel_16_mc13_10_c: 8643.7 8488.5 5388.0 3628.5 avg_h264_qpel_16_mc13_10_neon: 1914.7 1828.7 969.2 530.5 avg_h264_qpel_16_mc20_10_c: 5719.5 5641.0 3147.0 1946.2 avg_h264_qpel_16_mc20_10_neon: 1139.5 1150.0 539.5 344.0 avg_h264_qpel_16_mc30_10_c: 5930.0 
5872.5 3267.5 1918.0 avg_h264_qpel_16_mc30_10_neon: 1331.5 1341.2 616.5 369.5 avg_h264_qpel_16_mc31_10_c: 8758.7 8697.7 5353.0 3630.7 avg_h264_qpel_16_mc31_10_neon: 2018.7 1941.7 982.2 574.7 avg_h264_qpel_16_mc33_10_c: 8683.2 8675.2 5339.2 3634.7 avg_h264_qpel_16_mc33_10_neon: 2019.7 1940.2 994.5 566.0 put_h264_qpel_8_mc01_10_c: 854.2 843.0 599.2 478.0 put_h264_qpel_8_mc01_10_neon: 192.7 168.0 101.7 56.7 put_h264_qpel_8_mc02_10_c: 766.5 760.0 550.2 441.0 put_h264_qpel_8_mc02_10_neon: 160.0 139.2 88.7 53.0 put_h264_qpel_8_mc03_10_c: 854.2 843.0 599.2 479.0 put_h264_qpel_8_mc03_10_neon: 194.2 169.7 102.0 56.2 put_h264_qpel_8_mc10_10_c: 1352.7 1353.7 749.7 446.7 put_h264_qpel_8_mc10_10_neon: 289.7 294.2 135.5 88.5 put_h264_qpel_8_mc11_10_c: 2080.0 2066.2 1309.5 876.7 put_h264_qpel_8_mc11_10_neon: 450.0 429.7 229.7 131.2 put_h264_qpel_8_mc13_10_c: 2074.7 2060.2 1294.5 870.5 put_h264_qpel_8_mc13_10_neon: 452.5 434.5 226.5 130.0 put_h264_qpel_8_mc20_10_c: 1221.5 1216.0 684.5 399.7 put_h264_qpel_8_mc20_10_neon: 257.7 262.5 121.2 78.7 put_h264_qpel_8_mc30_10_c: 1379.0 1374.7 757.2 449.5 put_h264_qpel_8_mc30_10_neon: 305.7 310.2 135.5 86.5 put_h264_qpel_8_mc31_10_c: 2109.2 2119.7 1299.5 878.0 put_h264_qpel_8_mc31_10_neon: 478.0 458.5 226.0 137.2 put_h264_qpel_8_mc33_10_c: 2101.5 2115.2 1306.5 887.0 put_h264_qpel_8_mc33_10_neon: 479.0 458.7 229.7 141.7 put_h264_qpel_16_mc01_10_c: 3485.7 3396.7 2460.5 1914.5 put_h264_qpel_16_mc01_10_neon: 752.5 665.5 397.0 213.2 put_h264_qpel_16_mc02_10_c: 3103.5 3023.2 2154.7 1720.7 put_h264_qpel_16_mc02_10_neon: 622.7 551.2 347.7 196.2 put_h264_qpel_16_mc03_10_c: 3486.2 3394.0 2436.5 1917.7 put_h264_qpel_16_mc03_10_neon: 754.0 666.5 397.0 215.7 put_h264_qpel_16_mc10_10_c: 5533.0 5488.5 2989.0 1783.0 put_h264_qpel_16_mc10_10_neon: 1123.5 1165.2 535.2 334.7 put_h264_qpel_16_mc11_10_c: 8437.7 8281.2 5209.0 3510.7 put_h264_qpel_16_mc11_10_neon: 1745.0 1697.0 878.5 513.5 put_h264_qpel_16_mc13_10_c: 8567.7 8468.0 5221.5 3528.0 
put_h264_qpel_16_mc13_10_neon: 1751.7 1698.2 889.2 507.0 put_h264_qpel_16_mc20_10_c: 4907.5 4885.0 2786.2 1607.5 put_h264_qpel_16_mc20_10_neon: 995.5 1034.5 475.5 307.0 put_h264_qpel_16_mc30_10_c: 5579.7 5537.7 3045.2 1789.5 put_h264_qpel_16_mc30_10_neon: 1187.5 1231.2 532.5 334.5 put_h264_qpel_16_mc31_10_c: 8677.2 8672.5 5204.2 3516.0 put_h264_qpel_16_mc31_10_neon: 1850.7 1813.2 893.0 545.2 put_h264_qpel_16_mc33_10_c: 8688.7 8671.2 5223.2 3512.0 put_h264_qpel_16_mc33_10_neon: 1851.7 1814.2 908.5 535.2 Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-07 23:20:14 +02:00
sunyuechi	8bdb663062	lavc/ac3dsp: R-V V float_to_fixed24 c910 float_to_fixed24_c: 2207.2 float_to_fixed24_rvv_f32: 696.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-06 16:04:22 +02:00
Paul B Mahol	7e453dad3c	avcodec/qoadec: fix overreads and fix packet size check	2023-12-05 14:50:21 +01:00
Michael Niedermayer	22daf2148f	avcodec/av1dec: Fix resolving zero divisor Fixes: Out of array read Fixes: global-buffer-overflow-AV1 Found-by: "Leonelli, Matteo" <matteo.leonelli@cispa.de> Tested-by: "Wang, Fei W" <fei.w.wang@intel.com> Reviewed-by: "Wang, Fei W" <fei.w.wang@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-05 12:38:16 +01:00
Leo Izen	c4be080e65	avcodec/jpegxl_parser: fix parsing sequences of extremely small files This patch allows the JXL parser to parse sequences of extremely small files concatenated together. (e.g. smaller than the parser buffer) Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-05 05:54:34 -05:00
Leo Izen	019b3ea65a	avcodec/jpegxl_parse{,r}: use correct ISOBMFF extended size location According to ISO/IEC 14496-12, size == 1 means a 64-bit extended-size field occurs after the 32-bit box type, not before. This fix should allow correct parsing of JXL files with extended-size boxes. Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-05 05:53:32 -05:00
Haihao Xiang	fc73b372cd	lavc/qsvdec: reduce info message when more data is required demote the info to AV_LOG_VERBOSE Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2023-12-05 10:10:57 +08:00
Haihao Xiang	e233f3e75f	lavc/qsvdec: return 0 if more data is required The type of qsv decoders is FF_CODEC_CB_TYPE_DECODE which must not return AVERROR(EAGAIN). commit `42b20c9` added an assertion to check the returned value. Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2023-12-05 10:10:57 +08:00
Lynne	8c117b75af	lavc/Makefile: build vulkan decode code if vulkan_av1 has been enabled Forgotten. Reviewed-by: Neal Gompa <ngompa13@gmail.com> Tested-by: Neal Gompa <ngompa13@gmail.com>	2023-12-04 07:57:27 +01:00
Paul B Mahol	0a13178de8	avcodec/qoadec: add support for midstream sample rate/layout changes	2023-12-02 16:51:00 +01:00
Anton Khirnov	5230257ea1	lavc/dvdsubenc: only check canvas size when it is actually set Fixes #10650	2023-12-02 11:22:46 +01:00
Logan Lyu	fa0470347e	lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_hv put_hevc_qpel_bi_hv4_8_c: 433.7 put_hevc_qpel_bi_hv4_8_i8mm: 117.9 put_hevc_qpel_bi_hv6_8_c: 803.9 put_hevc_qpel_bi_hv6_8_i8mm: 252.7 put_hevc_qpel_bi_hv8_8_c: 1296.4 put_hevc_qpel_bi_hv8_8_i8mm: 316.2 put_hevc_qpel_bi_hv12_8_c: 2867.4 put_hevc_qpel_bi_hv12_8_i8mm: 669.2 put_hevc_qpel_bi_hv16_8_c: 4709.4 put_hevc_qpel_bi_hv16_8_i8mm: 929.9 put_hevc_qpel_bi_hv24_8_c: 9639.7 put_hevc_qpel_bi_hv24_8_i8mm: 2072.4 put_hevc_qpel_bi_hv32_8_c: 16663.7 put_hevc_qpel_bi_hv32_8_i8mm: 3391.4 put_hevc_qpel_bi_hv48_8_c: 36972.9 put_hevc_qpel_bi_hv48_8_i8mm: 7505.7 put_hevc_qpel_bi_hv64_8_c: 64106.4 put_hevc_qpel_bi_hv64_8_i8mm: 13145.2 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	595f97028b	lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_v put_hevc_qpel_bi_v4_8_c: 166.1 put_hevc_qpel_bi_v4_8_neon: 61.9 put_hevc_qpel_bi_v6_8_c: 309.4 put_hevc_qpel_bi_v6_8_neon: 75.6 put_hevc_qpel_bi_v8_8_c: 531.1 put_hevc_qpel_bi_v8_8_neon: 78.1 put_hevc_qpel_bi_v12_8_c: 1139.9 put_hevc_qpel_bi_v12_8_neon: 238.1 put_hevc_qpel_bi_v16_8_c: 2063.6 put_hevc_qpel_bi_v16_8_neon: 308.9 put_hevc_qpel_bi_v24_8_c: 4317.1 put_hevc_qpel_bi_v24_8_neon: 629.9 put_hevc_qpel_bi_v32_8_c: 8241.9 put_hevc_qpel_bi_v32_8_neon: 1140.1 put_hevc_qpel_bi_v48_8_c: 18422.9 put_hevc_qpel_bi_v48_8_neon: 2533.9 put_hevc_qpel_bi_v64_8_c: 37508.6 put_hevc_qpel_bi_v64_8_neon: 4520.1 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	00290a64f7	lavc/aarch64: new optimization for 8-bit hevc_epel_bi_hv put_hevc_epel_bi_hv4_8_c: 242.9 put_hevc_epel_bi_hv4_8_i8mm: 68.6 put_hevc_epel_bi_hv6_8_c: 402.4 put_hevc_epel_bi_hv6_8_i8mm: 135.9 put_hevc_epel_bi_hv8_8_c: 636.4 put_hevc_epel_bi_hv8_8_i8mm: 145.6 put_hevc_epel_bi_hv12_8_c: 1363.1 put_hevc_epel_bi_hv12_8_i8mm: 324.1 put_hevc_epel_bi_hv16_8_c: 2222.1 put_hevc_epel_bi_hv16_8_i8mm: 509.1 put_hevc_epel_bi_hv24_8_c: 4793.4 put_hevc_epel_bi_hv24_8_i8mm: 1091.9 put_hevc_epel_bi_hv32_8_c: 8393.9 put_hevc_epel_bi_hv32_8_i8mm: 1720.6 put_hevc_epel_bi_hv48_8_c: 19526.6 put_hevc_epel_bi_hv48_8_i8mm: 4285.9 put_hevc_epel_bi_hv64_8_c: 33915.4 put_hevc_epel_bi_hv64_8_i8mm: 6783.6 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	0448f27f41	lavc/aarch64: new optimization for 8-bit hevc_epel_bi_v put_hevc_epel_bi_v4_8_c: 138.4 put_hevc_epel_bi_v4_8_neon: 33.7 put_hevc_epel_bi_v6_8_c: 302.9 put_hevc_epel_bi_v6_8_neon: 46.7 put_hevc_epel_bi_v8_8_c: 408.7 put_hevc_epel_bi_v8_8_neon: 48.7 put_hevc_epel_bi_v12_8_c: 779.4 put_hevc_epel_bi_v12_8_neon: 139.7 put_hevc_epel_bi_v16_8_c: 1344.9 put_hevc_epel_bi_v16_8_neon: 160.2 put_hevc_epel_bi_v24_8_c: 2981.7 put_hevc_epel_bi_v24_8_neon: 344.9 put_hevc_epel_bi_v32_8_c: 5280.9 put_hevc_epel_bi_v32_8_neon: 618.4 put_hevc_epel_bi_v48_8_c: 12494.9 put_hevc_epel_bi_v48_8_neon: 1364.4 put_hevc_epel_bi_v64_8_c: 22127.7 put_hevc_epel_bi_v64_8_neon: 2473.7 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	216275bd80	lavc/aarch64: new optimization for 8-bit hevc_epel_bi_h put_hevc_epel_bi_h4_8_c: 96.0 put_hevc_epel_bi_h4_8_neon: 36.3 put_hevc_epel_bi_h6_8_c: 288.3 put_hevc_epel_bi_h6_8_neon: 59.3 put_hevc_epel_bi_h8_8_c: 358.5 put_hevc_epel_bi_h8_8_neon: 61.5 put_hevc_epel_bi_h12_8_c: 759.8 put_hevc_epel_bi_h12_8_neon: 159.5 put_hevc_epel_bi_h16_8_c: 1307.0 put_hevc_epel_bi_h16_8_neon: 182.0 put_hevc_epel_bi_h24_8_c: 2778.3 put_hevc_epel_bi_h24_8_neon: 430.5 put_hevc_epel_bi_h32_8_c: 4952.3 put_hevc_epel_bi_h32_8_neon: 679.5 put_hevc_epel_bi_h48_8_c: 11803.3 put_hevc_epel_bi_h48_8_neon: 1443.5 put_hevc_epel_bi_h64_8_c: 20654.8 put_hevc_epel_bi_h64_8_neon: 2737.0 put_hevc_qpel_bi_h4_8_c: 140.0 put_hevc_qpel_bi_h4_8_neon: 111.5 put_hevc_qpel_bi_h6_8_c: 318.0 put_hevc_qpel_bi_h6_8_neon: 85.8 put_hevc_qpel_bi_h8_8_c: 536.5 put_hevc_qpel_bi_h8_8_neon: 95.3 put_hevc_qpel_bi_h12_8_c: 1188.5 put_hevc_qpel_bi_h12_8_neon: 291.3 put_hevc_qpel_bi_h16_8_c: 2064.3 put_hevc_qpel_bi_h16_8_neon: 365.3 put_hevc_qpel_bi_h24_8_c: 4757.5 put_hevc_qpel_bi_h24_8_neon: 1010.0 put_hevc_qpel_bi_h32_8_c: 8351.8 put_hevc_qpel_bi_h32_8_neon: 2917.8 put_hevc_qpel_bi_h48_8_c: 19299.8 put_hevc_qpel_bi_h48_8_neon: 2976.8 put_hevc_qpel_bi_h64_8_c: 34182.5 put_hevc_qpel_bi_h64_8_neon: 5236.3 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	40cf4a5ca3	lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels put_hevc_pel_bi_pixels4_8_c: 54.7 put_hevc_pel_bi_pixels4_8_neon: 43.0 put_hevc_pel_bi_pixels6_8_c: 94.7 put_hevc_pel_bi_pixels6_8_neon: 37.0 put_hevc_pel_bi_pixels8_8_c: 171.0 put_hevc_pel_bi_pixels8_8_neon: 24.0 put_hevc_pel_bi_pixels12_8_c: 354.0 put_hevc_pel_bi_pixels12_8_neon: 68.7 put_hevc_pel_bi_pixels16_8_c: 588.2 put_hevc_pel_bi_pixels16_8_neon: 77.5 put_hevc_pel_bi_pixels24_8_c: 1670.7 put_hevc_pel_bi_pixels24_8_neon: 173.0 put_hevc_pel_bi_pixels32_8_c: 2267.7 put_hevc_pel_bi_pixels32_8_neon: 281.2 put_hevc_pel_bi_pixels48_8_c: 5787.5 put_hevc_pel_bi_pixels48_8_neon: 673.5 put_hevc_pel_bi_pixels64_8_c: 9897.0 put_hevc_pel_bi_pixels64_8_neon: 1159.5 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
James Almer	6d19611251	avcodec/ac3dsp: add missing stddef.h include Should fix make checkheaders Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-01 12:42:22 -03:00
xufuji456	cc86343b96	lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d Building iOS platform with arm64, the compiler has a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b" Signed-off-by: xufuji456 <839789740@qq.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-11-28 15:54:49 +02:00
Zhao Zhili	d526a34c20	avcodec/videotoolboxenc: refactor dump encoder name Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-27 23:49:01 +08:00
Zhao Zhili	cb049d377f	avcodec/videotoolboxenc: Fix build failure due to PropertyKey_EncoderID Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-27 23:48:55 +08:00
Paul B Mahol	3609d2b783	avcodec: add QOA decoder	2023-11-26 17:49:09 +01:00
Geoffrey McRae	93b5d9030b	libavcodec/mlpdec: add missing correction to ch_layout when downmixing This fixes corrupted audio for applications relying on ch_layout when codec downmixing is active. Signed-off-by: Geoffrey McRae <geoff@hostfission.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-26 10:18:33 -03:00
Geoffrey McRae	a8677bcc8f	libavcodec/dcadec: adjust the `ch_layout` when downmix is active Applications making use of this codec with the `downmix` option are segfaulting unless the `ch_layout` is overridden after `avcodec_open2` as can be seen in projects like MythTV[1] This patch fixes this by overriding the ch_layout as done in other decoders such as AC3. 1: `af6f362a14/mythtv/libs/libmythtv/decoders/avformatdecoder.cpp (L4607)` Signed-off-by: Geoffrey McRae <geoff@hostfission.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-26 10:18:33 -03:00
James Almer	72390dea00	mips/ac3dsp_mips: add missing stddef.h header include Fixes compilation failures after `567c67c6c8`. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-25 21:51:04 -03:00
James Almer	e40ea9f34b	x86/ac3dsp: add ff_float_to_fixed24_avx() Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-25 21:50:56 -03:00
James Almer	d8b1a34433	x86/ac3dsp: reduce instruction count inside the float_to_fixed24 loop Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-25 21:50:56 -03:00
Rémi Denis-Courmont	0fa421c8f1	lavc/llvidencdsp: add R-V V diff_bytes diff_bytes_c: 163.0 diff_bytes_rvv_i32: 52.7	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	0183c2c830	lavc/aacpsdsp: use LMUL=2 and amortise strides The input is laid out in 16 segments, of which 13 actually need to be loaded. There are no really efficient ways to deal with this: 1) If we load 8 segments with unit stride, then narrow to 16 segments with right shifts, we can only get one half-size vector per segment, or just 2 elements per vector (EMUL=1/2) - at least with 128-bit vectors. This ends up unsurprisingly about as fast as the C code. 2) The current approach is to load with strides. We keep that approach, but improve it using three 4-segmented loads instead of 12 single-segment loads. This divides the number of distinct loaded addresses by 4. 3) A potential third approach would be to avoid segmentation altogether and splat the scalar coefficient into vectors. Then we can use a unit-stride and maximum EMUL. But the downside then is that we have to multiply the 3 (of 16) unused segments with zero as part of the multiply-accumulate operations. In addition, we also reuse vectors mid-loop so as to increase the EMUL from 1 to 2, which also improves performance a little bit. Overall the gains are quite small with the device under test, as it does not deal with segmented loads very well. But at least the code is tidier, and should enjoy bigger speed-ups on better hardware implementations. Before: ps_hybrid_analysis_c: 1819.2 ps_hybrid_analysis_rvv_f32: 1037.0 (before) ps_hybrid_analysis_rvv_f32: 990.0 (after)	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	b88d4058f9	lavc/g722dsp: optimise R-V V apply_qmf This stores the constant coefficients deinterleaved, so that they can be loaded directly with NF=0. Unfortunately, we cannot optimise loading the input, due to insufficient memory alignment (not 32-bit). Before: g722_apply_qmf_c: 82.5 g722_apply_qmf_rvv_i32: 78.2 After: g722_apply_qmf_c: 82.5 g722_apply_qmf_rvv_i32: 65.2	2023-11-23 18:57:18 +02:00
James Almer	567c67c6c8	avcodec/ac3dsp: make len a size_t in float_to_fixed24 Should simplify asm implementations, and prevent UB on at least win64. Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-22 18:33:00 -03:00
James Almer	2d9fd814d0	x86/: clear the high bits for order in scalarproduct_and_madd functions Should fix checkasm failures on win64. Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-22 14:18:42 -03:00
Zhao Zhili	e8a49b1424	avcodec/mmaldec: Fix build error Fix #10670. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 21:02:04 +08:00
Zhao Zhili	f27fce0c0c	avcodec/mediacodecdec: fix return EAGAIN after EOF Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 21:02:04 +08:00
Dmitry Rogozhkin	e9c93009fc	avcodec/decode: validate hw_frames_ctx when AVHWAccel.free_frame_priv is used Validate that a hw_frames_ctx is available before using it for the AVHWAccel.free_frame_priv callback, and don't require it to be present when the callback is not in use by the HWAccel. v2: check for free_frame_priv (Hendrik) v3: return EINVAL (Christoph Reiter) v4: better commit message (Hendrik) v5: fix typo with missed frames_ctx (Lynne) See[1]: https://github.com/msys2/MINGW-packages/pull/19050 Fixes: `be07145109` ("avcodec: add AVHWAccel.free_frame_priv callback") CC: Lynne <dev@lynne.ee> CC: Christoph Reiter <reiter.christoph@gmail.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2023-11-22 05:01:16 +01:00
Zhao Zhili	aa3b857101	avcodec/h264_mp4toannexb_bsf: process new extradata fate-h264_mp4toannexb_ticket5927 and fate-h264_mp4toannexb_ticket5927_2 previously worked by accident. The sample file has two 'avc1' entries, and video samples use the second one. It means packets should be decoded with new extradata in side data. Before this patch, only extradata was kept in the output, and new extradata was dropped. The output can be decoded because the two extradata are almost the same, except for the level indication. This patch fixes the issue and adds another fate test. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 19:42:14 +08:00
Zhao Zhili	d3aa0cd16f	avcodec/h264_mp4toannexb_bsf: fix missing PS before IDR frames If there is a single group of SPS/PPS before an IDR frame, but no SPS/PPS after that, we will miss the chance to reset idr_sps_seen/idr_pps_seen. No SPS/PPS are inserted afterwards. This patch saves in-band SPS/PPS and inserts them before IDR frames when necessary. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 19:42:14 +08:00
Zhao Zhili	4c4b833abd	avcodec/h264_mp4toannexb_bsf: remove pass padding size as argument It's a fixed value. There is no use case to change that. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 19:42:14 +08:00
Zhao Zhili	91cbae2f6c	avcodec/h264_mp4toannexb_bsf: refactor start_code_size handling start_code_size depends on whether PS comes from out-of-band or in-band. Make the code more readable. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 19:42:14 +08:00
Michael Niedermayer	fb52070848	avcodec/h264dec: use BOOL for skip_gray, noref_gray Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-22 01:22:31 +01:00
Jun Zhao	c961ac4b0c	vulkan_decode: fix the print format of VkDeviceSize VkDeviceSize represents device memory size and offset values as uint64_t in Spec. Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2023-11-21 08:02:43 +08:00
James Almer	1258f99978	avcodec: bump version after EVC additions Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-20 11:55:51 -03:00
Dawid Kozinski	cfe2947887	avcodec/evc_decoder: Provided support for EVC decoder - Added EVC decoder wrapper - Changes in project configuration file and libavcodec Makefile - Added documentation for xevd wrapper Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-20 11:55:51 -03:00
Dawid Kozinski	c59a96fd08	avcodec/evc_encoder: Provided support for EVC encoder - Added EVC encoder wrapper - Changes in project configuration file and libavcodec Makefile - Added documentation for xeve wrapper Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-20 11:55:51 -03:00
Michael Niedermayer	e56d91f8a8	avcodec/h264dec: Support skipping frames that used gray gap frames Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-20 00:19:25 +01:00
Michael Niedermayer	6364fa9e9a	avcodec/h264: Avoid using gray gap frames as references Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-20 00:19:25 +01:00
Michael Niedermayer	29f6c9b04d	avcodec/h264: keep track of which frames used gray references Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-20 00:19:04 +01:00
Michael Niedermayer	e4337606e1	avcodec/h264dec: More elaborate documentation for frame_recovered Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-20 00:12:30 +01:00
Michael Niedermayer	68e1cf204a	avcodec/h264: Use FRAME_RECOVERED_HEURISTIC instead of IDR/SEI This keeps IDR/SEI and heuristically detected recovery points more cleanly separated Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-20 00:12:30 +01:00
Michael Niedermayer	3f4a1a24a5	avcodec/h264: Separate SEI and IDR recovery handling This avoids SEI and IDR recovery flags affecting each other. Also eliminate literal numbers from recovery handling. This should make the code clearer. Improves: tickets/4738/tickets_cut.ts Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-20 00:12:29 +01:00
Rémi Denis-Courmont	fbc7adba67	lavc/llviddsp: R-V V add_bytes add_bytes_c: 2077.2 add_bytes_rvv_i32: 105.0	2023-11-18 22:07:14 +02:00
Rémi Denis-Courmont	ca664f2254	lavc/flacdsp: R-V V LPC16 function In this case, the inner loop computing the scalar product can be reduced to just one multiplication and one sum even with 128-bit vectors. The result is a lot simpler, but also brings more modest performance gains: flac_lpc_16_13_c: 15241.0 flac_lpc_16_13_rvv_i32: 11230.0 flac_lpc_16_16_c: 17884.0 flac_lpc_16_16_rvv_i32: 12125.7 flac_lpc_16_29_c: 27847.7 flac_lpc_16_29_rvv_i32: 10494.0 flac_lpc_16_32_c: 30051.5 flac_lpc_16_32_rvv_i32: 10355.0	2023-11-18 22:06:57 +02:00
Rémi Denis-Courmont	295092b46d	lavc/flacdsp: R-V V LPC32 The entire set of 32 coefficients and corresponding past 32 samples can fit in a single vector (with LMUL=8) exactly, but... since widening doubles the needed vector sizes, we still end up too short with 128-bit vectors. This adds a very simple version for future 256+-bit hardware, and for pred_orders values up to 16, and a bit more involved loop for 128-bit hardware with pred_orders between 17 and 32. With 128-bit hardware, the benchmarks look like this: flac_lpc_32_13_c: 30152.0 flac_lpc_32_13_rvv_i32: 10244.7 flac_lpc_32_16_c: 37314.2 flac_lpc_32_16_rvv_i32: 10126.2 flac_lpc_32_29_c: 61910.0 flac_lpc_32_29_rvv_i32: 14495.2 flac_lpc_32_32_c: 68204.0 flac_lpc_32_32_rvv_i32: 13273.7	2023-11-18 22:05:43 +02:00
Diederik de Haas via ffmpeg-devel	c07ed10b0e	apply spelling fixes Fix spelling issue as reported by Debian's lintian tool: accomodate -> accommodate addtional -> additional auxillary -> auxiliary bellow -> below betweeen -> between Calulate -> Calculate coefficents -> coefficients Defalt -> Default defaul -> default higer -> higher neccesary -> necessary orignal -> original ouput -> output precison -> precision processsing -> processing substract -> subtract Transfered -> Transferred upto -> up to Also add several of them to the 'common typos' check in patcheck. Signed-off-by: Diederik de Haas <didi.debian@cknow.org>	2023-11-18 19:55:42 +01:00
Rémi Denis-Courmont	07c303b708	lavc/flacdsp: R-V V decorrelate_indep 16-bit packed flac_decorrelate_indep2_16_c: 981.7 flac_decorrelate_indep2_16_rvv_i32: 199.2 flac_decorrelate_indep4_16_c: 1749.7 flac_decorrelate_indep4_16_rvv_i32: 401.2 flac_decorrelate_indep6_16_c: 2517.7 flac_decorrelate_indep6_16_rvv_i32: 858.0 flac_decorrelate_indep8_16_c: 3285.7 flac_decorrelate_indep8_16_rvv_i32: 1123.5	2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont	fb0295e5fd	lavc/flacdsp: R-V V decorrelate_indep 32-bit packed flac_decorrelate_indep2_32_c: 981.7 flac_decorrelate_indep2_32_rvv_i32: 183.7 flac_decorrelate_indep4_32_c: 1749.7 flac_decorrelate_indep4_32_rvv_i32: 362.5 flac_decorrelate_indep6_32_c: 2517.7 flac_decorrelate_indep6_32_rvv_i32: 715.2 flac_decorrelate_indep8_32_c: 3285.7 flac_decorrelate_indep8_32_rvv_i32: 909.0	2023-11-17 23:59:56 +02:00
Rémi Denis-Courmont	6183a69c0b	lavc/flacdsp: R-V V decorrelate_ms packed flac_decorrelate_ms_16_c: 585.5 flac_decorrelate_ms_16_rvv_i32: 263.0 flac_decorrelate_ms_32_c: 584.7 flac_decorrelate_ms_32_rvv_i32: 250.0	2023-11-17 23:59:23 +02:00
Rémi Denis-Courmont	636ae0e0bc	lavc/flacdsp: R-V V packed decorrelate_{l,r}s flac_decorrelate_ms_16_c: 457.2 flac_decorrelate_ms_16_rvv_i32: 203.0 flac_decorrelate_ms_32_c: 457.2 flac_decorrelate_ms_32_rvv_i32: 203.5 flac_decorrelate_rs_16_c: 456.2 flac_decorrelate_rs_16_rvv_i32: 207.0 flac_decorrelate_rs_32_c: 456.2 flac_decorrelate_rs_32_rvv_i32: 210.5	2023-11-17 23:59:22 +02:00
Rémi Denis-Courmont	d076517056	lavc/llauddsp: R-V V scalarproduct_and_madd_int32 scalarproduct_and_madd_int32_c: 10899.7 scalarproduct_and_madd_int32_rvv_i32: 1749.0	2023-11-16 16:53:44 +02:00
Rémi Denis-Courmont	45d0eb3f70	lavc/llauddsp: R-V V scalarproduct_and_madd_int16 scalarproduct_and_madd_int16_c: 10355.7 scalarproduct_and_madd_int16_rvv_i32: 1480.0	2023-11-16 16:53:44 +02:00
James Almer	78f55457c9	x86/flacds: clear the high bits from pred_order in lpc_32 functions Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-15 16:10:15 -03:00
Dai, Jianhui J	c9fe9fb863	avcodec/cbs_vp8: Add support for VP8 codec bitstream This commit adds support for VP8 bitstream read methods to the cbs codec. This enables the trace_headers bitstream filter to support VP8, in addition to AV1, H.264, H.265, and VP9. This can be useful for debugging VP8 stream issues. The CBS VP8 implements a simple VP8 boolean decoder using GetBitContext to read the bitstream. Only the read methods `read_unit` and `split_fragment` are implemented. The write methods `write_unit` and `assemble_fragment` return the error code AVERROR_PATCHWELCOME. This is because CBS VP8 write is unlikely to be used by any applications at the moment. The write methods can be added later if there is a real need for them. TESTS: ffmpeg -i fate-suite/vp8/frame_size_change.webm -vcodec copy -bsf:v trace_headers -f null - Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2023-11-15 10:29:03 -05:00
Dai, Jianhui J	5cb8accd09	avcodec/vp8: Export `vp8_token_update_probs` variable This commit exports the `vp8_token_update_probs` variable to internal library scope to facilitate its reuse within the library. Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2023-11-15 10:29:03 -05:00
Rémi Denis-Courmont	90a779bed6	lavc/huffyuvdsp: basic R-V V add_hfyu_left_pred_bgr32 Better performance can probably be achieved with a more intricate unrolled loop, but this is a start: add_hfyu_left_pred_bgr32_c: 15084.0 add_hfyu_left_pred_bgr32_rvv_i32: 10280.2 This would actually be cleaner with the RISC-V P extension, but that is not ratified yet (I think?) and usually not supported if V is supported.	2023-11-15 16:51:07 +02:00
James Almer	b360c91752	avcodec/codecpar: mention how to allocate coded_side_data Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-14 14:26:42 -03:00
Anton Khirnov	6dbde68cb5	lavc/8bps: fix exporting palette after `63767b79a5` It would be left empty on each frame whose packet does not come with palette attached.	2023-11-14 18:18:26 +01:00
Rémi Denis-Courmont	ce467421dc	lavc/exrdsp: unroll predictor With explicit unrolling, we can skip half of the sign bit flips, and the compiler is then better able to optimise the scalar loop: predictor_c: 31376.0 (before) predictor_c: 23703.0 (after)	2023-11-14 19:15:51 +02:00
Rémi Denis-Courmont	c536e92207	lavc/sbrdsp: R-V V hf_apply_noise functions This is restricted to 128-bit vectors as larger vector sizes could read past the end of the noise array. Support for future hardware with larger vector sizes is left for some other time. hf_apply_noise_0_c: 2319.7 hf_apply_noise_0_rvv_f32: 1229.0 hf_apply_noise_1_c: 2539.0 hf_apply_noise_1_rvv_f32: 1244.7 hf_apply_noise_2_c: 2319.7 hf_apply_noise_2_rvv_f32: 1232.7 hf_apply_noise_3_c: 2541.2 hf_apply_noise_3_rvv_f32: 1244.2	2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont	5b33104fca	lavc/sbrdsp: R-V V hf_gen hf_gen_c: 2922.7 hf_gen_rvv_f32: 731.5	2023-11-13 18:33:02 +02:00
Gyan Doshi	67a2571a55	avcodec/libsvtav1: add version guard for external param Setting of external param 'force_key_frames' was added in `7bcc1b4eb8`. It is available since v1.1.0 but ffmpeg allows linking against v0.9.0.	2023-11-13 13:14:43 +05:30
Evgeny Pavlov	da3ce21f68	libavcodec/amfenc: Fix issue with missing headers in AV1 encoder This commit fixes an issue with missing SPS/PPS headers in video encoded by the AMF AV1 encoder. Missing headers lead to broken seeking in the MPV video player. The default value for the property AV1_HEADER_INSERTION_MODE shouldn't be set to NONE (no headers insertion). We need to skip the definition of this property, because the default value depends on the USAGE property. Signed-off-by: Dmitrii Ovchinnikov <ovchinnikov.dmitrii@gmail.com>	2023-11-12 22:57:17 +01:00
Sebastian Ramacher	250471ea17	avcodec/fft: Fix memory leak if ctx2 is used Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-12 14:47:56 -03:00
Sebastian Ramacher	a562cfee2e	avcodec/fft: Use av_mallocz to avoid invalid free/uninit Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-12 14:47:56 -03:00
Rémi Denis-Courmont	cd7b352c53	lavc/sbrdsp: R-V V autocorrelate With 5 accumulator vectors and 6 inputs, this can only use LMUL=2. Also the number of vector loop iterations is small, just 5 on 128-bit vector hardware. The vector loop is somewhat unusual in that it processes data in descending memory order, in order to save on vector slides: in descending order, we can extract elements to carry over to the next iteration from the bottom of the vectors directly. With ascending order (see in the Opus postfilter function), there are no ways to get the top elements directly. On the downside, this requires the use of separate shift and sub (the would-be SH3SUB instruction does not exist), with a small pipeline stall on the vector load address. The edge cases are done in scalar code, as this saves on loads and remains significantly faster than C. autocorrelate_c: 669.2 autocorrelate_rvv_f32: 421.0	2023-11-12 14:03:09 +02:00
Rémi Denis-Courmont	f576a0835b	lavc/aacpsdsp: rework R-V V hybrid_synthesis_deint Given the size of the data set, strided memory accesses cannot be avoided. We can still do better than the current code. ps_hybrid_synthesis_deint_c: 12065.5 ps_hybrid_synthesis_deint_rvv_i32: 13650.2 (before) ps_hybrid_synthesis_deint_rvv_i64: 8181.0 (after)	2023-11-12 14:03:09 +02:00
Rémi Denis-Courmont	eb508702a8	lavc/aacpsdsp: rework R-V V add_squares Segmented loads may be slower than not. So this advantageously uses a unit-strided load and narrowing shifts instead. Before: ps_add_squares_c: 60757.7 ps_add_squares_rvv_f32: 22242.5 After: ps_add_squares_c: 60516.0 ps_add_squares_rvv_i64: 17067.7	2023-11-12 14:03:09 +02:00
Paul B Mahol	10440a489a	avcodec/gif_parser: split correctly also bitstreams that do not have extension blocks	2023-11-12 02:19:53 +01:00
Nuo Mi	09f783692e	avcodec/cbs_h266: H266RawSliceHeader, expose curr_subpic_idx Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-11 11:53:21 -03:00
Michael Niedermayer	ac4e3e188a	avcodec/evc_parse: Check num_remaining_tiles_in_slice_minus1 Fixes: out of array access Fixes: 62467/clusterfuzz-testcase-minimized-ffmpeg_BSF_EVC_FRAME_MERGE_fuzzer-6092990982258688 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: "Dawid Kozinski/Multimedia (PLT) /SRPOL/Staff Engineer/Samsung Electronics" <d.kozinski@samsung.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-10 00:15:28 +01:00
Michael Niedermayer	bb0a684d93	avcodec/4xm: Check for cfrm exhaustion Fixes: index -1 out of bounds for type 'CFrameBuffer [100]' Fixes: 63877/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FOURXM_fuzzer-5854263397711872 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-10 00:14:02 +01:00
Niklas Haas	96d2a40b9e	avcodec/pnm: explicitly tag color range PGMYUV seems to be always limited range. This was a format originally invented by FFmpeg at a time when YUVJ distinguished limited from full range YUV, and this codec never appeared to output YUVJ in any circumstance, so hard-coding limited range preserves the status quo. The other formats are explicitly documented to be full range RGB/gray formats. That said, don't tag them yet, due to outstanding bugs w.r.t grayscale formats and color range handling. This change in behavior updates a bunch of FATE tests in trivial ways (added tagging being the only difference).	2023-11-09 12:53:35 +01:00
Peter Ross	10869cd849	avcodec: LEAD MCMP decoder Partially fixes ticket #798 Reviewed-by: James Almer <jamrial@gmail.com> Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Peter Ross <pross@xvid.org>	2023-11-08 17:37:58 +11:00
Rémi Denis-Courmont	adc87a5f7c	lavc/opusdsp: rewrite R-V V postfilter This uses a more traditional approach allowing processing of up to period minus two elements per iteration. This also allows the algorithm to work for all and any vector length. As the T-Head C908 device under test can load 16 elements loop, there is unsurprisingly a little performance drop when the period is minimal and the parallelism is capped at 13 elements: Before: postfilter_15_c: 21222.2 postfilter_15_rvv_f32: 22007.7 postfilter_512_c: 20189.7 postfilter_512_rvv_f32: 22004.2 postfilter_1022_c: 20189.7 postfilter_1022_rvv_f32: 22004.2 After: postfilter_15_c: 20189.5 postfilter_15_rvv_f32: 7057.2 postfilter_512_c: 20189.5 postfilter_512_rvv_f32: 5667.2 postfilter_1022_c: 20192.7 postfilter_1022_rvv_f32: 5667.2	2023-11-06 22:09:30 +02:00
Rémi Denis-Courmont	02594c8c01	lavc/pixblockdsp: rework R-V V get_pixels_unaligned As in the aligned case, we can use VLSE64.V, though the way of doing so gets more convoluted, so the performance gains are more modest: get_pixels_unaligned_c: 126.7 get_pixels_unaligned_rvv_i32: 145.5 (before) get_pixels_unaligned_rvv_i64: 62.2 (after) For the reference, those are the aligned benchmarks (unchanged) on the same T-Head C908 hardware: get_pixels_c: 126.7 get_pixels_rvi: 85.7 get_pixels_rvv_i64: 33.2	2023-11-06 19:42:49 +02:00
Rémi Denis-Courmont	f68ad5d2de	lavc/sbrdsp: R-V V sbr_hf_g_filt hf_g_filt_c: 1552.5 hf_g_filt_rvv_f32: 679.5	2023-11-06 19:42:49 +02:00
Andreas Rheinhardt	3f890fbfd9	avcodec/cbs_h2645: Fix leak of SPS VUI extension data Fixes: VUI extension leak Fixes: 63004/clusterfuzz-testcase-minimized-ffmpeg_BSF_VVC_METADATA_fuzzer-4928832253329408 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-04 01:27:41 +01:00
Andreas Rheinhardt	5935423e1e	avcodec/aactab: Deduplicate swb_offset_960 tabs swb_offset_960_48 and swb_offset_960_32 coincide. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-04 01:24:09 +01:00
Michael Niedermayer	03a4aa9699	avcodec/flicvideo: consider width in copy loops Fixes: out of array write Fixes: 63520/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLIC_fuzzer-4876198087622656 Regression since: `c7f8d42c12` (was not posted to ffmpeg-devel) Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: Sean McGovern <gseanmcg@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-03 22:16:33 +01:00
Rémi Denis-Courmont	d06fd18f8f	lavc/sbrdsp: R-V V neg_odd_64 With 128-bit vectors, this is mostly pointless but also harmless. Performance gains should be more noticeable with larger vector sizes. neg_odd_64_c: 76.2 neg_odd_64_rvv_i64: 74.7	2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont	b0aba7dd0c	lavc/sbrdsp: R-V V sum_square sum_square_c: 803.5 sum_square_rvv_f32: 283.2	2023-11-01 22:53:26 +02:00
Rémi Denis-Courmont	86bee42473	lavc/sbrdsp: R-V V sum64x5 sum64x5_c: 385.0 sum64x5_rvv_f32: 116.0	2023-11-01 22:53:26 +02:00
Andreas Rheinhardt	eba73142ad	avcodec/vp9: Join extradata buffer pools Up until now each thread had its own buffer pool for extradata buffers when using frame-threading. Each thread can have at most three references to extradata and in the long run, each thread's bufferpool seems to fill up with three entries. But given that at any given time there can be at most 2 + number of threads entries used (the oldest thread can have two references to preceding frames that are not currently decoded and each thread has its own current frame, but there can be no references to any other frames), this is wasteful. This commit therefore uses a single buffer pool that is synced across threads. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:16:02 +01:00
Andreas Rheinhardt	0c44f63b02	avcodec/refstruct: Allow to share pools To do this, make FFRefStructPool itself refcounted according to the RefStruct API. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:15:54 +01:00
Andreas Rheinhardt	92abc7266b	avcodec/vaapi_encode: Use RefStruct pool API, stop abusing AVBuffer API Up until now, the VAAPI encoder uses fake data with the AVBuffer-API: The data pointer does not point to real memory, but is instead just a VABufferID converted to a pointer. This has probably been copied from the VAAPI-hwcontext-API (which presumably does it to avoid allocations). This commit changes this without causing additional allocations by switching to the RefStruct-pool API. This also fixes an unchecked av_buffer_ref(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:14:22 +01:00
Andreas Rheinhardt	8c0350f57e	avcodec/vp9: Use RefStruct-pool API for extradata It avoids allocations and corresponding error checks. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:14:06 +01:00
Andreas Rheinhardt	090d9956fd	avcodec/refstruct: Allow to always return zeroed pool entries This is in preparation for the following commit. Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:13:40 +01:00
Andreas Rheinhardt	e01e30ede1	avcodec/nvdec: Use RefStruct-pool API for decoder pool It involves fewer allocations, in particular no allocations after the entry has been created. Therefore creating a new reference from an existing one can't fail and therefore need not be checked. It also avoids indirections and casts. Also note that nvdec_decoder_frame_init() (the callback to initialize new entries from the pool) does not use atomics to read and replace the number of entries currently used by the pool. This relies on nvdec (like most other hwaccels) not being run in a truly frame-threaded way. Tested-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:13:01 +01:00
Andreas Rheinhardt	fd2e65871c	avcodec/hevcdec: Use RefStruct-pool API instead of AVBufferPool API It involves fewer allocations and therefore has the nice property that deriving a reference from a reference can't fail, simplifying hevc_ref_frame(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:10:20 +01:00
Andreas Rheinhardt	736b510fcc	avcodec/h264dec: Use RefStruct-pool API instead of AVBufferPool API It involves fewer allocations and therefore has the nice property that deriving a reference from a reference can't fail. This allows for considerable simplifications in ff_h264_(ref\|replace)_picture(). Switching to the RefStruct API also allows to make H264Picture smaller, because some AVBufferRef* pointers could be removed without replacement. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:07:56 +01:00
Andreas Rheinhardt	26c0a7321f	avcodec/refstruct: Add RefStruct pool API Very similar to the AVBufferPool API, but with some differences: 1. Reusing an already existing entry does not incur an allocation at all any more (the AVBufferPool API needs to allocate an AVBufferRef). 2. The tasks done while holding the lock are smaller; e.g. allocating new entries is now performed without holding the lock. The same goes for freeing. 3. The entries are freed as soon as possible (the AVBufferPool API frees them in two batches: The first in av_buffer_pool_uninit() and the second immediately before the pool is freed when the last outstanding entry is returned to the pool). 4. The API is designed for objects and not naked buffers and therefore has a reset callback. This is called whenever an object is returned to the pool. 5. Just like with the RefStruct API, custom allocators are not supported. (If desired, the FFRefStructPool struct itself could be made reference counted via the RefStruct API; an FFRefStructPool would then be freed via ff_refstruct_unref().) Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-11-01 20:07:23 +01:00
Rémi Denis-Courmont	92bcc6703a	lavc/pixblockdsp: remove R-V V get_pixels_16 In the aligned case, the existing RVI assembler is actually much faster. In the unaligned case, there is nothing much to gain over C.	2023-11-01 19:27:22 +02:00
Rémi Denis-Courmont	28840cf499	lavc/jpeg2000dsp: R-V V rct_int jpeg2000_rct_int_c: 2592.2 jpeg2000_rct_int_rvv_i32: 1154.2	2023-11-01 18:52:55 +02:00
Rémi Denis-Courmont	73dea2bb91	lavc/jpeg2000dsp: R-V V ict_float jpeg2000_ict_float_c: 3112.2 jpeg2000_ict_float_rvv_f32: 1225.0	2023-11-01 18:52:55 +02:00
Rémi Denis-Courmont	b2a441a3be	lavc/jpeg2000dsp: make coefficients extern This is so that they can be loaded from assembler, rather than duplicated.	2023-11-01 18:52:55 +02:00
Michael Niedermayer	a5259f326b	avcodec/vlc: Pass VLC_MULTI_ELEM directly not by pointer This makes the code more testable as uninitialized fields are 0 and not random values from the last call Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-01 16:40:22 +01:00
Michael Niedermayer	8516609edd	avcodec/vlc: Replace mysterious max computation code in multi vlc Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-01 16:40:21 +01:00
Michael Niedermayer	356b1ba765	avcodec/vlc: Skip subtable entries in multi VLC These entries do not correspond to VLC symbols that can be used; they do corrupt various variables like min/max bits. This also no longer assumes that there is a single non subtable entry. Probably fixes some infinite loops too. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-01 16:40:21 +01:00
Michael Niedermayer	2817efbba3	avcodec/dovi_rpu: Use 64 bit in get_us/se_coeff() Fixes: shift exponent 32 is too large for 32-bit type 'int' Fixes: 63151/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5067531154751488 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-01 16:40:20 +01:00
Michael Niedermayer	2def617787	avcodec/apedec: Fix integer overflow in predictor_decode_stereo_3950() Fixes: signed integer overflow: 1900031961 + 553590817 cannot be represented in type 'int' Fixes: 63061/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APE_fuzzer-5166188298371072 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-01 16:40:20 +01:00
Michael Niedermayer	68cc1744db	avcodec/evc_parse: Check tid The check is based on not infinite looping. It is likely a more strict check can be done Fixes: Infinite loop Fixes: 62473/clusterfuzz-testcase-minimized-ffmpeg_BSF_EVC_FRAME_MERGE_fuzzer-5719883750703104 Fixes: 62765/clusterfuzz-testcase-minimized-ffmpeg_dem_EVC_fuzzer-6448531252314112 Fixes: 63378/clusterfuzz-testcase-minimized-ffmpeg_dem_MPEGPS_fuzzer-6504993844494336 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: "Dawid Kozinski/Multimedia (PLT) /SRPOL/Staff Engineer/Samsung Electronics" <d.kozinski@samsung.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-01 16:40:19 +01:00
Michael Niedermayer	d35eecd24f	avcodec/evc_parse: remove pow() and log2() The use of float based functions is both unneeded and wrong due to unpredictable rounding Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-11-01 16:40:03 +01:00
Andreas Rheinhardt	f2687a3b69	avcodec/wmavoice: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	5615f9dab4	avcodec/wmaprodec: Avoid superfluous VLC structures For all VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. And in most cases one can even avoid this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	7e2120c4d9	avcodec/mpeg12: Avoid unnecessary VLC structures Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying tables directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	c9aa80c313	avcodec/mpegaudiodec_common: Avoid superfluous VLC structures For some VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	5dc31bc67b	avcodec/aacps_common: Apply offset for VLCs during init This avoids having to apply it later after every get_vlc2(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	40a8cb9e6c	avcodec/aacps_common: Combine huffman tabels This allows to avoid the relocations inherent in an array to individual tables; it also reduces padding. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	774611a349	avcodec/aacps_common: Switch to ff_vlc_init_tables_from_lengths() It allows to replace codes of type uint16_t or uint32_t by symbols of type uint8_t. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	eb422c606a	avcodec/aacps_common: Avoid superfluous VLC structures For all VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	4fe91e3676	avcodec/aacps: Move initializing common stuff to aacdec_common.c ff_ps_init() initializes some tables for AAC parametric stereo and some of them are only valid for the fixed- or floating-point decoder, whereas others (namely VLCs) are valid for both. The latter are therefore initialized by ff_ps_init_common() and because the two versions of ff_ps_init() can be run concurrently, it is guarded by an AVOnce. Yet now that there is ff_aacdec_common_init_once() there is a better way to do this: Call ff_ps_init_common() from ff_aacdec_common_init_once(). That way there is no need to guard ff_ps_init_common() by an AVOnce any more. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	7f66d9d6c5	avcodec/aacdec_common: Apply offset for SBR VLCs during init This avoids having to apply it later after every get_vlc2(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	1aca4e7fc5	avcodec/aacdec_common: Combine huffman tabs This allows to avoid the relocations inherent in a table to individual tables; it also reduces padding. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	2c131f126d	avcodec/aacdec_common: Switch to ff_vlc_init_tables_from_lengths() It allows to replace code tables of type uint32_t or uint16_t by symbols of type uint8_t. It is also faster. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	0b4e69cc87	avcodec/aacdec_common: Avoid superfluous VLC structures for SBR VLCs For all VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	22d60524d8	avcodec/aacsbr_template: Deduplicate VLCs The VLCs, their init code and the tables used for initialization are currently duplicated for the floating- and fixed-point decoders. This commit stops doing so and moves this stuff to aacdec_common.c. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Andreas Rheinhardt	4d6042e9d7	avcodec/aacdec_common: Avoid superfluous VLC structures For all VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. And in some cases one can even avoid this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 21:44:48 +01:00
Benjamin Cheng	4536de3769	vulkan_h264: fix long-term ref handling h->long_ref isn't guaranteed to be contiguously filled. Use the approach from both vaapi_h264 and vdpau_h264 which goes through the 16 frames in h->long_ref to find the LTR entries. Fixes MR2_MW_A.264 from JVT-AVC_V1.	2023-10-31 21:35:23 +01:00
Andreas Rheinhardt	1e63e24c76	avcodec/aactab: Improve included headers Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	30deaba97b	avcodec/aacdec_template: Don't init unused table for fixed-point decoder The fixed-point decoder actually does not use the floating-point tables initialized by ff_aac_tableinit() at all. So don't initialize them for it; instead merge initializing these tables into ff_aac_float_common_init() which is already the function for the common static initializations of the floating-point AAC decoder and the (also floating-point) AAC encoder. Doing so saves also one AVOnce. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	3b080fe7af	avcodec/aacdec_template: Deduplicate VLCs They (as well as their init code) are currently duplicated for the floating- and fixed-point decoders. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	1f15a7e9a8	avcodec/aacdectab: Deduplicate common decoder tables Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	8c1e71a811	avcodec/aacps: Pass logctx as void* instead of AVCodecContext* Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	70b5d9c569	avcodec/aacps: Remove unused AVCodecContext* parameter from ff_ps_apply Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	36b5f71b1f	avcodec/msmpeg4dec: Avoid superfluous VLC structures For all VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. And in some cases one can even avoid this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	5a694d62c5	avcodec/mpeg4videodec: Avoid superfluous VLC structures For all VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. And in some cases one can even avoid this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	8c39b2bca7	avcodec/mss4: Partially inline max_depth and nb_bits of VLC It is known at compile-time for the vec_entry_vlcs. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	05577d2c76	avcodec/indeo2: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	25b9ff2780	avcodec/4xm: Avoid unnecessary VLC structures Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	e5dcde620d	avcodec/vc1: Avoid superfluous VLC structures For all VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. And in some cases one can even avoid this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	fd4cb6ebee	avcodec/speedhqdec: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	0a610e22c1	avcodec/lagarith: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	e3ad5b9784	avcodec/imm4: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	f6c5d04b6d	avcodec/mimic: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	827d0325a9	avcodec/mobiclip: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and only VLC.table needs to be retained. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	36e7f9b339	avcodec/vqcdec: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	99ed510d4b	avcodec/mv30: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	b60a3f70be	avcodec/wnv1: Avoid unnecessary VLC structure Everything besides VLC.table is basically write-only and even VLC.table can be removed by accessing the underlying table directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	1ae750a16e	avcodec/rv34: Constify pointer to static object Said object is only allowed to be modified during its initialization and is immutable afterwards. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	716ddc8c62	avcodec/rv34: Avoid superfluous VLC structures For most VLCs here, the number of bits of the VLC is write-only, because it is hardcoded at the call site. Therefore one can replace these VLC structures with the only thing that is actually used: The pointer to the VLCElem table. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	73fa6d486d	avcodec/vp3: Reindent after the previous commits Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	75c6a253a4	avcodec/vp3: Avoid complete VLC struct, only use VLCElem* Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	6c7a344b65	avcodec/vp3: Share coefficient VLCs between threads These VLCs are very big: The VP3 one have 164382 elements but due to the overallocation enough memory for 313344 elements are allocated (1.195 MiB with sizeof(VLCElem) == 4); for VP4 the numbers are very similar, namely 311296 and 164392 elements. Since `1f4cf92cfb`, each frame thread has its own copy of these VLCs. This commit fixes this by sharing these VLCs across threads. The approach used here will also make it easier to support stream reconfigurations in case of frame-multithreading in the future. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	7fee90efac	avcodec/imc: Avoid superfluous VLC structures Of all these VLCs here, only VLC.table was really used after init, so use the ff_vlc_init_tables API to get rid of them. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
Andreas Rheinhardt	6fb96ef755	avcodec/atrac9dec: Avoid superfluous VLC structures Of all these VLCs here, only VLC.table was really used after init, so use the ff_vlc_init_tables API to get rid of them. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-10-31 20:47:00 +01:00
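
Note on 2817efbba3 (avcodec/dovi_rpu) above: the reported error, "shift exponent 32 is too large for 32-bit type 'int'", arises when a 32-bit value is shifted by its full width, which is undefined behaviour in C. The following is a minimal sketch of the general remedy, which is to widen to 64 bits before shifting. The helper names `pow2_u64` and `low_bits_mask` are hypothetical and are not taken from FFmpeg; the actual change to get_us/se_coeff() may look different.

```c
#include <stdint.h>

/* Hypothetical helper (not FFmpeg code): returns 2^n for 0 <= n <= 63.
 * Casting to uint64_t before shifting keeps n == 32 well defined,
 * whereas (1 << 32) with a 32-bit int is undefined behaviour. */
static uint64_t pow2_u64(unsigned n)
{
    return (uint64_t)1 << n;
}

/* Hypothetical helper (not FFmpeg code): mask with the low n bits set,
 * valid for 0 <= n <= 63, built on the widened shift above. */
static uint64_t low_bits_mask(unsigned n)
{
    return pow2_u64(n) - 1;
}
```

With the widened type, `pow2_u64(32)` yields 4294967296 and `low_bits_mask(32)` yields 0xFFFFFFFF, the two cases that would overflow if computed in `int`.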

49499 Commits