FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00

Author	SHA1	Message	Date
James Almer	0cc0d8c0b5	avcodec/get_bits: add get_leb() Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-18 15:19:36 -03:00
James Almer	12eac23637	avcodec/packet: add IAMF Parameters side data types Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-18 15:19:30 -03:00
Rémi Denis-Courmont	419145c11b	lavc/vc1dsp: fix R-V V vector lengths The 8x4 and 4x4 use a needlessly large multiplier (unless/until we care about embedded 64-bit-vector hardware). This is merely suboptimal. The 8x4 case also uses an incorrect vector length, which leads to incorrect behaviour on future/hypothetical hardware with 256-bit or larger vectors. Pointed-out-by: Martin Storsjö <martin@martin.st>	2023-12-17 09:27:52 +02:00
Martin Storsjö	b51d9eb58e	riscv: vc1dsp: Don't check vlenb before checking the CPU flags We can't call ff_get_rv_vlenb() if we don't have RVV available at all. Acked-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-16 22:30:26 +02:00
Rémi Denis-Courmont	918b3ed2d5	lavc/lpc: R-V V compute_autocorr The loop iterates over the length of the vector, not the order. This is to avoid reloading the same data for each lag value. However this means the loop only works if the maximum order is no larger than VLENB. The loop is roughly equivalent to: for (size_t j = 0; j < lag; j++) autoc[j] = 1.; while (len > lag) { for (ptrdiff_t j = 0; j < lag; j++) autoc[j] += data[j] * data; data++; len--; } while (len > 0) { for (ptrdiff_t j = 0; j < len; j++) autoc[j] += data[j] *data; data++; len--; } Since register pressure is only at 50%, it should be possible to implement the same loop for order up to 2xVLENB. But this is left for future work. Performance numbers are all over the place from ~1.25x to ~4x speedups, but at least they are always noticeably better than nothing.	2023-12-16 11:18:01 +02:00
Nuo Mi	ce0c178a40	avcodec/cbs_h266: more restrictive check on pps_tile_idx_delta_val Fixes: out of array access Fixes: 62603/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-5837632490569728 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-14 23:53:10 +01:00
Pierre-Anthony Lemieux	a1384b4e86	avcodec/jpeg2000htdec: check if block decoding will exceed internal precision Intended to replace https://patchwork.ffmpeg.org/project/ffmpeg/patch/20230802000135.26482-3-michael@niedermayer.cc/ with a more accurate block decoding magnitude bound. Fixes: 62433/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216 Fixes: 58299/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216 Previous-version-reviewed-by: Tomas Härdin <git@haerdin.se> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-14 23:53:10 +01:00
James Almer	34d56e1766	x86/aacencdsp: clear the high bits for size in ff_abs_pow34_sse Fixes checkasm failures on win64. Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-12 15:24:08 -03:00
sunyuechi	98596f90f4	lavc/aacencdsp: R-V V abs_pow34 C908: abs_pow34_c: 535.5 abs_pow34_rvv_f32: 337.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-11 18:42:07 +02:00
sunyuechi	e880a97e7c	lvac/aacenc: add ff_aac_dsp_init This is for clarity and use in testing, consistent with other parts of the code. Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-11 18:42:04 +02:00
Rémi Denis-Courmont	272d0c164d	lavc/lpc: R-V V apply_welch_window apply_welch_window_even_c: 617.5 apply_welch_window_even_rvv_f64: 235.0 apply_welch_window_odd_c: 709.0 apply_welch_window_odd_rvv_f64: 256.5	2023-12-11 18:17:43 +02:00
Rémi Denis-Courmont	b3825bbe45	riscv: test for assembler support This should fix the build on LLVM 16 and earlier, at the cost of turning all non-RVV optimisations off.	2023-12-08 17:21:09 +02:00
sunyuechi	0b9d009b4a	lavc/vc1dsp: R-V V inv_trans C908: vc1dsp.vc1_inv_trans_4x4_dc_c: 125.7 vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5 vc1dsp.vc1_inv_trans_4x8_dc_c: 230.7 vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5 vc1dsp.vc1_inv_trans_8x4_dc_c: 228.7 vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5 vc1dsp.vc1_inv_trans_8x8_dc_c: 476.5 vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-08 17:20:48 +02:00
Mikhail Nitenko	0f745b74ec	lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions Benchmarks A53 A55 A72 A76 avg_h264_qpel_8_mc01_10_c: 936.5 924.0 656.0 504.7 avg_h264_qpel_8_mc01_10_neon: 234.7 202.0 120.7 63.2 avg_h264_qpel_8_mc02_10_c: 921.0 920.0 669.2 493.7 avg_h264_qpel_8_mc02_10_neon: 202.0 173.2 102.7 58.5 avg_h264_qpel_8_mc03_10_c: 936.5 924.0 656.0 509.5 avg_h264_qpel_8_mc03_10_neon: 236.2 203.7 120.0 63.2 avg_h264_qpel_8_mc10_10_c: 1441.0 1437.7 806.7 478.5 avg_h264_qpel_8_mc10_10_neon: 325.7 324.0 153.7 94.2 avg_h264_qpel_8_mc11_10_c: 2160.7 2148.2 1366.7 906.7 avg_h264_qpel_8_mc11_10_neon: 492.0 464.0 242.5 134.5 avg_h264_qpel_8_mc13_10_c: 2157.0 2138.2 1357.0 908.2 avg_h264_qpel_8_mc13_10_neon: 494.0 467.2 242.0 140.0 avg_h264_qpel_8_mc20_10_c: 1433.5 1410.0 785.2 486.0 avg_h264_qpel_8_mc20_10_neon: 293.7 289.7 138.0 91.5 avg_h264_qpel_8_mc30_10_c: 1458.5 1461.7 813.7 483.2 avg_h264_qpel_8_mc30_10_neon: 341.7 339.2 154.0 95.2 avg_h264_qpel_8_mc31_10_c: 2194.7 2197.2 1358.7 928.0 avg_h264_qpel_8_mc31_10_neon: 520.0 495.0 245.5 142.5 avg_h264_qpel_8_mc33_10_c: 2188.0 2205.5 1356.7 910.7 avg_h264_qpel_8_mc33_10_neon: 521.0 494.5 245.7 145.7 avg_h264_qpel_16_mc01_10_c: 3717.2 3595.0 2610.0 2012.0 avg_h264_qpel_16_mc01_10_neon: 920.5 791.5 483.2 240.5 avg_h264_qpel_16_mc02_10_c: 3684.0 3633.0 2659.0 1919.7 avg_h264_qpel_16_mc02_10_neon: 790.7 678.2 409.2 217.0 avg_h264_qpel_16_mc03_10_c: 3726.5 3596.0 2606.7 2010.0 avg_h264_qpel_16_mc03_10_neon: 922.0 792.5 483.2 239.7 avg_h264_qpel_16_mc10_10_c: 5912.0 5803.2 3241.5 1916.7 avg_h264_qpel_16_mc10_10_neon: 1267.5 1277.2 616.5 365.0 avg_h264_qpel_16_mc11_10_c: 8599.2 8482.5 5338.0 3616.2 avg_h264_qpel_16_mc11_10_neon: 1913.0 1827.0 956.2 542.2 avg_h264_qpel_16_mc13_10_c: 8643.7 8488.5 5388.0 3628.5 avg_h264_qpel_16_mc13_10_neon: 1914.7 1828.7 969.2 530.5 avg_h264_qpel_16_mc20_10_c: 5719.5 5641.0 3147.0 1946.2 avg_h264_qpel_16_mc20_10_neon: 1139.5 1150.0 539.5 344.0 avg_h264_qpel_16_mc30_10_c: 5930.0 5872.5 3267.5 1918.0 avg_h264_qpel_16_mc30_10_neon: 1331.5 1341.2 616.5 369.5 avg_h264_qpel_16_mc31_10_c: 8758.7 8697.7 5353.0 3630.7 avg_h264_qpel_16_mc31_10_neon: 2018.7 1941.7 982.2 574.7 avg_h264_qpel_16_mc33_10_c: 8683.2 8675.2 5339.2 3634.7 avg_h264_qpel_16_mc33_10_neon: 2019.7 1940.2 994.5 566.0 put_h264_qpel_8_mc01_10_c: 854.2 843.0 599.2 478.0 put_h264_qpel_8_mc01_10_neon: 192.7 168.0 101.7 56.7 put_h264_qpel_8_mc02_10_c: 766.5 760.0 550.2 441.0 put_h264_qpel_8_mc02_10_neon: 160.0 139.2 88.7 53.0 put_h264_qpel_8_mc03_10_c: 854.2 843.0 599.2 479.0 put_h264_qpel_8_mc03_10_neon: 194.2 169.7 102.0 56.2 put_h264_qpel_8_mc10_10_c: 1352.7 1353.7 749.7 446.7 put_h264_qpel_8_mc10_10_neon: 289.7 294.2 135.5 88.5 put_h264_qpel_8_mc11_10_c: 2080.0 2066.2 1309.5 876.7 put_h264_qpel_8_mc11_10_neon: 450.0 429.7 229.7 131.2 put_h264_qpel_8_mc13_10_c: 2074.7 2060.2 1294.5 870.5 put_h264_qpel_8_mc13_10_neon: 452.5 434.5 226.5 130.0 put_h264_qpel_8_mc20_10_c: 1221.5 1216.0 684.5 399.7 put_h264_qpel_8_mc20_10_neon: 257.7 262.5 121.2 78.7 put_h264_qpel_8_mc30_10_c: 1379.0 1374.7 757.2 449.5 put_h264_qpel_8_mc30_10_neon: 305.7 310.2 135.5 86.5 put_h264_qpel_8_mc31_10_c: 2109.2 2119.7 1299.5 878.0 put_h264_qpel_8_mc31_10_neon: 478.0 458.5 226.0 137.2 put_h264_qpel_8_mc33_10_c: 2101.5 2115.2 1306.5 887.0 put_h264_qpel_8_mc33_10_neon: 479.0 458.7 229.7 141.7 put_h264_qpel_16_mc01_10_c: 3485.7 3396.7 2460.5 1914.5 put_h264_qpel_16_mc01_10_neon: 752.5 665.5 397.0 213.2 put_h264_qpel_16_mc02_10_c: 3103.5 3023.2 2154.7 1720.7 put_h264_qpel_16_mc02_10_neon: 622.7 551.2 347.7 196.2 put_h264_qpel_16_mc03_10_c: 3486.2 3394.0 2436.5 1917.7 put_h264_qpel_16_mc03_10_neon: 754.0 666.5 397.0 215.7 put_h264_qpel_16_mc10_10_c: 5533.0 5488.5 2989.0 1783.0 put_h264_qpel_16_mc10_10_neon: 1123.5 1165.2 535.2 334.7 put_h264_qpel_16_mc11_10_c: 8437.7 8281.2 5209.0 3510.7 put_h264_qpel_16_mc11_10_neon: 1745.0 1697.0 878.5 513.5 put_h264_qpel_16_mc13_10_c: 8567.7 8468.0 5221.5 3528.0 put_h264_qpel_16_mc13_10_neon: 1751.7 1698.2 889.2 507.0 put_h264_qpel_16_mc20_10_c: 4907.5 4885.0 2786.2 1607.5 put_h264_qpel_16_mc20_10_neon: 995.5 1034.5 475.5 307.0 put_h264_qpel_16_mc30_10_c: 5579.7 5537.7 3045.2 1789.5 put_h264_qpel_16_mc30_10_neon: 1187.5 1231.2 532.5 334.5 put_h264_qpel_16_mc31_10_c: 8677.2 8672.5 5204.2 3516.0 put_h264_qpel_16_mc31_10_neon: 1850.7 1813.2 893.0 545.2 put_h264_qpel_16_mc33_10_c: 8688.7 8671.2 5223.2 3512.0 put_h264_qpel_16_mc33_10_neon: 1851.7 1814.2 908.5 535.2 Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-07 23:20:14 +02:00
sunyuechi	8bdb663062	lavc/ac3dsp: R-V V float_to_fixed24 c910 float_to_fixed24_c: 2207.2 float_to_fixed24_rvv_f32: 696.2 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-12-06 16:04:22 +02:00
Paul B Mahol	7e453dad3c	avcodec/qoadec: fix overreads and fix packet size check	2023-12-05 14:50:21 +01:00
Michael Niedermayer	22daf2148f	avcodec/av1dec: Fix resolving zero divisor Fixes: Out of array read Fixes: global-buffer-overflow-AV1 Found-by: "Leonelli, Matteo" <matteo.leonelli@cispa.de> Tested-by: "Wang, Fei W" <fei.w.wang@intel.com> Reviewed-by: "Wang, Fei W" <fei.w.wang@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-12-05 12:38:16 +01:00
Leo Izen	c4be080e65	avcodec/jpegxl_parser: fix parsing sequences of extremely small files This patch allows the JXL parser to parse sequences of extremely small files concatenated together. (e.g. smaller than the parser buffer) Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-05 05:54:34 -05:00
Leo Izen	019b3ea65a	avcodec/jpegxl_parse{,r}: use correct ISOBMFF extended size location According to ISO/IEC 14996-12, size == 1 means a 64-bit extended-size field occurs after the 32-bit box type, not before. This fix should allow correct parsing of JXL files with extended-size boxes. Signed-off-by: Leo Izen <leo.izen@gmail.com>	2023-12-05 05:53:32 -05:00
Haihao Xiang	fc73b372cd	lavc/qsvdec: reduce info message when more data is required demote the info to AV_LOG_VERBOSE Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2023-12-05 10:10:57 +08:00
Haihao Xiang	e233f3e75f	lavc/qsvdec: return 0 if more data is required The type of qsv decoders is FF_CODEC_CB_TYPE_DECODE which must not return AVERROR(EAGAIN). commit `42b20c9` added an assertion to check the returned value. Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2023-12-05 10:10:57 +08:00
Lynne	8c117b75af	lavc/Makefile: build vulkan decode code if vulkan_av1 has been enabled Forgotten. Reviewed-by: Neal Gompa <ngompa13@gmail.com> Tested-by: Neal Gompa <ngompa13@gmail.com>	2023-12-04 07:57:27 +01:00
Paul B Mahol	0a13178de8	avcodec/qoadec: add support for midstream sample rate/layout changes	2023-12-02 16:51:00 +01:00
Anton Khirnov	5230257ea1	lavc/dvdsubenc: only check canvas size when it is actually set Fixes #10650	2023-12-02 11:22:46 +01:00
Logan Lyu	fa0470347e	lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_hv put_hevc_qpel_bi_hv4_8_c: 433.7 put_hevc_qpel_bi_hv4_8_i8mm: 117.9 put_hevc_qpel_bi_hv6_8_c: 803.9 put_hevc_qpel_bi_hv6_8_i8mm: 252.7 put_hevc_qpel_bi_hv8_8_c: 1296.4 put_hevc_qpel_bi_hv8_8_i8mm: 316.2 put_hevc_qpel_bi_hv12_8_c: 2867.4 put_hevc_qpel_bi_hv12_8_i8mm: 669.2 put_hevc_qpel_bi_hv16_8_c: 4709.4 put_hevc_qpel_bi_hv16_8_i8mm: 929.9 put_hevc_qpel_bi_hv24_8_c: 9639.7 put_hevc_qpel_bi_hv24_8_i8mm: 2072.4 put_hevc_qpel_bi_hv32_8_c: 16663.7 put_hevc_qpel_bi_hv32_8_i8mm: 3391.4 put_hevc_qpel_bi_hv48_8_c: 36972.9 put_hevc_qpel_bi_hv48_8_i8mm: 7505.7 put_hevc_qpel_bi_hv64_8_c: 64106.4 put_hevc_qpel_bi_hv64_8_i8mm: 13145.2 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	595f97028b	lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_v put_hevc_qpel_bi_v4_8_c: 166.1 put_hevc_qpel_bi_v4_8_neon: 61.9 put_hevc_qpel_bi_v6_8_c: 309.4 put_hevc_qpel_bi_v6_8_neon: 75.6 put_hevc_qpel_bi_v8_8_c: 531.1 put_hevc_qpel_bi_v8_8_neon: 78.1 put_hevc_qpel_bi_v12_8_c: 1139.9 put_hevc_qpel_bi_v12_8_neon: 238.1 put_hevc_qpel_bi_v16_8_c: 2063.6 put_hevc_qpel_bi_v16_8_neon: 308.9 put_hevc_qpel_bi_v24_8_c: 4317.1 put_hevc_qpel_bi_v24_8_neon: 629.9 put_hevc_qpel_bi_v32_8_c: 8241.9 put_hevc_qpel_bi_v32_8_neon: 1140.1 put_hevc_qpel_bi_v48_8_c: 18422.9 put_hevc_qpel_bi_v48_8_neon: 2533.9 put_hevc_qpel_bi_v64_8_c: 37508.6 put_hevc_qpel_bi_v64_8_neon: 4520.1 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	00290a64f7	lavc/aarch64: new optimization for 8-bit hevc_epel_bi_hv put_hevc_epel_bi_hv4_8_c: 242.9 put_hevc_epel_bi_hv4_8_i8mm: 68.6 put_hevc_epel_bi_hv6_8_c: 402.4 put_hevc_epel_bi_hv6_8_i8mm: 135.9 put_hevc_epel_bi_hv8_8_c: 636.4 put_hevc_epel_bi_hv8_8_i8mm: 145.6 put_hevc_epel_bi_hv12_8_c: 1363.1 put_hevc_epel_bi_hv12_8_i8mm: 324.1 put_hevc_epel_bi_hv16_8_c: 2222.1 put_hevc_epel_bi_hv16_8_i8mm: 509.1 put_hevc_epel_bi_hv24_8_c: 4793.4 put_hevc_epel_bi_hv24_8_i8mm: 1091.9 put_hevc_epel_bi_hv32_8_c: 8393.9 put_hevc_epel_bi_hv32_8_i8mm: 1720.6 put_hevc_epel_bi_hv48_8_c: 19526.6 put_hevc_epel_bi_hv48_8_i8mm: 4285.9 put_hevc_epel_bi_hv64_8_c: 33915.4 put_hevc_epel_bi_hv64_8_i8mm: 6783.6 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	0448f27f41	lavc/aarch64: new optimization for 8-bit hevc_epel_bi_v put_hevc_epel_bi_v4_8_c: 138.4 put_hevc_epel_bi_v4_8_neon: 33.7 put_hevc_epel_bi_v6_8_c: 302.9 put_hevc_epel_bi_v6_8_neon: 46.7 put_hevc_epel_bi_v8_8_c: 408.7 put_hevc_epel_bi_v8_8_neon: 48.7 put_hevc_epel_bi_v12_8_c: 779.4 put_hevc_epel_bi_v12_8_neon: 139.7 put_hevc_epel_bi_v16_8_c: 1344.9 put_hevc_epel_bi_v16_8_neon: 160.2 put_hevc_epel_bi_v24_8_c: 2981.7 put_hevc_epel_bi_v24_8_neon: 344.9 put_hevc_epel_bi_v32_8_c: 5280.9 put_hevc_epel_bi_v32_8_neon: 618.4 put_hevc_epel_bi_v48_8_c: 12494.9 put_hevc_epel_bi_v48_8_neon: 1364.4 put_hevc_epel_bi_v64_8_c: 22127.7 put_hevc_epel_bi_v64_8_neon: 2473.7 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	216275bd80	lavc/aarch64: new optimization for 8-bit hevc_epel_bi_h put_hevc_epel_bi_h4_8_c: 96.0 put_hevc_epel_bi_h4_8_neon: 36.3 put_hevc_epel_bi_h6_8_c: 288.3 put_hevc_epel_bi_h6_8_neon: 59.3 put_hevc_epel_bi_h8_8_c: 358.5 put_hevc_epel_bi_h8_8_neon: 61.5 put_hevc_epel_bi_h12_8_c: 759.8 put_hevc_epel_bi_h12_8_neon: 159.5 put_hevc_epel_bi_h16_8_c: 1307.0 put_hevc_epel_bi_h16_8_neon: 182.0 put_hevc_epel_bi_h24_8_c: 2778.3 put_hevc_epel_bi_h24_8_neon: 430.5 put_hevc_epel_bi_h32_8_c: 4952.3 put_hevc_epel_bi_h32_8_neon: 679.5 put_hevc_epel_bi_h48_8_c: 11803.3 put_hevc_epel_bi_h48_8_neon: 1443.5 put_hevc_epel_bi_h64_8_c: 20654.8 put_hevc_epel_bi_h64_8_neon: 2737.0 put_hevc_qpel_bi_h4_8_c: 140.0 put_hevc_qpel_bi_h4_8_neon: 111.5 put_hevc_qpel_bi_h6_8_c: 318.0 put_hevc_qpel_bi_h6_8_neon: 85.8 put_hevc_qpel_bi_h8_8_c: 536.5 put_hevc_qpel_bi_h8_8_neon: 95.3 put_hevc_qpel_bi_h12_8_c: 1188.5 put_hevc_qpel_bi_h12_8_neon: 291.3 put_hevc_qpel_bi_h16_8_c: 2064.3 put_hevc_qpel_bi_h16_8_neon: 365.3 put_hevc_qpel_bi_h24_8_c: 4757.5 put_hevc_qpel_bi_h24_8_neon: 1010.0 put_hevc_qpel_bi_h32_8_c: 8351.8 put_hevc_qpel_bi_h32_8_neon: 2917.8 put_hevc_qpel_bi_h48_8_c: 19299.8 put_hevc_qpel_bi_h48_8_neon: 2976.8 put_hevc_qpel_bi_h64_8_c: 34182.5 put_hevc_qpel_bi_h64_8_neon: 5236.3 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
Logan Lyu	40cf4a5ca3	lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels put_hevc_pel_bi_pixels4_8_c: 54.7 put_hevc_pel_bi_pixels4_8_neon: 43.0 put_hevc_pel_bi_pixels6_8_c: 94.7 put_hevc_pel_bi_pixels6_8_neon: 37.0 put_hevc_pel_bi_pixels8_8_c: 171.0 put_hevc_pel_bi_pixels8_8_neon: 24.0 put_hevc_pel_bi_pixels12_8_c: 354.0 put_hevc_pel_bi_pixels12_8_neon: 68.7 put_hevc_pel_bi_pixels16_8_c: 588.2 put_hevc_pel_bi_pixels16_8_neon: 77.5 put_hevc_pel_bi_pixels24_8_c: 1670.7 put_hevc_pel_bi_pixels24_8_neon: 173.0 put_hevc_pel_bi_pixels32_8_c: 2267.7 put_hevc_pel_bi_pixels32_8_neon: 281.2 put_hevc_pel_bi_pixels48_8_c: 5787.5 put_hevc_pel_bi_pixels48_8_neon: 673.5 put_hevc_pel_bi_pixels64_8_c: 9897.0 put_hevc_pel_bi_pixels64_8_neon: 1159.5 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-12-01 21:25:39 +02:00
James Almer	6d19611251	avcodec/ac3dsp: add missing stddef.h include Should fix make checkheaders Signed-off-by: James Almer <jamrial@gmail.com>	2023-12-01 12:42:22 -03:00
xufuji456	cc86343b96	lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d Building iOS platform with arm64, the compiler has a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b" Signed-off-by: xufuji456 <839789740@qq.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-11-28 15:54:49 +02:00
Zhao Zhili	d526a34c20	avcodec/videotoolboxenc: refactor dump encoder name Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-27 23:49:01 +08:00
Zhao Zhili	cb049d377f	avcodec/videotoolboxenc: Fix build failure due to PropertyKey_EncoderID Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-27 23:48:55 +08:00
Paul B Mahol	3609d2b783	avcodec: add QOA decoder	2023-11-26 17:49:09 +01:00
Geoffrey McRae	93b5d9030b	libavcodec/mlpdec: add missing correction to ch_layout when downmixing This fixes corrupted audio for applications relying on ch_layout when codec downmixing is active. Signed-off-by: Geoffrey McRae <geoff@hostfission.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-26 10:18:33 -03:00
Geoffrey McRae	a8677bcc8f	libavcodec/dcadec: adjust the `ch_layout` when downmix is active Applications making use of this codec with the `downmix` option are segfaulting unless the `ch_layout` is overridden after `avcodec_open2` as can be seen in projects like MythTV[1] This patch fixes this by overriding the ch_layout as done in other decoders such as AC3. 1: `af6f362a14/mythtv/libs/libmythtv/decoders/avformatdecoder.cpp (L4607)` Signed-off-by: Geoffrey McRae <geoff@hostfission.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-26 10:18:33 -03:00
James Almer	72390dea00	mips/ac3dsp_mips: add missing stddef.h header include Fixes compilation failures after `567c67c6c8`. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-25 21:51:04 -03:00
James Almer	e40ea9f34b	x86/ac3dsp: add ff_float_to_fixed24_avx() Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-25 21:50:56 -03:00
James Almer	d8b1a34433	x86/ac3dsp: reduce instruction count inside the float_to_fixed24 loop Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-25 21:50:56 -03:00
Rémi Denis-Courmont	0fa421c8f1	lavc/llvidencdsp: add R-V V diff_bytes diff_bytes_c: 163.0 diff_bytes_rvv_i32: 52.7	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	0183c2c830	lavc/aacpsdsp: use LMUL=2 and amortise strides The input is laid out in 16 segments, of which 13 actually need to be loaded. There are no really efficient ways to deal with this: 1) If we load 8 segments wit unit stride, then narrow to 16 segments with right shifts, we can only get one half-size vector per segment, or just 2 elements per vector (EMUL=1/2) - at least with 128-bit vectors. This ends up unsurprisingly about as fas as the C code. 2) The current approach is to load with strides. We keep that approach, but improve it using three 4-segmented loads instead of 12 single-segment loads. This divides the number of distinct loaded addresses by 4. 3) A potential third approach would be to avoid segmentation altogether and splat the scalar coefficient into vectors. Then we can use a unit-stride and maximum EMUL. But the downside then is that we have to multiply the 3 (of 16) unused segments with zero as part of the multiply-accumulate operations. In addition, we also reuse vectors mid-loop so as to increase the EMUL from 1 to 2, which also improves performance a little bit. Oeverall the gains are quite small with the device under test, as it does not deal with segmented loads very well. But at least the code is tidier, and should enjoy bigger speed-ups on better hardware implementation. Before: ps_hybrid_analysis_c: 1819.2 ps_hybrid_analysis_rvv_f32: 1037.0 (before) ps_hybrid_analysis_rvv_f32: 990.0 (after)	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	b88d4058f9	lavc/g722dsp: optimise R-V V apply_qmf This stores the constant coefficients deinterleaved, so that they can be loaded directly with NF=0. Unfortunately, we cannot optimise loading the input, due to insufficient memory alignment (not 32-bit). Before: g722_apply_qmf_c: 82.5 g722_apply_qmf_rvv_i32: 78.2 After: g722_apply_qmf_c: 82.5 g722_apply_qmf_rvv_i32: 65.2	2023-11-23 18:57:18 +02:00
James Almer	567c67c6c8	avcodec/ac3dsp: make len a size_t in float_to_fixed24 Should simplify asm implementations, and prevent UB on at least win64. Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-22 18:33:00 -03:00
James Almer	2d9fd814d0	x86/: clear the high bits for order in scalarproduct_and_madd functions Should fix checkasm failures on win64. Signed-off-by: James Almer <jamrial@gmail.com>	2023-11-22 14:18:42 -03:00
Zhao Zhili	e8a49b1424	avcodec/mmaldec: Fix build error Fix #10670. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 21:02:04 +08:00
Zhao Zhili	f27fce0c0c	avcodec/mediacodecdec: fix return EAGAIN after EOF Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 21:02:04 +08:00
Dmitry Rogozhkin	e9c93009fc	avcodec/decode: validate hw_frames_ctx when AVHWAccel.free_frame_priv is used Validate that a hw_frames_ctx is available before using it for the AVHWAccel.free_frame_priv callback, and don't require it to be present when the callback is not in use by the HWAccel. v2: check for free_frame_priv (Hendrik) v3: return EINVAL (Christoph Reiter) v4: better commit message (Hendrik) v5: fix typo with missed frames_ctx (Lynne) See[1]: https://github.com/msys2/MINGW-packages/pull/19050 Fixes: `be07145109` ("avcodec: add AVHWAccel.free_frame_priv callback") CC: Lynne <dev@lynne.ee> CC: Christoph Reiter <reiter.christoph@gmail.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2023-11-22 05:01:16 +01:00
Zhao Zhili	aa3b857101	avcodec/h264_mp4toannexb_bsf: process new extradata For fate-h264_mp4toannexb_ticket5927 and fate-h264_mp4toannexb_ticket5927_2, they work by accident previously. The sample file has two 'avc1' entries, and video samples use the second one. It means packets should be decoded with new extradata in side data. Before this patch, only extradata was kept in the output, new extradata has been dropped. The output can be decoded because the two extradata are almost the same, except level indication. This patch fixed the issue, and add another fate test. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 19:42:14 +08:00
Zhao Zhili	d3aa0cd16f	avcodec/h264_mp4toannexb_bsf: fix missing PS before IDR frames If there is a single group of SPS/PPS before an IDR frame, but no SPS/PPS after that, we will miss the chance to reset idr_sps_seen/idr_pps_seen. No SPS/PPS are inserted afterwards. This patch saves in-band SPS/PPS and insert them before IDR frames when necessary. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-11-22 19:42:14 +08:00

1 2 3 4 5 ...

49116 Commits