mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00

49116 Commits

Author SHA1 Message Date
James Almer
0cc0d8c0b5 avcodec/get_bits: add get_leb()
Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-18 15:19:36 -03:00
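get_leb() reads an unsigned LEB128 value, the variable-length integer coding used by AV1 and IAMF. Below is a minimal standalone sketch over a plain byte buffer. It is not FFmpeg's GetBitContext API: the name `read_leb128` is hypothetical, and the 8-byte cap is an assumption that mirrors the AV1 `leb128()` syntax element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode an unsigned LEB128 value from buf.
 * Each byte carries 7 payload bits, least significant group first;
 * bit 7 set means another byte follows. At most 8 bytes are read.
 * Returns the number of bytes consumed, or 0 if the input ends (or the
 * 8-byte limit is hit) before a byte with bit 7 clear is found. */
static size_t read_leb128(const uint8_t *buf, size_t len, uint64_t *value)
{
    uint64_t v = 0;

    for (size_t i = 0; i < len && i < 8; i++) {
        v |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if (!(buf[i] & 0x80)) {
            *value = v;
            return i + 1;
        }
    }
    return 0;
}
```

For example, the bytes `E5 8E 26` decode to 624485 (101 + 14·2^7 + 38·2^14).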
James Almer
12eac23637 avcodec/packet: add IAMF Parameters side data types
Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-18 15:19:30 -03:00
Rémi Denis-Courmont
419145c11b lavc/vc1dsp: fix R-V V vector lengths
The 8x4 and 4x4 cases use a needlessly large vector length multiplier (LMUL),
unless and until we care about embedded 64-bit-vector hardware. This is merely suboptimal.

The 8x4 case also uses an incorrect vector length, which leads to incorrect
behaviour on future/hypothetical hardware with 256-bit or larger vectors.

Pointed-out-by: Martin Storsjö <martin@martin.st>
2023-12-17 09:27:52 +02:00
Martin Storsjö
b51d9eb58e riscv: vc1dsp: Don't check vlenb before checking the CPU flags
We can't call ff_get_rv_vlenb() if we don't have RVV available
at all.

Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-16 22:30:26 +02:00
Rémi Denis-Courmont
918b3ed2d5 lavc/lpc: R-V V compute_autocorr
The loop iterates over the length of the vector, not the order. This is
to avoid reloading the same data for each lag value. However this means
the loop only works if the maximum order is no larger than VLENB.

The loop is roughly equivalent to:

    for (size_t j = 0; j < lag; j++)
        autoc[j] = 1.;

    while (len > lag) {
        for (ptrdiff_t j = 0; j < lag; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

    while (len > 0) {
        for (ptrdiff_t j = 0; j < len; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

Since register pressure is only at 50%, it should be possible to implement
the same loop for order up to 2xVLENB. But this is left for future work.

Performance varies widely, from ~1.25x to ~4x speed-ups, but it is always
noticeably better than the C code.
2023-12-16 11:18:01 +02:00
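The pseudocode in 918b3ed2d5 can be wrapped into a self-contained C function for experimentation. This is a sketch only: the name and signature are assumptions loosely modelled on a compute_autocorr-style callback, and it is neither the RVV code nor FFmpeg's C reference. As in the commit message, every autoc[] entry starts at 1.0.

```c
#include <stddef.h>

/* Single pass over the signal: each sample is loaded once and used to
 * update every lag, instead of re-reading the data for each lag. */
static void compute_autocorr_sketch(const double *data, ptrdiff_t len,
                                    int lag, double *autoc)
{
    for (int j = 0; j < lag; j++)
        autoc[j] = 1.0;

    /* Main phase: at least lag samples remain, so all lags are updated. */
    while (len > lag) {
        for (int j = 0; j < lag; j++)
            autoc[j] += data[j] * data[0];
        data++;
        len--;
    }

    /* Tail: only the lags that still fit in the remaining samples. */
    while (len > 0) {
        for (ptrdiff_t j = 0; j < len; j++)
            autoc[j] += data[j] * data[0];
        data++;
        len--;
    }
}
```

With data = {1, 2, 3} and lag = 2 this yields autoc = {15, 9}, i.e. 1 + (1 + 4 + 9) and 1 + (1·2 + 2·3).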
Nuo Mi
ce0c178a40 avcodec/cbs_h266: more restrictive check on pps_tile_idx_delta_val
Fixes: out of array access
Fixes: 62603/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-5837632490569728

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-14 23:53:10 +01:00
Pierre-Anthony Lemieux
a1384b4e86 avcodec/jpeg2000htdec: check if block decoding will exceed internal precision
Intended to replace https://patchwork.ffmpeg.org/project/ffmpeg/patch/20230802000135.26482-3-michael@niedermayer.cc/
with a more accurate block decoding magnitude bound.

Fixes: 62433/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216
Fixes: 58299/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-5828618092937216
Previous-version-reviewed-by: Tomas Härdin <git@haerdin.se>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-14 23:53:10 +01:00
James Almer
34d56e1766 x86/aacencdsp: clear the high bits for size in ff_abs_pow34_sse
Fixes checkasm failures on win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-12 15:24:08 -03:00
sunyuechi
98596f90f4 lavc/aacencdsp: R-V V abs_pow34
C908:
abs_pow34_c: 535.5
abs_pow34_rvv_f32: 337.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-11 18:42:07 +02:00
sunyuechi
e880a97e7c lavc/aacenc: add ff_aac_dsp_init
This is for clarity and use in testing, consistent with other parts of the code.

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-11 18:42:04 +02:00
Rémi Denis-Courmont
272d0c164d lavc/lpc: R-V V apply_welch_window
apply_welch_window_even_c:       617.5
apply_welch_window_even_rvv_f64: 235.0
apply_welch_window_odd_c:        709.0
apply_welch_window_odd_rvv_f64:  256.5
2023-12-11 18:17:43 +02:00
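apply_welch_window multiplies the input samples by a Welch (parabolic) window before LPC analysis. The sketch below uses the textbook definition. FFmpeg's C reference handles even and odd lengths separately, and its exact normalisation may differ. The function name and the choice to output 0 for a one-sample window are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook Welch window applied to integer samples:
 * w(n) = 1 - ((n - c) / c)^2 with c = (len - 1) / 2,
 * which is 1 at the centre and 0 at both end points. */
static void welch_window_sketch(const int32_t *data, size_t len, double *out)
{
    if (len < 2) {
        if (len == 1)
            out[0] = 0.0; /* degenerate window (assumed convention) */
        return;
    }
    const double c = (len - 1) / 2.0;
    for (size_t n = 0; n < len; n++) {
        double t = (n - c) / c; /* n is converted to double here */
        out[n] = data[n] * (1.0 - t * t);
    }
}
```

For five samples of value 4 this produces {0, 3, 4, 3, 0}.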
Rémi Denis-Courmont
b3825bbe45 riscv: test for assembler support
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
2023-12-08 17:21:09 +02:00
sunyuechi
0b9d009b4a lavc/vc1dsp: R-V V inv_trans
C908:
vc1dsp.vc1_inv_trans_4x4_dc_c:      125.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5
vc1dsp.vc1_inv_trans_4x8_dc_c:      230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5
vc1dsp.vc1_inv_trans_8x4_dc_c:      228.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5
vc1dsp.vc1_inv_trans_8x8_dc_c:      476.5
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-08 17:20:48 +02:00
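The *_dc variants handle blocks where only the DC coefficient is non-zero. In that case, the whole inverse transform collapses to adding one constant to every pixel, which is why the vector versions gain so much. Below is a scalar sketch of the 8x8 case. The two scaling steps are recalled from FFmpeg's C reference rather than taken from this log and should be verified. The names are hypothetical, and right-shifting a negative dc assumes an arithmetic shift.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* DC-only 8x8 inverse transform: scale block[0] (assumed constants),
 * then add the result to all 64 pixels with clipping to 0..255. */
static void vc1_inv_trans_8x8_dc_sketch(uint8_t *dest, ptrdiff_t stride,
                                        const int16_t *block)
{
    int dc = block[0];

    dc = (3 * dc + 1) >> 1;
    dc = (3 * dc + 16) >> 5;

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dest[x] = clip_uint8(dest[x] + dc);
        dest += stride;
    }
}
```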
Mikhail Nitenko
0f745b74ec lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions
Benchmarks                         A53      A55     A72     A76
avg_h264_qpel_8_mc01_10_c:        936.5    924.0   656.0   504.7
avg_h264_qpel_8_mc01_10_neon:     234.7    202.0   120.7    63.2
avg_h264_qpel_8_mc02_10_c:        921.0    920.0   669.2   493.7
avg_h264_qpel_8_mc02_10_neon:     202.0    173.2   102.7    58.5
avg_h264_qpel_8_mc03_10_c:        936.5    924.0   656.0   509.5
avg_h264_qpel_8_mc03_10_neon:     236.2    203.7   120.0    63.2
avg_h264_qpel_8_mc10_10_c:       1441.0   1437.7   806.7   478.5
avg_h264_qpel_8_mc10_10_neon:     325.7    324.0   153.7    94.2
avg_h264_qpel_8_mc11_10_c:       2160.7   2148.2  1366.7   906.7
avg_h264_qpel_8_mc11_10_neon:     492.0    464.0   242.5   134.5
avg_h264_qpel_8_mc13_10_c:       2157.0   2138.2  1357.0   908.2
avg_h264_qpel_8_mc13_10_neon:     494.0    467.2   242.0   140.0
avg_h264_qpel_8_mc20_10_c:       1433.5   1410.0   785.2   486.0
avg_h264_qpel_8_mc20_10_neon:     293.7    289.7   138.0    91.5
avg_h264_qpel_8_mc30_10_c:       1458.5   1461.7   813.7   483.2
avg_h264_qpel_8_mc30_10_neon:     341.7    339.2   154.0    95.2
avg_h264_qpel_8_mc31_10_c:       2194.7   2197.2  1358.7   928.0
avg_h264_qpel_8_mc31_10_neon:     520.0    495.0   245.5   142.5
avg_h264_qpel_8_mc33_10_c:       2188.0   2205.5  1356.7   910.7
avg_h264_qpel_8_mc33_10_neon:     521.0    494.5   245.7   145.7
avg_h264_qpel_16_mc01_10_c:      3717.2   3595.0  2610.0  2012.0
avg_h264_qpel_16_mc01_10_neon:    920.5    791.5   483.2   240.5
avg_h264_qpel_16_mc02_10_c:      3684.0   3633.0  2659.0  1919.7
avg_h264_qpel_16_mc02_10_neon:    790.7    678.2   409.2   217.0
avg_h264_qpel_16_mc03_10_c:      3726.5   3596.0  2606.7  2010.0
avg_h264_qpel_16_mc03_10_neon:    922.0    792.5   483.2   239.7
avg_h264_qpel_16_mc10_10_c:      5912.0   5803.2  3241.5  1916.7
avg_h264_qpel_16_mc10_10_neon:   1267.5   1277.2   616.5   365.0
avg_h264_qpel_16_mc11_10_c:      8599.2   8482.5  5338.0  3616.2
avg_h264_qpel_16_mc11_10_neon:   1913.0   1827.0   956.2   542.2
avg_h264_qpel_16_mc13_10_c:      8643.7   8488.5  5388.0  3628.5
avg_h264_qpel_16_mc13_10_neon:   1914.7   1828.7   969.2   530.5
avg_h264_qpel_16_mc20_10_c:      5719.5   5641.0  3147.0  1946.2
avg_h264_qpel_16_mc20_10_neon:   1139.5   1150.0   539.5   344.0
avg_h264_qpel_16_mc30_10_c:      5930.0   5872.5  3267.5  1918.0
avg_h264_qpel_16_mc30_10_neon:   1331.5   1341.2   616.5   369.5
avg_h264_qpel_16_mc31_10_c:      8758.7   8697.7  5353.0  3630.7
avg_h264_qpel_16_mc31_10_neon:   2018.7   1941.7   982.2   574.7
avg_h264_qpel_16_mc33_10_c:      8683.2   8675.2  5339.2  3634.7
avg_h264_qpel_16_mc33_10_neon:   2019.7   1940.2   994.5   566.0
put_h264_qpel_8_mc01_10_c:        854.2    843.0   599.2   478.0
put_h264_qpel_8_mc01_10_neon:     192.7    168.0   101.7    56.7
put_h264_qpel_8_mc02_10_c:        766.5    760.0   550.2   441.0
put_h264_qpel_8_mc02_10_neon:     160.0    139.2    88.7    53.0
put_h264_qpel_8_mc03_10_c:        854.2    843.0   599.2   479.0
put_h264_qpel_8_mc03_10_neon:     194.2    169.7   102.0    56.2
put_h264_qpel_8_mc10_10_c:       1352.7   1353.7   749.7   446.7
put_h264_qpel_8_mc10_10_neon:     289.7    294.2   135.5    88.5
put_h264_qpel_8_mc11_10_c:       2080.0   2066.2  1309.5   876.7
put_h264_qpel_8_mc11_10_neon:     450.0    429.7   229.7   131.2
put_h264_qpel_8_mc13_10_c:       2074.7   2060.2  1294.5   870.5
put_h264_qpel_8_mc13_10_neon:     452.5    434.5   226.5   130.0
put_h264_qpel_8_mc20_10_c:       1221.5   1216.0   684.5   399.7
put_h264_qpel_8_mc20_10_neon:     257.7    262.5   121.2    78.7
put_h264_qpel_8_mc30_10_c:       1379.0   1374.7   757.2   449.5
put_h264_qpel_8_mc30_10_neon:     305.7    310.2   135.5    86.5
put_h264_qpel_8_mc31_10_c:       2109.2   2119.7  1299.5   878.0
put_h264_qpel_8_mc31_10_neon:     478.0    458.5   226.0   137.2
put_h264_qpel_8_mc33_10_c:       2101.5   2115.2  1306.5   887.0
put_h264_qpel_8_mc33_10_neon:     479.0    458.7   229.7   141.7
put_h264_qpel_16_mc01_10_c:      3485.7   3396.7  2460.5  1914.5
put_h264_qpel_16_mc01_10_neon:    752.5    665.5   397.0   213.2
put_h264_qpel_16_mc02_10_c:      3103.5   3023.2  2154.7  1720.7
put_h264_qpel_16_mc02_10_neon:    622.7    551.2   347.7   196.2
put_h264_qpel_16_mc03_10_c:      3486.2   3394.0  2436.5  1917.7
put_h264_qpel_16_mc03_10_neon:    754.0    666.5   397.0   215.7
put_h264_qpel_16_mc10_10_c:      5533.0   5488.5  2989.0  1783.0
put_h264_qpel_16_mc10_10_neon:   1123.5   1165.2   535.2   334.7
put_h264_qpel_16_mc11_10_c:      8437.7   8281.2  5209.0  3510.7
put_h264_qpel_16_mc11_10_neon:   1745.0   1697.0   878.5   513.5
put_h264_qpel_16_mc13_10_c:      8567.7   8468.0  5221.5  3528.0
put_h264_qpel_16_mc13_10_neon:   1751.7   1698.2   889.2   507.0
put_h264_qpel_16_mc20_10_c:      4907.5   4885.0  2786.2  1607.5
put_h264_qpel_16_mc20_10_neon:    995.5   1034.5   475.5   307.0
put_h264_qpel_16_mc30_10_c:      5579.7   5537.7  3045.2  1789.5
put_h264_qpel_16_mc30_10_neon:   1187.5   1231.2   532.5   334.5
put_h264_qpel_16_mc31_10_c:      8677.2   8672.5  5204.2  3516.0
put_h264_qpel_16_mc31_10_neon:   1850.7   1813.2   893.0   545.2
put_h264_qpel_16_mc33_10_c:      8688.7   8671.2  5223.2  3512.0
put_h264_qpel_16_mc33_10_neon:   1851.7   1814.2   908.5   535.2

Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-07 23:20:14 +02:00
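The lowpass_8_10 helpers in 0f745b74ec are built around H.264's six-tap half-sample interpolation filter (1, -5, 20, 20, -5, 1), here applied to 10-bit samples. A scalar sketch of the horizontal case follows. The function name, signature and padding convention are assumptions. The NEON code also covers vertical and averaging (avg_) variants, which are not shown here.

```c
#include <stdint.h>

/* Horizontal half-sample filter for 10-bit H.264 luma.
 * src must have 2 valid samples to the left of src[0] and 3 to the
 * right of src[width - 1]. */
static void h264_lowpass_h_10_sketch(uint16_t *dst, const uint16_t *src,
                                     int width)
{
    for (int x = 0; x < width; x++) {
        int v = src[x - 2] - 5 * src[x - 1] + 20 * src[x] +
                20 * src[x + 1] - 5 * src[x + 2] + src[x + 3];
        /* Round, divide by 32 and clip to the 10-bit range 0..1023. */
        int r = v + 16 < 0 ? 0 : (v + 16) >> 5;
        dst[x] = (uint16_t)(r > 1023 ? 1023 : r);
    }
}
```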
sunyuechi
8bdb663062 lavc/ac3dsp: R-V V float_to_fixed24
C910:
    float_to_fixed24_c: 2207.2
    float_to_fixed24_rvv_f32: 696.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-06 16:04:22 +02:00
Paul B Mahol
7e453dad3c avcodec/qoadec: fix overreads and fix packet size check
2023-12-05 14:50:21 +01:00
Michael Niedermayer
22daf2148f avcodec/av1dec: Fix resolving zero divisor
Fixes: Out of array read
Fixes: global-buffer-overflow-AV1

Found-by: "Leonelli, Matteo" <matteo.leonelli@cispa.de>
Tested-by: "Wang, Fei W" <fei.w.wang@intel.com>
Reviewed-by: "Wang, Fei W" <fei.w.wang@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-05 12:38:16 +01:00
Leo Izen
c4be080e65 avcodec/jpegxl_parser: fix parsing sequences of extremely small files
This patch allows the JXL parser to parse sequences of extremely small
files (e.g. smaller than the parser buffer) concatenated together.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-05 05:54:34 -05:00
Leo Izen
019b3ea65a avcodec/jpegxl_parse{,r}: use correct ISOBMFF extended size location
According to ISO/IEC 14496-12, size == 1 means a 64-bit extended-size
field occurs *after* the 32-bit box type, not before. This fix should
allow correct parsing of JXL files with extended-size boxes.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-05 05:53:32 -05:00
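The box layout described in 019b3ea65a is: a 32-bit big-endian size, then the 32-bit type and, only when size == 1, a 64-bit size after the type. It can be sketched as a small header parser. The struct and function names are hypothetical and do not come from the JXL parser.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct BoxHeader {
    uint64_t size;       /* total box size in bytes (0 = to end of file) */
    uint32_t type;       /* four-character code, big-endian */
    size_t   header_len; /* 8, or 16 with a 64-bit extended size */
} BoxHeader;

static uint32_t rb32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | p[3];
}

/* Returns 0 on success, -1 if buf is too short for the header. */
static int parse_box_header(const uint8_t *buf, size_t len, BoxHeader *h)
{
    if (len < 8)
        return -1;
    h->size = rb32(buf);
    h->type = rb32(buf + 4);   /* the type always follows the 32-bit size */
    h->header_len = 8;
    if (h->size == 1) {        /* extended size comes after the type */
        if (len < 16)
            return -1;
        h->size = (uint64_t)rb32(buf + 8) << 32 | rb32(buf + 12);
        h->header_len = 16;
    }
    return 0;
}
```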
Haihao Xiang
fc73b372cd lavc/qsvdec: reduce info message when more data is required
Demote the message to AV_LOG_VERBOSE.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:10:57 +08:00
Haihao Xiang
e233f3e75f lavc/qsvdec: return 0 if more data is required
QSV decoders are of type FF_CODEC_CB_TYPE_DECODE, whose callbacks must not
return AVERROR(EAGAIN). Commit 42b20c9 added an assertion to check the
returned value.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:10:57 +08:00
Lynne
8c117b75af lavc/Makefile: build vulkan decode code if vulkan_av1 has been enabled
Forgotten.

Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Tested-by: Neal Gompa <ngompa13@gmail.com>
2023-12-04 07:57:27 +01:00
Paul B Mahol
0a13178de8 avcodec/qoadec: add support for midstream sample rate/layout changes
2023-12-02 16:51:00 +01:00
Anton Khirnov
5230257ea1 lavc/dvdsubenc: only check canvas size when it is actually set
Fixes #10650
2023-12-02 11:22:46 +01:00
Logan Lyu
fa0470347e lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_hv
put_hevc_qpel_bi_hv4_8_c: 433.7
put_hevc_qpel_bi_hv4_8_i8mm: 117.9
put_hevc_qpel_bi_hv6_8_c: 803.9
put_hevc_qpel_bi_hv6_8_i8mm: 252.7
put_hevc_qpel_bi_hv8_8_c: 1296.4
put_hevc_qpel_bi_hv8_8_i8mm: 316.2
put_hevc_qpel_bi_hv12_8_c: 2867.4
put_hevc_qpel_bi_hv12_8_i8mm: 669.2
put_hevc_qpel_bi_hv16_8_c: 4709.4
put_hevc_qpel_bi_hv16_8_i8mm: 929.9
put_hevc_qpel_bi_hv24_8_c: 9639.7
put_hevc_qpel_bi_hv24_8_i8mm: 2072.4
put_hevc_qpel_bi_hv32_8_c: 16663.7
put_hevc_qpel_bi_hv32_8_i8mm: 3391.4
put_hevc_qpel_bi_hv48_8_c: 36972.9
put_hevc_qpel_bi_hv48_8_i8mm: 7505.7
put_hevc_qpel_bi_hv64_8_c: 64106.4
put_hevc_qpel_bi_hv64_8_i8mm: 13145.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
595f97028b lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_v
put_hevc_qpel_bi_v4_8_c: 166.1
put_hevc_qpel_bi_v4_8_neon: 61.9
put_hevc_qpel_bi_v6_8_c: 309.4
put_hevc_qpel_bi_v6_8_neon: 75.6
put_hevc_qpel_bi_v8_8_c: 531.1
put_hevc_qpel_bi_v8_8_neon: 78.1
put_hevc_qpel_bi_v12_8_c: 1139.9
put_hevc_qpel_bi_v12_8_neon: 238.1
put_hevc_qpel_bi_v16_8_c: 2063.6
put_hevc_qpel_bi_v16_8_neon: 308.9
put_hevc_qpel_bi_v24_8_c: 4317.1
put_hevc_qpel_bi_v24_8_neon: 629.9
put_hevc_qpel_bi_v32_8_c: 8241.9
put_hevc_qpel_bi_v32_8_neon: 1140.1
put_hevc_qpel_bi_v48_8_c: 18422.9
put_hevc_qpel_bi_v48_8_neon: 2533.9
put_hevc_qpel_bi_v64_8_c: 37508.6
put_hevc_qpel_bi_v64_8_neon: 4520.1

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
00290a64f7 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_hv
put_hevc_epel_bi_hv4_8_c: 242.9
put_hevc_epel_bi_hv4_8_i8mm: 68.6
put_hevc_epel_bi_hv6_8_c: 402.4
put_hevc_epel_bi_hv6_8_i8mm: 135.9
put_hevc_epel_bi_hv8_8_c: 636.4
put_hevc_epel_bi_hv8_8_i8mm: 145.6
put_hevc_epel_bi_hv12_8_c: 1363.1
put_hevc_epel_bi_hv12_8_i8mm: 324.1
put_hevc_epel_bi_hv16_8_c: 2222.1
put_hevc_epel_bi_hv16_8_i8mm: 509.1
put_hevc_epel_bi_hv24_8_c: 4793.4
put_hevc_epel_bi_hv24_8_i8mm: 1091.9
put_hevc_epel_bi_hv32_8_c: 8393.9
put_hevc_epel_bi_hv32_8_i8mm: 1720.6
put_hevc_epel_bi_hv48_8_c: 19526.6
put_hevc_epel_bi_hv48_8_i8mm: 4285.9
put_hevc_epel_bi_hv64_8_c: 33915.4
put_hevc_epel_bi_hv64_8_i8mm: 6783.6

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
0448f27f41 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_v
put_hevc_epel_bi_v4_8_c: 138.4
put_hevc_epel_bi_v4_8_neon: 33.7
put_hevc_epel_bi_v6_8_c: 302.9
put_hevc_epel_bi_v6_8_neon: 46.7
put_hevc_epel_bi_v8_8_c: 408.7
put_hevc_epel_bi_v8_8_neon: 48.7
put_hevc_epel_bi_v12_8_c: 779.4
put_hevc_epel_bi_v12_8_neon: 139.7
put_hevc_epel_bi_v16_8_c: 1344.9
put_hevc_epel_bi_v16_8_neon: 160.2
put_hevc_epel_bi_v24_8_c: 2981.7
put_hevc_epel_bi_v24_8_neon: 344.9
put_hevc_epel_bi_v32_8_c: 5280.9
put_hevc_epel_bi_v32_8_neon: 618.4
put_hevc_epel_bi_v48_8_c: 12494.9
put_hevc_epel_bi_v48_8_neon: 1364.4
put_hevc_epel_bi_v64_8_c: 22127.7
put_hevc_epel_bi_v64_8_neon: 2473.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
216275bd80 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_h
put_hevc_epel_bi_h4_8_c: 96.0
put_hevc_epel_bi_h4_8_neon: 36.3
put_hevc_epel_bi_h6_8_c: 288.3
put_hevc_epel_bi_h6_8_neon: 59.3
put_hevc_epel_bi_h8_8_c: 358.5
put_hevc_epel_bi_h8_8_neon: 61.5
put_hevc_epel_bi_h12_8_c: 759.8
put_hevc_epel_bi_h12_8_neon: 159.5
put_hevc_epel_bi_h16_8_c: 1307.0
put_hevc_epel_bi_h16_8_neon: 182.0
put_hevc_epel_bi_h24_8_c: 2778.3
put_hevc_epel_bi_h24_8_neon: 430.5
put_hevc_epel_bi_h32_8_c: 4952.3
put_hevc_epel_bi_h32_8_neon: 679.5
put_hevc_epel_bi_h48_8_c: 11803.3
put_hevc_epel_bi_h48_8_neon: 1443.5
put_hevc_epel_bi_h64_8_c: 20654.8
put_hevc_epel_bi_h64_8_neon: 2737.0
put_hevc_qpel_bi_h4_8_c: 140.0
put_hevc_qpel_bi_h4_8_neon: 111.5
put_hevc_qpel_bi_h6_8_c: 318.0
put_hevc_qpel_bi_h6_8_neon: 85.8
put_hevc_qpel_bi_h8_8_c: 536.5
put_hevc_qpel_bi_h8_8_neon: 95.3
put_hevc_qpel_bi_h12_8_c: 1188.5
put_hevc_qpel_bi_h12_8_neon: 291.3
put_hevc_qpel_bi_h16_8_c: 2064.3
put_hevc_qpel_bi_h16_8_neon: 365.3
put_hevc_qpel_bi_h24_8_c: 4757.5
put_hevc_qpel_bi_h24_8_neon: 1010.0
put_hevc_qpel_bi_h32_8_c: 8351.8
put_hevc_qpel_bi_h32_8_neon: 2917.8
put_hevc_qpel_bi_h48_8_c: 19299.8
put_hevc_qpel_bi_h48_8_neon: 2976.8
put_hevc_qpel_bi_h64_8_c: 34182.5
put_hevc_qpel_bi_h64_8_neon: 5236.3

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
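The *_bi_h functions filter one reference horizontally and combine the result with a 14-bit intermediate prediction from the other reference. A scalar sketch for the 8-bit luma half-sample phase follows, using the HEVC 8-tap filter (-1, 4, -11, 40, 40, -11, 4, -1). Other phases use different taps. The function name is hypothetical, and applying no pre-shift to the 8-bit filter output is an assumption based on the 8-bit case.

```c
#include <stdint.h>

/* HEVC luma half-sample filter taps (they sum to 64). */
static const int8_t qpel_half[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

/* 8-bit bi-prediction at the horizontal half-sample position.
 * src needs 3 valid samples left of src[0] and 4 right of src[width-1];
 * src2 holds the other reference's 14-bit intermediate prediction. */
static void put_qpel_bi_h_half_sketch(uint8_t *dst, const uint8_t *src,
                                      const int16_t *src2, int width)
{
    const int shift  = 14 + 1 - 8;      /* 7 for 8-bit output */
    const int offset = 1 << (shift - 1);

    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += qpel_half[k] * src[x + k - 3];
        int t = sum + src2[x] + offset;
        int r = t < 0 ? 0 : t >> shift;
        dst[x] = (uint8_t)(r > 255 ? 255 : r);
    }
}
```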
Logan Lyu
40cf4a5ca3 lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels
put_hevc_pel_bi_pixels4_8_c: 54.7
put_hevc_pel_bi_pixels4_8_neon: 43.0
put_hevc_pel_bi_pixels6_8_c: 94.7
put_hevc_pel_bi_pixels6_8_neon: 37.0
put_hevc_pel_bi_pixels8_8_c: 171.0
put_hevc_pel_bi_pixels8_8_neon: 24.0
put_hevc_pel_bi_pixels12_8_c: 354.0
put_hevc_pel_bi_pixels12_8_neon: 68.7
put_hevc_pel_bi_pixels16_8_c: 588.2
put_hevc_pel_bi_pixels16_8_neon: 77.5
put_hevc_pel_bi_pixels24_8_c: 1670.7
put_hevc_pel_bi_pixels24_8_neon: 173.0
put_hevc_pel_bi_pixels32_8_c: 2267.7
put_hevc_pel_bi_pixels32_8_neon: 281.2
put_hevc_pel_bi_pixels48_8_c: 5787.5
put_hevc_pel_bi_pixels48_8_neon: 673.5
put_hevc_pel_bi_pixels64_8_c: 9897.0
put_hevc_pel_bi_pixels64_8_neon: 1159.5

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
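put_hevc_pel_bi_pixels is the full-sample case of bi-prediction: no filtering, just scaling to the 14-bit intermediate precision, adding the other prediction, then rounding and shifting back. This is a scalar 8-bit sketch with a hypothetical name. The shift and offset follow the same assumed 14-bit intermediate convention as above.

```c
#include <stdint.h>

/* 8-bit bi-prediction at full-sample positions: (src << 6) brings the
 * current reference to 14-bit precision, src2 is the other reference's
 * intermediate prediction, and the sum is rounded back to 8 bits. */
static void put_pel_bi_pixels_sketch(uint8_t *dst, const uint8_t *src,
                                     const int16_t *src2, int width)
{
    const int shift  = 14 + 1 - 8;      /* 7 */
    const int offset = 1 << (shift - 1);

    for (int x = 0; x < width; x++) {
        int t = (src[x] << (14 - 8)) + src2[x] + offset;
        int r = t < 0 ? 0 : t >> shift;
        dst[x] = (uint8_t)(r > 255 ? 255 : r);
    }
}
```

Averaging a sample of 200 with an intermediate value of 0 gives 100, as expected for a two-reference average.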
James Almer
6d19611251 avcodec/ac3dsp: add missing stddef.h include
Should fix make checkheaders

Signed-off-by: James Almer <jamrial@gmail.com>
2023-12-01 12:42:22 -03:00
xufuji456
cc86343b96 lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d
When building for iOS on arm64, the compiler warns: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b"

Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-11-28 15:54:49 +02:00
Zhao Zhili
d526a34c20 avcodec/videotoolboxenc: refactor dump encoder name
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-27 23:49:01 +08:00
Zhao Zhili
cb049d377f avcodec/videotoolboxenc: Fix build failure due to PropertyKey_EncoderID
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-27 23:48:55 +08:00
Paul B Mahol
3609d2b783 avcodec: add QOA decoder
2023-11-26 17:49:09 +01:00
Geoffrey McRae
93b5d9030b libavcodec/mlpdec: add missing correction to ch_layout when downmixing
This fixes corrupted audio for applications relying on ch_layout when
codec downmixing is active.

Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-26 10:18:33 -03:00
Geoffrey McRae
a8677bcc8f libavcodec/dcadec: adjust the ch_layout when downmix is active
Applications that use this decoder with the `downmix` option segfault
unless they override `ch_layout` after `avcodec_open2`, as can be seen
in projects like MythTV[1].

This patch fixes this by overriding the ch_layout as done in other
decoders such as AC3.

1: af6f362a14/mythtv/libs/libmythtv/decoders/avformatdecoder.cpp (L4607)

Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-26 10:18:33 -03:00
James Almer
72390dea00 mips/ac3dsp_mips: add missing stddef.h header include
Fixes compilation failures after 567c67c6c8.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-25 21:51:04 -03:00
James Almer
e40ea9f34b x86/ac3dsp: add ff_float_to_fixed24_avx()
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-25 21:50:56 -03:00
James Almer
d8b1a34433 x86/ac3dsp: reduce instruction count inside the float_to_fixed24 loop
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-25 21:50:56 -03:00
Rémi Denis-Courmont
0fa421c8f1 lavc/llvidencdsp: add R-V V diff_bytes
diff_bytes_c:      163.0
diff_bytes_rvv_i32: 52.7
2023-11-23 18:57:18 +02:00
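diff_bytes computes a byte-wise difference with modulo-256 wrap-around, presumably to form residuals in the lossless video encoders that use llvidencdsp. A scalar sketch follows. The name is hypothetical, and the `intptr_t` width parameter is an assumption.

```c
#include <stdint.h>

/* dst[i] = src1[i] - src2[i], wrapping modulo 256. */
static void diff_bytes_sketch(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, intptr_t w)
{
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}
```

For example, 5 - 10 wraps to 251.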
Rémi Denis-Courmont
0183c2c830 lavc/aacpsdsp: use LMUL=2 and amortise strides
The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments with unit stride, then narrow to 16 segments with
   right shifts, we can only get one half-size vector per segment, or just 2
   elements per vector (EMUL=1/2), at least with 128-bit vectors.
   Unsurprisingly, this ends up about as fast as the C code.
2) The current approach is to load with strides. We keep that approach,
   but improve it using three 4-segmented loads instead of 12 single-segment
   loads. This divides the number of distinct loaded addresses by 4.
3) A potential third approach would be to avoid segmentation altogether
   and splat the scalar coefficient into vectors. Then we can use a
   unit-stride and maximum EMUL. But the downside then is that we have to
   multiply the 3 (of 16) unused segments with zero as part of the
   multiply-accumulate operations.

In addition, we also reuse vectors mid-loop so as to increase the EMUL
from 1 to 2, which also improves performance a little bit.

Overall the gains are quite small on the device under test, as it does
not handle segmented loads very well. But at least the code is tidier,
and should enjoy bigger speed-ups on better hardware implementations.

Benchmarks:
ps_hybrid_analysis_c:       1819.2
ps_hybrid_analysis_rvv_f32: 1037.0 (before)
ps_hybrid_analysis_rvv_f32:  990.0 (after)
2023-11-23 18:57:18 +02:00
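The "16 segments, of which 13 are loaded" layout in 0183c2c830 is easier to see in scalar form. The sketch below assumes that each filter row holds 8 complex (re, im) coefficient pairs, i.e. 16 floats. It also assumes that only pairs 0 to 5 and the real part of pair 6 are used (12 + 1 = 13), with in[j] and in[12 - j] folded symmetrically. This is our reading of the data layout, not a copy of FFmpeg's C reference, and all names are hypothetical.

```c
#include <stddef.h>

/* Scalar sketch of a PS hybrid analysis filter.
 * in:     13 complex input samples (re, im)
 * filter: n rows of 8 complex coefficients; only 13 of the 16 floats
 *         per row are read (pairs 0-5 fully, pair 6 real part only)
 * out:    n complex outputs, written every `stride` entries */
static void ps_hybrid_analysis_sketch(float (*out)[2], const float (*in)[2],
                                      const float (*filter)[8][2],
                                      ptrdiff_t stride, int n)
{
    for (int i = 0; i < n; i++) {
        float sum_re = filter[i][6][0] * in[6][0];
        float sum_im = filter[i][6][0] * in[6][1];

        for (int j = 0; j < 6; j++) {
            float in0_re = in[j][0], in0_im = in[j][1];
            float in1_re = in[12 - j][0], in1_im = in[12 - j][1];

            sum_re += filter[i][j][0] * (in0_re + in1_re) -
                      filter[i][j][1] * (in0_im - in1_im);
            sum_im += filter[i][j][0] * (in0_im + in1_im) +
                      filter[i][j][1] * (in0_re - in1_re);
        }
        out[i * stride][0] = sum_re;
        out[i * stride][1] = sum_im;
    }
}
```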
Rémi Denis-Courmont
b88d4058f9 lavc/g722dsp: optimise R-V V apply_qmf
This stores the constant coefficients deinterleaved, so that they can be
loaded directly with NF=0. Unfortunately, we cannot optimise loading the
input, due to insufficient memory alignment (not 32-bit).

Before:
g722_apply_qmf_c:       82.5
g722_apply_qmf_rvv_i32: 78.2

After:
g722_apply_qmf_c:       82.5
g722_apply_qmf_rvv_i32: 65.2
2023-11-23 18:57:18 +02:00
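Why deinterleaving the constants helps in b88d4058f9 is clearer in scalar form. The 24 past samples alternate between two phases: even-indexed samples are weighted by the 12 coefficients in order, odd-indexed ones by the same coefficients in reverse. This is a sketch under that assumption. The coefficients are passed in as a parameter rather than quoting G.722's table, and the names are hypothetical.

```c
#include <stdint.h>

/* QMF step over 24 interleaved past samples. coeffs[] stands in for the
 * codec's 12 QMF coefficients (placeholder parameter). */
static void apply_qmf_sketch(const int16_t *prev_samples,
                             const int16_t coeffs[12],
                             int *xout1, int *xout2)
{
    *xout1 = 0;
    *xout2 = 0;
    for (int i = 0; i < 12; i++) {
        *xout2 += prev_samples[2 * i] * coeffs[i];
        *xout1 += prev_samples[2 * i + 1] * coeffs[11 - i];
    }
}
```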
James Almer
567c67c6c8 avcodec/ac3dsp: make len a size_t in float_to_fixed24
Should simplify asm implementations, and prevent UB on at least win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 18:33:00 -03:00
James Almer
2d9fd814d0 x86/: clear the high bits for order in scalarproduct_and_madd functions
Should fix checkasm failures on win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 14:18:42 -03:00
Zhao Zhili
e8a49b1424 avcodec/mmaldec: Fix build error
Fix #10670.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Zhao Zhili
f27fce0c0c avcodec/mediacodecdec: fix return EAGAIN after EOF
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Dmitry Rogozhkin
e9c93009fc avcodec/decode: validate hw_frames_ctx when AVHWAccel.free_frame_priv is used
Validate that a hw_frames_ctx is available before using it for
the AVHWAccel.free_frame_priv callback, and don't require it to
be present when the callback is not in use by the HWAccel.

v2: check for free_frame_priv (Hendrik)
v3: return EINVAL (Christoph Reiter)
v4: better commit message (Hendrik)
v5: fix typo with missed frames_ctx (Lynne)

See[1]: https://github.com/msys2/MINGW-packages/pull/19050
Fixes: be07145109 ("avcodec: add AVHWAccel.free_frame_priv callback")
CC: Lynne <dev@lynne.ee>
CC: Christoph Reiter <reiter.christoph@gmail.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2023-11-22 05:01:16 +01:00
Zhao Zhili
aa3b857101 avcodec/h264_mp4toannexb_bsf: process new extradata
The fate-h264_mp4toannexb_ticket5927 and
fate-h264_mp4toannexb_ticket5927_2 tests previously worked by accident.
The sample file has two 'avc1' entries, and the video samples use the
second one, so packets should be decoded with the new extradata carried
in side data. Before this patch, only the original extradata was kept in
the output and the new extradata was dropped. The output could still be
decoded because the two extradata are almost identical, differing only in
the level indication. This patch fixes the issue and adds another
FATE test.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Zhao Zhili
d3aa0cd16f avcodec/h264_mp4toannexb_bsf: fix missing PS before IDR frames
If there is a single group of SPS/PPS before an IDR frame, but no
SPS/PPS after that, we miss the chance to reset
idr_sps_seen/idr_pps_seen, and no SPS/PPS are inserted afterwards.

This patch saves in-band SPS/PPS and inserts them before IDR frames
when necessary.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
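The core decision in d3aa0cd16f is whether stored parameter sets must be inserted before an IDR frame. A simplified per-packet version can be sketched by scanning H.264 NAL unit types (the low 5 bits of the first NAL byte: 5 = IDR slice, 7 = SPS, 8 = PPS). This sketch does not model the bitstream filter's state across packets, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { NAL_IDR_SLICE = 5, NAL_SPS = 7, NAL_PPS = 8 };

/* Returns 1 if the first IDR slice in the packet is not preceded by both
 * an SPS and a PPS (so stored ones would need inserting), 0 otherwise.
 * nal_header holds the first byte of each NAL unit in packet order. */
static int needs_ps_before_idr(const uint8_t *nal_header, size_t count)
{
    int sps_seen = 0, pps_seen = 0;

    for (size_t i = 0; i < count; i++) {
        int type = nal_header[i] & 0x1f; /* nal_unit_type */
        if (type == NAL_SPS)
            sps_seen = 1;
        else if (type == NAL_PPS)
            pps_seen = 1;
        else if (type == NAL_IDR_SLICE)
            return !(sps_seen && pps_seen);
    }
    return 0; /* no IDR slice in this packet */
}
```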