1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-21 10:55:51 +02:00
Commit Graph

112931 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
b3825bbe45 riscv: test for assembler support
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
2023-12-08 17:21:09 +02:00
sunyuechi
0b9d009b4a lavc/vc1dsp: R-V V inv_trans
C908:
vc1dsp.vc1_inv_trans_4x4_dc_c:      125.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 53.5
vc1dsp.vc1_inv_trans_4x8_dc_c:      230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.5
vc1dsp.vc1_inv_trans_8x4_dc_c:      228.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 64.5
vc1dsp.vc1_inv_trans_8x8_dc_c:      476.5
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 80.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-08 17:20:48 +02:00
Mikhail Nitenko
0f745b74ec lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions
Benchmarks                         A53      A55     A72     A76
avg_h264_qpel_8_mc01_10_c:        936.5    924.0   656.0   504.7
avg_h264_qpel_8_mc01_10_neon:     234.7    202.0   120.7    63.2
avg_h264_qpel_8_mc02_10_c:        921.0    920.0   669.2   493.7
avg_h264_qpel_8_mc02_10_neon:     202.0    173.2   102.7    58.5
avg_h264_qpel_8_mc03_10_c:        936.5    924.0   656.0   509.5
avg_h264_qpel_8_mc03_10_neon:     236.2    203.7   120.0    63.2
avg_h264_qpel_8_mc10_10_c:       1441.0   1437.7   806.7   478.5
avg_h264_qpel_8_mc10_10_neon:     325.7    324.0   153.7    94.2
avg_h264_qpel_8_mc11_10_c:       2160.7   2148.2  1366.7   906.7
avg_h264_qpel_8_mc11_10_neon:     492.0    464.0   242.5   134.5
avg_h264_qpel_8_mc13_10_c:       2157.0   2138.2  1357.0   908.2
avg_h264_qpel_8_mc13_10_neon:     494.0    467.2   242.0   140.0
avg_h264_qpel_8_mc20_10_c:       1433.5   1410.0   785.2   486.0
avg_h264_qpel_8_mc20_10_neon:     293.7    289.7   138.0    91.5
avg_h264_qpel_8_mc30_10_c:       1458.5   1461.7   813.7   483.2
avg_h264_qpel_8_mc30_10_neon:     341.7    339.2   154.0    95.2
avg_h264_qpel_8_mc31_10_c:       2194.7   2197.2  1358.7   928.0
avg_h264_qpel_8_mc31_10_neon:     520.0    495.0   245.5   142.5
avg_h264_qpel_8_mc33_10_c:       2188.0   2205.5  1356.7   910.7
avg_h264_qpel_8_mc33_10_neon:     521.0    494.5   245.7   145.7
avg_h264_qpel_16_mc01_10_c:      3717.2   3595.0  2610.0  2012.0
avg_h264_qpel_16_mc01_10_neon:    920.5    791.5   483.2   240.5
avg_h264_qpel_16_mc02_10_c:      3684.0   3633.0  2659.0  1919.7
avg_h264_qpel_16_mc02_10_neon:    790.7    678.2   409.2   217.0
avg_h264_qpel_16_mc03_10_c:      3726.5   3596.0  2606.7  2010.0
avg_h264_qpel_16_mc03_10_neon:    922.0    792.5   483.2   239.7
avg_h264_qpel_16_mc10_10_c:      5912.0   5803.2  3241.5  1916.7
avg_h264_qpel_16_mc10_10_neon:   1267.5   1277.2   616.5   365.0
avg_h264_qpel_16_mc11_10_c:      8599.2   8482.5  5338.0  3616.2
avg_h264_qpel_16_mc11_10_neon:   1913.0   1827.0   956.2   542.2
avg_h264_qpel_16_mc13_10_c:      8643.7   8488.5  5388.0  3628.5
avg_h264_qpel_16_mc13_10_neon:   1914.7   1828.7   969.2   530.5
avg_h264_qpel_16_mc20_10_c:      5719.5   5641.0  3147.0  1946.2
avg_h264_qpel_16_mc20_10_neon:   1139.5   1150.0   539.5   344.0
avg_h264_qpel_16_mc30_10_c:      5930.0   5872.5  3267.5  1918.0
avg_h264_qpel_16_mc30_10_neon:   1331.5   1341.2   616.5   369.5
avg_h264_qpel_16_mc31_10_c:      8758.7   8697.7  5353.0  3630.7
avg_h264_qpel_16_mc31_10_neon:   2018.7   1941.7   982.2   574.7
avg_h264_qpel_16_mc33_10_c:      8683.2   8675.2  5339.2  3634.7
avg_h264_qpel_16_mc33_10_neon:   2019.7   1940.2   994.5   566.0
put_h264_qpel_8_mc01_10_c:        854.2    843.0   599.2   478.0
put_h264_qpel_8_mc01_10_neon:     192.7    168.0   101.7    56.7
put_h264_qpel_8_mc02_10_c:        766.5    760.0   550.2   441.0
put_h264_qpel_8_mc02_10_neon:     160.0    139.2    88.7    53.0
put_h264_qpel_8_mc03_10_c:        854.2    843.0   599.2   479.0
put_h264_qpel_8_mc03_10_neon:     194.2    169.7   102.0    56.2
put_h264_qpel_8_mc10_10_c:       1352.7   1353.7   749.7   446.7
put_h264_qpel_8_mc10_10_neon:     289.7    294.2   135.5    88.5
put_h264_qpel_8_mc11_10_c:       2080.0   2066.2  1309.5   876.7
put_h264_qpel_8_mc11_10_neon:     450.0    429.7   229.7   131.2
put_h264_qpel_8_mc13_10_c:       2074.7   2060.2  1294.5   870.5
put_h264_qpel_8_mc13_10_neon:     452.5    434.5   226.5   130.0
put_h264_qpel_8_mc20_10_c:       1221.5   1216.0   684.5   399.7
put_h264_qpel_8_mc20_10_neon:     257.7    262.5   121.2    78.7
put_h264_qpel_8_mc30_10_c:       1379.0   1374.7   757.2   449.5
put_h264_qpel_8_mc30_10_neon:     305.7    310.2   135.5    86.5
put_h264_qpel_8_mc31_10_c:       2109.2   2119.7  1299.5   878.0
put_h264_qpel_8_mc31_10_neon:     478.0    458.5   226.0   137.2
put_h264_qpel_8_mc33_10_c:       2101.5   2115.2  1306.5   887.0
put_h264_qpel_8_mc33_10_neon:     479.0    458.7   229.7   141.7
put_h264_qpel_16_mc01_10_c:      3485.7   3396.7  2460.5  1914.5
put_h264_qpel_16_mc01_10_neon:    752.5    665.5   397.0   213.2
put_h264_qpel_16_mc02_10_c:      3103.5   3023.2  2154.7  1720.7
put_h264_qpel_16_mc02_10_neon:    622.7    551.2   347.7   196.2
put_h264_qpel_16_mc03_10_c:      3486.2   3394.0  2436.5  1917.7
put_h264_qpel_16_mc03_10_neon:    754.0    666.5   397.0   215.7
put_h264_qpel_16_mc10_10_c:      5533.0   5488.5  2989.0  1783.0
put_h264_qpel_16_mc10_10_neon:   1123.5   1165.2   535.2   334.7
put_h264_qpel_16_mc11_10_c:      8437.7   8281.2  5209.0  3510.7
put_h264_qpel_16_mc11_10_neon:   1745.0   1697.0   878.5   513.5
put_h264_qpel_16_mc13_10_c:      8567.7   8468.0  5221.5  3528.0
put_h264_qpel_16_mc13_10_neon:   1751.7   1698.2   889.2   507.0
put_h264_qpel_16_mc20_10_c:      4907.5   4885.0  2786.2  1607.5
put_h264_qpel_16_mc20_10_neon:    995.5   1034.5   475.5   307.0
put_h264_qpel_16_mc30_10_c:      5579.7   5537.7  3045.2  1789.5
put_h264_qpel_16_mc30_10_neon:   1187.5   1231.2   532.5   334.5
put_h264_qpel_16_mc31_10_c:      8677.2   8672.5  5204.2  3516.0
put_h264_qpel_16_mc31_10_neon:   1850.7   1813.2   893.0   545.2
put_h264_qpel_16_mc33_10_c:      8688.7   8671.2  5223.2  3512.0
put_h264_qpel_16_mc33_10_neon:   1851.7   1814.2   908.5   535.2

Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-07 23:20:14 +02:00
Haihao Xiang
f89cff96d0 lavu/hwcontext_qsv: Make sure hardware vendor is Intel for qsv on d3d11va
When multiple hardwares are available, the default one might not be
Intel Hardware. We can use option vendor_id to choose the required
vendor.

Tested-by: Artem Galin <artem.galin@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-07 10:32:16 +08:00
Haihao Xiang
e5f8b5313e doc/ffmpeg: Update the description about d3d11va
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-07 10:32:16 +08:00
Artem Galin
a556be69a7 lavu/hwcontext_d3d11va: Add option vendor_id
User may choose the hardware via option vendor_id when multiple
hardwares are available.

Signed-off-by: Artem Galin <artem.galin@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-07 10:32:16 +08:00
Cosmin Stejerean
737ede405b avfilter/bwdif: account for chroma sub-sampling in min size calculation
The current logic for detecting frames that are too small for the
algorithm does not account for chroma sub-sampling, and so a sample
where the luma plane is large enough, but the chroma planes are not
will not be rejected. In that event, a heap overflow will occur.

This change adjusts the logic to consider the chroma planes and makes
the change to all three bwdif implementations.

Fixes #10688

Signed-off-by: Cosmin Stejerean <cosmin@cosmin.at>
Reviewed-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: Philip Langdale <philipl@overt.org>
2023-12-07 10:00:12 +08:00
sunyuechi
8bdb663062 lavc/ac3dsp: R-V V float_to_fixed24
c910
    float_to_fixed24_c: 2207.2
    float_to_fixed24_rvv_f32: 696.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-06 16:04:22 +02:00
Paul B Mahol
ebfd3912e9 avfilter/asrc_flite: use streaming function
Fix continuous accumulation of audio samples for big txt inputs.
2023-12-06 10:52:46 +01:00
Paul B Mahol
d793af982e avfilter/asrc_flite: switch to activate
Allows to set EOF timestamp.
2023-12-06 10:52:45 +01:00
Anton Khirnov
99d2fa38ad fftools/ffmpeg: make sure FrameData is writable when we modify it
Also, add a function that returns const FrameData* for cases that only
read from it.
2023-12-06 10:30:28 +01:00
Anton Khirnov
1d536e0283 fftools/ffmpeg_filter: track input/output index in {Input,Output}FilterPriv
Will be useful in following commits.
2023-12-06 10:01:21 +01:00
Anton Khirnov
c6483f1c2a lavfi/buffersink: avoid leaking peeked_frame on uninit 2023-12-06 10:01:09 +01:00
Nuo Mi
e78c6a1f2c MAINTAINERS: add myself as vvc maintainer 2023-12-05 20:51:37 +01:00
Paul B Mahol
3047f05a99 tests/fate: add asegment filter tests 2023-12-05 14:50:40 +01:00
Paul B Mahol
7e453dad3c avcodec/qoadec: fix overreads and fix packet size check 2023-12-05 14:50:21 +01:00
Jean-Baptiste Kempf
6e26a5a64e doc/developer: require asm for RISC-V
Explicitly document our usage of assembly, following suit with other
architectures.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2023-12-05 14:44:18 +01:00
Michael Niedermayer
22daf2148f
avcodec/av1dec: Fix resolving zero divisor
Fixes: Out of array read
Fixes: global-buffer-overflow-AV1

Found-by: "Leonelli, Matteo" <matteo.leonelli@cispa.de>
Tested-by: "Wang, Fei W" <fei.w.wang@intel.com>
Reviewed-by: "Wang, Fei W" <fei.w.wang@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-05 12:38:16 +01:00
Michael Niedermayer
4cdf2c7f76
avformat/mov: Ignore duplicate ftyp
Fixes: switch_1080p_720p.mp4
Found-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-05 12:12:10 +01:00
Leo Izen
b60d39fbe2
fate/jpegxl: add parser test for extboxes and small files
Add a fate test for the above commits fixing extremely small files or
files with extended box sizes.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-05 05:54:58 -05:00
Leo Izen
c4be080e65
avcodec/jpegxl_parser: fix parsing sequences of extremely small files
This patch allows the JXL parser to parse sequences of extremely small
files concatenated together. (e.g. smaller than the parser buffer)

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-05 05:54:34 -05:00
Leo Izen
019b3ea65a
avcodec/jpegxl_parse{,r}: use correct ISOBMFF extended size location
According to ISO/IEC 14996-12, size == 1 means a 64-bit extended-size
field occurs *after* the 32-bit box type, not before. This fix should
allow correct parsing of JXL files with extended-size boxes.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
2023-12-05 05:53:32 -05:00
Haihao Xiang
35a555e2b9 lavfi/vf_vpp_qsv: set the default value of async_depth to 4
Both qsv encoders and decoders use 4 as the default value of
async_depth, let's use 4 as the default value for vpp_qsv filter too.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:12:16 +08:00
Haihao Xiang
05debdaa5f lavu/hwcontext_qsv: use mfxImplDescription instead of mfxExtendedDeviceId on Linux
mfxExtendedDeviceId mightn't be supported in certain configurations of
oneVPL on Linux, so we can't ensure a property filter for
mfxExtendedDeviceId.DeviceID or mfxExtendedDeviceId.VendorID works as
expected. This fixed the issue mentioned in [1]

[1] http://ffmpeg.org/pipermail/ffmpeg-user/2023-October/056983.html

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:11:19 +08:00
Haihao Xiang
fc73b372cd lavc/qsvdec: reduce info message when more data is required
demote the info to AV_LOG_VERBOSE

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:10:57 +08:00
Haihao Xiang
e233f3e75f lavc/qsvdec: return 0 if more data is required
The type of qsv decoders is FF_CODEC_CB_TYPE_DECODE which must not
return AVERROR(EAGAIN). commit 42b20c9 added an assertion to check the
returned value.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:10:57 +08:00
Haihao Xiang
5717fbbea2 configure: don't warn deprecated symbols from libvpl
libvpl deprecated some symbols (e.g. MFX_EXTBUFF_VPP_DENOISE2 is used to
replace MFX_EXTBUFF_VPP_DENOISE), however the new symbols aren't support
by MediaSDK runtime. In order to support the combination of libvpl and
MediaSDK runtime on legacy devices, we continue to use the deprecated
symbols in FFmpeg. This patch added -DMFX_DEPRECATED_OFF to CFLAGS to
silence the corresponding compilation warnings.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:10:29 +08:00
Haihao Xiang
d36d9994e4 lavu/hwcontext_vaapi: ignore nonexistent device in default DRM device selection
It is possible that renderD128 doesn't exist but renderD129 is
available in a system (see [1]). This change can make sure the default
DRM device selection works even if renderD128 doesn't exist.

[1] https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md#issues-with-media-workloads-on-multi-gpu-setups

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2023-12-05 10:09:55 +08:00
Paul B Mahol
9e74c7ae87 avfilter/af_dialoguenhance: add double-floating point sample format support 2023-12-04 23:14:40 +01:00
Paul B Mahol
704ef556fe avfilter/af_surround: refactor some code 2023-12-04 23:14:39 +01:00
Cosmin Stejerean
634216dc40 fate: Add tests for QOA decoder 2023-12-04 23:14:38 +01:00
Kyle Swanson
9f1dbca820 avfilter/libvmaf: small cleanup for style, whitespace, unused LIBVMAFContext struct members
Signed-off-by: Kyle Swanson <kswanson@netflix.com>
2023-12-04 10:00:01 -08:00
Lynne
8c117b75af
lavc/Makefile: build vulkan decode code if vulkan_av1 has been enabled
Forgotten.

Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Tested-by: Neal Gompa <ngompa13@gmail.com>
2023-12-04 07:57:27 +01:00
Paul B Mahol
d9e41ead82 avfilter/avfilter: fix OOM case for default activate
Fixes OOM when caller keeps adding frames into filtergraph
that reached EOF by other means, for example EOF is signalled
by other filter in filtergraph or by buffersink.
2023-12-03 23:26:43 +01:00
Paul B Mahol
e3e3531d1e avfilter/vsrc_gradients: make rotation always continuous if speed changes 2023-12-03 18:43:26 +01:00
Paul B Mahol
8888574d10 avfilter/vsrc_gradients: add commands support 2023-12-03 18:30:06 +01:00
Stefano Sabatini
ddecc39c39 doc/encoders/libx264: review and extend option description
Also, merge x264opts and x264-opts option docs to avoid duplication
and make it clearer that they provide mostly the same functionality.
2023-12-03 13:03:01 +01:00
Paul B Mahol
f84412d6f4 avfilter/vf_corr: for all zero returns zero score instead of 1 2023-12-03 03:10:05 +01:00
Paul B Mahol
aad3223978 avfilter/vf_corr: add slice threading support 2023-12-03 03:10:03 +01:00
Martin Storsjö
12598e72e3 checkasm: Fix the signature of float_to_fixed24
The len parameter was changed from unsigned int to size_t in
567c67c6c8.

This fixes crashes in the reference C code, when running checkasm
on aarch64.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-02 18:16:09 +02:00
Paul B Mahol
0a13178de8 avcodec/qoadec: add support for midstream sample rate/layout changes 2023-12-02 16:51:00 +01:00
Anton Khirnov
5230257ea1 lavc/dvdsubenc: only check canvas size when it is actually set
Fixes #10650
2023-12-02 11:22:46 +01:00
Anton Khirnov
6a22d80041 doc/filters:ddagrab: elaborate on the semantics of framerate 2023-12-02 11:22:46 +01:00
Alfred Wingate
e5ce473040 swscale/x86/rgb_2_rgb: Add opaque pointer to missed definitions of ff_nv12ToUV
Opaque parameters were previously added to the original definition of
ff_nv12ToUV, leading to gcc noticing a type mismatch with -Wlto-type-mismatch.

f2de911818
https://bugs.gentoo.org/907484

Signed-off-by: Alfred Wingate <parona@protonmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2023-12-02 11:22:46 +01:00
Dale Curtis
2182173a69
avformat/mov: Fix integer overflow in mov_read_packet().
Fixes https://crbug.com/1499669:
runtime error: signed integer overflow: 9223372036853334272 + 1375731456
cannot be represented in type 'int64_t' (aka 'long')

Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-12-02 00:25:15 +01:00
Paul B Mahol
db7b838237 avfilter/vf_chromanr: compare correct variables for advanced mode 2023-12-01 21:31:38 +01:00
Logan Lyu
fa0470347e lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_hv
put_hevc_qpel_bi_hv4_8_c: 433.7
put_hevc_qpel_bi_hv4_8_i8mm: 117.9
put_hevc_qpel_bi_hv6_8_c: 803.9
put_hevc_qpel_bi_hv6_8_i8mm: 252.7
put_hevc_qpel_bi_hv8_8_c: 1296.4
put_hevc_qpel_bi_hv8_8_i8mm: 316.2
put_hevc_qpel_bi_hv12_8_c: 2867.4
put_hevc_qpel_bi_hv12_8_i8mm: 669.2
put_hevc_qpel_bi_hv16_8_c: 4709.4
put_hevc_qpel_bi_hv16_8_i8mm: 929.9
put_hevc_qpel_bi_hv24_8_c: 9639.7
put_hevc_qpel_bi_hv24_8_i8mm: 2072.4
put_hevc_qpel_bi_hv32_8_c: 16663.7
put_hevc_qpel_bi_hv32_8_i8mm: 3391.4
put_hevc_qpel_bi_hv48_8_c: 36972.9
put_hevc_qpel_bi_hv48_8_i8mm: 7505.7
put_hevc_qpel_bi_hv64_8_c: 64106.4
put_hevc_qpel_bi_hv64_8_i8mm: 13145.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
595f97028b lavc/aarch64: new optimization for 8-bit hevc_qpel_bi_v
put_hevc_qpel_bi_v4_8_c: 166.1
put_hevc_qpel_bi_v4_8_neon: 61.9
put_hevc_qpel_bi_v6_8_c: 309.4
put_hevc_qpel_bi_v6_8_neon: 75.6
put_hevc_qpel_bi_v8_8_c: 531.1
put_hevc_qpel_bi_v8_8_neon: 78.1
put_hevc_qpel_bi_v12_8_c: 1139.9
put_hevc_qpel_bi_v12_8_neon: 238.1
put_hevc_qpel_bi_v16_8_c: 2063.6
put_hevc_qpel_bi_v16_8_neon: 308.9
put_hevc_qpel_bi_v24_8_c: 4317.1
put_hevc_qpel_bi_v24_8_neon: 629.9
put_hevc_qpel_bi_v32_8_c: 8241.9
put_hevc_qpel_bi_v32_8_neon: 1140.1
put_hevc_qpel_bi_v48_8_c: 18422.9
put_hevc_qpel_bi_v48_8_neon: 2533.9
put_hevc_qpel_bi_v64_8_c: 37508.6
put_hevc_qpel_bi_v64_8_neon: 4520.1

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
00290a64f7 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_hv
put_hevc_epel_bi_hv4_8_c: 242.9
put_hevc_epel_bi_hv4_8_i8mm: 68.6
put_hevc_epel_bi_hv6_8_c: 402.4
put_hevc_epel_bi_hv6_8_i8mm: 135.9
put_hevc_epel_bi_hv8_8_c: 636.4
put_hevc_epel_bi_hv8_8_i8mm: 145.6
put_hevc_epel_bi_hv12_8_c: 1363.1
put_hevc_epel_bi_hv12_8_i8mm: 324.1
put_hevc_epel_bi_hv16_8_c: 2222.1
put_hevc_epel_bi_hv16_8_i8mm: 509.1
put_hevc_epel_bi_hv24_8_c: 4793.4
put_hevc_epel_bi_hv24_8_i8mm: 1091.9
put_hevc_epel_bi_hv32_8_c: 8393.9
put_hevc_epel_bi_hv32_8_i8mm: 1720.6
put_hevc_epel_bi_hv48_8_c: 19526.6
put_hevc_epel_bi_hv48_8_i8mm: 4285.9
put_hevc_epel_bi_hv64_8_c: 33915.4
put_hevc_epel_bi_hv64_8_i8mm: 6783.6

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
0448f27f41 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_v
put_hevc_epel_bi_v4_8_c: 138.4
put_hevc_epel_bi_v4_8_neon: 33.7
put_hevc_epel_bi_v6_8_c: 302.9
put_hevc_epel_bi_v6_8_neon: 46.7
put_hevc_epel_bi_v8_8_c: 408.7
put_hevc_epel_bi_v8_8_neon: 48.7
put_hevc_epel_bi_v12_8_c: 779.4
put_hevc_epel_bi_v12_8_neon: 139.7
put_hevc_epel_bi_v16_8_c: 1344.9
put_hevc_epel_bi_v16_8_neon: 160.2
put_hevc_epel_bi_v24_8_c: 2981.7
put_hevc_epel_bi_v24_8_neon: 344.9
put_hevc_epel_bi_v32_8_c: 5280.9
put_hevc_epel_bi_v32_8_neon: 618.4
put_hevc_epel_bi_v48_8_c: 12494.9
put_hevc_epel_bi_v48_8_neon: 1364.4
put_hevc_epel_bi_v64_8_c: 22127.7
put_hevc_epel_bi_v64_8_neon: 2473.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00