1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-12 19:18:44 +02:00
Commit Graph

350 Commits

Author SHA1 Message Date
Andreas Rheinhardt
6f7bf64dbc avcodec: Remove DCT, FFT, MDCT and RDFT
They were replaced by TX from libavutil; the tremendous work
to get to this point (both creating TX as well as porting
the users of the components removed in this commit) was
completely performed by Lynne alone.

Removing the subsystems from configure may break some command lines,
because the --disable-fft etc. options are no longer recognized.

Co-authored-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-01 02:25:09 +02:00
Logan Lyu
8fa83ad70f lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_hv
checkasm bench:
put_hevc_qpel_uni_hv4_8_c: 489.2
put_hevc_qpel_uni_hv4_8_i8mm: 105.7
put_hevc_qpel_uni_hv6_8_c: 852.7
put_hevc_qpel_uni_hv6_8_i8mm: 268.7
put_hevc_qpel_uni_hv8_8_c: 1345.7
put_hevc_qpel_uni_hv8_8_i8mm: 300.4
put_hevc_qpel_uni_hv12_8_c: 2757.4
put_hevc_qpel_uni_hv12_8_i8mm: 581.4
put_hevc_qpel_uni_hv16_8_c: 4458.9
put_hevc_qpel_uni_hv16_8_i8mm: 860.2
put_hevc_qpel_uni_hv24_8_c: 9582.2
put_hevc_qpel_uni_hv24_8_i8mm: 2086.7
put_hevc_qpel_uni_hv32_8_c: 16401.9
put_hevc_qpel_uni_hv32_8_i8mm: 3217.4
put_hevc_qpel_uni_hv48_8_c: 36402.4
put_hevc_qpel_uni_hv48_8_i8mm: 7082.7
put_hevc_qpel_uni_hv64_8_c: 62713.2
put_hevc_qpel_uni_hv64_8_i8mm: 12408.9

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:44 +03:00
Logan Lyu
23ca61b7de lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_v
checkasm bench:
put_hevc_qpel_uni_v4_8_c: 146.2
put_hevc_qpel_uni_v4_8_neon: 43.2
put_hevc_qpel_uni_v6_8_c: 303.9
put_hevc_qpel_uni_v6_8_neon: 69.7
put_hevc_qpel_uni_v8_8_c: 495.2
put_hevc_qpel_uni_v8_8_neon: 74.7
put_hevc_qpel_uni_v12_8_c: 1100.9
put_hevc_qpel_uni_v12_8_neon: 222.4
put_hevc_qpel_uni_v16_8_c: 1955.2
put_hevc_qpel_uni_v16_8_neon: 269.2
put_hevc_qpel_uni_v24_8_c: 4571.9
put_hevc_qpel_uni_v24_8_neon: 832.4
put_hevc_qpel_uni_v32_8_c: 8226.4
put_hevc_qpel_uni_v32_8_neon: 1035.7
put_hevc_qpel_uni_v48_8_c: 18324.2
put_hevc_qpel_uni_v48_8_neon: 2321.2
put_hevc_qpel_uni_v64_8_c: 37659.4
put_hevc_qpel_uni_v64_8_neon: 4122.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:44 +03:00
Logan Lyu
b7a3150bc5 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_hv
checkasm bench:
put_hevc_epel_uni_hv4_8_c: 204.7
put_hevc_epel_uni_hv4_8_i8mm: 70.2
put_hevc_epel_uni_hv6_8_c: 378.2
put_hevc_epel_uni_hv6_8_i8mm: 131.9
put_hevc_epel_uni_hv8_8_c: 637.7
put_hevc_epel_uni_hv8_8_i8mm: 137.9
put_hevc_epel_uni_hv12_8_c: 1301.9
put_hevc_epel_uni_hv12_8_i8mm: 314.2
put_hevc_epel_uni_hv16_8_c: 2203.4
put_hevc_epel_uni_hv16_8_i8mm: 454.7
put_hevc_epel_uni_hv24_8_c: 4848.2
put_hevc_epel_uni_hv24_8_i8mm: 1065.2
put_hevc_epel_uni_hv32_8_c: 8517.4
put_hevc_epel_uni_hv32_8_i8mm: 1898.4
put_hevc_epel_uni_hv48_8_c: 19591.7
put_hevc_epel_uni_hv48_8_i8mm: 4107.2
put_hevc_epel_uni_hv64_8_c: 33880.2
put_hevc_epel_uni_hv64_8_i8mm: 6568.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:44 +03:00
Logan Lyu
c0374f77f4 lavc/aarch64: move macros calc_epelh, calc_epelh2, load_epel_filterh
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:40 +03:00
Logan Lyu
7ce5a2f640 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_v
checkasm bench:
put_hevc_epel_uni_hv64_8_i8mm: 6568.7
put_hevc_epel_uni_v4_8_c: 88.7
put_hevc_epel_uni_v4_8_neon: 32.7
put_hevc_epel_uni_v6_8_c: 185.4
put_hevc_epel_uni_v6_8_neon: 44.9
put_hevc_epel_uni_v8_8_c: 333.9
put_hevc_epel_uni_v8_8_neon: 44.4
put_hevc_epel_uni_v12_8_c: 728.7
put_hevc_epel_uni_v12_8_neon: 119.7
put_hevc_epel_uni_v16_8_c: 1224.2
put_hevc_epel_uni_v16_8_neon: 139.7
put_hevc_epel_uni_v24_8_c: 2531.2
put_hevc_epel_uni_v24_8_neon: 329.9
put_hevc_epel_uni_v32_8_c: 4739.9
put_hevc_epel_uni_v32_8_neon: 562.7
put_hevc_epel_uni_v48_8_c: 10618.7
put_hevc_epel_uni_v48_8_neon: 1256.2
put_hevc_epel_uni_v64_8_c: 19169.9
put_hevc_epel_uni_v64_8_neon: 2179.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:40 +03:00
Casey Smalley
b98ee1a355 aarch64/hevc: Replace br return with ret
This patch changes the return instruction in the tr_32x4 macro from
BR to RET.

Function returns should always use the RET instruction instead of BR,
to avoid interfering with branch prediction.

On devices that support BTI, this is observeable as a landing pad is
required when branching with BR. The change fixes
fate-hevc-hdr-vivid-metadata when on hardware with BTI support.

Signed-off-by: Casey Smalley <casey.smalley@arm.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-08-08 13:46:07 +03:00
Reimar Döffinger
dcff15692d hevcdsp_idct_neon.S: Avoid unnecessary mov.
ret can be given an argument instead.
This is also consistent with how other assembler code
in FFmpeg does it.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2023-07-29 16:05:23 +02:00
Rémi Denis-Courmont
82cb4b1c05 lavc/aarch64: remove bogus HAVE_VFP guard
The IMDCT offset is only relevant for NEON optimisations. There are no
VFP optimisations here that would justify the HAVE_VFP flag. In
practice, this makes no difference because HAVE_NEON is practically
always true for standard Armv8 platforms.
2023-07-15 22:56:30 +03:00
Logan Lyu
9557bf26b3 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv
put_hevc_epel_uni_w_hv4_8_c: 254.6
put_hevc_epel_uni_w_hv4_8_i8mm: 102.9
put_hevc_epel_uni_w_hv6_8_c: 411.6
put_hevc_epel_uni_w_hv6_8_i8mm: 221.6
put_hevc_epel_uni_w_hv8_8_c: 669.4
put_hevc_epel_uni_w_hv8_8_i8mm: 214.9
put_hevc_epel_uni_w_hv12_8_c: 1412.6
put_hevc_epel_uni_w_hv12_8_i8mm: 481.4
put_hevc_epel_uni_w_hv16_8_c: 2425.4
put_hevc_epel_uni_w_hv16_8_i8mm: 647.4
put_hevc_epel_uni_w_hv24_8_c: 5384.1
put_hevc_epel_uni_w_hv24_8_i8mm: 1450.6
put_hevc_epel_uni_w_hv32_8_c: 9470.9
put_hevc_epel_uni_w_hv32_8_i8mm: 2497.1
put_hevc_epel_uni_w_hv48_8_c: 20930.1
put_hevc_epel_uni_w_hv48_8_i8mm: 5635.9
put_hevc_epel_uni_w_hv64_8_c: 36682.9
put_hevc_epel_uni_w_hv64_8_i8mm: 9712.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
d48c89701c lavc/aarch64: new optimization for 8-bit hevc_epel_h
put_hevc_epel_h4_8_c: 67.1
put_hevc_epel_h4_8_i8mm: 21.1
put_hevc_epel_h6_8_c: 147.1
put_hevc_epel_h6_8_i8mm: 45.1
put_hevc_epel_h8_8_c: 237.4
put_hevc_epel_h8_8_i8mm: 72.1
put_hevc_epel_h12_8_c: 527.4
put_hevc_epel_h12_8_i8mm: 115.4
put_hevc_epel_h16_8_c: 943.6
put_hevc_epel_h16_8_i8mm: 153.9
put_hevc_epel_h24_8_c: 2105.4
put_hevc_epel_h24_8_i8mm: 384.4
put_hevc_epel_h32_8_c: 3631.4
put_hevc_epel_h32_8_i8mm: 519.9
put_hevc_epel_h48_8_c: 8082.1
put_hevc_epel_h48_8_i8mm: 1110.4
put_hevc_epel_h64_8_c: 14400.6
put_hevc_epel_h64_8_i8mm: 2057.1

put_hevc_qpel_h4_8_c: 124.9
put_hevc_qpel_h4_8_neon: 43.1
put_hevc_qpel_h4_8_i8mm: 33.1
put_hevc_qpel_h6_8_c: 269.4
put_hevc_qpel_h6_8_neon: 90.6
put_hevc_qpel_h6_8_i8mm: 61.4
put_hevc_qpel_h8_8_c: 477.6
put_hevc_qpel_h8_8_neon: 82.1
put_hevc_qpel_h8_8_i8mm: 99.9
put_hevc_qpel_h12_8_c: 1062.4
put_hevc_qpel_h12_8_neon: 226.9
put_hevc_qpel_h12_8_i8mm: 170.9
put_hevc_qpel_h16_8_c: 1880.6
put_hevc_qpel_h16_8_neon: 302.9
put_hevc_qpel_h16_8_i8mm: 251.4
put_hevc_qpel_h24_8_c: 4221.9
put_hevc_qpel_h24_8_neon: 893.9
put_hevc_qpel_h24_8_i8mm: 626.1
put_hevc_qpel_h32_8_c: 7437.6
put_hevc_qpel_h32_8_neon: 1189.9
put_hevc_qpel_h32_8_i8mm: 959.1
put_hevc_qpel_h48_8_c: 16838.4
put_hevc_qpel_h48_8_neon: 2727.9
put_hevc_qpel_h48_8_i8mm: 2163.9
put_hevc_qpel_h64_8_c: 29982.1
put_hevc_qpel_h64_8_neon: 4777.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
668eb4c00e lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v
put_hevc_epel_uni_w_v4_8_c: 116.1
put_hevc_epel_uni_w_v4_8_neon: 48.6
put_hevc_epel_uni_w_v6_8_c: 248.9
put_hevc_epel_uni_w_v6_8_neon: 80.6
put_hevc_epel_uni_w_v8_8_c: 383.9
put_hevc_epel_uni_w_v8_8_neon: 91.9
put_hevc_epel_uni_w_v12_8_c: 806.1
put_hevc_epel_uni_w_v12_8_neon: 202.9
put_hevc_epel_uni_w_v16_8_c: 1411.1
put_hevc_epel_uni_w_v16_8_neon: 289.9
put_hevc_epel_uni_w_v24_8_c: 3168.9
put_hevc_epel_uni_w_v24_8_neon: 619.4
put_hevc_epel_uni_w_v32_8_c: 5632.9
put_hevc_epel_uni_w_v32_8_neon: 1161.1
put_hevc_epel_uni_w_v48_8_c: 12406.1
put_hevc_epel_uni_w_v48_8_neon: 2476.4
put_hevc_epel_uni_w_v64_8_c: 22001.4
put_hevc_epel_uni_w_v64_8_neon: 4343.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
0c604b1913 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h
put_hevc_epel_uni_w_h4_8_c: 126.1
put_hevc_epel_uni_w_h4_8_i8mm: 41.6
put_hevc_epel_uni_w_h6_8_c: 222.9
put_hevc_epel_uni_w_h6_8_i8mm: 91.4
put_hevc_epel_uni_w_h8_8_c: 374.4
put_hevc_epel_uni_w_h8_8_i8mm: 102.1
put_hevc_epel_uni_w_h12_8_c: 806.1
put_hevc_epel_uni_w_h12_8_i8mm: 225.6
put_hevc_epel_uni_w_h16_8_c: 1414.4
put_hevc_epel_uni_w_h16_8_i8mm: 333.4
put_hevc_epel_uni_w_h24_8_c: 3128.6
put_hevc_epel_uni_w_h24_8_i8mm: 713.1
put_hevc_epel_uni_w_h32_8_c: 5519.1
put_hevc_epel_uni_w_h32_8_i8mm: 1118.1
put_hevc_epel_uni_w_h48_8_c: 12364.4
put_hevc_epel_uni_w_h48_8_i8mm: 2541.1
put_hevc_epel_uni_w_h64_8_c: 21925.9
put_hevc_epel_uni_w_h64_8_i8mm: 4383.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
e652e7dcda lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels
put_hevc_pel_uni_pixels4_8_c: 35.9
put_hevc_pel_uni_pixels4_8_neon: 7.6
put_hevc_pel_uni_pixels6_8_c: 46.1
put_hevc_pel_uni_pixels6_8_neon: 20.6
put_hevc_pel_uni_pixels8_8_c: 53.4
put_hevc_pel_uni_pixels8_8_neon: 11.6
put_hevc_pel_uni_pixels12_8_c: 89.1
put_hevc_pel_uni_pixels12_8_neon: 25.9
put_hevc_pel_uni_pixels16_8_c: 106.4
put_hevc_pel_uni_pixels16_8_neon: 20.4
put_hevc_pel_uni_pixels24_8_c: 137.6
put_hevc_pel_uni_pixels24_8_neon: 47.1
put_hevc_pel_uni_pixels32_8_c: 173.6
put_hevc_pel_uni_pixels32_8_neon: 54.1
put_hevc_pel_uni_pixels48_8_c: 268.1
put_hevc_pel_uni_pixels48_8_neon: 117.1
put_hevc_pel_uni_pixels64_8_c: 346.1
put_hevc_pel_uni_pixels64_8_neon: 205.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
e79686be96 lavc/aarch64: new optimization for 8-bit hevc_qpel_h hevc_qpel_uni_w_hv
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-06-06 12:50:18 +03:00
Logan Lyu
15972cce8c lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_w_h
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-06-06 12:50:18 +03:00
Logan Lyu
0b7356c1b4 lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels and qpel_uni_w_v
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-06-06 12:50:18 +03:00
xufuji456
bd2f00f665 codec/aarch64/hevc: add transform_luma_neon
got 56% speed up (run_count=1000, CPU=Cortex A53)
transform_4x4_luma_neon: 45 transform_4x4_luma_c: 103

Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-04-14 12:07:57 +03:00
xufuji456
00a062b8d5 codec/aarch64/hevc:add idct_32x32_neon
got 73% speed up (run_count=1000, CPU=Cortex A53)
idct_32x32_neon: 4826 idct_32x32_c: 18236
idct_32x32_neon: 4824 idct_32x32_c: 18149
idct_32x32_neon: 4937 idct_32x32_c: 18333

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-04-12 15:58:09 +03:00
J. Dekker
b564ad8eac lavc/aarch64: add hevc deblock chroma 8-12bit
Benched on Ampere Altra:

hevc_h_loop_filter_chroma8_c: 367.7
hevc_h_loop_filter_chroma8_neon: 31.0
hevc_h_loop_filter_chroma10_c: 396.7
hevc_h_loop_filter_chroma10_neon: 27.5
hevc_h_loop_filter_chroma12_c: 377.0
hevc_h_loop_filter_chroma12_neon: 31.7
hevc_v_loop_filter_chroma8_c: 369.0
hevc_v_loop_filter_chroma8_neon: 55.0
hevc_v_loop_filter_chroma10_c: 389.0
hevc_v_loop_filter_chroma10_neon: 54.0
hevc_v_loop_filter_chroma12_c: 389.5
hevc_v_loop_filter_chroma12_neon: 53.0

Signed-off-by: J. Dekker <jdek@itanimul.li>
2023-04-06 06:54:26 +02:00
J. Dekker
37cde570bc lavc/aarch64: add clip N macro
Signed-off-by: J. Dekker <jdek@itanimul.li>
2023-03-22 14:48:13 +01:00
xufuji456
4b4de07721 libavcodec/hevc: add hevc idct4x4 neon of aarch64
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-02-28 13:12:52 +02:00
Martin Storsjö
ec7fa13eb0 aarch64: hevcdsp_idct: Reuse preexisting macros for transposes
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-02-28 11:48:54 +02:00
Lynne
e0661fc805
dca_core: convert to lavu/tx
Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the
arm32 and aarch64 changes.
2022-11-06 14:39:36 +01:00
J. Dekker
9bed814e1d lavc/aarch64: add hevc horizontal qpel/uni/bi
checkasm --benchmark on Ampere Altra (Neoverse N1):

put_hevc_qpel_bi_h4_8_c: 170.7
put_hevc_qpel_bi_h4_8_neon: 64.5
put_hevc_qpel_bi_h6_8_c: 373.7
put_hevc_qpel_bi_h6_8_neon: 130.2
put_hevc_qpel_bi_h8_8_c: 662.0
put_hevc_qpel_bi_h8_8_neon: 138.5
put_hevc_qpel_bi_h12_8_c: 1529.5
put_hevc_qpel_bi_h12_8_neon: 422.0
put_hevc_qpel_bi_h16_8_c: 2735.5
put_hevc_qpel_bi_h16_8_neon: 560.5
put_hevc_qpel_bi_h24_8_c: 6015.7
put_hevc_qpel_bi_h24_8_neon: 1636.0
put_hevc_qpel_bi_h32_8_c: 10779.0
put_hevc_qpel_bi_h32_8_neon: 2204.5
put_hevc_qpel_bi_h48_8_c: 24375.0
put_hevc_qpel_bi_h48_8_neon: 4984.0
put_hevc_qpel_bi_h64_8_c: 42768.0
put_hevc_qpel_bi_h64_8_neon: 8795.7
put_hevc_qpel_h4_8_c: 149.0
put_hevc_qpel_h4_8_neon: 55.7
put_hevc_qpel_h6_8_c: 321.2
put_hevc_qpel_h6_8_neon: 106.0
put_hevc_qpel_h8_8_c: 578.7
put_hevc_qpel_h8_8_neon: 133.2
put_hevc_qpel_h12_8_c: 1279.0
put_hevc_qpel_h12_8_neon: 391.7
put_hevc_qpel_h16_8_c: 2286.2
put_hevc_qpel_h16_8_neon: 519.7
put_hevc_qpel_h24_8_c: 5100.7
put_hevc_qpel_h24_8_neon: 1546.2
put_hevc_qpel_h32_8_c: 9022.0
put_hevc_qpel_h32_8_neon: 2060.2
put_hevc_qpel_h48_8_c: 20293.5
put_hevc_qpel_h48_8_neon: 4656.7
put_hevc_qpel_h64_8_c: 36037.0
put_hevc_qpel_h64_8_neon: 8262.7
put_hevc_qpel_uni_h4_8_c: 162.2
put_hevc_qpel_uni_h4_8_neon: 61.7
put_hevc_qpel_uni_h6_8_c: 355.2
put_hevc_qpel_uni_h6_8_neon: 114.2
put_hevc_qpel_uni_h8_8_c: 651.0
put_hevc_qpel_uni_h8_8_neon: 135.7
put_hevc_qpel_uni_h12_8_c: 1412.5
put_hevc_qpel_uni_h12_8_neon: 402.7
put_hevc_qpel_uni_h16_8_c: 2551.0
put_hevc_qpel_uni_h16_8_neon: 533.5
put_hevc_qpel_uni_h24_8_c: 5782.2
put_hevc_qpel_uni_h24_8_neon: 1578.7
put_hevc_qpel_uni_h32_8_c: 10586.5
put_hevc_qpel_uni_h32_8_neon: 2102.2
put_hevc_qpel_uni_h48_8_c: 23812.0
put_hevc_qpel_uni_h48_8_neon: 4739.5
put_hevc_qpel_uni_h64_8_c: 42958.7
put_hevc_qpel_uni_h64_8_neon: 8366.5

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-10-25 14:56:38 +02:00
Reimar Döffinger
38cd829dce
aarch64: Implement stack spilling in a consistent way.
Currently it is done in several different ways, which
might cause needless dependencies or in case of
tx_float_neon.S is incorrect.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2022-10-11 09:12:02 +02:00
Grzegorz Bernacki
8f4b000c37 lavc/aarch64: Add neon implementation for vsse_intra8
Provide optimized implementation for vsse_intra8 for arm64.

Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:20 +03:00
Grzegorz Bernacki
bad67cb9fd lavc/aarch64: Provide optimized implementation of vsse8 for arm64.
Provide optimized implementation of vsse8 for arm64.

Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:20 +03:00
Grzegorz Bernacki
faea56c9c7 lavc/aarch64: Provide neon implementation of nsse8
Add vectorized implementation of nsse8 function.

Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:20 +03:00
Grzegorz Bernacki
f401a2af21 lavc/aarch64: Add neon implementation for pix_abs8 functions.
Provide optimized implementation of pix_abs8 function for arm64.

Performance comparison tests are shown below:
pix_abs_1_1_c: 162.5
pix_abs_1_1_neon: 27.0
pix_abs_1_2_c: 174.0
pix_abs_1_2_neon: 23.5
pix_abs_1_3_c: 203.2
pix_abs_1_3_neon: 34.7

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-10-04 13:24:04 +03:00
Martin Storsjö
8089fe072e aarch64: me_cmp: Avoid using the non-unrolled codepath for the minimum unroll size
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-29 10:29:11 +03:00
Martin Storsjö
6f2ad7f951 aarch64: me_cmp: Avoid redundant loads in ff_pix_abs16_y2_neon
This avoids one redundant load per row; pix3 from the previous
iteration can be used as pix2 in the next one.

Before:       Cortex A53    A72    A73
pix_abs_0_2_neon:  138.0   59.7   48.0
After:
pix_abs_0_2_neon:  109.7   50.2   39.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-29 10:29:10 +03:00
Andreas Rheinhardt
9beba05311 avcodec/fmtconvert: Remove unused AVCodecContext parameter
Unused since d74a8cb7e4.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:26:40 +02:00
Hubert Mazur
b2732115dd lavc/aarch64: Add neon implementation for pix_median_abs8
Provide optimized implementation for pix_median_abs8 function.

Performance comparison tests are shown below.
- median_sad_1_c: 277.0
- median_sad_1_neon: 82.0

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-21 12:57:56 +03:00
Hubert Mazur
e9a6170213 lavc/aarch64: Add neon implementation for vsad8_intra
Provide optimized implementation for vsad8_intra function.

Performance comparison tests are shown below.
- vsad_5_c: 94.7
- vsad_5_neon: 20.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-21 12:57:56 +03:00
Hubert Mazur
0ee535b1db lavc/aarch64: Add neon implementation for pix_median_abs16
Provide optimized implementation for pix_median_abs16 function.

Performance comparison tests are shown below.
 - median_sad_0_c: 720.5
 - median_sad_0_neon: 127.2

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-21 12:57:56 +03:00
Rémi Denis-Courmont
b52034270a lavc/vorbisdsp: use ptrdiff_t rather than intptr_t
... for a difference between pointers.
2022-09-19 13:51:00 -03:00
Andreas Rheinhardt
a54e53a1c4 avcodec/vp8dsp: Constify src in vp8_mc_func
Reviewed-by: Peter Ross <pross@xvid.org>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 20:57:51 +02:00
Hubert Mazur
06b98e396a lavc/aarch64: Provide neon implementation of nsse16
Add vectorized implementation of nsse16 function.

Performance comparison tests are shown below.
- nsse_0_c: 682.2
- nsse_0_neon: 116.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
908abe8032 lavc/aarch64: Add neon implementation for vsse_intra16
Provide optimized implementation for vsse_intra16 for arm64.

Performance tests are shown below.
- vsse_4_c: 155.2
- vsse_4_neon: 36.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
ce03ea3e79 lavc/aarch64: Add neon implementation for vsad_intra16
Provide optimized implementation for vsad_intra16 function for arm64.

Performance comparison tests are shown below.
- vsad_4_c: 177.5
- vsad_4_neon: 23.5

Benchmarks and tests are run with checkasm tool on AWS Gravtion 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
c495a4b32d lavc/aarch64: Add neon implementation of vsse16
Provide optimized implementation of vsse16 for arm64.

Performance comparison tests are shown below.
- vsse_0_c: 257.7
- vsse_0_neon: 59.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
200f5e578f lavc/aarch64: Add neon implementation for vsad16
Provide optimized implementation of vsad16 function for arm64.

Performance comparison tests are shown below.
- vsad_0_c: 285.2
- vsad_0_neon: 39.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Lynne
f99d15cca0
arm/fft: disable NEON optimizations for 131072pt transforms
This has been broken since the start, and it was only discovered
when I started testing my replacement for the FFT.
Disable it, since there's no point in fixing slower code that's about
to be removed anyway.

The vfp version is not affected.
2022-08-29 07:13:43 +02:00
J. Dekker
ce2f47318b lavc/aarch64: hevc_add_res add 12bit variants
hevc_add_res_4x4_12_c: 46.0
hevc_add_res_4x4_12_neon: 18.7
hevc_add_res_8x8_12_c: 194.7
hevc_add_res_8x8_12_neon: 25.2
hevc_add_res_16x16_12_c: 716.0
hevc_add_res_16x16_12_neon: 69.7
hevc_add_res_32x32_12_c: 3820.7
hevc_add_res_32x32_12_neon: 261.0

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-08-18 15:04:43 +02:00
Martin Storsjö
48be6616d0 aarch64: me_cmp: Remove a leftover unnecessary instruction
This was missed in a2e45ad407.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:14:53 +03:00
Hubert Mazur
70efa4d011 lavc/aarch64: Add neon implementation for pix_abs8
Provide optimized implementation of pix_abs8 function for arm64.

Performance comparison tests are shown below.
- pix_abs_1_0_c: 101.2
- pix_abs_1_0_neon: 22.5
- sad_1_c: 101.2
- sad_1_neon: 22.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur
74312e80d7 lavc/aarch64: Add neon implementation for sse8
Provide optimized implementation of sse8 function for arm64.

Performance comparison tests are shown below.
- sse_1_c: 130.7
- sse_1_neon: 29.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur
a2e45ad407 lavc/aarch64: Add neon implementation for pix_abs16_y2
Provide optimized implementation of pix_abs16_y2 function for arm64.

Performance comparison tests are shown below.
pix_abs_0_2_c: 317.2
pix_abs_0_2_neon: 37.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur
d7abb7d143 lavc/aarch64: Add neon implementation for sse4
Provide neon implementation for sse4 function.

Performance comparison tests are shown below.
- sse_2_c: 80.7
- sse_2_neon: 31.0

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00