1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-23 12:43:46 +02:00
Commit Graph

25 Commits

Author SHA1 Message Date
Martin Storsjö
de23b384fd aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
In addition to just templating, this contains one change to
ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register
which ff_hevc_put_hevc_epel_h32_8_neon requires.

AWS Graviton 3:
put_hevc_epel_bi_hv4_8_c: 176.5
put_hevc_epel_bi_hv4_8_neon: 62.0
put_hevc_epel_bi_hv4_8_i8mm: 58.0
put_hevc_epel_bi_hv6_8_c: 343.7
put_hevc_epel_bi_hv6_8_neon: 109.7
put_hevc_epel_bi_hv6_8_i8mm: 105.7
put_hevc_epel_bi_hv8_8_c: 536.0
put_hevc_epel_bi_hv8_8_neon: 112.7
put_hevc_epel_bi_hv8_8_i8mm: 111.7
put_hevc_epel_bi_hv12_8_c: 1107.7
put_hevc_epel_bi_hv12_8_neon: 254.7
put_hevc_epel_bi_hv12_8_i8mm: 239.0
put_hevc_epel_bi_hv16_8_c: 1927.7
put_hevc_epel_bi_hv16_8_neon: 356.2
put_hevc_epel_bi_hv16_8_i8mm: 334.2
put_hevc_epel_bi_hv24_8_c: 4195.2
put_hevc_epel_bi_hv24_8_neon: 736.7
put_hevc_epel_bi_hv24_8_i8mm: 715.5
put_hevc_epel_bi_hv32_8_c: 7280.5
put_hevc_epel_bi_hv32_8_neon: 1287.7
put_hevc_epel_bi_hv32_8_i8mm: 1162.2
put_hevc_epel_bi_hv48_8_c: 16857.7
put_hevc_epel_bi_hv48_8_neon: 2836.2
put_hevc_epel_bi_hv48_8_i8mm: 2908.5
put_hevc_epel_bi_hv64_8_c: 29248.2
put_hevc_epel_bi_hv64_8_neon: 5051.7
put_hevc_epel_bi_hv64_8_i8mm: 4491.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 09:03:16 +02:00
Martin Storsjö
96e5adda9f aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
AWS Graviton 3:
put_hevc_epel_uni_w_hv4_8_c: 191.2
put_hevc_epel_uni_w_hv4_8_neon: 87.7
put_hevc_epel_uni_w_hv4_8_i8mm: 83.2
put_hevc_epel_uni_w_hv6_8_c: 349.5
put_hevc_epel_uni_w_hv6_8_neon: 153.0
put_hevc_epel_uni_w_hv6_8_i8mm: 148.5
put_hevc_epel_uni_w_hv8_8_c: 581.2
put_hevc_epel_uni_w_hv8_8_neon: 166.7
put_hevc_epel_uni_w_hv8_8_i8mm: 163.5
put_hevc_epel_uni_w_hv12_8_c: 1230.0
put_hevc_epel_uni_w_hv12_8_neon: 387.7
put_hevc_epel_uni_w_hv12_8_i8mm: 370.2
put_hevc_epel_uni_w_hv16_8_c: 2003.2
put_hevc_epel_uni_w_hv16_8_neon: 501.5
put_hevc_epel_uni_w_hv16_8_i8mm: 490.2
put_hevc_epel_uni_w_hv24_8_c: 4448.7
put_hevc_epel_uni_w_hv24_8_neon: 1092.2
put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7
put_hevc_epel_uni_w_hv32_8_c: 7817.2
put_hevc_epel_uni_w_hv32_8_neon: 1916.2
put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5
put_hevc_epel_uni_w_hv48_8_c: 16728.2
put_hevc_epel_uni_w_hv48_8_neon: 4263.7
put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7
put_hevc_epel_uni_w_hv64_8_c: 29563.2
put_hevc_epel_uni_w_hv64_8_neon: 7474.2
put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:58 +02:00
Martin Storsjö
d7294199ab aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
AWS Graviton 3:
put_hevc_epel_uni_hv4_8_c: 163.5
put_hevc_epel_uni_hv4_8_neon: 59.7
put_hevc_epel_uni_hv4_8_i8mm: 57.5
put_hevc_epel_uni_hv6_8_c: 344.7
put_hevc_epel_uni_hv6_8_neon: 105.0
put_hevc_epel_uni_hv6_8_i8mm: 102.7
put_hevc_epel_uni_hv8_8_c: 552.2
put_hevc_epel_uni_hv8_8_neon: 111.2
put_hevc_epel_uni_hv8_8_i8mm: 104.0
put_hevc_epel_uni_hv12_8_c: 1195.0
put_hevc_epel_uni_hv12_8_neon: 248.7
put_hevc_epel_uni_hv12_8_i8mm: 229.5
put_hevc_epel_uni_hv16_8_c: 1910.2
put_hevc_epel_uni_hv16_8_neon: 339.5
put_hevc_epel_uni_hv16_8_i8mm: 323.2
put_hevc_epel_uni_hv24_8_c: 4048.2
put_hevc_epel_uni_hv24_8_neon: 737.7
put_hevc_epel_uni_hv24_8_i8mm: 713.7
put_hevc_epel_uni_hv32_8_c: 6865.7
put_hevc_epel_uni_hv32_8_neon: 1285.0
put_hevc_epel_uni_hv32_8_i8mm: 1206.0
put_hevc_epel_uni_hv48_8_c: 15830.5
put_hevc_epel_uni_hv48_8_neon: 2844.7
put_hevc_epel_uni_hv48_8_i8mm: 2914.0
put_hevc_epel_uni_hv64_8_c: 27912.7
put_hevc_epel_uni_hv64_8_neon: 4970.5
put_hevc_epel_uni_hv64_8_i8mm: 4653.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:28 +02:00
Martin Storsjö
7bf3d14769 aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
AWS Graviton 3:
put_hevc_epel_hv4_8_c: 163.7
put_hevc_epel_hv4_8_neon: 52.5
put_hevc_epel_hv4_8_i8mm: 49.5
put_hevc_epel_hv6_8_c: 292.2
put_hevc_epel_hv6_8_neon: 97.7
put_hevc_epel_hv6_8_i8mm: 101.2
put_hevc_epel_hv8_8_c: 471.0
put_hevc_epel_hv8_8_neon: 106.7
put_hevc_epel_hv8_8_i8mm: 102.5
put_hevc_epel_hv12_8_c: 1030.2
put_hevc_epel_hv12_8_neon: 240.5
put_hevc_epel_hv12_8_i8mm: 215.0
put_hevc_epel_hv16_8_c: 1711.5
put_hevc_epel_hv16_8_neon: 340.2
put_hevc_epel_hv16_8_i8mm: 319.2
put_hevc_epel_hv24_8_c: 3670.0
put_hevc_epel_hv24_8_neon: 702.0
put_hevc_epel_hv24_8_i8mm: 666.5
put_hevc_epel_hv32_8_c: 6785.5
put_hevc_epel_hv32_8_neon: 1247.0
put_hevc_epel_hv32_8_i8mm: 1169.0
put_hevc_epel_hv48_8_c: 14689.7
put_hevc_epel_hv48_8_neon: 2665.2
put_hevc_epel_hv48_8_i8mm: 2740.0
put_hevc_epel_hv64_8_c: 25899.2
put_hevc_epel_hv64_8_neon: 4801.2
put_hevc_epel_hv64_8_i8mm: 4487.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:19 +02:00
Martin Storsjö
5b5666e5ab aarch64: hevc: Reorder epel_hv functions to prepare for templating
This is a pure reordering of code without changing anything in
the individual functions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:07 +02:00
Martin Storsjö
e6d4c0e117 aarch64: hevc: Split the epel_*_hv functions into two parts
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:59:00 +02:00
Martin Storsjö
54af555bfa aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
AWS Graviton 3:
put_hevc_epel_uni_w_h4_8_c: 97.2
put_hevc_epel_uni_w_h4_8_neon: 41.2
put_hevc_epel_uni_w_h4_8_i8mm: 35.2
put_hevc_epel_uni_w_h6_8_c: 203.7
put_hevc_epel_uni_w_h6_8_neon: 84.7
put_hevc_epel_uni_w_h6_8_i8mm: 74.7
put_hevc_epel_uni_w_h8_8_c: 345.7
put_hevc_epel_uni_w_h8_8_neon: 94.0
put_hevc_epel_uni_w_h8_8_i8mm: 80.7
put_hevc_epel_uni_w_h12_8_c: 768.7
put_hevc_epel_uni_w_h12_8_neon: 196.7
put_hevc_epel_uni_w_h12_8_i8mm: 169.7
put_hevc_epel_uni_w_h16_8_c: 1313.0
put_hevc_epel_uni_w_h16_8_neon: 290.7
put_hevc_epel_uni_w_h16_8_i8mm: 238.0
put_hevc_epel_uni_w_h24_8_c: 2877.5
put_hevc_epel_uni_w_h24_8_neon: 650.0
put_hevc_epel_uni_w_h24_8_i8mm: 512.0
put_hevc_epel_uni_w_h32_8_c: 5113.5
put_hevc_epel_uni_w_h32_8_neon: 1129.5
put_hevc_epel_uni_w_h32_8_i8mm: 739.2
put_hevc_epel_uni_w_h48_8_c: 11757.0
put_hevc_epel_uni_w_h48_8_neon: 2518.7
put_hevc_epel_uni_w_h48_8_i8mm: 1688.5
put_hevc_epel_uni_w_h64_8_c: 20478.0
put_hevc_epel_uni_w_h64_8_neon: 4411.7
put_hevc_epel_uni_w_h64_8_i8mm: 2884.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:58:47 +02:00
Martin Storsjö
6d384298ec aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
AWS Graviton 3:
put_hevc_epel_h4_8_c: 64.7
put_hevc_epel_h4_8_neon: 25.0
put_hevc_epel_h4_8_i8mm: 21.2
put_hevc_epel_h6_8_c: 130.0
put_hevc_epel_h6_8_neon: 40.7
put_hevc_epel_h6_8_i8mm: 36.5
put_hevc_epel_h8_8_c: 209.0
put_hevc_epel_h8_8_neon: 45.2
put_hevc_epel_h8_8_i8mm: 41.2
put_hevc_epel_h12_8_c: 465.5
put_hevc_epel_h12_8_neon: 104.5
put_hevc_epel_h12_8_i8mm: 86.5
put_hevc_epel_h16_8_c: 830.7
put_hevc_epel_h16_8_neon: 134.2
put_hevc_epel_h16_8_i8mm: 114.0
put_hevc_epel_h24_8_c: 1844.7
put_hevc_epel_h24_8_neon: 282.2
put_hevc_epel_h24_8_i8mm: 277.2
put_hevc_epel_h32_8_c: 3227.5
put_hevc_epel_h32_8_neon: 501.5
put_hevc_epel_h32_8_i8mm: 396.0
put_hevc_epel_h48_8_c: 7229.2
put_hevc_epel_h48_8_neon: 1120.2
put_hevc_epel_h48_8_i8mm: 901.2
put_hevc_epel_h64_8_c: 12869.0
put_hevc_epel_h64_8_neon: 1999.2
put_hevc_epel_h64_8_i8mm: 1610.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-03-26 08:58:29 +02:00
Martin Storsjö
0c5da7be59 aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm
The first 32 elements of each row were correct, while the
last 16 were scrambled.

This hasn't been noticed, because the checkasm test erroneously
only checked half of the output (for 8 bit functions), and
apparently none of the samples as part of "fate-hevc" seem to
trigger this specific function.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-03-14 13:42:39 +01:00
Logan Lyu
00290a64f7 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_hv
put_hevc_epel_bi_hv4_8_c: 242.9
put_hevc_epel_bi_hv4_8_i8mm: 68.6
put_hevc_epel_bi_hv6_8_c: 402.4
put_hevc_epel_bi_hv6_8_i8mm: 135.9
put_hevc_epel_bi_hv8_8_c: 636.4
put_hevc_epel_bi_hv8_8_i8mm: 145.6
put_hevc_epel_bi_hv12_8_c: 1363.1
put_hevc_epel_bi_hv12_8_i8mm: 324.1
put_hevc_epel_bi_hv16_8_c: 2222.1
put_hevc_epel_bi_hv16_8_i8mm: 509.1
put_hevc_epel_bi_hv24_8_c: 4793.4
put_hevc_epel_bi_hv24_8_i8mm: 1091.9
put_hevc_epel_bi_hv32_8_c: 8393.9
put_hevc_epel_bi_hv32_8_i8mm: 1720.6
put_hevc_epel_bi_hv48_8_c: 19526.6
put_hevc_epel_bi_hv48_8_i8mm: 4285.9
put_hevc_epel_bi_hv64_8_c: 33915.4
put_hevc_epel_bi_hv64_8_i8mm: 6783.6

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
0448f27f41 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_v
put_hevc_epel_bi_v4_8_c: 138.4
put_hevc_epel_bi_v4_8_neon: 33.7
put_hevc_epel_bi_v6_8_c: 302.9
put_hevc_epel_bi_v6_8_neon: 46.7
put_hevc_epel_bi_v8_8_c: 408.7
put_hevc_epel_bi_v8_8_neon: 48.7
put_hevc_epel_bi_v12_8_c: 779.4
put_hevc_epel_bi_v12_8_neon: 139.7
put_hevc_epel_bi_v16_8_c: 1344.9
put_hevc_epel_bi_v16_8_neon: 160.2
put_hevc_epel_bi_v24_8_c: 2981.7
put_hevc_epel_bi_v24_8_neon: 344.9
put_hevc_epel_bi_v32_8_c: 5280.9
put_hevc_epel_bi_v32_8_neon: 618.4
put_hevc_epel_bi_v48_8_c: 12494.9
put_hevc_epel_bi_v48_8_neon: 1364.4
put_hevc_epel_bi_v64_8_c: 22127.7
put_hevc_epel_bi_v64_8_neon: 2473.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
216275bd80 lavc/aarch64: new optimization for 8-bit hevc_epel_bi_h
put_hevc_epel_bi_h4_8_c: 96.0
put_hevc_epel_bi_h4_8_neon: 36.3
put_hevc_epel_bi_h6_8_c: 288.3
put_hevc_epel_bi_h6_8_neon: 59.3
put_hevc_epel_bi_h8_8_c: 358.5
put_hevc_epel_bi_h8_8_neon: 61.5
put_hevc_epel_bi_h12_8_c: 759.8
put_hevc_epel_bi_h12_8_neon: 159.5
put_hevc_epel_bi_h16_8_c: 1307.0
put_hevc_epel_bi_h16_8_neon: 182.0
put_hevc_epel_bi_h24_8_c: 2778.3
put_hevc_epel_bi_h24_8_neon: 430.5
put_hevc_epel_bi_h32_8_c: 4952.3
put_hevc_epel_bi_h32_8_neon: 679.5
put_hevc_epel_bi_h48_8_c: 11803.3
put_hevc_epel_bi_h48_8_neon: 1443.5
put_hevc_epel_bi_h64_8_c: 20654.8
put_hevc_epel_bi_h64_8_neon: 2737.0
put_hevc_qpel_bi_h4_8_c: 140.0
put_hevc_qpel_bi_h4_8_neon: 111.5
put_hevc_qpel_bi_h6_8_c: 318.0
put_hevc_qpel_bi_h6_8_neon: 85.8
put_hevc_qpel_bi_h8_8_c: 536.5
put_hevc_qpel_bi_h8_8_neon: 95.3
put_hevc_qpel_bi_h12_8_c: 1188.5
put_hevc_qpel_bi_h12_8_neon: 291.3
put_hevc_qpel_bi_h16_8_c: 2064.3
put_hevc_qpel_bi_h16_8_neon: 365.3
put_hevc_qpel_bi_h24_8_c: 4757.5
put_hevc_qpel_bi_h24_8_neon: 1010.0
put_hevc_qpel_bi_h32_8_c: 8351.8
put_hevc_qpel_bi_h32_8_neon: 2917.8
put_hevc_qpel_bi_h48_8_c: 19299.8
put_hevc_qpel_bi_h48_8_neon: 2976.8
put_hevc_qpel_bi_h64_8_c: 34182.5
put_hevc_qpel_bi_h64_8_neon: 5236.3

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
Logan Lyu
40cf4a5ca3 lavc/aarch64: new optimization for 8-bit hevc_pel_bi_pixels
put_hevc_pel_bi_pixels4_8_c: 54.7
put_hevc_pel_bi_pixels4_8_neon: 43.0
put_hevc_pel_bi_pixels6_8_c: 94.7
put_hevc_pel_bi_pixels6_8_neon: 37.0
put_hevc_pel_bi_pixels8_8_c: 171.0
put_hevc_pel_bi_pixels8_8_neon: 24.0
put_hevc_pel_bi_pixels12_8_c: 354.0
put_hevc_pel_bi_pixels12_8_neon: 68.7
put_hevc_pel_bi_pixels16_8_c: 588.2
put_hevc_pel_bi_pixels16_8_neon: 77.5
put_hevc_pel_bi_pixels24_8_c: 1670.7
put_hevc_pel_bi_pixels24_8_neon: 173.0
put_hevc_pel_bi_pixels32_8_c: 2267.7
put_hevc_pel_bi_pixels32_8_neon: 281.2
put_hevc_pel_bi_pixels48_8_c: 5787.5
put_hevc_pel_bi_pixels48_8_neon: 673.5
put_hevc_pel_bi_pixels64_8_c: 9897.0
put_hevc_pel_bi_pixels64_8_neon: 1159.5

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-01 21:25:39 +02:00
xufuji456
cc86343b96 lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d
Building iOS platform with arm64, the compiler has a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b"

Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-11-28 15:54:49 +02:00
Logan Lyu
265450b89e lavc/aarch64: new optimization for 8-bit hevc_epel_hv
checkasm bench:
put_hevc_epel_hv4_8_c: 213.7
put_hevc_epel_hv4_8_i8mm: 59.4
put_hevc_epel_hv6_8_c: 350.9
put_hevc_epel_hv6_8_i8mm: 130.2
put_hevc_epel_hv8_8_c: 548.7
put_hevc_epel_hv8_8_i8mm: 136.9
put_hevc_epel_hv12_8_c: 1126.7
put_hevc_epel_hv12_8_i8mm: 302.2
put_hevc_epel_hv16_8_c: 1925.2
put_hevc_epel_hv16_8_i8mm: 459.9
put_hevc_epel_hv24_8_c: 4301.9
put_hevc_epel_hv24_8_i8mm: 1024.9
put_hevc_epel_hv32_8_c: 7509.2
put_hevc_epel_hv32_8_i8mm: 1680.4
put_hevc_epel_hv48_8_c: 16566.9
put_hevc_epel_hv48_8_i8mm: 3945.4
put_hevc_epel_hv64_8_c: 29134.2
put_hevc_epel_hv64_8_i8mm: 6567.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Logan Lyu
22c7291506 lavc/aarch64: new optimization for 8-bit hevc_epel_v
checkasm bench:
put_hevc_epel_v4_8_c: 79.9
put_hevc_epel_v4_8_neon: 25.7
put_hevc_epel_v6_8_c: 151.4
put_hevc_epel_v6_8_neon: 46.4
put_hevc_epel_v8_8_c: 250.9
put_hevc_epel_v8_8_neon: 41.7
put_hevc_epel_v12_8_c: 542.7
put_hevc_epel_v12_8_neon: 108.7
put_hevc_epel_v16_8_c: 939.4
put_hevc_epel_v16_8_neon: 169.2
put_hevc_epel_v24_8_c: 2104.9
put_hevc_epel_v24_8_neon: 307.9
put_hevc_epel_v32_8_c: 3713.9
put_hevc_epel_v32_8_neon: 524.2
put_hevc_epel_v48_8_c: 8175.2
put_hevc_epel_v48_8_neon: 1197.2
put_hevc_epel_v64_8_c: 16049.4
put_hevc_epel_v64_8_neon: 2094.9

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Logan Lyu
772865717b lavc/aarch64: new optimization for 8-bit hevc_epel_pixels and and hevc_qpel_pixels
checkasm bench:
put_hevc_pel_pixels4_8_c: 33.7
put_hevc_pel_pixels4_8_neon: 20.2
put_hevc_pel_pixels6_8_c: 61.4
put_hevc_pel_pixels6_8_neon: 25.4
put_hevc_pel_pixels8_8_c: 121.4
put_hevc_pel_pixels8_8_neon: 16.9
put_hevc_pel_pixels12_8_c: 199.9
put_hevc_pel_pixels12_8_neon: 40.2
put_hevc_pel_pixels16_8_c: 355.9
put_hevc_pel_pixels16_8_neon: 43.4
put_hevc_pel_pixels24_8_c: 774.7
put_hevc_pel_pixels24_8_neon: 78.9
put_hevc_pel_pixels32_8_c: 1345.2
put_hevc_pel_pixels32_8_neon: 152.2
put_hevc_pel_pixels48_8_c: 2963.7
put_hevc_pel_pixels48_8_neon: 309.4
put_hevc_pel_pixels64_8_c: 5236.2
put_hevc_pel_pixels64_8_neon: 514.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Martin Storsjö
a4877f1ec1 aarch64: Only enable extensions in the intended files/regions
This eases actual development of the assembly functions, by only
allowing extension instructions within the sections that explicitly
enable them, instead of having all extensions enabled everywhere.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-24 14:46:20 +03:00
Logan Lyu
b7a3150bc5 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_hv
checkasm bench:
put_hevc_epel_uni_hv4_8_c: 204.7
put_hevc_epel_uni_hv4_8_i8mm: 70.2
put_hevc_epel_uni_hv6_8_c: 378.2
put_hevc_epel_uni_hv6_8_i8mm: 131.9
put_hevc_epel_uni_hv8_8_c: 637.7
put_hevc_epel_uni_hv8_8_i8mm: 137.9
put_hevc_epel_uni_hv12_8_c: 1301.9
put_hevc_epel_uni_hv12_8_i8mm: 314.2
put_hevc_epel_uni_hv16_8_c: 2203.4
put_hevc_epel_uni_hv16_8_i8mm: 454.7
put_hevc_epel_uni_hv24_8_c: 4848.2
put_hevc_epel_uni_hv24_8_i8mm: 1065.2
put_hevc_epel_uni_hv32_8_c: 8517.4
put_hevc_epel_uni_hv32_8_i8mm: 1898.4
put_hevc_epel_uni_hv48_8_c: 19591.7
put_hevc_epel_uni_hv48_8_i8mm: 4107.2
put_hevc_epel_uni_hv64_8_c: 33880.2
put_hevc_epel_uni_hv64_8_i8mm: 6568.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:44 +03:00
Logan Lyu
c0374f77f4 lavc/aarch64: move macros calc_epelh, calc_epelh2, load_epel_filterh
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:40 +03:00
Logan Lyu
7ce5a2f640 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_v
checkasm bench:
put_hevc_epel_uni_hv64_8_i8mm: 6568.7
put_hevc_epel_uni_v4_8_c: 88.7
put_hevc_epel_uni_v4_8_neon: 32.7
put_hevc_epel_uni_v6_8_c: 185.4
put_hevc_epel_uni_v6_8_neon: 44.9
put_hevc_epel_uni_v8_8_c: 333.9
put_hevc_epel_uni_v8_8_neon: 44.4
put_hevc_epel_uni_v12_8_c: 728.7
put_hevc_epel_uni_v12_8_neon: 119.7
put_hevc_epel_uni_v16_8_c: 1224.2
put_hevc_epel_uni_v16_8_neon: 139.7
put_hevc_epel_uni_v24_8_c: 2531.2
put_hevc_epel_uni_v24_8_neon: 329.9
put_hevc_epel_uni_v32_8_c: 4739.9
put_hevc_epel_uni_v32_8_neon: 562.7
put_hevc_epel_uni_v48_8_c: 10618.7
put_hevc_epel_uni_v48_8_neon: 1256.2
put_hevc_epel_uni_v64_8_c: 19169.9
put_hevc_epel_uni_v64_8_neon: 2179.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-09-26 15:50:40 +03:00
Logan Lyu
9557bf26b3 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv
put_hevc_epel_uni_w_hv4_8_c: 254.6
put_hevc_epel_uni_w_hv4_8_i8mm: 102.9
put_hevc_epel_uni_w_hv6_8_c: 411.6
put_hevc_epel_uni_w_hv6_8_i8mm: 221.6
put_hevc_epel_uni_w_hv8_8_c: 669.4
put_hevc_epel_uni_w_hv8_8_i8mm: 214.9
put_hevc_epel_uni_w_hv12_8_c: 1412.6
put_hevc_epel_uni_w_hv12_8_i8mm: 481.4
put_hevc_epel_uni_w_hv16_8_c: 2425.4
put_hevc_epel_uni_w_hv16_8_i8mm: 647.4
put_hevc_epel_uni_w_hv24_8_c: 5384.1
put_hevc_epel_uni_w_hv24_8_i8mm: 1450.6
put_hevc_epel_uni_w_hv32_8_c: 9470.9
put_hevc_epel_uni_w_hv32_8_i8mm: 2497.1
put_hevc_epel_uni_w_hv48_8_c: 20930.1
put_hevc_epel_uni_w_hv48_8_i8mm: 5635.9
put_hevc_epel_uni_w_hv64_8_c: 36682.9
put_hevc_epel_uni_w_hv64_8_i8mm: 9712.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
d48c89701c lavc/aarch64: new optimization for 8-bit hevc_epel_h
put_hevc_epel_h4_8_c: 67.1
put_hevc_epel_h4_8_i8mm: 21.1
put_hevc_epel_h6_8_c: 147.1
put_hevc_epel_h6_8_i8mm: 45.1
put_hevc_epel_h8_8_c: 237.4
put_hevc_epel_h8_8_i8mm: 72.1
put_hevc_epel_h12_8_c: 527.4
put_hevc_epel_h12_8_i8mm: 115.4
put_hevc_epel_h16_8_c: 943.6
put_hevc_epel_h16_8_i8mm: 153.9
put_hevc_epel_h24_8_c: 2105.4
put_hevc_epel_h24_8_i8mm: 384.4
put_hevc_epel_h32_8_c: 3631.4
put_hevc_epel_h32_8_i8mm: 519.9
put_hevc_epel_h48_8_c: 8082.1
put_hevc_epel_h48_8_i8mm: 1110.4
put_hevc_epel_h64_8_c: 14400.6
put_hevc_epel_h64_8_i8mm: 2057.1

put_hevc_qpel_h4_8_c: 124.9
put_hevc_qpel_h4_8_neon: 43.1
put_hevc_qpel_h4_8_i8mm: 33.1
put_hevc_qpel_h6_8_c: 269.4
put_hevc_qpel_h6_8_neon: 90.6
put_hevc_qpel_h6_8_i8mm: 61.4
put_hevc_qpel_h8_8_c: 477.6
put_hevc_qpel_h8_8_neon: 82.1
put_hevc_qpel_h8_8_i8mm: 99.9
put_hevc_qpel_h12_8_c: 1062.4
put_hevc_qpel_h12_8_neon: 226.9
put_hevc_qpel_h12_8_i8mm: 170.9
put_hevc_qpel_h16_8_c: 1880.6
put_hevc_qpel_h16_8_neon: 302.9
put_hevc_qpel_h16_8_i8mm: 251.4
put_hevc_qpel_h24_8_c: 4221.9
put_hevc_qpel_h24_8_neon: 893.9
put_hevc_qpel_h24_8_i8mm: 626.1
put_hevc_qpel_h32_8_c: 7437.6
put_hevc_qpel_h32_8_neon: 1189.9
put_hevc_qpel_h32_8_i8mm: 959.1
put_hevc_qpel_h48_8_c: 16838.4
put_hevc_qpel_h48_8_neon: 2727.9
put_hevc_qpel_h48_8_i8mm: 2163.9
put_hevc_qpel_h64_8_c: 29982.1
put_hevc_qpel_h64_8_neon: 4777.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
668eb4c00e lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v
put_hevc_epel_uni_w_v4_8_c: 116.1
put_hevc_epel_uni_w_v4_8_neon: 48.6
put_hevc_epel_uni_w_v6_8_c: 248.9
put_hevc_epel_uni_w_v6_8_neon: 80.6
put_hevc_epel_uni_w_v8_8_c: 383.9
put_hevc_epel_uni_w_v8_8_neon: 91.9
put_hevc_epel_uni_w_v12_8_c: 806.1
put_hevc_epel_uni_w_v12_8_neon: 202.9
put_hevc_epel_uni_w_v16_8_c: 1411.1
put_hevc_epel_uni_w_v16_8_neon: 289.9
put_hevc_epel_uni_w_v24_8_c: 3168.9
put_hevc_epel_uni_w_v24_8_neon: 619.4
put_hevc_epel_uni_w_v32_8_c: 5632.9
put_hevc_epel_uni_w_v32_8_neon: 1161.1
put_hevc_epel_uni_w_v48_8_c: 12406.1
put_hevc_epel_uni_w_v48_8_neon: 2476.4
put_hevc_epel_uni_w_v64_8_c: 22001.4
put_hevc_epel_uni_w_v64_8_neon: 4343.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00
Logan Lyu
0c604b1913 lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h
put_hevc_epel_uni_w_h4_8_c: 126.1
put_hevc_epel_uni_w_h4_8_i8mm: 41.6
put_hevc_epel_uni_w_h6_8_c: 222.9
put_hevc_epel_uni_w_h6_8_i8mm: 91.4
put_hevc_epel_uni_w_h8_8_c: 374.4
put_hevc_epel_uni_w_h8_8_i8mm: 102.1
put_hevc_epel_uni_w_h12_8_c: 806.1
put_hevc_epel_uni_w_h12_8_i8mm: 225.6
put_hevc_epel_uni_w_h16_8_c: 1414.4
put_hevc_epel_uni_w_h16_8_i8mm: 333.4
put_hevc_epel_uni_w_h24_8_c: 3128.6
put_hevc_epel_uni_w_h24_8_i8mm: 713.1
put_hevc_epel_uni_w_h32_8_c: 5519.1
put_hevc_epel_uni_w_h32_8_i8mm: 1118.1
put_hevc_epel_uni_w_h48_8_c: 12364.4
put_hevc_epel_uni_w_h48_8_i8mm: 2541.1
put_hevc_epel_uni_w_h64_8_c: 21925.9
put_hevc_epel_uni_w_h64_8_i8mm: 4383.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-14 21:19:12 +03:00