xufuji456
4b4de07721
libavcodec/hevc: add hevc idct4x4 neon of aarch64
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-02-28 13:12:52 +02:00
J. Dekker
9bed814e1d
lavc/aarch64: add hevc horizontal qpel/uni/bi
...
checkasm --benchmark on Ampere Altra (Neoverse N1):
put_hevc_qpel_bi_h4_8_c: 170.7
put_hevc_qpel_bi_h4_8_neon: 64.5
put_hevc_qpel_bi_h6_8_c: 373.7
put_hevc_qpel_bi_h6_8_neon: 130.2
put_hevc_qpel_bi_h8_8_c: 662.0
put_hevc_qpel_bi_h8_8_neon: 138.5
put_hevc_qpel_bi_h12_8_c: 1529.5
put_hevc_qpel_bi_h12_8_neon: 422.0
put_hevc_qpel_bi_h16_8_c: 2735.5
put_hevc_qpel_bi_h16_8_neon: 560.5
put_hevc_qpel_bi_h24_8_c: 6015.7
put_hevc_qpel_bi_h24_8_neon: 1636.0
put_hevc_qpel_bi_h32_8_c: 10779.0
put_hevc_qpel_bi_h32_8_neon: 2204.5
put_hevc_qpel_bi_h48_8_c: 24375.0
put_hevc_qpel_bi_h48_8_neon: 4984.0
put_hevc_qpel_bi_h64_8_c: 42768.0
put_hevc_qpel_bi_h64_8_neon: 8795.7
put_hevc_qpel_h4_8_c: 149.0
put_hevc_qpel_h4_8_neon: 55.7
put_hevc_qpel_h6_8_c: 321.2
put_hevc_qpel_h6_8_neon: 106.0
put_hevc_qpel_h8_8_c: 578.7
put_hevc_qpel_h8_8_neon: 133.2
put_hevc_qpel_h12_8_c: 1279.0
put_hevc_qpel_h12_8_neon: 391.7
put_hevc_qpel_h16_8_c: 2286.2
put_hevc_qpel_h16_8_neon: 519.7
put_hevc_qpel_h24_8_c: 5100.7
put_hevc_qpel_h24_8_neon: 1546.2
put_hevc_qpel_h32_8_c: 9022.0
put_hevc_qpel_h32_8_neon: 2060.2
put_hevc_qpel_h48_8_c: 20293.5
put_hevc_qpel_h48_8_neon: 4656.7
put_hevc_qpel_h64_8_c: 36037.0
put_hevc_qpel_h64_8_neon: 8262.7
put_hevc_qpel_uni_h4_8_c: 162.2
put_hevc_qpel_uni_h4_8_neon: 61.7
put_hevc_qpel_uni_h6_8_c: 355.2
put_hevc_qpel_uni_h6_8_neon: 114.2
put_hevc_qpel_uni_h8_8_c: 651.0
put_hevc_qpel_uni_h8_8_neon: 135.7
put_hevc_qpel_uni_h12_8_c: 1412.5
put_hevc_qpel_uni_h12_8_neon: 402.7
put_hevc_qpel_uni_h16_8_c: 2551.0
put_hevc_qpel_uni_h16_8_neon: 533.5
put_hevc_qpel_uni_h24_8_c: 5782.2
put_hevc_qpel_uni_h24_8_neon: 1578.7
put_hevc_qpel_uni_h32_8_c: 10586.5
put_hevc_qpel_uni_h32_8_neon: 2102.2
put_hevc_qpel_uni_h48_8_c: 23812.0
put_hevc_qpel_uni_h48_8_neon: 4739.5
put_hevc_qpel_uni_h64_8_c: 42958.7
put_hevc_qpel_uni_h64_8_neon: 8366.5
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-10-25 14:56:38 +02:00
J. Dekker
ce2f47318b
lavc/aarch64: hevc_add_res add 12bit variants
...
hevc_add_res_4x4_12_c: 46.0
hevc_add_res_4x4_12_neon: 18.7
hevc_add_res_8x8_12_c: 194.7
hevc_add_res_8x8_12_neon: 25.2
hevc_add_res_16x16_12_c: 716.0
hevc_add_res_16x16_12_neon: 69.7
hevc_add_res_32x32_12_c: 3820.7
hevc_add_res_32x32_12_neon: 261.0
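For reference, a minimal scalar sketch of what an add-residual step does at 12-bit depth: each residual is added to the reconstructed sample and the result is clamped to [0, 4095]. The names clip_pixel and add_residual_sketch are hypothetical, and the stride is assumed to be counted in samples rather than bytes; this is not the NEON code from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp v to the valid sample range for the given bit depth. */
static inline uint16_t clip_pixel(int v, int bit_depth)
{
    const int max = (1 << bit_depth) - 1;
    return (uint16_t)(v < 0 ? 0 : v > max ? max : v);
}

/* Hypothetical scalar sketch: add a size x size block of residuals to dst.
 * Assumption: stride is expressed in samples, not bytes. */
static void add_residual_sketch(uint16_t *dst, const int16_t *res,
                                ptrdiff_t stride, int size, int bit_depth)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_pixel(dst[x] + res[x], bit_depth);
        dst += stride;   /* next row of the picture */
        res += size;     /* residuals are stored densely */
    }
}
```

Calling add_residual_sketch(dst, res, stride, 4, 12) corresponds to the 4x4 12-bit case benchmarked above.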
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-08-18 15:04:43 +02:00
Andreas Rheinhardt
b3bbbb14d0
avcodec/hevcdsp: Constify src pointers
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 02:54:04 +02:00
J. Dekker
2e832be322
lavc/aarch64: add hevc sao edge 8x8
...
bench on AWS Graviton:
hevc_sao_edge_8x8_8_c: 516.0
hevc_sao_edge_8x8_8_neon: 81.0
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-05-25 08:04:46 +02:00
J. Dekker
92f67e4017
lavc/aarch64: add hevc sao edge 16x16
...
bench on AWS Graviton:
hevc_sao_edge_16x16_8_c: 1857.0
hevc_sao_edge_16x16_8_neon: 211.0
hevc_sao_edge_32x32_8_c: 7802.2
hevc_sao_edge_32x32_8_neon: 808.2
hevc_sao_edge_48x48_8_c: 16764.2
hevc_sao_edge_48x48_8_neon: 1796.5
hevc_sao_edge_64x64_8_c: 32647.5
hevc_sao_edge_64x64_8_neon: 3118.5
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-05-25 08:04:39 +02:00
J. Dekker
d957ee34a6
lavc/aarch64: fix hevc sao band filter
...
The SAO band filter can be called with widths that are not multiples of 8;
round the width up to the nearest multiple of 8 to account for this.
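The rounding described above can be sketched in C as follows. The helper name round_up_to_8 is hypothetical and does not come from the FFmpeg source; the actual fix lives in the assembly.

```c
/* Hypothetical helper: round a width up to the next multiple of 8 so that
 * a loop handling 8 pixels per iteration covers every column. */
static inline int round_up_to_8(int width)
{
    return (width + 7) & ~7;
}
```

A width of 12 becomes 16, so the loop runs twice. Processing the rounded-up width touches up to 7 columns past the requested width, which is only safe if the buffers have room for them.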
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-05-25 08:04:35 +02:00
Martin Storsjö
24b93022fe
aarch64: Disable ff_hevc_sao_band_filter_8x8_8_neon out of precaution
...
While this function on its own passes all of fate-hevc, there are
indications that the function might need to handle widths that
aren't a multiple of 8 (noted in commit f63f9be37c, which was later
reverted).
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-01-07 22:33:27 +02:00
Martin Storsjö
16fba44b4d
Revert "lavc/aarch64: add hevc sao edge 16x16"
...
This reverts commit a9214a2ca3, as it breaks fate-hevc.
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-01-07 22:33:23 +02:00
Martin Storsjö
df48b1d06f
Revert "lavc/aarch64: add hevc sao edge 8x8"
...
This reverts commit c97ffc1a77, as it breaks fate-hevc.
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-01-07 22:33:19 +02:00
Martin Storsjö
cafed377eb
Revert "lavc/aarch64: add hevc sao band 8x8 tiling"
...
This reverts commit f63f9be37c, as it breaks fate-hevc.
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-01-07 22:33:14 +02:00
J. Dekker
f63f9be37c
lavc/aarch64: add hevc sao band 8x8 tiling
...
bench on AWS Graviton:
hevc_sao_band_8x8_8_c: 317.5
hevc_sao_band_8x8_8_neon: 97.5
hevc_sao_band_16x16_8_c: 1115.0
hevc_sao_band_16x16_8_neon: 322.7
hevc_sao_band_32x32_8_c: 4599.2
hevc_sao_band_32x32_8_neon: 1246.2
hevc_sao_band_48x48_8_c: 10021.7
hevc_sao_band_48x48_8_neon: 2740.5
hevc_sao_band_64x64_8_c: 17635.0
hevc_sao_band_64x64_8_neon: 4875.7
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-01-04 14:32:26 +01:00
J. Dekker
c97ffc1a77
lavc/aarch64: add hevc sao edge 8x8
...
bench on AWS Graviton:
hevc_sao_edge_8x8_8_c: 516.0
hevc_sao_edge_8x8_8_neon: 81.0
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-01-04 14:31:56 +01:00
J. Dekker
a9214a2ca3
lavc/aarch64: add hevc sao edge 16x16
...
bench on AWS Graviton:
hevc_sao_edge_16x16_8_c: 1857.0
hevc_sao_edge_16x16_8_neon: 211.0
hevc_sao_edge_32x32_8_c: 7802.2
hevc_sao_edge_32x32_8_neon: 808.2
hevc_sao_edge_48x48_8_c: 16764.2
hevc_sao_edge_48x48_8_neon: 1796.5
hevc_sao_edge_64x64_8_c: 32647.5
hevc_sao_edge_64x64_8_neon: 3118.5
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-01-04 14:31:56 +01:00
Josh Dekker
7ac41e0db2
lavc/aarch64: add HEVC sao_band NEON
...
Only works for 8x8.
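As an illustration of what the band filter computes, here is a scalar sketch for 8-bit samples. The 256 sample values are split into 32 bands of 8 values each, four consecutive bands starting at left_class receive an offset, and the result is clamped to [0, 255]. The function name, the offsets[4] layout and the strides counted in samples are assumptions made for this sketch, not the signature used in libavcodec.

```c
#include <stddef.h>
#include <stdint.h>

static inline uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar sketch of an 8-bit SAO band filter. */
static void sao_band_sketch(uint8_t *dst, const uint8_t *src,
                            ptrdiff_t dst_stride, ptrdiff_t src_stride,
                            const int16_t offsets[4], int left_class,
                            int width, int height)
{
    int table[32] = { 0 };

    /* Four consecutive bands get an offset; band indices wrap at 32. */
    for (int k = 0; k < 4; k++)
        table[(k + left_class) & 31] = offsets[k];

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = clip_u8(src[x] + table[src[x] >> 3]); /* 8 - 5 = 3 */
        dst += dst_stride;
        src += src_stride;
    }
}
```

The 8x8 case mentioned above corresponds to width = height = 8.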
Signed-off-by: Josh Dekker <josh@itanimul.li>
2021-02-18 14:12:01 +01:00
Josh Dekker
75c2ddfa61
lavc/aarch64: add HEVC idct_dc NEON
...
Signed-off-by: Josh Dekker <josh@itanimul.li>
2021-02-18 14:12:01 +01:00
Reimar Döffinger
00c916ef61
lavc/aarch64: port HEVC add_residual NEON
...
The speedup is small, around 1.5%, but these functions are fairly simple.
Signed-off-by: Josh Dekker <josh@itanimul.li>
2021-02-18 14:11:57 +01:00
Reimar Döffinger
30f80d855b
lavc/aarch64: port HEVC SIMD idct NEON
...
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19.4s to 16.4s,
approximately 15% speedup.
Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts",
running on Apple M1.
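The quoted figure can be checked with a short calculation. Going from 19.4s to 16.4s saves about 15.5% of the decode time, which is a speedup factor of about 1.18x. The helper names below are hypothetical and only illustrate the arithmetic.

```c
/* Percentage of the original run time that was saved. */
static double time_saved_percent(double before_s, double after_s)
{
    return 100.0 * (before_s - after_s) / before_s;
}

/* How many times faster the new run is than the old one. */
static double speedup_factor(double before_s, double after_s)
{
    return before_s / after_s;
}
```

With before_s = 19.4 and after_s = 16.4 the two helpers give the two ways of reading "15% speedup": time saved relative to the old run, or throughput gained relative to it.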
Signed-off-by: Josh Dekker <josh@itanimul.li>
2021-02-18 14:11:53 +01:00