Martin Storsjö
7f905f3672
aarch64: Make the indentation more consistent
...
Some functions have slightly different indentation styles; try
to match the surrounding code.
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:29 +03:00
Martin Storsjö
184103b310
aarch64: Consistently use lowercase for vector element specifiers
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:18 +03:00
Martin Storsjö
402784ba9f
aarch64: h264dsp: Fix incorrectly indented code
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-02-11 10:49:12 +02:00
Mikhail Nitenko
43ca887bc2
lavc/aarch64: h264, add chroma loop filters for 10bit
...
Benchmarks:                                             A53     A72
h264_h_loop_filter_chroma422_10bpp_c:                 282.7   114.2
h264_h_loop_filter_chroma422_10bpp_neon:              109.5    78.5
h264_h_loop_filter_chroma_10bpp_c:                    165.0    81.5
h264_h_loop_filter_chroma_10bpp_neon:                 120.0    76.7
h264_h_loop_filter_chroma_intra422_10bpp_c:           323.7   124.2
h264_h_loop_filter_chroma_intra422_10bpp_neon:        155.0   102.7
h264_h_loop_filter_chroma_intra_10bpp_c:              121.0    49.5
h264_h_loop_filter_chroma_intra_10bpp_neon:            79.7    53.7
h264_h_loop_filter_chroma_mbaff422_10bpp_c:           188.5    75.0
h264_h_loop_filter_chroma_mbaff422_10bpp_neon:        120.0    75.5
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_c:     116.7    46.0
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_neon:   79.7    53.7
h264_h_loop_filter_chroma_mbaff_intra_10bpp_c:         63.0    27.2
h264_h_loop_filter_chroma_mbaff_intra_10bpp_neon:      48.5    34.0
h264_v_loop_filter_chroma_10bpp_c:                    258.7   135.5
h264_v_loop_filter_chroma_10bpp_neon:                  71.2    51.0
h264_v_loop_filter_chroma_intra_10bpp_c:              158.0    70.7
h264_v_loop_filter_chroma_intra_10bpp_neon:            48.7    31.5
Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2021-08-21 00:06:26 +03:00
Martin Storsjö
c60b76d0c8
aarch64: h264dsp: Fix indentation of some functions to match the rest
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2021-08-08 23:18:43 +03:00
Martin Storsjö
e86ec831b0
aarch64: h264dsp: Remove unnecessary sign extensions
...
These became unnecessary when the stride arguments were changed from
int to ptrdiff_t in bc26fe8927 (0576ef466d) and d5d699ab6e (aa844dc46f).
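A minimal sketch of why the extension used to be needed: under the AArch64 procedure call standard, a 32-bit int argument occupies the low half of a 64-bit register and the upper 32 bits are unspecified, so assembly that uses the stride as a 64-bit address offset must first sign extend it (sxtw). A ptrdiff_t argument already fills the whole register. The C model below is illustrative only; reg_with_int_stride and sign_extend_w are hypothetical helper names, not FFmpeg functions.

```c
#include <stdint.h>

/* Hypothetical model of an argument register that carries an int stride:
 * only the low 32 bits are meaningful, the high 32 bits may hold anything. */
static uint64_t reg_with_int_stride(int32_t stride, uint32_t garbage_high)
{
    return ((uint64_t)garbage_high << 32) | (uint32_t)stride;
}

/* Models `sxtw xN, wN`: keep the low 32 bits and sign extend them to 64. */
static int64_t sign_extend_w(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;

    /* Done arithmetically so the result does not rely on
     * implementation-defined narrowing conversions. */
    return (low & 0x80000000u) ? (int64_t)low - 4294967296LL : (int64_t)low;
}
```

With a ptrdiff_t stride the callee can use the register as is, which is why the sxtw instructions could be dropped.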
Signed-off-by: Martin Storsjö <martin@martin.st>
2021-08-08 23:18:43 +03:00
James Almer
92219ef4ac
Merge commit '186bd30aa3b6c2b29b4dbf18278700b572068b1e'
...
* commit '186bd30aa3b6c2b29b4dbf18278700b572068b1e':
h264/arm64: implement missing 4:2:2 chroma loop filter neon functions
Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:29:41 -03:00
Janne Grunau
186bd30aa3
h264/arm64: implement missing 4:2:2 chroma loop filter neon functions
2019-02-27 21:57:05 +01:00
James Almer
e4e04dce1f
Merge commit '28a8b5413b64b831dfb8650208bccd8b78360484'
...
* commit '28a8b5413b64b831dfb8650208bccd8b78360484':
h264/aarch64: add intra loop filter neon asm
Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 15:42:01 -03:00
James Almer
4dc1f06f0c
Merge commit '846c3d6aca5484904e60946c4fe8b8833bc07f92'
...
* commit '846c3d6aca5484904e60946c4fe8b8833bc07f92':
h264/aarch64: optimize neon loop filter
Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 15:41:03 -03:00
James Almer
5ca7eb36b7
Merge commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22'
...
* commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22':
h264/aarch64: sign extend int stride in loop filter asm
Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 14:50:37 -03:00
Janne Grunau
28a8b5413b
h264/aarch64: add intra loop filter neon asm
...
Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported
(x264 uses nv12 chroma) and optimized.
Cycle count for checkasm --bench on a Snapdragon 820e:
h264_h_loop_filter_luma_intra_8bpp_c: 60.0
h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
h264_v_loop_filter_luma_intra_8bpp_c: 148.3
h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
2019-01-26 12:05:10 +01:00
Janne Grunau
846c3d6aca
h264/aarch64: optimize neon loop filter
...
Exit as soon as possible if no filtering will be done.
Improves the checkasm --bench cycle count on a Snapdragon 820e:
h264_h_loop_filter_luma_8bpp_c: 72.4 -> 72.5
h264_h_loop_filter_luma_8bpp_neon: 97.1 -> 56.3
h264_v_loop_filter_luma_8bpp_c: 174.0 -> 173.5
h264_v_loop_filter_luma_8bpp_neon: 62.9 -> 60.9
h264_h_loop_filter_chroma_8bpp_c: 30.2 -> 30.3
h264_h_loop_filter_chroma_8bpp_neon: 51.6 -> 25.7
h264_v_loop_filter_chroma_8bpp_c: 57.3 -> 57.3
h264_v_loop_filter_chroma_8bpp_neon: 28.0 -> 24.0
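The early exit can be pictured with the following sketch. The commit message does not spell out which checks the assembly performs, so this assumes the standard H.264 per-pixel edge condition (|p0 - q0| < alpha, |p1 - p0| < beta, |q1 - q0| < beta). any_pixel_filtered is a hypothetical name, and the code is a scalar C model rather than the NEON implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if at least one of the n positions along the edge passes the
 * (assumed) H.264 filter condition, 0 if the whole edge can be skipped.
 * pix points at q0 of the first position, xstride steps across the edge,
 * ystride steps along it. */
static int any_pixel_filtered(const uint8_t *pix, ptrdiff_t xstride,
                              ptrdiff_t ystride, int n, int alpha, int beta)
{
    for (int i = 0; i < n; i++) {
        const uint8_t *row = pix + i * ystride;
        int p1 = row[-2 * xstride];
        int p0 = row[-1 * xstride];
        int q0 = row[0];
        int q1 = row[xstride];

        if (abs(p0 - q0) < alpha &&
            abs(p1 - p0) < beta &&
            abs(q1 - q0) < beta)
            return 1;
    }
    return 0;
}
```

A filter routine built this way would return before doing any filter arithmetic or stores whenever the check yields 0.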
2019-01-26 12:05:10 +01:00
Janne Grunau
bb515e3a73
h264/aarch64: sign extend int stride in loop filter asm
2019-01-26 12:05:10 +01:00
Michael Niedermayer
92d07ea4b5
Merge commit 'f896bca03fc63b93851c1c14c9321c20b3cd44a6'
...
* commit 'f896bca03fc63b93851c1c14c9321c20b3cd44a6':
aarch64: h264 (bi)weight NEON optimizations
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-15 15:36:37 +01:00
Michael Niedermayer
bf0470a5be
Merge commit '36e3b1f2fd262028834a9d7b1eb533c1218ee6c2'
...
* commit '36e3b1f2fd262028834a9d7b1eb533c1218ee6c2':
aarch64: h264 loop filter NEON optimizations
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-15 15:27:26 +01:00
Janne Grunau
f896bca03f
aarch64: h264 (bi)weight NEON optimizations
...
Ported from ARMv7 NEON.
2014-01-15 12:31:07 +01:00
Janne Grunau
36e3b1f2fd
aarch64: h264 loop filter NEON optimizations
...
Ported from ARMv7 NEON.
2014-01-15 12:31:04 +01:00