mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-02-14 22:22:59 +02:00
FFmpeg/libavcodec
Martin Storsjö 9d2afd1eb8 aarch64: vp9: Implement NEON loop filters
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
number of registers available, we can do the loop filters with
16 pixels at a time. The implementation is fully templated, with
a single macro which can generate versions for both 8 and
16 pixels wide, for 4, 8 and 16 pixel loop filters
(and the 4/8 mixed versions as well).
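
As a rough illustration of the variants the template covers, the
function names listed in the benchmark tables in this message can be
enumerated as below. This is only a sketch in Python, not the assembly
macro itself; the helper name loop_filter_names is hypothetical, and
the reading of the name parts (direction, filter size, width) is an
assumption based on the description above.

```python
def loop_filter_names(prefix="vp9_loop_filter", suffix="neon"):
    """Enumerate the variant names that appear in the benchmark tables.

    Hypothetical helper for illustration only; the real variants are
    generated by an assembly macro, not by this function.
    """
    names = []
    for direction in ("h", "v"):
        # 8 pixel wide versions of the 4, 8 and 16 pixel filters.
        for size in (4, 8, 16):
            names.append(f"{prefix}_{direction}_{size}_8_{suffix}")
        # 16 pixel wide version of the 16 pixel filter.
        names.append(f"{prefix}_{direction}_16_16_{suffix}")
        # 16 pixel wide mixed versions: one filter size per 8 pixel half.
        for first in (4, 8):
            for second in (4, 8):
                names.append(
                    f"{prefix}_mix2_{direction}_{first}{second}_16_{suffix}"
                )
    return names


if __name__ == "__main__":
    for name in loop_filter_names():
        print(name)
```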

For the 8 pixel wide versions, the speed is pretty close to the
32 bit version (the v_4_8 and v_8_8 filters are the best examples
of this; the h_4_8 and h_8_8 filters seem to get some gain in the
load/transpose/store part). For the 16 pixels wide ones, we get a
speedup of around 1.2-1.4x compared to the 32 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM AArch64
vp9_loop_filter_h_4_8_neon:          144.0   127.2
vp9_loop_filter_h_8_8_neon:          207.0   182.5
vp9_loop_filter_h_16_8_neon:         415.0   328.7
vp9_loop_filter_h_16_16_neon:        672.0   558.6
vp9_loop_filter_mix2_h_44_16_neon:   302.0   203.5
vp9_loop_filter_mix2_h_48_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_84_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_88_16_neon:   376.0   305.2
vp9_loop_filter_mix2_v_44_16_neon:   193.2   128.2
vp9_loop_filter_mix2_v_48_16_neon:   246.7   218.4
vp9_loop_filter_mix2_v_84_16_neon:   248.0   218.5
vp9_loop_filter_mix2_v_88_16_neon:   302.0   218.2
vp9_loop_filter_v_4_8_neon:           89.0    88.7
vp9_loop_filter_v_8_8_neon:          141.0   137.7
vp9_loop_filter_v_16_8_neon:         295.0   272.7
vp9_loop_filter_v_16_16_neon:        546.0   453.7
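
The per-function ratios behind the figures above can be recomputed
from the table with a short script. This is a minimal sketch; the
numbers are copied from the Cortex A53 table, while the names RUNTIMES
and speedup are made up for the example, and splitting the functions
by their trailing _8 or _16 assumes that part of the name is the width.

```python
# (ARM, AArch64) runtimes copied from the Cortex A53 table above.
RUNTIMES = {
    "h_4_8": (144.0, 127.2),
    "h_8_8": (207.0, 182.5),
    "h_16_8": (415.0, 328.7),
    "h_16_16": (672.0, 558.6),
    "mix2_h_44_16": (302.0, 203.5),
    "mix2_h_48_16": (365.0, 305.2),
    "mix2_h_84_16": (365.0, 305.2),
    "mix2_h_88_16": (376.0, 305.2),
    "mix2_v_44_16": (193.2, 128.2),
    "mix2_v_48_16": (246.7, 218.4),
    "mix2_v_84_16": (248.0, 218.5),
    "mix2_v_88_16": (302.0, 218.2),
    "v_4_8": (89.0, 88.7),
    "v_8_8": (141.0, 137.7),
    "v_16_8": (295.0, 272.7),
    "v_16_16": (546.0, 453.7),
}


def speedup(name):
    """Return the ARM runtime divided by the AArch64 runtime."""
    arm, aarch64 = RUNTIMES[name]
    return arm / aarch64


# Assumption: the trailing number in each name is the width in pixels.
narrow = {n: speedup(n) for n in RUNTIMES if n.endswith("_8")}
wide = {n: speedup(n) for n in RUNTIMES if n.endswith("_16")}

if __name__ == "__main__":
    for label, group in (("8 wide", narrow), ("16 wide", wide)):
        low, high = min(group.values()), max(group.values())
        print(f"{label}: {low:.2f}x to {high:.2f}x")
```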

The speedup vs C code in checkasm tests is around 2-7x, which is
pretty much the same as for the 32 bit version. Although these functions
are faster than their 32 bit equivalents, the C code that we compare
against is also around 1.3-1.7x faster than the 32 bit C code.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-5x.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                         A57 gcc-5.3  neon
loop_filter_h_4_8_neon:        256.6  93.4
loop_filter_h_8_8_neon:        307.3 139.1
loop_filter_h_16_8_neon:       340.1 254.1
loop_filter_h_16_16_neon:      827.0 407.9
loop_filter_mix2_h_44_16_neon: 524.5 155.4
loop_filter_mix2_h_48_16_neon: 644.5 173.3
loop_filter_mix2_h_84_16_neon: 630.5 222.0
loop_filter_mix2_h_88_16_neon: 697.3 222.0
loop_filter_mix2_v_44_16_neon: 598.5 100.6
loop_filter_mix2_v_48_16_neon: 651.5 127.0
loop_filter_mix2_v_84_16_neon: 591.5 167.1
loop_filter_mix2_v_88_16_neon: 855.1 166.7
loop_filter_v_4_8_neon:        271.7  65.3
loop_filter_v_8_8_neon:        312.5 106.9
loop_filter_v_16_8_neon:       473.3 206.5
loop_filter_v_16_16_neon:      976.1 327.8

The speed-up compared to the C functions is 2.5x to 6x, and the
Cortex A57 is again 30-50% faster than the Cortex A53.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 00:10:13 +02:00