mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-02-14 22:22:59 +02:00
Martin Storsjö 5eb5aec475 arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
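The idea behind skipping known-zero coefficients can be sketched in portable C. This is a minimal sketch under stated assumptions, not the NEON code from this commit. The `basis` matrix is a hypothetical stand-in for the real idct16 butterflies, and `init_basis`, `pass_1d`, `inv_2d` and `nz` are names made up for illustration. If every nonzero coefficient sits in the top-left `nz` x `nz` corner, the first pass only needs to load and transform `nz` columns. The second pass only reads the first `nz` entries of each temp row, so the rest of the temp buffer never has to be zeroed.

```c
#include <stdint.h>

#define N 16 /* transform size; 16 mirrors idct16 */

/* Hypothetical integer basis, NOT the VP9 idct16 butterfly; any fixed
 * N x N matrix is enough to show which coefficients can be skipped. */
static int32_t basis[N][N];

void init_basis(void)
{
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            basis[i][k] = ((i + 1) * (2 * k + 1)) % 29 - 14;
}

/* 1-D pass: out[i] = sum over k < nz of basis[i][k] * in[k * stride].
 * Inputs at k >= nz are known to be zero, so they are never loaded. */
static void pass_1d(const int32_t *in, int stride, int32_t *out, int nz)
{
    for (int i = 0; i < N; i++) {
        int64_t acc = 0;
        for (int k = 0; k < nz; k++)
            acc += (int64_t)basis[i][k] * in[k * stride];
        out[i] = (int32_t)acc;
    }
}

/* Two-pass 2-D inverse transform of a row-major N x N block whose nonzero
 * coefficients all lie in the top-left nz x nz corner (nz == N is the full
 * transform). tmp is caller-provided scratch and is deliberately not
 * cleared: columns nz..N-1 of tmp are never written and never read. */
void inv_2d(const int32_t *coef, int32_t *out, int32_t *tmp, int nz)
{
    int32_t col[N];

    /* First pass: only the nz columns that can hold nonzero input. */
    for (int c = 0; c < nz; c++) {
        pass_1d(coef + c, N, col, nz); /* reads coef[k * N + c], k < nz */
        for (int r = 0; r < N; r++)
            tmp[r * N + c] = col[r];
    }

    /* Second pass: every output row, but only nz entries of each temp row. */
    for (int r = 0; r < N; r++)
        pass_1d(tmp + r * N, 1, out + r * N, nz);
}
```

For a block whose nonzero coefficients fit in the 4x4 corner, `inv_2d(coef, out, tmp, 4)` gives the same result as `inv_2d(coef, out, tmp, 16)`, even if `tmp` starts out full of garbage. In this sketch the first pass drops from 16x16x16 to 4x16x4 multiply-accumulates and the second from 16x16x16 to 16x16x4.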

This gives a pretty substantial speedup for the smaller subpartitions (sub2 to sub8 for 16x16, sub2 to sub16 for 32x32), while the larger ones stay roughly the same.
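Whether a block can use a simpler variant is a per-block decision based on its end-of-block position (`eob`, roughly the number of coefficients up to the last nonzero one in scan order). The sketch below assumes a plain JPEG-style zigzag scan as a stand-in for VP9's real scan tables. A variant is safe whenever the first `eob` scan positions all fall inside its top-left corner: 4x4 (quarter) or 8x8 (half) for idct16. `zigzag_scan`, `corner_eob_limit` and `pick_variant` are hypothetical helpers. The real dispatch presumably uses fixed limits tied to VP9's own scan orders, which can differ from the zigzag-derived values here.

```c
/* Which variant of the 2-D inverse transform to run. */
typedef enum { IDCT_QUARTER, IDCT_HALF, IDCT_FULL } idct_variant;

/* Fill scan[] (n * n entries, row-major positions r * n + c) with a
 * JPEG-style zigzag order. A stand-in, not VP9's scan tables. */
void zigzag_scan(int n, int *scan)
{
    int i = 0;
    for (int d = 0; d <= 2 * (n - 1); d++) {
        int r_lo = d < n ? 0 : d - (n - 1);
        int r_hi = d < n ? d : n - 1;
        if (d % 2 == 0) {
            for (int r = r_hi; r >= r_lo; r--) /* bottom-left to top-right */
                scan[i++] = r * n + (d - r);
        } else {
            for (int r = r_lo; r <= r_hi; r++) /* top-right to bottom-left */
                scan[i++] = r * n + (d - r);
        }
    }
}

/* Largest eob for which all of scan[0 .. eob-1] lie in the top-left k x k
 * corner, i.e. the index of the first scan position outside that corner. */
int corner_eob_limit(const int *scan, int n, int k)
{
    for (int i = 0; i < n * n; i++) {
        int r = scan[i] / n, c = scan[i] % n;
        if (r >= k || c >= k)
            return i;
    }
    return n * n;
}

/* Pick the cheapest variant that still sees every nonzero coefficient:
 * quarter = first n/4 inputs per 1-D pass, half = first n/2. */
idct_variant pick_variant(const int *scan, int n, int eob)
{
    if (eob <= corner_eob_limit(scan, n, n / 4))
        return IDCT_QUARTER;
    if (eob <= corner_eob_limit(scan, n, n / 2))
        return IDCT_HALF;
    return IDCT_FULL;
}
```

With this zigzag order, the limits for a 16x16 block come out as 10 (quarter) and 36 (half). Real code would precompute such limits as constants rather than recomputing them per call.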

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:                               Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0
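To put the tables into relative terms, a small C helper can compute the saving per core as 100 * (1 - after / before). The rows are copied from the Before/After tables above. The struct, array and function names are made up for illustration.

```c
#include <stdio.h>

/* Benchmark numbers copied from the tables above, in column order
 * Cortex A7, A8, A9, A53. */
typedef struct {
    const char *name;
    double before[4];
    double after[4];
} bench_row;

const bench_row rows[] = {
    { "16x16_sub2",  {  2102.1,  1521.7,  1736.2,  1265.8 },
                     {  1203.5,   998.2,  1035.3,   763.0 } },
    { "16x16_sub8",  {  2484.8,  1828.7,  2014.4,  1506.5 },
                     {  1926.1,  1610.6,  1722.1,  1271.7 } },
    { "32x32_sub2",  { 10776.7,  7949.8,  8567.7,  6819.7 },
                     {  7554.6,  5652.4,  6048.4,  4920.2 } },
    { "32x32_sub16", { 14176.4, 11509.5, 11018.7,  9062.3 },
                     { 11859.2, 11223.3,  9966.3,  8095.1 } },
};

/* Percentage saved relative to the "Before" number. */
double reduction_pct(double before, double after)
{
    return 100.0 * (1.0 - after / before);
}

/* Print one line per (row, core) pair. */
void print_reductions(void)
{
    static const char *cores[4] = { "A7", "A8", "A9", "A53" };
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
        for (int j = 0; j < 4; j++)
            printf("%-12s %-4s %5.1f%%\n", rows[i].name, cores[j],
                   reduction_pct(rows[i].before[j], rows[i].after[j]));
}
```

For example, 16x16 sub2 on the Cortex A7 comes out at about 42.7%, and 32x32 sub2 on the A53 at about 27.9%.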

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:00 +02:00