1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Commit Graph

84672 Commits

Author SHA1 Message Date
Clément Bœsch
100026bed6 Merge commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7'
* commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7':
  configure: Simplify clock_gettime() test

nanosleep check also updated.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:04:50 +01:00
Clément Bœsch
38343651a8 Merge commit '3aa9d37d03da3c9b482d19b3988659287815280e'
* commit '3aa9d37d03da3c9b482d19b3988659287815280e':
  build: Fix directory dependencies of tests/pixfmts.mak target

This might not be necessary given our mkdirs in the configure, but it
probably doesn't hurt.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:01:02 +01:00
Clément Bœsch
4ae80c3753 Merge commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff'
* commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff':
  configure: Fix --disable-pod2man / --disable-texi2html

This commit is a noop, we have dedicated documentation option for this
purpose.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 10:47:01 +01:00
Clément Bœsch
d0db00c808 configure: remove pod2man from the config list
The configure has the --disable-manpages option for this purpose, and
--disable-pod2man is currently ignored due to that. This is also
consistent with the other documentation options.
2017-03-20 10:45:48 +01:00
Clément Bœsch
715f781834 Merge commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd'
* commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd':
  configure: Simplify libopenjpeg check

This commit is a noop, our libopenjpeg check is already "simpler".

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:48:22 +01:00
Clément Bœsch
6d6f79c737 Merge commit '2610c9528f86286e4c6e174411a26ff5b4815cde'
* commit '2610c9528f86286e4c6e174411a26ff5b4815cde':
  configure: Move initial VAAPI check to a more sensible place

This commit is a noop, see 17989dcf54

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:46:33 +01:00
Clément Bœsch
7317b69630 Merge commit '5b5ed92d92252a685e891a5d636870e223b63228'
* commit '5b5ed92d92252a685e891a5d636870e223b63228':
  sanm: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:43:52 +01:00
Clément Bœsch
64926292a6 lavc/copy_block: style fix 2017-03-20 09:23:15 +01:00
Clément Bœsch
21c18b0878 Merge commit '73f5e17a203713c4ac4e5a821809823b383b195f'
* commit '73f5e17a203713c4ac4e5a821809823b383b195f':
  copy_block: Change type of array stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:22:36 +01:00
Clément Bœsch
e59d8d030f Merge commit '21e500ba647aec233d5930d3d1081489d0d53ceb'
* commit '21e500ba647aec233d5930d3d1081489d0d53ceb':
  svq1dec: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:17:34 +01:00
Clément Bœsch
bb3ad401fc Merge commit '746c56b7730ce09397d3a8354acc131285e9d829'
* commit '746c56b7730ce09397d3a8354acc131285e9d829':
  indeo: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:07:57 +01:00
Clément Bœsch
3835283293 Merge commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08'
* commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08':
  Drop memalign hack

Merged, as this may indeed be uneeded since
46e3936fb0.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:54:44 +01:00
Clément Bœsch
a5cf6628d6 Merge commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0'
* commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0':
  hwcontext_dxva2: use the special UC copy for downloading frames

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:37:40 +01:00
Clément Bœsch
8200b16a9c Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'
* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
  imgutils: add a function for copying image data from GPU mapped memory

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:34:10 +01:00
Clément Bœsch
5d23543277 Merge commit '24da430324735f95880c4a4a54298dc8023125bb'
* commit '24da430324735f95880c4a4a54298dc8023125bb':
  Changelog: mark the release 12 branch

This commit is a noop.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:26:09 +01:00
Clément Bœsch
518961bc99 Merge commit '851960f6f8cf1f946fe42fa36cf6598fac68072c'
* commit '851960f6f8cf1f946fe42fa36cf6598fac68072c':
  lavc: Remove old vaapi decode infrastructure
  avconv_vaapi: Convert to use hw_frames_ctx only
  vaapi_mpeg4: Convert to use the new VAAPI hwaccel code
  vaapi_vc1: Convert to use the new VAAPI hwaccel code
  vaapi_mpeg2: Convert to use the new VAAPI hwaccel code
  vaapi_h264: Convert to use the new VAAPI hwaccel code
  lavc: Rewrite VAAPI decode infrastructure

This merge is a noop, these commits have already been cherry-picked.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:25:01 +01:00
Clément Bœsch
464fcc979c Merge commit '72eba6558ee4f10239ba3f472c0b033ec70082a7'
* commit '72eba6558ee4f10239ba3f472c0b033ec70082a7':
  wmavoice: Simplify GetBitContext initialization

This commit is a noop. We don't have that code anymore since
3deb4b54a2.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:21:09 +01:00
Clément Bœsch
e514a1d404 Merge commit '80fc75d51e3312e1890591048eb6a3d499b6e49d'
* commit '80fc75d51e3312e1890591048eb6a3d499b6e49d':
  Changelog: Mention mov with multiple stsd

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:19:03 +01:00
Clément Bœsch
45982bdcd0 Merge commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99'
* commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99':
  High Definition Compatible Digital (HDCD) decoder filter, using libhdcd

This commit is a noop, we have that code natively.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:17:09 +01:00
Clément Bœsch
b1a80bdb62 Merge commit '95f80293456d9d4b1b096621260c38bc90325ec0'
* commit '95f80293456d9d4b1b096621260c38bc90325ec0':
  avprobe: Fix memory leak

This commit is a noop, ffprobe is not affected.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:12:57 +01:00
Clément Bœsch
5e5e793552 doc/APIchanges: fill date & hash for AV_PIX_FMT_FLAG_BAYER 2017-03-20 08:10:54 +01:00
Clément Bœsch
6557d784d2 Merge commit '8db804e8f549d5b86a1edf62736e0ef80f160da9'
* commit '8db804e8f549d5b86a1edf62736e0ef80f160da9':
  mov: Remove old b-frame/video delay heuristic

This commit is a noop, see 425be3c810

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:09:15 +01:00
Clément Bœsch
64722057b4 Merge commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff'
* commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff':
  mov: Remove ancient heuristic hack

This commit is a noop, see 04f8d31287

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:08:31 +01:00
Clément Bœsch
e811f84a2e swscale: cosmetics in is{RGB,BGR}inInt
Reduce diff with Libav.
2017-03-20 08:02:30 +01:00
Clément Bœsch
d6635daded swscale: remove unused is{RGB,BGR}inBytes 2017-03-20 08:02:30 +01:00
Clément Bœsch
ff6bc16c5a swscale: use a (more correct) function for isPacked 2017-03-20 08:02:30 +01:00
Clément Bœsch
2b9a52bcca swscale: use a function for isAnyRGB 2017-03-20 08:02:30 +01:00
Clément Bœsch
c30875e8b2 swscale: use a function for isBayer 2017-03-20 08:02:30 +01:00
Clément Bœsch
9c2436e1e7 lavu: add AV_PIX_FMT_FLAG_BAYER 2017-03-20 08:02:30 +01:00
Clément Bœsch
f052b1b40f swscale: use a function for isGray 2017-03-20 08:02:30 +01:00
Clément Bœsch
08e1376d81 fate: add fate-sws-pixdesc-query
Test the pixel format querying within libswscale.
2017-03-20 08:02:30 +01:00
Michael Niedermayer
23f3f92361 avcodec/mjpegdec: quant_matrixes can be up to 65535, use uint16_t
Fixes invalid shift
Fixes: 870/clusterfuzz-testcase-5649105424482304

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:38:04 +01:00
Michael Niedermayer
656a17e126 avcodec/mjpegdec: Check quant_matrixes values for being non zero
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:38:02 +01:00
Michael Niedermayer
98da63b3f5 avcodec/vp56: Check avctx->error_concealment before enabling EC
Fixes timeout with 847/clusterfuzz-testcase-5291877358108672
Fixes timeout with 850/clusterfuzz-testcase-5721296509861888

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:33:08 +01:00
Michael Niedermayer
a84d610b37 avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int'
Fixes: 864/clusterfuzz-testcase-4774385942528000

See: [FFmpeg-devel] [PATCH 1/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: 2147483647 - -14133 cannot be represented in type 'int'
See: [FFmpeg-devel] [PATCH 2/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int'

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:33:08 +01:00
Michael Niedermayer
5d996b5649 avcodec/tiff: Check stripsize strippos for overflow
Fixes: 861/clusterfuzz-testcase-5688284384591872

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:33:08 +01:00
Martin Storsjö
61b8a9ea29 aarch64: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 21512 bytes to 31400 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     284.6
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    1902.7
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1903.0
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    2201.1
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   2510.0
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2821.3
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1011.6
vp9_inv_dct_dct_32x32_sub2_add_10_neon:    9716.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9704.9
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   10641.7
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  11555.7
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  12499.8
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  13403.7
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  14335.8
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  15253.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16179.5

After:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     282.8
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    1142.4
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1139.0
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    1772.9
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   2515.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2823.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1012.7
vp9_inv_dct_dct_32x32_sub2_add_10_neon:    6944.4
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    6944.2
vp9_inv_dct_dct_32x32_sub8_add_10_neon:    7609.8
vp9_inv_dct_dct_32x32_sub12_add_10_neon:   9953.4
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  10770.1
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  13418.8
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  14330.7
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  15257.1
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16190.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:37 +02:00
Martin Storsjö
eabc5abf94 arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14516 bytes to 22484 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    270.7    418.5    295.4
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    3840.2   3244.8   3700.1   2337.9
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4212.5   3575.4   3996.9   2571.6
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    5174.4   4270.5   4615.5   3031.9
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5676.0   4908.5   5226.5   3491.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6403.9   5589.0   5839.8   3948.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1710.7    944.7   1582.1   1045.4
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   21040.7  16706.1  18687.7  13193.1
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22197.7  18282.7  19577.5  13918.6
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   24511.5  20911.5  21472.5  15367.5
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  26939.5  24264.3  23239.1  16830.3
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  29419.5  26845.1  25020.6  18259.9
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31146.4  29633.5  26803.3  19721.7
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33376.3  32507.8  28642.4  21174.2
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35629.4  35439.6  30416.5  22625.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37269.9  37914.9  32271.9  24078.9

After:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    276.0    418.5    295.1
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    2336.2   1886.0   2251.0   1458.6
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    2531.0   2054.7   2402.8   1591.1
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    3848.6   3491.1   3845.7   2554.8
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5703.8   4831.6   5230.8   3493.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6399.5   5567.0   5832.4   3951.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1722.1    938.5   1577.3   1044.5
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   15003.5  11576.8  13105.8   9602.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   15768.5  12677.2  13726.0  10138.1
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   17278.8  14825.4  14907.5  11185.7
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  22335.7  21544.5  20379.5  15019.8
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  24165.6  23881.7  21938.6  16308.2
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31082.2  30860.9  26835.3  19711.3
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33102.6  31922.8  28638.3  21161.0
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35104.9  34867.5  30411.7  22621.2
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37438.1  39103.4  32217.8  24067.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:33 +02:00
Martin Storsjö
d564c9018f aarch64: vp9itxfm16: Move the load_add_store macro out from the itxfm16 pass2 function
This allows reusing the macro for a separate implementation of the
pass2 function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:30 +02:00
Martin Storsjö
0f2705e66b aarch64: vp9itxfm16: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from
26288 to 21512 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1887.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2801.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9691.4
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16154.9

After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1899.5
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2827.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9714.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16175.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:26 +02:00
Martin Storsjö
0ea603203d arm: vp9itxfm16: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from
17500 to 14516 bytes.

This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4237.4   3561.5   3971.8   2525.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6371.9   5452.0   5779.3   3910.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22068.8  17867.5  19555.2  13871.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37268.9  38684.2  32314.2  23969.0

After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4375.1   3571.9   4283.8   2567.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6415.6   5578.9   5844.6   3948.3
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22653.7  18079.7  19603.7  13905.3
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37593.2  38862.2  32235.8  24070.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:19 +02:00
Martin Storsjö
b76533f105 aarch64: vp9itxfm16: Restructure the idct32 store macros
This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:15 +02:00
Martin Storsjö
d613251622 aarch64: vp9itxfm16: Avoid .irp when it doesn't save any lines
This makes the code a bit more readable.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:09 +02:00
Martin Storsjö
25ced1eb1c aarch64: vp9itxfm16: Fix a typo in a comment
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:02 +02:00
Martin Storsjö
32e273c111 arm: vp9itxfm16: Avoid reloading the idct32 coefficients
Keep the idct32 coefficients in narrow form in q6-q7, and idct16
coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering
q0-q3 in the pass1 function, and squeeze the idct16 coefficients
into q0-q1 in the pass2 function to avoid reloading them.

The idct16 coefficients are clobbered and reloaded within idct32_odd
though, since that turns out to be faster than narrowing them and
swapping them into q6-q7.

Before:                            Cortex       A7        A8        A9      A53
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22653.8   18268.4   19598.0  14079.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37699.0   38665.2   32542.3  24472.2
After:
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22270.8   18159.3   19531.0  13865.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37523.3   37731.6   32181.7  24071.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:53:57 +02:00
Martin Storsjö
c1619318e5 arm: vp9itxfm16: Fix vertical alignment
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:53:48 +02:00
Martin Storsjö
b46d37e93a arm: vp9itxfm16: Use the right lane size
This makes the code slightly clearer, but doesn't make any functional
difference.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:53:43 +02:00
Martin Storsjö
21c89f3a26 arm/aarch64: vp9: Fix vertical alignment
Align the second/third operands as they usually are.

Due to the wildly varying sizes of the written out operands
in aarch64 assembly, the column alignment is usually not as clear
as in arm assembly.

This is cherrypicked from libav commit
7995ebfad1.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:53:32 +02:00
Martin Storsjö
70317b25aa arm/aarch64: vp9itxfm: Skip loading the min_eob pointer when it won't be used
In the half/quarter cases where we don't use the min_eob array, defer
loading the pointer until we know it will be needed.

This is cherrypicked from libav commit
3a0d5e206d.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:53:28 +02:00
Martin Storsjö
b7a565fe71 arm: vp9itxfm: Template the quarter/half idct32 function
This reduces the number of lines and reduces the duplication.

Also simplify the eob check for the half case.

If we are in the half case, we know we at least will need to do the
first three slices, we only need to check eob for the fourth one,
so we can hardcode the value to check against instead of loading
from the min_eob array.

Since at most one slice can be skipped in the first pass, we can
unroll the loop for filling zeros completely, as it was done for
the quarter case before.

This allows skipping loading the min_eob pointer when using the
quarter/half cases.

This is cherrypicked from libav commit
98ee855ae0.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:53:22 +02:00