1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-02 03:06:28 +02:00
Commit Graph

82707 Commits

Author SHA1 Message Date
Hendrik Leppkes
127cc6dd3d Merge commit '8d07e941b04d63fc4443dd986e3dc7b69cdcca43'
* commit '8d07e941b04d63fc4443dd986e3dc7b69cdcca43':
  FATE: add a test of H.264 SEI recovery in an intra refresh stream

Our H264 decoder drops 3 frames from the beginning of the stream, but
all frames after those match, hence the difference in the fate test.

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 11:32:10 +01:00
Hendrik Leppkes
5e78126bbd Merge commit '46278ec90ac5ad1dab5e85991f176afe49003fee'
* commit '46278ec90ac5ad1dab5e85991f176afe49003fee':
  mp3enc: write trailing padding

Noop, we have our own implementation for mp3 gapless.

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:48:40 +01:00
Hendrik Leppkes
f4c3aa7ade Merge commit 'd60c2d5216930ef98c7d4d6837d6229b37e0dcb3'
* commit 'd60c2d5216930ef98c7d4d6837d6229b37e0dcb3':
  mp3dec: read the initial/trailing padding from the LAME tag

Noop, we have our own implementation for mp3 gapless tags.

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:48:04 +01:00
Hendrik Leppkes
07502e473f Merge commit '7a76371437f9562c3414f985523f883489e3936a'
* commit '7a76371437f9562c3414f985523f883489e3936a':
  libopenh264enc: Simplify init by setting FF_CODEC_CAP_INIT_CLEANUP

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:47:08 +01:00
Hendrik Leppkes
7e9474ca47 Merge commit '2d097c16b833c532ac974a7f1fd05c0a1f3b7675'
* commit '2d097c16b833c532ac974a7f1fd05c0a1f3b7675':
  libopenh264enc: Return a more sensible error code in some init failure paths

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:46:02 +01:00
Hendrik Leppkes
0bd76401d1 Merge commit '36b380dcd52ef47d7ba0559ed51192c88d82a9bd'
* commit '36b380dcd52ef47d7ba0559ed51192c88d82a9bd':
  libopenh264dec: Simplify the init thanks to FF_CODEC_CAP_INIT_CLEANUP being set

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:45:08 +01:00
Hendrik Leppkes
6fb07c7d85 Merge commit 'd0b1e6049b06eeeeca146ece4d2f199c5dba1565'
* commit 'd0b1e6049b06eeeeca146ece4d2f199c5dba1565':
  libopenh264dec: Fix cleanup if the init failed early

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:44:37 +01:00
Hendrik Leppkes
8a91452e83 Merge commit '61cb9fac47498a38dfe7623f66aa1f3696e9158c'
* commit '61cb9fac47498a38dfe7623f66aa1f3696e9158c':
  mov: fix stream extradata_size allocation

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:44:21 +01:00
Hendrik Leppkes
d7d6f9c782 Merge commit '0b1bd1b2057d41fd0ccba7317911c484a50f9207'
* commit '0b1bd1b2057d41fd0ccba7317911c484a50f9207':
  lavd: Drop unneeded av_init_packet()s

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:20:14 +01:00
Hendrik Leppkes
9e1ddc0820 Merge commit '390b95b88b2b896b63f257f69e434dfc0111e076'
* commit '390b95b88b2b896b63f257f69e434dfc0111e076':
  fate: Add a mixed NAL coding sample

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-18 10:16:54 +01:00
Moritz Barsnick
c493a531ed doc/bsfs: various improvements
- Restored alphabetical order.
- Enhanced sections aac_adtstoasc, dca_core, h264_mp4toannexb.
- Added sections hevc_mp4toannexb and vp9_superframe.
- Renamed (if required) and filled previously empty sections
  mjpegadump, mov2textsub/text2movsub, mp3decomp, and
  remove_extra.
- Fixes ticket #3198.

Signed-off-by: Moritz Barsnick <barsnick@gmx.net>
Signed-off-by: Lou Logan <lou@lrcd.com>
2016-11-17 10:49:51 -09:00
Stefano Sabatini
427a47abcd ffprobe: fix crash in case -of is specified with an empty string
Fix trac issue #5957.
2016-11-17 20:28:45 +01:00
Michael Niedermayer
709c87109d avformat/movenc: Check frame rate before use.
Fixes division by 0
This is similar to how avg_frame_rate is checked elsewhere
Fixes: 6d24add0455f41b1b45b7ba615cd46f3/asan_generic_dc34c3_5480_0a2ef411cae999b9871ed71a2e481b71.mov

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-17 19:03:41 +01:00
Michael Niedermayer
ae514b1254 avcodec/ass_split: Change order of operations in ass_split_section()
This matches the other branch
Fixes out of array read
Fixes: 4d142ca76d39fe685effcf5017098723/asan_heap-oob_31ae824_8611_348fdb64f9009b63c8a8eae9a0e497c5.mkv

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-17 18:05:18 +01:00
Hendrik Leppkes
1398ded7a7 Merge commit 'cbbb404055877e3beb9890ffe22784a6a100963e'
* commit 'cbbb404055877e3beb9890ffe22784a6a100963e':
  fate: Restore order of h264 entries

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:27:27 +01:00
Hendrik Leppkes
2f1a539d4b Merge commit '61bd0ed781b56eea1e8e851aab34a2ee3b59fbac'
* commit '61bd0ed781b56eea1e8e851aab34a2ee3b59fbac':
  h264: Log more information about invalid NALu size

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:24:25 +01:00
Hendrik Leppkes
286d8bae61 Merge commit '7b1ae0e73ab7f7c5eabc70dbe2e579127c6e154f'
* commit '7b1ae0e73ab7f7c5eabc70dbe2e579127c6e154f':
  checkasm/arm: preserve the stack alignment checkasm_checked_call

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:21:32 +01:00
Hendrik Leppkes
c0af1ee90d Merge commit '80fbb7becae530167373fe5178966b7d7604306e'
* commit '80fbb7becae530167373fe5178966b7d7604306e':
  checkasm: vp8.mc: initialize the full src buffer after ec32574209

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:20:10 +01:00
Hendrik Leppkes
62d9b7a69b Merge commit '17c99b6158f2c6720af74e81ee727ee50d2e7e96'
* commit '17c99b6158f2c6720af74e81ee727ee50d2e7e96':
  h2645_parse: handle embedded Annex B NAL units in size prefixed NAL units

This commit is a noop, see a9bb4cf87d

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:19:49 +01:00
Hendrik Leppkes
cca4fd4778 Merge commit 'a8cbe5a0ccebf60a8a8b0aba5d5716dd54c1595c'
* commit 'a8cbe5a0ccebf60a8a8b0aba5d5716dd54c1595c':
  h264_ps: export actual height in MBs as SPS.mb_height

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:17:21 +01:00
Hendrik Leppkes
4c5c522fc1 Merge commit '99cf943339a2e5171863c48cd1a73dd43dc243e1'
* commit '99cf943339a2e5171863c48cd1a73dd43dc243e1':
  d3d11va: don't keep the context lock while waiting for a frame

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:06:42 +01:00
Hendrik Leppkes
e999a4ed6c Merge commit '2866d108c9e9da7baf53ff57a51d470691049a57'
* commit '2866d108c9e9da7baf53ff57a51d470691049a57':
  vp8dsp: Remove the comment saying that the height is equal to the width

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:06:28 +01:00
Hendrik Leppkes
90b72f6bda Merge commit '8c816c0c9b12fdefd9046415e97df299880bc9b8'
* commit '8c816c0c9b12fdefd9046415e97df299880bc9b8':
  checkasm/arm: align the clobber check data properly for ldrd

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:06:10 +01:00
Hendrik Leppkes
4fe013fc70 Merge commit 'ec32574209f36467ef0d22c21a7e811ba98c15b6'
* commit 'ec32574209f36467ef0d22c21a7e811ba98c15b6':
  checkasm: vp8: mc: test unequal width/height for partitions

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:05:25 +01:00
Hendrik Leppkes
2818aaaba0 Merge commit '5f74bd31a9bd1ac7655103b11743c12d38e0419f'
* commit '5f74bd31a9bd1ac7655103b11743c12d38e0419f':
  vp8/armv6: mc: avoid boolean expression in calculation

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:05:07 +01:00
Hendrik Leppkes
711b7b7776 Merge commit 'fc5cdc0d5372f5103c71d5dede296734fe71ead2'
* commit 'fc5cdc0d5372f5103c71d5dede296734fe71ead2':
  doc: escape left brace in texi2pod.pl regex

This commit is a noop, see e43ea1cbb2

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:04:37 +01:00
Hendrik Leppkes
da97b244b0 Merge commit 'd825b1a5306576dcd0553b7d0d24a3a46ad92864'
* commit 'd825b1a5306576dcd0553b7d0d24a3a46ad92864':
  libopenh264: Support building with the 1.6 release

This commit is a noop, see 293676c476

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:01:35 +01:00
Hendrik Leppkes
4a485daa7f Merge commit '4f7723cb3b913c577842a5bb088c804ddacac8df'
* commit '4f7723cb3b913c577842a5bb088c804ddacac8df':
  movenc: Add an option for skipping writing the mfra/tfra/mfro trailer

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 14:54:51 +01:00
Carl Eugen Hoyos
55a424c5a8 lavc/ffv1dec: Scale output for msb-packed compression to full 16bit.
2% slowdown for existing decode-line timer.
2016-11-17 13:00:47 +01:00
Carl Eugen Hoyos
f8247c0cce lavc/ffv1enc: Support pix_fmt GRAY10. 2016-11-17 12:47:39 +01:00
Michael Niedermayer
2c9106257f avcodec/mpeg4videodec: Workaround interlaced mpeg4 edge MC bug
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-17 12:21:48 +01:00
Michael Niedermayer
85407c7e63 avcodec/mpegvideo: Fix edge emu buffer overlap with interlaced mpeg4
Fixes Ticket5936
Regression since c5fc8ae126

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-17 12:21:48 +01:00
Martin Vignali
52da3f6f70 libavcodec/exr : fix channel size calculation for uint32 channel
uint32 need 4 bytes not 1.
Fix decoding when there is half/float and uint32 channel.

This fixes crashes due to pointer corruption caused by invalid writes.

The problem was introduced in commit
03152e74df.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 23:45:44 +01:00
Andreas Cadhalpun
ce3147eb19 exr: reindent after previous commit
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 22:37:24 +01:00
Andreas Cadhalpun
ffdc5d09e4 exr: fix out-of-bounds read
channel_index can be -1.

This problem was introduced in commit
2dd7b46132.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 22:37:17 +01:00
Michael Niedermayer
721c90f0f9 avutil/frame: fix indention after last commit
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-16 21:25:45 +01:00
Michael Niedermayer
2acee08a4a avutil/frame: Copy size=0 side data in ff_init_buffer_info()
Fixes null pointer dereference
Fixes: 189/FOO

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-16 21:25:45 +01:00
Andreas Cadhalpun
3c0328d58d libschroedingerdec: fix leaking of framewithpts
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 19:31:11 +01:00
Andreas Cadhalpun
a86ebbf7f6 libschroedingerdec: don't produce empty frames
They are not valid and can cause problems/crashes for API users.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 19:30:49 +01:00
Andreas Cadhalpun
90ebf3c428 dds: limit 4 bpp handling to AV_PIX_FMT_PAL8
This fixes NULL pointer dereferencing for formats, where frame->data[1]
is not allocated.

The problem was introduced in commit
257fbc3af4.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-16 19:29:45 +01:00
kieranjol
605f3084fc doc/filters: adds recently added -vf colorspace options
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-16 15:06:16 +01:00
Michael Niedermayer
d79d8ef927 cmdutils: remove duplicate windows.h include
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-16 15:06:16 +01:00
Hendrik Leppkes
99218ee30d configure: properly add dxva2 link dependencies
Fixes building with --disable-everything --enable-shared --enable-dxva2

The hwcontext DXVA2 implementation in avutil needs this library now, instead
of just the ffmpeg program.
2016-11-16 14:14:28 +01:00
Vittorio Giovara
00c8079816 fate: Add h264 extradata reload tests
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-16 02:43:19 +01:00
Thierry Foucu
c512546689 Fix -Werror=parentheses error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-16 02:39:57 +01:00
Michael Niedermayer
1546d487cf avcodec/rv40: Test remaining space in loop of get_dimension()
Fixes infinite loop
Fixes: 178/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_RV40_fuzzer

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-11-15 23:08:43 +01:00
Andreas Cadhalpun
1abcd972c4 mlz: limit next_code to data buffer size
This fixes a heap-buffer-overflow detected by AddressSanitizer.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-11-15 22:01:08 +01:00
Martin Storsjö
f1212e472b aarch64: vp9: Implement NEON loop filters
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the loop filters with
16 pixels at a time. The implementation is fully templated, with
a single macro which can generate versions for both 8 and
16 pixels wide, for both 4, 8 and 16 pixels loop filters
(and the 4/8 mixed versions as well).

For the 8 pixel wide versions, it is pretty close in speed (the
v_4_8 and v_8_8 filters are the best examples of this; the h_4_8
and h_8_8 filters seem to get some gain in the load/transpose/store
part). For the 16 pixels wide ones, we get a speedup of around
1.2-1.4x compared to the 32 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM AArch64
vp9_loop_filter_h_4_8_neon:          144.0   127.2
vp9_loop_filter_h_8_8_neon:          207.0   182.5
vp9_loop_filter_h_16_8_neon:         415.0   328.7
vp9_loop_filter_h_16_16_neon:        672.0   558.6
vp9_loop_filter_mix2_h_44_16_neon:   302.0   203.5
vp9_loop_filter_mix2_h_48_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_84_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_88_16_neon:   376.0   305.2
vp9_loop_filter_mix2_v_44_16_neon:   193.2   128.2
vp9_loop_filter_mix2_v_48_16_neon:   246.7   218.4
vp9_loop_filter_mix2_v_84_16_neon:   248.0   218.5
vp9_loop_filter_mix2_v_88_16_neon:   302.0   218.2
vp9_loop_filter_v_4_8_neon:           89.0    88.7
vp9_loop_filter_v_8_8_neon:          141.0   137.7
vp9_loop_filter_v_16_8_neon:         295.0   272.7
vp9_loop_filter_v_16_16_neon:        546.0   453.7

The speedup vs C code in checkasm tests is around 2-7x, which is
pretty much the same as for the 32 bit version. Even if these functions
are faster than their 32 bit equivalent, the C version that we compare
to also became around 1.3-1.7x faster than the C version in 32 bit.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-5x.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                         A57 gcc-5.3  neon
loop_filter_h_4_8_neon:        256.6  93.4
loop_filter_h_8_8_neon:        307.3 139.1
loop_filter_h_16_8_neon:       340.1 254.1
loop_filter_h_16_16_neon:      827.0 407.9
loop_filter_mix2_h_44_16_neon: 524.5 155.4
loop_filter_mix2_h_48_16_neon: 644.5 173.3
loop_filter_mix2_h_84_16_neon: 630.5 222.0
loop_filter_mix2_h_88_16_neon: 697.3 222.0
loop_filter_mix2_v_44_16_neon: 598.5 100.6
loop_filter_mix2_v_48_16_neon: 651.5 127.0
loop_filter_mix2_v_84_16_neon: 591.5 167.1
loop_filter_mix2_v_88_16_neon: 855.1 166.7
loop_filter_v_4_8_neon:        271.7  65.3
loop_filter_v_8_8_neon:        312.5 106.9
loop_filter_v_16_8_neon:       473.3 206.5
loop_filter_v_16_16_neon:      976.1 327.8

The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57
is again 30-50% faster than the cortex-a53.

This is an adapted cherry-pick from libav commits
9d2afd1eb8 and
31756abe29.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
f43079e11c aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32 bit version.

The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM  AArch64
vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7

The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                                A57 gcc-5.3   neon
vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8

The asm is around factor 3-4 faster than C on the cortex-a57 and the asm
is around 30-50% faster on the a57 compared to the a53.

This is an adapted cherry-pick from libav commit
3c9546dfaf.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
1f7801c2bc aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not much
to be gained from having more spare registers here - we only
avoid having to clobber callee-saved registers.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                     ARM   AArch64
vp9_avg4_neon:                      27.2      23.7
vp9_avg8_neon:                      56.5      54.7
vp9_avg16_neon:                    169.9     167.4
vp9_avg32_neon:                    585.8     585.2
vp9_avg64_neon:                   2460.3    2294.7
vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
vp9_put4_neon:                      18.0      17.2
vp9_put8_neon:                      40.2      37.7
vp9_put16_neon:                     97.4      99.5
vp9_put32_neon/armv8:              346.0     307.4
vp9_put64_neon/armv8:             1319.0    1107.5
vp9_put_8tap_smooth_4h_neon:       126.7     118.2
vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
vp9_put_8tap_smooth_4v_neon:       113.0      86.5
vp9_put_8tap_smooth_8h_neon:       229.7     221.6
vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
vp9_put_8tap_smooth_8v_neon:       215.0     187.5
vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4

These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.

The speedup vs C code is pretty much the same as for the 32 bit
case; on the A53 it's around 6-13x for ther larger 8tap filters.
The exact speedup varies a little, since the C versions generally
don't end up exactly as slow/fast as on 32 bit.

This is an adapted cherry-pick from libav commit
383d96aa22.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00