This work is sponsored by, and copyright, Google.
The plain pixel put/copy functions are used from the 8 bit version,
for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new
copy128 is added.
Compared with the 8 bit version, the filters can no longer use the
trick to accumulate in 16 bit with only saturation at the end, but now
the accumulators need to be 32 bit. This avoids the need to keep track
of which filter index is the largest though, reducing the size of the
executable code for these filters.
For the horizontal filters, we only do 4 or 8 pixels wide in parallel
(while doing two rows at a time), since we don't have enough register
space to filter 16 pixels wide.
For the vertical filters, we still do 4 and 8 pixels in parallel just
as in the 8 bit case, but we need to store the output after every 2
rows instead of after every 4 rows.
Examples of relative speedup compared to the C version, from checkasm:
Cortex A7 A8 A9 A53
vp9_avg4_10bpp_neon: 2.25 2.44 3.05 2.16
vp9_avg8_10bpp_neon: 3.66 8.48 3.86 3.50
vp9_avg16_10bpp_neon: 3.39 8.26 3.37 2.72
vp9_avg32_10bpp_neon: 4.03 10.20 4.07 3.42
vp9_avg64_10bpp_neon: 4.15 10.01 4.13 3.70
vp9_avg_8tap_smooth_4h_10bpp_neon: 3.38 6.22 3.41 4.75
vp9_avg_8tap_smooth_4hv_10bpp_neon: 3.89 6.39 4.30 5.32
vp9_avg_8tap_smooth_4v_10bpp_neon: 5.32 9.73 6.34 7.31
vp9_avg_8tap_smooth_8h_10bpp_neon: 4.45 9.40 4.68 6.87
vp9_avg_8tap_smooth_8hv_10bpp_neon: 4.64 8.91 5.44 6.47
vp9_avg_8tap_smooth_8v_10bpp_neon: 6.44 13.42 8.68 8.79
vp9_avg_8tap_smooth_64h_10bpp_neon: 4.66 9.02 4.84 7.71
vp9_avg_8tap_smooth_64hv_10bpp_neon: 4.61 9.14 4.92 7.10
vp9_avg_8tap_smooth_64v_10bpp_neon: 6.90 14.13 9.57 10.41
vp9_put4_10bpp_neon: 1.33 1.46 2.09 1.33
vp9_put8_10bpp_neon: 1.57 3.42 1.83 1.84
vp9_put16_10bpp_neon: 1.55 4.78 2.17 1.89
vp9_put32_10bpp_neon: 2.06 5.35 2.14 2.30
vp9_put64_10bpp_neon: 3.00 2.41 1.95 1.66
vp9_put_8tap_smooth_4h_10bpp_neon: 3.19 5.81 3.31 4.63
vp9_put_8tap_smooth_4hv_10bpp_neon: 3.86 6.22 4.32 5.21
vp9_put_8tap_smooth_4v_10bpp_neon: 5.40 9.77 6.08 7.21
vp9_put_8tap_smooth_8h_10bpp_neon: 4.22 8.41 4.46 6.63
vp9_put_8tap_smooth_8hv_10bpp_neon: 4.56 8.51 5.39 6.25
vp9_put_8tap_smooth_8v_10bpp_neon: 6.60 12.43 8.17 8.89
vp9_put_8tap_smooth_64h_10bpp_neon: 4.41 8.59 4.54 7.49
vp9_put_8tap_smooth_64hv_10bpp_neon: 4.43 8.58 5.34 6.63
vp9_put_8tap_smooth_64v_10bpp_neon: 7.26 13.92 9.27 10.92
For the larger 8tap filters, the speedup vs C code is around 4-14x.
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
This is more in line with how it will be extended for more bitdepths.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'fd5e6a095f69495c558069315d6b36ea410c31fa':
x86util: Extend SPLATW for avx2
This commit is a noop, see 1ace9573dc
(only libavutil/x86/x86util.asm chunk).
Merged-by: Clément Bœsch <u@pkh.me>
* commit '37961044c6':
checkasm: arm: Ignore changes to bits 0-4 and 7 of FPSCR
cheackasm/arm: remove NEON instructions from checkasm_checked_call_vfp
checkasm: arm: Don't start new const blocks for each string
This merge is a noop: the changes were included in 9f1c81e5ec.
Merged-by: Clément Bœsch <u@pkh.me>
* commit '5ece6911010b3464d2fdacfa8031c15b5bd83418':
apichanges: Fill in missing hashes and dates
This commit is a noop as we need to fill with our own hashes.
Merged-by: Clément Bœsch <u@pkh.me>
* commit 'facdfe40805559963b5875931af9406ed5ddcd5c':
swscale: Add proper ff_ prefix to init functions
This commit is a noop, see e8c3716064
I'm keeping our ff_sws_ vs ff_ since we use ff_sws_ in other places in
swscale.
Merged-by: Clément Bœsch <u@pkh.me>
* commit 'c0fd2fb27bebd1d5ab028e6df6bca9119d269122':
swscale: Rename sws_context_class to ff_sws_context_class
This commit is a noop, see 8bfbc8c5e5
Merged-by: Clément Bœsch <u@pkh.me>
* commit '71a0472114574993df7035f4de9aa007e03817b8':
checkasm: arm: report the first clobbered register in checkasm_checked_call
Also includes 446353ea18, 59aeed93e4, and 37961044c6 to avoid breaking
too much stuff.
Merged-by: Clément Bœsch <u@pkh.me>
* commit 'a8fce24b9c5a87187f5bd864b18f5b3e575f8c3d':
avconv_dxva2: support HEVC Main10 decoding
This commit is a noop, see 1ec14612a5
Merged-by: Clément Bœsch <u@pkh.me>
* commit '33f6690eb4e21acc4b581688eecfc4cc5ea9515e':
hevc: offer DXVA2 for 10bit 420
This commit is a noop, see ccb94789e2
Merged-by: Clément Bœsch <u@pkh.me>
* commit '38efff92f1ef81f3de20ff0460ec7b70c253d714':
FATE: add a test for H.264 with two fields per packet
h264: fix decoding multiple fields per packet with slice threads
This merge includes two commits because the FATE test was useful in
order to make proper testing.
The merge gets rid of the now unused:
- SLICE_SINGLETHREAD and SLICE_SKIPED macros
- max_contexts
- "again" label in decode_nal_units()
This commit also includes the fix from d3e4d406b.
Thanks to wm4 and Michael Niedermayer for their testing.
Merged-by: Clément Bœsch <u@pkh.me>
Merged-by: Matthieu Bouron <matthieu.bouron@gmail.com>
This treats the case of no slices like no frames which it basically is.
The field is added to the context as other nal related fields are also there
and passing the has_slices field per *arguments is ugly and not consistent
Found-by: ubitux
Approved-by: ubitux
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If fifo is enabled on tee muxer, ffmpeg exits because of an unknown option passed to fifo muxer.
Option name "format_options" was replaced by "format_opts" on tee muxer.
Signed-off-by: Felipe Astroza <felipe@astroza.cl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
CUVID on GeForce GT 730 and GeForce GTX 1060 does not report any error when
decoding 8K h264 packets. However, it does return an error during
cuvidCreateDecoder call if the indicated video resolution is not
supported.
Given that stream resolution is typically known as a result of probing
it is better to use this information during avcodec_open2 call to fail
immediately, rather than proceeding to decode and never receiving any
frames from the decoder nor receiving any indication of decode failure.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This happens because segment_end() returns an error, so seg_write_packet
never proceeds to segment_start(), and seg->avf->pb is never re-set,
so we crash with a null pb when av_write_trailer flushes the packet
queue.
This doesn't seem to be clearly recoverable, so I'm just failing more
gracefully.
Repro:
ffmpeg -i input.ts -f segment -c copy -segment_list /noaxx.m3u8 test-%05d.ts
(assuming you don't have write access to /)
This makes the code 7 times faster with the testcase from libfuzzer
and should reduce the amount of timeouts we hit in automated fuzzing.
(for example 438/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_RV40_fuzzer)
The code is also faster with more realistic input though the difference
is small here as that is far from the worst cases the fuzzers pick out
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
use av_lfg_init_from_data() to seed AC-3 dithering from the AC-3 frame
data to make it consistent given the same AC-3 frame, if option is set.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Raises max channels to 6 (for non joint-stereo only),
there is no difference decoding 1 or N discrete channels.
Fixes trac issue #5840
Signed-off-by: bnnm <bananaman255@gmail.com>
When use http method to delete the old segments,
there is only io_open, hove not io_close yet,
this patch is used to fix it
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>