FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-03-23 04:24:35 +02:00

Author	SHA1	Message	Date
Andreas Rheinhardt	4368e86a02	avcodec/vp9dec: Constify VP9TileData->VP9Context pointer target This is possible now that ff_thread_await_progress() accepts a const ThreadFrame*. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-31 04:18:26 +02:00
Andreas Rheinhardt	6f7d3bde11	avcodec/vp8, vp9: Avoid using VP56mv and VP56Frame in VP8/9 Instead replace VP56mv by new and identical structures VP8mv and VP9mv. Also replace VP56Frame by VP8FrameType in vp8.h and use that in VP8 code. Also remove VP56_FRAME_GOLDEN2, as this has only been used by VP8, and use VP8_FRAME_ALTREF as replacement for its usage in VP8 as this is more in line with VP8 verbiage. This allows to remove all inclusions of vp56.h from everything that is not VP5/6. This also removes implicit inclusions of hpeldsp.h, h264chroma.h, vp3dsp.h and vp56dsp.h from all VP8/9 files. (This also fixes a build issue: If one compiles with -O0 and disables everything except the VP8-VAAPI encoder, the file containing ff_vpx_norm_shift is not compiled, yet this is used implicitly by vp56_rac_gets_nn() which is defined in vp56.h; it is unused by the VP8-VAAPI encoder and declared as av_unused, yet with -O0 unused noninline functions are not optimized away, leading to linking failures. With this patch, said function is not included in vaapi_encode_vp8.c any more.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-28 03:49:54 +02:00
Andreas Rheinhardt	7ab9b30800	avcodec/vp56: Move VP5-9 range coder functions to a header of their own Also use a vpx prefix for them. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-28 03:49:54 +02:00
Andreas Rheinhardt	80ad06ab1b	avcodec/vp56: Move VP8/9-only rac functions to a header of their own Also rename these functions from vp8_rac_* to vp89_rac_*. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-28 03:49:54 +02:00
Andreas Rheinhardt	b3551b6072	avcodec/thread: Move ff_thread_(await|report)_progress to new header This is in preparation for further commits that will stop using ThreadFrame for frame-threaded codecs that don't use ff_thread_(await|report)_progress(); the API for those codecs having inter-frame dependencies will live in threadframe.h. Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-02-09 17:22:16 +01:00
Andreas Rheinhardt	25c8507818	Remove/replace some unnecessary avcodec.h inclusions Also remove other unnecessary headers and include headers directly while at it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-07-22 15:29:46 +02:00
Anton Khirnov	ffae62d96c	vp9dec: support exporting QP tables through the AVVideoEncParams API	2020-05-12 09:37:47 +02:00
James Almer	318778de9e	Merge commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3' * commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3': Mark some arrays that never change as const. Merged-by: James Almer <jamrial@gmail.com>	2017-09-26 16:02:40 -03:00
Ilia Valiakhmetov	e59da0f7ff	avcodec/vp9: Add tile threading support Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-09-08 10:25:40 -04:00
Michael Niedermayer	d4ee767808	avcodec/vp9block: fix runtime error: signed integer overflow: 196675 * 20670 cannot be represented in type 'int' Fixes: 1710/clusterfuzz-testcase-minimized-4837032931098624 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-05-21 15:39:07 +02:00
Ronald S. Bultje	6d0d1c4a43	vp9: split out reconstruction functions in their own source file.	2017-03-28 18:04:26 -04:00
Ronald S. Bultje	f8c019944d	vp9: re-split the decoder/format/dsp interface header files. The advantage here is that the internal software decoder interface is not exposed to the DSP functions or the hardware accelerations.	2017-03-28 18:04:26 -04:00
Clément Bœsch	37814a21cb	lavc/vp9: consistent use of typedef instead of struct	2017-03-27 21:38:21 +02:00
Clément Bœsch	875f695576	lavc/vp9: misc cosmetics Imported from Libav	2017-03-27 21:38:21 +02:00
Clément Bœsch	12c44d6373	lavc/vp9: rename ctx to avctx This reduces diff with Libav. It also prevents a potential confusion between the private context and the AVCodecContext.	2017-03-27 21:38:21 +02:00
Clément Bœsch	1c9f4b5078	lavc/vp9: split into vp9{block,data,mvs} This is following Libav layout to ease merges.	2017-03-27 21:38:21 +02:00
Anton Khirnov	fd9212f2ed	Mark some arrays that never change as const.	2017-02-01 10:42:59 +01:00
Martin Storsjö	383d96aa22	aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                ARM      AArch64
vp9_avg4_neon:                  27.2     23.7
vp9_avg8_neon:                  56.5     54.7
vp9_avg16_neon:                 169.9    167.4
vp9_avg32_neon:                 585.8    585.2
vp9_avg64_neon:                 2460.3   2294.7
vp9_avg_8tap_smooth_4h_neon:    132.7    125.2
vp9_avg_8tap_smooth_4hv_neon:   478.8    442.0
vp9_avg_8tap_smooth_4v_neon:    126.0    93.7
vp9_avg_8tap_smooth_8h_neon:    241.7    234.2
vp9_avg_8tap_smooth_8hv_neon:   690.9    646.5
vp9_avg_8tap_smooth_8v_neon:    245.0    205.5
vp9_avg_8tap_smooth_64h_neon:   11273.2  11280.1
vp9_avg_8tap_smooth_64hv_neon:  22980.6  22184.1
vp9_avg_8tap_smooth_64v_neon:   11549.7  10781.1
vp9_put4_neon:                  18.0     17.2
vp9_put8_neon:                  40.2     37.7
vp9_put16_neon:                 97.4     99.5
vp9_put32_neon/armv8:           346.0    307.4
vp9_put64_neon/armv8:           1319.0   1107.5
vp9_put_8tap_smooth_4h_neon:    126.7    118.2
vp9_put_8tap_smooth_4hv_neon:   465.7    434.0
vp9_put_8tap_smooth_4v_neon:    113.0    86.5
vp9_put_8tap_smooth_8h_neon:    229.7    221.6
vp9_put_8tap_smooth_8hv_neon:   658.9    621.3
vp9_put_8tap_smooth_8v_neon:    215.0    187.5
vp9_put_8tap_smooth_64h_neon:   10636.7  10627.8
vp9_put_8tap_smooth_64hv_neon:  21076.8  21026.9
vp9_put_8tap_smooth_64v_neon:   9635.0   9632.4
These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:15:56 +02:00
Martin Storsjö	ffbd1d2b00	arm: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.
The filter coefficients are signed values, where the product of the multiplication with one individual filter coefficient doesn't overflow a 16 bit signed value (the largest filter coefficient is 127). But when the products are accumulated, the resulting sum can overflow the 16 bit signed range. Instead of accumulating in 32 bit, we accumulate the largest product (either index 3 or 4) last with a saturated addition.
(The VP8 MC asm does something similar, but slightly simpler, by accumulating each half of the filter separately. In the VP9 MC filters, each half of the filter can also overflow though, so the largest component has to be handled individually.)
Examples of relative speedup compared to the C version, from checkasm:
Cortex                          A7     A8     A9     A53
vp9_avg4_neon:                  1.71   1.15   1.42   1.49
vp9_avg8_neon:                  2.51   3.63   3.14   2.58
vp9_avg16_neon:                 2.95   6.76   3.01   2.84
vp9_avg32_neon:                 3.29   6.64   2.85   3.00
vp9_avg64_neon:                 3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:    3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:   3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:    5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:    6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:   6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:    9.22   12.66  8.15   11.10
vp9_avg_8tap_smooth_64h_neon:   7.02   10.23  5.54   11.58
vp9_avg_8tap_smooth_64hv_neon:  6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13  9.46   13.37
vp9_put4_neon:                  1.11   1.47   1.00   1.21
vp9_put8_neon:                  1.23   2.17   1.94   1.48
vp9_put16_neon:                 1.63   4.02   1.73   1.97
vp9_put32_neon:                 1.56   4.92   2.00   1.96
vp9_put64_neon:                 2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:    3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:   3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:    5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:    5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:   6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:    9.03   11.84  8.07   11.51
vp9_put_8tap_smooth_64h_neon:   6.78   9.48   4.88   10.89
vp9_put_8tap_smooth_64hv_neon:  6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30  9.43   14.34
For the larger 8tap filters, the speedup vs C code is around 5-14x.
This is significantly faster than libvpx's implementation of the same functions, at least when comparing the put_8tap_smooth_64 functions (compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from libvpx).
Absolute runtimes from checkasm:
Cortex                            A7       A8       A9       A53
vp9_put_8tap_smooth_64h_neon:     20150.3  14489.4  19733.6  10863.7
libvpx vpx_convolve8_horiz_neon:  52623.3  19736.4  21907.7  25027.7
vp9_put_8tap_smooth_64v_neon:     14455.0  12303.9  13746.4  9628.9
libvpx vpx_convolve8_vert_neon:   42090.0  17706.2  17659.9  16941.2
Thus, on the A9, the horizontal filter is only marginally faster than libvpx, while our version is significantly faster on the other cores, and the vertical filter is significantly faster on all cores. The difference is especially large on the A7.
The libvpx implementation does the accumulation in 32 bit, which probably explains most of the differences.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-03 09:35:38 +02:00
Martin Storsjö	2e55e26b40	vp9: Flip the order of arguments in MC functions This makes it match the pattern already used for VP8 MC functions. This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-03 09:12:02 +02:00
Ronald S. Bultje	a451324ddd	vp9: ignore reference segmentation map if error_resilience flag is set. Fixes ffvp9_fails_where_libvpx.succeeds.webm. Bug-Id: ffmpeg/3849. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:07 +02:00
Ronald S. Bultje	1730a67ab9	vp9: add frame threading Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-08-11 10:54:44 +02:00
Ronald S. Bultje	5b995452a6	vp9: allocate 'b', 'block/uvblock' and 'eob/uveob' dynamically. This will be needed for frame threading. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-08-11 10:54:20 +02:00
Ronald S. Bultje	bc6e0b64a9	vp9: split last/cur_frame from the reference buffers. We need more information from last/cur_frame than from reference buffers, so we can use a simplified structure for reference buffers, and then store mvs and segmentation map information in last/cur. This prepares the decoder for frame threading support. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-08-11 10:53:13 +02:00
Diego Biurrun	b7f98659f2	Remove unnecessary get_bits.h #includes	2016-06-07 13:09:57 +02:00
Luca Barbato	312daa1589	vp9: Use the correct upper bound for seg_id And use a macro to make apparent why the value. Bug-Id: CID 1108595	2014-11-21 12:37:05 +00:00
Anton Khirnov	ca96e33716	vp9: drop support for real (non-emulated) edges They are not measurably faster on x86, they might be somewhat faster on other platforms due to missing emu edge SIMD, but the gain is not large enough to justify the added complexity.	2014-01-09 09:43:59 +01:00
Ronald S. Bultje	72ca830f51	lavc: VP9 decoder Originally written by Ronald S. Bultje <rsbultje@gmail.com> and Clément Bœsch <u@pkh.me> Further contributions by: Anton Khirnov <anton@khirnov.net> Diego Biurrun <diego@biurrun.de> Luca Barbato <lu_zero@gentoo.org> Martin Storsjö <martin@martin.st> Signed-off-by: Luca Barbato <lu_zero@gentoo.org> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2013-11-15 10:16:28 +01:00

28 Commits
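
Note on commit ffbd1d2b00: the message describes accumulating the 8-tap filter products in 16 bit and adding the largest product last with a saturated addition. The scalar C sketch below illustrates that idea only; it is not the NEON code from the commit. The coefficient set, the hard-coded index of the largest tap, and all function names are assumptions chosen for illustration, and the final step assumes the usual rounding shift by 7 followed by a clip to 8 bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-tap coefficients that sum to 128 (not taken from the commit). */
static const int16_t taps[8] = { -1, 3, -10, 122, 18, -6, 2, 0 };
#define LARGEST_TAP 3 /* index of the largest coefficient in this example set */

/* Saturating signed 16-bit addition. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Round, shift right by 7, and clip to the 8-bit pixel range. */
static uint8_t round_clip(int32_t acc)
{
    int32_t v = acc + 64;
    if (v < 0) return 0;
    v >>= 7;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Reference: accumulate all eight products in 32 bit. */
static uint8_t filter_ref32(const uint8_t px[8])
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += taps[i] * px[i];
    return round_clip(acc);
}

/* 16-bit variant: sum the seven smaller products first, then add the
 * largest product last with a saturating addition. */
static uint8_t filter_sat16(const uint8_t px[8])
{
    int32_t partial = 0;
    for (int i = 0; i < 8; i++)
        if (i != LARGEST_TAP)
            partial += taps[i] * px[i];
    /* For this coefficient set the partial sum always fits in 16 bit. */
    assert(partial >= INT16_MIN && partial <= INT16_MAX);
    int16_t big = (int16_t)(taps[LARGEST_TAP] * px[LARGEST_TAP]);
    return round_clip(sat_add16((int16_t)partial, big));
}

/* Compare both variants on every input whose pixels are each 0 or 255. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int mask = 0; mask < 256; mask++) {
        uint8_t px[8];
        for (int i = 0; i < 8; i++)
            px[i] = ((mask >> i) & 1) ? 255 : 0;
        if (filter_ref32(px) != filter_sat16(px))
            bad++;
    }
    return bad;
}
```

Saturation loses nothing here because any sum above the 16-bit maximum would be clipped to 255 after the shift anyway, so the saturated value rounds to the same pixel.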