1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-03-23 04:24:35 +02:00

28 Commits

Author SHA1 Message Date
Andreas Rheinhardt
4368e86a02 avcodec/vp9dec: Constify VP9TileData->VP9Context pointer target
This is possible now that ff_thread_await_progress() accepts
a const ThreadFrame*.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-31 04:18:26 +02:00
Andreas Rheinhardt
6f7d3bde11 avcodec/vp8, vp9: Avoid using VP56mv and VP56Frame in VP8/9
Instead replace VP56mv by new and identical structures VP8mv and VP9mv.
Also replace VP56Frame by VP8FrameType in vp8.h and use that
in VP8 code. Also remove VP56_FRAME_GOLDEN2, as this has only
been used by VP8, and use VP8_FRAME_ALTREF as replacement for
its usage in VP8 as this is more in line with VP8 verbiage.

This allows to remove all inclusions of vp56.h from everything
that is not VP5/6. This also removes implicit inclusions
of hpeldsp.h, h264chroma.h, vp3dsp.h and vp56dsp.h from all VP8/9
files.

(This also fixes a build issue: If one compiles with -O0 and disables
everything except the VP8-VAAPI encoder, the file containing
ff_vpx_norm_shift is not compiled, yet this is used implicitly
by vp56_rac_gets_nn() which is defined in vp56.h; it is unused
by the VP8-VAAPI encoder and declared as av_unused, yet with -O0
unused noninline functions are not optimized away, leading to
linking failures. With this patch, said function is not included
in vaapi_encode_vp8.c any more.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-28 03:49:54 +02:00
Andreas Rheinhardt
7ab9b30800 avcodec/vp56: Move VP5-9 range coder functions to a header of their own
Also use a vpx prefix for them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-28 03:49:54 +02:00
Andreas Rheinhardt
80ad06ab1b avcodec/vp56: Move VP8/9-only rac functions to a header of their own
Also rename these functions from vp8_rac_* to vp89_rac_*.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-28 03:49:54 +02:00
Andreas Rheinhardt
b3551b6072 avcodec/thread: Move ff_thread_(await|report)_progress to new header
This is in preparation for further commits that will stop
using ThreadFrame for frame-threaded codecs that don't use
ff_thread_(await|report)_progress(); the API for those codecs
having inter-frame depdendencies will live in threadframe.h.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-09 17:22:16 +01:00
Andreas Rheinhardt
25c8507818 Remove/replace some unnecessary avcodec.h inclusions
Also remove other unnecessary headers and include headers directly while
at it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 15:29:46 +02:00
Anton Khirnov
ffae62d96c vp9dec: support exporting QP tables through the AVVideoEncParams API 2020-05-12 09:37:47 +02:00
James Almer
318778de9e Merge commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3'
* commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3':
  Mark some arrays that never change as const.

Merged-by: James Almer <jamrial@gmail.com>
2017-09-26 16:02:40 -03:00
Ilia Valiakhmetov
e59da0f7ff avcodec/vp9: Add tile threading support
Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-09-08 10:25:40 -04:00
Michael Niedermayer
d4ee767808 avcodec/vp9block: fix runtime error: signed integer overflow: 196675 * 20670 cannot be represented in type 'int'
Fixes: 1710/clusterfuzz-testcase-minimized-4837032931098624

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-05-21 15:39:07 +02:00
Ronald S. Bultje
6d0d1c4a43 vp9: split out reconstruction functions in their own source file. 2017-03-28 18:04:26 -04:00
Ronald S. Bultje
f8c019944d vp9: re-split the decoder/format/dsp interface header files.
The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.
2017-03-28 18:04:26 -04:00
Clément Bœsch
37814a21cb lavc/vp9: consistent use of typedef instead of struct 2017-03-27 21:38:21 +02:00
Clément Bœsch
875f695576 lavc/vp9: misc cosmetics
Imported from Libav
2017-03-27 21:38:21 +02:00
Clément Bœsch
12c44d6373 lavc/vp9: rename ctx to avctx
This reduces diff with Libav. It also prevents a potential confusion
between the private context and the AVCodecContext.
2017-03-27 21:38:21 +02:00
Clément Bœsch
1c9f4b5078 lavc/vp9: split into vp9{block,data,mvs}
This is following Libav layout to ease merges.
2017-03-27 21:38:21 +02:00
Anton Khirnov
fd9212f2ed Mark some arrays that never change as const. 2017-02-01 10:42:59 +01:00
Martin Storsjö
383d96aa22 aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not much
to be gained from having more spare registers here - we only
avoid having to clobber callee-saved registers.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                     ARM   AArch64
vp9_avg4_neon:                      27.2      23.7
vp9_avg8_neon:                      56.5      54.7
vp9_avg16_neon:                    169.9     167.4
vp9_avg32_neon:                    585.8     585.2
vp9_avg64_neon:                   2460.3    2294.7
vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
vp9_put4_neon:                      18.0      17.2
vp9_put8_neon:                      40.2      37.7
vp9_put16_neon:                     97.4      99.5
vp9_put32_neon/armv8:              346.0     307.4
vp9_put64_neon/armv8:             1319.0    1107.5
vp9_put_8tap_smooth_4h_neon:       126.7     118.2
vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
vp9_put_8tap_smooth_4v_neon:       113.0      86.5
vp9_put_8tap_smooth_8h_neon:       229.7     221.6
vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
vp9_put_8tap_smooth_8v_neon:       215.0     187.5
vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4

These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.

The speedup vs C code is pretty much the same as for the 32 bit
case; on the A53 it's around 6-13x for ther larger 8tap filters.
The exact speedup varies a little, since the C versions generally
don't end up exactly as slow/fast as on 32 bit.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:15:56 +02:00
Martin Storsjö
ffbd1d2b00 arm: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

The filter coefficients are signed values, where the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16 bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the resulting sum can
overflow the 16 bit signed range. Instead of accumulating in 32 bit,
we accumulate the largest product (either index 3 or 4) last with a
saturated addition.

(The VP8 MC asm does something similar, but slightly simpler, by
accumulating each half of the filter separately. In the VP9 MC
filters, each half of the filter can also overflow though, so the
largest component has to be handled individually.)

Examples of relative speedup compared to the C version, from checkasm:
                       Cortex      A7     A8     A9    A53
vp9_avg4_neon:                   1.71   1.15   1.42   1.49
vp9_avg8_neon:                   2.51   3.63   3.14   2.58
vp9_avg16_neon:                  2.95   6.76   3.01   2.84
vp9_avg32_neon:                  3.29   6.64   2.85   3.00
vp9_avg64_neon:                  3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:     3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:    3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:     5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:     6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:    6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:     9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:    7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:   6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13   9.46  13.37
vp9_put4_neon:                   1.11   1.47   1.00   1.21
vp9_put8_neon:                   1.23   2.17   1.94   1.48
vp9_put16_neon:                  1.63   4.02   1.73   1.97
vp9_put32_neon:                  1.56   4.92   2.00   1.96
vp9_put64_neon:                  2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:     3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:    3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:     5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:     5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:    6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:     9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:    6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:   6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30   9.43  14.34

For the larger 8tap filters, the speedup vs C code is around 5-14x.

This is significantly faster than libvpx's implementation of the same
functions, at least when comparing the put_8tap_smooth_64 functions
(compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from
libvpx).

Absolute runtimes from checkasm:
                          Cortex      A7        A8        A9       A53
vp9_put_8tap_smooth_64h_neon:    20150.3   14489.4   19733.6   10863.7
libvpx vpx_convolve8_horiz_neon: 52623.3   19736.4   21907.7   25027.7

vp9_put_8tap_smooth_64v_neon:    14455.0   12303.9   13746.4    9628.9
libvpx vpx_convolve8_vert_neon:  42090.0   17706.2   17659.9   16941.2

Thus, on the A9, the horizontal filter is only marginally faster than
libvpx, while our version is significantly faster on the other cores,
and the vertical filter is significantly faster on all cores. The
difference is especially large on the A7.

The libvpx implementation does the accumulation in 32 bit, which
probably explains most of the differences.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:35:38 +02:00
Martin Storsjö
2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Ronald S. Bultje
a451324ddd vp9: ignore reference segmentation map if error_resilience flag is set.
Fixes ffvp9_fails_where_libvpx.succeeds.webm.

Bug-Id: ffmpeg/3849.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:07 +02:00
Ronald S. Bultje
1730a67ab9 vp9: add frame threading
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-11 10:54:44 +02:00
Ronald S. Bultje
5b995452a6 vp9: allocate 'b', 'block/uvblock' and 'eob/uveob' dynamically.
This will be needed for frame threading.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-11 10:54:20 +02:00
Ronald S. Bultje
bc6e0b64a9 vp9: split last/cur_frame from the reference buffers.
We need more information from last/cur_frame than from reference
buffers, so we can use a simplified structure for reference buffers,
and then store mvs and segmentation map information in last/cur.

This prepares the decoder for frame threading support.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-11 10:53:13 +02:00
Diego Biurrun
b7f98659f2 Remove unnecessary get_bits.h #includes 2016-06-07 13:09:57 +02:00
Luca Barbato
312daa1589 vp9: Use the correct upper bound for seg_id
And use a macro to make apparent why the value.

Bug-Id: CID 1108595
2014-11-21 12:37:05 +00:00
Anton Khirnov
ca96e33716 vp9: drop support for real (non-emulated) edges
They are not measurably faster on x86, they might be somewhat faster on
other platforms due to missing emu edge SIMD, but the gain is not large
enough to justify the added complexity.
2014-01-09 09:43:59 +01:00
Ronald S. Bultje
72ca830f51 lavc: VP9 decoder
Originally written by Ronald S. Bultje <rsbultje@gmail.com> and
Clément Bœsch <u@pkh.me>

Further contributions by:
Anton Khirnov <anton@khirnov.net>
Diego Biurrun <diego@biurrun.de>
Luca Barbato <lu_zero@gentoo.org>
Martin Storsjö <martin@martin.st>

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-11-15 10:16:28 +01:00