The first horizontal filter can use either the i8mm or the plain neon
version, while the second part is a pure neon implementation.
Signed-off-by: Martin Storsjö <martin@martin.st>
The first horizontal filter can use either the i8mm or the plain neon
version, while the second part is a pure neon implementation.
Signed-off-by: Martin Storsjö <martin@martin.st>
For widths of 32 pixels and more, loop first horizontally,
then vertically.
Previously, this function would process a 16 pixel wide slice
of the block, looping vertically. After processing the whole
height, it would backtrack and process the next 16 pixel wide
slice.
When doing 8-tap filtering horizontally, the function must load
7 more pixels (in practice, 8) following the actual inputs, and
this was done for each slice.
By iterating first horizontally throughout each line, then
vertically, we access data in a more cache friendly order, and
we don't need to reload data unnecessarily.
Keep the original order in put_hevc_\type\()_h12_8_neon; the
only suboptimal case there is for width=24. But specializing
an optimal variant for that would require more code, which
might not be worth it.
For the h16 case, this implementation would give a slowdown,
as it now loads the first 8 pixels separately from the rest, but
for larger widths, it is a gain. Therefore, keep the h16 case
as it was (but remove the outer loop), and create a new specialized
version for horizontal looping with 16 pixels at a time.
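In C terms, the change amounts to swapping the loop nest; the sketch
below uses a made-up helper and is not the actual assembly:
    #include <stddef.h>
    #include <stdint.h>

    /* Stand-in for one 16 pixel wide horizontal 8-tap filtering step;
     * the name is made up for this sketch. */
    void filter_h_16px(uint8_t *dst, const uint8_t *src);

    /* Before: process one 16 pixel wide slice over the full height,
     * then backtrack to the next slice, re-loading the overlap pixels
     * at every slice boundary. */
    void filter_block_before(uint8_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *src, ptrdiff_t src_stride,
                             int width, int height)
    {
        for (int x = 0; x < width; x += 16)
            for (int y = 0; y < height; y++)
                filter_h_16px(dst + y * dst_stride + x,
                              src + y * src_stride + x);
    }

    /* After (width >= 32): finish each row before moving down, so the
     * source is read in a cache friendly order and the overlap pixels
     * are not reloaded once per slice. */
    void filter_block_after(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            int width, int height)
    {
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x += 16)
                filter_h_16px(dst + y * dst_stride + x,
                              src + y * src_stride + x);
    }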
Before:                   Cortex A53      A72      A73   Graviton 3
put_hevc_qpel_h16_8_neon:      710.5    667.7    692.5        211.0
put_hevc_qpel_h32_8_neon:     2791.5   2643.5   2732.0        883.5
put_hevc_qpel_h64_8_neon:    10954.0  10657.0  10874.2       3241.5
After:
put_hevc_qpel_h16_8_neon:      697.5    663.5    705.7        212.5
put_hevc_qpel_h32_8_neon:     2767.2   2684.5   2791.2        920.5
put_hevc_qpel_h64_8_neon:    10559.2  10471.5  10932.2       3051.7
Signed-off-by: Martin Storsjö <martin@martin.st>
This gets rid of a couple of instructions, but the actual performance
is almost identical on Cortex A72/A73. On Cortex A53, it is a
handful of cycles faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon
store temporary buffers on the stack. When consuming that data,
many of these functions use the stack pointer itself as an
incrementing read pointer (instead of keeping the pointer in another
register), which is rather unusual.
Technically, this is fine as long as the pointer remains properly
aligned.
However, in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm,
after incrementing sp while reading the data within each 16 pixel
wide stripe, the function would reset the stack pointer back to a
lower value in order to read the next 16 pixel wide stripe, expecting
that data to remain untouched.
That can't be assumed: data on the stack below the stack pointer can
be clobbered at any time (e.g. by a signal handler). Some OS ABIs do
allow for a small margin below sp that won't be touched, a so-called
red zone, but not all do, and the ones that do guarantee 16 or 128
bytes, not 9 KB.
Convert this function to use a separate pointer register to iterate
through the data, keeping the stack pointer pointed at the bottom of
the data that must remain untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes: signed integer overflow: 178459578 + 2009763270 cannot be represented in type 'int'
Fixes: 62285/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-5013423686287360
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
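For illustration only (not the actual OSQ change): the usual fix for
this class of report is to widen the arithmetic, since signed int
overflow is undefined behaviour in C.
    #include <stdint.h>

    /* int + int overflow is undefined behaviour; widening one operand
     * first makes the addition well-defined for all 32-bit inputs. */
    static int64_t add_wide(int a, int b)
    {
        return (int64_t)a + b;
    }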
For hardware cases where we are forced to have a fixed pool of frames
allocated up-front (such as array textures on decoder output), suggest
a possible workaround to the user if an allocation fails because the
pool is exhausted.
Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int'
Fixes: 65909/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VMIX_fuzzer-519459745831321
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
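Likewise only as an illustration of the class of fix (not the actual
VMIX change): a shift by 32 on a 32-bit type is undefined, so the
shift amount has to be bounded or the operand widened.
    #include <stdint.h>

    /* 1u << 32 is undefined for a 32-bit unsigned int; handle n >= 32
     * explicitly (or widen to uint64_t) so every input is defined. */
    static uint32_t mask_for_bits(unsigned n)
    {
        return n >= 32 ? 0xFFFFFFFFu : (1u << n) - 1;
    }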
Broken in afa471d0ef. It just happened
to work before due to x86inc.asm previously performing XMM spills in
INIT_MMX mode, which was more of a bug than an intentional feature.
Based on the AOMedia Film Grain Synthesis 1 (AFGS1) spec:
https://aomediacodec.github.io/afgs1-spec/
The parsing has been changed substantially relative to the AV1 film
grain OBU. In particular:
1. There is the possibility of maintaining multiple independent film
grain parameter sets, and decoders/players are recommended to pick
the one most appropriate for the intended display resolution (see
the toy selection sketch after this list). This could also be used
to e.g. switch between different grain profiles without having to
re-signal the appropriate coefficients.
2. Supporting this, it's possible to *predict* the grain coefficients
from previously signalled parameter sets, transmitting only the
residual.
3. When not predicting, the parameter sets are now stored as a series of
increments, rather than being directly transmitted.
4. There are several new AFGS1-exclusive fields.
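A toy sketch of how point 1 might be used by a player; the types and
the "closest resolution" heuristic are made up and do not correspond
to the FFmpeg or dav1d data structures:
    #include <stdint.h>

    typedef struct AFGS1ParamSet {
        int present;
        int width, height;    /* intended display resolution for this set */
        /* grain seed, AR coefficients, scaling LUTs, ... */
    } AFGS1ParamSet;

    /* Pick the parameter set whose intended resolution is closest in
     * area to the actual display resolution (one possible heuristic). */
    static const AFGS1ParamSet *select_param_set(const AFGS1ParamSet *sets,
                                                 int nb_sets, int w, int h)
    {
        const AFGS1ParamSet *best = NULL;
        int64_t best_err = INT64_MAX;
        for (int i = 0; i < nb_sets; i++) {
            int64_t err;
            if (!sets[i].present)
                continue;
            err = (int64_t)sets[i].width * sets[i].height - (int64_t)w * h;
            if (err < 0)
                err = -err;
            if (err < best_err) {
                best_err = err;
                best     = &sets[i];
            }
        }
        return best;
    }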
I placed this parser in its own file, rather than h2645_sei.c, since
nothing in the generic AFGS1 film grain payload is specific to T.35, and
to compartmentalize the code base.
Implementation copied wholesale from dav1d, sans SIMD, under permissive
license. This implementation was extensively verified to be bit-exact,
so it serves as a much better starting point than trying to re-engineer
this from scratch for no reason. (I also authored the original
implementation in dav1d, so any "clean room" implementation would end up
looking much the same, anyway.)
The notable changes I had to make while adapting this from the dav1d
codebase to the FFmpeg codebase include:
- reordering variable declarations to avoid triggering warnings
- replacing several inline helpers by avutil equivalents
- changing code that accesses frame metadata
- replacing raw plane copying logic by av_image_copy_plane
Apart from this, the implementation is basically unmodified.
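As an example of the last point, the plane copy boils down to the
existing avutil helper; a sketch with hypothetical parameters:
    #include <stdint.h>
    #include "libavutil/imgutils.h"

    /* Hypothetical wrapper just to show the substitution; the real
     * call sites are inside the grain synthesis code. */
    static void copy_plane(uint8_t *dst, int dst_linesize,
                           const uint8_t *src, int src_linesize,
                           int bytewidth, int height)
    {
        /* Replaces a hand-written per-row memcpy() loop as used in dav1d. */
        av_image_copy_plane(dst, dst_linesize, src, src_linesize,
                            bytewidth, height);
    }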
Not directly signalled by AV1, but we should still set this accordingly
so that users will know what the original intended video characteristics
and chroma resolution were.
Not directly signalled by AV1, but we should still set this accordingly
so that users will know what the original intended video characteristics
and chroma resolution were.
H.274 specifies that film grain parameters are signalled as intended for
4:4:4 frames, so we always signal this, regardless of the frame's actual
subsampling.
Small cleanup for style: remove redundant semicolons and fix goto
label placement; in FFmpeg, we put goto labels at brace level.
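For illustration, with a made-up function:
    #include <stdlib.h>

    static int do_something(void)
    {
        int ret   = 0;
        void *buf = malloc(16);
        if (!buf) {
            ret = -1;
            goto fail;
        }
        /* ... use buf ... */
    fail:              /* goto label at the brace level of the function body */
        free(buf);
        return ret;
    }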
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Fixes: out of array access
Fixes: 67021/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-4883576579489792
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This will allow users to pass the Android ApplicationContext, which is
mandatory to retrieve the ContentResolver responsible for
resolving/opening Android content URIs.
av_frame_side_data_get() has a const AVFrameSideData * const *sd
parameter; so calling it with an AVFrameSideData **sd like
AVCodecContext.decoded_side_data (or with an AVFrameSideData * const
*sd) is safe, but the conversion is not performed automatically
in C. All users of this function therefore resort to a cast.
This commit changes this: av_frame_side_data_get() is renamed
to av_frame_side_data_get_c(); furthermore, a static inline
wrapper for it named av_frame_side_data_get() is added
that accepts an AVFrameSideData * const * and converts this
to const AVFrameSideData * const * in a -Wcast-qual safe way.
This also allows removing the casts from the current users.
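The resulting pattern is roughly the following sketch (the exact
declarations in frame.h may differ in detail):
    const AVFrameSideData *av_frame_side_data_get_c(const AVFrameSideData * const *sd,
                                                    const int nb_sd,
                                                    enum AVFrameSideDataType type);

    static inline
    const AVFrameSideData *av_frame_side_data_get(AVFrameSideData * const *sd,
                                                  const int nb_sd,
                                                  enum AVFrameSideDataType type)
    {
        /* The intermediate cast to (const void *) is what keeps -Wcast-qual
         * quiet while const is added at both pointer levels. */
        return av_frame_side_data_get_c((const AVFrameSideData * const *)(const void *)sd,
                                        nb_sd, type);
    }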
Reviewed-by: Jan Ekström <jeebjp@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The latter need not be safe, because av_log() expects
to get a pointer to an AVClass-enabled structure
and not just a fake object. If this function were actually
called in the following way:
const AVClass *avcl = avctx->av_class;
handle_mdcv(&avcl, ...);
the AVClass's item_name callback would expect its argument to point
to an actual AVCodecContext, potentially leading to a segfault.
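A minimal illustration of the difference, with a made-up log message:
    #include "libavcodec/avcodec.h"
    #include "libavutil/log.h"

    static void log_example(AVCodecContext *avctx)
    {
        /* Fine: avctx is a real AVClass-enabled struct, so its item_name()
         * callback may legitimately cast the pointer back to AVCodecContext. */
        av_log(avctx, AV_LOG_INFO, "hello\n");

        /* Not fine ("fake object"): behind &avcl there is only a single
         * pointer-sized variable, so an item_name() that casts it to
         * AVCodecContext would read out of bounds:
         *
         *     const AVClass *avcl = avctx->av_class;
         *     av_log(&avcl, AV_LOG_INFO, "hello\n");
         */
    }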
Reviewed-by: Jan Ekström <jeebjp@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This code uses the AVBPrint API for exactly one av_bprintf()
in a scenario in which a good upper bound for the needed
size of the buffer is available (with said upper bound being
much smaller than sizeof(AVBPrint)). So one can simply use
snprintf() instead. This also avoids the (always-false due to
the current size of the internal AVBPrint buffer) check for
whether the AVBPrint is complete.
Furthermore, the old code used AV_BPRINT_SIZE_AUTOMATIC,
which implies that the AVBPrint buffer will never be
(re)allocated, and yet it used av_bprint_finalize().
This call has of course also been removed.
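The before/after shape of such a change, sketched with a made-up
message and size bound (the real code differs):
    #include <stdio.h>
    #include "libavutil/bprint.h"
    #include "libavutil/log.h"

    /* Before: a full AVBPrint (with its large internal buffer) for a
     * single av_bprintf() whose output size is trivially bounded. */
    static void log_value_bprint(void *logctx, int value)
    {
        AVBPrint bp;
        av_bprint_init(&bp, 0, AV_BPRINT_SIZE_AUTOMATIC);
        av_bprintf(&bp, "value: %d", value);
        if (av_bprint_is_complete(&bp))  /* always true for such small output */
            av_log(logctx, AV_LOG_INFO, "%s\n", bp.str);
        av_bprint_finalize(&bp, NULL);   /* unnecessary: nothing was allocated */
    }

    /* After: a plain buffer sized to a known upper bound. */
    static void log_value_snprintf(void *logctx, int value)
    {
        char buf[32];                    /* ample for "value: " plus an int */
        snprintf(buf, sizeof(buf), "value: %d", value);
        av_log(logctx, AV_LOG_INFO, "%s\n", buf);
    }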
Reviewed-by: Jan Ekström <jeebjp@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AVCodecContext extradata should be an AVCDecoderConfigurationRecord
when the bitstream format is avcc. Simply concatenating the NALUs
output by x264_encoder_headers does not form a standard
AVCDecoderConfigurationRecord. The following command generated a
broken file before this patch:
ffmpeg -i foo.mp4 -c:v libx264 -x264-params annexb=0 bar.mp4
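For reference, the AVCDecoderConfigurationRecord layout (ISO/IEC
14496-15) that the extradata needs to contain looks roughly like the
sketch below; this is the record layout, not the code added by the
patch:
    #include <stdint.h>
    #include <string.h>

    /* Build an AVCDecoderConfigurationRecord from a single SPS and PPS
     * (raw NAL units, no Annex B start codes), without error checking. */
    static int write_avcc(uint8_t *out, const uint8_t *sps, int sps_len,
                          const uint8_t *pps, int pps_len)
    {
        uint8_t *p = out;
        *p++ = 1;        /* configurationVersion */
        *p++ = sps[1];   /* AVCProfileIndication */
        *p++ = sps[2];   /* profile_compatibility */
        *p++ = sps[3];   /* AVCLevelIndication */
        *p++ = 0xFF;     /* 6 bits reserved | lengthSizeMinusOne = 3 */
        *p++ = 0xE1;     /* 3 bits reserved | numOfSequenceParameterSets = 1 */
        *p++ = sps_len >> 8;
        *p++ = sps_len & 0xFF;
        memcpy(p, sps, sps_len);
        p += sps_len;
        *p++ = 1;        /* numOfPictureParameterSets */
        *p++ = pps_len >> 8;
        *p++ = pps_len & 0xFF;
        memcpy(p, pps, pps_len);
        p += pps_len;
        return (int)(p - out);
    }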
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Many users mistakenly think that a hwaccel is an instance of a
decoder, and cannot find the corresponding decoder name in the logs.
Log the hwaccel name so users know that the hwaccel has taken effect.
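The added log might look roughly like this (illustrative only; the
real message text, log level and call site may differ):
    #include "libavcodec/avcodec.h"
    #include "libavutil/log.h"

    static void log_hwaccel(AVCodecContext *avctx)
    {
        if (avctx->hwaccel)
            av_log(avctx, AV_LOG_VERBOSE, "Using hwaccel %s\n",
                   avctx->hwaccel->name);
    }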
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>