This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter
sobel_c: 4537
sobel_avx512icl 2136
Signed-off-by: bwang30 <bin.wang@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
block_index is write-only for the H.261 decoder, so
don't update it by calling ff_update_block_index().
Instead use a function of our own to set/update dest.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also remove some internal.h inclusions which have been
unnecessarily added recently.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The max == 0 case can be removed too but i left it as 50% of the cases use it
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 22 * -2107998208 cannot be represented in type 'int'
Fixes: 51363/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BONK_fuzzer-5660734784143360
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Due to the lack of an active AMF maintainer at the moment, as well
as plans to add the av1 encoder and other improvements of AMF,
I added myself to the maintainers. Timely review and merging
patches targeting AMF integration should improve support
of AMD GPUs and APUs in FFmpeg.
For the last couple of years I have been working on AMF related
patches to ffmpeg and other open source projects.
added a new option 'a53cc' (on by default, as in libx264) for rendering
AV_FRAME_DATA_A53_CC as hevc sei payloads.
the code is a blend of the libx265.c code for writing
AV_FRAME_DATA_SEI_UNREGISTERED with the libx264.c code for writing atsc
a/53 payloads.
Up until now, the ClearVideo decoder separates parsing tiles
and actually using the parsed information: The information is
instead stored in structures which are constantly allocated
and freed. This commit changes this to use the information
immediately, avoiding said allocations. This e.g. reduced
the amount of allocations for [1] from 2,866,462 to 24,720.
For said sample decoding speed improved by 143%.
[1]: https://samples.ffmpeg.org/V-codecs/UCOD/AccordianDance-300.avi
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
postProcess in postprocess_template.c copies a PPContext
to the stack, works with this copy and then copies
it back again. Said local copy uses a hardcoded alignment
of eight, although PPContext has alignment 32 since
cbe27006ce
(this commit was in anticipation of AVX2 code that never landed).
This leads to misalignment in the filter-(pp|pp1|pp2|pp3|qp)
FATE-tests which UBSan complains about. So avoid the local copy.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
postprocess.c currently has C, MMX, MMXEXT, 3DNow as well as
SSE2 versions of its internal functions. But given that only
ancient 32-bit x86 CPUs don't support SSE2, the MMX, MMXEXT
and 3DNow versions are obsolete and are therefore removed by
this commit. This saves about 56KB here.
(The SSE2 version in particular is not really complete,
so that it often falls back to MMXEXT (which means that
there were some identical (apart from the name) MMXEXT
and SSE2 functions; this duplication no longer exists
with this commit.)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The input data is multiplied by `s->offset` to get normalized output.
`s->target_tp` and `true_peak` is not in dB,
so `s->offset` should be calculated by division instead of subtraction.
Signed-off-by: Rui Zhu <real.zhurui@gmail.com>
Should fix fate failures on Windowx x86 targets, where long is 32 bits.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: James Almer <jamrial@gmail.com>
The amount of lines printed is too high for the verbose level, and the debug
level is a better fit for their content.
Signed-off-by: James Almer <jamrial@gmail.com>