Theora signals "Output last frame again" with an empty packet.
To keep the behaviour of 18f24527eb of ignoring side data only packets, as
generated by the FLAC encoder to propagate updated extradata, also check for
pkt->side_data_size to choose wheter to mux the paket or not.
Fixes part of ticket #11451.
Signed-off-by: James Almer <jamrial@gmail.com>
If sc->tts_count is not 0, then the sample index has already been built.
Fixes: Null-dereference READ
Fixes: 396192874/clusterfuzz-testcase-minimized-audio_decoder_fuzzer-4589309789143040
Signed-off-by: James Almer <jamrial@gmail.com>
The scalar loop is replaced with masked AVX512 instructions.
For extracting the Y from UYVY, vperm2b is used instead of
various AND and packuswb.
Instead of loading the vectors with interleaved lanes as done
in AVX2 version, normal load is used. At the end of packuswb,
for U and V, an extra permute operation is done to get the
required layout.
AMD 7950x Zen 4 benchmark data:
uyvytoyuv422_c: 29105.0 ( 1.00x)
uyvytoyuv422_sse2: 3888.0 ( 7.49x)
uyvytoyuv422_avx: 3374.2 ( 8.63x)
uyvytoyuv422_avx2: 2649.8 (10.98x)
uyvytoyuv422_avx512icl: 1615.0 (18.02x)
Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
texture() uses bilinear scaling; imageLoad() accesses the image directly.
The reason why texture() was used throughout Vulkan filters is that
back when they were written, they were targetting old Intel hardware,
which had a texel cache only for sampled images.
These days, GPUs have a generic cache that doesn't care what source it
gets populated with. Additionally, bypassing the sampling circuitry saves
us some performance.
Finally, all the old texture() code had an issue where unnormalized
coordinates were used, but an offset of 0.5 was not added, hence each
pixel ended up being interpolated. This fixes this.
Remove a for loop and make it easy to extend to support other types
of scalability. Move ScalabilityMask to hevc header file so it can
be used in hevc decoder.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The limit is based on later code storing 32bits
Fixes: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
Fixes: 393164866/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_LATM_fuzzer-4606798354513920
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
types and SFO become confused for a USAC stream
Fixes: out of array access
Fixes: 383854203/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_LATM_fuzzer-4996677847547904.fuzz
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This commit adds two logging flags: 'time' and 'datetime'.
Usage:
ffmpeg -loglevel +time
or
ffmpeg -loglevel +datetime
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In the fail: block of decode_nal_units, a check as to whether fc->ref is
nonzero is used. Before this patch, fc->ref was set to NULL in
frame_context_setup. The issue is that, by the time frame_context_setup
is called, falliable functions (namely slices_realloc and
ff_vvc_decode_frame_ps) have already been called. Therefore, there
could arise a situation in which the fc->ref test of decode_nal_units'
fail: block is performed while fc->ref has an invalid value. This seems
to be particularly prevalent in situations where the FrameContexts are
being reused. The patch resolves the issue by moving the assignment of
fc->ref to NULL to the very top of decode_nal_units, before any falliable
functions are called.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Parameter sets may be coded in the packet before an IRAP (as is the case for
the hev1 ISO-BMFF brand), and they should have priority as they may override
the extradata ones.
As such, prepend the extradata PS NALUs to the packet PS NALUs if they are
present before an IRAP, instead of prepending them to the IRAP slice.
Should fix ticket #11458.
Signed-off-by: James Almer <jamrial@gmail.com>
On a Zen 5, on Ubuntu 24.04 (with CLOCKS_PER_SEC 1000000), the
value of clock() in this loop increments by 0 most of the time,
and when it does increment, it usually increments by 1 compared
to the previous round.
Due to the "last_t + 2*last_td + (CLOCKS_PER_SEC > 1000) >= t"
expression, we only manage to take one step forward in this loop
(incrementing i) if clock() increments by 2, while it incremented
by 0 in the previous iteration (last_td).
This is similar to the change done in
c4152fc42e, to speed it up on
systems with very small CLOCKS_PER_SEC. However in this case,
CLOCKS_PER_SEC is still very large, but the machine is fast enough
to hit every clock increment repeatedly.
For this case, use the number of repetitions of each timer value
as entropy source; require a change in the number of repetitions
in order to proceed to the next buffer index.
This helps the fate-random-seed test to actually terminate within
a reasonable time on such a system (where it previously could hang,
running for many minutes).
Signed-off-by: Martin Storsjö <martin@martin.st>
t_info.keyframe_granule_shift is set to the library default of 6, which is ok
for gop sizes up to 63. Since there's apparently no way to query the updated
value after having forced a gop value with TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE,
calculate it manually instead.
Fixes ticket #11454.
Signed-off-by: James Almer <jamrial@gmail.com>
This also ensures the layout set during the indev init is used instead of the
blank one in st->codecpar.
Signed-off-by: James Almer <jamrial@gmail.com>
Some files keep extra metadata such as 'name' fields within udta, and
it is useful for Wine to access them with the "export_all" option so
they can then be exposed to Windows applications.
Signed-off-by: Rémi Bernon <rbernon@codeweavers.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
4b77a0a681 added a new consumer of ff_scale_adjust_dimensions
which was recently modified to allow for square pixel output.
This commit extends the new option to vpp_amf, and unbreaks the building
of vf_amf_common.c
This change removes one extra floating point operation and simplifies
load operations at the beginning of the loop by using dedicated register
for each of the 5 pointers and interleaving it with calculations. The
first case seems to be a bit slower, but the performance increase is
substantial in the other two.
A78 before:
postfilter_15_neon: 1684.8 ( 4.23x)
postfilter_512_neon: 1395.5 ( 5.10x)
postfilter_1022_neon: 1357.0 ( 5.25x)
After:
postfilter_15_neon: 1742.2 ( 4.09x)
postfilter_512_neon: 1169.8 ( 6.09x)
postfilter_1022_neon: 1160.0 ( 6.12x)
A72 before:
postfilter_15_neon: 3144.8 ( 2.39x)
postfilter_512_neon: 3141.2 ( 2.39x)
postfilter_1022_neon: 3230.0 ( 2.33x)
After:
postfilter_15_neon: 2847.8 ( 2.64x)
postfilter_512_neon: 2877.8 ( 2.61x)
postfilter_1022_neon: 2837.2 ( 2.65x)
x13s before:
postfilter_15_neon: 1615.4 ( 2.61x)
postfilter_512_neon: 963.1 ( 4.39x)
postfilter_1022_neon: 963.6 ( 4.39x)
After:
postfilter_15_neon: 1749.6 ( 2.41x)
postfilter_512_neon: 707.1 ( 5.97x)
postfilter_1022_neon: 706.1 ( 5.99x)
Signed-off-by: Martin Storsjö <martin@martin.st>