texture() uses bilinear scaling; imageLoad() accesses the image directly.
The reason why texture() was used throughout Vulkan filters is that
back when they were written, they were targetting old Intel hardware,
which had a texel cache only for sampled images.
These days, GPUs have a generic cache that doesn't care what source it
gets populated with. Additionally, bypassing the sampling circuitry saves
us some performance.
Finally, all the old texture() code had an issue where unnormalized
coordinates were used, but an offset of 0.5 was not added, hence each
pixel ended up being interpolated. This fixes this.
Remove a for loop and make it easy to extend to support other types
of scalability. Move ScalabilityMask to hevc header file so it can
be used in hevc decoder.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The limit is based on later code storing 32bits
Fixes: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
Fixes: 393164866/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_LATM_fuzzer-4606798354513920
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
types and SFO become confused for a USAC stream
Fixes: out of array access
Fixes: 383854203/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_LATM_fuzzer-4996677847547904.fuzz
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This commit adds two logging flags: 'time' and 'datetime'.
Usage:
ffmpeg -loglevel +time
or
ffmpeg -loglevel +datetime
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In the fail: block of decode_nal_units, a check as to whether fc->ref is
nonzero is used. Before this patch, fc->ref was set to NULL in
frame_context_setup. The issue is that, by the time frame_context_setup
is called, falliable functions (namely slices_realloc and
ff_vvc_decode_frame_ps) have already been called. Therefore, there
could arise a situation in which the fc->ref test of decode_nal_units'
fail: block is performed while fc->ref has an invalid value. This seems
to be particularly prevalent in situations where the FrameContexts are
being reused. The patch resolves the issue by moving the assignment of
fc->ref to NULL to the very top of decode_nal_units, before any falliable
functions are called.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Parameter sets may be coded in the packet before an IRAP (as is the case for
the hev1 ISO-BMFF brand), and they should have priority as they may override
the extradata ones.
As such, prepend the extradata PS NALUs to the packet PS NALUs if they are
present before an IRAP, instead of prepending them to the IRAP slice.
Should fix ticket #11458.
Signed-off-by: James Almer <jamrial@gmail.com>
On a Zen 5, on Ubuntu 24.04 (with CLOCKS_PER_SEC 1000000), the
value of clock() in this loop increments by 0 most of the time,
and when it does increment, it usually increments by 1 compared
to the previous round.
Due to the "last_t + 2*last_td + (CLOCKS_PER_SEC > 1000) >= t"
expression, we only manage to take one step forward in this loop
(incrementing i) if clock() increments by 2, while it incremented
by 0 in the previous iteration (last_td).
This is similar to the change done in
c4152fc42e, to speed it up on
systems with very small CLOCKS_PER_SEC. However in this case,
CLOCKS_PER_SEC is still very large, but the machine is fast enough
to hit every clock increment repeatedly.
For this case, use the number of repetitions of each timer value
as entropy source; require a change in the number of repetitions
in order to proceed to the next buffer index.
This helps the fate-random-seed test to actually terminate within
a reasonable time on such a system (where it previously could hang,
running for many minutes).
Signed-off-by: Martin Storsjö <martin@martin.st>
t_info.keyframe_granule_shift is set to the library default of 6, which is ok
for gop sizes up to 63. Since there's apparently no way to query the updated
value after having forced a gop value with TH_ENCCTL_SET_KEYFRAME_FREQUENCY_FORCE,
calculate it manually instead.
Fixes ticket #11454.
Signed-off-by: James Almer <jamrial@gmail.com>
This also ensures the layout set during the indev init is used instead of the
blank one in st->codecpar.
Signed-off-by: James Almer <jamrial@gmail.com>
Some files keep extra metadata such as 'name' fields within udta, and
it is useful for Wine to access them with the "export_all" option so
they can then be exposed to Windows applications.
Signed-off-by: Rémi Bernon <rbernon@codeweavers.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
4b77a0a681 added a new consumer of ff_scale_adjust_dimensions
which was recently modified to allow for square pixel output.
This commit extends the new option to vpp_amf, and unbreaks the building
of vf_amf_common.c
This change removes one extra floating point operation and simplifies
load operations at the beginning of the loop by using dedicated register
for each of the 5 pointers and interleaving it with calculations. The
first case seems to be a bit slower, but the performance increase is
substantial in the other two.
A78 before:
postfilter_15_neon: 1684.8 ( 4.23x)
postfilter_512_neon: 1395.5 ( 5.10x)
postfilter_1022_neon: 1357.0 ( 5.25x)
After:
postfilter_15_neon: 1742.2 ( 4.09x)
postfilter_512_neon: 1169.8 ( 6.09x)
postfilter_1022_neon: 1160.0 ( 6.12x)
A72 before:
postfilter_15_neon: 3144.8 ( 2.39x)
postfilter_512_neon: 3141.2 ( 2.39x)
postfilter_1022_neon: 3230.0 ( 2.33x)
After:
postfilter_15_neon: 2847.8 ( 2.64x)
postfilter_512_neon: 2877.8 ( 2.61x)
postfilter_1022_neon: 2837.2 ( 2.65x)
x13s before:
postfilter_15_neon: 1615.4 ( 2.61x)
postfilter_512_neon: 963.1 ( 4.39x)
postfilter_1022_neon: 963.6 ( 4.39x)
After:
postfilter_15_neon: 1749.6 ( 2.41x)
postfilter_512_neon: 707.1 ( 5.97x)
postfilter_1022_neon: 706.1 ( 5.99x)
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously, we read elements from ff_aac_pow34sf_tab; however
that table is initialized to zero; one needs to call
ff_aac_float_common_init() to make sure that the table is
initialized.
However, given the range of the input values, a large number of
entries in ff_aac_pow34sf_tab would give results outside of the
range for signed 32 bit integers. As the largest aac_cb_maxval
entry is 16, it seems more reasonable to produce values within
an order of mangitude of that value.
(When hitting INT_MIN, implementations may end up with different
results depending on whether the value is negated as a float or
as an int. This corner case is irrelevant in practice as this
is way outside of the expected value range here.)
Coincidentally, this fixes linking checkasm with Apple's older
linker. (In Xcode 15, Apple switched to a new linker. The one in
older toolchains seems to have a bug where it won't figure out to
load object files from a static library, if the only symbol
referenced in the object file is a "common" symbol, i.e. one for
a zero-initialized variable. This issue can also be reproduced with
newer Apple toolchains by passing -Wl,-ld_classic to the linker.)
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously, we would do OR with the sign bit, forcing the output
to a negative value, while we want to negate it, by inverting the
sign bit.
Signed-off-by: Martin Storsjö <martin@martin.st>
For anamorphic videos, enabling this option leads to adjustment of
output dimensions to obtain square pixels when the user requests
proportional scaling through either of the w/h expressions or
force_original_aspect_ratio.
Output SAR is always reset to 1.
Option added to scale, scale_cuda, scale_npp & scale_vaapi.
libplacebo already has a similar option with different semantics,
scale_vt and scale_vulkan don't implement force_oar, so for these
three filters, I've made minimal changes needed to not break building
or change output.
IVTV, a Linux driver for TV tuners, and V4L2 utilize
a coding named after IVTV to carry sliced VBI data
in MPEG streams produced by tuner cards with
VBI capture capability and an MPEG-2 encoder SoC.
IVTV or V4L2 driver will transport the coded data into a
MPEG-PS private stream ("IVTV") that can be captured
from the card alongside the video/audio.
The data could include:
EIA-608, Teletext, WSS (PAL widescreen signaling),
or VPS (PAL VCR signaling).
Signed-off-by: Marth64 <marth64@proxyid.net>
Currently, if there is a hardware encode failure, the numeric
error code will be printed making it somewhat hard to get to
the root cause of the issue. Print the readable message generated
by av_err2str() instead.
Signed-off-by: Marth64 <marth64@proxyid.net>
If a malformed chunk like sBIT appears but otherwise the stream
is still parseable, we should print a warning and skip it rather
than failing with an error.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Using audio_substream_id for AVStream ids is not ideal give that in containers
like mp4, the IAMF structure is opaque to the outside and other streams may
share such id values.
Signed-off-by: James Almer <jamrial@gmail.com>