The penultimate loop iteration could pick any vl such that:
vlenb/4 < vl <= vlenb/2
Thus if the total length is not a multiple of vlenb/2, the vfadd.vf
on the penultimate iteration would yield corrupt values for the last
iteration.
To avoid this, force vl = vlen/2 until the last iteration. Unfortunately
this latent bug is not reproducible with either hardware or QEMU as of now.
This skips the round-trip to scalar register for the sliding 'x'
coefficients, improving performance by about 5%. The trick here is that
the vector slide-up instruction preserves elements in destination vector
until the slide offset.
The switch from vfslide1up.vf to vslideup.vi also allows the elimination
of data dependencies on consecutive slides. Since the specifications
recommend sticking to power of two offsets, we could slide as follows:
vslideup.vi v8, v0, 2
vslideup.vi v4, v0, 1
vslideup.vi v12, v8, 1
vslideup.vi v16, v8, 2
However in the device under test, this seems to make performance slightly
worse, so this is left for (in)validation with future better hardware.
Long is 32 bits signed on Windows, and nb_stream{s,_groups} are both unsigned
int. In a realistic scenario this wont make a difference, but it's still
proper.
Also ensure the parsed string is an integer while at it.
Signed-off-by: James Almer <jamrial@gmail.com>
It caused lacking a public declaration build error with
-Werror=missing-prototypes.
Since DXGI_FORMAT is moved to public since patch set V10, this function
is no longer useful. Now remove it.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This fixes the following build error:
src/libavcodec/d3d12va_decode.c:49:10: error: no previous prototype for function
'ff_d3d12va_get_surface_index' [-Werror,-Wmissing-prototypes]
49 | unsigned ff_d3d12va_get_surface_index(const AVCodecContext *avctx,
| ^
Signed-off-by: Martin Storsjö <martin@martin.st>
This disables regenerating ffversion.h whenever the checked out
git commit changes, speeding up development rebuilds.
Whenever this option is set, force the version to be printed as
"unknown" rather than showing potentially stale information.
Signed-off-by: Martin Storsjö <martin@martin.st>
Same as d3d11va, this flag enables main still picture profile for
d3d12va. User should add this flag when decoding main still picture
profile.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
The implementation is based on:
https://learn.microsoft.com/en-us/windows/win32/medfound/direct3d-12-video-overview
With the Direct3D 12 video decoding support, we can render or process
the decoded images by the pixel shaders or compute shaders directly
without the extra copy overhead, which is beneficial especially if you
are trying to render or post-process a 4K or 8K video.
The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
In close_output(), a dummy frame is created with format NONE passed
to enc_open(), which isn't prepared for it. The NULL pointer
dereference happened at
av_pix_fmt_desc_get(enc_ctx->pix_fmt)->comp[0].depth.
When fgt.graph is NULL, skip fg_output_frame() since there is
nothing to output.
frame #0: 0x0000005555bc34a4 ffmpeg_g`enc_open(opaque=0xb400007efe2db690, frame=0xb400007efe2d9f70) at ffmpeg_enc.c:235:44
frame #1: 0x0000005555bef250 ffmpeg_g`enc_open(sch=0xb400007dde2d4090, enc=0xb400007e4e2daad0, frame=0xb400007efe2d9f70) at ffmpeg_sched.c:1462:11
frame #2: 0x0000005555bee094 ffmpeg_g`send_to_enc(sch=0xb400007dde2d4090, enc=0xb400007e4e2daad0, frame=0xb400007efe2d9f70) at ffmpeg_sched.c:1571:19
frame #3: 0x0000005555bee01c ffmpeg_g`sch_filter_send(sch=0xb400007dde2d4090, fg_idx=0, out_idx=0, frame=0xb400007efe2d9f70) at ffmpeg_sched.c:2154:12
frame #4: 0x0000005555bcf124 ffmpeg_g`close_output(ofp=0xb400007e4e2d85b0, fgt=0x0000007d1790eb08) at ffmpeg_filter.c:2225:15
frame #5: 0x0000005555bcb000 ffmpeg_g`fg_output_frame(ofp=0xb400007e4e2d85b0, fgt=0x0000007d1790eb08, frame=0x0000000000000000) at ffmpeg_filter.c:2317:16
frame #6: 0x0000005555bc7e48 ffmpeg_g`filter_thread(arg=0xb400007eae2ce7a0) at ffmpeg_filter.c:2836:15
frame #7: 0x0000005555bee568 ffmpeg_g`task_wrapper(arg=0xb400007d8e2db478) at ffmpeg_sched.c:2200:21
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
From AOSP doc, these values are device and codec specific, but lower
values generally result in more efficient (smaller-sized) encoding.
For example, global_quality 50 on Pixel 6 results a 1080P 30 FPS
HEVC with 3744 kb/s, while global_quality 80 results 28178 kb/s.
Fix#10689
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The ffmpeg coding style doesn't usually use const on scalar
parameters (or on the pointer values - as opposed to the type
that is pointed to, where it has a semantic meaning), contrary
to the dav1d coding style (where this was imported from).
This avoids warnings about differences in the type signatures
between declaration and definition of this function, with older
versions of MSVC.
The issue was observed with one version of MSVC 2017,
19.16.27024.1, with warnings like these:
src/tests/checkasm/checkasm.c(969): warning C4028: formal parameter 3 different from declaration
The warning itself is bogus as the const here is harmless, and
newer versions of MSVC no longer warn about this.
Signed-off-by: Martin Storsjö <martin@martin.st>
This can be used to run tests multple times, with e.g. differing
QEMU settings, by adding something like this to the FATE configuration
file:
target_exec="qemu-aarch64-static"
fate_targets="fate-checkasm fate-cpu"
fate_environments="sve128 sve256 sve512"
sve128_env="QEMU_CPU=max,sve128=on"
sve256_env="QEMU_CPU=max,sve256=on"
sve512_env="QEMU_CPU=max,sve512=on"
It's also possible to customize the target_exec command further
by injecting a sufficiently quoted variable into it, which then can
be updated for each run, e.g. target_exec="\$(CUR_EXEC_CMD)".
For each of the environment names in fate_environments, the tests
that are run get the name suffixed on the fate tests in the
test log and fate report, e.g. "fate-checkasm-h264dsp_sve128".
Signed-off-by: Martin Storsjö <martin@martin.st>
This can be useful if doing testing of uncommon CPU extensions by
running tests with QEMU (by configuring with e.g.
"target_exec=qemu-aarch64"), by only running the checkasm tests,
to get a reasonable test coverage without excessive test runtime.
For such a config, setting fate_targets="fate-checkasm fate-cpu"
can be a good tradeoff.
Signed-off-by: Martin Storsjö <martin@martin.st>
Reduce false positives for VVC files by adding additional checks in
`vvc_probe`. Specifically, `nuh_temporal_id_plus1` is tested for valid
values in extra cases depending on the NAL unit type, as per ITU-T H.266
section 7.4.2.2.
Resolves trac #10703.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Converting from an integer to HWND (which is a pointer) requires
an explicit cast, otherwise Clang errors out like this:
src/libavdevice/gdigrab.c:280:14: error: incompatible integer to pointer conversion assigning to 'HWND' (aka 'struct HWND__ *') from 'long' [-Wint-conversion]
280 | hwnd = strtol(name, &p, 0);
| ^ ~~~~~~~~~~~~~~~~~~~
(With GCC and MSVC, this was a mere warning, but with recent Clang,
this is an error.)
After adding a cast, all compilers also warn something like this:
src/libavdevice/gdigrab.c:280:16: warning: cast to 'HWND' (aka 'struct HWND__ *') from smaller integer type 'long' [-Wint-to-pointer-cast]
280 | hwnd = (HWND) strtol(name, &p, 0);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
On Windows, long types are 32 bit, so to get a usable pointer, we
need to use long long. And interpret it as unsigned long long
while at it - i.e. using strtoull.
Finally, right above it, the code triggered the following warning:
src/libavdevice/gdigrab.c:278:15: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
278 | char *p;
| ^
Signed-off-by: Martin Storsjö <martin@martin.st>
With the way the runtime checks are currently set up, every single
openh264 release, no matter how minor, is considered an ABI break and
requires ffmpeg recompilation. This is unnecessarily strict because it
doesn't allow downstream distributions to ship any openh264 bug fix
version updates without breaking ffmpeg's openh264 support.
Years ago, at the time when ffmpeg's openh264 support was merged,
openh264 releases were done without a versioned soname (the library was
just libopenh264.so, unversioned). Since then, starting with version
1.3.0, openh264 has started using versioned sonames and the intent has
been to bump the soname every time there's a new release with an ABI
change.
This patch drops the exact version check and instead adds a minimum
requirement on 1.3.0 to the configure script.
Signed-off-by: Kalev Lember <klember@redhat.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
During a resampling operation where
1) user has specified first_pts
2) SWR_FLAG_RESAMPLE is not set initially (directly or otherwise)
3) first_pts has been fulfilled (always using hard compensation)
then upon first encountering a delay where a soft compensation is
required, swr_set_compensation will lead to another init of swr which
will reset outpts to the specified firstpts thus leading to an output
frame having its pts = firstpts. When the next input frame is received,
swr will see a large delay and inject silence from firstpts to the
current frame's pts. This can lead to severe desync and in worst case,
loss of audio playback.
Parameter firstpts initialized to AV_NOPTS_VALUE in swr_alloc and then
checked in swr_init to avoid resetting outpts, thus avoiding reapplication
of firstpts.
Fixes#4131.