This allows JPEG XL images to be recognized as valid attachments.
Since JPEG is already widely used for cover art, JXL's support for
lossless JPEG transcodes can decrease the total size of music collections.
This fixes JXL cover art rendering in applications like mpv which rely
on FFmpeg for demuxing.
Signed-off-by: jade <heartstopp1ng@proton.me>
mf_encv_input_adjust() currently only validates the pixel format and
otherwise leaves the input IMFMediaType unchanged. The Microsoft
H.264, H.265 and AV1 encoder MFTs tolerate this and internally infer
the missing attributes from the previously-set output type. Other
MediaFoundation encoder MFTs that follow the specification more
strictly reject the input type with MF_E_INVALIDMEDIATYPE (due to
MF_E_ATTRIBUTENOTFOUND on MF_MT_FRAME_SIZE / MF_MT_FRAME_RATE) when
those attributes are absent, which causes IMFTransform::SetInputType
to fail and aborts encoding.
Set MF_MT_FRAME_SIZE, MF_MT_FRAME_RATE and MF_MT_INTERLACE_MODE on
the input media type, mirroring what mf_encv_output_adjust() already
writes to the output type. Behaviour on the Microsoft MFTs is
unchanged (they were already using these values) and encoding now
works with stricter third-party MFTs.
The MF_MT_FRAME_SIZE assignment has been present but commented out
since the original MediaFoundation wrapper was added in 050b72ab5e.
Signed-off-by: Ashrit Shetty <ashritshetty@microsoft.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Monochrome formats (gray, gray10le) have log2_chroma_w == 0 and
log2_chroma_h == 0 because they have no chroma planes — the same
values as YUV444. This caused them to be misclassified as YUV444 by
the is_yuv444 detection introduced in bcea693f75.
After fed6612415 changed cuvid_test_capabilities to use is_yuv444
instead of hardcoding cudaVideoChromaFormat_420, monochrome AV1
streams were rejected with "Codec av1_cuvid is not supported with
this chroma format".
Add an nb_components > 1 guard to exclude single-component formats
from the YUV444 path.
Patch by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
AV1CodecConfigurationRecord may contain only the 4-byte header and no
configOBUs. Still skip the header in that case so only configOBUs are
passed to cuvidParseVideoData().
Otherwise the av1C header itself is treated as sequence header data
and AV1 decoding can fail with an unknown error.
Suggested-by: Aniket Dhok <adhok@nvidia.com>
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
3-tap [1,2,1]>>2: shared implementation body across size-specialized
entry points (8x8/16x16/32x32) to reduce code size. Fold the 3-tap
kernel into uhadd + urhadd: uhadd gives floor((prev+next)/2), then
urhadd rounds with curr to produce (prev + 2*curr + next + 2) >> 2
on 16 bytes in-place (no widen/narrow needed). Overlap-last technique
for tail avoids partial stores. Caller pads input arrays by 16 bytes
to guarantee safe over-read.
Strong smoothing (32x32): preloaded weight tables, interleaved
umull/umlal pairs (two 16-byte blocks at a time) to hide
rshrn-to-store latency, with paired st1 for 32-byte writes.
checkasm --bench --runs=15 (Apple M4, average of 3 trials):
ref_filter_3tap_8x8_8_neon: 4.1x
ref_filter_3tap_16x16_8_neon: 3.3x
ref_filter_3tap_32x32_8_neon: 2.5x
ref_filter_strong_8_neon: 1.9x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Test 3-tap for 8x8/16x16/32x32 (both filtered_left and
filtered_top outputs). Test strong smoothing for filtered_top
and in-place left modification.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Extract 3-tap [1,2,1]>>2 and strong intra smoothing from
intra_pred() into HEVCPredContext function pointers, preparing
for arch-specific overrides.
ref_filter_3tap[3] indexed by log2_size - 3 (sizes 8/16/32).
ref_filter_strong for 32x32 luma only.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Deprecating MMX w/o performance regression; nearly identical performance
numbers on my Zen 4 (1.99x vs c)
Signed-off-by: Zuxy Meng <zuxy.meng@gmail.com>
This patch adds the transpose_cuda video filter.
It's similar to the existing transpose filter but accelerated by CUDA.
It supports the same pixel formats as the scale_cuda filter.
This also supersedes the deprecated transpose_npp filter.
Example usage:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i <INPUT> -vf "transpose_cuda=dir=clock" <OUTPUT>
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Add a top-level title and demote former section headings (MD041-style hierarchy).
Add blank lines around headings and fenced code blocks where appropriate (MD022 and MD031-style). Some Markdown parsers, including kramdown, only recognize headings that are preceded by a blank line.
Use a top-level heading on the first line (MD041-style) and adjust section levels for clearer document structure. Improves navigation for assistive technologies that rely on heading outlines.
This muxer seems to intend to support output that does
not begin at zero (instead of e.g. just hardcoding
nb_frames_pos to 16). But then it is possible
that avio_seek() returns values > INT_MAX even
though the part of the file written by us can not
exceed this value. So the return value of avio_seek()
needs to be checked as 64bit integer and not silently
truncated to int.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The number of streams is always one (namely one video stream
with codec id AV_CODEC_ID_PDV) due to the MAX_ONE_OF_EACH,
ONLY_DEFAULT_CODECS flags. Also, the generic code (init_muxer()
in mux.c) checks that video streams have proper dimensions set.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This adds a NEON-optimized function for computing 32x32 Sum of Absolute
Differences (SAD) on AArch64, addressing a gap where x86 had SSE2/AVX2
implementations but AArch64 lacked equivalent coverage.
The implementation mirrors the existing sad8 and sad16 NEON functions,
employing a 4-row unrolled loop with UABAL and UABAL2 instructions for
efficient load-compute interleaving, and four 8x16-bit accumulators to
handle the wider 32-byte rows.
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge) using checkasm:
sad_32x32_0: C 146.4 cycles -> NEON 98.1 cycles (1.49x speedup)
sad_32x32_1: C 141.4 cycles -> NEON 98.9 cycles (1.43x speedup)
sad_32x32_2: C 140.7 cycles -> NEON 95.0 cycles (1.48x speedup)
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
This incorrectly lists the libavcodec major version as 60 instead of
62. Also fix the date and commit hash while at it
Fixes: 7faa6ee2aa ("libavformat/matroska: Support smpte 2094-50 metadata")
Signed-off-by: llyyr <llyyr.public@gmail.com>
This change should improve performance on Skylake and later
Intel CPUs (which have only half the ports for saturated adds/subs
for mmx register compared to xmm register): llvm-mca predicts
a 25% performance improvement on Skylake.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>