The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments wit unit stride, then narrow to 16 segments with
right shifts, we can only get one half-size vector per segment, or just 2
elements per vector (EMUL=1/2) - at least with 128-bit vectors.
This ends up unsurprisingly about as fas as the C code.
2) The current approach is to load with strides. We keep that approach,
but improve it using three 4-segmented loads instead of 12 single-segment
loads. This divides the number of distinct loaded addresses by 4.
3) A potential third approach would be to avoid segmentation altogether
and splat the scalar coefficient into vectors. Then we can use a
unit-stride and maximum EMUL. But the downside then is that we have to
multiply the 3 (of 16) unused segments with zero as part of the
multiply-accumulate operations.
In addition, we also reuse vectors mid-loop so as to increase the EMUL
from 1 to 2, which also improves performance a little bit.
Oeverall the gains are quite small with the device under test, as it does
not deal with segmented loads very well. But at least the code is tidier,
and should enjoy bigger speed-ups on better hardware implementation.
Before:
ps_hybrid_analysis_c: 1819.2
ps_hybrid_analysis_rvv_f32: 1037.0 (before)
ps_hybrid_analysis_rvv_f32: 990.0 (after)
This stores the constant coefficients deinterleaved, so that they can be
loaded directly with NF=0. Unfortunately, we cannot optimise loading the
input, due to insufficient memory alignment (not 32-bit).
Before:
g722_apply_qmf_c: 82.5
g722_apply_qmf_rvv_i32: 78.2
After:
g722_apply_qmf_c: 82.5
g722_apply_qmf_rvv_i32: 65.2
Validate that a hw_frames_ctx is available before using it for
the AVHWAccel.free_frame_priv callback, and don't require it to
be present when the callback is not in use by the HWAccel.
v2: check for free_frame_priv (Hendrik)
v3: return EINVAL (Christoph Reiter)
v4: better commit message (Hendrik)
v5: fix typo with missed frames_ctx (Lynne)
See[1]: https://github.com/msys2/MINGW-packages/pull/19050
Fixes: be07145109 ("avcodec: add AVHWAccel.free_frame_priv callback")
CC: Lynne <dev@lynne.ee>
CC: Christoph Reiter <reiter.christoph@gmail.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
For fate-h264_mp4toannexb_ticket5927 and
fate-h264_mp4toannexb_ticket5927_2, they work by accident
previously. The sample file has two 'avc1' entries, and video
samples use the second one. It means packets should be decoded with
new extradata in side data. Before this patch, only extradata was
kept in the output, new extradata has been dropped. The output can
be decoded because the two extradata are almost the same, except
level indication. This patch fixed the issue, and add another
fate test.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
If there is a single group of SPS/PPS before an IDR frame, but no
SPS/PPS after that, we will miss the chance to reset
idr_sps_seen/idr_pps_seen. No SPS/PPS are inserted afterwards.
This patch saves in-band SPS/PPS and insert them before IDR frames
when necessary.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
start_code_size depends on whether PS comes from out-of-band or
in-band. Make the code more readable.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This avoids SEI and IDR recovery flags affecting each other
Also eliminate litteral numbers from recovery handling
This should make the code clearer
Improves: tickets/4738/tickets_cut.ts
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In this case, the inner loop computing the scalar product can be reduced
to just one multiplication and one sum even with 128-bit vectors. The
result is a lot simpler, but also brings more modest performance gains:
flac_lpc_16_13_c: 15241.0
flac_lpc_16_13_rvv_i32: 11230.0
flac_lpc_16_16_c: 17884.0
flac_lpc_16_16_rvv_i32: 12125.7
flac_lpc_16_29_c: 27847.7
flac_lpc_16_29_rvv_i32: 10494.0
flac_lpc_16_32_c: 30051.5
flac_lpc_16_32_rvv_i32: 10355.0
The entire set of 32 coefficients and corresponding past 32 samples can
fit in a single vector (with LMUL=8) exactly, but... since widening
double the needed vector sizes, we still end up too short with 128-bit
vectors. This adds a very simple version for future 256+-bit hardware,
and for pred_orders values up to 16, and a bit more involved loop for
for 128-bit hardware with pred_orders between 17 and 32.
With 128-bit hardware, the benchmarks look like this:
flac_lpc_32_13_c: 30152.0
flac_lpc_32_13_rvv_i32: 10244.7
flac_lpc_32_16_c: 37314.2
flac_lpc_32_16_rvv_i32: 10126.2
flac_lpc_32_29_c: 61910.0
flac_lpc_32_29_rvv_i32: 14495.2
flac_lpc_32_32_c: 68204.0
flac_lpc_32_32_rvv_i32: 13273.7
This commit adds support for VP8 bitstream read methods to the cbs
codec. This enables the trace_headers bitstream filter to support VP8,
in addition to AV1, H.264, H.265, and VP9. This can be useful for
debugging VP8 stream issues.
The CBS VP8 implements a simple VP8 boolean decoder using GetBitContext
to read the bitstream.
Only the read methods `read_unit` and `split_fragment` are implemented.
The write methods `write_unit` and `assemble_fragment` return the error
code AVERROR_PATCHWELCOME. This is because CBS VP8 write is unlikely to
be used by any applications at the moment. The write methods can be
added later if there is a real need for them.
TESTS: ffmpeg -i fate-suite/vp8/frame_size_change.webm -vcodec copy
-bsf:v trace_headers -f null -
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This commit exports the `vp8_token_update_probs` variable to internal
library scope to facilitate its reuse within the library.
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Better performance can probably be achieved with a more intricate
unrolled loop, but this is a start:
add_hfyu_left_pred_bgr32_c: 15084.0
add_hfyu_left_pred_bgr32_rvv_i32: 10280.2
This would actually be cleaner with the RISC-V P extension, but that is
not ratified yet (I think?) and usually not supported if V is supported.
With explicit unrolling, we can skip half of the sign bit flips, and
the compiler is then better able to optimise the scalar loop:
predictor_c: 31376.0 (before)
predictor_c: 23703.0 (after)
This is restricted to 128-bit vectors as larger vector sizes could read
past the end of the noise array. Support for future hardware with larger
vector sizes is left for some other time.
hf_apply_noise_0_c: 2319.7
hf_apply_noise_0_rvv_f32: 1229.0
hf_apply_noise_1_c: 2539.0
hf_apply_noise_1_rvv_f32: 1244.7
hf_apply_noise_2_c: 2319.7
hf_apply_noise_2_rvv_f32: 1232.7
hf_apply_noise_3_c: 2541.2
hf_apply_noise_3_rvv_f32: 1244.2
This commit fixes issue with missing SPS/PPS headers in video
encoded by AMF AV1 encoder.
Missing headers leads to broken seek in MPV video player.
Default value for property AV1_HEADER_INSERTION_MODE shouldn't be setup
to NONE (no headers insertion). We need to skip definition of this property,
because default value depends on USAGE property.
Signed-off-by: Dmitrii Ovchinnikov <ovchinnikov.dmitrii@gmail.com>
With 5 accumulator vectors and 6 inputs, this can only use LMUL=2.
Also the number of vector loop iterations is small, just 5 on 128-bit
vector hardware.
The vector loop is somewhat unusual in that it processes data in
descending memory order, in order to save on vector slides:
in descending order, we can extract elements to carry over to the next
iteration from the bottom of the vectors directly. With ascending order
(see in the Opus postfilter function), there are no ways to get the top
elements directly. On the downside, this requires the use of separate
shift and sub (the would-be SH3SUB instruction does not exist), with
a small pipeline stall on the vector load address.
The edge cases in scalar are done in scalar as this saves on loads
and remains significantly faster than C.
autocorrelate_c: 669.2
autocorrelate_rvv_f32: 421.0
Given the size of the data set, strided memory accesses cannot be avoided.
We can still do better than the current code.
ps_hybrid_synthesis_deint_c: 12065.5
ps_hybrid_synthesis_deint_rvv_i32: 13650.2 (before)
ps_hybrid_synthesis_deint_rvv_i64: 8181.0 (after)
Segmented loads may be slower than not. So this advantageously uses a
unit-strided load and narrowing shifts instead.
Before:
ps_add_squares_c: 60757.7
ps_add_squares_rvv_f32: 22242.5
After:
ps_add_squares_c: 60516.0
ps_add_squares_rvv_i64: 17067.7
Fixes: index -1 out of bounds for type 'CFrameBuffer [100]'
Fixes: 63877/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FOURXM_fuzzer-5854263397711872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
PGMYUV seems to be always limited range. This was a format originally
invented by FFmpeg at a time when YUVJ distinguished limited from full
range YUV, and this codec never appeared to output YUVJ in any
circumstance, so hard-coding limited range preserves the status quo.
The other formats are explicitly documented to be full range RGB/gray
formats. That said, don't tag them yet, due to outstanding bugs w.r.t
grayscale formats and color range handling.
This change in behavior updates a bunch of FATE tests in trivial ways
(added tagging being the only difference).
Partially fixes ticket #798
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Peter Ross <pross@xvid.org>
This uses a more traditional approach allowing up processing of up to
period minus two elements per iteration. This also allows the algorithm
to work for all and any vector length.
As the T-Head C908 device under test can load 16 elements loop, there is
unsurprisingly a little performance drop when the period is minimal and
the parallelism is capped at 13 elements:
Before:
postfilter_15_c: 21222.2
postfilter_15_rvv_f32: 22007.7
postfilter_512_c: 20189.7
postfilter_512_rvv_f32: 22004.2
postfilter_1022_c: 20189.7
postfilter_1022_rvv_f32: 22004.2
After:
postfilter_15_c: 20189.5
postfilter_15_rvv_f32: 7057.2
postfilter_512_c: 20189.5
postfilter_512_rvv_f32: 5667.2
postfilter_1022_c: 20192.7
postfilter_1022_rvv_f32: 5667.2
As in the aligned case, we can use VLSE64.V, though the way of doing so
gets more convoluted, so the performance gains are more modest:
get_pixels_unaligned_c: 126.7
get_pixels_unaligned_rvv_i32: 145.5 (before)
get_pixels_unaligned_rvv_i64: 62.2 (after)
For the reference, those are the aligned benchmarks (unchanged) on the
same T-Head C908 hardware:
get_pixels_c: 126.7
get_pixels_rvi: 85.7
get_pixels_rvv_i64: 33.2
Fixes: out of array write
Fixes: 63520/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLIC_fuzzer-4876198087622656
Regression since: c7f8d42c12 (was not posted to ffmpeg-devel)
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
With 128-bit vectors, this is mostly pointless but also harmless.
Performance gains should be more noticeable with larger vector sizes.
neg_odd_64_c: 76.2
neg_odd_64_rvv_i64: 74.7
Up until now each thread had its own buffer pool for extradata
buffers when using frame-threading. Each thread can have at most
three references to extradata and in the long run, each thread's
bufferpool seems to fill up with three entries. But given
that at any given time there can be at most 2 + number of threads
entries used (the oldest thread can have two references to preceding
frames that are not currently decoded and each thread has its own
current frame, but there can be no references to any other frames),
this is wasteful. This commit therefore uses a single buffer pool
that is synced across threads.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, the VAAPI encoder uses fake data with the
AVBuffer-API: The data pointer does not point to real memory,
but is instead just a VABufferID converted to a pointer.
This has probably been copied from the VAAPI-hwcontext-API
(which presumably does it to avoid allocations).
This commit changes this without causing additional allocations
by switching to the RefStruct-pool API. This also fixes an
unchecked av_buffer_ref().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is in preparation for the following commit.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It involves less allocations, in particular no allocations
after the entry has been created. Therefore creating a new
reference from an existing one can't fail and therefore
need not be checked. It also avoids indirections and casts.
Also note that nvdec_decoder_frame_init() (the callback
to initialize new entries from the pool) does not use
atomics to read and replace the number of entries
currently used by the pool. This relies on nvdec (like
most other hwaccels) not being run in a truely frame-threaded
way.
Tested-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It involves less allocations and therefore has the nice property
that deriving a reference from a reference can't fail,
simplifying hevc_ref_frame().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It involves less allocations and therefore has the nice property
that deriving a reference from a reference can't fail.
This allows for considerable simplifications in
ff_h264_(ref|replace)_picture().
Switching to the RefStruct API also allows to make H264Picture
smaller, because some AVBufferRef* pointers could be removed
without replacement.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Very similar to the AVBufferPool API, but with some differences:
1. Reusing an already existing entry does not incur an allocation
at all any more (the AVBufferPool API needs to allocate an AVBufferRef).
2. The tasks done while holding the lock are smaller; e.g.
allocating new entries is now performed without holding the lock.
The same goes for freeing.
3. The entries are freed as soon as possible (the AVBufferPool API
frees them in two batches: The first in av_buffer_pool_uninit() and
the second immediately before the pool is freed when the last
outstanding entry is returned to the pool).
4. The API is designed for objects and not naked buffers and
therefore has a reset callback. This is called whenever an object
is returned to the pool.
5. Just like with the RefStruct API, custom allocators are not
supported.
(If desired, the FFRefStructPool struct itself could be made
reference counted via the RefStruct API; an FFRefStructPool
would then be freed via ff_refstruct_unref().)
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This makes the code more testable as uninitialized fields are 0
and not random values from the last call
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These entries do not correspond to VLC symbols that can be used
they do corrupt various variables like min/max bits
This also no longer assumes that there is a single non subtable
entry
Probably fixes some infinite loops too
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: shift exponent 32 is too large for 32-bit type 'int'
Fixes: 63151/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5067531154751488
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 1900031961 + 553590817 cannot be represented in type 'int'
Fixes: 63061/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APE_fuzzer-5166188298371072
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The check is based on not infinite looping. It is likely
a more strict check can be done
Fixes: Infinite loop
Fixes: 62473/clusterfuzz-testcase-minimized-ffmpeg_BSF_EVC_FRAME_MERGE_fuzzer-5719883750703104
Fixes: 62765/clusterfuzz-testcase-minimized-ffmpeg_dem_EVC_fuzzer-6448531252314112
Fixes: 63378/clusterfuzz-testcase-minimized-ffmpeg_dem_MPEGPS_fuzzer-6504993844494336
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: "Dawid Kozinski/Multimedia (PLT) /SRPOL/Staff Engineer/Samsung Electronics" <d.kozinski@samsung.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For all VLCs here, the number of bits of the VLC is write-only,
because it is hardcoded at the call site. Therefore one can replace
these VLC structures with the only thing that is actually used:
The pointer to the VLCElem table. And in most cases one can even
avoid this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying tables directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For some VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to avoid the relocations inherent in an array
to individual tables; it also reduces padding.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_ps_init() initializes some tables for AAC parametric stereo
and some of them are only valid for the fixed- or floating-point
decoder, whereas others (namely VLCs) are valid for both.
The latter are therefore initialized by ff_ps_init_common()
and because the two versions of ff_ps_init() can be run
concurrently, it is guarded by an AVOnce.
Yet now that there is ff_aacdec_common_init_once() there is
a better way to do this: Call ff_ps_init_common()
from ff_aacdec_common_init_once(). That way there is no need
to guard ff_ps_init_common() by an AVOnce any more.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to avoid the relocations inherent in a table
to individual tables; it also reduces padding.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It allows to replace code tables of type uint32_t or uint16_t
by symbols of type uint8_t. It is also faster.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The VLCs, their init code and the tables used for initialization
are currently duplicated for the floating- and fixed-point decoders.
This commit stops doing so and moves this stuff to aacdec_common.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table. And in some cases one can even
avoid this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
h->long_ref isn't guaranteed to be contiguously filled. Use the approach
from both vaapi_h264 and vdpau_h264 which goes through the 16 frames in
h->long_ref to find the LTR entries.
Fixes MR2_MW_A.264 from JVT-AVC_V1.
The fixed-point decoder actually does not use the floating-point
tables initialized by ff_aac_tableinit() at all. So don't
initialize them for it; instead merge initializing these tables
into ff_aac_float_common_init() which is already the function
for the common static initializations of the floating-point
AAC decoder and the (also floating-point) AAC encoder.
Doing so saves also one AVOnce.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They (as well as their init code) are currently duplicated
for the floating- and fixed-point decoders.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For all VLCs here, the number of bits of the VLC is write-only,
because it is hardcoded at the call site. Therefore one can replace
these VLC structures with the only thing that is actually used:
The pointer to the VLCElem table. And in some cases one can even
avoid this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table. And in some cases one can even
avoid this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For all VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table. And in some cases one can even
avoid this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and only VLC.table needs to be retained.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Said object is only allowed to be modified during its
initialization and is immutable afterwards.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For most VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These VLCs are very big: The VP3 one have 164382 elements
but due to the overallocation enough memory for 313344 elements
are allocated (1.195 MiB with sizeof(VLCElem) == 4);
for VP4 the numbers are very similar, namely 311296 and 164392
elements. Since 1f4cf92cfb, each
frame thread has its own copy of these VLCs.
This commit fixes this by sharing these VLCs across threads.
The approach used here will also make it easier to support
stream reconfigurations in case of frame-multithreading
in the future.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Also combine the ff_msmp4_dc_(luma|chroma)_vlcs as well
as the tables used to generate them to simplify the code.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These are quite small and therefore force reloads
that can be avoided by modest increases in the number of bits used.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is especially important for frame-threaded decoders like
this one, because up until now each thread had an identical
copy of all these VLC tables.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For lots of static VLCs, the number of bits is not read from
VLC.bits, but rather a compile-constant that is hardcoded
at the callsite of get_vlc2(). Only VLC.table is ever used
and not using it directly is just an unnecessary indirection.
This commit adds helper functions and macros to avoid the VLC
structure when initializing VLC tables; there are 2x2 functions:
Two choices for init_sparse or from_lengths and two choices
for "overlong" initialization (as used when multiple VLCs are
initialized that share the same underlying table).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
If the scan lines are aligned, we can load each row as a 64-bit value,
thus avoiding segmentation. And then we can factor the conversion or
subtraction.
In principle, the same optimisation should be possible for high depth,
but would require 128-bit elements, for which no FFmpeg CPU flag
exists.
This should hopefully clarify that the option only affects MSZ
full-width characters, and not all full-width ASCII. Additionally,
this matches the prefix with the upstream option.
Signed-off-by: TADANO Tokumei <aimingoff@pc.nifty.jp>
This patch adds two MSZ (Middle Size; half width) character
related options, mapping against newly added upstream
functionality:
* `replace_msz_japanese`, which was introduced in version 1.0.1
of libaribcaption.
* `replace_msz_glyph`, which was introduced in version 1.1.0
of libaribcaption.
The latter option improves bitmap type rendering if specified
fonts contain half-width glyphs (e.g., BIZ UDGothic), even
if both ASCII and Japanese MSZ replacement options are set
to false.
As these options require newer versions of libaribcaption, the
configure requirement has been bumped accordingly.
Signed-off-by: TADANO Tokumei <aimingoff@pc.nifty.jp>
On some environments, a `bool` variable is of smaller size than `int`.
As AV_OPT_TYPE_BOOL is internally handled as sizeof(int), if a `bool`
option was set on such an environment, the memory of following
variables would be filled. Additionally, set values may be destroyed
by av_opt_copy().
Signed-off-by: TADANO Tokumei <aimingoff@pc.nifty.jp>
Fixes: left shift of negative value -538967841
Fixes: 62447/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-6427134337613824
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Tomas Härdin <git@haerdin.se>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array write
Fixes: 63390/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MAGICYUV_fuzzer-5144552979431424.fuzz
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Only the collocated_ref of the current frame (i.e. HEVCContext.ref)
is ever used*, so move it to HEVCContext directly after ref.
*: This goes so far that collocated_ref was not even synced across
threads in case of frame-threading.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
"Validation Error: [ VUID-VkImageViewCreateInfo-imageViewType-04974 ] Object 0: handle = 0x9f9b41000000003c, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0xc120e150 | vkCreateImageView():
Using pCreateInfo->viewType VK_IMAGE_VIEW_TYPE_2D and the subresourceRange.layerCount VK_REMAINING_ARRAY_LAYERS=(17) and must 1 (try looking into VK_IMAGE_VIEW_TYPE_*_ARRAY).
The Vulkan spec states: If viewType is VK_IMAGE_VIEW_TYPE_1D, VK_IMAGE_VIEW_TYPE_2D, or VK_IMAGE_VIEW_TYPE_3D; and subresourceRange.layerCount is VK_REMAINING_ARRAY_LAYERS,
then the remaining number of layers must be 1"
Partially fixes https://streams.videolan.org/issues/19938/20000_20180305-15.04.59.ts
The is coded as 1920x1080, meant to be rendered at 1440x1080 with cropping,
or 1680x1080 before cropping. Currently, the created DPB is 1440x1080, which results
in the image being decoded incorrectly, as the decoder overwrites output memory.
This commit fixes this.
This eases actual development of the assembly functions, by only
allowing extension instructions within the sections that explicitly
enable them, instead of having all extensions enabled everywhere.
Signed-off-by: Martin Storsjö <martin@martin.st>
It is unnecessary since the removal of non-thread-safe callbacks
in e0786a8eeb. Since then, the
AVCodecContext has only been used as logcontext.
Removing ff_thread_release_buffer() allowed to remove AVCodecContext*
parameters from several other functions (not only unref functions,
but also e.g. ff_h264_ref_picture() which calls ff_h264_unref_picture()
on error).
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
At various points through the function librsvg_decode_frame, errors are
returned from immediately without deallocating any allocated structs.
This patch both fixes those leaks, and also fixes the use of functions
that are deprecated since librsvg version 2.52.0. The older calls are
still used, guarded by #ifdefs while the newer replacements are used if
librsvg >= 2.52.0. One of the deprecated functions is used as a check
for the configure shell script, so it was replaced with a different
function.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.
Signed-off-by: Martin Storsjö <martin@martin.st>
Some functions have slightly different indentation styles; try
to match the surrounding code.
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes: index 32 out of bounds for type 'uint32_t [32]'
Fixes: 63003/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-4685160840560640
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1. If user don't specify the profile, set it to main10 when pixel
format is 10 bits. Before the patch, videotoolbox output main
profile bitstream with 10 bit input, which can be confusing.
2. Warning when user set profile to main explicitly with 10 bit
input. It works, but not the best choice.
3. Reject main 10 profile with 8 bit input, it doesn't work.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Will be used in the following patch. With hw_config we can get
avctx->hw_frames_ctx, and with avctx->hw_frames_ctx we get
sw_pix_fmt. Otherwise sw_pix_fmt is none. I need sw_pix_fmt
before get the first frame to set hevc encoder profile.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
In f7ac3512f5 the size of the dynamically
allocated buffer was shrunk, but it was made too small for very small
alphabet sizes. This patch restores the size to prevent an OOB read.
Reported-by: Cole Dilorenzo <coolkingcole@gmail.com>
Signed-off-by: Leo Izen <leo.izen@gmail.com>
EAGAIN causes an assertion failure when it is returned from the decoder
Fixes: Assertion consumed != (-(11)) failed at libavcodec/decode.c:462
Fixes: assertion_IOT_instruction_decode_c_462/poc
Found-by: Hardik Shah of Vehere (Dawn Treaders team)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 1496950099 + 728014168 cannot be represented in type 'int'
Fixes: 62667/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MJPEGB_fuzzer-6511785170305024
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -2146469728 - 1488954 cannot be represented in type 'int'
Fixes: 62490/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BONK_fuzzer-5612782399389696
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 62678/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-4858264984354816
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Kieran Kunhya <kierank@obe.tv>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
They are generally set in ff_mpv_init_context_frame()
(mostly called by ff_mpv_common_init()); setting them
somewhere else should be avoided.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Forgotten in 0eb399ac39.
While just at it, also use a forward declaration.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unused by ff_mpeg4_decode_picture_header() (unsurprisingly given
that when decoding this function is called before the context has been
initialized).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
(The return value doesn't really matter: For video decoders
every return value >= 0 is treated as "consumed all of the input".)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This patch makes the libkvazaar encoder respect color settings that are
present on the codec context, including color range, primaries, transfer
function and colorspace.
Since at least commit c954cf1e1b
(adding ff_encode_alloc_frame()), a large part of ff_alloc_picture()
is completely separate for the two callers. Move the caller-specific
parts out to the callers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary in case of user-supplied frames, because
it happens directly after a av_frame_ref() with the same
src and dst.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_alloc_picture() performs two tasks: a) In most instances,
it allocates frame buffers and b) it allocates certain
auxiliary buffers.
The exception to a) is the case when the encoder can reuse
user-supplied frames. And for these frames the auxiliary buffers
are unused, because this frame will never be used as current_picture
(and therefore also not as next_picture or last_picture);
see select_input_picture().
This means that we can simply avoid calling ff_alloc_picture()
with user-supplied frames at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
None of the mpegvideo encoders support anything but coded frames;
and if this were to change, it is unclear whether they would need
the adjustment here. So remove it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only entries 0..max_b_frames are ever used.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In case "!direct" we are not reusing the input buffers
(due to e.g. insufficient alignment), but allocating
new ones. These of course do not alias with the ones
provided by the user, so these checks are always-false.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
mpegvideo_enc uses a fixed-size array of Pictures; a slot is
considered taken if the Picture's AVFrame is set.
When an error happens after a slot has been taken, this Picture
has typically not been reset and is therefore not usable for
future requests. The code aborts when one runs out of slots
and this can happen in case of allocation failures.
Fix this by always unreferencing a Picture in case of errors.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There is no need to parse the header twice; doing so does nothing.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
And stop setting picture_number which was only done to not parse
extradata multiple times.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: 0 - -9223372036854775808 cannot be represented in type 'long'
Fixes: 51896/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-6112289464123392
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Avoids allocations and error checks as well as the boilerplate
code for creating an AVBuffer with a custom free callback.
Also increases type safety.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Tested-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids allocations and error checks and allows to remove
cleanup code for earlier allocations. Also avoids casts
and indirections.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given that the RefStruct API relies on the user to know
the size of the objects and does not provide a way to get it,
we need to store the number of elements allocated ourselves;
but this is actually better than deriving it from the size
in bytes.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids allocations and therefore error checks: Syncing
hwaccel_picture_private across threads can't fail any more.
Also gets rid of an unnecessary pointer in structures and
in the parameter list of ff_hwaccel_frame_priv_alloc().
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Lynne <dev@lynne.ee>
Tested-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>