This avoids constant allocations+frees and will also allow
to simply switch to the RefStruct API, thereby avoiding
the overhead of the AVBuffer API.
It also simplifies the code, because it removes the "needs_realloc"
field: It was added in 435c0b87d2,
before the introduction of the AVBuffer API: given that these buffers
may be used by different threads, they were not freed immediately
and instead were marked as being freed later by setting needs_realloc.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
VC-1 can switch from between being progressive and interlaced
on a per-frame basis. In the latter case, the number of macroblocks
is aligned to two (or equivalently, the height to 32); therefore
certain buffers are allocated for the bigger mb_height
(see 950fb8acb4 and
017e234c20).
This commit changes how this is done: Aligning these buffers is
restricted to VC-1 and it is done directly by aligning
mb_height (but not MpegEncContext.mb_height) instead of
adding something in an ad-hoc manner.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Codecs call ff_find_unused_picture() to get the index of
an unused picture; said picture may have buffers left
from using it previously (these buffers are intentionally
not unreferenced so that it might be possible to reuse them;
they are only reused when they are writable, otherwise
they are replaced by new, zeroed buffers). They should
not make any assumptions about which picture they get.
Yet this is not true for mbskip_table and damaged bitstreams.
When one returns old unused slots randomly, the output
becomes nondeterministic. This can't happen now (see below),
but it will be possible once mpegpicture uses proper pools
for the picture tables.
The following discussion uses the sample created via
ffmpeg -bitexact -i fate-suite/svq3/Vertical400kbit.sorenson3.mov -ps 50 -bf 2 -bitexact -an -qscale 5 -ss 40 -error_rate 4 -threads 1 out.avi
When decoding this with one thread, the slots are as follows:
Cur 0 (type I), last -1, Next -1; cur refcount -1, not reusing buffers
Cur 1 (type P), last -1, Next 0; cur refcount -1, not reusing buffers
Cur 2 (type B), last 0, Next 1; cur refcount -1, not reusing buffers
Cur 2 (type B), last 0, Next 1; cur refcount 2, not reusing buffers
Cur 0 (type P), last 0, Next 1; cur refcount 2, not reusing buffers
Cur 2 (type B), last 1, Next 0; cur refcount 1, reusing buffers
Cur 2 (type B), last 1, Next 0; cur refcount 2, not reusing buffers
Cur 1 (type P), last 1, Next 0; cur refcount 2, not reusing buffers
Cur 2 (type B), last 0, Next 1; cur refcount 1, reusing buffers
Cur 2 (type B), last 0, Next 1; cur refcount 2, not reusing buffers
Cur 0 (type I), last 0, Next 1; cur refcount 2, not reusing buffers
Cur 2 (type B), last 1, Next 0; cur refcount 1, reusing buffers
Cur 2 (type B), last 1, Next 0; cur refcount 2, not reusing buffers
Cur 1 (type P), last 1, Next 0; cur refcount 2, not reusing buffers
After the slots have been filled initially, the buffers are only
reused for the first B-frame in a B-frame chain:
a) When the new picture is an I or a P frame, the slot of the backward
reference is cleared and reused for the new frame (as has been said,
"cleared" does not mean that the auxiliary buffers have been
unreferenced). Given that not only the slot in the picture array,
but also MpegEncContext.last_picture contain references to these
auxiliary buffers, they are not writable and are therefore not reused,
but replaced by new, zero-allocated buffers.
b) When the new picture is the first B-frame in a B-frame chain,
the two reference slots are kept as-is and one gets a slot that
does not share its auxiliary buffers with any of MpegEncContext.
current_picture, last_picture, next_picture. The buffers are
therefore writable and are reused.
c) When the new picture is a B-frame that is not the first frame
in a B-frame chain, ff_mpv_frame_start() reuses the slot occupied
by the preceding B-frame. Said slot shares its auxilary buffers
with MpegEncContext.current_picture, so that they are not considered
writable and are therefore not reused.
When using frame-threading, the slots are made to match the one
from the last thread, so that the above analysis is mostly the same
with one exception: Other threads may also have references to these
buffers, so that initial B-frames of a B-frame chain need no longer
have writable/reusable buffers. In particular, all I and P-frames
always use new, zeroed buffers. Because only the mbskip_tables of
I- and P-frames are ever used, it follows that there is currently
no problem with using stale values for them at all.
Yet as the analysis shows this is very fragile:
1. MpegEncContext.(current|last|next)_picture need not have
references of their own, but they have them and this influences
the writability decision.
2. It would not work if the slots were returned in a truely random
fashion or if there were a proper pool used.
Therefore this commit always resets said buffer. This is in preparation
for actually adding such a pool (where the checksums for said sample
would otherwise be depending on the number of threads used for
decoding).
Future commits will restrict this to only the codecs for which
it is necessary (namely the MPEG-4 decoder).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Codecs call ff_find_unused_picture() to get the index of
an unused picture; said picture may have buffers left
from using it previously (these buffers are intentionally
not unreferenced so that it might be possible to reuse them;
this is mpegvideo's version of a bufferpool). They should
not make any assumptions about which picture they get.
Yet somehow this is not true when decoding OBMC: Returning
random empty pictures (instead of the first one) leads
to nondeterministic results; similarly, explicitly
rezeroing the buffer before handing it over to the codec
changes the outcome of the h263-obmc tests, but it makes it
independent of the returned pictures. Therefore this commit
does so.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
coded_block is only used for I-frames, so it is unnecessary
to reset it in ff_clean_intra_table_entries() (which
cleans certain tables for a non-intra MB).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When !CONFIG_SMALL, we create separate functions for FMT_MPEG1
(i.e. for MPEG-1/2); given that there are only three possibilities
for out_format (FMT_MPEG1, FMT_H263 and FMT_H261 -- MJPEG and SpeedHQ
are both intra-only and do not have motion vectors at all, ergo
they don't call this function), one can optimize MPEG-1/2-only code
away in mpeg_motion_internal().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These references now always exist due to dummy frames.
Also remove the corresponding checks in the lowres code
in mpegvideo_dec.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
MPEG-2 allows to pair an intra field (as first field) together
with a P-field. In this case a conformant bitstream has to satisfy
certain restrictions in order to ensure that only the I field
is used for prediction. See section 7.6.3.5 of the MPEG-2
specifications.
We do not check these restrictions; normally we simply allocate
dummy frames for reference in order to avoid checks lateron.
This happens in ff_mpv_frame_start() and therefore does not happen
for a second field. This is inconsistent. Fix this by allocating
these dummy frames for the second field, too.
This already fixes two bugs:
1. Undefined pointer arithmetic in prefetch_motion() in
mpegvideo_motion.c where it is simply presumed that the reference
frame exists.
2. Several MPEG-2 hardware accelerations rely on last_picture
being allocated for P pictures and next picture for B pictures;
e.g. VDPAU returns VDP_STATUS_INVALID_HANDLE when decoding
an I-P fields pair because the forward_reference was set incorrectly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This will allow to reuse it to allocate dummy frames for
the second field (which can be a P-field even if the first
field was an intra field).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
linesize and uvlinesize are supposed to be the common linesize of all
the Y/UV-planes of all the currently cached pictures.
ff_mpeg_update_thread_context() syncs the pictures, yet it did not sync
linesize and uvlinesize. This mostly works, because ff_alloc_picture()
only accepts new pictures if they coincide with the linesize of the
already provided pictures (if any). Yet there is a catch: Linesize
changes are accepted when the dimensions change (in which case the
cached frames are discarded).
So imagine a scenario where all frame threads use the same dimension A
until a frame with a different dimension B is encountered in the
bitstream, only to be instantly reverted to A in the next picture. If
the user changes the linesize of the frames upon the change to dimension
B and keeps the linesize thereafter (possible if B > A),
ff_alloc_picture() will report an error when frame-threading is in use:
The thread decoding B will perform a frame size change and so will the
next thread in ff_mpeg_update_thread_context() as well as when decoding
its picture. But the next thread will (presuming it is not the same
thread that decoded B, i.e. presuming >= 3 threads) not perform a frame
size change, because the new frame size coincides with its old frame
size, yet the linesize it expects from ff_alloc_picture() is outdated,
so that it errors out.
It is also possible for the user to use the original linesizes for
the frame after the frame that reverted back to A; this will be
accepted, yet the assumption that of all pictures are the same
will be broken, leading to segfaults.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The mpegvideo-based codecs currently require the linesize to be
constant (except when the frame dimensions change); one reason
for this is that certain scratch buffers whose size depend on
linesize are only allocated once and are presumed to be correctly
sized if the pointers are != NULL.
This commit changes this by storing the actual linesize these
buffers belong to and reallocating the buffers if it does not
suffice. This is not enough to actually support changing linesizes,
but it is a start. And it is a prerequisite for the next patch.
Also don't emit an error message in case the source ctx's
edge_emu_buffer is unset in ff_mpeg_update_thread_context().
It need not be an error at all; e.g. it is a perfectly normal
state in case a hardware acceleration is used as the scratch
buffers are not allocated in this case (it is easy to run into
this issue with MPEG-4) or if the src context was not initialized
at all (e.g. because the first packet contained garbage).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary to check for whether the number of planes
of an already existing audio pool coincides with the number
of planes to use for the frame: If the common format of both
is planar, then the number of planes coincides with the number
of channels for which there is already a check*; if not,
then both the existing pool as well as the frame use one pool.
*: In fact, one could reuse the pool in this case even if the
number of channels changes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is currently done inconsistently: Only one error path
(namely the one from init_pass2()) made ff_rate_control_init()
call ff_rate_control_uninit(); in other error paths
cleanup was left to the caller.
Given that the only caller of this function already performs
the necessary cleanup this commit changes this to always
rely on the caller to perform cleanup on error.
Also return the error code from init_pass2().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The issue is that if a frame has no complex stereo prediction,
the alpha values must all be assumed to be zero if the next frame
has complex prediction and uses delta coding.
The LC part of the decoder combines scalefactor application with
spectrum decoding, and this was the plan here, but that's not possible,
so change the function name.
The issue here is that the spec implied that the offset is done
on the dequantized scalefactor, but in fact, it is done on the
scalefactor offset. Delay dequantizing the scalefactors until
after noise synthesis is performed, and change to apply the
offset onto the offset.
Otherwise nothing is written into the destination when a write mapping
is requested.
For example, a vulkan frame mapped from a drm frame (which is wrapped as
a vaapi frame in the example) is used as the output of scale_vulkan
filter, it always gets a green screen without this patch.
ffmpeg -init_hw_device vaapi=va -init_hw_device vulkan=vulkan@va
-filter_hw_device vulkan -f lavfi -i testsrc=size=352x288,format=nv12
-vf
"hwupload,scale_vulkan,hwmap=derive_device=vaapi:reverse=1,format=vaapi,hwdownload,format=nv12"
-f nut - | ffplay -
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This changes the behavior and makes it behave how it probably was intended.
Either way this is unlikely to result in any user visible change
Fixes: CID1494637 Missing break in switch
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This also makes the code more robust
Fixes: CID1512414 Uninitialized pointer read
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Alot more input checking can be performed, this is only checking the obvious missing case
Fixes: CID1598562 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This adds runtime support to use Zbb REV8 for 32- and 64-bit byte-wise
swaps. The result is about five times slower than if targetting Zbb
statically, but still a lot faster than the default bespoke C code or a
call to GCC run-time functions.
For 16-bit swap, this is however unsurprisingly a lot worse, and so this
sticks to the baseline. In fact, even using REV8 statically does not
seem to be beneficial in that case.
Zbb static Zbb dynamic I baseline
bswap16: 0.668184765 3.340764069 0.668029012
bswap32: 0.668174014 3.340763319 9.353855435
bswap64: 0.668221765 3.340496313 14.698672283
(seconds for 1 billion iterations on a SiFive-U74 core)
Due to hysterical raisins, most RISC-V Linux distributions target a
RV64GC baseline excluding the Bit-manipulation ISA extensions, most
notably:
- Zba: address generation extension and
- Zbb: basic bit manipulation extension.
Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact a large chunk of existing assembler optimisations relies
on Zba and/or Zbb.
Since we cannot patch shared library code, the next best thing is to
carry a flag initialised at load-time and check it on need basis.
This results in 3 instructions overhead on isolated use, e.g.:
1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
LBU rd, %pcrel_lo(1b)(rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
The C compiler will typically load the flag ahead of time to reducing
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the optimisation degrades with a GOT look-up to
support interposition:
1: AUIPC rd, GOT_OFFSET_HI
LD rd, GOT_OFFSET_LO(rd)
LBU rd, (rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte-swap, bit-weight and
counting leading or trailing zeroes.
Do not do it in hls_slice_header(), which is the wrong place for it.
Avoids special magic return value of 1 in that function. The comment
mentioning potential corrupted state is no longer relevant, as
hls_slice_header() modifies no state beyond SliceHeader, which will only
get used for a valid frame.