use_intra_dc_vlc is currently kept in sync between frame threads
in mpeg4_update_thread_context(), yet it is set when decoding
blocks, i.e. after ff_thread_finish_setup(). This is a data race
and therefore undefined behaviour.
This race can be fixed easily by moving the variable from the context
to the stack: use_intra_dc_vlc is only read in
mpeg4_decode_block() and only if one is decoding an intra block.
There are three callsites for this function: One in
mpeg4_decode_partitioned_mb() which always sets use_intra_dc_vlc
before the call and two in mpeg4_decode_mb(). One of these callsites
is for intra blocks and use_intra_dc_vlc is set before it;
the last callsite is for non-intra blocks, where use_intra_dc_vlc
is ignored. So whenever it is read, it has been freshly set by the
caller and can therefore be moved to the stack.
The above also explains why this data race did not lead to
FATE-test failures.
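
A minimal sketch of the pattern described above, not the actual decoder
code: DemoCtx, demo_decode_block() and demo_decode_mb() are hypothetical
names, and deriving the flag from qscale and a threshold is an assumption
made for illustration. The point is only that the flag lives on the
caller's stack and is passed down as an argument, so nothing shared
between frame threads is written while decoding blocks.

```c
#include <assert.h>

/* Hypothetical stand-in for the shared decoder context; it only holds
 * data that is fixed before the setup phase ends. */
typedef struct DemoCtx {
    int intra_dc_threshold;
} DemoCtx;

/* The flag is a parameter: it is only read for intra blocks and is never
 * stored in the shared context. */
static int demo_decode_block(const DemoCtx *ctx, int intra, int use_intra_dc_vlc)
{
    (void)ctx;
    if (!intra)
        return 0;                     /* flag is ignored for non-intra blocks */
    return use_intra_dc_vlc ? 1 : 2;  /* 1: DC via intra DC VLC, 2: other path */
}

static int demo_decode_mb(const DemoCtx *ctx, int qscale, int intra)
{
    int use_intra_dc_vlc;             /* stack variable, set before every use */

    assert(qscale > 0);
    use_intra_dc_vlc = qscale < ctx->intra_dc_threshold;
    return demo_decode_block(ctx, intra, use_intra_dc_vlc);
}
```

Because the context is only accessed through a const pointer here, the
compiler also rejects accidental writes to it from the block decoder.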
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
An offset has the advantage of not needing to be updated
when the buffer is reallocated. Furthermore, the way the pointer
is currently updated is undefined behaviour in case the pointer
is not already set (i.e. when not encoding MPEG-1/2), because
it calculates the nonsense NULL - s->pb.buf.
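
To illustrate why an offset is more robust than a pointer here, a small
sketch under assumed names (DemoWriter, demo_grow() and demo_mark() are
hypothetical and unrelated to the real PutBitContext API): the offset
stays valid across a reallocation, and no pointer arithmetic on a
possibly-NULL pointer is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable output buffer. */
typedef struct DemoWriter {
    uint8_t *buf;
    size_t   size;
    size_t   mark_offset;   /* position of interest, relative to buf */
} DemoWriter;

/* Grow the buffer; returns 0 on success, -1 on allocation failure.
 * mark_offset needs no fixup, even if realloc() moves the data. */
static int demo_grow(DemoWriter *w, size_t new_size)
{
    uint8_t *tmp;

    assert(new_size >= w->size);
    tmp = realloc(w->buf, new_size);
    if (!tmp)
        return -1;
    w->buf  = tmp;
    w->size = new_size;
    return 0;
}

/* Resolve the offset to a pointer only at the moment it is needed. */
static uint8_t *demo_mark(const DemoWriter *w)
{
    return w->buf + w->mark_offset;
}
```

With a stored pointer, demo_grow() would have to recompute it from the
old and new base addresses, which is exactly the kind of update that
becomes undefined when the pointer was never set.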
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also use said function in mpegvideo.c and mpegvideo_enc.c;
and make ff_free_picture_tables() static as it isn't needed anymore
outside of mpegpicture.c.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is possible now that dealing with the Simple Studio Profile
has been moved to mpeg4videodec.c. It also avoids
allocations, because one can simply put the required buffers
on the context (if one made these buffers part of MpegEncContext,
the memory would be wasted for every codec other than MPEG-4).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The sample mpeg4/mpeg4_sstp_dpcm.m4v existed in the FATE-suite,
but it was surprisingly unused.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In this case the macroblocks written to are smaller, yet
the MPEG-4 Simple Studio Profile code for 10bit DPCM ignored this;
e.g. in case of lowres = 2 or = 3, the sample mpeg4_sstp_dpcm.m4v
from the FATE-suite reads beyond the end of the buffer.
This commit fixes this by taking lowres into account.
The DPCM macroblocks of the aforementioned sample look
as good as can be expected after this patch; yet the non-DPCM
coded macroblocks are simply corrupt.
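
A rough sketch of what "taking lowres into account" can look like,
assuming (this is not taken from the patch) that the output macroblock
is 16 >> lowres samples wide and that the source is subsampled by
1 << lowres; demo_put_dpcm_mb() and DEMO_MB_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MB_SIZE 16

/* Write a 16x16 source macroblock into dst, subsampled by 1 << lowres in
 * both directions. Returns the number of samples written, which is
 * (16 >> lowres) squared and never 16x16 unless lowres is 0. */
static int demo_put_dpcm_mb(uint16_t *dst, int dst_stride,
                            const int16_t src[DEMO_MB_SIZE * DEMO_MB_SIZE],
                            int lowres)
{
    int size, written = 0;

    assert(lowres >= 0 && lowres <= 3);
    size = DEMO_MB_SIZE >> lowres;          /* 16, 8, 4 or 2 */
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int sx = x << lowres;
            int sy = y << lowres;
            dst[y * dst_stride + x] = (uint16_t)src[sy * DEMO_MB_SIZE + sx];
            written++;
        }
    }
    return written;
}
```

Looping over 16 rows and columns regardless of lowres is the pattern
that writes past the end of a destination sized for the reduced picture.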
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
jpeg2000_decode_tile() (which is run concurrently by several threads
when using slice threading) currently modifies some joint values
before doing its actual work. This is a data race that happens to work
because all threads set the same values; but it is nevertheless
undefined behaviour.
Fix this by performing said preparatory work in the main thread instead.
This fixes the vsynth(1|2|_lena)-jpeg2000(-97)? FATE-tests when using
TSAN and slice threading.
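
The restructuring can be sketched as follows, with hypothetical names
(DemoShared, demo_prepare(), demo_decode_tile()) and a made-up payload;
the actual thread dispatch is omitted. Shared state is written once
before the workers start, and each worker only reads it and writes its
own output slot.

```c
#include <assert.h>

#define DEMO_TILES 4

/* Hypothetical state shared by all slice threads. */
typedef struct DemoShared {
    int precision;   /* written only during preparation */
    int prepared;
} DemoShared;

/* Runs once in the main thread, before any worker is started. */
static void demo_prepare(DemoShared *s, int bits)
{
    s->precision = bits;
    s->prepared  = 1;
}

/* Worker body: the shared state is const, and each job writes only the
 * slot that belongs to its own tile. */
static void demo_decode_tile(const DemoShared *s, int out[DEMO_TILES], int tile)
{
    assert(s->prepared);
    assert(tile >= 0 && tile < DEMO_TILES);
    out[tile] = tile << s->precision;
}
```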
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When AV_CODEC_EXPORT_DATA_FILM_GRAIN is set, the AV1 decoder should
disable film grain application and export the corresponding side data.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
For VAAPI, if init_pool_size is not zero, the pool size is fixed, so
the maximum number of surfaces is init_pool_size. However, when mapping
a VAAPI frame to a QSV frame, init_pool_size < nb_surface. The cause is
that vaapi_decode_make_config() configures init_pool_size and is called
twice: the first time to initialize the frame context and the second
time to initialize the codec. The second call resets init_pool_size to
its original value, which is lower than the real size, because the
pool_size used to initialize the frame context additionally includes
thread_count and 3 (to guarantee 4 base work surfaces). Add code to make
sure init_pool_size is only set once. Now the following command line works:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi -i input.264 \
-vf "hwmap=derive_device=qsv,format=qsv" \
-c:v h264_qsv output.264
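
The "set only once" guard can be sketched like this. DemoFramesCtx and
demo_make_config() are hypothetical, and treating 0 as "not configured
yet" is an assumption for illustration; the real function does far more
than this.

```c
#include <assert.h>

/* Hypothetical frames context; 0 means "not configured yet". */
typedef struct DemoFramesCtx {
    int init_pool_size;
} DemoFramesCtx;

/* May be called more than once; only the first call sizes the pool. */
static void demo_make_config(DemoFramesCtx *f, int base_surfaces, int extra)
{
    assert(base_surfaces > 0 && extra >= 0);
    if (f->init_pool_size)      /* already set by an earlier call: keep it */
        return;
    f->init_pool_size = base_surfaces + extra;
}
```

Here `extra` stands for the thread_count + 3 surfaces mentioned above;
a second call that omits them no longer shrinks the pool.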
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Makes Bulldozer prefer AVX functions rather than AVX2,
which are 64% slower:
AVX: 117653 decicycles in av_tx (fft), 1048535 runs, 41 skips
AVX2: 193385 decicycles in av_tx (fft), 1048561 runs, 15 skips
The only difference between the two is that vgatherdpd is used in
the AVX2 version. We don't want to mark them with the new SLOW_GATHER
flag however, since gathers are still faster on Haswell/Zen 2/3
than plain loads.
If a codelet initializes 2 subtransforms, and the second one fails,
the failure would free all subcontexts.
Instead, if there are subcontexts still left, don't free the array.
If all initializations fail, the init() function will return,
and reset_ctx() from the previous step will clean up all contained
subtransforms.
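
A simplified sketch of this cleanup rule, using hypothetical names
(DemoTxCtx, demo_init_sub(), demo_reset()) and plain ints in place of
real subcontexts: a failed sub-initialization frees the array only when
no earlier subcontext lives in it, and the final reset frees everything.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_SUBS 4

/* Hypothetical parent context owning an array of subcontexts. */
typedef struct DemoTxCtx {
    int *sub;      /* DEMO_MAX_SUBS entries once allocated */
    int  nb_sub;   /* number of successfully initialized entries */
} DemoTxCtx;

/* Returns 0 on success, -1 on failure (simulated or allocation). */
static int demo_init_sub(DemoTxCtx *c, int value, int simulate_failure)
{
    if (!c->sub) {
        c->sub = calloc(DEMO_MAX_SUBS, sizeof(*c->sub));
        if (!c->sub)
            return -1;
    }
    assert(c->nb_sub < DEMO_MAX_SUBS);
    if (simulate_failure) {
        if (!c->nb_sub) {       /* nothing else lives in the array */
            free(c->sub);
            c->sub = NULL;
        }
        return -1;              /* earlier subcontexts stay intact */
    }
    c->sub[c->nb_sub++] = value;
    return 0;
}

/* Final cleanup: releases all subcontexts at once. */
static void demo_reset(DemoTxCtx *c)
{
    free(c->sub);
    c->sub    = NULL;
    c->nb_sub = 0;
}
```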
Fix CID: 1497864
The function already returns ENOSYS earlier if nb_cd_matches is 0;
after that, either ret equals AVERROR(ENOMEM) or control jumps to the
end label. The final if (ret >= 0) check before the end label is
therefore redundant, so remove it.
Signed-off-by: Steven Liu <liuqi05@kuaishou.com>