We introduced a ff_horiz_slice_avx2/512() implemented on a new algorithm.
In a nutshell, the new algorithm does three things, gathering data from
8/16 rows, blurring data, and scattering data back to the image buffer.
Here we used a customized transpose 8x8/16x16 to avoid the huge overhead
brought by gather and scatter instructions, which is dependent on the
temporary buffer called localbuf added newly.
Performance data:
ff_horiz_slice_avx2(old): 109.89
ff_horiz_slice_avx2(new): 666.67
ff_horiz_slice_avx512: 1000
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
The new vertical slice with AVX2/512 acceleration can significantly
improve the performance of Gaussian Filter 2D.
Performance data:
ff_verti_slice_c: 32.57
ff_verti_slice_avx2: 476.19
ff_verti_slice_avx512: 833.33
Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
EINVAL is the wrong error code here, since the arguments passed to the
function are valid. The error is that the function is not implemented in
the build, which corresponds to ENOSYS.
When both -filter_threads and -threads are specified, the latter takes
effect. Since -threads is an encoder option and -filter_threads is a
filter option, it makes sense for the -filter_threads to take
precedence.
Implements a gray world color correction algorithm
using a log scale LAB colorspace.
Signed-off-by: Paul Buxton <paulbuxton.mail@googlemail.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Fixes: signed integer overflow: 1179337772 + 1392508928 cannot be represented in type 'int'
Fixes: 34088/clusterfuzz-testcase-minimized-ffmpeg_dem_AVI_fuzzer-5846945303232512
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Don't allocate the buffer for the title ourselves, leave it to
av_dict_set(). This simplifies freeing.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
A single VorbisComment consists of a length field and a
non-NUL-terminated string of the form "key=value". Up until now,
when parsing such a VorbisComment, zero-terminated duplicates of
key and value would be created. This is wasteful if these duplicates
are freed shortly afterwards, as happens in particular in case of
attached pictures: In this case value is base64 encoded and only
needed to decode the actual data.
Therefore this commit changes this: The buffer is temporarily modified
so that both key and value are zero-terminated. Then the data is used
in-place and restored to its original state afterwards.
This requires that the buffer has at least one byte of padding. All
buffers currently have AV_INPUT_BUFFER_PADDING_SIZE bytes padding,
so this is ok.
Finally, this also fixes weird behaviour from ogm_chapter():
It sometimes freed given to it, leaving the caller with dangling
pointers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This results in warnings on compilers which don't support it,
objections were raised during the review process about it but went unnoticed,
and the speed benefit is highly compiler and version specific, and
also not very critical.
We generally hand-write assembly to optimize loops like that, rather
than use compiler magic, and for 40% best case scenario, it's simply
not worth it.
Plus, tree vectorization is still problematic with GCC and disabled by default
for a good reason, so enabling it locally is sketchy.
This patch renames the InferenceItem to LastLevelTaskItem in the
three backends to avoid confusion among the meanings of these structs.
The following are the renames done in this patch:
1. extract_inference_from_task -> extract_lltask_from_task
2. InferenceItem -> LastLevelTaskItem
3. inference_queue -> lltask_queue
4. inference -> lltask
5. inference_count -> lltask_count
Signed-off-by: Shubhanshu Saxena <shubhanshu.e01@gmail.com>
Remove async flag from filter's perspective after the unification
of async and sync modes in the DNN backend.
Signed-off-by: Shubhanshu Saxena <shubhanshu.e01@gmail.com>
This commit removes the unused sync mode specific code from the DNN
filters since the sync and async mode are now unified from the
filters' perspective.
Signed-off-by: Shubhanshu Saxena <shubhanshu.e01@gmail.com>
This commit unifies the async and sync mode from the DNN filters'
perspective. As of this commit, the Native backend only supports
synchronous execution mode.
Now the user can switch between async and sync mode by using the
'async' option in the backend_configs. The values can be 1 for
async and 0 for sync mode of execution.
This commit affects the following filters:
1. vf_dnn_classify
2. vf_dnn_detect
3. vf_dnn_processing
4. vf_sr
5. vf_derain
This commit also updates the filters vf_dnn_detect and vf_dnn_classify
to send only the input frame and send NULL as output frame instead of
input frame to the DNN backends.
Signed-off-by: Shubhanshu Saxena <shubhanshu.e01@gmail.com>
As the second argument for init_get_bits(avctx and buf) can be crafted,
a return value check for this function call is necessary,
so replace init_get_bits with init_get_bits8 and add return value check.
Consider data as invalid if ff_wma_run_level_decode
gets out with an error.
It avoids an unpleasant sound distorsion.
See http://trac.ffmpeg.org/ticket/9358