For detect_range, the usage of vpbroadcast{b,w} requires the AVX512BW extension, and for
detect_alpha we don't want ZMM instructions downclocking old CPUs.
Signed-off-by: James Almer <jamrial@gmail.com>
This filter can detect various properties about the image, including
whether or not there are out-of-range values, or whether the input appears
to use straight or premultiplied alpha.
Of course, these can only be heuristics, with "undetermined" as the base
case. While we can definitely prove the existence of full range or
straight alpha colors, we can never infer the opposite.
scale was never initialized. av_tx_init() will use default scale if we
pass NULL.
Fixes: b3117f376d
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Requested by a user. Even with autovectorization enabled, the compiler
performs a quite poor job of optimizing this function, due to not being
able to take advantage of the pmaxub + pcmpeqb trick for counting the number
of pixels less than or equal-to a threshold.
blackdetect8_c: 4625.0 ( 1.00x)
blackdetect8_avx2: 155.1 (29.83x)
blackdetect16_c: 2529.4 ( 1.00x)
blackdetect16_avx2: 163.6 (15.46x)
This naive hist[p[x]]++ loop suffers badly when there are large regions of
identical values in the image, because of store-to-load forwarding delay.
Splitting up the histogram into four "parallel" histograms and processing
them one at a time speeds things up significantly, about 40% on my end.
Since psadbw only exists for 8-bits, we have to emulate it for 16-bit
inputs. The simplest sequence is to use a normal subtraction, which is safe
as long as the inputs do not exceed 32767 - so limit this implementation
to 15-bit inputs and below.
For 16-bit inputs, we could in theory instead use a pminw / pmaxw to ensure
the resulting difference does not overflow, but this is slower, and also
breaks the subsequent use of pmaddwd, so I opted to skip 16-bit SIMD for
now.
scene_sad10_c: 114175.6 ( 1.00x)
scene_sad10_avx2: 9617.7 (11.87x)
scene_sad10_avx512: 5208.8 (21.92x)
scene_sad12_c: 114537.8 ( 1.00x)
scene_sad12_avx2: 9614.0 (11.91x)
scene_sad12_avx512: 5186.3 (22.08x)
scene_sad14_c: 114113.9 ( 1.00x)
scene_sad14_avx2: 9612.9 (11.87x)
scene_sad14_avx512: 5186.0 (22.00x)
scene_sad15_c: 114108.9 ( 1.00x)
scene_sad15_avx2: 9612.3 (11.87x)
scene_sad15_avx512: 5186.4 (22.00x)
scene_sad16_c: 114136.0 ( 1.00x)
Trivial to add, but a lot faster (on my machine).
scene_sad8_c: 114476.4 ( 1.00x)
scene_sad8_sse2: 8644.3 (13.24x)
scene_sad8_avx2: 4520.1 (25.33x)
scene_sad8_avx512: 3153.0 (36.31x)
This adds support for high bit depth formats, as well as formats with fewer
than 3 planes. The implementation for HBD is the same as for 8 bit formats,
just right shifted to 8 bits.
It's worth pointing out that this also works for HDR formats (and even DV),
because the underlying implementation is just trying to minimize the histogram
difference. If anything, using a HDR format will result in a *more* accurate
detection, because HDR formats tend to be more perceptually uniform.
Enables timeline editing options for overlay_cuda similar to what overlay allows
Example overlaying an image on a video between 30 to 60 seconds:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i sample-video.mp4 -i sample-image.jpg
-filter_complex "[1:v]hwupload_cuda[image],[0:v]scale_npp=format=yuv420p[video],[video][image]overlay_cuda=enable='between(t,30,60)'"
-c:v h264_nvenc -c:a copy -y overlay-output-gpu.mp4
Signed-off-by: Jorge Estrada <jestrada.list@gmail.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
In config_input(), fir_to_phase() allocates memory in h[longer], which
would leak if av_calloc() to s->coeffs failed. lpf() allocates memory
in h[0] and h[1], which would leak if fir_to_phase() failed. To fix
this leak, add av_free(h[longer]) in as cleanup code, and replace
return AVERROR* with goto cleanup to prevent from leaks.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch adds the pad_cuda video filter. A filter similar to the existing pad filter but accelerated by CUDA.
The filter shares the same options as the software pad filter.
Example usage:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+100:h=ih+100:x=-1:y=-1:color=red" out.mp4
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
A frame graph activation might not produce a frame in the requested sink, so
keep on requesting a frame there unless we encounter a filter activation with
buffersrc empty error.
This makes av_buffersink_get_frame(_flags) work according to its documentation
which claims that EAGAIN is only returned if additional frames must be inserted
into the graph.
Fate changes are because audio frames will have different sizes at segment
boundaries, but content is the same.
Signed-off-by: Marton Balint <cus@passwd.hu>
Sinks without an activate callback have no means to request frames in their
input, therefore the default activate callback should do it for them.
Fixes ticket #11624.
Fixes ticket #10988.
Fixes ticket #10990.
Signed-off-by: Marton Balint <cus@passwd.hu>
Even if all inputs are blocked an activate callback should request a frame on
some if its inputs if a frame is requested on any of its outputs.
Signed-off-by: Marton Balint <cus@passwd.hu>
EOF only need to be forwarded back if all outputs have reached EOF.
Fixes infinte loop with ffprobe -f lavfi -i "smptebars=d=1,select=n=2:e=1[out0][out1]"
Regression since d9e41ead82.
Fixes ticket #10959.
Fixes ticket #11366.
Signed-off-by: Marton Balint <cus@passwd.hu>
The status check is unneeded because an outlink with a nonzero status should
always return 0 for ff_outlink_frame_wanted(). Also use unsigned for index
because nb_outputs is unsigned as well.
Signed-off-by: Marton Balint <cus@passwd.hu>
In parse_cinespace(), memory allocated in in_prelut[] and out_prelut[]
would leak if allocate_3dlut() failed. Replace return ret with goto end
to free memory before return error code.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
After 3b26b782ee, `ebur128->true_peak` was only set to the maximum of the
current "true peak per frame" values, when it should report the true peak for
the entire stream.
Fixes: 3b26b782ee
The codecviewfilter, when used with qp=1, did not display quantization parameter values for H.264 streams because the QP table extraction was restricted to MPEG-2 video.
This patch enables H.264 support by updating ff_qp_table_extractto accept AV_VIDEO_ENC_PARAMS_H264. This allows for correct QP overlay on H.264 video
Signed-off-by: Timothee <timothee.informatique@regaud-chapuy.fr>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This should prevent the possibility of audio data accumulating.
The commit also cleans up and simplifies the code a bit so all frame producers
(filter_frame(), flush_frame()) functions follow similar logic as
ff_inlink_consume_frame() for the return code.
Signed-off-by: Marton Balint <cus@passwd.hu>
Easier to read, less convoluted, and ~30% faster. Most importantly, this
avoids repeating the redundant recalculation of the true peak on every
single sample, by moving the FIND_PEAK() loop out of the main loop. (Note
that FIND_PEAK() does not depend on the current sample index at all, so
there is no reason for it to ever be recomputed here)