This helps figuring out where the filter is slow:
70.53% ffmpeg_g ffmpeg_g [.] nlmeans_slice
25.73% ffmpeg_g ffmpeg_g [.] compute_safe_ssd_integral_image_c
1.74% ffmpeg_g ffmpeg_g [.] compute_unsafe_ssd_integral_image
0.82% ffmpeg_g ffmpeg_g [.] ff_mjpeg_decode_sos
0.51% ffmpeg_g [unknown] [k] 0xffffffff91800a80
0.24% ffmpeg_g ffmpeg_g [.] weight_averages
(Tested with a large image that takes several seconds to process)
Since this function is irrelevant speed wise, the file's TODO is
updated.
before: ssd_integral_image_c: 49204.6
after: ssd_integral_image_c: 44272.8
Unrolling by 4 made the biggest difference on odroid-c2 (aarch64);
unrolling by 2 or 8 both raised 46k cycles vs 44k for 4.
Additionally, this is a much better reference when writing SIMD (SIMD
vectorization will just target 16 instead of 4).
SIMD code will not have to deal with padding itself. Overwriting in that
function may have been possible but involve large overreading of the
sources. Instead, we simply make sure the width to process is always a
multiple of 16. Additionally, there must be some actual area to process
so the SIMD code can have its boundary checks after processing the first
pixels.
The thread id was invalid because it was not initialised
during the calls to init_complex_filtergraph.
This adds a flag to check for initialisation before trying to
peform the join.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Kevin Wheatley <kevin.j.wheatley@gmail.com>
Temporarily keep the old method for ffmpeg_filters.c choose_pix_fmt and
avfiltergraph.c pick_format() until a paletted pixel format without alpha is
introduced.
Signed-off-by: Marton Balint <cus@passwd.hu>
For filters based on framesync, the input frame was managed
by framesync, so we should not directly keep and destroy it,
instead we make a clone of it here, or else double-free will occur.
But for other filters not based on framesync, we still need to
free the input frame inside filter_frame.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
The existing version which was cherry-picked from Libav does not work
with FFmpeg framework, because ff_request_frame() was totally
different between Libav (recursive) and FFmpeg (non-recursive).
The existing overlay_qsv implementation depends on the recursive version
of ff_request_frame to trigger immediate call to request_frame() on input pad.
But this has been removed in FFmpeg since "lavfi: make request_frame() non-recursive."
Now that we have handy framesync support in FFmpeg, so I make it work
based on framesync. Some other fixing which is also needed to make
overlay_qsv work are put in a separate patch.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
This removes the XP compatibility code, and switches entirely to SRW
locks, which are available starting at Windows Vista.
This removes CRITICAL_SECTION use, which allows us to add
PTHREAD_MUTEX_INITIALIZER, which will be useful later.
Windows XP is hereby not a supported build target anymore.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Currently vpp pipeline is always created, even for the unnecessary
cases such as setting the option "vpp_qsv=w=1280:h=720" for an input
with native resolution 1280x720. Thus introduces unnecessary performance
dropping, so bypass vpp if not needed.
Signed-off-by: Zhong Li <zhong.li@intel.com>
Signed-off-by: Maxym Dmytrychenko <maxim.d33@gmail.com>
PSEUDOPAL pixel formats are not paletted, but carried a palette with the
intention of allowing code to treat unpaletted formats as paletted. The
palette simply mapped the byte values to the resulting RGB values,
making it some sort of LUT for RGB conversion.
It was used for 1 byte formats only: RGB4_BYTE, BGR4_BYTE, RGB8, BGR8,
GRAY8. The first 4 are awfully obscure, used only by some ancient bitmap
formats. The last one, GRAY8, is more common, but its treatment is
grossly incorrect. It considers full range GRAY8 only, so GRAY8 coming
from typical Y video planes was not mapped to the correct RGB values.
This cannot be fixed, because AVFrame.color_range can be freely changed
at runtime, and there is nothing to ensure the pseudo palette is
updated.
Also, nothing actually used the PSEUDOPAL palette data, except xwdenc
(trivially changed in the previous commit). All other code had to treat
it as a special case, just to ignore or to propagate palette data.
In conclusion, this was just a very strange old mechnaism that has no
real justification to exist anymore (although it may have been nice and
useful in the past). Now it's an artifact that makes the API harder to
use: API users who allocate their own pixel data have to be aware that
they need to allocate the palette, or FFmpeg will crash on them in
_some_ situations. On top of this, there was no API to allocate the
pseuo palette outside of av_frame_get_buffer().
This patch not only deprecates AV_PIX_FMT_FLAG_PSEUDOPAL, but also makes
the pseudo palette optional. Nothing accesses it anymore, though if it's
set, it's propagated. It's still allocated and initialized for
compatibility with API users that rely on this feature. But new API
users do not need to allocate it. This was an explicit goal of this
patch.
Most changes replace AV_PIX_FMT_FLAG_PSEUDOPAL with FF_PSEUDOPAL. I
first tried #ifdefing all code, but it was a mess. The FF_PSEUDOPAL
macro reduces the mess, and still allows defining FF_API_PSEUDOPAL to 0.
Passes FATE with FF_API_PSEUDOPAL enabled and disabled. In addition,
FATE passes with FF_API_PSEUDOPAL set to 1, but with allocation
functions manually changed to not allocating a palette.
avdevice_register_all() is still required to register devices into
lavf (this is required due to lavd being somewhat of a hack).
Signed-off-by: Josh de Kock <josh@itanimul.li>
Fixes parsing of expressions like c0=c0+c0 or c0=c0|c0=c1. Previously no
error was thrown and for input channels, only the last gain factor was used,
for output channels the source channel gains were combined.
Signed-off-by: Marton Balint <cus@passwd.hu>
The output frame size is larger than the image containing a subsampled
plane - use the actual size of the image being written rather than the
dimensions of the intended output frame.
Reviewed-by: Dylan Fernando <dylanf123@gmail.com>
Behaves like the existing avgblur filter, except working on OpenCL
hardware frames. Takes exactly the same options.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
The intended target is OpenCL 1.2, so disable warnings for APIs deprecated
after that. This primarily applies to clCreateCommandQueue(), we can't use
the replacement clCreateCommandQueueWithProperties() because it was
introduced in OpenCL 2.0.
Also remove some unnecessary includes from overlay and program filters so
that the define is available at the right moment.
Add a new function to find the global work size given the output image and
the required block alignment, then use it in the overlay, program and unsharp
filters. Fixes the overlay and unsharp filters applying the kernel to
locations outside the frame when subsampled planes are present.
Since the config_props function now references both the input and output
links, rename the 'link' variable to 'outlink'.
Fix up some mismatching indentation.
Don't bother setting the width and height on the outlink; the filter
framework does that for us.
The old version of the filter had a problem where it would queue up
all of the duplicate frames required to fill a timestamp gap in a
single call to filter_frame. In problematic files - I've hit this in
webcam streams with large gaps due to network issues - this will queue
up a potentially huge number of frames. (I've seen it trigger the Linux
OOM-killer on particularly large pts gaps.)
This revised version of the filter using the activate callback will
generate at most 1 frame each time it is called.
This patch makes it possible to dinamically close the current segment
and step to the next one by introducing command handling capabilities
into the filter. This new feature is very usefull when working with
real-time sources or live streams as source. Combinig usage with zmqsend
tool you can interactively end the current segment and step to next one.
Signed-off-by: Bela Bodecs <bodecsb@vivanet.hu>
Right now, the PTS always starts out as 0, which causes problems on a
seek or when inserting this filter mid-stream.
Initialize it instead to AV_NOPTS_VALUE and copy the PTS from the first
frame instead if this is the case.
* commit 'b128be1748f3920a14a98307265df5f2d3433e1d':
vf_*_vaapi: Support increasing hardware frame pool size
Rewritten to apply to common VAAPI code rather than specific filters.
Merged-by: Mark Thompson <sw@jkqxz.net>
* commit '6d86cef06ba36c0ed591e14a2382e9630059fc5d':
lavfi: Add support for increasing hardware frame pool sizes
Merged-by: Mark Thompson <sw@jkqxz.net>
These filters do not directly know whether the API they are using will
support dynamic frame pools, so this is somewhat tricky. If the user
sets extra_hw_frames, we assume that they are aware of the problem and
set a fixed size based on that. If not, most cases use dynamic sizing
just like they did previously. The hardware-reverse-mapping case for
hwmap previously had a large fixed size (64) here, primarily as a hack
for QSV use - this is removed and extra_hw_frames will need to be set
for QSV to work since it requires fixed-size pools (as the other cases
do, and which didn't work before).
Previously if ff_outlink_frame_wanted() returned 0 it could dereference a null pointer when trying to read nb_samples.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
comment about the looks like a duplicate line.
but that is used to reason x is expressed from y
Suggested-by: Paul B Mahol
Suggested-by: Michael Niedermayer
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
This is done mainly in preparation for the SIMD patches.
- for the 8-bit input, decrease the blend factor precision to 7-bit.
- for the 16-bit input, increase the blend factor precision to 15-bit.
- make sure the blend functions are not called with 0 or maximum blending
factors, because we don't want the signed factor integers to overflow.
Fate test changes are due to different rounding.
Signed-off-by: Marton Balint <cus@passwd.hu>
Regression since: c6939f65a1
Found-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fix the green output issue when use procamp_vaapi without any
arguments, now if use procamp_vaapi without any arguments, will use
the default value to setting procamp_vaapi.
Signed-off-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Most code between them is common, so put them in a new file for
miscellaneous VAAPI filters.
Signed-off-by: Yun Zhou <yunx.z.zhou@intel.com>
Signed-off-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Add ProcAmp(color balance) vaapi video filter, use the option
like -vf "procamp_vaapi=b=10:h=120:c=2.8:s=3.7" to set
brightness/hue/contrast/saturation.
Signed-off-by: Yun Zhou <yunx.z.zhou@intel.com>
Signed-off-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Re-work the VAAPI common infrastructure to avoid code duplication
between filters.
Signed-off-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
When enabled, text, including effects like shadow or box, will be
completely bound within the video frame.
Default value changed to false to keep continuity of behaviour.
Fixes#6960.
Signed-off-by: Kyle Swanson <k@ylo.ph>
libvidstab introduced this variable only for packed formats but in
vf_vidstab*.c, it's checked for all inputs. So the filter errors out for YUV422/444P streams.
Fixes#6736.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
use ffmpeg -h filter=deinterlace_vaapi can't get full help information,
the root cause is not setting the flags fileld in options.
Signed-off-by: Jun Zhao <jun.zhao@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The framerate filter was quite convoluted with some filter_frame /
request_frame logic bugs. It seemed easier to rewrite the whole filter_frame /
request_frame part and also the frame interpolation ratio calculation part in
one step.
Notable changes:
- The filter now only stores 2 frames instead of 3
- filter_frame outputs all the frames it can to be able to handle consecutive
filter_frame calls which previously caused early drops of buffered frames.
- because of this, request_frame is largely simplified and it only outputs
frames on flush. Previously consecuitve request_frame calls could cause the
filter to think it is in flush mode filling its buffer with the same frames
causing a "ghost" effect on the output.
- PTS discontinuities are handled better
- frames with unknown PTS values are now dropped
Fixes ticket #4870.
Probably fixes ticket #5493.
Signed-off-by: Marton Balint <cus@passwd.hu>
It was truncated to int later on anyway. Fate test changes are due to rounding
instead of truncation.
Fixes fate test failures on x86-32 (gcc 4.8 (Ubuntu 4.8.5-2ubuntu1~14.04.1))
after 090b740680.
Signed-off-by: Marton Balint <cus@passwd.hu>
This removes the XP compatibility code, and switches entirely to SWR
locks, which are available starting at Windows Vista.
This removes CRITICAL_SECTION use, which allows us to add
PTHREAD_MUTEX_INITIALIZER, which will be useful later.
Windows XP is hereby not a supported build target anymore. It was
decided in a project vote that this is OK.
It should not be needed for each filter that sets sample aspect ratio
to set it explicitly also for each and every frame, instead that is
automatically done in get_buffer call.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
Also, do not overread input if linesize > width, or linesize is not divisible
by 8, and use the proper rounded width/height for MAFD calculation.
Signed-off-by: Marton Balint <cus@passwd.hu>
This speeds up the filter, and also fixes scene change detection score which is
reduced based on the difference of the current MAFD to the preivous MAFD.
Obviously if we compare two frames twice, the difference will be 0...
Signed-off-by: Marton Balint <cus@passwd.hu>
- normalize score to [0..100] instead of [0..85]
- change the default score to 8.2 to roughly keep existing behaviour
- take into account bit depth
- do not truncate to integer
Signed-off-by: Marton Balint <cus@passwd.hu>