Add "Leaky_relu" and "None" option in activation function.
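As a rough illustration of the two new options, here is a minimal sketch. It is not the code from the patch: the enum values, the function names and the explicit `alpha` slope parameter are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not the identifiers used in the patch. */
typedef enum { ACT_NONE, ACT_LEAKY_RELU } ActivationKind;

/* Leaky ReLU: positive values pass through, negative values are scaled by alpha. */
float leaky_relu(float x, float alpha)
{
    return x > 0.0f ? x : alpha * x;
}

/* Apply the selected activation in place to n values. */
void apply_activation(float *data, size_t n, ActivationKind kind, float alpha)
{
    if (kind == ACT_NONE)
        return; /* "None": leave the layer output untouched */
    for (size_t i = 0; i < n; i++)
        data[i] = leaky_relu(data[i], alpha);
}
```

With "None" the layer output is simply passed on unchanged, which is why that branch returns early.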
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
Add two more padding methods, "VALID" and "SAME", matching tensorflow,
and keep the existing "SAME_CLAMP_TO_EDGE" method suggested by the sr filter,
since "SAME_CLAMP_TO_EDGE" keeps the output the same size as the original input
and gives a slightly better result, as mentioned by the sr filter.
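The difference between the three methods can be sketched as below. This is an illustration only, assuming a stride of 1 and the usual tensorflow meaning of "VALID" (no padding) and "SAME" (zero padding); the enum and function names are hypothetical and not taken from the patch.

```c
/* Hypothetical names for illustration; not the identifiers used in the patch. */
typedef enum { PAD_VALID, PAD_SAME, PAD_SAME_CLAMP_TO_EDGE } PaddingMethod;

/* Output length along one axis for a stride-1 convolution. */
int conv_output_size(int in_size, int kernel_size, int dilation, PaddingMethod pad)
{
    if (pad == PAD_VALID)
        return in_size - (kernel_size - 1) * dilation; /* only fully covered positions */
    return in_size; /* both SAME variants keep the input size */
}

/* Read one sample of a row, resolving out-of-range indices per padding method. */
float padded_sample(const float *row, int size, int idx, PaddingMethod pad)
{
    if (idx >= 0 && idx < size)
        return row[idx];
    if (pad == PAD_SAME_CLAMP_TO_EDGE)
        return row[idx < 0 ? 0 : size - 1]; /* repeat the nearest edge sample */
    return 0.0f; /* SAME pads with zeros; VALID never reads out of range */
}
```

Both "SAME" variants keep the size; they differ only in what is read outside the input.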
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
I'm not sure why this was written the way it was originally. We
initialise the plane addresses correctly in hwcontext_cuda so
why try and play games to calculate the plane offsets directly
in this code?
When I converted the filter to use texture objects instead of
texture references, I incorrectly dropped the `pixel_size` scaling
factor when setting `pitchInBytes`. `src_pitch` is in pixels and
so must be scaled up.
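The scaling itself is a one-liner. The helper below is a hypothetical sketch of the arithmetic only; it does not reproduce the filter code or the CUDA resource descriptor setup.

```c
#include <stddef.h>

/* Hypothetical helper: convert a pitch expressed in pixels into the
 * byte pitch that a field such as pitchInBytes expects. */
size_t pitch_in_bytes(size_t pitch_in_pixels, size_t pixel_size)
{
    return pitch_in_pixels * pixel_size;
}
```

For example, a pitch of 1920 pixels at 2 bytes per pixel gives a byte pitch of 3840.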
b3b7ba62 introduced undefined behaviour: A (non-modifiable) string
literal has been assigned to a modifiable string; said string was indeed
modified later via av_strtok.
This of course caused compiler warnings because of the discarded
qualifier; these are in particular fixed by this commit.
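The general pattern is sketched below. To keep the example self-contained it uses the standard strtok() rather than av_strtok, and the function names are made up for illustration; the point is only that the tokenizer writes into its input, so the input must be a modifiable copy.

```c
#include <string.h>

/* Count comma-separated tokens. strtok() writes NUL bytes into its input,
 * so the input must be modifiable storage, never a string literal. */
int count_tokens(char *list)
{
    int n = 0;
    for (char *tok = strtok(list, ","); tok; tok = strtok(NULL, ","))
        n++;
    return n;
}

/* A char array initialized from a literal is a modifiable copy of it. */
int count_default_tokens(void)
{
    char names[] = "x,y,z"; /* array copy, not a pointer to the literal */
    return count_tokens(names);
}
```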
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Currently, only float is supported as model input. There are other
data types as well; this patch adds uint8.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Some models, such as ssd and yolo, have more than one output.
The cleanup code in this patch is a little complex because
set_input_output_tf can be called many times together
with ff_dnn_execute_model_tf, so we have to clean up resources for the
case where the two interfaces are called in an interleaved fashion.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Currently, within the set_input_output interface, the dims/memory of the tensorflow
dnn model output is determined by executing the model with zero input.
However, the output dims might vary with different input data for networks
such as the object detection models faster-rcnn, ssd and yolo.
This patch moves the logic from set_input_output to execute_model, which
is suitable for all cases. Since the interface changed, dnn_backend_native
changes as well.
vf_sr.c determines whether the model is srcnn or espcn by executing the model
with zero input, so execute_model has to be called in function config_props.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Remove the requirement that the names of the DNN model input/output
be "x"/"y".
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Remove 'else' since there is always a 'return' in the 'if' scope,
so the code is cleaner for later maintenance.
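A generic before/after sketch of this kind of cleanup, with made-up function names that are not from the patch:

```c
/* Before: the 'else' is redundant because the 'if' branch always returns. */
int check_before(int ret)
{
    if (ret < 0) {
        return -1;
    } else {
        return 0;
    }
}

/* After: same behaviour with one less nesting level. */
int check_after(int ret)
{
    if (ret < 0)
        return -1;
    return 0;
}
```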
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
otherwise, the following check will return error if layer_add_res
is randomly initialized.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Instead of doing each column one by one, doing several columns
together gives about 30% better performance.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
When asetnsamples uses output samples < input samples, remaining samples build up in the fifo over time.
Fix this by marking the filter as ready again if there are enough samples.
Regression since ef3babb2c7
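The idea can be shown with a toy model. This is not the libavfilter API: the struct, the function and the "return 1 to be scheduled again" convention are assumptions made purely to illustrate why the filter must be marked ready while enough samples remain.

```c
/* Toy FIFO that only tracks how many samples are queued. */
typedef struct { int queued; } SampleFifo;

/* Emit at most one output frame of nb_out samples per activation.
 * Returns 1 if the filter should be marked ready again because the
 * FIFO still holds enough samples for another frame, 0 otherwise. */
int activate_once(SampleFifo *f, int nb_out, int *emitted)
{
    if (f->queued < nb_out)
        return 0;
    f->queued -= nb_out;
    (*emitted)++;
    return f->queued >= nb_out;
}
```

Without the final check, each activation would consume only one output frame per input frame, and the surplus would accumulate whenever input frames are larger than output frames.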
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I put this call in by habit, rather than because there was any
actual need. The filter is simply processing frames one after
the other and has no need to synchronise.
malakudi on the devtalk forums noticed a slowdown when using nvenc
with temporal/spatial aq and that the slowdown went away if the
sync call was removed. I also verified that in the basic encoding
case there's an observable speedup.
I also verified that we aren't doing unnecessary sync calls in any
other filter.
The lensfun filter wraps the lensfun library which performs
transformations on videos to correct for lens distortion. Often this
results in areas in the input being mapped to areas that fall outside
the boundaries of the output. The library has a parameter called scale
which is a scale factor applied to the output video. Decreasing it makes
it possible to regain the areas of the video which would otherwise have
been lost. There is a special value of 0 which indicates that the
library should automatically determine a scale factor that results in
the output frame being filled (i.e. little or no black/unmapped areas).
This patch adds a corresponding scale option to the lensfun filter which
is passed through to the library. The existing behaviour of using the
automatic value of 0 is retained as the default behaviour, while other
values will be passed through to the library.
Signed-off-by: Daniel Playfair Cal <daniel.playfair.cal@gmail.com>
Some filters may not need linearize/delinearize and thus
will not even define them. Add an ifdef check so they can easily
re-use the .cl file.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
These functions can be reused by other colorspace filters,
so move them to common file. No functional changes.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
The channel loop is now the outer loop for both planar and interleaved. This is
needed by the next patch, and the speed difference is negligible, if any.
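A sketch of what a channel-outer loop looks like for both layouts, using a peak calculation as the per-channel work. The function names and the float sample type are assumptions for illustration and do not come from the patch.

```c
/* Peak absolute value of one channel whose samples sit 'stride' floats apart. */
float channel_peak(const float *base, int nb_samples, int stride)
{
    float peak = 0.0f;
    for (int i = 0; i < nb_samples; i++) {
        float v = base[i * stride];
        if (v < 0.0f)
            v = -v;
        if (v > peak)
            peak = v;
    }
    return peak;
}

/* Interleaved: channel ch starts at data + ch and steps by the channel count. */
void peaks_interleaved(const float *data, int nb_samples, int channels, float *peaks)
{
    for (int ch = 0; ch < channels; ch++)
        peaks[ch] = channel_peak(data + ch, nb_samples, channels);
}

/* Planar: each channel has its own contiguous plane. */
void peaks_planar(const float *const *planes, int nb_samples, int channels, float *peaks)
{
    for (int ch = 0; ch < channels; ch++)
        peaks[ch] = channel_peak(planes[ch], nb_samples, 1);
}
```

With the channel loop outermost, the same per-channel routine serves both layouts and only the base pointer and stride differ.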
Signed-off-by: Marton Balint <cus@passwd.hu>
set_metadata with many entries is not very efficient, and with small audio
frames the performance loss is noticeable. Also, with this change, very simple
calculations (like peak) can be optimized even further.
Unfortunately there are some small differences between the metadata and av_log
info output, so factoring out the calculations and output might not be worth
the hassle.
Signed-off-by: Marton Balint <cus@passwd.hu>
Set the specific repeat field in PicStruct if the frame has the repeat
flag.
This matches CheckInputPicStruct in MSDK.
Fix #7701.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Zhong Li <zhong.li@intel.com>
This change switches the vf_thumbnail_cuda filter from using the
full cuda sdk to using the ffnvcodec headers and loader.
Most of the change is a direct mapping, but I also switched from
using texture references to using texture objects. This is supposed
to be the preferred way of using textures, and the texture object API
is the one I added to ffnvcodec.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This change switches the vf_scale_cuda filter from using the
full cuda sdk to using the ffnvcodec headers and loader.
Most of the change is a direct mapping, but I also switched from
using texture references to using texture objects. This is supposed
to be the preferred way of using textures, and the texture object API
is the one I added to ffnvcodec.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This change switches the vf_thumbnail_cuda filter from using the
full cuda sdk to using the ffnvcodec headers and loader.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
If we fill with black then the generated palette will have one color more
than what the user requested. This also resulted in unwanted black specks in
the output of paletteuse, especially when generating small palettes.
Fix build warnings like "warning: ISO C90 forbids mixed declarations
and code" after adjusting the location of the malloc failure check.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
The malloc result needs to be checked for failure before it is used,
so adjust the location of the check in the code.
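The ordering in question looks roughly like this. The helper is hypothetical and uses plain malloc() rather than the FFmpeg allocators; it only illustrates checking the pointer before the first access.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n floats and zero them.
 * Returns NULL on allocation failure. */
float *alloc_zeroed(size_t n)
{
    float *buf = malloc(n * sizeof(*buf));

    if (!buf)
        return NULL; /* check first: never touch the buffer on failure */
    memset(buf, 0, n * sizeof(*buf));
    return buf;
}
```

The caller owns the returned buffer and releases it with free().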
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Remove pdiff_lut_scale from nlmeans and increase the weight_lut table size
from 2^9 to 500000. This avoids using pdiff_lut_scale in
nlmeans_slice() for the weight_lut table lookup, improving performance
by about 12% (for 1080P pictures).
Profiling command:
    perf stat -a -d -r 5 ./ffmpeg -i input -an -vf nlmeans=s=30 -vframes 10 \
        -f null /dev/null
Without this change:
    s=1.0 (default value)  63s
    s=30.0                 72s
With this change:
    s=1.0 (default value)  56s
    s=30.0                 63s
Reviewed-by: Carl Eugen Hoyos <ceffmpeg@gmail.com>
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Signed-off-by: Clément Bœsch <u@pkh.me>
The timestamp of the changed input frame as well as its relevant
properties can be examined by the user. Only applicable when
reinit_filter is disabled on the input stream.
fcmul_add_c: 1228.8
fcmul_add_sse3: 334.3
fcmul_add_avx: 186.3
Tested on a Core i5 4460 @ 3.2GHz
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>