Add dilation parameter in dnn native to support dilated convolution.
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
ff_filter_get_nb_threads() respect AVFilterContext.nb_threads and
graph->nb_threads both, in most case, we perfer this API than using
ctx->graph->nb_threads directly.
Reviewed-by: Steven Liu <lq@onvideo.cn>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add "Leaky_relu" and "None" option in activation function.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
Add another two padding methods "VALID" and "SAME" as tensorflow,
and keep the existing "SAME_CLAMP_TO_EDGE" method suggested by sr filter.
As "SAME_CLAMP_TO_EDGE"can keep the output with the same size as original input,
and gives a slight better result as mentioned by sr filter.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
I'm not sure why this was written the way it was originally. We
initialise the plane addresses correctly in hwcontext_cuda so
why try and play games to calculate the plane offsets directly
in this code?
When i converted the filter to use texture objects instead of
texture references, I incorrect dropped the `pixel_size` scaling
factor when setting `pitchInBytes`. `src_pitch` is in pixels and
so must be scaled up.
b3b7ba62 introduced undefined behaviour: A (non-modifiable) string
literal has been assigned to a modifiable string; said string was indeed
modified later via av_strtok.
This of course caused compiler warnings because of the discarded
qualifier; these are in particular fixed by this commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
currently, only float is supported as model input, actually, there
are other data types, this patch adds uint8.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
some models such as ssd, yolo have more than one output.
the clean up code in this patch is a little complex, it is because
that set_input_output_tf could be called for many times together
with ff_dnn_execute_model_tf, we have to clean resources for the
case that the two interfaces are called interleaved.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Currently, within interface set_input_output, the dims/memory of the tensorflow
dnn model output is determined by executing the model with zero input,
actually, the output dims might vary with different input data for networks
such as object detection models faster-rcnn, ssd and yolo.
This patch moves the logic from set_input_output to execute_model which
is suitable for all the cases. Since interface changed, and so dnn_backend_native
also changes.
In vf_sr.c, it knows it's srcnn or espcn by executing the model with zero input,
so execute_model has to be called in function config_props
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
remove the requirment that the name of DNN model input/output
should be "x"/"y",
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
remove 'else' since there is always 'return' in 'if' scope,
so the code will be clean for later maintenance
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
otherwise, the following check will return error if layer_add_res
is randomly initialized.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Instead of doing each column one by one, doing several columns
together gives about 30% better performance.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
When asetnsamples uses output samples < input samples, remaining samples build up in the fifo over time.
Fix this by marking the filter as ready again if there are enough samples.
Regression since ef3babb2c7
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I put this call in by habit, rather than because there was any
actual need. The filter is simply processing frames one after
the other and has no need to synchronise.
malakudi on the devtalk forums noticed a slowdown when using nvenc
with temporal/spatial aq and that the slowdown went away if the
sync call was removed. I also verified that in the basic encoding
case there's an observable speedup.
I also verified that we aren't doing unnecessary sync calls in any
other filter.