since tf.pad is enabled, the conv2d(valid) changes back to its original behavior.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
the reason to add this layer first is that vf_sr uses it in its
tensorflow model, and the next plan is to update the python script
to convert tf.pad into native model.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
it is expected that there will be more files to support native mode,
so put all the dnn codes under libavfilter/dnn
The main change of this patch is to move the file location, see below:
modified: libavfilter/Makefile
new file: libavfilter/dnn/Makefile
renamed: libavfilter/dnn_backend_native.c -> libavfilter/dnn/dnn_backend_native.c
renamed: libavfilter/dnn_backend_native.h -> libavfilter/dnn/dnn_backend_native.h
renamed: libavfilter/dnn_backend_tf.c -> libavfilter/dnn/dnn_backend_tf.c
renamed: libavfilter/dnn_backend_tf.h -> libavfilter/dnn/dnn_backend_tf.h
renamed: libavfilter/dnn_interface.c -> libavfilter/dnn/dnn_interface.c
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
This patch does not make other pixel formats usable yet to make sure the test
result is the same with rgb32 format.
Reviewed-by: Marton Balint <cus@passwd.hu>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Changes to vf_drawtext.c written by
Calvin Walton <calvin.walton@kepstin.ca>
Changes to filters.texi written by
greg Luce <electron.rotoscope@gmail.com>
with lots of help from Moritz Barsnick and Gyan
Fixes#7947.
The linearize function (usually refered to EOTF) is the inverse of
delinearize function (usually referred to OETF). Demarcation point of
EOTF should be beta*delta, but the actual value used now in the source
code is beta.
For ITU Rec.709, they are 0.081 (0.018*4.5) and 0.018 respectively
(beta = 0.018 and delta = 4.5), and they correspond to pixel value 5
and 21 for an 8-bit image. Linearized result of pixel within that range
(5-21) will be different, but this commit will make linearize function
of the filter more accurate in the mathematical sense.
Signed-off-by: Yonglin Luo <vincenluo@tencent.com>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
low_power mode will use a fixed HW engine (SFC), thus can offload EU usage.
high quality mode will take EU usage (AVS sampler).
Performance and EU usage (Render usage) comparsion on Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz:
High quality mode : ffmpeg -hwaccel qsv -c:v h264_qsv -i bbb_sunflower_1080p_30fps_normal_2000frames.h264 \
-vf scale_qsv=w=1280:h=736:mode=hq -f null -
fps=389
RENDER usage: 28.10 (provided by MSDK metrics_monitor)
Low Power mode: ffmpeg -hwaccel qsv -c:v h264_qsv -i ~/bbb_sunflower_1080p_30fps_normal_2000frames.h264 \
-vf scale_qsv=w=1280:h=736:mode=low_power -f null -
fps=343
RENDER usage: 0.00
Low power mode (SFC) may be disabled if not supported by
MSDK/Driver/HW, and replaced by AVS mode interanlly.
Signed-off-by: Zhong Li <zhong.li@intel.com>
Redundant condition: '!A || B' is equivalent to '!A || (A && B)' but
more clearly.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
1. Currently output format is hard-coded as NV12, thus means
CSC is always done for not NV12 input such as P010.
Follow original input format as default output.
2. Add an option to specify output format.
Signed-off-by: Zhong Li <zhong.li@intel.com>
The horizontal pass get ~2x performance with the patch
under single thread.
Tested overall performance using the command(avx2 enabled):
./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
For single thread, the fps improves from 43 to 60, about 40%.
For multi-thread, the fps improves from 110 to 130, about 20%.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Remove the rain in the input image/video by applying the derain
methods based on convolutional neural networks. Training scripts
as well as scripts for model generation are provided in the
repository at https://github.com/XueweiMeng/derain_filter.git.
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
We perfer the coding style like:
/* some stuff */
if (error) {
/* error handling */
return -(errorcode);
}
/* normal actions */
do_something()
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
benchmarking with a simple command:
ffmpeg -i 1080p.mp4 -vf unsharp=la=3:ca=3 -an -f null /dev/null
with the patch, the fps increase from 50 to 120 on my local machine (i7-6770HQ).
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Used the command for 1080p h264 clip as follow:
a). ffmpeg -i input -vf lutyuv="u=128:v=128" -f null /dev/null
b). ffmpeg -i input -vf lutrgb="g=0:b=0" -f null /dev/null
after enabled the slice threading, the fps change from:
a). 144fps to 258fps (lutyuv)
b). 94fps to 153fps (lutrgb)
in Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add slice threading support, use the command like:
./ffmpeg -i input -vf colorlevels -f null /dev/null
with 1080p h264 clip, the fps from 39 fps to 79 fps
in the local(Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Attempts to pick the set of supported colour properties best matching the
input. Output is then set with the same values, except for the colour
matrix which may change when converting between RGB and YUV.
Fixes two warnings:
libavfilter/avf_showspatial.c:157:26: warning: variable ‘w’ set but not used
libavfilter/avf_showspatial.c:157:23: warning: variable ‘h’ set but not used
Currently, picref will be freed by calling av_frame_free(&picref) in
submit_frame() in qsvvpp.c when working in system memory mode,and normally it
is freed in filter_frame() in vf_vpp_qsv.c when working in other modes.
Double free happens when working in system memory mode, remove to
fix the memory issue.
Reproduce:
ffmpeg -init_hw_device qsv=foo -filter_hw_device foo -f rawvideo -pix_fmt nv12 -s:v 852x480 \
-i 852x480.nv12 -vf 'vpp_qsv=w=500:h=400' -f rawvideo -pix_fmt nv12 qsv.nv12
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Zhong Li <zhong.li@intel.com>
Fixes infinte loop with -vf loop=loop=1 and also fixes looping when the input
is less frames than the specified loop size.
Possible regressions since ef1aadffc7.
Signed-off-by: Marton Balint <cus@passwd.hu>
Add dilation parameter in dnn native to support dilated convolution.
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
ff_filter_get_nb_threads() respect AVFilterContext.nb_threads and
graph->nb_threads both, in most case, we perfer this API than using
ctx->graph->nb_threads directly.
Reviewed-by: Steven Liu <lq@onvideo.cn>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add "Leaky_relu" and "None" option in activation function.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
Add another two padding methods "VALID" and "SAME" as tensorflow,
and keep the existing "SAME_CLAMP_TO_EDGE" method suggested by sr filter.
As "SAME_CLAMP_TO_EDGE"can keep the output with the same size as original input,
and gives a slight better result as mentioned by sr filter.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Xuewei Meng <xwmeng96@gmail.com>
Signed-off-by: Steven Liu <lq@onvideo.cn>
I'm not sure why this was written the way it was originally. We
initialise the plane addresses correctly in hwcontext_cuda so
why try and play games to calculate the plane offsets directly
in this code?
When i converted the filter to use texture objects instead of
texture references, I incorrect dropped the `pixel_size` scaling
factor when setting `pitchInBytes`. `src_pitch` is in pixels and
so must be scaled up.
b3b7ba62 introduced undefined behaviour: A (non-modifiable) string
literal has been assigned to a modifiable string; said string was indeed
modified later via av_strtok.
This of course caused compiler warnings because of the discarded
qualifier; these are in particular fixed by this commit.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
currently, only float is supported as model input, actually, there
are other data types, this patch adds uint8.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
some models such as ssd, yolo have more than one output.
the clean up code in this patch is a little complex, it is because
that set_input_output_tf could be called for many times together
with ff_dnn_execute_model_tf, we have to clean resources for the
case that the two interfaces are called interleaved.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Currently, within interface set_input_output, the dims/memory of the tensorflow
dnn model output is determined by executing the model with zero input,
actually, the output dims might vary with different input data for networks
such as object detection models faster-rcnn, ssd and yolo.
This patch moves the logic from set_input_output to execute_model which
is suitable for all the cases. Since interface changed, and so dnn_backend_native
also changes.
In vf_sr.c, it knows it's srcnn or espcn by executing the model with zero input,
so execute_model has to be called in function config_props
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
remove the requirment that the name of DNN model input/output
should be "x"/"y",
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
remove 'else' since there is always 'return' in 'if' scope,
so the code will be clean for later maintenance
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
otherwise, the following check will return error if layer_add_res
is randomly initialized.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Instead of doing each column one by one, doing several columns
together gives about 30% better performance.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Ruiling Song <ruiling.song@intel.com>