Rename from proc_from_frame_to_dnn to ff_proc_from_frame_to_dnn, and
from proc_from_dnn_to_frame to ff_proc_from_dnn_to_frame.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
The OpenVINO model file format changes when OpenVINO moves to a new
release; it does not work if the model file and runtime versions are
mismatched.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
OpenVINO APIs require a specified input size to run the model, while
some OpenVINO models do accept different input sizes. To enable this
feature, add the input_resizable option here for easier use.
Set the bool variable input_resizable to specify whether the input can
be resized.
input_resizable = 1 means input resize is supported, i.e. different
input sizes are accepted.
input_resizable = 0 (default) means input resize is not supported.
Please make sure the inference model does accept different input sizes
before using this option, otherwise the inference engine may report
error(s).
e.g.: ./ffmpeg -i video_name.mp4 -vf dnn_processing=dnn_backend=openvino:\
model=model_name.xml:input=input_name:output=output_name:\
options=device=CPU\&input_resizable=1 -y output_video_name.mp4
Signed-off-by: Ting Fu <ting.fu@intel.com>
Move the OpenVINO model/inference request creation and initialization
steps from ff_dnn_load_model_ov to a new function init_model_ov, for
later input resize support.
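For illustration, the intended split looks roughly like this (a minimal
sketch with a placeholder type; not the actual OpenVINO backend code):
    typedef struct OVModel OVModel;  /* backend-private context (sketch) */
    /* Creates the executable network and the inference request.  Split
     * out of ff_dnn_load_model_ov so that a reshape (input resize) step
     * can later run between model parsing and this initialization. */
    static int init_model_ov(OVModel *ov_model);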
Signed-off-by: Ting Fu <ting.fu@intel.com>
The default batch_size is 1.
Signed-off-by: Xie, Lin <lin.xie@intel.com>
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
The prefix for symbols not exported from the library and not
local to one translation unit is ff_ (or FF for types).
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
'void *' is too flexible; since we can derive the needed info from
AVFilterContext*, we just unify the interface with this data
structure.
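One possible shape of the change, as a hedged sketch (the exact member
and parameter lists in dnn_interface.h may differ):
    typedef struct DNNModel {
        void *model;                  /* backend-private data */
        AVFilterContext *filter_ctx;  /* was: void *userdata */
        /* ... */
    } DNNModel;
Anything the backends or callbacks need (private context, logging
context, link dimensions) can then be derived from filter_ctx.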
Signed-off-by: Xie, Lin <lin.xie@intel.com>
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
The functions fill_model_input_ov and infer_completion_callback are
extracted; this will help reuse them for async execution.
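The resulting flow, condensed (a sketch; type names such as
OVRequestItem and the argument lists are illustrative):
    /* copies data from the input AVFrame into the inference request */
    static int fill_model_input_ov(OVModel *ov_model, OVRequestItem *request);
    /* runs when an inference finishes: copies the output back into the
     * AVFrame; the same callback can later complete async requests */
    static void infer_completion_callback(void *args);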
Signed-off-by: Xie, Lin <lin.xie@intel.com>
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
The TensorFlow C library accepts a config for session options to set
different parameters for the inference. This patch exports this
interface.
The config is a serialized tensorflow.ConfigProto proto, so we need
two steps to use it:
1. generate the serialized proto with python (see script example below)
the output looks like: 0xab...cd
where 0xcd is the least significant byte and 0xab is the most significant byte.
2. pass the python script output into ffmpeg with
dnn_processing=options=sess_config=0xab...cd
The following script is an example to specify one GPU. If the system
contains 3 GPU cards, the visible_device_list could be '0', '1', '2',
'0,1', etc. '0' does not necessarily mean physical GPU card 0; we need
to try and see. And we can also add more options here to generate
other serialized protos.
Script example to generate a serialized proto which specifies one GPU:
import tensorflow as tf
gpu_options = tf.GPUOptions(visible_device_list='0')
config = tf.ConfigProto(gpu_options=gpu_options)
s = config.SerializeToString()
# hex-encode the bytes in reverse order, so the first proto byte ends
# up as the least significant byte of the 0x... string
b = ''.join("%02x" % byte for byte in bytearray(s)[::-1])
print('0x%s' % b)
For some cases (for example, super resolution), the DNN model changes
the frame size, which impacts the filter behavior, so the filter needs
to know the output frame size at the very beginning.
Currently, the filter reuses DNNModule.execute_model to query the
output frame size; this is not clear from an interface perspective, so
add a new explicit interface DNNModel.get_output for such queries.
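The new query could look roughly like this (signature reconstructed
from the description above, so treat names and parameters as an
approximation rather than the exact header):
    /* ask the model which output size it would produce for the given
     * input size, without running a full inference */
    DNNReturnType (*get_output)(void *model, const char *input_name,
                                int input_width, int input_height,
                                const char *output_name,
                                int *output_width, int *output_height);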
Suppose we have a detect and classify filter in the future: the detect
filter generates some bounding boxes (BBox) as AVFrame side data, and
the classify filter executes a DNN model for each BBox. For each BBox,
we need to crop the AVFrame, copy data to the DNN model input and do
the model execution. So we have to save the in_frame at
DNNModel.set_input and use it at DNNModule.execute_model; such saving
is not feasible once we support async execute_model.
This patch sets the in_frame as an execute_model parameter, so all the
information is put together within the same function for each
inference. It also makes it easy to support BBox async inference.
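Schematically, each inference call now carries its own frame (a hedged
sketch; the parameter list is an approximation):
    /* in_frame is passed per call instead of being cached at
     * set_input time, so per-BBox async inferences stay independent */
    DNNReturnType (*execute_model)(const DNNModel *model,
                                   const char *input_name, AVFrame *in_frame,
                                   const char **output_names, uint32_t nb_output);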
Currently, every filter needs to provide code to transfer data from
AVFrame* to the model input (DNNData*), and also from the model output
(DNNData*) to AVFrame*. Actually, such transfers can be implemented
within the DNN module, so the filter can focus on its own business
logic.
The DNN module also exports the function pointers pre_proc and
post_proc in struct DNNModel, in case a filter has special logic to
transfer data between AVFrame* and DNNData*. The default
implementation within the DNN module is used if the filter does not
set pre_proc/post_proc.
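Conceptually, the two hooks sit in struct DNNModel roughly as follows
(a hedged sketch; exact argument order and types may differ):
    /* optional per-filter hooks; when left NULL, the DNN module's
     * default AVFrame <-> DNNData conversion is used */
    int (*pre_proc)(AVFrame *frame, DNNData *model_input, AVFilterContext *filter_ctx);
    int (*post_proc)(DNNData *model_output, AVFrame *frame, AVFilterContext *filter_ctx);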
Before this patch, the dnn FATE test may fail in some Windows
environments while succeeding on my Linux. The bug was caused by a
wrong loop boundary.
After this patch, the FATE test succeeds on my Windows mingw 64-bit.
Signed-off-by: Xu Jun <xujunzz@sjtu.edu.cn>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Before this patch, memory was allocated in each thread function, which
may cause more than one memory allocation and a crash.
After this patch, memory is allocated once in the main thread, and an
index is passed into the thread functions. Bug fixed.
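The pattern is essentially the following (an illustrative sketch, not
the exact dnn_backend_native_layer_conv2d.c code):
    #include <pthread.h>
    typedef struct ThreadParam {
        int thread_index;  /* which slice of output rows this worker owns */
        float *output;     /* shared buffer, allocated once by the main thread */
    } ThreadParam;
    static void *worker_thread(void *arg)
    {
        ThreadParam *p = arg;
        /* compute only this worker's rows into p->output; no malloc here */
        (void)p;
        return NULL;
    }
The main thread allocates the buffers, fills one ThreadParam per
worker, then calls pthread_create()/pthread_join() and frees once.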
Signed-off-by: Xu Jun <xujunzz@sjtu.edu.cn>
Show all input/output names when the input or output name is not
correct.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Found via ASAN with the dnn-layer-conv2d FATE-test.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Use pthread to multithread dnn_execute_layer_conv2d.
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y:options=conv2d_threads=23 \
-y sr_native.jpg -benchmark"
before patch: utime=11.238s stime=0.005s rtime=11.248s
after patch: utime=20.817s stime=0.047s rtime=1.051s
on my 3900X 12c24t @4.2GHz
About the increase of utime: CPU Hyper-Threading makes the number of
logical cores twice the number of physical cores, while the CPU's
compute throughput improves by less than 2x, and utime sums the
runtime of all logical cores. As a result, using a thread count near
the number of logical cores roughly doubles utime, while reducing
rtime by less than a further factor of two, on Hyper-Threading CPUs.
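The parallelization itself follows the usual row-partitioning scheme,
roughly like this (names such as params, row_start/row_end and
conv2d_worker are illustrative; FFMIN is from libavutil):
    /* split the output height into one contiguous band per thread */
    int rows_per_thread = (height + nb_threads - 1) / nb_threads;  /* ceiling */
    for (int t = 0; t < nb_threads; t++) {
        params[t].row_start = t * rows_per_thread;
        params[t].row_end   = FFMIN((t + 1) * rows_per_thread, height);
        pthread_create(&threads[t], NULL, conv2d_worker, &params[t]);
    }
    for (int t = 0; t < nb_threads; t++)
        pthread_join(threads[t], NULL);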
Signed-off-by: Xu Jun <xujunzz@sjtu.edu.cn>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Unify all error returns as DNN_ERROR, in order to stop model execution
when an error is returned by layer_func.pf_exec in
ff_dnn_execute_model_native.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Currently, the output is set both at DNNModel.set_input_output and
DNNModule.execute_model. It makes sense for the output name to be
provided at model inference time, so that all the output info is set
in a single place.
And so DNNModel.set_input_output is renamed to DNNModel.set_input.
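In effect the split becomes (a hedged sketch; parameter lists are
approximate):
    /* binds only the input; output info is no longer given here */
    DNNReturnType (*set_input)(void *model, DNNData *input, const char *input_name);
    /* the output names are supplied once, at inference time */
    DNNReturnType (*execute_model)(const DNNModel *model, DNNData *outputs,
                                   const char **output_names, uint32_t nb_output);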
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>