We have a pattern of wrapping CUDA calls to print errors and
normalise return values that is used in a couple of places. To
avoid duplication and increase consistency, let's put the wrapper
implementation in a shared place and use it everywhere.
Affects:
* avcodec/cuviddec
* avcodec/nvdec
* avcodec/nvenc
* avfilter/vf_scale_cuda
* avfilter/vf_scale_npp
* avfilter/vf_thumbnail_cuda
* avfilter/vf_transpose_npp
* avfilter/vf_yadif_cuda
This reverts commit 7d4e1f7cfb667585514bfa0a4d0fee2f717a93ed.
Accidentially pushed this with a batch of other patches, and it didn't
seem to break anything, so I went with it.
Except it does, so reverting it it is.
The patch enables dynamic bitrate through ReconfigureEncoder method
from nvenc API.
This is useful for live streaming in case of network congestion.
Signed-off-by: pkviet <pkv.stream@gmail.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
If there is input like DVB-T streams it can change aspect ratio
on-the-fly, so nvenc should respect this change and change aspect ratio
in encoder.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
nvenc doesn't support P016, but we have two problems today:
1) We declare support for YUV444P16 which nvenc also doesn't support.
We do this because it's the only pix_fmt we have that can
approximate nvenc's internal format that is YUV444P10 with data in
MSBs instead of LSBs. Because the declared format is a 16bit one,
it will be preferrentially chosen when encoding >10bit content,
but that content will normally be YUV420P12 or P016 which should
get mapped to P010 and not YUV444P10.
2) Transcoding P016 content with nvenc should be possible in a pure
hardware pipeline, and that can't be done if nvenc doesn't say it
accepts P016. By mapping it to P010, we can use it, albeit with
truncation. I have established that swscale doesn't know how to
dither to 10bits so we'd get truncation anyway, even if we tried
to do this 'properly'.
Currently the resource is only ever unregistered when the
registered_frames array is fully in use and an unmapped entry is re-used
and cleaned up.
I'm pretty sure the frame will have been cleaned up before that happens,
so I'm kinda surprised this never blew up.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
If some logic like vsync in ffmpeg.c duplicates frames, it might pass
the same frame twice, which will result in a crash due it being
effectively mapped and unmapped twice.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
In function process_output_surface(), the return value is 0 on the path
that av_mallocz() returns a NULL pointer. 0 indicates success, which
deviates from the fact. Return "AVERROR(ENOMEM)" instead of "0".
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Interlaced encoding profits from it, or might even need it in some
players.
No harm in enabling it unconditionally.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
hw accelerated transcode (h264_cuvid -> h264_nvenc with -hwaccel cuvid) was
broken after the filtergraph initialization was changed to intialize decoder
first followed by encoder (commit af1761f7b5b1b72197dc40934953b775c2d951cc).
During initialzing encoder with bframes, local buffers are allocated
internally in encoder which fails since no cuda context is available. Now
pushing the correct cuda context before encoder initialization fixes the issue.
Also adding push/pop cuda ctx during create/destroy/map/unmap resources and
destroy encoder session.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This patch aims to reduce the number of input/output surfaces
NVENC allocates per session. Previous default sets allocated surfaces to 32
(unless there is user specified param or lookahead involved). Having large
number of surfaces consumes extra video memory (esp for higher resolution
encoding). The patch changes the surfaces calculation for default, B-frames,
lookahead scenario respectively.
The other change involves surface selection. Previously, if a session
allocates x surfaces, only x-1 surfaces are used (due to combination
of output delay and lock toggle logic). To prevent unused surfaces,
changing surface rotation to using predefined fifo.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
qmin and qmax are not necessary for nvenc vbr.
Enforcing this constraint, doesn't allow user to use vbr 2 pass mode without explicity setting the qmin and qmax options
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Round qpIntra and qpInter calculation instead of old floor behavior.
Adopted from vaapi_encode_h264.c
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>