There is still much left to optimize, but this already provides a
significant performance improvement over the default implementation:
10% for 300Mbps (1080p30) and 25% for 1.5Gbps (4k 60fps).
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
Coefficients are now written to a buffer and then dequantized by the
new SIMD dequantization functions. The lower bands do not have enough
coefficients to fill a register, so a SIMD store would overwrite
neighbouring data; for those bands the C version of the dequantization
function is used.
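
A minimal sketch of the fallback idea, not the actual decoder code. The
names (dequant_c, dequant_wide, dequant_band), the lane count and the
scalar formula are assumptions made for illustration; dequant_wide is a
plain C stand-in for a SIMD kernel that always touches LANES values per
step.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* assumption: int32 lanes in one 128-bit register */

/* Scalar reference: scale the magnitude, then restore the sign.
 * Zero coefficients stay zero. */
static void dequant_c(int32_t *dst, const int32_t *src, int n,
                      int32_t qf, int32_t qs)
{
    for (int i = 0; i < n; i++) {
        int32_t c = src[i];
        if (!c) {
            dst[i] = 0;
            continue;
        }
        int64_t mag = c < 0 ? -(int64_t)c : c;
        int64_t out = (mag * qf + qs) >> 2;
        dst[i] = (int32_t)(c < 0 ? -out : out);
    }
}

/* Stand-in for a SIMD kernel: only valid for whole registers. */
static void dequant_wide(int32_t *dst, const int32_t *src, int n,
                         int32_t qf, int32_t qs)
{
    assert(n % LANES == 0);
    for (int i = 0; i < n; i += LANES)
        dequant_c(dst + i, src + i, LANES, qf, qs);
}

/* Dispatcher: whole registers go to the wide path, anything that
 * cannot fill a register goes to the scalar path. */
static void dequant_band(int32_t *dst, const int32_t *src, int n,
                         int32_t qf, int32_t qs)
{
    int wide = n - n % LANES;
    if (wide)
        dequant_wide(dst, src, wide, qf, qs);
    dequant_c(dst + wide, src + wide, n - wide, qf, qs);
}
```

With this layout a band of 2 coefficients never reaches the wide path,
so nothing outside the band is written.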
The buffer is per-thread and will be realloc'd if anything changes.
This prevents regressions and having to limit slice size.
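
For illustration only, a per-thread scratch buffer that is reallocated
on demand could look roughly like the sketch below. ThreadBuf,
threadbuf_ensure and threadbuf_free are hypothetical names, and the
sketch only grows the buffer when the requested size exceeds its
capacity, which is an assumption about what "changes" means here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ThreadBuf {  /* hypothetical per-thread scratch state */
    int32_t *data;
    size_t   size;          /* capacity in coefficients */
} ThreadBuf;

/* Make sure the buffer can hold `coeffs` coefficients.
 * Returns 0 on success, -1 on failure (the old buffer is kept). */
static int threadbuf_ensure(ThreadBuf *b, size_t coeffs)
{
    assert(b);
    if (coeffs <= b->size)
        return 0;                       /* large enough already */
    if (coeffs > SIZE_MAX / sizeof(*b->data))
        return -1;                      /* byte count would overflow */
    int32_t *p = realloc(b->data, coeffs * sizeof(*p));
    if (!p)
        return -1;
    b->data = p;
    b->size = coeffs;
    return 0;
}

static void threadbuf_free(ThreadBuf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

Because each thread owns its own ThreadBuf, no locking is needed and
the slice size does not have to be capped to a fixed buffer size.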
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
Avoids hard-coded magic values in the decoder and a separate macro in
the encoder.
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
In preparation for the following commits, this commit simplifies the
coefficient parsing and dequantization function. It was needlessly
inlined without much performance gain.
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
The version structure in the main decoder context was not (and
apparently has never been) populated since it was added.
Still, having VC-2 break the existing Dirac Low Delay mode was odd and
easily avoidable had the specification's authors noticed or cared.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
The DSP lacked a function to convert signed samples to unsigned for
12-bit content. This was overlooked when support and templating for bit
depths greater than 8 were originally added. The 10-bit function was
used for 12-bit pictures, which resulted in an incorrect conversion.
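
A rough sketch of why the bit depth matters, assuming the conversion
adds half of the output range and clips to it. signed_to_unsigned is a
hypothetical helper, not the DSP function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Convert signed samples to unsigned ones for a given bit depth by
 * adding 2^(bit_depth - 1) and clipping to [0, 2^bit_depth - 1]. */
static void signed_to_unsigned(uint16_t *dst, const int32_t *src,
                               int n, int bit_depth)
{
    assert(bit_depth > 8 && bit_depth <= 16);
    const int64_t offset = INT64_C(1) << (bit_depth - 1);
    const int64_t max    = (INT64_C(1) << bit_depth) - 1;

    for (int i = 0; i < n; i++) {
        int64_t v = src[i] + offset;
        if (v < 0)
            v = 0;
        else if (v > max)
            v = max;
        dst[i] = (uint16_t)v;
    }
}
```

Under this assumption a signed 0 maps to 2048 at 12 bits but only to
512 at 10 bits, and anything above 511 is clipped to 1023, so running
the 10-bit path on 12-bit data gives wrong levels.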
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
It seems the previous tables were calculated with 32-bit integers,
ignoring overflows.
Also check for the max qindex; the value is chosen so that the
qfactor/offset fit in int32.
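
To illustrate the overflow, the sketch below only covers indices that
are a multiple of 4, where the factor is 4 * 2^(qindex/4). The rounding
formulas for the other indices are deliberately not reproduced, and the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Factor for qindex % 4 == 0, computed in 64 bits so it cannot wrap. */
static uint64_t qfactor_mul4(unsigned qindex)
{
    assert(qindex % 4 == 0 && qindex / 4 < 60);
    return UINT64_C(4) << (qindex / 4);
}

/* The same value truncated to 32 bits, as a table built with 32-bit
 * arithmetic that ignores overflow would store it. */
static uint32_t qfactor_mul4_wrapped(unsigned qindex)
{
    return (uint32_t)qfactor_mul4(qindex);
}

/* Range check of the kind a max-qindex limit is meant to guarantee. */
static int qfactor_fits_int32(unsigned qindex)
{
    return qfactor_mul4(qindex) <= INT32_MAX;
}
```

In this simplified case the factor still fits at index 112 but not at
116, and at 120 the truncated value wraps to 0.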
Fixes: 070b7914fd5dfe8f93248bea71363410/asan_static-oob_c8d034_2764_258e20f4a3c79158aecddb61a833d756.drc
Fixes out of array reads
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This avoids closing and opening the bit reader
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The transformation to bytes must happen after alignment to get the same
resulting pointers as before.
This fixes segmentation faults in the assembler code.
The regression was introduced in commit 9553689.
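
The order dependence can be shown with a small sketch. ALIGN_UP and the
two stride helpers are hypothetical and only mimic the kind of
computation involved; the alignment value is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a (a > 0). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Align in samples first, then convert to bytes. */
static size_t stride_align_then_bytes(size_t width,
                                      size_t bytes_per_sample,
                                      size_t align)
{
    assert(align > 0);
    return ALIGN_UP(width, align) * bytes_per_sample;
}

/* Convert to bytes first, then align: a different result. */
static size_t stride_bytes_then_align(size_t width,
                                      size_t bytes_per_sample,
                                      size_t align)
{
    assert(align > 0);
    return ALIGN_UP(width * bytes_per_sample, align);
}
```

For a width of 10 samples, 2 bytes per sample and an alignment of 8,
the first order gives 32 and the second gives 24, so pointers derived
from the two strides no longer match.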
Reviewed-by: Kieran Kunhya <kierank@obe.tv>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
* commit 'e02de9df4b218bd6e1e927b67fd4075741545688':
lavc: export Dirac parsing API used by the ogg demuxer as public
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
Fixes null pointer dereference
Fixes: signal_sigsegv_b02a96_280_RL_420p_ffdirac.drc with memlimit of 67108864
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes CID1271788
With this change the value is checked more explicitly; it was already
fully checked before, though.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
In init_planes p->xblen and p->yblen are set to:
    p->xblen = s->plane[0].xblen >> s->chroma_x_shift;
    p->yblen = s->plane[0].yblen >> s->chroma_y_shift;
These are later used as block_w and block_h arguments of
s->vdsp.emulated_edge_mc. If one of them is 0 it triggers an av_assert2
in emulated_edge_mc:
    av_assert2(start_x < end_x && block_w > 0);
    av_assert2(start_y < end_y && block_h > 0);
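
One possible guard, sketched under the assumption that a zero chroma
block size should be rejected as invalid input. PlaneBlocks and
chroma_block_size are hypothetical names, not the decoder's own.

```c
#include <assert.h>

typedef struct PlaneBlocks {    /* hypothetical subset of a plane */
    int xblen, yblen;
} PlaneBlocks;

/* Derive the chroma block size from the luma one. Returns -1 when a
 * shifted dimension becomes 0, which would later be passed on as a
 * zero block_w or block_h. */
static int chroma_block_size(PlaneBlocks *chroma,
                             const PlaneBlocks *luma,
                             int x_shift, int y_shift)
{
    assert(x_shift >= 0 && y_shift >= 0);
    chroma->xblen = luma->xblen >> x_shift;
    chroma->yblen = luma->yblen >> y_shift;
    if (chroma->xblen <= 0 || chroma->yblen <= 0)
        return -1;
    return 0;
}
```

A luma xblen of 1 with a chroma_x_shift of 1 is the simplest input that
yields a zero block width.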
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Previously, various variables had types too small to hold the 32-bit
unsigned range allowed by the spec.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>