All users (namely HEVC) that use ff_progress_frame_alloc()
should just use ff_thread_get_buffer(). Using
ff_progress_frame_get_buffer() is not a must; it is merely
a convenience wrapper.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit adds support for compiler hints.
While on AMD these are not used/needed, Nvidia benefits from them, and gives
a sizeable 10% speedup on 4k.
This commit also makes it possible for the encoder to choose a different
quantization table on a per-slice basis, as well as adding this capability
to the decoder.
Also, this commit fully fixes decoding of context=1 encoded files.
This reduces the intermediate VRAM used for RGB decoding by a
factor of 100x for 6k video.
This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video.
This is equivalent to what the software decoder does, but with less pointers.
We know the final size before encoding, so we can switch to
ff_get_encode_buffer() which avoids an implicit memcpy().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Most of the encoders is the same. So deduplicate them.
This reduces code size from 22410B to 12637B here.
The data in mpegaudiotab.h is also automatically deduplicated.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The sample rates here have already been checked generically
via CODEC_SAMPLERATES().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The sample rates have already been checked generically
via AVCodec.supported_samplerates.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows to avoid the codes table; furthermore, given
that the runs fit into seven bits and level into nine,
one can put them into one int16_t and use as symbols table
in ff_vlc_init_from_lengths().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Initialize a list of 128 pointers at decoder init
instead of using a const list of pointers (which
will be initialized at runtime when libavcodec
is loaded when using pic code with Elf); the former
takes only 128 bytes (+ a bit of initialization code),
the latter 1KiB on 64 bit systems (+3KiB on x64 elf
for relocation information).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, hq_decode_block() zeroed every block (of 128 bytes)
before decoding a block; yet this is suboptimal for all modes,
because all modes need to reset all the blocks they use anyway
and so it should be done in one go for all blocks.
For the alpha mode (where blocks need not be coded) all blocks
are zeroed initially anyway, because decode_block() might not
be doing it, so zeroing there again for the coded blocks is
a waste.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This table is so small (32 elements amounting to 128 bytes)
that it is more efficient size-wise to hardcode it instead
of initializing it at runtime.
Also stop duplicating it in hq_hqa.o and hqx.o.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>