This commit adds a decoder for the frequency-domain part of USAC.
What works:
- Mono
- Stereo (no prediction)
- Stereo (mid/side coding)
- Stereo (complex prediction)
What's left:
- SBR
- Speech coding
Known issues:
- Desync with certain sequences
- Preroll crossover missing (shouldn't matter, bitrate adaptation only)
AAC uses an unconventional system to send scalefactors
(the volume+quantization value for each band).
Each window is split into either 1 or 8 blocks (long vs short),
and transformed separately from one another, with the coefficients
for each being also completely independent. The scalefactors
slightly increase from 64 (long) to 128 (short) to accomodate
better per-block-per-band volume for each window.
To reduce overhead, the codec signals scalefactor sizes in an obtuse way,
where each group's scalefactor types are sent via a variable length decoding,
with a range.
But our decoder was written in a way where those ranges were carried through
the entire decoder, and to actually read them you had to use the range.
Instead of having a dedicated array with a range for each scalefactor,
just let the decoder directly index each scalefactor.
This also switches the form of quantized scalefactors to the format
the spec uses, where for intensity stereo and regular, scalefactors
are stored in a scalefactor - 100 form, rather than as-is.
USAC gets rid of the complex scalefactor handling. This commit permits
for code sharing between both.
The 8x8 pixel arrays are not necessarily aligned to 64 bits, so the
current code leads to Bus error on real hardware. This reproducible
with FATE's vc1_ilaced_twomv test case.
The new "pessimist" code can trivially be shared for 16x16 pixel
arrays so we also do that. FWIW, this also nominally reduces the
hardware requirement from Zve64x to Zve32x.
T-Head C908:
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 14.7
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 3.5
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 3.7
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i32: 1.5
SpacemiT X60:
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 13.0
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 3.0
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 3.2
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i32: 1.2
The function pointer is appended to the structure for backward binary
compatibility. Fortunately, this is allocated by libavutil, not by the
user, so increasing the structure size is safe.
hf_apply_noise_0_c: 35.7
hf_apply_noise_0_rvv_f32: 9.5
hf_apply_noise_1_c: 38.5
hf_apply_noise_1_rvv_f32: 10.0
hf_apply_noise_2_c: 35.5
hf_apply_noise_2_rvv_f32: 9.7
hf_apply_noise_3_c: 38.5
hf_apply_noise_3_rvv_f32: 10.0
Maybe extending the noise table manually is not such great idea, but I
not quite sure how to deal with that otherwise? Allocating the table
dynamically is possible but would require an ELF destructor to clean up.
They are only used in vulkan_hevc and are not actually needed, as they
can be computed from delta_poc.
Reduces sizeof(HEVCSPS) by 16kB.
Also, fix a typo (s0->s1) in the code being touched.
It is actually supposed to go negative in the loop over num_negative
pics, but underflow does not break anything as the result is then
assigned to a signed int.