start_code_size depends on whether PS comes from out-of-band or
in-band. Make the code more readable.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This avoids SEI and IDR recovery flags affecting each other
Also eliminate litteral numbers from recovery handling
This should make the code clearer
Improves: tickets/4738/tickets_cut.ts
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is only supported at compilation time. If Zfhmin is supported, then
conversions are fast, which is what the flag is used for. At this time,
run-tiem detection is not possible, as in not supported by Linux. But even
if it were, the current FFmpeg approach seems unable to deal with it (same
problem as on x86, really).
In this case, the inner loop computing the scalar product can be reduced
to just one multiplication and one sum even with 128-bit vectors. The
result is a lot simpler, but also brings more modest performance gains:
flac_lpc_16_13_c: 15241.0
flac_lpc_16_13_rvv_i32: 11230.0
flac_lpc_16_16_c: 17884.0
flac_lpc_16_16_rvv_i32: 12125.7
flac_lpc_16_29_c: 27847.7
flac_lpc_16_29_rvv_i32: 10494.0
flac_lpc_16_32_c: 30051.5
flac_lpc_16_32_rvv_i32: 10355.0
The entire set of 32 coefficients and corresponding past 32 samples can
fit in a single vector (with LMUL=8) exactly, but... since widening
double the needed vector sizes, we still end up too short with 128-bit
vectors. This adds a very simple version for future 256+-bit hardware,
and for pred_orders values up to 16, and a bit more involved loop for
for 128-bit hardware with pred_orders between 17 and 32.
With 128-bit hardware, the benchmarks look like this:
flac_lpc_32_13_c: 30152.0
flac_lpc_32_13_rvv_i32: 10244.7
flac_lpc_32_16_c: 37314.2
flac_lpc_32_16_rvv_i32: 10126.2
flac_lpc_32_29_c: 61910.0
flac_lpc_32_29_rvv_i32: 14495.2
flac_lpc_32_32_c: 68204.0
flac_lpc_32_32_rvv_i32: 13273.7
decorrelate_ls, _rs and _ms are decorrelate[1], [2] and [3] respectively.
The code ended up testing indep ([0]) as twice, ms never, and misnaming
the other two.
Segmented loads are slow, so here we use unit-strided load and narrowing shifts.
c910:
fcmul_add_c: 2179
fcmul_add_rvv_f64: 1652
c908:
fcmul_add_c: 4891.2
fcmul_add_rvv_f64: 2399.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
This commit adds support for VP8 bitstream read methods to the cbs
codec. This enables the trace_headers bitstream filter to support VP8,
in addition to AV1, H.264, H.265, and VP9. This can be useful for
debugging VP8 stream issues.
The CBS VP8 implements a simple VP8 boolean decoder using GetBitContext
to read the bitstream.
Only the read methods `read_unit` and `split_fragment` are implemented.
The write methods `write_unit` and `assemble_fragment` return the error
code AVERROR_PATCHWELCOME. This is because CBS VP8 write is unlikely to
be used by any applications at the moment. The write methods can be
added later if there is a real need for them.
TESTS: ffmpeg -i fate-suite/vp8/frame_size_change.webm -vcodec copy
-bsf:v trace_headers -f null -
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This commit exports the `vp8_token_update_probs` variable to internal
library scope to facilitate its reuse within the library.
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Better performance can probably be achieved with a more intricate
unrolled loop, but this is a start:
add_hfyu_left_pred_bgr32_c: 15084.0
add_hfyu_left_pred_bgr32_rvv_i32: 10280.2
This would actually be cleaner with the RISC-V P extension, but that is
not ratified yet (I think?) and usually not supported if V is supported.