This fixes corrupted audio for applications relying on ch_layout when
codec downmixing is active.
Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Applications making use of this codec with the `downmix` option are
segfaulting unless the `ch_layout` is overridden after `avcodec_open2`
as can be seen in projects like MythTV[1]
This patch fixes this by overriding the ch_layout as done in other
decoders such as AC3.
1: af6f362a14/mythtv/libs/libmythtv/decoders/avformatdecoder.cpp (L4607)
Signed-off-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments wit unit stride, then narrow to 16 segments with
right shifts, we can only get one half-size vector per segment, or just 2
elements per vector (EMUL=1/2) - at least with 128-bit vectors.
This ends up unsurprisingly about as fas as the C code.
2) The current approach is to load with strides. We keep that approach,
but improve it using three 4-segmented loads instead of 12 single-segment
loads. This divides the number of distinct loaded addresses by 4.
3) A potential third approach would be to avoid segmentation altogether
and splat the scalar coefficient into vectors. Then we can use a
unit-stride and maximum EMUL. But the downside then is that we have to
multiply the 3 (of 16) unused segments with zero as part of the
multiply-accumulate operations.
In addition, we also reuse vectors mid-loop so as to increase the EMUL
from 1 to 2, which also improves performance a little bit.
Oeverall the gains are quite small with the device under test, as it does
not deal with segmented loads very well. But at least the code is tidier,
and should enjoy bigger speed-ups on better hardware implementation.
Before:
ps_hybrid_analysis_c: 1819.2
ps_hybrid_analysis_rvv_f32: 1037.0 (before)
ps_hybrid_analysis_rvv_f32: 990.0 (after)
This stores the constant coefficients deinterleaved, so that they can be
loaded directly with NF=0. Unfortunately, we cannot optimise loading the
input, due to insufficient memory alignment (not 32-bit).
Before:
g722_apply_qmf_c: 82.5
g722_apply_qmf_rvv_i32: 78.2
After:
g722_apply_qmf_c: 82.5
g722_apply_qmf_rvv_i32: 65.2
Validate that a hw_frames_ctx is available before using it for
the AVHWAccel.free_frame_priv callback, and don't require it to
be present when the callback is not in use by the HWAccel.
v2: check for free_frame_priv (Hendrik)
v3: return EINVAL (Christoph Reiter)
v4: better commit message (Hendrik)
v5: fix typo with missed frames_ctx (Lynne)
See[1]: https://github.com/msys2/MINGW-packages/pull/19050
Fixes: be07145109 ("avcodec: add AVHWAccel.free_frame_priv callback")
CC: Lynne <dev@lynne.ee>
CC: Christoph Reiter <reiter.christoph@gmail.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
For fate-h264_mp4toannexb_ticket5927 and
fate-h264_mp4toannexb_ticket5927_2, they work by accident
previously. The sample file has two 'avc1' entries, and video
samples use the second one. It means packets should be decoded with
new extradata in side data. Before this patch, only extradata was
kept in the output, new extradata has been dropped. The output can
be decoded because the two extradata are almost the same, except
level indication. This patch fixed the issue, and add another
fate test.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
If there is a single group of SPS/PPS before an IDR frame, but no
SPS/PPS after that, we will miss the chance to reset
idr_sps_seen/idr_pps_seen. No SPS/PPS are inserted afterwards.
This patch saves in-band SPS/PPS and insert them before IDR frames
when necessary.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
start_code_size depends on whether PS comes from out-of-band or
in-band. Make the code more readable.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
This avoids SEI and IDR recovery flags affecting each other
Also eliminate litteral numbers from recovery handling
This should make the code clearer
Improves: tickets/4738/tickets_cut.ts
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In this case, the inner loop computing the scalar product can be reduced
to just one multiplication and one sum even with 128-bit vectors. The
result is a lot simpler, but also brings more modest performance gains:
flac_lpc_16_13_c: 15241.0
flac_lpc_16_13_rvv_i32: 11230.0
flac_lpc_16_16_c: 17884.0
flac_lpc_16_16_rvv_i32: 12125.7
flac_lpc_16_29_c: 27847.7
flac_lpc_16_29_rvv_i32: 10494.0
flac_lpc_16_32_c: 30051.5
flac_lpc_16_32_rvv_i32: 10355.0
The entire set of 32 coefficients and corresponding past 32 samples can
fit in a single vector (with LMUL=8) exactly, but... since widening
double the needed vector sizes, we still end up too short with 128-bit
vectors. This adds a very simple version for future 256+-bit hardware,
and for pred_orders values up to 16, and a bit more involved loop for
for 128-bit hardware with pred_orders between 17 and 32.
With 128-bit hardware, the benchmarks look like this:
flac_lpc_32_13_c: 30152.0
flac_lpc_32_13_rvv_i32: 10244.7
flac_lpc_32_16_c: 37314.2
flac_lpc_32_16_rvv_i32: 10126.2
flac_lpc_32_29_c: 61910.0
flac_lpc_32_29_rvv_i32: 14495.2
flac_lpc_32_32_c: 68204.0
flac_lpc_32_32_rvv_i32: 13273.7
This commit adds support for VP8 bitstream read methods to the cbs
codec. This enables the trace_headers bitstream filter to support VP8,
in addition to AV1, H.264, H.265, and VP9. This can be useful for
debugging VP8 stream issues.
The CBS VP8 implements a simple VP8 boolean decoder using GetBitContext
to read the bitstream.
Only the read methods `read_unit` and `split_fragment` are implemented.
The write methods `write_unit` and `assemble_fragment` return the error
code AVERROR_PATCHWELCOME. This is because CBS VP8 write is unlikely to
be used by any applications at the moment. The write methods can be
added later if there is a real need for them.
TESTS: ffmpeg -i fate-suite/vp8/frame_size_change.webm -vcodec copy
-bsf:v trace_headers -f null -
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This commit exports the `vp8_token_update_probs` variable to internal
library scope to facilitate its reuse within the library.
Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Better performance can probably be achieved with a more intricate
unrolled loop, but this is a start:
add_hfyu_left_pred_bgr32_c: 15084.0
add_hfyu_left_pred_bgr32_rvv_i32: 10280.2
This would actually be cleaner with the RISC-V P extension, but that is
not ratified yet (I think?) and usually not supported if V is supported.
With explicit unrolling, we can skip half of the sign bit flips, and
the compiler is then better able to optimise the scalar loop:
predictor_c: 31376.0 (before)
predictor_c: 23703.0 (after)
This is restricted to 128-bit vectors as larger vector sizes could read
past the end of the noise array. Support for future hardware with larger
vector sizes is left for some other time.
hf_apply_noise_0_c: 2319.7
hf_apply_noise_0_rvv_f32: 1229.0
hf_apply_noise_1_c: 2539.0
hf_apply_noise_1_rvv_f32: 1244.7
hf_apply_noise_2_c: 2319.7
hf_apply_noise_2_rvv_f32: 1232.7
hf_apply_noise_3_c: 2541.2
hf_apply_noise_3_rvv_f32: 1244.2
This commit fixes issue with missing SPS/PPS headers in video
encoded by AMF AV1 encoder.
Missing headers leads to broken seek in MPV video player.
Default value for property AV1_HEADER_INSERTION_MODE shouldn't be setup
to NONE (no headers insertion). We need to skip definition of this property,
because default value depends on USAGE property.
Signed-off-by: Dmitrii Ovchinnikov <ovchinnikov.dmitrii@gmail.com>