This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.
This is cherrypicked from libav commit
b0806088d3.
Signed-off-by: Martin Storsjö <martin@martin.st>
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.
This is cherrypicked from libav commit
e18c39005a.
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously we first calculated hev, and then negated it.
Since we were able to schedule the negation in the middle
of another calculation, we don't see any gain in all cases.
Before: Cortex A7 A8 A9 A53 A53/AArch64
vp9_loop_filter_v_4_8_neon: 147.0 129.0 115.8 89.0 88.7
vp9_loop_filter_v_8_8_neon: 242.0 198.5 174.7 140.0 136.7
vp9_loop_filter_v_16_8_neon: 500.0 419.5 382.7 293.0 275.7
vp9_loop_filter_v_16_16_neon: 971.2 825.5 731.5 579.0 453.0
After:
vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7
vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7
vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7
vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0
This is cherrypicked from libav commit
e1f9de86f4.
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
Before: Cortex A53
vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3
vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1
After:
vp9_inv_dct_dct_16x16_sub1_add_neon: 180.2
vp9_inv_dct_dct_32x32_sub1_add_neon: 475.3
This is cherrypicked from libav commit
3fcf788fbb.
Signed-off-by: Martin Storsjö <martin@martin.st>
No measured speedup on a Cortex A53, but other cores might benefit.
This is cherrypicked from libav commit
388e0d2515.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fold the field lengths into the macro.
This makes the macro invocations much more readable, when the
lines are shorter.
This also makes it easier to use only half the registers within
the macro.
This is cherrypicked from libav commit
5e0c2158fb.
Signed-off-by: Martin Storsjö <martin@martin.st>
The ld1r is a leftover from the arm version, where this trick is
beneficial on some cores.
Use a single-lane load where we don't need the semantics of ld1r.
This is cherrypicked from libav commit
ed8d293306.
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
This gives a pretty substantial speedup for the smaller subpartitions.
The code size increases from 14740 bytes to 24292 bytes.
The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.
Before:
vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7
vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1
vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7
After:
vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7
vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8
vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0
vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0
vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1
vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2
This is cherrypicked from libav commit
a63da4511d.
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows reusing the macro for a separate implementation of the
pass2 function.
This is cherrypicked from libav commit
79d332ebbd.
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows reusing the macro for a separate implementation of the
pass2 function.
This is cherrypicked from libav commit
47b3c2c18d.
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
19496 to 14740 bytes.
This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.
Before:
vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7
vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.2
vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0
vp9_inv_dct_dct_32x32_sub32_add_neon: 8095.7
After:
vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0
vp9_inv_dct_dct_16x16_sub16_add_neon: 1390.1
vp9_inv_dct_dct_32x32_sub4_add_neon: 5199.9
vp9_inv_dct_dct_32x32_sub32_add_neon: 8125.8
This is cherrypicked from libav commit
115476018d.
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
15324 to 12388 bytes.
This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.
Before: Cortex A7 A8 A9 A53
vp9_inv_dct_dct_16x16_sub4_add_neon: 2063.4 1516.0 1719.5 1245.1
vp9_inv_dct_dct_16x16_sub16_add_neon: 3279.3 2454.5 2525.2 1982.3
vp9_inv_dct_dct_32x32_sub4_add_neon: 10750.0 7955.4 8525.6 6754.2
vp9_inv_dct_dct_32x32_sub32_add_neon: 18574.0 17108.4 14216.7 12010.2
After:
vp9_inv_dct_dct_16x16_sub4_add_neon: 2060.8 1608.5 1735.7 1262.0
vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.2 2443.5 2546.1 1999.5
vp9_inv_dct_dct_32x32_sub4_add_neon: 10682.0 8043.8 8581.3 6810.1
vp9_inv_dct_dct_32x32_sub32_add_neon: 18522.4 17277.4 14286.7 12087.9
This is cherrypicked from libav commit
0331c3f5e8.
Signed-off-by: Martin Storsjö <martin@martin.st>
This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.
This is also arguably more readable.
This is cherrypicked from libav commit
58d87e0f49.
Signed-off-by: Martin Storsjö <martin@martin.st>
1. limit to single layer, as there is no current support for setting distortion/quality of multiple layers
2. encoder mode should be kept at default setting (0)
3. remove fixed_alloc parameter from context : seldom if ever used, and no way of properly configuring at the moment
4. add irreversible setting, to allow for lossless encoding. Set to OpenJPEG default (enabled)
5. set numresolution max to 33, which is the maximum number of allowed resolutions according the J2K spec
Signed-off-by: Michael Bradshaw <mjbshaw@google.com>
Make it clear that there is no timing-dependent behavior. In particular,
there is no state in which both input and output are denied, and where
you have to wait for a while yourself to make progress (apparently some
hardware decoders like to do this).
Avoid wording that makes references to time. It shouldn't be mistaken
for some kind of asynchronous API (like POSIX read() can return EAGAIN
if there is no new input yet). It's a state machine, so try to use
appropriate terms.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Merges Libav commit 8a60bba0ae.
Apparently the demuxer outputs the wrong padding for HE-AAC (based on
the raw sample rate, or so). aacdec contains a hack to adjust the muxer
padding accordingly before it's used to trim the decoder output. This
modified the packet side data, which in combination with the old
decoding API would change the packet the user passed to the decoder.
This is clearly not allowed, and it breaks running some gapless fate
tests with "-fflags +keepside" applied (without keepside, the packet
metadata is typically newly allocated, essentially making a copy and not
modifying the user's input packet).
This should probably be fixed in the demuxer (and consequently also the
muxer), but for now only fix the immediate problem.
Regression since 946ed78f5f (2012).
Fixes: timeout in 730/clusterfuzz-testcase-5265113739165696 (part 2 of 2)
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: timeout in 730/clusterfuzz-testcase-5265113739165696 (part 1 of 2)
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The constants used in the decoder used floating point precision,
and this caused different values to be generated on different
architectures.
So, eradicate floating point numbers and use fixed point (32.32)
arithmetics everywhere, replacing constants with precomputed integer
values.
Signed-off-by: Vittorio Giovara <vittorio.giovara at gmail.com>
Signed-off-by: Paul B Mahol <onemda@gmail.com>
The way videotoolbox hooks in as a hwaccel is pretty hacky. The VT decode
API is not invoked until end_frame(), so alloc_frame() returns a dummy
frame with a 1-byte buffer. When end_frame() is eventually called, the
dummy buffer is replaced with the actual decoded data from
VTDecompressionSessionDecodeFrame().
When the VT decoder fails, the frame returned to the h264 decoder from
alloc_frame() remains invalid and should not be used. Before
9747219958, it was accidentally being
returned all the way up to the API user. After that commit, the dummy
frame was unref'd so the user received an error.
However, since that commit, VT hwaccel failures started causing random
segfaults in the h264 decoder. This happened more often on iOS where the
VT implementation is more likely to throw errors on bitstream anomolies.
A recent report of this issue can be see in
http://ffmpeg.org/pipermail/libav-user/2016-November/009831.html
The issue here is that the dummy frame is still referenced internally by the
h264 decoder, as part of the reflist and cur_pic_ptr. Deallocating the
frame causes assertions like this one to trip later on during decoding:
Assertion h->cur_pic_ptr->f->buf[0] failed at src/libavcodec/h264_slice.c:1340
With this commit, we leave the dummy 1-byte frame intact, but avoid returning it
to the user.
This reverts commit 9747219958.
Signed-off-by: wm4 <nfxjfg@googlemail.com>
Without this the FPU state becomes trashed and causes mysterious
fate failures with cpuflags=0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There is no reason that draining couldn't return an error or two. But
some decoders don't handle this very well, and might always return an
error. This can lead to API users getting into an infinite loop and
burning CPU, because no progress is made and EOF is never returned.
In fact, ffmpeg.c contains a hack against such a case. It is made
unnecessary with this commit, and removed with the next one. (This
particular error case seems to have been fixed since the hack was
added, though.)
This might lose frames if decoding returns errors during draining.
This checks the sprite delta intermediates for overflow
Fixes: 716/clusterfuzz-testcase-4890287480504320
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Also clear the state on errors
Fixes integer overflows in 701/clusterfuzz-testcase-6594719951880192
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes timeout with 700/clusterfuzz-testcase-5660909504561152
Fixes timeout with 702/clusterfuzz-testcase-4553541576294400
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If AVVideotoolboxContext.cv_pix_fmt_type is set to 0, don't set the
kCVPixelBufferPixelFormatTypeKey value on the VT decoder.
This makes VT output its native format, which can be much faster on
some hardware iterations (if the native format does not match with
the requested format, it will be converted, which is slow).
The default is still forcing nv12.
Allow all struct fields to be accessed directly, as long as they're
public.
Before this change, many fields were "public", but could be accessed via
AVOption only. This meant they were effectively not public, but were
present for documentation purposes, which was incredibly confusing at
best.
Similar code is used elsewhere in vp56 to force a more complete reinit in the future.
Fixes null pointer dereference
Fixes: 707/clusterfuzz-testcase-4717453097566208
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes undefined shift, all callers should be changed to check the value
they use with wp_exp2() or its return value.
Fixes: 692/clusterfuzz-testcase-5757381516460032
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This matches ff_h263_decode_motion() both functions error codes are interpreted by the same common code
Fixes: 690/clusterfuzz-testcase-4744944981901312
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes timeout with 686/clusterfuzz-testcase-5853946876788736
this shortcuts (i.e. speeds up) the error and
return-to-user when decoding a truncated frame
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Previous version reviewed by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
qmin and qmax are not necessary for nvenc vbr.
Enforcing this constraint, doesn't allow user to use vbr 2 pass mode without explicity setting the qmin and qmax options
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Fixes: 677/clusterfuzz-testcase-6635120628858880
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: Steven Liu <lingjiujianke@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 673/clusterfuzz-testcase-5948736536576000
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 672/clusterfuzz-testcase-5595018867769344
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
671/clusterfuzz-testcase-4990381827555328
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes invalid shift
Fixes: 670/clusterfuzz-testcase-4852021066727424
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: runtime error: shift exponent 34 is too large for 32-bit type 'int'
Fixes: 653/clusterfuzz-testcase-5773837415219200
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This should fix the fate failure due to a truncated last frame.
Alternatively the frame could be dropped.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 664/clusterfuzz-testcase-4917047475568640
The change to fate is due to a truncated last frames which is now detected as damaged.
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This also fixes several integer overflows by checking each value before
use.
Fixes: 662/clusterfuzz-testcase-4898131432964096
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Since the PVQ search has been well fuzzed and is guaranteed to never
break SUM(abs(y[])) == K, the assert is no longer needed.
Also the assert only prevented coding the wrong vector index but didn't
prevent crashes during searching for it, which made the assert rather
informational than practical.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Since the probelm mentioned only happened when the phase was negative
(e.g. the sum had to be decreased), only discarding dimensions with a
zero pulse in that case restored the search's previously low distortion
at low Ks when the phase is never negative.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Fixes: 659/clusterfuzz-testcase-5866673603084288
Huge DMV could be created by an encoder ignoring the spec
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is not necessarily specific to fuzzed files
Fixes: Multiple integer overflows
Fixes: 656/clusterfuzz-testcase-6463814516080640
Fixes: 658/clusterfuzz-testcase-6691260146384896
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes#6081. Some dictionary keys are not present on OS X 10.8.
This loads the symbols and uses a default value if not present.
Signed-off-by: Rick Kern <kernrj@gmail.com>
Fixes: 647/clusterfuzz-testcase-5195745823031296
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Reviewed-by: BBB
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This ensures that the wrapped avframe will not get reallocated later, which
would invalidate internal references such as extended data.
Reviewed-by: wm4 <nfxjfg@googlemail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
There is code checking height and width later, leaving an invalid value invalid
is thus fine.
Fixes: 635/clusterfuzz-testcase-6225161437052928
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 5 + 2147483646 cannot be represented in type 'int'
Fixes: 634/clusterfuzz-testcase-5285420445204480
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: runtime error: left shift of 32 by 26 places cannot be represented in type 'int'
Fixes: 628/clusterfuzz-testcase-6187747641393152
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The quantization table is stored in the natural order, but when we
access it, we use an index that's in zigzag order, causing us to read
the wrong value. This causes artifacts, especially in areas with
horizontal or vertical edges. The artifacts look a lot like the
DCT ringing artifacts you'd expect to see from a low-bitrate file,
but when comparing to NewTek's own decoder, it's obvious they're not
supposed to be there.
Fix by simply storing the scaled quantization table in zigzag order.
Performance is unchanged.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If the PVQ search picked a place to increment/decrement on the y[]
vector which had no pulse then it would cause a desync since it would
change the sum in the wrong direction. Fix this by not considering
places without pulses as viable.
This makes the PVQ search slightly worse at K < 5 which isn't all that
common. Still, this is a workaround to prevent making broken files until
I can think of a better way of fixing it.
Also add an assertion, which can be removed or moved to assert1/2 once
the PVQ search is stable.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Fixes: runtime error: shift exponent 132 is too large for 32-bit type 'int'
Fixes: 609/clusterfuzz-testcase-4825202619842560
See 11.2.2 IHDR Image header
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If there is progressive input it will disable deinterlacing in cuvid for
all future frames even those interlaced.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Channel mapping 2 additionally supports a non-diegetic stereo track
appended to the end of a full-order ambisonics signal, such that the
total channel count is either
(n + 1) ^ 2, or
(n + 1) ^ 2 + 2
where n is the ambisonics order
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This marks the first time anyone has written an Opus encoder without
using any libopus code. The aim of the encoder is to prove how far
the format can go by writing the craziest encoder for it.
Right now the encoder's basic, it only supports CBR encoding, however
internally every single feature the CELT layer has is implemented
(except the pitch pre-filter which needs to work well with the rest of
whatever gets implemented). Psychoacoustic and rate control systems are
under development.
The encoder takes in frames of 120 samples and depending on the value of
opus_delay the plan is to use the extra buffered frames as lookahead.
Right now the encoder will pick the nearest largest legal frame size and
won't use the lookahead, but that'll change once there's a
psychoacoustic system.
Even though its a pretty basic encoder its already outperforming
any other native encoder FFmpeg has by a huge amount.
The PVQ search algorithm is faster and more accurate than libopus's
algorithm so the encoder's performance is close to that of libopus
at zero complexity (libopus has more SIMD).
The algorithm might be ported to libopus or other codecs using PVQ in
the future.
The encoder still has a few minor bugs, like desyncs at ultra low
bitrates (below 9kbps with 20ms frames).
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This is meant to be applied on top of my previous patch which
split PVQ into celt_pvq.c and made opus_celt.h
Essentially nothing has been changed other than renaming CeltFrame
to CeltBlock (CeltFrame had absolutely nothing at all to do with
a frame) and CeltContext to CeltFrame.
3 variables have been put in CeltFrame as they make more sense
there rather than being passed around as arguments.
The coefficients have been moved to the CeltBlock structure
(why the hell were they in CeltContext and not in CeltFrame??).
Now the encoder would be able to use the exact context the decoder
uses (plus a couple of extra fields in there).
FATE passes, no slowdowns, etc.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
A huge amount can be reused by the encoder, as the only thing
which needs to be done would be to add a 10 line celt_icwrsi,
a wrapper around it (celt_alg_quant) and templating the
ff_celt_decode_band to replace entropy decoding functions
with entropy encoding.
There is no performance loss but in fact a performance gain of
around 6% which is caused by the compiler being able to optimize
the decoding more efficiently.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Handles strides (needed for Opus transients), does pre-reindexing and folding
without needing a copy.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Mostly used the RFC document, the decoding functions and
the reference encoder's implmenentation as a reference.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
A strict reading of the spec seems to imply that it should be aligned to
the start of the element instance tag, but that would break all of the
samples with PCEs.
It seems like a well formed LATM stream should have its PCE in the ASC
rather than inband.
Fixes ticket 4544
This limits the bugs, speedloss and extra memory allocation to the case when
optimal tables are needed.
Fixes regressions with slice multi-threading
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If this is wanted iam not against it but it must be designed to work with all cases
like slice threads, and a single growing buffer does not work very well with slices.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Only do this when building for a recent VAAPI version - initial
driver implementations were confused about the interpretation of the
framerate field, but hopefully this will be consistent everywhere
once 0.40.0 is released.
(cherry picked from commit ff35aa8ca4)
Default to using VBR when a target bitrate is set, unless the max rate
is also set and matches the target. Changes to the Intel driver mean
that min_qp is also respected in this case, so set a codec default to
unset the value rather than using the current default inherited from
the MPEG-4 part 2 encoder.
(cherry picked from commit eddfb57210)
This includes a backward-compatibility hack to choose CBR anyway on
old drivers which have no CBR support, so that existing programs will
continue to work their options now map to VBR.
(cherry picked from commit f033ba470f)
Before this change, it was possible to overflow pic_order_cnt_lsb and
generate a stream with invalid POC numbering. This makes sure that
the field is large enough that a single IDR B* P sequence uses fewer
than half the available POC lsb values.
(cherry picked from commit 89725a8512)
This change makes the configured GOP size be respected exactly -
previously the value could be exceeded slightly due to flaws in the
frame type selection logic.
(cherry picked from commit 37fab0661a)