Since the new PNS implementation has been merged and is no longer considered a
proof of concept (it is much more complex and better than the previous one), change
the comments to reflect that. We need people testing it (since all AAC profiles
require it to be on by default), and having it tagged as a proof of concept might drive some away.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds intensity stereo coding support
to the native AAC encoder. This is a way to increase the efficiency
of the encoder by zeroing the right channel's spectral coefficients
(in a channel pair) and rederiving them in the decoder using information
from the scalefactor indices of special band types. This commit
conforms to the official ISO 13818-7 specifications, although due to
their ambiguity certain deviations have been taken to ensure maximum
sound quality. This commit has been extensively tested and has been shown
not to produce audible artifacts except in extreme cases.
This commit also adds an option, aac_is, which has the value of
0 by default. Intensity Stereo is part of the scalable aac profile
and is thus non-default.
The way IS coding works is that it rederives the right channel's
spectral coefficients from the left channel via the scalefactor
index values left in the right channel. Since an entire band's
spectral coefficients do not need to be coded, the encoder's
efficiency jumps up and it unzeroes some high frequency values
which it previously did not have enough bits to encode. That way
less information is lost than is lost by rederiving
the spectral coefficients with some error. This is why the
size of files encoded with IS does not decrease significantly.
Users who want IS coding to reduce file size are expected
to reduce their encoding bitrates appropriately.
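A minimal sketch of the decoder-side rederivation described above, assuming the ISO 13818-7 rule that the right channel is the left channel scaled by 0.5^(is_position / 4), with the sign chosen by the band type. The names is_scale() and is_rederive_right() are hypothetical and are not taken from the encoder or decoder sources; the scale is built from a small table only to keep the example free of libm.

```c
#include <stddef.h>

/* Hypothetical helper: returns 2^(-pos/4), computed without libm. */
static float is_scale(int pos)
{
    /* 2^(-r/4) for r = 0..3 */
    static const float frac[4] = { 1.0f, 0.84089642f, 0.70710678f, 0.59460356f };
    int q = pos >= 0 ? pos / 4 : -((-pos + 3) / 4); /* floor(pos / 4) */
    int r = pos - 4 * q;                            /* remainder in 0..3 */
    float s = frac[r];

    for (; q > 0; q--)
        s *= 0.5f;
    for (; q < 0; q++)
        s *= 2.0f;
    return s;
}

/* Hypothetical helper: rebuild one band of the right channel from the left.
 * sign is +1 for in-phase bands and -1 for out-of-phase bands. */
static void is_rederive_right(const float *left, float *right, size_t len,
                              int is_position, int sign)
{
    const float scale = (float)sign * is_scale(is_position);

    for (size_t i = 0; i < len; i++)
        right[i] = scale * left[i];
}
```

Only is_position and the sign need to be transmitted for such a band, which is where the saved bits come from.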
This is V2 of the commit. The old version did not mark ms_mask as
0, which is required since M/S and IS coding are incompatible; this resulted in
distortions with M/S coding enabled. This version also improves
phase detection by measuring it for every spectral coefficient in
the band and using a simple majority rule to determine whether the
coefficients are in or out of phase. Also, the energy values per
spectral coefficient were changed to reflect the
official specifications.
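The majority rule for phase detection can be sketched as follows. This is an illustration only: is_band_phase() is a hypothetical name, and treating a zero product as "in phase" is an assumption, not something the commit states.

```c
#include <stddef.h>

/* Hypothetical helper: returns +1 if most coefficient pairs in the band
 * share a sign (in phase), -1 if most have opposite signs (out of phase). */
static int is_band_phase(const float *left, const float *right, size_t len)
{
    int vote = 0;

    for (size_t i = 0; i < len; i++)
        vote += (left[i] * right[i] >= 0.0f) ? 1 : -1;

    /* Ties are resolved as in phase in this sketch. */
    return vote >= 0 ? 1 : -1;
}
```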
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds support for the coding of intensity stereo spectral
coefficients. It also fixes the Mid/Side coding of band_types higher
than RESERVED_BT (M/S must not be applied to their spectral coefficients,
but marking M/S as present in encode_ms_info() is okay). Many
of the changes here were taken from the decoder and inverted.
This commit does not change the functionality of the decoder as the
previous patch in this series zeroes ms_mask and is_mask.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit finalizes the PNS implementation previously added to the encoder
by moving it to a separate function, search_for_pns(), thus making it
coder-generic. This new implementation makes use of the spread field of
the psy bands and the lambda quality feedback parameter. The spread of the
spectrum in a band prevents PNS from being used excessively and thus preserves
more phase information in high frequencies. The number of PNS-marked bands
varies with the lambda parameter and the number of bits available, which
leads to better choices of which bands are marked
as noise. Comparisons with the previous PNS implementation can be found
here: https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/
This is V2 of the patch, the changes from the previous version being that this
version uses the new band->spread metric from aacpsy and normalizes the
energy using the group size. These changes were suggested by Claudio Freire
on the mailing list. Another change is the use of lambda to alter the
frequency threshold. This change makes the actual threshold frequencies
vary within ±2 kHz of what's specified, depending on frame encoding performance.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit enables the function added with commit 7c10b87 and uses that
new function for setting any special scalefactor indices. This commit does
not change the behaviour of the encoder since no bands are being marked as
either NOISE_BT (because the previous PNS implementation was removed in the
previous commit) or INTENSITY_BT2/INTENSITY_BT.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit resets any bands marked as M/S or IS upon encoding a frame.
This is needed because the arrays may contain some residual information
upon allocation on startup and because there isn't any mechanism to
reset the arrays once the frame has been encoded.
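A minimal sketch of such a per-frame reset, assuming the marks are kept in plain byte arrays. The StereoMarks type, the MAX_BANDS value, and reset_stereo_marks() are hypothetical names chosen for illustration; the encoder's actual structures are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

#define MAX_BANDS 128 /* assumption: upper bound on bands per channel pair */

/* Hypothetical container for the per-band stereo marks. */
typedef struct StereoMarks {
    uint8_t ms_mask[MAX_BANDS]; /* band coded as Mid/Side */
    uint8_t is_mask[MAX_BANDS]; /* band coded as intensity stereo */
} StereoMarks;

/* Clear every mark so nothing from allocation or from the previous frame
 * leaks into the frame about to be encoded. */
static void reset_stereo_marks(StereoMarks *m)
{
    memset(m->ms_mask, 0, sizeof(m->ms_mask));
    memset(m->is_mask, 0, sizeof(m->is_mask));
}
```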
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds support for the coding of intensity stereo scalefactor indices.
It does not do any marking of such bands and as such makes no functional changes
to the encoder. It removes the old twoloop-specific code for PNS and moves it
into a separate function which handles setting of scalefactor indices for
PNS and IS bands.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit implements the perceptual noise substitution AAC extension. This is a proof of concept
implementation, and as such, is not enabled by default. This is the fourth revision of this patch,
made after some problems were pointed out. Any changes made since the previous revisions have been indicated.
In order to extend the encoder to use an additional codebook, the array holding each codebook has been
modified with two additional entries - 13 for the NOISE_BT codebook and 12 which has a placeholder function.
The cost system was modified to skip the 12th entry using an array to map the input and outputs it has. It
also does not accept using the 13th codebook for any band which is not marked as containing noise, thereby
restricting its ability to arbitrarily choose it for bands. The use of arrays allows the system to be easily
extended to allow for intensity stereo encoding, which uses additional codebooks.
The 12th entry in the codebook function array points to a function which stops the execution of the program
by calling an assert with an always 'false' argument. It was pointed out in an email discussion with
Claudio Freire that having a 'NULL' entry can result in unexpected behaviour and could be used as
a security hole. There is no danger of this function being called during encoding due to the codebook maps introduced.
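The table layout and the placeholder entry described above might look roughly like the sketch below. All names (cb_cost_fn, cost_table, cb_search_map, cb_allowed) and the stand-in cost functions are hypothetical; only the idea of a 14-entry table, an asserting placeholder at index 12, and a map that skips it comes from this message.

```c
#include <assert.h>
#include <stddef.h>

#define CB_COUNT    14 /* entries 0..13 */
#define CB_RESERVED 12 /* placeholder entry */
#define CB_NOISE    13 /* NOISE_BT codebook */

typedef float (*cb_cost_fn)(const float *coefs, size_t len);

/* Stand-in for the real per-codebook cost functions. */
static float cost_regular(const float *coefs, size_t len)
{
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++)
        sum += coefs[i] * coefs[i];
    return sum;
}

/* Stand-in: noise bands carry no quantized coefficients in this sketch. */
static float cost_noise(const float *coefs, size_t len)
{
    (void)coefs;
    (void)len;
    return 0.0f;
}

/* Placeholder: aborts instead of leaving a NULL pointer in the table. */
static float cost_reserved(const float *coefs, size_t len)
{
    (void)coefs;
    (void)len;
    assert(0 && "reserved codebook must never be selected");
    return 0.0f;
}

static const cb_cost_fn cost_table[CB_COUNT] = {
    cost_regular, cost_regular, cost_regular, cost_regular,
    cost_regular, cost_regular, cost_regular, cost_regular,
    cost_regular, cost_regular, cost_regular, cost_regular,
    cost_reserved, cost_noise,
};

/* Map from search position to codebook index; entry 12 is skipped. */
static const unsigned char cb_search_map[] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, CB_NOISE
};

/* The noise codebook is only a candidate for bands marked as noise. */
static int cb_allowed(int cb, int band_is_noise)
{
    if (cb == CB_RESERVED)
        return 0;
    if (cb == CB_NOISE)
        return band_is_noise;
    return 1;
}
```

Because the search only ever iterates over cb_search_map, the placeholder is unreachable in normal operation and exists purely as a safety net.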
Another change from version 1 of the patch is the addition of an encoder option, '-aac_pns', to
enable and disable PNS. PNS is currently disabled by default, as it is experimental.
The switch will be removed in the future, when the algorithm to select noise bands has been improved.
The current algorithm simply compares the energy to the threshold (multiplied by a constant) to determine
noise. However, the FFPsyBand structure contains other useful figures that could determine more accurately which bands carry noise.
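The comparison can be sketched as below. The function name band_is_noise() and its parameters are hypothetical; the direction of the comparison is inferred from the statement later in this message that larger constants cause more bands to be substituted.

```c
/* Hypothetical helper: a band is treated as noise when its energy falls
 * below the psychoacoustic threshold scaled by a tuning constant. */
static int band_is_noise(float energy, float threshold, float noise_const)
{
    return energy < threshold * noise_const;
}
```

With a threshold of 1.0, a band of energy 2.0 is kept at const = 1.2 but substituted at const = 2.2.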
Some of the sample files provided triggered an assertion when the parameter to tune the threshold was set to
a value of '2.2'. Claudio Freire reported the problem's source could be in the range of the scalefactor
indices for noise and advised to measure the minimal index and clip anything above the maximum allowed
value. This has been implemented and all the files which used to trigger the assertion now encode without error.
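A sketch of that clipping step, under the assumption that the allowed range is a fixed span above the smallest noise index. SF_MAX_DIFF and clip_noise_sf() are hypothetical names, and the value 60 is an assumption used for illustration.

```c
#include <limits.h>
#include <stddef.h>

#define SF_MAX_DIFF 60 /* assumption: largest allowed spread of indices */

/* Hypothetical helper: find the smallest scalefactor index among the noise
 * bands, then clip every noise band to at most min + SF_MAX_DIFF. */
static void clip_noise_sf(int *sf_idx, const unsigned char *is_noise, size_t n)
{
    int min_sf = INT_MAX;

    for (size_t i = 0; i < n; i++)
        if (is_noise[i] && sf_idx[i] < min_sf)
            min_sf = sf_idx[i];

    if (min_sf == INT_MAX)
        return; /* no noise bands in this frame */

    for (size_t i = 0; i < n; i++)
        if (is_noise[i] && sf_idx[i] > min_sf + SF_MAX_DIFF)
            sf_idx[i] = min_sf + SF_MAX_DIFF;
}
```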
The third revision of the patch also removes unneeded variables and comparisons. All of them were
redundant and would have been of little use when the PNS implementation is extended.
The fourth revision moved the clipping of the noise scalefactors outside the second loop of the two-loop
algorithm in order to prevent their redundant calculations. Also, freq_mult has been changed to a float
variable due to the fact that rounding errors can prove to be a problem at low frequencies.
Consideration was given to whether the entire expression could be evaluated inside the expression,
but in the end it was decided that it would be best to change only the type of the variable.
Claudio Freire reported the two problems. There is no change of functionality
(except for low sampling frequencies), so the spectral demonstrations at the end of this commit's message were not updated.
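The rounding problem with an integer freq_mult can be illustrated as follows. The formula used here (half the sample rate divided by the window length, multiplied by the band's start offset) is an assumption made for the example, and both function names are hypothetical.

```c
/* Hypothetical: band start frequency with an integer multiplier.
 * The division truncates, which matters most at low sample rates. */
static int band_freq_int(int sample_rate, int wlen, int start)
{
    const int freq_mult = sample_rate / 2 / wlen;
    return start * freq_mult;
}

/* Hypothetical: the same computation with a float multiplier. */
static float band_freq_float(int sample_rate, int wlen, int start)
{
    const float freq_mult = sample_rate * 0.5f / wlen;
    return start * freq_mult;
}
```

At 8000 Hz with a window of 1024, the integer multiplier is 3 instead of 3.90625, so a band starting at coefficient 512 is placed at 1536 Hz instead of 2000 Hz. At 44100 Hz the relative error is much smaller (10752 Hz against 11025 Hz).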
Finally, the way energy values are converted to scalefactor indices has changed since the first commit,
as per the suggestion of Claudio Freire. This may still have some drawbacks, but unlike the first commit
it works without having redundant offsets and outputs what the decoder expects to have, in terms of the
ranges of the scalefactor indices.
Some spectral comparisons: https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Original.png (original),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_NO.png (encoded without PNS),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS1.2.png (encoded with PNS, const = 1.2),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Difference1.png (spectral difference).
The constant is the value which multiplies the threshold when it gets compared to the energy; larger
values mean more noise will be substituted by PNS values. Example when const = 2.2:
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_2.2.png
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adjusts the initial offset for PNS values, introduced
with commit f7f71b5795 earlier. This
commit shifts the value in such a way that no further offsets are
required in the aaccoder.c file. Earlier versions of the PNS patch had two offsets, in both aaccoder and aacenc.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit implements support for writing the noise energy values used in PNS.
The difference between regular scalefactors and noise energy values is that the latter
require a small preamble (NOISE_PRE + energy_value_diff) to be written for the first
noise-containing band. Any following noise energy values use the previous one to
base their "diff" on. Ordinary scalefactors remain unchanged other than that they ignore the noise values.
This commit should not change anything by itself; the following commits will bring it into use.
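A sketch of the value sequence this produces, ignoring the actual bitstream coding. The function noise_energy_codes(), the start parameter (the reference for the first diff), and the NOISE_PRE value of 256 are assumptions for illustration; the real code writes these values to the bitstream instead of an array.

```c
#include <stddef.h>

#define NOISE_PRE 256 /* assumed value of the preamble offset */

/* Hypothetical helper: collect the values written for noise bands.
 * The first noise band gets NOISE_PRE + diff, later ones get a plain diff
 * against the previous noise energy. Non-noise bands are skipped here.
 * Returns the number of values stored in out. */
static size_t noise_energy_codes(const int *sf_idx, const unsigned char *is_noise,
                                 size_t n, int start, int *out)
{
    int prev = start;
    int first = 1;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        int diff;

        if (!is_noise[i])
            continue;
        diff = sf_idx[i] - prev;
        prev = sf_idx[i];
        out[count++] = first ? NOISE_PRE + diff : diff;
        first = 0;
    }
    return count;
}
```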
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, warn that bitrate will be clamped down to the maximum allowed.
Patch is mostly the work of Kamendo2 in issue #2686, and was well tested within that issue.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch fixes a pointer arithmetic bug in adjust_frame_information that resulted in heavily corrupted audio when using M/S encoding. Also, a backup copy of untransformed coefficients has to be kept around or attempts at re-processing the frame (which happens when heavily overspending bits during transients) will result in re-encoding of the coefficients and subsequent corruption of the resulting stream.
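Why the backup copy matters can be shown with a small sketch: the M/S transform is applied in place, so running it a second time on already transformed data gives wrong results unless the originals are restored first. The ChannelPair type, BAND_LEN, and all function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BAND_LEN 4 /* small size for illustration */

/* Hypothetical channel pair with working and backup coefficients. */
typedef struct ChannelPair {
    float l[BAND_LEN], r[BAND_LEN];               /* working coefficients */
    float l_backup[BAND_LEN], r_backup[BAND_LEN]; /* untransformed copies */
} ChannelPair;

static void save_coeffs(ChannelPair *cp)
{
    memcpy(cp->l_backup, cp->l, sizeof(cp->l));
    memcpy(cp->r_backup, cp->r, sizeof(cp->r));
}

static void restore_coeffs(ChannelPair *cp)
{
    memcpy(cp->l, cp->l_backup, sizeof(cp->l));
    memcpy(cp->r, cp->r_backup, sizeof(cp->r));
}

/* In-place Mid/Side transform: l becomes mid, r becomes side. */
static void apply_ms(ChannelPair *cp)
{
    for (size_t i = 0; i < BAND_LEN; i++) {
        const float m = (cp->l[i] + cp->r[i]) * 0.5f;
        const float s = m - cp->r[i];
        cp->l[i] = m;
        cp->r[i] = s;
    }
}

/* Safe to call repeatedly for the same frame: always starts from the
 * untransformed coefficients. */
static void process_frame(ChannelPair *cp)
{
    restore_coeffs(cp);
    apply_ms(cp);
}
```

save_coeffs() is called once per frame before the first pass; every retry then goes through process_frame().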
A/B testing shows the bug as corrected, but still cannot prove that M/S coding is a win at least in numbers. Limited listening tests do show improvement on M/S encoded samples in lower bitrates, but they're hidden among the other artifacts that remain to be corrected in the encoder.
Some of the regressions flagged in the report do show poor stereo image (but not buggy), so M/S encoding is clearly not good enough yet to be defaulted to auto.
In numbers, Patched against Unpatched, stereo_mode auto:
Files: 114
Bitrates: 6
Tests: 683
Serious Regressions: 0 (0%)
Regressions: 0 (0%)
Improvements: 227 (33%)
Big improvements: 92 (13%)
Worst regression - mybloodrusts.wv - 256k
- StdDev: 28.61 pSNR: -0.43 maxdiff: 1372.00
Best improvement - 60.wv - 384k
- StdDev: -369.57 pSNR: 45.02 maxdiff: -13322.00
Average - StdDev: -80.56 pSNR: 2.49 maxdiff: -8858.00
Patched against Unpatched stereo_mode ms_off shows no difference.
Patched stereo_mode auto vs Unpatched stereo_mode ms_off shows a small average improvement, just not too significant:
Serious Regressions: 0 (0%)
Regressions: 10 (1%)
Improvements: 45 (6%)
Big improvements: 2 (0%)
Worst regression - Illinois.wv - 256k
- StdDev: 33.20 pSNR: -2.03 maxdiff: 477.00
Best improvement - song_of_circomstances.flac - 384k
- StdDev: -3.97 pSNR: 7.61 maxdiff: -826.00
Average - StdDev: -10.25 pSNR: 0.20 maxdiff: -281.00
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Several encoders were multiplying the buffer size by 8, in order to get
a bit size. However, the buffer_size argument is for the byte size of
the buffer. We had experienced crashes encoding prores (Anatoliy) at
size 4096x4096.
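The unit mix-up can be illustrated with a tiny bit writer whose initializer takes a size in bytes. BitWriter and the bw_* functions are hypothetical and are not the lavc bit writer; the point is only that a caller passing sizeof(buf) * 8 would make such a writer believe it has eight times the space it really has.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal bit writer; size is tracked in bytes. */
typedef struct BitWriter {
    uint8_t *buf;
    size_t   size_bytes;
    size_t   bit_pos;
} BitWriter;

/* size_bytes is the buffer size in BYTES, not bits. */
static void bw_init(BitWriter *bw, uint8_t *buf, size_t size_bytes)
{
    memset(buf, 0, size_bytes);
    bw->buf        = buf;
    bw->size_bytes = size_bytes;
    bw->bit_pos    = 0;
}

static size_t bw_bits_left(const BitWriter *bw)
{
    return bw->size_bytes * 8 - bw->bit_pos;
}

/* Writes one bit, MSB first. Returns 0 on success, -1 if the buffer is full. */
static int bw_put_bit(BitWriter *bw, int bit)
{
    if (bw_bits_left(bw) == 0)
        return -1;
    if (bit)
        bw->buf[bw->bit_pos >> 3] |= (uint8_t)(0x80u >> (bw->bit_pos & 7));
    bw->bit_pos++;
    return 0;
}
```

The bounds check in bw_put_bit() only protects the buffer when the size passed to bw_init() is correct, which is why the multiplication by 8 could lead to crashes on large frames.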
* commit '2df0c32ea12ddfa72ba88309812bfb13b674130f':
lavc: use a separate field for exporting audio encoder padding
Conflicts:
libavcodec/audio_frame_queue.c
libavcodec/avcodec.h
libavcodec/libvorbisenc.c
libavcodec/utils.c
libavcodec/version.h
libavcodec/wmaenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Currently, the amount of padding inserted at the beginning by some audio
encoders is exported through AVCodecContext.delay. However
- the term 'delay' is heavily overloaded and can have multiple different
meanings even in the case of audio encoding.
- this field has entirely different meanings, depending on whether the
codec context is used for encoding or decoding (and has yet another
different meaning for video), preventing generic handling of the codec
context.
Therefore, add a new field -- AVCodecContext.initial_padding. It could
conceivably be used for decoding as well at a later point.
This was due to a miscomputation of s->cur_channel, which led to
psy-based encoders using the psy coefficients for the wrong channel.
Signed-off-by: Martin Storsjö <martin@martin.st>
This was due to a miscomputation of s->cur_channel, which led to
psy-based encoders using the psy coefficients for the wrong channel.
Test sample attached on the bug tracker had the peculiar case of all
other channels being silent, so the error was far more noticeable.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0f24a3ca999a702f83af9307f9f47b6fdeb546a5':
lavc: remove disabled FF_API_OLD_ENCODE_VIDEO cruft
lavc: remove disabled FF_API_OLD_ENCODE_AUDIO cruft
lavc: remove disabled FF_API_OLD_DECODE_AUDIO cruft
Conflicts:
libavcodec/flacenc.c
libavcodec/libgsm.c
libavcodec/utils.c
libavcodec/version.h
The compatibility wrappers are left as they are likely still
in wide use. They will be removed when they break or otherwise
cause work without a volunteer being available.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
This fixes segfault caused by 3d3cf6745e
when SingleChannelElement.ret was renamed to SingleChannelElement.ret_buf.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* commit '3d3cf6745e2a5dc9c377244454c3186d75b177fa':
aacdec: use float planar sample format for output
Conflicts:
libavcodec/aacdec.c
libavcodec/aacsbr.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '381dc1a5ec0925b281c573457c413ae643567086':
fate: ac3: Place E-AC-3 tests and AC-3 tests in different groups
fate: Add shorthands for acodec PCM and ADPCM tests
avconv: Drop unused function argument from do_video_stats()
cmdutils: Conditionally compile libswscale-related bits
aacenc: Drop some unused function arguments
rtsp: Avoid a cast when calling strtol
nut: support textual data
nutenc: verbosely report unsupported negative pts
Conflicts:
cmdutils.c
ffmpeg.c
libavformat/nut.c
libavformat/nutenc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
wmaenc: use float planar sample format
(e)ac3enc: use planar sample format
aacenc: use planar sample format
adpcmenc: use planar sample format for adpcm_ima_wav and adpcm_ima_qt
adpcmenc: move 'ch' variable to higher scope
adpcmenc: fix 3 instances of variable shadowing
adpcm_ima_wav: simplify encoding
libvorbis: use planar sample format
libmp3lame: use planar sample formats
vorbisenc: use float planar sample format
ffm: do not write or read the audio sample format
parseutils: fix parsing of invalid alpha values
doc/RELEASE_NOTES: update for the 9 release.
smoothstreamingenc: Add a more verbose error message
smoothstreamingenc: Ignore the return value from mkdir
smoothstreamingenc: Try writing a manifest when opening the muxer
smoothstreamingenc: Move the output_chunk_list and write_manifest functions up
smoothstreamingenc: Properly return errors from ism_flush to the caller
smoothstreamingenc: Check the output UrlContext before accessing it
Conflicts:
doc/RELEASE_NOTES
libavcodec/aacenc.c
libavcodec/ac3enc_template.c
libavcodec/wmaenc.c
tests/ref/lavf/ffm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The value used in allocation is based on an estimate of the
maximum size of the spectral coefficients multiplied by 2
and rounded up. The exact or a tighter limit should be
found and used instead. But this issue shouldn't be left
open until someone works on that.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '124134e42455763b28cc346fed1d07017a76e84e':
avopt: Store defaults for AV_OPT_TYPE_CONST in the i64 union member
Conflicts:
libavcodec/aacenc.c
libavcodec/libopenjpegenc.c
libavcodec/options_table.h
libavdevice/bktr.c
libavdevice/v4l2.c
libavdevice/x11grab.c
libavfilter/af_amix.c
libavfilter/vf_drawtext.c
libavformat/movenc.c
libavformat/options_table.h
libavutil/opt.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>