* commit '87552d54d3337c3241e8a9e1a05df16eaa821496':
armv6: Accelerate ff_fft_calc for general case (nbits != 4)
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5c22e8e4ad0852d61d5c4ba8d67d33fd72339497':
armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)
See: 42c1cc35b7
Merged-by: Michael Niedermayer <michaelni@gmx.at>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1542.8 43.7 1470.5 41.5 100.0% +4.9%
butterflies_float 130.0 11.9 70.2 12.1 100.0% +85.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1598.2 47.4 1529.2 25.4 100.0% +4.5%
vector_fmul_window 244.0 22.1 188.9 22.3 100.0% +29.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>
These where removed by libav in
See: git show -C 2d60444331
diff --git a/libavcodec/dsputil.c b/libavcodec/me_cmp.c
similarity index 98%
rename from libavcodec/dsputil.c
rename to libavcodec/me_cmp.c
index ba71a99..9fcc937 100644
--- a/libavcodec/dsputil.c
+++ b/libavcodec/me_cmp.c
@@ -1,8 +1,4 @@
/*
- * DSP utils
- * Copyright (c) 2000, 2001 Fabrice Bellard
- * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
- *
* This file is part of Libav.
*
* Libav is free software; you can redistribute it and/or
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'a578b0407dc983aecd72028e1127062689b67089':
configure: Assume runtime cpu detection on arm on --target-os=android as well
Merged-by: Michael Niedermayer <michaelni@gmx.at>
When merging the formats around the automatically inserted
convert filters, the refcount of the format lists can not be 0.
Coverity does not detect it, and suspects a memory leak,
because if refcount is 0 the newly allocated lists are not
stored anywhere. That gives CIDs 1224282, 1224283 and 1224284.
Lists with refcount 0 are used in can_merge_formats(), so the
asserts can not be moved inside the merge functions.
The X11 servers by VNC, at 32-bits depths, has the following masks:
R:0x000007ff G:0x003ff800 B:0xffc00000
This is not compatible with AV_PIX_FMT_0RGB32, and the result
is success with completely wrong colors.
Avoid negative durations in case there is a single packet in the current
segment, since in that case the end time is still set to the previous
segment end time.
It evaluates expression and outputs it as integer value, using specified
format.
Address trac ticket #3699.
Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
Make the segment muxer keep segment_list_size segments instead of
segment_list_size + 1 segments. This patch also changes the
documentation for segment_list_size to reduce possible confusion over
how many segments are kept.
this allows the segment list to
be limited to containing only one segment which used to be impossible
because a segment_list_size of 0 kept all the segments and a
segment_list_size of 1 kept 2 segments.
Signed-off-by: Simon Thelen <ffmpeg-dev@c-14.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1542.8 43.7 1470.5 41.5 100.0% +4.9%
butterflies_float 130.0 11.9 70.2 12.1 100.0% +85.2%
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1598.2 47.4 1529.2 25.4 100.0% +4.5%
vector_fmul_window 244.0 22.1 188.9 22.3 100.0% +29.2%
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>