I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1598.2 47.4 1529.2 25.4 100.0% +4.5%
vector_fmul_window 244.0 22.1 188.9 22.3 100.0% +29.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
When running on a 64 bit kernel, /proc/cpuinfo lists different
optional features than on 32 bit kernels (because some of them
are mandatory in the 64 bit implemenations).
The kernel does list the old features properly if they are queried
via /proc/self/auxv though - however this file is not always readable
(e.g. on most android systems). The getauxval function could also
provide the same info as /proc/self/auxv even if this file isn't
readable, but this function is not always available (and thus would
need to be loaded with dlsym for compatibility with older android
versions).
The android cpufeatures library does this slightly differently,
by assuming that these are available if the "CPU architecture"
line is >= 8, see [1] for details.
It has been suggested to include the old, non-optional features in
/proc/cpuinfo as well, but that suggested patch never was merged.
See [2] for the discussion around this suggestion.
[1] https://android-review.googlesource.com/91380
[2] http://marc.info/?l=linux-arm-kernel&m=139087240101974
Signed-off-by: Martin Storsjö <martin@martin.st>
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
This matches the other eabi attribute in the same file. This is
required in order to build for arm/hardfloat with other object
file formats than ELF.
Signed-off-by: Martin Storsjö <martin@martin.st>
On recent android versions, /proc/self/auxw is unreadable
(unless the process is running running under the shell uid or
in debuggable mode, which makes it hard to notice). See
http://b.android.com/43055 and
https://android-review.googlesource.com/51271 for more information
about the issue.
This makes sure e.g. neon optimizations are enabled at runtime in
android apps even when built in release mode, if configured to
use the runtime detection.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
This allows compiling optimised functions for features not enabled
in the core build and selecting these at runtime if the system has
the necessary support.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This is consistent with usual ARM nomenclature as well as with the
VFPV3 and NEON symbols which both lack the ARM prefix.
Signed-off-by: Mans Rullgard <mans@mansr.com>
All our ARM asm preserves alignment so setting this attribute
in a common location is simpler. This removes numerous warnings
when linking with armcc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
LDR with register offset and PC as base register is not available in
the Thumb instruction set so the addition must be done separately.
Signed-off-by: Mans Rullgard <mans@mansr.com>
When building Thumb2 code, the end of a function, where the PIC
offsets are placed, need not be aligned. Although the values
are only accessed with instructions allowing unaligned addresses,
keeping them aligned is preferable.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Fixed-point audio codecs often use saturating arithmetic, and
special instructions for these operations are common.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Some compilers do not support the Q/R modifiers used to access
the low/high parts of a 64-bit register pair. Check for this
and disable all uses of it when not supported.
Fixes bug #337.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.
References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Commit adebad0 "arm: intreadwrite: fix inline asm constraints for gcc
4.6 and later" caused some older gcc versions to miscompile code.
This reverts to the old version of the code for these compilers.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Starting with version 4.7, gcc properly supports unaligned
memory accesses on ARM. Not using the inline asm with these
compilers results in better code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
With a dereferenced type-cast pointer as memory operand, gcc 4.6
and later will sometimes copy the data to a temporary location,
the address of which is used as the operand value, if it thinks
the target address might be misaligned. Using a pointer to a
packed struct type instead does the right thing.
The 16-bit case is special since the ldrh instruction addressing
modes are limited compared to ldr. The "Uq" constraint produces a
memory reference suitable for an ldrsb instruction, which supports
the same addressing modes as ldrh. However, the restrictions appear
to apply only when the operand addresses a single byte. The memory
reference must thus be split into two operands each targeting one
byte. Finally, the "Uq" constraint is only available in ARM mode.
The Thumb-2 ldrh instruction supports most addressing modes so the
normal "m" constraint can be used there.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The were broken since August of 2010 without anyone noticing until
three weeks ago. Nobody cares about it anymore and hopefully Marvell
will support NEON like in the PXA978 from now on.