1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00
Commit Graph

177 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
620e6e1487 arm: relax byte-swap assembler constraints
There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.

In most cases, this makes no differences, as the compiler will seleect
the same register for both operands either way.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-03 23:54:05 +03:00
Martin Storsjö
0dd8fe6f4b arm: Check the build time constants in av_clip_*intp2
This fixes building for arm targets with optimizations disabled.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-02 23:12:26 +03:00
Aman Karmani
2f299c0b8b avutil: use getauxval(3) for CPU capabilities on linux/android ARM
getauxval is marginally faster, and works even when procfs is not mounted

support on Linux was added in glibc 2.16
support on Android was added in 4.4 (API 20)
fixes #6578

Signed-off-by: Aman Karmani <aman@tmm1.net>
2022-02-07 13:42:40 -08:00
Martin Storsjö
ee040a7fc2 arm/aarch64: Use mach_absolute_time as timer on apple platforms
This is much less precise than the cycle counter register, but
the cycle counter register is not available on apple platforms
(and on linux, it requires a kernel module for allowing user mode
access).

Signed-off-by: Martin Storsjö <martin@martin.st>
2021-02-21 22:41:34 +02:00
James Almer
ebdc5c419a Merge commit '41cf3e3b1ca375962951fde1b90a03b16197d205'
* commit '41cf3e3b1ca375962951fde1b90a03b16197d205':
  arm: Create proper .rdata sections for COFF

Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 14:48:58 -03:00
James Almer
b2f32d60ee Merge commit '5584abf69d83169a010aca404cd1cf95c23ad9ef'
* commit '5584abf69d83169a010aca404cd1cf95c23ad9ef':
  arm: Emit .thumb_func directives

Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 13:51:30 -03:00
Martin Storsjö
41cf3e3b1c arm: Create proper .rdata sections for COFF
As .rodata isn't one of the default created sections for COFF, it was
created as a read-write data section. By using the default .rdata
section name for COFF, it automatically becomes a read-only data section.
The existing ".section .rodata" works as intended for ELF though.

This is based on an original patch and diagnose by Tom Tan
<Tom.Tan@microsoft.com>.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-01-25 23:53:37 +02:00
Martin Storsjö
5584abf69d arm: Emit .thumb_func directives
Prior to Xcode 9.3, the clang built-in assembler didn't support
altmacro, and gas-preprocessor was used for assembling for arm/darwin.

For thumb functions, gas-preprocessor took care of adding the .thumb_func
directives, but when now being able to assemble without gas-preprocessor,
we need to add these directives ourselves.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-10-12 23:25:53 +03:00
James Almer
33bd2b99a1 Merge commit '3a7b4ae62c798edbd82bcd8fef863c74ed2acd4a'
* commit '3a7b4ae62c798edbd82bcd8fef863c74ed2acd4a':
  arm: Produce .const_data instead of .section .rodata for Mach-O

Merged-by: James Almer <jamrial@gmail.com>
2018-03-30 15:48:17 -03:00
Martin Storsjö
3a7b4ae62c arm: Produce .const_data instead of .section .rodata for Mach-O
This is the same combination of .section directives as used in
aarch64/asm.S.

Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-03-30 15:49:30 +03:00
James Almer
35347e7e9b Merge commit '4cf84e254ae75b524e1cacae499a97d7cc9e5906'
* commit '4cf84e254ae75b524e1cacae499a97d7cc9e5906':
  Drop some unnecessary config.h #includes

Merged-by: James Almer <jamrial@gmail.com>
2018-02-11 23:08:48 -03:00
Diego Biurrun
4cf84e254a Drop some unnecessary config.h #includes 2018-02-06 10:03:15 +01:00
Andrew D'Addesio
9b45bcf713 libavutil: Add saturating subtraction functions
Add av_sat_sub32 and av_sat_dsub32 as the subtraction analogues to
av_sat_add32/av_sat_dadd32.

Also clarify the formulas for dadd32/dsub32.

Signed-off-by: Andrew D'Addesio <modchipv12@gmail.com>
2017-12-04 07:28:45 +00:00
James Almer
71bf534dd6 Merge commit '59cee42d7d22530e66a155305389e29679b11f78'
* commit '59cee42d7d22530e66a155305389e29679b11f78':
  arm: Check for the .arch directive in configure

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 20:04:46 -03:00
Martin Storsjö
17f5171cd4 arm: Check for have_vfp_vm instead of !have_vfpv3 for float_dsp_vfp
This was missed in e2710e790c since those functions weren't exercised
by checkasm.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-10-24 09:06:56 +03:00
Martin Storsjö
f1fd12ef85 lavu/arm: Check for have_vfp_vm instead of !have_vfpv3 for float_dsp_vfp
This was missed in e754c8e8 / e2710e790c since those functions
weren't exercised by checkasm.

Fixes ticket #6766.
2017-10-23 13:29:12 +02:00
James Almer
3d828c9fd5 cpu: split flag checks per arch in av_cpu_max_align()
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-10-09 11:48:24 +02:00
James Almer
3b345d389b avutil/cpu: split flag checks per arch in av_cpu_max_align()
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-27 23:10:09 -03:00
Martin Storsjö
59cee42d7d arm: Check for the .arch directive in configure
When targeting windows, the .arch directive isn't available.

So far, when building for windows, we've always used gas-preprocessor,
both when using msvc's armasm and when using clang. Lately, clang/llvm
has implemented the last missing piece (altmacro support) for building
our assembly without gas-preprocessor. This means that we now build
for arm/windows with clang without any extra compatibility layer.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-05-08 14:21:08 +03:00
James Almer
d1ee6fb729 Merge commit '6a1ea4ec932f4fc9fdc00ec51ee070b298ddb35f'
* commit '6a1ea4ec932f4fc9fdc00ec51ee070b298ddb35f':
  arm: warn/error on movrelx usage problematic with PIC on ELF

Merged-by: James Almer <jamrial@gmail.com>
2017-04-04 16:04:29 -03:00
Clément Bœsch
a0860b0a38 Merge commit '6f9e34baea4f6f484392e4e67f606a0835d07b73'
* commit '6f9e34baea4f6f484392e4e67f606a0835d07b73':
  arm: Check for support for the .fpu directive

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-02-02 11:22:04 +01:00
Janne Grunau
6a1ea4ec93 arm: warn/error on movrelx usage problematic with PIC on ELF
The warning has false positives but our asm does not trigger it. For
new code false positives can only be avoided by changing the register
allocation.
2016-11-24 21:26:22 +01:00
Martin Storsjö
86c5a23ee5 arm: Clear the gp register alias at the end of functions
We reset .Lpic_gp to zero at the start of each function, which means
that the logic within movrelx for clearing gp when necessary will
be missed.

This fixes using movrelx in different functions with a different
helper register.

This is cherry-picked from libav commit
824e8c2840.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00
Martin Storsjö
824e8c2840 arm: Clear the gp register alias at the end of functions
We reset .Lpic_gp to zero at the start of each function, which means
that the logic within movrelx for clearing gp when necessary will
be missed.

This fixes using movrelx in different functions with a different
helper register.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 14:01:04 +02:00
Martin Storsjö
6f9e34baea arm: Check for support for the .fpu directive
When targeting COFF (windows), clang doesn't support this
directive (while binutils supports it for all targets).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-21 12:52:10 +03:00
Timothy Gu
44304ae322 all: Add missing header guards 2016-01-28 19:49:48 -08:00
Hendrik Leppkes
a04a9434b5 Merge commit '73c8c0341cce9e1a6c4169721f5123f97fc4be2f'
* commit '73c8c0341cce9e1a6c4169721f5123f97fc4be2f':
  arm: Fix vfp dead code elimination with have_vfp_vm

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-19 08:51:12 +01:00
Martin Storsjö
73c8c0341c arm: Fix vfp dead code elimination with have_vfp_vm
This fixes builds with --disable-vfp.

Checking for the armv6 cpu flag is incorrect, since vfpv2 isn't
armv6 specific.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-01-08 23:52:59 +02:00
Hendrik Leppkes
e754c8e8ca Merge commit 'e2710e790c09e49e86baa58c6063af0097cc8cb0'
* commit 'e2710e790c09e49e86baa58c6063af0097cc8cb0':
  arm: add a cpu flag for the VFPv2 vector mode

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 11:01:29 +01:00
Janne Grunau
e2710e790c arm: add a cpu flag for the VFPv2 vector mode
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.

Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.
2015-12-14 16:42:35 +01:00
James Almer
36e1665d3d avutil/attributes: add AV_GCC_VERSION_AT_MOST
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-18 12:41:29 -03:00
Michael Niedermayer
d82d11397f avutil/arm/intmath: return int for uint8 / uint16 clip
The C functions return uint8/16_t but that is effectively int not unsigned int
Fixes fate-filter-tblend

We do not return uint8/16_t as that would require the compiler to truncate the
values, slowing it down.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-20 17:20:16 +02:00
Andreas Cadhalpun
5bf84a584e arm: only enable setend on ARMv6
Without this check it causes SIGILL crashes on ARMv5.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-06-05 17:14:10 +02:00
Michael Niedermayer
9a1884a10e Merge commit 'dcae2e32f7d8a1ca5fb8c1e4aa81313be854dd73'
* commit 'dcae2e32f7d8a1ca5fb8c1e4aa81313be854dd73':
  arm: Suppress tags about used cpu arch and extensions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-07 19:30:51 +01:00
Martin Storsjö
dcae2e32f7 arm: Suppress tags about used cpu arch and extensions
When all the codepaths using manually set .arch/.fpu code is
behind runtime detection, the elf attributes should be suppressed.

This allows tools to know that the final built binary doesn't
strictly require these extensions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2015-03-07 17:10:08 +02:00
Michael Niedermayer
1253091d6f Merge commit '76ce9bd8e26dcb3652240a1072840ff4011d7cdc'
* commit '76ce9bd8e26dcb3652240a1072840ff4011d7cdc':
  libavutil: Add ARM av_clip_intp2_arm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-21 11:15:32 +01:00
Peter Meerwald
76ce9bd8e2 libavutil: Add ARM av_clip_intp2_arm
add ARM code for implementing av_clip_intp2 using the ssat instruction

on Cortex-A8, av_clip_intp2_arm() is faster than av_clip_intp2_c() and
the generic av_clip(), about -19%

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-02-21 00:54:40 +01:00
Michael Niedermayer
16e65419ed Merge commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46'
* commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46':
  arm: Use .data.rel.ro for const data with relocations

Conflicts:
	configure

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-12-09 11:58:13 +01:00
Martin Storsjö
f963f80399 arm: Use .data.rel.ro for const data with relocations
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-12-09 11:43:25 +02:00
jessejiang
29d208d5d4 avutil/arm/float_dsp_init_vfp: replace restrict by av_restrict
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-20 11:17:42 +01:00
Michael Niedermayer
1e519b9d40 avutil: turn arm setend into a cpuflag
this allows disabling and enabling it
it also prevents crashes if vfpv3 and neon are disabled which previously
would have enabled the flag

And last but not least one can enable setend on cpus like cortex-a8 where
its fast but disabled by default

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-13 14:50:15 +02:00
Michael Niedermayer
7cdb3b2b79 Merge commit '6869612f5c7d4d2f20f69a5658328a761deadb1c'
* commit '6869612f5c7d4d2f20f69a5658328a761deadb1c':
  arm: Macroize the test for 'setend' CPU instruction support

Conflicts:
	libavcodec/arm/h264dsp_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-22 12:46:13 +02:00
Ben Avison
6869612f5c arm: Macroize the test for 'setend' CPU instruction support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-07-21 15:08:01 -07:00
Ben Avison
5a272190a0 armv6: Accelerate butterflies_float
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:

                   Before          After
                   Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode       1542.8 43.7     1470.5 41.5    100.0%      +4.9%
butterflies_float  130.0  11.9     70.2   12.1    100.0%      +85.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:38 +03:00
Ben Avison
5edad2c4a1 armv6: Accelerate vector_fmul_window
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:

                    Before          After
                    Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode        1598.2 47.4     1529.2 25.4    100.0%      +4.5%
vector_fmul_window  244.0  22.1     188.9  22.3    100.0%      +29.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:31 +03:00
Ben Avison
57641410d1 armv6: Accelerate butterflies_float
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:

                   Before          After
                   Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode       1542.8 43.7     1470.5 41.5    100.0%      +4.9%
butterflies_float  130.0  11.9     70.2   12.1    100.0%      +85.2%

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-16 21:38:02 +02:00
Ben Avison
649c666137 armv6: Accelerate vector_fmul_window
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:

                    Before          After
                    Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode        1598.2 47.4     1529.2 25.4    100.0%      +4.5%
vector_fmul_window  244.0  22.1     188.9  22.3    100.0%      +29.2%

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-16 21:37:41 +02:00
Michael Niedermayer
01983e50c0 Merge commit '7b0c7c9163fe3dd0081696befde28617119d2590'
* commit '7b0c7c9163fe3dd0081696befde28617119d2590':
  arm: Detect 32 bit cpu features on ARMv8 when running on a 64 bit kernel

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-28 21:31:18 +02:00
Martin Storsjö
7b0c7c9163 arm: Detect 32 bit cpu features on ARMv8 when running on a 64 bit kernel
When running on a 64 bit kernel, /proc/cpuinfo lists different
optional features than on 32 bit kernels (because some of them
are mandatory in the 64 bit implemenations).

The kernel does list the old features properly if they are queried
via /proc/self/auxv though - however this file is not always readable
(e.g. on most android systems). The getauxval function could also
provide the same info as /proc/self/auxv even if this file isn't
readable, but this function is not always available (and thus would
need to be loaded with dlsym for compatibility with older android
versions).

The android cpufeatures library does this slightly differently,
by assuming that these are available if the "CPU architecture"
line is >= 8, see [1] for details.

It has been suggested to include the old, non-optional features in
/proc/cpuinfo as well, but that suggested patch never was merged.
See [2] for the discussion around this suggestion.

[1] https://android-review.googlesource.com/91380
[2] http://marc.info/?l=linux-arm-kernel&m=139087240101974

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-06-28 22:16:59 +03:00
Michael Niedermayer
a40c338a00 Merge commit 'd5a55981986ac5d1a31aef3a8d16eaff8534a412'
* commit 'd5a55981986ac5d1a31aef3a8d16eaff8534a412':
  build: check if AS supports the '.func' directive

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-04 12:45:35 +02:00