1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-07 11:13:41 +02:00
Commit Graph

499 Commits

Author SHA1 Message Date
Diego Biurrun
054013a0fc dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
Anton Khirnov
6a13505c06 mpegvideo: move the MpegEncContext fields used from arm asm to the beginning
This should reduce the frequency with which the offsets need to be
updated.
2014-04-29 14:49:42 +02:00
Janne Grunau
a88e1d1c59 lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets 2014-04-24 18:28:26 +02:00
Diego Biurrun
3dc6272bed Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00
Janne Grunau
f37815b1d5 arm: asm decode_block_coeffs_internal is vp8 specific
Unbreaks compilation on arm due to conflicting types for
'ff_decode_block_coeffs_armv6'.
2014-04-04 10:39:29 +02:00
Peter Ross
ac4b32df71 On2 VP7 decoder
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-04-04 04:00:11 +02:00
Diego Biurrun
c3a0b3eb64 arm: build: Maintain decoder objects separate from infrastructure objects 2014-03-27 03:00:05 -07:00
Ben Avison
3b5946bcce truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.
Profiling results for overall decode and the output_data function in
particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     339.6  15.1     329.3  16.0    95.8%       +3.1%  (insignificant)
6:2 function  24.6   6.0      9.9    3.1     100.0%      +148.5%
8:2 total     324.5  15.5     323.6  14.3    15.2%       +0.3%  (insignificant)
8:2 function  20.4   3.9      9.9    3.4     100.0%      +104.7%
6:6 total     572.8  20.6     539.9  24.2    100.0%      +6.1%
6:6 function  54.5   5.6      16.0   3.8     100.0%      +240.9%
8:8 total     741.5  21.2     702.5  18.5    100.0%      +5.6%
8:8 function  63.9   7.6      18.4   4.8     100.0%      +247.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:32 +02:00
Ben Avison
483321fe78 truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:10 +02:00
Ben Avison
15a29c39d9 truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%

Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:53:52 +02:00
Diego Biurrun
322a1dda97 dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros 2014-03-22 06:17:29 -07:00
Diego Biurrun
82bb304801 dsputil: Use correct type in me_cmp_func function pointer 2014-03-20 05:03:23 -07:00
Diego Biurrun
0e083d7e43 build: Group general components separate from de/encoders in arch Makefiles
This is in line with how the top-level libavcodec Makefile is structured.
2014-03-20 05:03:23 -07:00
Diego Biurrun
5169e68895 dsputil: Propagate bit depth information to all (sub)init functions
This avoids recalculating the value over and over again.
2014-03-20 05:03:23 -07:00
Diego Biurrun
cf7a216757 arm: dsputil: K&R formatting cosmetics 2014-03-20 05:03:23 -07:00
Diego Biurrun
36b822b8be arm: dsputil: Drop restrict keyword from add_pixels_clamped_armv6 prototype
The function is assigned to a function pointer that does not have the
restrict keyword for that parameter.

This fixes compilation for MSVC builds that don't recognize "restrict",
broken since ed9625eb62.
2014-03-14 13:45:40 +01:00
Diego Biurrun
831a118078 Update dsputil- and SIMD-related comments to match reality more closely 2014-03-13 05:50:29 -07:00
Diego Biurrun
d1184b8110 arm: dsputil: Add a bunch of missing #includes 2014-03-13 05:50:28 -07:00
Diego Biurrun
49676eb730 dsputil: Remove prototypes for nonexisting optimization functions 2014-03-13 05:50:28 -07:00
Janne Grunau
5a7f382a5d armv6: vp8: use explicit labels in motion compensation asm
The integrated arm assembler in clang-503.0.38 (Xcode-5.1) fails
to assemble a branch to 'label + offset' in thumb mode.
2014-03-12 15:06:05 +01:00
Janne Grunau
634d9d8b39 arm: get_cabac inline asm
Based on the aarch64 asm. CPU cycle counts on cortex-a9 compared to
gcc 4.8.2:
before: 475 decicycles in get_cabac_noinline, 67106035 runs, 2829 skips
after:  393 decicycles in get_cabac_noinline, 67106474 runs, 2390 skips

Overall speedup is above 2%. Code generated by clang 3.4 is slower on
the same hardware and the relative change is a little larger.
2014-03-09 00:45:34 +01:00
Janne Grunau
4506a854a4 arm: vp3: remove incorrect const in ff_vp3_idct_dc_add_neon declaration
Was missed in aeaf268e52 when integrating
clear_blocks into the idct.
2014-03-09 00:45:33 +01:00
Janne Grunau
61985ad72c arm: hpeldsp: fix put_pixels8_y2_{,no_rnd_}armv6
The overread avoidance fix in cbddee1cca
broke the computation for the last row since it prevented the safe
reading from the height+1-th row.

CC: libav-stable@libav.org
2014-03-08 18:31:57 +01:00
Janne Grunau
cbddee1cca arm: hpeldsp: prevent overreads in armv6 asm
Based on a patch by Russel King <rmk+libav@arm.linux.org.uk>

Bug-Id: 646
CC: libav-stable@libav.org
2014-03-05 14:30:57 +01:00
Janne Grunau
6e4009d4cd arm: dcadsp: implement decode_hf as external NEON asm 2014-02-28 13:12:19 +01:00
Christophe Gisquet
4cb6964244 dcadec: simplify decoding of VQ high frequencies
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.

Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.

The decode_hf implementations have following timings:

For x86 Arrandale:
        C  SSE SSE2 SSE4
win32: 260 162  119  104
win64: 242 N/A   89   72

The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
2014-02-28 13:03:22 +01:00
Christophe Gisquet
87ec849fe9 dcadec: remove scaling in lfe_interpolation_fir
The scaling factor is constant so it is faster to scale the
FIR coefficients in the tables during compilation.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-28 13:00:47 +01:00
Martin Storsjö
d6eac2f1bc arm: Remove a stray .fpu directive
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-02-09 18:36:16 +01:00
Janne Grunau
5c1c6e8226 dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders 2014-02-08 13:38:36 +01:00
Christophe Gisquet
5fdbfcb5b7 dcadsp: split lfe_dir cases
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.

Arm asm changes by Janne Grunau.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-07 22:54:18 +01:00
Christophe Gisquet
2bd44cb705 dcadsp: add int8x8_fmul_int32 to dsp context
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.

Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.

On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-07 22:51:59 +01:00
Martin Storsjö
5bcbb516f2 arm: Add X() around all references to extern symbols
Don't rely on the fact that an unprefixed label currently exists.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-02-07 15:13:58 +02:00
Martin Storsjö
49ec551595 vp8: Use 2 registers for dst_stride and src_stride in neon bilin filter
Based on a patch by Ronald S. Bultje.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-02-06 09:32:26 +02:00
Diego Biurrun
7151c5d04a arm: Use full filenames as multiple inclusion guards 2014-01-14 00:04:52 +01:00
Martin Storsjö
44a0a98f92 arm: Add an option for making sure NEON registers aren't clobbered
This is pretty much based on the same test for XMM registers.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-11 00:03:00 +02:00
Martin Storsjö
952d3187d8 arm: Add a missing # as prefix for an immediate constant
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-07 19:30:13 +02:00
Martin Storsjö
5dae487235 arm: Allow overriding the alignment set in the function macro
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).

The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.

This restores the original intention, to align the loop entry
points.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-07 19:29:56 +02:00
Martin Storsjö
b7b932f5e3 arm: Remove a leftover define for the pld instruction
This file no longer uses the pld instruction at all, all such uses
have been split into hpeldsp_arm.S.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-07 19:29:42 +02:00
Martin Storsjö
67bb3a4e28 arm: cosmetics: Reindent the h264dsp neon init function
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-07 19:29:31 +02:00
Diego Biurrun
794fcf79a8 Rename CONFIG_FFT_FLOAT ---> FFT_FLOAT
The define does not originate from configure, so it should not
have a name that is CONFIG_-prefixed.
2014-01-06 19:12:48 +01:00
Anton Khirnov
a03a642d5c h264: do not use 422 functions for monochrome
Fixes invalid memory access.

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
2014-01-06 08:25:36 +01:00
Martin Storsjö
3348e3492d arm: Use the matching endfunc macro instead of the assembler directive directly
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-04 13:53:08 +02:00
Martin Storsjö
2ad4ee345a arm: Add a missing endfunc macro call
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-04 13:53:02 +02:00
Martin Storsjö
d307e408d4 arm: Don't clobber callee saved registers in scalarproduct
q4-q7/d8-d15 are supposed to not be clobbered by the callee.

CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-12-20 20:48:30 +02:00
Mason Carter
b254490bda vc1: arm: Add NEON no_rnd chroma MC
Apply David Conrad's old patch to the modern codebase.

http://ffmpeg.org/pipermail/ffmpeg-devel/2009-April/059877.html

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-12-20 14:53:42 +02:00
Mason Carter
832e190632 vc1: arm: Add NEON assembly
For:

ff_vc1_inv_trans_{8,4}x{8,4}_{dc_,}neon
ff_put_pixels8x8_neon
ff_put_vc1_mspel_mc{0,1,2,3}{0,1,2,3}_neon (except for 00)

Based on ARM assembly code in libavcodec/arm by Rob Clark and Mans
Rullgard.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-12-20 14:53:39 +02:00
Diego Biurrun
4958f35a2e dsputil: Move apply_window_int16 to ac3dsp
The (optimized) functions are used nowhere else.
2013-12-08 17:57:15 +01:00
Diego Biurrun
f0389eb777 arm: fmtconvert: Split armv6 fmtconvert code off from vfp code 2013-08-29 11:24:14 +02:00
Diego Biurrun
bd549cbaac arm: dcadsp: Move synth filter initialization to dcadsp file 2013-08-29 11:24:14 +02:00
Diego Biurrun
f407856968 arm: h264chroma: Do not compile h264_chroma_mc* dependent on h264 decoder
The functions are used by all codecs that enable the h264chroma component
and the file is already compiled conditional on h264chroma being enabled.
2013-08-23 17:21:14 +02:00
Diego Biurrun
8506ff97c9 vp56: Mark VP6-only optimizations as such.
Most of our VP56 optimizations are VP6-only and will stay that way.
So avoid compiling them for VP5-only builds.
2013-08-23 14:42:19 +02:00
Ben Avison
45e10e5c8d arm: Add assembly version of h264_find_start_code_candidate
Before          After
               Mean   StdDev   Mean   StdDev  Change
This function   508.8 23.4      185.4  9.0    +174.4%
Overall        3068.5 31.7     2752.1 29.4     +11.5%

In combination with the preceding patch:
                Before          After
                Mean   StdDev   Mean   StdDev  Change
Overall         2925.6 26.2     2752.1 29.4     +6.3%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-08-08 12:08:34 +03:00
Martin Storsjö
54ba52077c arm: Comment out unused labels in simple_idct_arm
When building for iOS in thumb mode, gas-preprocessor.pl doesn't
mark unused labels as thumb functions (as it does for other
local labels, where it can figure out that they are functions
due to being referenced in branch instructions). This leads to
linker warnings for some of those local labels, such as:

ld: warning: ARM function not 4-byte aligned: __a_evaluation from
libavcodec/libavcodec.a(simple_idct_arm.o)

Therefore, comment them out since they don't have any function.
They do still have a value in documenting key points in the
assembly source though.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-24 22:43:21 +03:00
Martin Storsjö
69e6702c01 arm: Mangle external symbols properly in new vfp assembly files
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 14:48:30 +03:00
Ben Avison
ff30d12159 arm: Add VFP-accelerated version of qmf_32_subbands
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   1323.0  98.0      746.2  60.6   +77.3%
Overall        15400.0 336.4    14147.5 288.4    +8.9%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:44 +03:00
Martin Storsjö
8b9eba664e arm: Add VFP-accelerated version of fft16
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   1389.3  4.2       967.8  35.1   +43.6%
Overall        15577.5 83.2     15400.0 336.4    +1.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:41 +03:00
Martin Storsjö
ba6836c966 arm: Add VFP-accelerated version of dca_lfe_fir
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function    868.2  33.5      436.0  27.0   +99.1%
Overall        15973.0 223.2    15577.5  83.2    +2.5%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:39 +03:00
Martin Storsjö
b63bb251ea arm: Add VFP-accelerated version of imdct_half
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   2653.0  28.5     1108.8  51.4   +139.3%
Overall        17049.5 408.2    15973.0 223.2     +6.7%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:37 +03:00
Ben Avison
d6e4f5fef0 arm: Add VFP-accelerated version of int32_to_float_fmul_array8
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function    366.2  18.3      277.8  13.7   +31.9%
Overall        18420.5 489.1    17049.5 408.2    +8.0%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:36 +03:00
Ben Avison
ce9ed10ac2 arm: Add VFP-accelerated version of int32_to_float_fmul_scalar
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   1175.0   4.4      366.2  18.3   +220.8%
Overall        19285.5 292.0    18420.5 489.1     +4.7%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:30 +03:00
Ben Avison
41ef1d360b arm: Add VFP-accelerated version of synth_filter_float
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   9295.0 114.9     4853.2 83.5    +91.5%
Overall        23699.8 397.6    19285.5 292.0   +22.9%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:17 +03:00
Christophe Gisquet
b6293e2798 fmtconvert: Explicitly use int32_t instead of int
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-17 11:02:47 +03:00
Martin Storsjö
86113667c0 arm: Include hpeldsp_neon.o if h264qpel is enabled
A few of the h264qpel neon functions are shared with other
hpeldsp functions in this file.

This fixes standalone compilation of the h264 decoder on arm.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:37 +03:00
Martin Storsjö
efb7968cfe arm: Don't unconditionally build dsputil files
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:35 +03:00
Martin Storsjö
36a7df8cf1 arm: Only build the FFT init files if FFT is enabled
This fixes build errors in cases where FFT is disabled.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:33 +03:00
Diego Biurrun
186599ffe0 build: cosmetics: Place unconditional before conditional OBJS lines
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:31 +03:00
Diego Biurrun
9b9b2e9f30 build: arm: cosmetics: Place all OBJS declarations in alphabetical order
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:27 +03:00
Diego Biurrun
383fd4d478 arm: Drop unnecessary ff_ name prefixes from static functions 2013-04-30 16:02:03 +02:00
Ronald S. Bultje
7384b7a713 arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-19 23:19:08 +03:00
Ronald S. Bultje
015821229f vp3: Use full transpose for all IDCTs
This way, the special IDCT permutations are no longer needed. This
is similar to how H264 does it, and removes the dsputil dependency
imposed by the scantable code.

Also remove the unused type == 0 cases from the plain C version
of the idct.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-15 12:32:05 +03:00
Ronald S. Bultje
62844c3fd6 h264: Integrate clear_blocks calls with IDCT
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-10 11:03:06 +03:00
Luca Barbato
a8b6015823 dsputil: convert remaining functions to use ptrdiff_t strides
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-03-12 18:26:42 +01:00
Diego Biurrun
c242bbd8b6 Remove unnecessary dsputil.h #includes 2013-02-26 00:51:34 +01:00
Diego Biurrun
76b19a3984 Fix a number of incorrect intmath.h #includes. 2013-02-26 00:51:34 +01:00
Diego Biurrun
3e85b46ecf arm: vp8: Add missing #includes for header to compile standalone 2013-02-20 14:24:07 +01:00
Diego Biurrun
82bd04b170 rv34: Drop now unnecessary dsputil dependencies 2013-02-06 11:30:54 +01:00
Diego Biurrun
79dad2a932 dsputil: Separate h264chroma 2013-02-06 11:30:53 +01:00
Diego Biurrun
c9f933b5b6 Add av_cold attributes to arch-specific init functions 2013-02-05 17:01:05 +01:00
Diego Biurrun
25841dfe80 Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
2013-02-05 12:59:12 +01:00
Diego Biurrun
6c1a7d07eb Use proper "" quotes for local header #includes 2013-02-01 12:51:15 +01:00
Martin Storsjö
2026eb1408 arm: vp8: Fix the plain-armv6 version of vp8_luma_dc_wht
This makes the plain-armv6 version use the same registers as the
armv6t2 version above.

This fixes fate-vp8 on plain-armv6 devices.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-01-27 13:17:25 +02:00
Diego Biurrun
33552a5f7b arm: Add mathops.h to ARCH_HEADERS list
It is an arch-specific header not suitable for standalone compilation.
2013-01-24 20:59:22 +01:00
Janne Grunau
8e148b8742 arm: h264qpel: use neon h264 qpel functions only if supported 2013-01-24 17:06:52 +01:00
Mans Rullgard
e9d817351b dsputil: Separate h264 qpel
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-24 10:44:43 +01:00
Ronald S. Bultje
baf35bb4bc dsputil: remove one array dimension from avg_no_rnd_pixels_tab. 2013-01-22 18:41:36 -08:00
Ronald S. Bultje
32ff643228 dsputil: remove avg_no_rnd_pixels8.
This is never used.
2013-01-22 18:41:36 -08:00
Diego Biurrun
88bd7fdc82 Drop DCTELEM typedef
It does not help as an abstraction and adds dsputil dependencies.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2013-01-22 18:32:56 -08:00
Diego Biurrun
73b704ac60 arm: Add some missing header #includes 2013-01-22 21:24:10 +01:00
Ronald S. Bultje
d56668bd80 floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
This makes the aac decoder and all voice codecs independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
5959bfaca3 floatdsp: move butterflies_float from dsputil to avfloatdsp.
This makes wmadec/enc, twinvq and mpegaudiodec (i.e. mp2/mp3)
independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
42d3246948 floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
55aa03b9f8 floatdsp: move vector_fmul_add from dsputil to avfloatdsp. 2013-01-22 11:55:42 -08:00
Ronald S. Bultje
1768e43ceb vorbisdsp: change block_size type from int to intptr_t.
This saves one instruction in the x86-64 assembly.
2013-01-20 22:26:42 -08:00
Janne Grunau
68f18f0351 videodsp_armv5te: remove #if HAVE_ARMV5TE_EXTERNAL
libavutil/arm/asm.S sets '.arch' depending on HAVE_ARMV5TE so that
assembling armv5te code will always succeed even if the default -march
flag does not support it. HAVE_ARMV5TE_EXTERNAL tests assembling code
with the default arch.
Fixes the missing symbol ff_prefetch_arm with --cpu= not including
armv5te.

CC: libav-stable@libav.org
2013-01-20 15:20:00 +01:00
Ronald S. Bultje
fef906c77c Move vorbis_inverse_coupling from dsputil to vorbisdspcontext.
Conveniently (together with Justin's earlier patches), this makes
our vorbis decoder entirely independent of dsputil.
2013-01-19 22:21:10 -08:00
Ronald S. Bultje
aeaf268e52 vp3: integrate clear_blocks with idct of previous block.
This is identical to what e.g. vp8 does, and prevents the function call
overhead (plus dependency on dsputil for this particular function).

Arm asm updated by Janne Grunau <janne-libav@jannau.net>.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2013-01-19 22:04:55 -08:00
Justin Ruggles
e034cc6c60 lavc: Move vector_fmul_window to AVFloatDSPContext
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-16 10:45:45 +01:00
Luca Barbato
6906b19346 lavc: add missing files for arm
Across the many retouches those did not make the main commit.
2012-12-20 14:07:23 +01:00
Ronald S. Bultje
8c53d39e7f lavc: introduce VideoDSPContext
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-20 13:40:45 +01:00
Diego Biurrun
523c7bd23c misc typo, style and wording fixes 2012-12-18 13:36:51 +01:00
Mans Rullgard
b326755989 arm: rename ARMVFP config symbol to VFP
This is consistent with usual ARM nomenclature as well as with the
VFPV3 and NEON symbols which both lack the ARM prefix.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:04 +00:00
Mans Rullgard
a7831d509f arm: use HAVE*_INLINE/EXTERNAL macros for conditional compilation
These macros reflect the actual capabilities required here.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:03 +00:00
Mans Rullgard
92dad6687f arm: fix use of uninitialised value in ff_fft_fixed_init_arm()
When initialising an FFTContext for a plain FFT, mdct_bits is not set
and can contain a garbage value.  Since nbits is always valid and for
MDCT operation is mdct_bits - 2 checking this instead avoids using an
uninitialised value while having the same effect.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 13:11:57 +00:00
Justin Ruggles
284ea790d8 dsputil: move vector_fmul_scalar() to AVFloatDSPContext in libavutil 2012-11-26 11:29:06 -05:00
Ronald S. Bultje
95c89da36e Use ptrdiff_t instead of int for intra pred "stride" function parameter.
This way, SIMD-optimized functions don't have to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2012-10-29 17:49:13 -07:00
Mans Rullgard
1846ddf0a7 ARM: fix overreads in neon h264 chroma mc
The loops were reading ahead one line, which could end up outside the
buffer for reference blocks at the edge of the picture.  Removing
this readahead has no measurable performance impact.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-20 01:28:38 +01:00
Diego Biurrun
9734b8ba56 Move avutil tables only used in libavcodec to libavcodec. 2012-10-11 18:29:36 +02:00
Jean-Baptiste Kempf
507dce2536 arm: call arm-specific rv34dsp init functions under if (ARCH_ARM)
Assign NEON specific function pointers after runtime check via
av_get_cpu_flags().

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-10-10 15:28:50 +02:00
Diego Biurrun
ac56ff9cc9 build: non-x86: Only compile mpegvideo optimizations when necessary 2012-10-09 14:45:59 +02:00
Mans Rullgard
5e826fd65e ARM: set Tag_ABI_align_preserved in all asm files
All our ARM asm preserves alignment so setting this attribute
in a common location is simpler.  This removes numerous warnings
when linking with armcc.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-02 19:47:56 +01:00
Mans Rullgard
a27a690fac ARM: swap source operands in some add instructions
This allows using a 16-bit opcode when generating Thumb2 code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-09-20 17:07:18 +01:00
Mans Rullgard
7689eea49a flacdsp: arm optimised lpc filter 2012-09-15 23:54:21 +01:00
Anton Khirnov
36ef5369ee Replace all CODEC_ID_* with AV_CODEC_ID_* 2012-08-07 16:00:24 +02:00
Mans Rullgard
e6cd698955 ARMv6: vp8: fix stack allocation with Apple's assembler
In the GNU assembler, a relational expression, bizarrely, has the
value -1 if true, whereas in Apple's it is +1.  This patch makes
sure the correct expression is used in both cases.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-04 00:59:14 +01:00
Mans Rullgard
9829a81bcd ARM: vp56: allow inline asm to build with clang
The clang integrated assembler does not support pre-UAL syntax,
while gcc requires pre-UAL syntax for ARM code.  A patch[1] for
clang to support the old syntax as well has been ignored since
January.

This patch chooses the syntax appropriate for each compiler,
allowing both to build the code.  Notably, this change allows
building for iphone with the latest Apple Xcode update.

[1] http://llvm.org/bugs/show_bug.cgi?id=11855

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-04 00:59:14 +01:00
Mans Rullgard
faa788227f ARM: use =const syntax instead of explicit literal pools
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-01 10:32:24 +01:00
Mans Rullgard
998170913c ARM: use standard syntax for all LDRD/STRD instructions
The standard syntax requires two destination registers for
LDRD/STRD instructions.  Some versions of the GNU assembler
allow using only one with the second implicit, others are
more strict.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-01 10:32:24 +01:00
Mans Rullgard
28f9ab7029 vp3: move idct and loop filter pointers to new vp3dsp context
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context.  There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-18 10:32:19 +01:00
Mans Rullgard
ab9f987661 build: add CONFIG_VP3DSP, reduce repetition in OBJS lists
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-18 10:32:18 +01:00
Mans Rullgard
62634158b7 ARM: generate position independent code to access data symbols
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.

References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-01 11:25:06 +01:00
Justin Ruggles
cb5042d02c float_dsp: Move vector_fmac_scalar() from libavcodec to libavutil 2012-06-18 18:01:14 -04:00
Justin Ruggles
d5a7229ba4 Add a float DSP framework to libavutil
Move vector_fmul() from DSPContext to AVFloatDSPContext.
2012-06-08 13:14:38 -04:00
Justin Ruggles
94d2b0d2fd ARM: Move asm.S from libavcodec to libavutil
This will allow for easier implementation of ARM-optimized functions in
libraries other than libavcodec.
2012-06-08 13:14:38 -04:00
Mans Rullgard
e54e6f25cf arm/neon: dsputil: use correct size specifiers on vld1/vst1
Change the size specifiers to match the actual element sizes
of the data.  This makes no practical difference with strict
alignment checking disabled (the default) other than somewhat
documenting the code.  With strict alignment checking on, it
avoids trapping the unaligned loads.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-10 22:56:37 +01:00
Mans Rullgard
2eba6898c9 arm: dsputil: prettify some conditional instructions in put_pixels macros
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-10 22:56:09 +01:00
Mans Rullgard
cbc7d60afa arm: dsputil: fix overreads in put/avg_pixels functions
The vertically interpolating variants of these functions read
ahead one line to optimise the loop.  On the last line processed,
this might be outside the buffer.  Fix these invalid reads by
processing the last line outside the loop.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-10 14:39:34 +01:00
Mans Rullgard
96f7590efd aacps: NEON optimisations
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-05 22:04:21 +01:00
Mans Rullgard
3d11c2d76d vp8: armv6: fix non-armv6t2 build
The assembler may fail to place literal pools close enough to
instructions referencing them.  An explicit .ltorg directive
fixes this.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 23:16:31 +01:00
Mans Rullgard
e4ac031233 vp8: armv6 optimisations
Based on patch by Ronald S. Bultje <rsbultje@gmail.com>,
partially ported from libvpx.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 21:41:39 +01:00
Mans Rullgard
b692d246ea vp8: arm: separate ARMv6 functions from NEON
This is a preparation for complete ARMv6 optimisations.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 21:41:39 +01:00
Mans Rullgard
dac78fd1d7 ARM: add some compatibility macros
This adds some macros simplifying Thumb and pre-v6T2 compatibility.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 21:41:39 +01:00
Mans Rullgard
d526c5338d ARM: allow runtime masking of CPU features
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-22 12:30:45 +01:00
Mans Rullgard
2bcbd98459 Remove lowres video decoding
This feature is complex, of questionable utility, and slows down
normal decoding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-21 18:56:19 +01:00
Diego Biurrun
7bb3a302fe build: Consistently handle conditional compilation for all optimization OBJS. 2012-04-12 09:00:49 +02:00
Christophe GISQUET
272b252c01 rv40dsp: implement prescaled versions for biweight.
Quite often, the original weights are multiple of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.

The x86 code already used that trick.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-10 10:06:48 -07:00
Diego Biurrun
3dde147ff9 cosmetics: Consistently place static, inline and av_cold attributes/keywords. 2012-04-04 14:54:13 +02:00
Janne Grunau
363bd1c62c remove iwmmxt optimizations
The were broken since August of 2010 without anyone noticing until
three weeks ago. Nobody cares about it anymore and hopefully Marvell
will support NEON like in the PXA978 from now on.
2012-03-12 22:46:56 +01:00
Christophe GISQUET
7e1ce6a6ac dsputil: remove shift parameter from scalarproduct_int16
There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.

The x86 and neon versions are modified accordingly.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-03-07 10:29:52 -08:00
Ronald S. Bultje
bd66f073fe vp8: change int stride to ptrdiff_t stride.
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
2012-03-02 10:31:50 -08:00
Christophe GISQUET
2e74a5abc2 SBR DSP: use intptr_t for the ixh parameter.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-23 15:48:40 -08:00
Ronald S. Bultje
3ab9a2a557 rv34: change most "int stride" into "ptrdiff_t stride".
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do it and
arguments are not in registers, such as Win64 for all weight functions.
2012-02-20 14:58:25 -08:00
Martin Storsjö
efd29844eb mpegvideo: Add ff_ prefix to nonstatic functions
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-02-15 22:07:23 +02:00
Martin Storsjö
9cf0841ef3 dsputil: Add ff_ prefix to the dsputil*_init* functions
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-02-15 22:06:34 +02:00
Diego Biurrun
aa06d65693 arm: Add missing #include to vp8.h to fix a make checkheaders warning. 2012-02-09 12:26:47 +01:00
Diego Biurrun
32f3c541bc doxygen: Do not include license boilerplates in Doxygen comment blocks. 2012-02-06 19:39:24 +01:00
Mans Rullgard
cd2f98f365 ARM: ac3: fix ac3_bit_alloc_calc_bap_armv6
This function was broken when the start bin was not at the start
of a band.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-02-02 18:50:42 +00:00
Mans Rullgard
be822d77b6 aacsbr: ARM NEON optimised sbrdsp functions
Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-01-28 14:56:18 +00:00
Felipe Contreras
c3d5e290ca ARM: fix build with FFT enabled and MDCT disabled
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-01-20 16:14:01 +00:00
Janne Grunau
9e12002f11 rv34: add NEON rv34_idct_add
Overall almost 4% faster, idct_add down from 350 to 85 cycles, idct_dc_add
down from 83 to 30 cycles.

squash: rv34 idct rearrange partial register loads
2012-01-16 19:26:41 +01:00
Christophe GISQUET
9ba9c34024 rv34: 1-pass inter MB reconstruction
Implement 1-pass inverse transform and reconstruction for inter blocks.
2012-01-16 19:26:41 +01:00