Diego Biurrun
0361e4dcb4
h264_qpel: x86: Move function with only one instance out of template macro
...
libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
2016-11-08 17:21:02 +01:00
Diego Biurrun
3cba09e522
x86: Drop stray semicolons after function definitions
...
libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
2016-11-05 12:41:45 +01:00
Martin Storsjö
2e55e26b40
vp9: Flip the order of arguments in MC functions
...
This makes it match the pattern already used for VP8 MC functions.
This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Pierre Edouard Lepere
6d5636ad9a
hevc: x86: Add add_residual() SIMD optimizations
...
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-10-22 17:33:35 +02:00
Diego Biurrun
788544ff0e
audiodsp: x86: Remove pointless header file
...
Its single forward declaration can be moved to the only place
it is used, like is done for all other dsp init files.
2016-10-19 15:20:41 +02:00
Diego Biurrun
b89804da9b
x86: videodsp: Add parentheses to expression to work around warning
...
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
2016-10-19 10:13:34 +02:00
Diego Biurrun
6be7944ee2
x86: Add missing colons after assembly labels
...
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Alexandra Hájková
112cee0241
hevc: Add SSE2 and AVX IDCT
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-11 18:21:04 +02:00
Anton Khirnov
e4128c08d7
Revert "hevc: x86: Refactor IDCT macro declarations"
...
This reverts commit d9dccc0389
. There were
outstanding objections to this commit.
2016-10-06 15:24:04 +02:00
Diego Biurrun
5801f9ed24
h264_intrapred: x86: Update comments left behind in 95c89da36e
2016-10-06 12:32:34 +02:00
Diego Biurrun
d9dccc0389
hevc: x86: Refactor IDCT macro declarations
2016-10-06 12:32:34 +02:00
Ronald S. Bultje
715f139c9b
vp9lpf/x86: make filter_16_h work on 32-bit.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
8915320db9
vp9lpf/x86: make filter_48/84/88_h work on 32-bit.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
725a216481
vp9lpf/x86: make filter_44_h work on 32-bit.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
5bfa96c4b3
vp9lpf/x86: make filter_16_v work on 32-bit.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
b905e8d2fe
vp9lpf/x86: make filter_48/84_v work on 32-bit.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
37637e6590
vp9lpf/x86: make filter_88_v work on 32-bit.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
be10834bd9
vp9lpf/x86: make filter_44_v work on 32-bit.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
7c62891efe
vp9lpf/x86: save one register in SIGN_ADD/SUB.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
c6375a83d1
vp9lpf/x86: store unpacked intermediates for filter6/14 on stack.
...
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
4ce8ba72f9
vp9lpf/x86: move variable assigned inside macro branch.
...
The value is not used outside the branch.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
e4961035b2
vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
683da2788e
vp9lpf/x86: remove unused register from ABSSUB_CMP macro.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6e74e9636b
vp9lpf/x86: slightly simplify 44/48/84/88 h stores.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6411c328a2
vp9lpf/x86: make cglobal statement more conservative in register allocation.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
a6e288d624
vp9lpf/x86: save one register in loopfilter surface coverage.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
0ed21bdc9e
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
f2e3d706a1
vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
92d47550ea
vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16
...
Similar gains as the ssse3 version once again
Additional improvements by Clément Bœsch <u@pkh.me>.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
6bea478158
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
1f451eed60
vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
...
Similar gains in performance as the SSSE3 version
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
a692724c58
vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Justin Ruggles
b57e38f52c
ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
...
Adds a wrapper function for downmixing which detects channel count changes
and updates the selected downmix function accordingly.
Simplification and porting to current x86inc infrastructure by Diego Biurrun.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-01 00:46:25 +02:00
Justin Ruggles
43717469f9
ac3dsp: Reverse matrix in/out order in downmix()
...
Also use (float **) instead of (float (*)[2]). This matches the matrix
layout in libavresample so we can reuse assembly code between the two.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-01 00:45:55 +02:00
Hendrik Leppkes
8d1267932c
x86/h264_weight: use appropriate register size for weight parameters
...
This fixes decoding corruption on 64 bit windows.
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-09-30 12:18:22 +03:00
Diego Biurrun
2caa93b813
mpegaudiodsp: Change type of array stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2016-09-29 17:54:24 +02:00
Diego Biurrun
e4a94d8b36
h264chroma: Change type of stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2016-09-29 14:48:04 +02:00
Diego Biurrun
2ec9fa5ec6
idct: Change type of array stride parameters to ptrdiff_t
...
ptrdiff_t is the correct type for array strides and similar.
2016-09-29 14:48:03 +02:00
Diego Biurrun
009adfd4fb
x86: fpel: Remove unnecessary sign extend
2016-09-29 14:47:41 +02:00
Anton Khirnov
de2ae3c1fa
lavc: add clobber tests for the new encoding/decoding API
2016-09-28 10:01:52 +02:00
Anton Khirnov
12004a9a7f
audiodsp/x86: yasmify vector_clipf_sse
2016-09-22 09:47:52 +02:00
Anton Khirnov
683da86aab
audiodsp: reorder arguments for vector_clipf
...
This will make the x86 asm simpler.
ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau
<janne-libav@jannau.net>
2016-09-22 09:47:52 +02:00
Anton Khirnov
eea9857bfd
blockdsp: drop the high_bit_depth parameter
...
It has no effect, since the code is supposed to operate the same way for
any bit depth.
2016-09-22 09:47:52 +02:00
Anton Khirnov
75d98e30af
audiodsp/x86: clear the high bits of the order parameter on 64bit
...
Also change shl to add, since it can be faster on some CPUs.
CC: libav-stable@libav.org
2016-09-19 19:18:07 +02:00
Anton Khirnov
1d6c76e11f
audiodsp/x86: fix ff_vector_clip_int32_sse2
...
This version, which is the only one doing two processing cycles per loop
iteration, computes the load/store indices incorrectly for the second
cycle.
CC: libav-stable@libav.org
2016-09-19 19:18:07 +02:00
Diego Biurrun
de452e5037
pixblockdsp: Change type of stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "stride" everywhere.
2016-09-14 14:12:36 +02:00
Diego Biurrun
721d57e608
vp56: Separate VP5 and VP6 dsp initialization
...
VP5 has no arch-specific optimizations (nor will it get some in the
future), so it makes no sense to try to share dsp init code with VP6.
2016-08-26 11:50:22 +02:00
Diego Biurrun
3fd22538bc
prores: Change type of stride parameters to ptrdiff_t
...
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
Also adjust parameter names to be "linesize" everywhere.
2016-08-26 11:50:21 +02:00
Diego Biurrun
f81be06cf6
cavs: Change type of stride parameters to ptrdiff_t
...
ptrdiff_t is the correct type for array strides and similar.
2016-08-26 11:48:15 +02:00
Diego Biurrun
802727b538
vp8: Update some assembly comments left unchanged in bd66f073fe
2016-08-26 11:36:53 +02:00