mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00
2336 Commits

Author SHA1 Message Date
Mirage Abeysekara
5eb4f95bef h264pred: added AVX2 implementation for tm_vp8 16x16.
checkasm --bench results with 5000 runs

pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:45:42 -04:00
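
For context on the benchmark above: by the quoted figures, the AVX2 version is roughly 7.9x faster than the C baseline (302.8 / 38.2) and roughly 2.5x faster than SSE2 (95.1 / 38.2). The function being timed is the VP8 TrueMotion (TM) predictor, which sets each pixel to left neighbour + top neighbour - top-left neighbour, clamped to the 8-bit range. The following is a minimal scalar sketch of that rule, not the FFmpeg source; `clip_u8` and `pred16x16_tm_sketch` are names invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Scalar sketch of TrueMotion prediction for a 16x16 block.
 * src points at the top-left pixel of the block. The row above the
 * block and the column to its left (including the top-left corner)
 * must be readable. Hypothetical name, for illustration only.
 */
static void pred16x16_tm_sketch(uint8_t *src, ptrdiff_t stride)
{
    assert(src != NULL);

    const uint8_t *top = src - stride;   /* row above the block      */
    const int topleft  = top[-1];        /* pixel above and to left  */

    for (int y = 0; y < 16; y++) {
        uint8_t *row   = src + y * stride;
        const int left = row[-1];        /* pixel left of this row   */

        for (int x = 0; x < 16; x++)
            row[x] = clip_u8(left + top[x] - topleft);
    }
}
```

The inner loop is independent per pixel within a row, which is why the operation maps well onto wide SIMD registers: one 16-pixel row fits in half of a 256-bit register once the pixels are widened to 16 bits.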
James Almer
6966a5e4d7 Merge commit '721d57e608dc4fd6c86f27c5ae76ef559d646220'
* commit '721d57e608dc4fd6c86f27c5ae76ef559d646220':
  vp56: Separate VP5 and VP6 dsp initialization

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 17:15:24 -03:00
James Almer
663640d745 Merge commit '3fd22538bc0e0de84b31335266b4b1577d3d609e'
* commit '3fd22538bc0e0de84b31335266b4b1577d3d609e':
  prores: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:30:13 -03:00
James Almer
aec42ebc27 Merge commit 'f81be06cf614919d71ded29b8f595bef40123ad8'
* commit 'f81be06cf614919d71ded29b8f595bef40123ad8':
  cavs: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:23:52 -03:00
James Almer
4e4dfcac58 Merge commit '802727b538b484e3f9d1345bfcc4ab24cfea8898'
* commit '802727b538b484e3f9d1345bfcc4ab24cfea8898':
  vp8: Update some assembly comments left unchanged in bd66f073fe

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:18:31 -03:00
James Almer
4004d33fcb Merge commit 'd9d26a3674f31f482f54e936fcb382160830877a'
* commit 'd9d26a3674f31f482f54e936fcb382160830877a':
  vp56: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 14:54:25 -03:00
Clément Bœsch
6a42a54b9d Merge commit '6892df9294d93322d43255ada299507465bc93c8'
* commit '6892df9294d93322d43255ada299507465bc93c8':
  vp3: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 18:41:26 +01:00
Clément Bœsch
8695ce73ca Merge commit 'e2b9993558b6adee42dcc6eb385a14943aaca974'
* commit 'e2b9993558b6adee42dcc6eb385a14943aaca974':
  simple_idct: x86: Drop disabled IDCT implementation

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 16:11:11 +01:00
Clément Bœsch
8286c359ad Merge commit 'e99ecda55082cb9dde8fd349361e169dc383943a'
* commit 'e99ecda55082cb9dde8fd349361e169dc383943a':
  checkasm: add vp9 MC tests.
  vp9mc/x86: sse2 MC assembly.
  vp9mc/x86: add AVX and AVX2 MC
  vp9mc/x86: rename ff_* to ff_vp9_*
  vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
  vp9mc/x86: simplify a few inits.
  vp9mc/x86: add 16px functions (64bit only).

Noop (aside from a formatting comment in vp9mc.asm). We already have all
of this. We should consider making a final diff between the two projects
when the dust comes down.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:25:39 +01:00
Clément Bœsch
a4f5e79f7c Merge commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6'
* commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6':
  vp9/x86: rename vp9dsp to vp9mc

File was already renamed, only the top description is updated.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:10:47 +01:00
James Almer
e632fe9bab Merge commit '3c504bc3599f00bfc5923adc114beef34bce11d0'
* commit '3c504bc3599f00bfc5923adc114beef34bce11d0':
  x86: deduplicate some constants

Merged-by: James Almer <jamrial@gmail.com>
2017-03-15 22:07:28 -03:00
Michael Niedermayer
835d9f299c avcodec/x86/cavsdsp: Put MMX code under mmx check
Without this the FPU state becomes trashed and causes mysterious
fate failures with cpuflags=0

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-06 16:47:17 +01:00
James Darnley
33de0fee2c avcodec/h264: enable sse2 chroma deblock/loop filter functions
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
2017-02-27 13:22:06 +01:00
James Darnley
cd893b9307 avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
0e16b3e2be avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
987ffe4b8d avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter
~1.14x faster (90 vs 78 cycles) compared with mmxext
2017-02-27 13:22:06 +01:00
James Darnley
88307b3eec avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
ac096fc82d avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5c56758843 avcodec/h264: add avx 8-bit chroma v deblock/loop filter
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5336887867 avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley
e18bc2114f avcodec/h264: add named parameters to x86 function
2017-02-18 20:26:50 +01:00
James Darnley
9d815b7424 avcodec/x86: deduplicate PASS8ROWS macro
2017-02-18 20:26:49 +01:00
James Almer
c8467abbad x86/rv34dsp: add ff_rv34_idct_dc_add_sse2
Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-02-02 17:51:21 -03:00
James Almer
ab5c4d006d x86/vp8dsp: add ff_vp8_idct_dc_add_sse2
Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2
is guaranteed in such builds.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-02-02 17:18:58 -03:00
Michael Niedermayer
536ac72f46 Revert "Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'"
The assumption this is based on is wrong, the code is not always run with bitexact flags

This reverts commit a956164e1e, reversing
changes made to f6005907fd.

Approved-by: James Almer <jamrial@gmail.com>
2017-02-01 02:01:07 +01:00
James Almer
ba5d089381 Merge commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65'
* commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65':
  x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:36:49 -03:00
James Almer
ac774cfa57 Merge commit '4efab89332ea39a77145e8b15562b981d9dbde68'
* commit '4efab89332ea39a77145e8b15562b981d9dbde68':
  x86: Use *_FAST/*_SLOW CPU feature detection macros where appropriate

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:08:19 -03:00
James Almer
a956164e1e Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'
* commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553':
  x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:59:29 -03:00
James Almer
f6005907fd Merge commit '95c1df929b92d81454656c222a35ec5f7db576b4'
* commit '95c1df929b92d81454656c222a35ec5f7db576b4':
  x86: hpeldsp: Drop unused function parameters

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:56:11 -03:00
James Almer
4d0e89ce27 Merge commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009'
* commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009':
  x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:53:27 -03:00
James Almer
ca8a3978e5 Merge commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5'
* commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5':
  x86: hpeldsp: Split off VP3-specific bits into a separate file

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:49:29 -03:00
Clément Bœsch
7c300a8ed4 lavc/hevc: remove a few random spaces to reduce diff with libav
2017-01-31 17:02:24 +01:00
Clément Bœsch
78d16eb452 Merge commit 'fca3c3b61952aacc45e9ca54d86a762946c21942'
* commit 'fca3c3b61952aacc45e9ca54d86a762946c21942':
  hevc: Add AVX2 DC IDCT

Mostly noop as we already have that code.

In the ASM, code is merged with the exception of SECTION which is kept
uppercase for consistency with the rest of the codebase.

Still in the ASM, the prototype comment is fixed to honor the '_' added
from the original commit.

idct_dc_proto() is dropped as it's not used anymore here.

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31 16:53:37 +01:00
Clément Bœsch
d0e132bab6 Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d'
* commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d':
  hevc: Separate adding residual to prediction from IDCT

This commit should be a noop but isn't because of the following renames:

- transform_add  → add_residual
- transform_skip → dequant
- idct_4x4_luma  → transform_4x4_luma

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31 15:31:34 +01:00
Anton Khirnov
b4a911c189 mpegvideoenc: make a table const
2017-01-19 09:52:21 +01:00
James Almer
6d4c9f2ade lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16
Signed-off-by: James Almer <jamrial@gmail.com>
2017-01-12 22:53:05 -03:00
James Almer
47f212329e huffyuvdsp: move functions only used by huffyuv from lossless_videodsp
Signed-off-by: James Almer <jamrial@gmail.com>
2017-01-12 22:53:05 -03:00
James Almer
cf9ef83960 huffyuvencdsp: move shared functions to a new lossless_videoencdsp context
Signed-off-by: James Almer <jamrial@gmail.com>
2017-01-12 22:53:04 -03:00
James Almer
30c1f27299 huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp
Signed-off-by: James Almer <jamrial@gmail.com>
2017-01-12 22:53:04 -03:00
James Almer
5ac1dd8e23 lossless_videodsp: move shared functions from huffyuvdsp
Several codecs other than huffyuv use them.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-01-12 22:53:04 -03:00
Michael Niedermayer
aa95292043 avcodec/x86/vc1dsp_mc: Fix build with NASM 2.09.10
make fate passes

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-02 22:37:55 +01:00
John Comeau
d06518752b avcodec/x86/imdct36: fix building with nasm 2.11.05
fixes `operation size not specified` errors as described here:
http://stackoverflow.com/questions/36854583/compiling-ffmpeg-for-kali-linux-2

I rebuilt again with yasm and made sure it didn't break that.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-02 20:44:16 +01:00
Paul B Mahol
6d09d6edbc avcodec/magicyuv: add 10 bit support
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2016-12-20 13:32:15 +01:00
James Darnley
acdd2d805d avcodec/h264: resolve assert being triggered when stack is not aligned
32-bit msvc.
2016-12-07 22:32:19 +01:00
James Darnley
728651df06 avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter
Yorkfield:
 - mmx2: 2.53x (504 vs. 199 cycles)
 - sse2: 3.83x (504 vs. 131 cycles)

Nehalem:
 - mmx2: 2.42x (365 vs. 151 cycles)
 - sse2: 3.56x (365 vs. 103 cycles)

Skylake:
 - mmx2: 1.81x (308 vs. 170 cycles)
 - sse2: 2.84x (308 vs. 108 cycles)
 - avx:  2.93x (308 vs. 105 cycles)
2016-12-07 00:29:13 +01:00
James Darnley
add21d0bb3 avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
Yorkfield:
 - mmx2: 2.45x (279 vs. 114 cycles)
 - sse2: 3.36x (279 vs.  83 cycles)

Nehalem:
 - mmx2: 2.10x (192 vs.  92 cycles)
 - sse2: 2.84x (192 vs.  68 cycles)

Skylake:
 - mmx2: 1.75x (170 vs.  97 cycles)
 - sse2: 2.47x (170 vs.  69 cycles)
 - avx:  2.47x (170 vs.  69 cycles)
2016-12-07 00:29:13 +01:00
James Darnley
58ca2ef62e whitespace changes after last commit
2016-12-07 00:29:13 +01:00
James Darnley
f33714a694 avcodec/h264: clean up and expand x86 function definitions
2016-12-07 00:29:13 +01:00
Diego Biurrun
0a35f128f3 cabac: x86: Give optimizations header a more meaningful name
2016-12-01 08:23:54 +01:00
James Darnley
13d71c28cc avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions
Yorkfield:
 - sse2:
   - complex: 4.13x faster (1514 vs. 367 cycles)
   - simple:  4.38x faster (1836 vs. 419 cycles)

Skylake:
 - sse2:
   - complex: 3.61x faster ( 936 vs. 260 cycles)
   - simple:  3.97x faster (1126 vs. 284 cycles)
 - avx (versus sse2):
   - complex: 1.07x faster (260 vs. 244 cycles)
   - simple:  1.03x faster (284 vs. 274 cycles)
2016-11-30 22:58:28 +01:00
James Darnley
1dae7ffa0b avcodec/h264: mmx 4:2:2 idct add8 function
2.87 times faster (1830 vs. 638 cycles)
2016-11-30 22:58:27 +01:00
James Darnley
815ea8c6cc avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter
2.1 times faster (401 vs. 194 cycles)
2016-11-30 22:58:27 +01:00
James Almer
2de1c79b61 x86/vp9itxfm: add missing AVX2 guards
Fixes compilation with Yasm 1.1.0 and older.

Signed-off-by: James Almer <jamrial@gmail.com>
2016-11-18 17:01:11 -03:00
Ronald S. Bultje
83a139e3d8 vp9: add avx2 iadst16 implementations.
Also a small cosmetic change to the avx2 idct16 version to make it
explicit that one of the arguments to the write-out macros is unused
for >=avx2 (it uses pmovzxbw instead of punpcklbw).
2016-11-15 11:01:36 -05:00
Hendrik Leppkes
db854c6c4a Merge commit '4a081f224e12f4227ae966bcbdd5384f22121ecf'
* commit '4a081f224e12f4227ae966bcbdd5384f22121ecf':
  libavcodec: fix constness in clobber test avcodec_open2() wrappers

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-13 17:30:33 +01:00
Diego Biurrun
0361e4dcb4 h264_qpel: x86: Move function with only one instance out of template macro
libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
2016-11-08 17:21:02 +01:00
Diego Biurrun
3cba09e522 x86: Drop stray semicolons after function definitions
libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
2016-11-05 12:41:45 +01:00
Martin Storsjö
2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Pierre Edouard Lepere
6d5636ad9a hevc: x86: Add add_residual() SIMD optimizations
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-10-22 17:33:35 +02:00
Andreas Cadhalpun
c8a6eb58d7 doc: fix spelling errors
Thanks to Mathieu Malaterre <malat@debian.org> for reporting the
Que/Queue typo. (https://bugs.debian.org/839542)

Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-10-21 23:58:47 +02:00
Diego Biurrun
788544ff0e audiodsp: x86: Remove pointless header file
Its single forward declaration can be moved to the only place
it is used, like is done for all other dsp init files.
2016-10-19 15:20:41 +02:00
Diego Biurrun
b89804da9b x86: videodsp: Add parentheses to expression to work around warning
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
2016-10-19 10:13:34 +02:00
Rostislav Pehlivanov
d2ae5f77c6 aacenc: add SIMD optimizations for abs_pow34 and quantization
Performance improvements:

quant_bands:
with:     681 decicycles in quant_bands, 8388453 runs,    155 skips
without: 1190 decicycles in quant_bands, 8388386 runs,    222 skips
Around 42% for the function

Twoloop coder:

abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder

Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder

Fast coder:

abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder

Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
2016-10-18 21:41:18 +01:00
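
The percentages in the entry above follow from the quoted before/after measurements. As a worked check (a small sketch; the helper names are invented here), the share of time saved is `(before - after) / before`, and the speedup factor used in other entries of this log is `before / after`:

```c
#include <assert.h>

/* Percentage of time saved when a measurement drops from before to after. */
static double percent_saved(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Speedup factor: how many times faster the new code is than the old. */
static double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

Applied to the figures quoted above, `percent_saved(1190, 681)` is about 42.8 (quant_bands), `percent_saved(8.17, 7.82)` is about 4.3 (abs_pow34 with the twoloop coder), and `percent_saved(3.77, 3.02)` is about 19.9 (both optimizations with the fast coder). The two measures are not interchangeable: a 42.8% saving corresponds to a speedup of about 1.75x.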
Diego Biurrun
6be7944ee2 x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Alexandra Hájková
112cee0241 hevc: Add SSE2 and AVX IDCT
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-11 18:21:04 +02:00
Anton Khirnov
e4128c08d7 Revert "hevc: x86: Refactor IDCT macro declarations"
This reverts commit d9dccc0389. There were
outstanding objections to this commit.
2016-10-06 15:24:04 +02:00
Diego Biurrun
5801f9ed24 h264_intrapred: x86: Update comments left behind in 95c89da36e
2016-10-06 12:32:34 +02:00
Diego Biurrun
d9dccc0389 hevc: x86: Refactor IDCT macro declarations
2016-10-06 12:32:34 +02:00
Ronald S. Bultje
715f139c9b vp9lpf/x86: make filter_16_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
8915320db9 vp9lpf/x86: make filter_48/84/88_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
725a216481 vp9lpf/x86: make filter_44_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
5bfa96c4b3 vp9lpf/x86: make filter_16_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
b905e8d2fe vp9lpf/x86: make filter_48/84_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
37637e6590 vp9lpf/x86: make filter_88_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
be10834bd9 vp9lpf/x86: make filter_44_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
7c62891efe vp9lpf/x86: save one register in SIGN_ADD/SUB.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
c6375a83d1 vp9lpf/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
4ce8ba72f9 vp9lpf/x86: move variable assigned inside macro branch.
The value is not used outside the branch.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
e4961035b2 vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
683da2788e vp9lpf/x86: remove unused register from ABSSUB_CMP macro.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6e74e9636b vp9lpf/x86: slightly simplify 44/48/84/88 h stores.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6411c328a2 vp9lpf/x86: make cglobal statement more conservative in register allocation.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
a6e288d624 vp9lpf/x86: save one register in loopfilter surface coverage.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
0ed21bdc9e vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
f2e3d706a1 vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
92d47550ea vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16
Similar gains as the ssse3 version once again

Additional improvements by Clément Bœsch <u@pkh.me>.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
6bea478158 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
1f451eed60 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
Similar gains in performance as the SSSE3 version

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
a692724c58 vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
42111e8543 avcodec: fix arguments on xmm/neon clobber test wrappers
Signed-off-by: James Almer <jamrial@gmail.com>
2016-10-02 02:15:47 -03:00
James Almer
449f263f9f avcodec: add missing xmm/neon clobber test wrappers for the new encode API
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-10-01 14:08:50 -03:00
Justin Ruggles
b57e38f52c ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
Adds a wrapper function for downmixing which detects channel count changes
and updates the selected downmix function accordingly.

Simplification and porting to current x86inc infrastructure by Diego Biurrun.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-01 00:46:25 +02:00
Justin Ruggles
43717469f9 ac3dsp: Reverse matrix in/out order in downmix()
Also use (float **) instead of (float (*)[2]). This matches the matrix
layout in libavresample so we can reuse assembly code between the two.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-01 00:45:55 +02:00
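
To illustrate the layout change described in the entry above: with a `float **` matrix indexed as `matrix[out][in]`, each output channel has its own row of per-input gains. The sketch below is an assumption-laden illustration of a planar, in-place stereo downmix using that indexing; the function name and exact signature are hypothetical and not taken from the FFmpeg or libavresample sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative in-place downmix to stereo (hypothetical signature).
 * samples[ch][i]  : planar audio, one array per input channel
 * matrix[out][in] : gain applied to input channel `in` for output `out`
 * The two output channels overwrite samples[0] and samples[1].
 */
static void downmix_stereo_sketch(float **samples, float **matrix,
                                  int in_ch, int len)
{
    assert(samples != NULL && matrix != NULL && in_ch >= 2);

    for (int i = 0; i < len; i++) {
        float l = 0.0f, r = 0.0f;

        /* Accumulate both outputs before writing, since the
         * outputs alias the first two input channels. */
        for (int j = 0; j < in_ch; j++) {
            l += samples[j][i] * matrix[0][j];
            r += samples[j][i] * matrix[1][j];
        }
        samples[0][i] = l;
        samples[1][i] = r;
    }
}
```

With the older `float (*)[2]` form, the matrix would instead be a contiguous array of two-element rows, one row per input channel, so the two index positions are swapped relative to this sketch.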
Hendrik Leppkes
8d1267932c x86/h264_weight: use appropriate register size for weight parameters
This fixes decoding corruption on 64 bit windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-09-30 12:18:22 +03:00
Diego Biurrun
2caa93b813 mpegaudiodsp: Change type of array stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2016-09-29 17:54:24 +02:00
Diego Biurrun
e4a94d8b36 h264chroma: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2016-09-29 14:48:04 +02:00
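
Some background on the sign-extension point made above: under the common x86-64 calling conventions, a 32-bit `int` argument occupies the low half of a 64-bit register and the upper half is not guaranteed to hold a sign extension, so assembly that adds an `int` stride to a pointer has to widen it first. A `ptrdiff_t` stride is already pointer-width. The C sketch below shows the same idea at the source level, including a negative stride for bottom-up traversal; `column_sum` is a hypothetical helper, not an FFmpeg function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: sum `rows` pixels of one image column.
 * stride is the distance in bytes between consecutive rows and may be
 * negative, for example when walking an image from the bottom up.
 */
static unsigned column_sum(const uint8_t *src, ptrdiff_t stride, int rows)
{
    assert(src != NULL && rows >= 0);

    unsigned sum = 0;
    for (int y = 0; y < rows; y++)
        sum += src[y * stride];   /* y is promoted to ptrdiff_t here */
    return sum;
}
```

Because `stride` is `ptrdiff_t`, the product `y * stride` is computed at pointer width and a negative stride needs no special handling by the caller or the callee.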
Diego Biurrun
2ec9fa5ec6 idct: Change type of array stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
2016-09-29 14:48:03 +02:00
Diego Biurrun
009adfd4fb x86: fpel: Remove unnecessary sign extend
2016-09-29 14:47:41 +02:00
Anton Khirnov
de2ae3c1fa lavc: add clobber tests for the new encoding/decoding API
2016-09-28 10:01:52 +02:00
Hendrik Leppkes
5ae0ad001a x86/h264_weight: use appropriate register size for weight parameters
Fixes trac 5579

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Acked-by: Michael Niedermayer <michael@niedermayer.cc>
2016-09-23 16:40:57 +02:00