mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-03-03 14:32:16 +02:00

2334 Commits

Author SHA1 Message Date
James Almer
933dd62288 x86/aacpsdsp: optimize ff_ps_mul_pair_single_sse
~2% faster.
2017-06-04 23:29:56 -03:00
James Almer
be3809a521 x86/aacpsdsp: optimize ff_ps_stereo_interpolate_sse3
Move the unpacking outside of the loop. 5% to 10% faster.

Suggested-by: ubitux
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-03 12:39:43 -03:00
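The change described in this entry, moving work that does not depend on the loop counter out of the loop, can be sketched in C. This only illustrates the general technique and is not the SSE3 assembly from the commit; `unpack_gain`, `scale_naive` and `scale_hoisted` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for a loop-invariant "unpack" step. */
static float unpack_gain(const float packed[2])
{
    return packed[0] + packed[1];
}

/* Before: the invariant value is recomputed on every iteration. */
void scale_naive(float *dst, const float *src, const float packed[2], size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * unpack_gain(packed);
}

/* After: the value is computed once, outside the loop. */
void scale_hoisted(float *dst, const float *src, const float packed[2], size_t n)
{
    const float gain = unpack_gain(packed);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}
```

An optimizing C compiler may perform this hoisting on its own; in hand-written assembly it has to be done by hand.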
James Almer
b5a0971ff0 x86/aacps: add ff_ps_stereo_interpolate_ipdopd_sse3()
About 2x faster than the C version.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-02 11:06:24 -03:00
James Darnley
0dea0114fb avcodec/x86/idctdsp_init: reindent 2017-05-30 13:20:44 +02:00
James Darnley
8e89f6fd37 avcodec/x86: move simple_idct to external assembly 2017-05-30 13:20:42 +02:00
Clément Bœsch
584366a436 lavc/mpegvideoenc: reformat inv_zigzag_direct16 so the zigzag pattern is visible 2017-05-19 11:17:58 +02:00
Clément Bœsch
19bb2cade5 Merge commit 'b4a911c189962e563a09fb0efaf6fa9ab56263a4'
* commit 'b4a911c189962e563a09fb0efaf6fa9ab56263a4':
  mpegvideoenc: make a table const

Merged-by: Clément Bœsch <u@pkh.me>
2017-05-19 11:15:16 +02:00
James Darnley
7aa90b4e94 avcodec/h264: add sse2 versions of previous idct functions
Kaby Lake Pentium:
 - ff_h264_idct_add_8_sse2:    ~1.18x faster than mmxext
 - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext
2017-05-15 15:00:20 +02:00
James Darnley
27460dfebc avcodec/h264: add avx 8-bit h264_idct_dc_add
Haswell:
 - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext

Skylake-U:
 - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext
2017-05-15 15:00:19 +02:00
James Darnley
f61d454ca1 avcodec/h264: add avx 8-bit h264_idct_add
Haswell:
 - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext

Skylake-U:
 - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
2017-05-15 15:00:17 +02:00
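The "Nx faster" figures in this entry are consistent with dividing the larger timing by the smaller one. A minimal sketch, assuming the first number in each pair is the mmxext baseline and the second is the AVX version, and ignoring the ± spread; `speedup` is a hypothetical helper.

```c
/* Ratio of baseline time to optimized time; values above 1.0 mean faster. */
double speedup(double baseline_decicycles, double optimized_decicycles)
{
    return baseline_decicycles / optimized_decicycles;
}
```

Under that assumption, `speedup(522, 469)` is about 1.11 (Haswell) and `speedup(671, 555)` is about 1.21 (Skylake-U), matching the numbers quoted above.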
James Darnley
b5325c6711 avcodec/h264: use some 3 operand forms 2017-05-15 15:00:16 +02:00
James Darnley
060ba9e5e3 avcodec/h264: change RETs into REP_RETs where appropriate 2017-05-15 15:00:15 +02:00
Michael Niedermayer
fa8fd0808f avcodec/x86/vc1dsp_init: Fix build failure with --disable-optimizations and clang
Compilers doing DCE at -O0 do not necessarily understand "complex" boolean expressions.
The build succeeds with this change; this was the only failure.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-04-27 04:25:31 +02:00
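One way to keep such a condition easy to eliminate is to test the build-time constant on its own and nest the runtime checks inside it. The following is a sketch under that assumption and is not the change made in this commit; `HAVE_FAST_IMPL` and `use_fast_impl` are hypothetical names.

```c
/* Hypothetical build-time switch; 0 means the optimized code is not built. */
#define HAVE_FAST_IMPL 0

/* Returns 1 when the optimized path should be used, 0 otherwise. */
int use_fast_impl(int cpu_supports_it, int bit_depth)
{
    /* The constant is tested alone, so discarding the branch needs no
     * analysis of the runtime conditions nested inside it. */
    if (HAVE_FAST_IMPL) {
        if (cpu_supports_it && bit_depth == 8)
            return 1;
    }
    return 0;
}
```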
Clément Bœsch
5be1440c74 Merge commit '0a35f128f3c6e0ae9a0a2236c557602c108da269'
* commit '0a35f128f3c6e0ae9a0a2236c557602c108da269':
  cabac: x86: Give optimizations header a more meaningful name

Merged-by: Clément Bœsch <u@pkh.me>
2017-04-08 14:30:13 +02:00
Ronald S. Bultje
83ae7e6350 x86/idctdsp_init: reindent. 2017-04-06 10:03:28 -04:00
Ronald S. Bultje
e0c205677f x86/simple_idct: add explicit sse2 simple_idct_put/add versions.
These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
This way we don't need to use the ff_put/add_pixels_clamped function
pointers.
2017-04-06 10:03:28 -04:00
Ronald S. Bultje
2f0591cfa3 cavs: add a sse2 idct implementation.
This makes using the function pointer ff_add_pixels_clamped() unnecessary,
since we always know what the best implementation is at compile-time.
2017-04-06 10:03:28 -04:00
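The difference between calling through a function pointer and calling a known implementation directly can be sketched in C. This is an illustration only, not the cavs code; every name below is hypothetical.

```c
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Adds 8 residuals to 8 pixels, clamping each result to 0..255. */
static void add_pixels_clamped_8(const int16_t *block, uint8_t *pixels)
{
    for (int i = 0; i < 8; i++)
        pixels[i] = clamp_u8(pixels[i] + block[i]);
}

/* Indirect: the implementation is reached through a pointer at run time. */
typedef void (*add_clamped_fn)(const int16_t *block, uint8_t *pixels);

void row_add_indirect(add_clamped_fn add, const int16_t *block, uint8_t *pixels)
{
    add(block, pixels);
}

/* Direct: the callee is fixed at compile time, so it can be inlined. */
void row_add_direct(const int16_t *block, uint8_t *pixels)
{
    add_pixels_clamped_8(block, pixels);
}
```

Both variants produce the same pixels; the direct form simply drops the pointer lookup and the dependency on a table that must be initialized elsewhere.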
Ronald S. Bultje
c9d98c5649 cavs: convert idct from inline asm to yasm. 2017-04-06 10:03:27 -04:00
Ronald S. Bultje
b51d7d89f8 x86/xvididct: remove use of ff_put/add_pixels_clamped function pointer.
Since there are separate SSE2 implementations of xvid_idct_put/add, this
patch has no practical impact on performance.
2017-04-06 10:03:27 -04:00
James Almer
6171f178e7 x86/hevc_add_res: merge last remaining changes from 3d6535983282bea542dac2e568ae50da5796be34
See https://lists.libav.org/pipermail/libav-devel/2016-October/079829.html
2017-03-31 20:49:45 -03:00
Clément Bœsch
1ea0df14c3 Merge commit '0361e4dcb4d394c88c33364415a3b8fe315b67d1'
* commit '0361e4dcb4d394c88c33364415a3b8fe315b67d1':
  h264_qpel: x86: Move function with only one instance out of template macro

Note: warning is present with clang.

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-31 09:44:04 +02:00
Ronald S. Bultje
f8c019944d vp9: re-split the decoder/format/dsp interface header files.
The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.
2017-03-28 18:04:26 -04:00
Clément Bœsch
1c9f4b5078 lavc/vp9: split into vp9{block,data,mvs}
This follows the Libav layout to ease merges.
2017-03-27 21:38:21 +02:00
Michael Niedermayer
73fb40dc87 avcodec/x86/idctdsp: Remove duplicate include
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-26 19:17:30 +02:00
James Almer
ac42f08099 x86/hevc_add_res: merge missing changes from 3d6535983282bea542dac2e568ae50da5796be34
Unrolling the loops triples the size of the assembled output
while not generating any gain in performance.
2017-03-24 11:24:18 -03:00
Clément Bœsch
3d65359832 Merge commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b'
* commit '6d5636ad9ab6bd9bedf902051d88b7044385f88b':
  hevc: x86: Add add_residual() SIMD optimizations

See a6af4bf64dae46356a5f91537a1c8c5f86456b37

This merge is only cosmetics (renames, space shuffling, etc).

The functional changes in the ASM are *not* merged:
- unrolling with %rep is kept
- ADD_RES_MMX_4_8 is left untouched: this needs investigation

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-24 12:33:25 +01:00
Clément Bœsch
40ac226014 lavc/x86/hevc: rename hevc_res_add to hevc_add_res
This will simplify the incoming merge.
2017-03-24 11:45:23 +01:00
James Almer
bac44a5020 Merge commit 'b89804da9bad2d94dd95bf20ac6187447e9c17e9'
* commit 'b89804da9bad2d94dd95bf20ac6187447e9c17e9':
  x86: videodsp: Add parentheses to expression to work around warning

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:35:49 -03:00
James Almer
29db87af52 Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4'
* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
  x86: Add missing colons after assembly labels

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:05:27 -03:00
Clément Bœsch
947230837c Merge commit '112cee0241f5799edff0e4682b9e8639b046dc78'
* commit '112cee0241f5799edff0e4682b9e8639b046dc78':
  hevc: Add SSE2 and AVX IDCT

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 15:58:46 +01:00
Clément Bœsch
733b13ad66 Merge commit 'e4128c08d786eb5513578e8c6063671ba03226ab'
* commit 'e4128c08d786eb5513578e8c6063671ba03226ab':
  Revert "hevc: x86: Refactor IDCT macro declarations"

So apparently this was technically correct but reverted due to
authorship. Reverted as well in FFmpeg for now...

See http://lists.libav.org/pipermail/libav-devel/2016-October/079560.html

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 12:03:25 +01:00
Clément Bœsch
4bb4fa28e3 Merge commit '5801f9ed245ca5ebb57b0b5183de7a24aaece133'
* commit '5801f9ed245ca5ebb57b0b5183de7a24aaece133':
  h264_intrapred: x86: Update comments left behind in 95c89da36ebeeb96b7146c0d70f46c582397da7f

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:58:01 +01:00
Clément Bœsch
9954d5b44e Merge commit 'd9dccc03890a976dba59d66ed3b5aceeaa33d14c'
* commit 'd9dccc03890a976dba59d66ed3b5aceeaa33d14c':
  hevc: x86: Refactor IDCT macro declarations

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:54:53 +01:00
James Almer
30cadfe071 avcodec/lossless_videodsp: use ptrdiff_t for length parameters
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-22 18:38:35 -03:00
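A length that feeds pointer arithmetic is naturally pointer-sized. With a 32-bit `int` parameter on x86-64, hand-written assembly has to extend the value to 64 bits before using it as an offset, because the upper half of the register is not guaranteed to be clean; a `ptrdiff_t` parameter avoids that step. A minimal C sketch of such a signature follows; `add_bytes_sketch` is a hypothetical function, not one from lossless_videodsp.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine in the style of a DSP function: w is a byte count
 * used directly as a loop bound and array offset, hence ptrdiff_t. */
void add_bytes_sketch(uint8_t *dst, const uint8_t *src, ptrdiff_t w)
{
    for (ptrdiff_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]); /* wraps modulo 256 */
}
```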
Clément Bœsch
af607b7e07 lavc/huffyuvdsp: only transmit the pix_fmt instead of the whole avctx
Only the pixel format is required in that init function. This will also
simplify the incoming merge.
2017-03-22 16:22:20 +01:00
Clément Bœsch
c66bd8f3ff Merge commit 'b57e38f52cc3f31a27105c28887d57cd6812c3eb'
* commit 'b57e38f52cc3f31a27105c28887d57cd6812c3eb':
  ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 12:49:29 +01:00
Clément Bœsch
e39d4ff150 Merge commit '43717469f9daa402f6acb48997255827a56034e9'
* commit '43717469f9daa402f6acb48997255827a56034e9':
  ac3dsp: Reverse matrix in/out order in downmix()

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 11:29:46 +01:00
James Almer
aee046a895 x86/audiodsp: remove an unnecessary movss 2017-03-22 00:14:56 -03:00
James Almer
9a0fbb9ca9 Merge commit '2caa93b813adc5dbb7771dfe615da826a2947d18'
* commit '2caa93b813adc5dbb7771dfe615da826a2947d18':
  mpegaudiodsp: Change type of array stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 16:04:22 -03:00
James Almer
a8474df944 Merge commit 'e4a94d8b36c48d95a7d412c40d7b558422ff659c'
* commit 'e4a94d8b36c48d95a7d412c40d7b558422ff659c':
  h264chroma: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 15:20:45 -03:00
James Almer
5a49097b42 Merge commit '2ec9fa5ec60dcd10e1cb10d8b4e4437e634ea428'
* commit '2ec9fa5ec60dcd10e1cb10d8b4e4437e634ea428':
  idct: Change type of array stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 14:29:52 -03:00
Clément Bœsch
f54da138e9 Merge commit '009adfd4fbdd78a890a4a65d6f141c467bb027fa'
* commit '009adfd4fbdd78a890a4a65d6f141c467bb027fa':
  x86: fpel: Remove unnecessary sign extend

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 15:02:31 +01:00
Clément Bœsch
ad98af27f7 Merge commit 'de2ae3c1fae5a2eb539b9abd7bc2a9ca8c286ff0'
* commit 'de2ae3c1fae5a2eb539b9abd7bc2a9ca8c286ff0':
  lavc: add clobber tests for the new encoding/decoding API

The merge only re-orders what we already have.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 14:43:53 +01:00
Clément Bœsch
83cd80d10a Merge commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5'
* commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
  audiodsp/x86: yasmify vector_clipf_sse
  audiodsp: reorder arguments for vector_clipf

Merged the version from Libav after a discussion with James Almer on
IRC:

19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94c7d7ebf995d4dbb4f878d06272f9c6
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed

Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 22:35:07 +01:00
Clément Bœsch
43a4c729d4 Merge commit '75d98e30afab61542faab3c0f11880834653bd6b'
* commit '75d98e30afab61542faab3c0f11880834653bd6b':
  audiodsp/x86: clear the high bits of the order parameter on 64bit

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:44:00 +01:00
Clément Bœsch
072fad7cf5 Merge commit '1d6c76e11febb58738c9647c47079d02b5e10094'
* commit '1d6c76e11febb58738c9647c47079d02b5e10094':
  audiodsp/x86: fix ff_vector_clip_int32_sse2

No functional changes, only cosmetics. This issue was fixed in
9a9e2f1c8aa4539a261625145e5c1f46a8106ac2.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:42:37 +01:00
Clément Bœsch
e07fa3008b Merge commit 'de452e503734ebb0fdbce86e9d16693b3530fad3'
* commit 'de452e503734ebb0fdbce86e9d16693b3530fad3':
  pixblockdsp: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 15:58:32 +01:00
Ilia
2f3d10a01a avcodec/vp9: avx2 implementation of ipred_dl_16x16_16
vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5

Benchmarked with 10000 runs

Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:47:43 -04:00
Mirage Abeysekara
5eb4f95bef h264pred: added AVX2 implementation for tm_vp8 16x16.
checkasm --bench results with 5000 runs

pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:45:42 -04:00
James Almer
6966a5e4d7 Merge commit '721d57e608dc4fd6c86f27c5ae76ef559d646220'
* commit '721d57e608dc4fd6c86f27c5ae76ef559d646220':
  vp56: Separate VP5 and VP6 dsp initialization

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 17:15:24 -03:00