1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-29 22:00:58 +02:00

2158 Commits

Author SHA1 Message Date
Hendrik Leppkes
5ae0ad001a x86/h264_weight: use appropriate register size for weight parameters
Fixes trac 5579

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Acked-by: Michael Niedermayer <michael@niedermayer.cc>
2016-09-23 16:40:57 +02:00
Michael Niedermayer
bc26fe8927 avcodec/h264: Use ptrdiff_t for (bi)weight functions
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-09-23 04:10:44 +02:00
James Almer
d950279cbf avcodec/ttadsp: cosmetics
Clean some header includes and use the same naming scheme as
in ttaencdsp

Signed-off-by: James Almer <jamrial@gmail.com>
2016-08-06 18:27:01 -03:00
James Almer
efc9d5c4bc x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4}
Signed-off-by: James Almer <jamrial@gmail.com>
2016-08-02 15:48:04 -03:00
Clément Bœsch
15b26e88cb Merge commit '9df889a5f116c1ee78c2f239e0ba599c492431aa'
* commit '9df889a5f116c1ee78c2f239e0ba599c492431aa':
  h264: rename h264.[ch] to h264dec.[ch]

Merged-by: Clément Bœsch <u@pkh.me>
2016-07-29 11:01:36 +02:00
Ronald S. Bultje
a4edaa0270 vp9: add mxext versions of the single-block (w=8,npx=8) h/v loopfilters.
Each takes about 0.1% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
2016-07-26 15:59:07 -04:00
Ronald S. Bultje
7ca422bb1b vp9: add mxext versions of the single-block (w=4,npx=8) h/v loopfilters.
Each takes about 0.5% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
2016-07-26 15:59:07 -04:00
Ronald S. Bultje
726501a34e vp9: add 32x32 idct AVX2 implementation.
About 1.8x speedup compared to AVX version for full IDCT. Other
sub-IDCT scenarios also see speedups. Full --bench output for
idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles):

nop: 16.5
vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
2016-07-26 15:59:07 -04:00
James Almer
7a15cf42ee x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-20 13:43:38 -03:00
Rostislav Pehlivanov
df1dc52195 diracdsp_init: add missing ARCH_X86_64 check
That SIMD is still x86_64 only for now.

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2016-07-12 00:39:12 +01:00
Rostislav Pehlivanov
bd61f3c6bf diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
2016-07-11 23:33:24 +01:00
Rostislav Pehlivanov
80721cc1ff diracdsp: add dequantization SIMD
Currently unused, to be used in the following commits.

Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
2016-07-11 23:30:11 +01:00
Ronald S. Bultje
f0a2b6249b vp9: add 16x16 idct avx2 (8-bit).
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:

nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
2016-07-11 10:14:58 -04:00
Clément Bœsch
84ecbbfb27 Merge commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c'
* commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c':
  x86: Add missing movsxd for the int stride parameter

Merged-by: Clément Bœsch <u@pkh.me>
2016-07-09 14:52:23 +02:00
James Almer
645489cf90 x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32
About 10% faster.

Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-05 17:48:20 -03:00
James Almer
293484fa5e avcodec: add missing xmm/neon clobber test wrappers for the new decode API
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-03 18:04:30 -03:00
Matthieu Bouron
9eb3da2f99 asm: FF_-prefix internal macros used in inline assembly
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
2016-06-27 17:21:18 +02:00
Hendrik Leppkes
c142dc203e Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50'
* commit 'dc40a70c5755bccfb1a1349639943e1f408bea50':
  Drop unnecessary libavutil/x86/asm.h #includes

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-06-26 15:53:00 +02:00
Clément Bœsch
5d48e4eafa Merge commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196'
* commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196':
  tests: Move all test programs to a subdirectory

Merged-by: Clément Bœsch <clement@stupeflix.com>
2016-06-22 13:44:34 +02:00
Clément Bœsch
8ef57a0d61 Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
  cosmetics: Fix spelling mistakes

Merged-by: Clément Bœsch <u@pkh.me>
2016-06-21 21:55:34 +02:00
Anton Khirnov
9df889a5f1 h264: rename h264.[ch] to h264dec.[ch]
This is more consistent with the naming of other decoders.
2016-06-21 11:11:26 +02:00
Martin Storsjö
f1a9eee41c x86: Add missing movsxd for the int stride parameter
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-17 00:11:21 +03:00
James Almer
ede4ec1f8f x86/aacpsdsp: optimize add_squares loop
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-14 12:41:23 -03:00
James Almer
82dbfccaf0 x86/aacdec: use HADDPS macro
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 14:18:18 -03:00
Diego Biurrun
dc40a70c57 Drop unnecessary libavutil/x86/asm.h #includes 2016-05-28 19:18:26 +02:00
Diego Biurrun
1e9c5bf4c1 asm: FF_-prefix internal macros used in inline assembly
These warnings conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
2016-05-28 19:18:26 +02:00
Diego Biurrun
a6a750c7ef tests: Move all test programs to a subdirectory 2016-05-13 14:55:56 +02:00
Christophe Gisquet
9630b3fc06 x86: lossless audio: SSE4 madd 32bits
The unique user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version do not make sense.

Timings: 68 -> 49 cycles

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-05-07 23:28:48 +02:00
Vittorio Giovara
41ed7ab45f cosmetics: Fix spelling mistakes
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-05-04 18:16:21 +02:00
Derek Buitenhuis
d496d52d02 Merge commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622'
* commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622':
  fft: x86: cosmetics: Drop silly comments, add comment, whitespace

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-04-12 15:42:21 +01:00
Diego Biurrun
01621202aa build: miscellaneous cosmetics
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
2016-04-07 15:26:08 +02:00
Michael Niedermayer
305344d89e avcodec/fft: Add revtab32 for FFTs with more than 65536 samples
x86 optimizations are used only for the cases they support (<=65536 samples)

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-03-04 16:05:47 +01:00
Michael Niedermayer
ae76b84221 avcodec: Extend fft to size 2^17
Asked-for-by: durandal_1707

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-03-04 13:51:42 +01:00
Diego Biurrun
1a094af638 fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
Timothy Gu
e3461197b1 x86/vc1dsp: Split the file into MC and loopfilter 2016-02-29 08:46:53 -08:00
Diego Biurrun
73ff983e8d fft: x86: cosmetics: Drop silly comments, add comment, whitespace 2016-02-26 14:34:58 +01:00
Derek Buitenhuis
b056482ef3 Merge commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c'
* commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c':
  build: Add vc1dsp component for more fine-grained dependencies

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-24 18:21:38 +00:00
Diego Biurrun
257b30af8e x86: hevc: Fix linking with both yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
2016-02-23 11:47:54 +01:00
James Almer
45d3af9059 x86/dcadec: add ff_lfe_fir1_float_{sse3,avx}
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-22 21:21:34 -03:00
Diego Biurrun
15a24614ae build: Add vc1dsp component for more fine-grained dependencies 2016-02-19 20:38:18 +01:00
Derek Buitenhuis
04e4166536 Merge commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45'
* commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45':
  v210: Use separate sample_factors

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-16 17:23:32 +00:00
Derek Buitenhuis
8f8381bf03 Merge commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a'
* commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a':
  v210: x86: Add the correct guards around the asm code

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-16 17:02:56 +00:00
James Almer
70d685a77f x86: use the new helper macros where useful
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
Timothy Gu
bcc223523e x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
2016-02-14 11:11:02 -08:00
Timothy Gu
59ebf32bca huffyuvencdsp: Undefine "i" macro after each use 2016-02-07 09:19:17 -08:00
James Almer
8ae7447941 x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3}
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-06 01:36:55 -03:00
Timothy Gu
9fd6ea933f dirac_dwt: Make x86 files/functions names consistent 2016-02-05 19:30:23 -08:00
Timothy Gu
17ab8f7e68 diracdsp: Make x86 files/functions names consistent 2016-02-05 19:29:43 -08:00
Henrik Gramner
aa751573fe avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc
Using rNm and x86inc's stack allocation with a negative value at the same
time isn't supported, and caused the original stack pointer to be clobbered
when using a compiler that doesn't support stack alignment.
2016-02-05 22:01:38 +01:00
James Darnley
7042a55c55 avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
2.6 times faster (366 vs. 142 cycles)
2016-02-05 17:26:04 +01:00