James Almer
ca8a3978e5
Merge commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5'
* commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5':
x86: hpeldsp: Split off VP3-specific bits into a separate file
Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 14:49:29 -03:00
James Almer
cf9ef83960
huffyuvencdsp: move shared functions to a new lossless_videoencdsp context
Signed-off-by: James Almer <jamrial@gmail.com>
2017-01-12 22:53:04 -03:00
Rostislav Pehlivanov
d2ae5f77c6
aacenc: add SIMD optimizations for abs_pow34 and quantization
Performance improvements:
quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% for the function
Twoloop coder:
abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder
Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder
Fast coder:
abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder
Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
2016-10-18 21:41:18 +01:00
James Almer
efc9d5c4bc
x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4}
Signed-off-by: James Almer <jamrial@gmail.com>
2016-08-02 15:48:04 -03:00
Diego Biurrun
1dfc3cf89d
x86: hpeldsp: Split off VP3-specific bits into a separate file
2016-07-20 18:33:25 +02:00
James Almer
fca3c3b619
hevc: Add AVX2 DC IDCT
Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
Integrated into Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-18 15:27:13 +02:00
Diego Biurrun
01621202aa
build: miscellaneous cosmetics
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
2016-04-07 15:26:08 +02:00
Diego Biurrun
1a094af638
fft: Split MDCT bits off from FFT
2016-03-01 10:18:28 +01:00
Timothy Gu
e3461197b1
x86/vc1dsp: Split the file into MC and loopfilter
2016-02-29 08:46:53 -08:00
Derek Buitenhuis
b056482ef3
Merge commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c'
* commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c':
build: Add vc1dsp component for more fine-grained dependencies
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-24 18:21:38 +00:00
Diego Biurrun
15a24614ae
build: Add vc1dsp component for more fine-grained dependencies
2016-02-19 20:38:18 +01:00
James Almer
8ae7447941
x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3}
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-06 01:36:55 -03:00
Timothy Gu
9fd6ea933f
dirac_dwt: Make x86 files/functions names consistent
2016-02-05 19:30:23 -08:00
Timothy Gu
17ab8f7e68
diracdsp: Make x86 files/functions names consistent
2016-02-05 19:29:43 -08:00
foo86
ae5b2c5250
avcodec/dca: add new decoder based on libdcadec
2016-01-31 17:09:38 +01:00
foo86
4608996772
avcodec/dca: remove old decoder
Remove all files and functions which are not going to be reused,
and temporarily disable all functions and FATE tests which will be.
2016-01-31 17:09:38 +01:00
James Almer
209f50e16b
avcodec/synth_filter: split off remaining code from dcadec files
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-25 14:57:38 -03:00
Diego Biurrun
03ef89faf2
x86: build: Group all encoder objects together
2016-01-18 14:47:58 +01:00
Anton Khirnov
e7078e842d
hevcdsp: add x86 SIMD for MC
2015-12-05 21:11:52 +01:00
James Almer
73353af6e5
x86/Makefile: move decoder/encoder objects out of the subsystems section
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-22 03:55:18 -03:00
Timothy Gu
6b41b44149
huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm
Heavily based upon ff_add_bytes by Christophe Gisquet.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2015-10-20 18:24:54 -07:00
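`diff_bytes` is the byte-wise difference step the HuffYUV encoder uses when forming prediction residuals. A scalar sketch of what the SIMD version computes, assuming the usual `dst[i] = src1[i] - src2[i]` semantics with 8-bit wraparound; the name `diff_bytes_ref` is hypothetical.

```c
#include <stdint.h>

/* dst[i] = src1[i] - src2[i], wrapping modulo 256 like unsigned byte math. */
static void diff_bytes_ref(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, intptr_t w)
{
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}
```

Adding `src2` back onto `dst` byte-wise with the same wraparound recovers `src1`, which is why this routine mirrors `add_bytes`, the function the port is based on.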
Ronald S. Bultje
1c3be32533
vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.
2015-10-13 11:05:57 -04:00
Christophe Gisquet
4369b9dc7b
x86: simple_idct(_put): 10bits versions
Modeled on the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires adding offsets in different places compared to
prores or C, which makes the function approximately 2% slower.
For 16 frames of a DNxHD 4:2:2 10bits test sequence:
C: 60861 decicycles in idct, 1048205 runs, 371 skips
sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
avx: 26272 decicycles in idct, 1048171 runs, 405 skips
The add version is not implemented, so the corresponding dsp
function is set to NULL to make it obvious to any code that tries to execute it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 13:32:21 +02:00
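The commit describes two behaviours worth spelling out: output samples are clamped to [0;1023], and the unimplemented add function pointer is left NULL. The sketch below illustrates both with hypothetical names (`clip_10bit`, `put_block_10bit`, `Dsp10Context`). It contains no IDCT and is not the commit's code; in FFmpeg proper, `av_clip_uintp2(v, 10)` performs the same clamp.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 10-bit range [0, 1023]. */
static inline uint16_t clip_10bit(int v)
{
    return (uint16_t)(v < 0 ? 0 : v > 1023 ? 1023 : v);
}

/* Hypothetical "put" step: write an 8x8 block of reconstructed values into
 * a 10-bit picture (stride counted in samples), clamping each one. */
static void put_block_10bit(uint16_t *dst, ptrdiff_t stride,
                            const int16_t *block)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dst[y * stride + x] = clip_10bit(block[y * 8 + x]);
}

typedef void (*block_fn)(uint16_t *dst, ptrdiff_t stride,
                         const int16_t *block);

/* Hypothetical DSP context: "add" stays NULL because no add version exists,
 * so a call through it faults immediately instead of silently misbehaving. */
typedef struct Dsp10Context {
    block_fn put;
    block_fn add;
} Dsp10Context;

static void dsp10_init(Dsp10Context *c)
{
    c->put = put_block_10bit;
    c->add = NULL;
}
```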
Paul B Mahol
35af7add6f
avcodec/takdec: add x86 SIMD for rest of decorrelation modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-09 21:38:15 +02:00
James Almer
72254b19b8
x86/alacdsp: add simd optimized functions
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-06 20:22:00 -03:00
Ronald S. Bultje
26ece7a511
vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.
2015-10-03 14:42:39 -04:00
Ronald S. Bultje
db7786e8ff
vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.
2015-10-03 14:42:39 -04:00
James Almer
3178931a14
x86/hevc_sao: move 10/12bit functions into a separate file
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-30 02:59:55 -03:00
Ronald S. Bultje
344d519040
vp9: add subpel MC SIMD for 10/12bpp.
2015-09-16 21:11:34 -04:00
Ronald S. Bultje
6354ff0383
vp9: add fullpel (put) MC SIMD for 10/12bpp.
2015-09-16 21:11:34 -04:00
Hendrik Leppkes
41194f065c
Merge commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798'
* commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798':
lavc: Drop deprecated deinterlace module
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-09-05 17:06:14 +02:00
Vittorio Giovara
cad40a3833
lavc: Drop deprecated deinterlace module
Deprecated in 03/2013.
2015-08-28 16:04:19 +02:00
James Almer
9dcaae70f2
x86/aacpsdsp: add SSE and SSE3 optimized functions
Between 1.5 and 2.5 times faster
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-30 19:01:15 -03:00
Michael Niedermayer
115a9b5091
Merge commit 'd42191c78befc1983f23b1899b2dda513b72f1ed'
* commit 'd42191c78befc1983f23b1899b2dda513b72f1ed':
configure: Factor out vp8dsp module
Conflicts:
configure
libavcodec/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:45:34 +02:00
Michael Niedermayer
fd29dd432c
Merge commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418'
* commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418':
configure: Factor out rv34dsp module
Conflicts:
libavcodec/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:21:36 +02:00
Vittorio Giovara
d42191c78b
configure: Factor out vp8dsp module
2015-07-17 18:46:24 +01:00
Vittorio Giovara
5cb4bdb2a0
configure: Factor out rv34dsp module
2015-07-17 18:46:24 +01:00
James Almer
7912a6830d
avcodec/jpeg2000dsp: add ff_ict_float_{sse,avx}
Original intrinsics version by Nicolas Bertrand.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:27 -03:00
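ICT is JPEG 2000's irreversible component transform. The decoder needs its inverse, which converts YCbCr back to RGB. A scalar sketch working in place on three float planes, assuming the standard inverse coefficients 1.402, 0.34413, 0.71414 and 1.772. The name `ict_float_ref` and its signature are hypothetical and are not taken from the commit.

```c
#include <stddef.h>

/* Inverse ICT, in place: c0 = Y, c1 = Cb, c2 = Cr on input,
 * c0 = R, c1 = G, c2 = B on output. */
static void ict_float_ref(float *c0, float *c1, float *c2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float y = c0[i], cb = c1[i], cr = c2[i];
        c0[i] = y + 1.402f * cr;
        c1[i] = y - 0.34413f * cb - 0.71414f * cr;
        c2[i] = y + 1.772f * cb;
    }
}
```

Every sample goes through the same three multiply-add expressions with no cross-sample dependency, so the loop vectorizes directly with SSE or AVX.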
Christophe Gisquet
c3bf52713a
x86: xvid_idct: port MMX iDCT to yasm
Also reduce table duplication with the SSE2 code and remove duplicated
macro parameters.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 11:45:11 +01:00
Christophe Gisquet
2999bd7da2
x86: xvid_idct: port SSE2 iDCT to yasm
The main differences are properly renamed labels and letting yasm
select the GPRs used for skipping 1D transforms.
Previous-version-reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-13 01:04:52 +01:00
Michael Niedermayer
7fce8c752d
Merge commit '71f1ad37d858b810b71a4af1c25771beaa50b27b'
* commit '71f1ad37d858b810b71a4af1c25771beaa50b27b':
lavc: do not compile fmtconvert unconditionally
Conflicts:
configure
libavcodec/ppc/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-01 00:06:42 +01:00
Anton Khirnov
71f1ad37d8
lavc: do not compile fmtconvert unconditionally
Only ac3dec and dcadec use it.
2015-02-28 21:51:24 +01:00
James Almer
03adafb318
x86/g722dsp: add ff_g722_apply_qmf_sse2
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-16 00:41:21 -03:00
James Almer
fa3eccb4f9
x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2}
Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere.
10/12bit yasm ports, refactoring and optimizations by James Almer
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
width 32
40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips
8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips
7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips
4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips
width 64
136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips
28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips
26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips
14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-01 20:22:35 -03:00
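HEVC's SAO band offset divides the sample range into 32 equal bands, with band index = sample >> (bitDepth - 5). It then adds an offset to four consecutive bands starting at `sao_left_class`, wrapping at 32, and clips the result. The scalar sketch below covers one row and follows that spec behaviour. The function name and parameter layout are hypothetical and do not match FFmpeg's `sao_band_filter` signature.

```c
#include <stdint.h>

/* One row of HEVC SAO band offset; samples of any bit depth up to 16
 * are stored in uint16_t. */
static void sao_band_row(uint16_t *dst, const uint16_t *src, int width,
                         int bit_depth, int sao_left_class,
                         const int offsets[4])
{
    int table[32] = { 0 };
    int shift = bit_depth - 5;          /* maps a sample to one of 32 bands */
    int max = (1 << bit_depth) - 1;

    /* Only four consecutive bands (wrapping at 32) receive an offset. */
    for (int k = 0; k < 4; k++)
        table[(k + sao_left_class) & 31] = offsets[k];

    for (int x = 0; x < width; x++) {
        int v = src[x] + table[src[x] >> shift];
        dst[x] = (uint16_t)(v < 0 ? 0 : v > max ? max : v);
    }
}
```

The per-sample work is a shift, a table lookup, an add and a clip. The sse2/avx/avx2 widths quoted above mostly reflect how many samples each variant processes per iteration.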
Kieran Kunhya
9a738c27dc
v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Kieran Kunhya
36091742d1
v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Carl Eugen Hoyos
600e38f563
Fix standalone compilation of the apng decoder on x86.
2014-11-23 13:21:29 +01:00
Michael Niedermayer
65ce8f8895
avcodec/x86/Makefile: fix order
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-23 01:49:04 +01:00
James Almer
0de1d6287e
x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2}
2x to 2.5x faster than the C version.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-02 22:11:55 -03:00
James Almer
4f4f08e6f0
x86/idctdsp: port {put,add}_pixels_clamped to yasm
Also add sse2 versions for both.
put_pixels_clamped port and sse2 version originally written by Timothy Gu.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-09-24 21:52:13 -03:00
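`put_pixels_clamped` stores an 8x8 block of 16-bit IDCT output as 8-bit pixels, saturating each value to [0, 255]. `add_pixels_clamped` first adds the existing pixel and then saturates. A scalar sketch of that behaviour follows. The `_ref` names are hypothetical, and the signatures only approximate FFmpeg's idctdsp prototypes.

```c
#include <stddef.h>
#include <stdint.h>

static inline uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Store an 8x8 block, saturating each coefficient to a pixel value. */
static void put_pixels_clamped_ref(const int16_t *block, uint8_t *pixels,
                                   ptrdiff_t line_size)
{
    for (int y = 0; y < 8; y++, pixels += line_size, block += 8)
        for (int x = 0; x < 8; x++)
            pixels[x] = clip_uint8(block[x]);
}

/* Add an 8x8 residual block onto existing pixels, saturating the sum. */
static void add_pixels_clamped_ref(const int16_t *block, uint8_t *pixels,
                                   ptrdiff_t line_size)
{
    for (int y = 0; y < 8; y++, pixels += line_size, block += 8)
        for (int x = 0; x < 8; x++)
            pixels[x] = clip_uint8(pixels[x] + block[x]);
}
```

In SIMD form the saturation typically comes from a pack-with-unsigned-saturation instruction (`packuswb` on x86), which is why both routines map cleanly onto MMX and SSE2.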