1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00
Commit Graph

2156 Commits

Author SHA1 Message Date
Ronald S. Bultje
e578638382 vp9: use registers for constant loading where possible. 2015-10-13 11:06:01 -04:00
Ronald S. Bultje
408bb8556f vp9: refactor itx coefficients and share between 8 and 10/12bpp. 2015-10-13 11:06:01 -04:00
Ronald S. Bultje
eb4b5ff738 vp9: add itxfm_add eob shortcuts to 10/12bpp functions.
These aren't quite as helpful as the ones in 8bpp, since over there,
we can use pmulhrsw, but here the coefficients have too many bits to
be able to take advantage of pmulhrsw. However, we can still skip
cols for which all coefs are 0, and instead just zero the input data
for the row itx. This helps a few % on overall decoding speed.
2015-10-13 11:06:01 -04:00
Ronald S. Bultje
488fadebbc vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
3d0ca2fe89 vp9: 10/12bpp sse2 SIMD for iadst16. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
0e80265b0a vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
1338fb79d4 vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
cb054d061a vp9: add 10/12bpp sse2 SIMD versions of iadst8x8. 2015-10-13 11:05:59 -04:00
Ronald S. Bultje
e0610787b2 vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8. 2015-10-13 11:05:59 -04:00
Ronald S. Bultje
a35f6bdb38 vp9: add 12bpp sse2 versions of iadst4. 2015-10-13 11:05:59 -04:00
Ronald S. Bultje
235e76aeb8 vp9: initial attempt at a idct_idct_4x4 12bpp x86 simd (sse2) impl.
The trouble with this function is that intermediates overflow 31+sign
bits, so I've added some helpers (that will also be used in 10/12bpp
8x8, 16x16 and 32x32) to make that easier, basically emulating a half-
assed pmaddqd using 2xpmaddwd. It's currently sse2-only, if anyone sees
potential in adding ssse3, I'd love to hear it.
2015-10-13 11:05:58 -04:00
Ronald S. Bultje
f76423d097 vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions. 2015-10-13 11:05:58 -04:00
Ronald S. Bultje
6b579cf547 vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4. 2015-10-13 11:05:58 -04:00
Ronald S. Bultje
1c3be32533 vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function. 2015-10-13 11:05:57 -04:00
Christophe Gisquet
b6594a9605 x86: dct-test: add more idcts
In particular for 10 and 12 bits.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 16:03:04 +02:00
Christophe Gisquet
7ece8b50b1 x86: simple_idct: 12bits versions
On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C:         78902 decicycles in idct,  262071 runs,     73 skips
avx:       32478 decicycles in idct,  262045 runs,     99 skips

Difference between the 2:
stddev:    0.39 PSNR:104.47 MAXDIFF:    2

This is unavoidable and due to the scale factors used in the x86
version, which cannot match the C ones.

In addition, the trick of adding an initial bias to the input of a
pass can overflow, as the input coefficients are already 15bits,
which is the maximum this function can handle.

Overall, however, the omse on 12 bits samples goes from 0.16916 to
0.16883. Reducing rowshift by 1 improves to 0.0908, but causes
overflows.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 15:34:32 +02:00
Christophe Gisquet
4369b9dc7b x86: simple_idct(_put): 10bits versions
Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires to add offsets in different places compared to
prores or C, and makes the function approximately 2% slower.

For 16 frames of a DNxHD 4:2:2 10bits test sequence:

C:    60861 decicycles in idct, 1048205 runs,    371 skips
sse2: 27567 decicycles in idct, 1048216 runs,    360 skips
avx:  26272 decicycles in idct, 1048171 runs,    405 skips

The add version is not implemented, so the corresponding dsp
function is set to NULL to make it clear in a code executing it.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 13:32:21 +02:00
Christophe Gisquet
e652f69b35 x86: simple_idct10_template: fix overflow in pass
When the input of a pass has 15 or 16 bits of precision (in particular
the column pass), the addition of a bias to W4 may lead to overflows
in the input to pmaddwd.

This requires postponing the adding of the bias to after the first
butterfly. To do so, the fact that m15, unused although zeroed, is
exploited. In case the pass is safe, an address can be directly used,
and the number of xmm regs can be decreased. Otherwise, the 32bits bias
is loaded into it.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 12:51:10 +02:00
Christophe Gisquet
e9a68b0316 x86: prores: templatize 10 bits simple_idct
This should be reused for a generic simple_idct10 function.
Requires a bit of trickery to declare common constants in C.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 01:10:34 +02:00
James Almer
dab5f65b25 x86/takdsp: use arithmetic shift instructions
p1 and p2 are int32_t.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-09 23:52:39 -03:00
Paul B Mahol
35af7add6f avcodec/takdec: add x86 SIMD for rest of decorrelation modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-09 21:38:15 +02:00
Ronald S. Bultje
ce78729033 vp9: don't keep a stack pointer if we don't need it.
This saves one register in a few cases on 32bit builds with unaligned
stack (e.g. MSVC), making the code slightly easier to maintain.

(Can someone please test this on 32bit+msvc and confirm make fate-vp9
and tests/checkasm/checkasm still work after this patch?)
2015-10-07 08:55:19 -04:00
James Almer
72254b19b8 x86/alacdsp: add simd optimized functions
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-06 20:22:00 -03:00
Ronald S. Bultje
cb912b4521 vp9: fix msvc build by using 6 GPRs on 32bit if stack!=aligned. 2015-10-05 16:51:05 -04:00
Christophe Gisquet
f827a17005 blockdsp: reindent after parameter removal
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-03 23:34:56 +02:00
Ronald S. Bultje
061b67fb50 vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction. 2015-10-03 14:42:39 -04:00
Ronald S. Bultje
26ece7a511 vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. 2015-10-03 14:42:39 -04:00
Ronald S. Bultje
db7786e8ff vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. 2015-10-03 14:42:39 -04:00
Ganesh Ajjanagadde
0493e42eb2 avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
Header guards are too brittle and ugly for this case.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-03 14:24:41 +02:00
Christophe Gisquet
562ba4a827 blockdsp: remove high bitdepth parameter
It is only (mis-)used to set the dsp fucntions clear_block(s). But
these functions always work on 16bits-wide elements, which make
the parameter useless and actually harmful, as it causes all content
on more than 8-bits to not use accelerated functions.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-02 04:38:40 +02:00
James Almer
3178931a14 x86/hevc_sao: move 10/12bit functions into a separate file
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-30 02:59:55 -03:00
Ganesh Ajjanagadde
308e7484a3 avcodec/x86/rnd_template: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
Header guards are too brittle and ugly for this case.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-29 19:37:26 +02:00
Michael Niedermayer
1b82b934a1 avcodec/x86/sbrdsp: Fix using uninitialized upper 32bit of noise
Fixes crash
Fixes: flicker-1.scout3d21443372922.28.m4a

Found-by: Dale Curtis <dalecurtis@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-29 13:23:25 +02:00
Ganesh Ajjanagadde
07cd8d5676 avcodec/x86/cavsdsp: silence -Wunused-variable on --disable-mmx
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
The alternative of header guards will make it far too ugly.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-24 04:27:50 +02:00
Ganesh Ajjanagadde
0544c95fd6 avcodec/x86/mpegaudiodsp: silence -Wunused-variable on --disable-mmx
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
The alternative of header guards will make it far too ugly.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-22 23:45:03 +02:00
Ganesh Ajjanagadde
4f90818ea1 avcodec/x86/rv40dsp_init: silence -Wunused-variable on --disable-mmx
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
The alternative of header guards will make it far too ugly.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-22 23:45:03 +02:00
James Almer
7086154aaa x86/vp9dsp: fix local header include
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-21 14:37:32 -03:00
James Almer
91fcb10f08 x86/vp9dsp: add missing header include
Fixes make checkheaders

Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-21 14:34:08 -03:00
James Almer
4bb6cb4c7d x86/vp9mc: fix string concatenation of fullpel function names
Fixes compilation with NASM

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-20 12:32:27 -03:00
Ganesh Ajjanagadde
92fabca427 avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
Header guards are too brittle and ugly for this case.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-20 04:00:42 +02:00
Ganesh Ajjanagadde
e681baf638 avcodec/x86/mpegvideoenc: silence -Wunused-function on --disable-mmx
This silences -Wunused-function when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-19 23:26:57 +02:00
Ganesh Ajjanagadde
f0c635f577 avcodec/x86/hpeldsp_init: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-19 23:10:52 +02:00
James Almer
6f9ba0cb82 x86/vp9dsp: add missing preprocessor guards
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-19 13:33:53 -03:00
James Almer
e47564828b x86/vp9mc: add missing preprocessor guards
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-18 15:14:53 -03:00
James Almer
2f9ab15960 x86/vp9: add avx2 subpel MC SIMD for 10/12bpp
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-18 12:28:55 -03:00
Michael Niedermayer
58fe57d5a0 avcodec/mpeg12enc: Basic support for encoding non even QPs for -non_linear_quant 1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-18 02:52:57 +02:00
Michael Niedermayer
2d35757814 avcodec/mpegvideo: Change mpeg2 unquant to work with higher precission qscale
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-18 02:39:17 +02:00
Ronald S. Bultje
344d519040 vp9: add subpel MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Ronald S. Bultje
77f359670f vp9: add fullpel (avg) MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Ronald S. Bultje
6354ff0383 vp9: add fullpel (put) MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Hendrik Leppkes
7b865c222e Merge commit '5d14cf199990cd378904a2618b5c72c4b02290f6'
* commit '5d14cf199990cd378904a2618b5c72c4b02290f6':
  mpegvideo: Make sure mpegutils.h is included where needed

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-09-16 11:23:40 +02:00
Vittorio Giovara
5d14cf1999 mpegvideo: Make sure mpegutils.h is included where needed 2015-09-13 17:34:45 +02:00
James Almer
d5f8a642f6 x86: port PSIGNW to cpuflags
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-11 23:27:03 -03:00
Ronald S. Bultje
4b66274a86 vp9: save one (PSIGNW) instruction in iadst16_1d sse2/ssse3. 2015-09-11 20:36:51 -04:00
Ronald S. Bultje
fd8b90f5f6 vp9: fix overflow in 8x8 topleft 32x32 idct ssse3 version.
Also disable the mmx/iwht optimization when the bitexact flag is set.
With synthetically coded coefficients (i.e. these that lead to a
residual well outside the [-255,255] range), our optimizations will
overflow. It doesn't make sense to fix the overflows, since they can
only occur on synthetic input, not on real fwht-generated input. Thus,
add a bitexact flag that disables this optimization.
2015-09-10 07:51:16 -04:00
Hendrik Leppkes
5d8e836d0e Replace all remaining occurances of step/depth_minus1 and offset_plus1 2015-09-08 17:10:48 +02:00
Ronald S. Bultje
f12093fffd vp9: fix integer overflows in sse2 version of iadst4. 2015-09-06 15:07:19 -04:00
Michael Niedermayer
8d860f9a77 avcodec/x86/w64xmmtest: Fix another build failure
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-05 22:15:53 +02:00
Ronald S. Bultje
086c9b78d4 vp9: fix rounding error in idct_8x8_ssse3. 2015-09-05 15:50:02 -04:00
Hendrik Leppkes
41194f065c Merge commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798'
* commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798':
  lavc: Drop deprecated deinterlace module

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-09-05 17:06:14 +02:00
Vittorio Giovara
cad40a3833 lavc: Drop deprecated deinterlace module
Deprecated in 03/2013.
2015-08-28 16:04:19 +02:00
Ganesh Ajjanagadde
6638e4a950 avcodec/x86/mpegaudiodsp: correct asm guards
Fixes -Wunused-function warnings when compiling with --disable-yasm on x86.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-23 02:39:21 +02:00
Ganesh Ajjanagadde
907373ea9d avcodec/x86/v210-init: fix unused variable warning
Fixes a -Wunused-variable while compiling with --disable-yasm on x86

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-21 17:06:27 +02:00
Ronald S. Bultje
e3b7298aed lavc: fix compilation with FF_API_XVMC. 2015-08-18 12:05:57 -04:00
Henrik Gramner
ab43beefab x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:12:01 +02:00
Henrik Gramner
9f1245eb96 x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:04:11 +02:00
Henrik Gramner
4a53c758d2 x86: dcadsp: Avoid SSE2 instructions in SSE functions
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 09:22:46 +02:00
James Almer
9c0407e856 x86/sbrdsp: remove an unnecessary mova in sbr_autocorrelate
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-06 23:42:19 -03:00
Henrik Gramner
f0b7882ceb x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Henrik Gramner
826790f596 x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
2015-08-04 20:13:09 +02:00
James Almer
5750d6c5e9 x86: move XOP emulation code back to x86inc
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
Hendrik Leppkes
1ce298dac5 Merge commit 'ebaf571aca2dd6ce3caeeeec4210a3fccd47e7db'
* commit 'ebaf571aca2dd6ce3caeeeec4210a3fccd47e7db':
  x86: dct: Disable dct32_float_sse on x86-64

Conflicts:
	libavcodec/x86/dct32.asm
	libavcodec/x86/dct_init.c

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-08-02 12:31:39 +02:00
Henrik Gramner
ebaf571aca x86: dct: Disable dct32_float_sse on x86-64
There is an SSE2 implementation so the SSE version is never used. The "SSE"
version also happens to contain SSE2 instructions on x86-64.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-02 08:41:45 +02:00
James Almer
9dcaae70f2 x86/aacpsdsp: add SSE and SSE3 optimized functions
Between 1.5 and 2.5 times faster

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-30 19:01:15 -03:00
Michael Niedermayer
29d147c94d Merge commit '059a934806d61f7af9ab3fd9f74994b838ea5eba'
* commit '059a934806d61f7af9ab3fd9f74994b838ea5eba':
  lavc: Consistently prefix input buffer defines

Conflicts:
	doc/examples/decoding_encoding.c
	libavcodec/4xm.c
	libavcodec/aac_adtstoasc_bsf.c
	libavcodec/aacdec.c
	libavcodec/aacenc.c
	libavcodec/ac3dec.h
	libavcodec/asvenc.c
	libavcodec/avcodec.h
	libavcodec/avpacket.c
	libavcodec/dvdec.c
	libavcodec/ffv1enc.c
	libavcodec/g2meet.c
	libavcodec/gif.c
	libavcodec/h264.c
	libavcodec/h264_mp4toannexb_bsf.c
	libavcodec/huffyuvdec.c
	libavcodec/huffyuvenc.c
	libavcodec/jpeglsenc.c
	libavcodec/libxvid.c
	libavcodec/mdec.c
	libavcodec/motionpixels.c
	libavcodec/mpeg4videodec.c
	libavcodec/mpegvideo.c
	libavcodec/noise_bsf.c
	libavcodec/nuv.c
	libavcodec/nvenc.c
	libavcodec/options.c
	libavcodec/parser.c
	libavcodec/pngenc.c
	libavcodec/proresenc_kostya.c
	libavcodec/qsvdec.c
	libavcodec/svq1enc.c
	libavcodec/tiffenc.c
	libavcodec/truemotion2.c
	libavcodec/utils.c
	libavcodec/utvideoenc.c
	libavcodec/vc1dec.c
	libavcodec/wmalosslessdec.c
	libavformat/adxdec.c
	libavformat/aiffdec.c
	libavformat/apc.c
	libavformat/apetag.c
	libavformat/avidec.c
	libavformat/bink.c
	libavformat/cafdec.c
	libavformat/flvdec.c
	libavformat/id3v2.c
	libavformat/isom.c
	libavformat/matroskadec.c
	libavformat/mov.c
	libavformat/mpc.c
	libavformat/mpc8.c
	libavformat/mpegts.c
	libavformat/mvi.c
	libavformat/mxfdec.c
	libavformat/mxg.c
	libavformat/nutdec.c
	libavformat/oggdec.c
	libavformat/oggparsecelt.c
	libavformat/oggparseflac.c
	libavformat/oggparseopus.c
	libavformat/oggparsespeex.c
	libavformat/omadec.c
	libavformat/rawdec.c
	libavformat/riffdec.c
	libavformat/rl2.c
	libavformat/rmdec.c
	libavformat/rtpdec_latm.c
	libavformat/rtpdec_mpeg4.c
	libavformat/rtpdec_qdm2.c
	libavformat/rtpdec_svq3.c
	libavformat/sierravmd.c
	libavformat/smacker.c
	libavformat/smush.c
	libavformat/spdifenc.c
	libavformat/takdec.c
	libavformat/tta.c
	libavformat/utils.c
	libavformat/vqf.c
	libavformat/westwood_vqa.c
	libavformat/xmv.c
	libavformat/xwma.c
	libavformat/yop.c

Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-27 23:15:19 +02:00
Michael Niedermayer
94d68a41fa Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615'
* commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615':
  lavc: AV-prefix all codec flags

Conflicts:
	doc/examples/muxing.c
	ffmpeg.c
	ffmpeg_opt.c
	ffplay.c
	libavcodec/aacdec.c
	libavcodec/aacenc.c
	libavcodec/ac3dec.c
	libavcodec/ac3enc_float.c
	libavcodec/atrac1.c
	libavcodec/atrac3.c
	libavcodec/atrac3plusdec.c
	libavcodec/dcadec.c
	libavcodec/ffv1enc.c
	libavcodec/h264.c
	libavcodec/h264_loopfilter.c
	libavcodec/h264_mb.c
	libavcodec/imc.c
	libavcodec/libmp3lame.c
	libavcodec/libtheoraenc.c
	libavcodec/libtwolame.c
	libavcodec/libvpxenc.c
	libavcodec/libxavs.c
	libavcodec/libxvid.c
	libavcodec/mpeg12dec.c
	libavcodec/mpeg12enc.c
	libavcodec/mpegaudiodec_template.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo_enc.c
	libavcodec/mpegvideo_motion.c
	libavcodec/nellymoserdec.c
	libavcodec/nellymoserenc.c
	libavcodec/nvenc.c
	libavcodec/on2avc.c
	libavcodec/options_table.h
	libavcodec/opus_celt.c
	libavcodec/pngenc.c
	libavcodec/ra288.c
	libavcodec/ratecontrol.c
	libavcodec/twinvq.c
	libavcodec/vc1_block.c
	libavcodec/vc1_loopfilter.c
	libavcodec/vc1_mc.c
	libavcodec/vc1dec.c
	libavcodec/vorbisdec.c
	libavcodec/vp3.c
	libavcodec/wma.c
	libavcodec/wmaprodec.c
	libavcodec/x86/hpeldsp_init.c
	libavcodec/x86/me_cmp_init.c

Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-27 22:10:35 +02:00
Vittorio Giovara
7c6eb0a1b7 lavc: AV-prefix all codec flags
Convert doxygen to multiline and express bitfields more simply.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-07-27 15:24:58 +01:00
James Almer
844bef578e avcodec/x86: add missing colon to labels
Silences warnings with Nasm

Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:50:14 -03:00
Michael Niedermayer
52b6d96268 Merge commit 'a344e5d094ebcf9a23acf3a27c56cbbbc829db42'
* commit 'a344e5d094ebcf9a23acf3a27c56cbbbc829db42':
  x86: bswapdsp: Don't treat 32-bit integers as 64-bit

Conflicts:
	libavcodec/x86/bswapdsp.asm

Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 23:20:14 +02:00
Michael Niedermayer
115a9b5091 Merge commit 'd42191c78befc1983f23b1899b2dda513b72f1ed'
* commit 'd42191c78befc1983f23b1899b2dda513b72f1ed':
  configure: Factor out vp8dsp module

Conflicts:
	configure
	libavcodec/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:45:34 +02:00
Michael Niedermayer
fd29dd432c Merge commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418'
* commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418':
  configure: Factor out rv34dsp module

Conflicts:
	libavcodec/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-17 22:21:36 +02:00
Henrik Gramner
a344e5d094 x86: bswapdsp: Don't treat 32-bit integers as 64-bit
The upper halves are not guaranteed to be zero in x86-64.

Also use `test` instead of `and` when the result isn't used for anything other
than as a branch condition, this allows some register moves to be eliminated.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-07-17 20:02:28 +02:00
Vittorio Giovara
d42191c78b configure: Factor out vp8dsp module 2015-07-17 18:46:24 +01:00
Vittorio Giovara
5cb4bdb2a0 configure: Factor out rv34dsp module 2015-07-17 18:46:24 +01:00
Michael Niedermayer
b8c438e762 videodsp: assert that linesize is larger than width
Suggested-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-07-08 01:32:04 +02:00
Andreas Cadhalpun
28efeb6502 doc: avoid incorrect phrase 'allows to'
Also fix typo found by Lou Logan:
Sacrifying -> Sacrificing

Reviewed-by: Lou Logan <lou@lrcd.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-06-16 21:48:51 +02:00
James Almer
9f815bc2c2 avcodec/jpeg200dsp: add ff_rct_int_{sse2,avx2}
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:31 -03:00
James Almer
7912a6830d avcodec/jpeg200dsp: add ff_ict_float_{sse,avx}
Original intrinsics version by Nicolas Bertrand.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-13 16:53:27 -03:00
Michael Niedermayer
63b0356274 Merge commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb'
* commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb':
  h264_qpel: Use the correct header

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-12 21:55:40 +02:00
Michael Niedermayer
b68b5ec513 Merge commit '5e87080f2c73186066df0b9c43877b4af0beef3a'
* commit '5e87080f2c73186066df0b9c43877b4af0beef3a':
  h264_weight: Fix SSSE3 biweight code with weights of 128

Conflicts:
	libavcodec/x86/h264_weight.asm

See: e100966575
See: fb2288834b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-12 21:47:01 +02:00
Vittorio Giovara
b7a4127a45 h264_qpel: Use the correct header 2015-06-12 17:02:48 +01:00
Michael Niedermayer
5e87080f2c h264_weight: Fix SSSE3 biweight code with weights of 128
CC: libav-stable@libav.org
Sample-Id: test_bref.mp4

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-06-12 17:02:48 +01:00
Michael Niedermayer
e100966575 avcodec/x86/h264_weight: handle weight1=128
Fix ticket4596

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-09 05:11:09 +02:00
James Almer
c16e99e3b3 x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
James Almer
d68c05380c x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
Michael Niedermayer
b666e81c13 Merge commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd'
* commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd':
  x86: cavs: Remove an unneeded scratch buffer

Conflicts:
	libavcodec/x86/cavsdsp.c

See: d79f7bf0d6
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 22:12:41 +02:00
Michael Niedermayer
e4610300de x86: cavs: Remove an unneeded scratch buffer
Simplifies the code and makes it build on certain compilers
running out of registers on x86.

CC: libav-stable@libav.org
Reported-By: mudler
2015-05-28 18:40:40 +02:00
Timothy Gu
2b388e6dde Revert "Move struc FFTContext below SECTION_RODATA"
This reverts commit 599888a480.

The commit does not silence the warning on ELF-based systems, and will be
fixed in the subsequent commit.

Conflicts:
	libavcodec/x86/fft_mmx.asm

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 00:08:32 +02:00
Michael Niedermayer
d9b264bc73 Merge commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42'
* commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42':
  mpegvideo: Drop flags and flags2

Conflicts:
	libavcodec/mpeg12dec.c
	libavcodec/mpeg12enc.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo_enc.c
	libavcodec/mpegvideo_motion.c
	libavcodec/ratecontrol.c
	libavcodec/vc1_block.c
	libavcodec/vc1_loopfilter.c
	libavcodec/vc1_mc.c
	libavcodec/vc1dec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-22 20:24:41 +02:00
Vittorio Giovara
848e86f74d mpegvideo: Drop flags and flags2
They are just duplicates of AVCodecContext members so use those instead.
2015-05-22 15:34:39 +01:00
Michael Niedermayer
451be676f3 Merge remote-tracking branch 'rbultje/vp9-bugfixes'
* rbultje/vp9-bugfixes:
  vp9: match another find_ref_mvs() bug in libvpx.
  vp9: fix scaled motion vector clipping for sub8x8 blocks.
  vp9: improve signbias check.
  vp9: don't allow compound references if error_resilience is enabled.
  vp9: clamp segmented lflvl before applying ref/mode deltas.
  vp9: reset loopfilter mode/ref deltas on keyframe.
  vp9: fix crash when playing back 440/440 content with width%64<56.
  vp9: extend loopfilter workaround for vp9 h/v mix-up to work for 422.
  vp9: clip motion vectors in the same way as libvpx does.
  vp9: set skip flag if the block had no coded coefficients.
  vp9: apply mv scaling workaround only when subsampling is enabled.
  vp9: read all 4x4 blocks in sub8x8 blocks individually with scalability.
  vp9: fix segmentation map referencing upon framesize change.
  vp9: disable more pmulhrsw optimizations in idct16/32.
  vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-18 02:35:16 +02:00
Carl Eugen Hoyos
e609cfd697 lavc/flac: Fix encoding and decoding with high lpc.
Based on an analysis by trac user lvqcl.

Fixes ticket #4421, reported by Chase Walker.
2015-05-17 02:08:58 +02:00
Ronald S. Bultje
d32d0593f1 vp9: disable more pmulhrsw optimizations in idct16/32.
For idct16, only when called from a adst16x16 variant, so impact is
minor. For idct32, for all, so relatively major impact.
2015-05-14 14:15:27 -04:00
Ronald S. Bultje
96d30c3495 vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations.
They all overflow in various samples that are considered valid input.
2015-05-14 13:39:37 -04:00
Michael Niedermayer
cc77bb09e4 avcodec/x86/vp9dsp_init: Fix mix of declaration and statement
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-07 14:33:10 +02:00
Ronald S. Bultje
b224b165cb vp9: add keyframe profile 2/3 support. 2015-05-06 15:10:41 -04:00
Michael Niedermayer
6ef3426d90 avcodec/x86/deinterlace: use INIT_MMX like other asm code does too 2015-05-05 02:41:15 +02:00
Michael Niedermayer
dfc0708e23 avcodec/x86/dct-test: Use uint8_t for idct_simple_mmx_perm
The table contains no element outside the unsigned 8bit range

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-02 13:43:15 +02:00
Michael Niedermayer
270e647adc avcodec/x86/dct-test: Make static table const
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-02 13:42:46 +02:00
Ronald S. Bultje
3de13d5212 vp9: remove another optimization branch in iadst16 which causes overflows.
See sample vp90-2-14-resize-fp-tiles-16-8.webm from the vp9 test vector
set to reproduce the issue.
2015-04-24 16:54:31 +02:00
Ronald S. Bultje
d02d04a18f vp9: remove one optimization branch in iadst16 which causes overflows.
See sample vp90-2-14-resize-fp-tiles-16-8-4-2-1.webm from the vp9 test
vector set which reproduces the issue. This probably costs a few cycles,
but I don't think there's an easy way to workaround that.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-22 21:37:10 +02:00
Michael Niedermayer
0245abc7c1 avcodec/x86/hpeldsp_init: Put CONFIG_* first in if()
This is more consistent and may fix a build failure

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-26 15:41:27 +01:00
James Almer
6b940b8c99 x86/xvididct: add some yasm guards
Should fix compilation on compilers with less-than-ideal dead code elimination

Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 02:38:20 -03:00
James Almer
b0fea4ad7e x86/xvididct: remove obsolete function prototypes
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 02:38:14 -03:00
Michael Niedermayer
1eb28479da Merge commit '48aef27f5232794e70ecef0d347b9f65e27a9bad'
* commit '48aef27f5232794e70ecef0d347b9f65e27a9bad':
  x86: Put COPY3_IF_LT under HAVE_6REGS

Conflicts:
	libavcodec/x86/mathops.h

See: b38910c979
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-17 20:25:47 +01:00
Luca Barbato
48aef27f52 x86: Put COPY3_IF_LT under HAVE_6REGS
It uses 6 registers, unbreaks building on hardened x86 system.

Bug-Id: gentoo/541930
CC: libav-stable@libav.org
2015-03-17 12:31:04 +01:00
Michael Niedermayer
d79f7bf0d6 avcodec/x86/cavsdsp: remove incorrect LOCAL_ALIGN tmp
This is faster and simpler as well

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-16 14:51:51 +01:00
James Almer
e8374d7202 x86/proresdsp: remove ff_prores_idct_put_10_sse4
It's exactly the same as the sse2 version.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-16 01:52:44 -03:00
James Almer
bdd179c8cb x86/proresdsp: remove unused macro
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-16 01:49:34 -03:00
Christophe Gisquet
238db7cc56 x86: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED
The later may yield incorrect code for on-stack variables.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 20:06:47 +01:00
Christophe Gisquet
15ce160183 x86: xvid_idct: SSE2 merged add version
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:36:47 +01:00
Christophe Gisquet
decd5193e1 x86: xvid_idct: merged idct_put SSE2 versions
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:36:29 +01:00
Christophe Gisquet
8200575d84 x86: dct-test: evaluate prores idct avx version
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:23:27 +01:00
Christophe Gisquet
4eb4451be1 x86: dct-test: fix compilation for prores
When the decoder is deactivated, the x86-optimized versions are
not compiled, resulting in a link error.

The C version is unaffected, as it is part of the idctdsp
subsystem.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 13:23:06 +01:00
Christophe Gisquet
c3bf52713a x86: xvid_idct: port MMX iDCT to yasm
Also reduce the table duplication with SSE2 code, remove duplicated
macro parameters.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-14 11:45:11 +01:00
Christophe Gisquet
2999bd7da2 x86: xvid_idct: port SSE2 iDCT to yasm
The main difference consists in renaming properly labels, and
letting yasm select the gprs for skipping 1D transforms.

Previous-version-reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-13 01:04:52 +01:00
James Almer
5c8f747085 x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8
Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-01 20:02:43 -03:00
Michael Niedermayer
7fce8c752d Merge commit '71f1ad37d858b810b71a4af1c25771beaa50b27b'
* commit '71f1ad37d858b810b71a4af1c25771beaa50b27b':
  lavc: do not compile fmtconvert unconditionally

Conflicts:
	configure
	libavcodec/ppc/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-03-01 00:06:42 +01:00
Michael Niedermayer
5c17377e28 Merge commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca'
* commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca':
  fmtconvert: drop unused functions

Conflicts:
	libavcodec/arm/fmtconvert_vfp_armv6.S
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/fmtconvert_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-28 23:58:29 +01:00
Anton Khirnov
71f1ad37d8 lavc: do not compile fmtconvert unconditionally
Only ac3dec and dcadec use it.
2015-02-28 21:51:24 +01:00
Anton Khirnov
d74a8cb7e4 fmtconvert: drop unused functions 2015-02-28 21:51:24 +01:00
Michael Niedermayer
23a90768a8 avcodec/v210dec: Add ff prefix to v210_x86_init()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-27 19:08:09 +01:00
Michael Niedermayer
0e699676f9 avcodec/snow: mark dwt init as av_cold
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-27 16:53:37 +01:00
Carl Eugen Hoyos
36a6fb989b hevc_deblock: Fix compilation with nasm
CC: libav-stable@libav.org
Bug-Id: 795
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-02-22 22:34:20 +00:00
Michael Niedermayer
03f39fbb2a avcodec/x86/mlpdsp_init: Simplify mlp_filter_channel_x86()
Based on patch by Francisco Blas Izquierdo Riera
Commit message partly taken from carl

fixes a compilation
error in mlpdsp_init.c with -fstack-check and some gcc compilers (I
reproduced the issue with gcc 4.7.3) by simplifying the code.

See also https://bugs.gentoo.org/show_bug.cgi?id=471756

$ make libavcodec/x86/mlpdsp_init.o
libavcodec/x86/mlpdsp_init.c: In function ‘mlp_filter_channel_x86’:
libavcodec/x86/mlpdsp_init.c:142:5: error: can’t find a register in
class ‘GENERAL_REGS’ while reloading ‘asm’
libavcodec/x86/mlpdsp_init.c:142:5: error: ‘asm’ operand has impossible
constraints

4551 -> 4509 dezicycles

Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-21 16:05:41 +01:00
Christophe Gisquet
398f531915 x86: hevc_mc: fewer xmm regs used in epel h/v
11 xmm regs seem only required for avx2.

Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 15:19:19 +01:00
Christophe Gisquet
89cb4995fa x86: hevc_mc: save 1 gpr in epel filter loading
The 3*stride value stored in r3src can be loaded much later,
so use r3src instead of a dedicated gpr when possible.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-16 21:53:51 +01:00
James Almer
03adafb318 x86/g722dsp: add ff_g722_apply_qmf_sse2
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-16 00:41:21 -03:00
Christophe Gisquet
b533949813 x86: hevc: remove a parameter to WP internals
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to
get the value in bytes).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-14 17:22:50 +01:00
James Almer
1679d68dbf x86/hevc_mc: optimize AVX2 mc functions
Before
40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips

After
37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:58 -03:00
James Almer
14b44c1614 x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-12 13:21:30 -03:00
James Almer
06fe6dfe12 x86/hevc_sao: make sao_band_filter work on x86_32
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-09 20:41:21 -03:00
Christophe Gisquet
b61b9e4919 x86: hevc_mc: remove lea in EPEL_LOAD
The second parameter to the macro is always an immediate address,
so no lea is needed.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:35 +01:00
Christophe Gisquet
4919b38421 x86: hevc_mc: fewer gpr autoloads for _v filters
In that case, it's just to load my, but mx/r3src is not used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-08 22:19:34 +01:00
James Almer
92d903afaa x86/vp9dsp: fix clobbering of xmm6 on IDCT sse2 functions
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-08 00:50:39 -03:00
Christophe Gisquet
626d6184ce x86: lavc/hevc_mc: fix comments
The width parameter is now completely at the back, and actually
never used. This helps understanding the actual parameter list.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 20:52:03 +01:00
Christophe Gisquet
ed450d4acf x86: lavc: share more constant through defines
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 17:48:14 +01:00
Christophe Gisquet
691b7f5e9e lavc/lossless_audiodsp: revert various commits
Their intent was to make the DSP work with wmalossless pro.
The later was fixed to work with the DSP.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-07 15:15:19 +01:00
Christophe Gisquet
9dc45d1f42 x86: lavc: share more constants
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 23:35:02 +01:00
Mickaël Raulet
6ecc3fd612 x86/hevc_mc: use aligned loads
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-06 21:38:00 +01:00