mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00
2156 Commits

Author SHA1 Message Date
Timothy Gu
838abfc1d7 x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format 2016-02-01 17:01:11 -08:00
Luca Barbato
e280fe1329 v210: Use separate sample_factors
The 10bit and the 8bit functions can now be implemented to process
a different amount of samples.

And while at it, simplify the code a little.
2016-02-01 13:40:07 +01:00
James Darnley
15ec7aa417 v210: Add avx2 version of the 10-bit line encoder
Around 25% faster than the ssse3 version.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
James Darnley
d29237e557 v210: Add avx2 version of the 8-bit line encoder
Around 35% faster than the avx version.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
Timothy Gu
180f9a0958 all: Make header guard names consistent 2016-01-31 15:44:11 -08:00
foo86
ae5b2c5250 avcodec/dca: add new decoder based on libdcadec 2016-01-31 17:09:38 +01:00
foo86
4608996772 avcodec/dca: remove old decoder
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
2016-01-31 17:09:38 +01:00
James Almer
c792528970 x86/imdct36: use extractps inside the STORE macro
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-28 13:35:15 -03:00
Derek Buitenhuis
ea2df33052 Merge commit '4f22b138886e29f7fffa8c715673951e51be9f32'
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-01-27 18:23:31 +00:00
Luca Barbato
eafb05fcf3 v210: x86: Add the correct guards around the asm code
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-01-26 23:31:57 +01:00
James Almer
209f50e16b avcodec/synth_filter: split off remaining code from dcadec files
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-25 14:57:38 -03:00
Geza Lore
cc602061ee x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.

Currently only implemented for ELF.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:46:28 +01:00
Geza Lore
d39c229e54 x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they can get confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.

Currently only implemented for ELF.
2016-01-21 23:19:46 +01:00
Ronald S. Bultje
0f88b3f82f videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations.
This can overread the buffer (either before the start or beyond the end)
in Nx1 (i.e. height=1) images.

Fixes mozilla bug 1240080.
2016-01-18 11:12:47 -05:00
Diego Biurrun
03ef89faf2 x86: build: Group all encoder objects together 2016-01-18 14:47:58 +01:00
Diego Biurrun
4f22b13888 x86: ac3dsp: Drop forward declaration for nonexisting function 2016-01-18 11:55:38 +01:00
James Darnley
f59b727e2f avcodec/v210: guard new avx2 functions from old assemblers 2016-01-17 21:23:58 +01:00
James Darnley
2cba1825f7 avcodec/v210: add avx2 version of the 10-bit line encoder
Around 25% faster than the ssse3 version.
2016-01-17 16:03:43 +01:00
James Darnley
3836f404a8 avcodec/v210: add avx2 version of the 8-bit line encoder
Around 35% faster than the avx version.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
2016-01-17 16:03:43 +01:00
Michael Niedermayer
da6f34516b avcodec/x86/fmtconvert: Add emms to int32_to_float_fmul_array8_sse()
this should fix checkasm on x86_64-archlinux-gcc-valgrind

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-01-15 17:08:37 +01:00
Hendrik Leppkes
2214207d04 Merge commit '8563f9887194b07c972c3475d6b51592d77f73f7'
* commit '8563f9887194b07c972c3475d6b51592d77f73f7':
  x86: use emms after ff_int32_to_float_fmul_scalar_sse

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:27:11 +01:00
Hendrik Leppkes
a9cd11b212 Merge commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c'
* commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c':
  x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:23:25 +01:00
Hendrik Leppkes
d03da3e240 Merge commit '2008f76054906e9ff6bf744800af0e5a5bfe61be'
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
  dca: remove unused decode_hf function and quant_d tables

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 13:17:48 +01:00
Hendrik Leppkes
00e91d0676 Merge commit '5dfe4edad63971d669ae456b0bc40ef9364cca80'
* commit '5dfe4edad63971d669ae456b0bc40ef9364cca80':
  x86_64: int32_to_float_fmul_scalar sign extend integer length

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 10:46:18 +01:00
Janne Grunau
8563f98871 x86: use emms after ff_int32_to_float_fmul_scalar_sse
Intel's Instruction Set Reference (as of September 2015) clearly states
that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
source is a memory location. The Instruction Set Reference from 1999
(Order Number 243191) describes this behaviour but all later versions
I've seen make no distinction whether MMX registers or memory is
used as source.
The documentation for the matching SSE2 instruction to convert to double
(cvtpi2pd) was fixed (see the valgrind bug
https://bugs.kde.org/show_bug.cgi?id=210264).

It will take time to get a clarification and fixes in place. In the
meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
be correct according to the documentation. The vast majority of users
will have SSE2 so a change to the SSE version has little effect.

Fixes fate-checkasm on x86 valgrind targets.

Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
2015-12-30 13:37:57 +01:00
Janne Grunau
f4f27e4cf1 x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
This reverts commit 5dfe4edad6.
2015-12-29 11:42:51 +01:00
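The two extension strategies named in commits 5dfe4edad6 and f4f27e4cf1 can be illustrated in plain C. This is only a sketch, not code from either commit: `zext32` and `sext32` are hypothetical helper names, and the 64-bit `reg` argument stands in for a register whose upper half may hold stale bits when a 32-bit length is passed on x86_64.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Zero extension: keep the low 32 bits, clear the upper 32 bits. */
static uint64_t zext32(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Sign extension: copy bit 31 into the upper 32 bits.
 * The uint32_t -> int32_t conversion assumes two's complement wrap. */
static int64_t sext32(uint64_t reg)
{
    return (int32_t)(uint32_t)reg;
}
```

For a non-negative length both helpers return the same value; they only differ when bit 31 of the low half is set.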
Alexandra Hájková
2008f76054 dca: remove unused decode_hf function and quant_d tables
They were superseded by their integer equivalents. Rename integer
decode_hf to decode_hf.
2015-12-24 13:58:18 +01:00
James Almer
d4c47333e1 x86/hevc_sao: add ff_hevc_sao_edge_filter_{8,16}_{10,12}
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 17:01:15 -03:00
James Almer
3ff2beff65 x86/hevc_sao: simplify sao_edge_filter 10/12bit
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 16:45:37 -03:00
James Almer
34b2bd03cf x86/hevc_sao: simplify sao_band_filter 10/12bit
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-12-20 16:42:36 -03:00
Janne Grunau
5dfe4edad6 x86_64: int32_to_float_fmul_scalar sign extend integer length 2015-12-14 16:42:35 +01:00
Dave Yeo
b0b133b8c0 hevcdsp: use a macro for .rodata section
fixes assembling on OS/2

Signed-off-by: Dave Yeo <dave.r.yeo@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-12-11 16:19:30 +01:00
Kieran Kunhya
3f07f12f65 diracdec: Template DSP functions adding 10-bit versions 2015-12-10 18:25:02 +00:00
Anton Khirnov
e7078e842d hevcdsp: add x86 SIMD for MC 2015-12-05 21:11:52 +01:00
Timothy Gu
4b80b895a9 pixblockdsp: x86: Condense diff_pixels_* to a shared macro
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
2015-11-07 14:31:34 -08:00
Ganesh Ajjanagadde
38f4e973ef all: fix -Wextra-semi reported on clang
This fixes extra semicolons that clang 3.7 on GNU/Linux warns about.
These were triggered when built under -Wpedantic, which essentially
checks for strict ISO compliance in numerous ways.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-24 17:58:17 -04:00
Ronald S. Bultje
52f84d82bd videodsp: don't overread edges in vfix3 emu_edge.
Fixes trac ticket 3226. Also see Andreas' analysis in
https://bugs.debian.org/801745, which was very helpful.
2015-10-24 14:34:50 -04:00
Michael Niedermayer
ea5a1d1485 avcodec/x86/vc1dsp: Remove unused macro
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-22 21:13:42 +02:00
Carl Eugen Hoyos
775b84e30e lavc/x86/vc1dsp_init: Fix compilation with --disable-yasm. 2015-10-22 11:37:42 +02:00
James Almer
73353af6e5 x86/Makefile: move decoder/encoder objects out of the subsystems section
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-22 03:55:18 -03:00
Timothy Gu
ab5f43e634 vc1dsp: Port ff_vc1_put_ver_16b_shift2_mmx to yasm
This function is only used within other inline asm functions, hence the
HAVE_MMX_INLINE guard. Per recent discussions, we should not worry about
the performance of inline asm-only builds.
2015-10-21 20:01:52 -07:00
Timothy Gu
98da061461 huffyuvencdsp: Cherry pick changes left out in the last commit
Oops.
2015-10-21 12:42:33 -07:00
Timothy Gu
5e586e1bef huffyuvencdsp: Add ff_diff_bytes_{sse2,avx2}
SSE2 version 4%-35% faster than MMX depending on the width.
AVX2 version 1%-13% faster than SSE2 depending on the width.
2015-10-21 12:25:32 -07:00
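For readers unfamiliar with the operation being vectorized here, a scalar byte-difference loop might look like the sketch below. The name `diff_bytes_ref` is hypothetical and this is not the project's C fallback. It assumes the operation is an element-wise `src1 - src2` with wrap-around modulo 256, and takes the width as `intptr_t`, as the neighbouring huffyuvencdsp commit describes.

```c
#include <stdint.h>

/* Hypothetical scalar reference: dst[i] = src1[i] - src2[i] (mod 256). */
static void diff_bytes_ref(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, intptr_t w)
{
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}
```

A SIMD version processes 16 (SSE2) or 32 (AVX2) such bytes per instruction, which is consistent with the speedup depending on the width.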
Timothy Gu
6b41b44149 huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm
Heavily based upon ff_add_bytes by Christophe Gisquet.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2015-10-20 18:24:54 -07:00
Timothy Gu
068e6cb732 huffyuvencdsp: Use intptr_t for width
It is done this way in huffyuvdsp as well.
2015-10-19 16:57:33 -07:00
Timothy Gu
a079cbf458 x86: vc1dsp_mmx: Move yasm initiation steps to vc1dsp_init
That's where all yasm initiation steps are. Also removes the overlap
between the two files.
2015-10-19 16:52:52 -07:00
Timothy Gu
607f820ec7 x86: fpel: Remove erroneous ff_put_pixels8_mmxext prototype
This function does not exist.
2015-10-19 16:52:37 -07:00
Timothy Gu
cb6f1f8bf9 x86: fpel: Move prototypes for 4-px block functions 2015-10-19 16:52:33 -07:00
James Almer
74a87ae210 x86/vp9itxfm: fix register clobbering in ff_vp9_idct_idct_4x4_add_12_sse2
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-13 20:21:33 -03:00
Christophe Gisquet
74c414202f x86: simple_idct10_template: use const
This avoids going through constants.c while still sharing them
with proresdsp.asm

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 22:52:33 +02:00
Ronald S. Bultje
e578638382 vp9: use registers for constant loading where possible. 2015-10-13 11:06:01 -04:00
Ronald S. Bultje
408bb8556f vp9: refactor itx coefficients and share between 8 and 10/12bpp. 2015-10-13 11:06:01 -04:00
Ronald S. Bultje
eb4b5ff738 vp9: add itxfm_add eob shortcuts to 10/12bpp functions.
These aren't quite as helpful as the ones in 8bpp, since over there,
we can use pmulhrsw, but here the coefficients have too many bits to
be able to take advantage of pmulhrsw. However, we can still skip
cols for which all coefs are 0, and instead just zero the input data
for the row itx. This helps a few % on overall decoding speed.
2015-10-13 11:06:01 -04:00
Ronald S. Bultje
488fadebbc vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
3d0ca2fe89 vp9: 10/12bpp sse2 SIMD for iadst16. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
0e80265b0a vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
1338fb79d4 vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16. 2015-10-13 11:06:00 -04:00
Ronald S. Bultje
cb054d061a vp9: add 10/12bpp sse2 SIMD versions of iadst8x8. 2015-10-13 11:05:59 -04:00
Ronald S. Bultje
e0610787b2 vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8. 2015-10-13 11:05:59 -04:00
Ronald S. Bultje
a35f6bdb38 vp9: add 12bpp sse2 versions of iadst4. 2015-10-13 11:05:59 -04:00
Ronald S. Bultje
235e76aeb8 vp9: initial attempt at a idct_idct_4x4 12bpp x86 simd (sse2) impl.
The trouble with this function is that intermediates overflow 31+sign
bits, so I've added some helpers (that will also be used in 10/12bpp
8x8, 16x16 and 32x32) to make that easier, basically emulating a half-
assed pmaddqd using 2xpmaddwd. It's currently sse2-only, if anyone sees
potential in adding ssse3, I'd love to hear it.
2015-10-13 11:05:58 -04:00
Ronald S. Bultje
f76423d097 vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions. 2015-10-13 11:05:58 -04:00
Ronald S. Bultje
6b579cf547 vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4. 2015-10-13 11:05:58 -04:00
Ronald S. Bultje
1c3be32533 vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function. 2015-10-13 11:05:57 -04:00
Christophe Gisquet
b6594a9605 x86: dct-test: add more idcts
In particular for 10 and 12 bits.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 16:03:04 +02:00
Christophe Gisquet
7ece8b50b1 x86: simple_idct: 12bits versions
On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C:         78902 decicycles in idct,  262071 runs,     73 skips
avx:       32478 decicycles in idct,  262045 runs,     99 skips

Difference between the 2:
stddev:    0.39 PSNR:104.47 MAXDIFF:    2

This is unavoidable and due to the scale factors used in the x86
version, which cannot match the C ones.

In addition, the trick of adding an initial bias to the input of a
pass can overflow, as the input coefficients are already 15bits,
which is the maximum this function can handle.

Overall, however, the omse on 12 bits samples goes from 0.16916 to
0.16883. Reducing rowshift by 1 improves to 0.0908, but causes
overflows.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 15:34:32 +02:00
Christophe Gisquet
4369b9dc7b x86: simple_idct(_put): 10bits versions
Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires adding offsets in different places compared to
prores or C, and makes the function approximately 2% slower.

For 16 frames of a DNxHD 4:2:2 10bits test sequence:

C:    60861 decicycles in idct, 1048205 runs,    371 skips
sse2: 27567 decicycles in idct, 1048216 runs,    360 skips
avx:  26272 decicycles in idct, 1048171 runs,    405 skips

The add version is not implemented, so the corresponding dsp
function is set to NULL to make it obvious if any code tries to execute it.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 13:32:21 +02:00
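The clipping to [0;1023] mentioned in the 10-bit simple_idct entry is the usual clamp of a reconstructed sample to the 10-bit pixel range. A minimal C sketch, with `clip_pixel10` as a hypothetical name rather than a function from the commit:

```c
#include <stdint.h>

/* Hypothetical helper: clamp a reconstructed sample to the 10-bit range. */
static uint16_t clip_pixel10(int v)
{
    if (v < 0)
        return 0;
    if (v > 1023)
        return 1023;
    return (uint16_t)v;
}
```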
Christophe Gisquet
e652f69b35 x86: simple_idct10_template: fix overflow in pass
When the input of a pass has 15 or 16 bits of precision (in particular
the column pass), the addition of a bias to W4 may lead to overflows
in the input to pmaddwd.

This requires postponing the adding of the bias to after the first
butterfly. To do so, the fact that m15, unused although zeroed, is
exploited. In case the pass is safe, an address can be directly used,
and the number of xmm regs can be decreased. Otherwise, the 32bits bias
is loaded into it.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 12:51:10 +02:00
Christophe Gisquet
e9a68b0316 x86: prores: templatize 10 bits simple_idct
This should be reused for a generic simple_idct10 function.
Requires a bit of trickery to declare common constants in C.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 01:10:34 +02:00
James Almer
dab5f65b25 x86/takdsp: use arithmetic shift instructions
p1 and p2 are int32_t.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-09 23:52:39 -03:00
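Why the shift type matters for signed `int32_t` values can be shown in scalar C. This is a sketch with hypothetical helper names, not the SIMD code from the takdsp commit: a logical right shift fills with zeros, while an arithmetic one replicates the sign bit (in SSE2 terms, roughly the difference between `psrld` and `psrad`).

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only; n must be in 0..31. */

/* Logical shift: vacated high bits are filled with zeros. */
static int32_t shr_logical(int32_t v, unsigned n)
{
    return (int32_t)((uint32_t)v >> n);
}

/* Arithmetic shift, written portably: vacated high bits copy the sign. */
static int32_t shr_arith(int32_t v, unsigned n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}
```

Both helpers agree for non-negative inputs; for negative inputs only the arithmetic shift keeps the sign and behaves like a flooring division by a power of two.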
Paul B Mahol
35af7add6f avcodec/takdec: add x86 SIMD for rest of decorrelation modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-09 21:38:15 +02:00
Ronald S. Bultje
ce78729033 vp9: don't keep a stack pointer if we don't need it.
This saves one register in a few cases on 32bit builds with unaligned
stack (e.g. MSVC), making the code slightly easier to maintain.

(Can someone please test this on 32bit+msvc and confirm make fate-vp9
and tests/checkasm/checkasm still work after this patch?)
2015-10-07 08:55:19 -04:00
James Almer
72254b19b8 x86/alacdsp: add simd optimized functions
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-06 20:22:00 -03:00
Ronald S. Bultje
cb912b4521 vp9: fix msvc build by using 6 GPRs on 32bit if stack!=aligned. 2015-10-05 16:51:05 -04:00
Christophe Gisquet
f827a17005 blockdsp: reindent after parameter removal
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-03 23:34:56 +02:00
Ronald S. Bultje
061b67fb50 vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction. 2015-10-03 14:42:39 -04:00
Ronald S. Bultje
26ece7a511 vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. 2015-10-03 14:42:39 -04:00
Ronald S. Bultje
db7786e8ff vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. 2015-10-03 14:42:39 -04:00
Ganesh Ajjanagadde
0493e42eb2 avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
Header guards are too brittle and ugly for this case.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-03 14:24:41 +02:00
Christophe Gisquet
562ba4a827 blockdsp: remove high bitdepth parameter
It is only (mis-)used to set the dsp functions clear_block(s). But
these functions always work on 16-bit-wide elements, which makes
the parameter useless and actually harmful, as it causes all content
with more than 8 bits to not use accelerated functions.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-02 04:38:40 +02:00
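The point about 16-bit-wide elements can be made concrete with a sketch. It assumes a block is 64 coefficients (8x8) stored as `int16_t`; `clear_block_sketch` is a hypothetical name, not the project's function. Because the element type is the same at every pixel bit depth, the clearing routine needs no bit-depth parameter.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: clear one 8x8 block of 16-bit coefficients.
 * The element width is fixed, so the pixel bit depth is irrelevant here. */
static void clear_block_sketch(int16_t *block)
{
    memset(block, 0, sizeof(int16_t) * 64);
}
```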
James Almer
3178931a14 x86/hevc_sao: move 10/12bit functions into a separate file
Tested-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-30 02:59:55 -03:00
Ganesh Ajjanagadde
308e7484a3 avcodec/x86/rnd_template: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
Header guards are too brittle and ugly for this case.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-29 19:37:26 +02:00
Michael Niedermayer
1b82b934a1 avcodec/x86/sbrdsp: Fix using uninitialized upper 32bit of noise
Fixes crash
Fixes: flicker-1.scout3d21443372922.28.m4a

Found-by: Dale Curtis <dalecurtis@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-29 13:23:25 +02:00
Ganesh Ajjanagadde
07cd8d5676 avcodec/x86/cavsdsp: silence -Wunused-variable on --disable-mmx
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
The alternative of header guards will make it far too ugly.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-24 04:27:50 +02:00
Ganesh Ajjanagadde
0544c95fd6 avcodec/x86/mpegaudiodsp: silence -Wunused-variable on --disable-mmx
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
The alternative of header guards will make it far too ugly.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-22 23:45:03 +02:00
Ganesh Ajjanagadde
4f90818ea1 avcodec/x86/rv40dsp_init: silence -Wunused-variable on --disable-mmx
This silences -Wunused-variable when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
The alternative of header guards will make it far too ugly.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-22 23:45:03 +02:00
James Almer
7086154aaa x86/vp9dsp: fix local header include
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-21 14:37:32 -03:00
James Almer
91fcb10f08 x86/vp9dsp: add missing header include
Fixes make checkheaders

Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-21 14:34:08 -03:00
James Almer
4bb6cb4c7d x86/vp9mc: fix string concatenation of fullpel function names
Fixes compilation with NASM

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-20 12:32:27 -03:00
Ganesh Ajjanagadde
92fabca427 avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.
Header guards are too brittle and ugly for this case.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-20 04:00:42 +02:00
Ganesh Ajjanagadde
e681baf638 avcodec/x86/mpegvideoenc: silence -Wunused-function on --disable-mmx
This silences -Wunused-function when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-19 23:26:57 +02:00
Ganesh Ajjanagadde
f0c635f577 avcodec/x86/hpeldsp_init: silence -Wunused-function on --disable-mmx
This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g
http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-19 23:10:52 +02:00
James Almer
6f9ba0cb82 x86/vp9dsp: add missing preprocessor guards
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-19 13:33:53 -03:00
James Almer
e47564828b x86/vp9mc: add missing preprocessor guards
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-18 15:14:53 -03:00
James Almer
2f9ab15960 x86/vp9: add avx2 subpel MC SIMD for 10/12bpp
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-18 12:28:55 -03:00
Michael Niedermayer
58fe57d5a0 avcodec/mpeg12enc: Basic support for encoding non even QPs for -non_linear_quant 1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-18 02:52:57 +02:00
Michael Niedermayer
2d35757814 avcodec/mpegvideo: Change mpeg2 unquant to work with higher precision qscale
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-18 02:39:17 +02:00
Ronald S. Bultje
344d519040 vp9: add subpel MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Ronald S. Bultje
77f359670f vp9: add fullpel (avg) MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
Ronald S. Bultje
6354ff0383 vp9: add fullpel (put) MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00