1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00
Commit Graph

564 Commits

Author SHA1 Message Date
James Almer
d68c05380c x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
James Almer
f7cafb5d02 x86: add AV_CPU_FLAG_AVXSLOW flag
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
Timothy Gu
dd4d709be7 x86inc: Clear __SECT__
Silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on
    section redeclaration

The cause of this warning is that because `struc` and `endstruc`
attempts to revert to the previous section state [1].

The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].

Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-28 11:40:15 +02:00
Timothy Gu
204b228a1d x86inc: Clear __SECT__
This commit silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on section
    redeclaration

The cause of this warning is that because `struc` and `endstruc` attempts to
revert to the previous section state [1]. The section state is stored in the
macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the
`SECTION` directive [2].  Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 00:08:37 +02:00
James Almer
c312bfac4c x86/cpu: add AV_CPU_FLAG_AVXSLOW flag
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-05-27 03:31:11 -03:00
Michael Niedermayer
d630f38f47 avutil/x86/Makefile: fix conditional x86/emms.o build
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-09 01:12:51 +02:00
Ronald S. Bultje
b926f02e81 avutil/x86/Makefile: Make building and linking of emms.c conditional
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-08 17:25:35 +02:00
James Almer
60b9373dbd libavutil: add bmi2 optimized av_mod_uintp2
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 15:47:43 -03:00
Peter Cordes
9e5687adf2 pixelutils: Comment on (lack of) sad_8x8_sse2
Signed-off-by: Peter Cordes <peter@cordes.ca>
2015-03-04 21:58:53 +01:00
James Almer
bc65abc8d7 libavutil: add x86 optimized av_popcount
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-25 19:58:00 -03:00
Christophe Gisquet
d9293c776e x86inc: Correctly warn on use of SSE2 instructions in SSE functions
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 12:35:58 +01:00
Christophe Gisquet
e93d3a22cb x86: lavu/x264asm: fix ymm register instantiation
This mimicks what is done for the other instruction sets.

Tested-by: James Almer <jamrial@gmail.com>
Tested-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-04 00:18:29 +01:00
James Darnley
12120174ce lavu/x86/x86inc: deprecate INIT_AVX
The same can be done with INIT_XMM avx

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-02 01:09:16 +01:00
Anton Mitrofanov
a1684311b3 x264asm: warn when inappropriate instruction used in function with specified cpuflags
Requested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Requested-by: "Ronald S. Bultje" <rsbultje@gmail.com>
2015-02-02 00:06:14 +01:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
Kieran Kunhya
9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Kieran Kunhya
36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Michael Niedermayer
579a0fdc21 avutil/lls: Make unchanged function arguments const
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-28 19:32:07 +02:00
lvqcl
e58fc44649 avutil/x86/cpu: fix cpuid sub-leaf selection
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-27 13:21:31 +02:00
Henrik Gramner
f629705b02 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:25 -07:00
Loren Merritt
ec217218c2 x86inc: Free up variable name "n" in global namespace
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:19 -07:00
Henrik Gramner
176a0fca3f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 01:45:14 -07:00
Henrik Gramner
428aa14a48 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 14:06:03 +02:00
Henrik Gramner
720c21d11f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:55:28 +02:00
Loren Merritt
a4dbabc8b3 x86inc: free up variable name "n" in global namespace
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:41:50 +02:00
Clément Bœsch
554d819062 avutil/pixelutils: faster pixelutils_sad_16x16
501 to 439 decicycles.

See 45c7f3997e.
2014-08-23 20:12:56 +02:00
Clément Bœsch
45c7f3997e avutil/pixelutils: faster pixelutils_sad_[au]_16x16
~560 → ~500 decicycles

This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html

Using 2 registers for accumulator didn't help. On the other hand,
some re-ordering between the movs and psadbw allowed going ~538 to ~500.
2014-08-23 10:18:53 +02:00
Michael Niedermayer
70b8668fb5 drop LLS1, rename LLS2 to LLS
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-09 23:20:31 +02:00
Clément Bœsch
28a2107a8d avutil: add pixelutils API 2014-08-05 21:05:52 +02:00
James Almer
d0f56ca071 x86/hevc_deblock: improve 8bit transpose store macros
Up to four instructions less depending on function and instruction set.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-03 04:24:15 +02:00
James Almer
1ace9573dc x86/hevc_idct: replace old and unused idct functions
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).

Benchmarks on an Intel Core i5-4200U:

idct8x8_dc
       SSE2   MMXEXT  C
cycles 22     26      57

idct16x16_dc
       AVX2   SSE2    C
cycles 27     32      249

idct32x32_dc
       AVX2   SSE2    C
cycles 62     126     1375

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 18:00:11 +02:00
Michael Niedermayer
8d0c7031a8 Merge commit '79793f833784121d574454af4871866576c0749d'
* commit '79793f833784121d574454af4871866576c0749d':
  Update Fiona's name in copyright statements.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:43:40 +02:00
Diego Biurrun
79793f8337 Update Fiona's name in copyright statements. 2014-07-01 03:26:51 -07:00
Christophe Gisquet
9107612818 x86util: add and use RSHIFT/LSHIFT macros
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:19:27 +02:00
James Almer
85065d2a7c x86/float_dsp: add missing femms
It was lost during the port.
Should fix fate on 3dnowext machines.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 20:06:28 +02:00
James Almer
dcaf9660b6 x86/float_dsp: port vector_fmul_window to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 12:41:32 +02:00
James Almer
fc8db12a73 x86/vp9: inital AVX2 intra_pred
tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz

1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips

3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips

1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips

2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips

3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips

1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 02:37:20 +02:00
Christophe Gisquet
2267003981 x86: hpeldsp: better factorization
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 21:47:40 +02:00
James Almer
561bfc85eb x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 23:29:34 +02:00
Matt Oliver
1898c2f49d inline asm: fix arrays as named constraints.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-07 15:02:45 +02:00
James Almer
3b06208a57 x86/float_dsp: remove duplicated code from vector_dmul_scalar
Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
No actual changes to the assembly.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-19 14:21:51 +02:00
James Almer
76ed71a72b x86: move horizontal add macros to x86util
Also port relevant AVX2/XOP optimizations from x264 with permission
to relicense to LGPL from the corresponding authors

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-17 14:15:09 +02:00
James Almer
11b36b1ee0 x86/float_dsp: unroll loop in vector_fmac_scalar
~6% faster SSE2 performance. AVX/FMA3 are unaffected.

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-16 18:36:52 +02:00
James Almer
3b808900af x86/float_dsp: use SWAP in vector_fmac_scalar Win64
The mova is unnecessary

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-16 15:46:21 +02:00
James Almer
2d9821a208 x86/cpu: check for OS support before enabling AVX2
AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 17:56:43 +01:00
Matt Oliver
8236747511 Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported.
This is part of the patch-set for intel C inline asm on windows support

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-18 23:39:30 +01:00
James Almer
7d7487e85c x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3
~7% faster than AVX

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 04:34:05 +01:00
Michael Niedermayer
4159f702a7 avutil/timer: Fix units for x86 after c708b54033
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-09 15:22:02 +01:00
James Almer
3f3d748cab x86: Move XOP emulation to x86util
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-24 08:30:19 +01:00
Michael Niedermayer
bd8d73ea8b Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: add detection for Bit Manipulation Instruction sets

Conflicts:
	libavutil/x86/cpu.c

See: 0bc3de19ff
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-23 22:52:58 +01:00
Michael Niedermayer
d9574069c1 Merge commit '1b932eb1508f550fac9e911923a0383efda53aa3'
* commit '1b932eb1508f550fac9e911923a0383efda53aa3':
  x86: add detection for FMA3 instruction set

Conflicts:
	configure
	libavutil/cpu.h
	libavutil/x86/cpu.c

See: a2af8eddab
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-23 22:43:08 +01:00
James Almer
d59fcdaff3 x86: add detection for Bit Manipulation Instruction sets
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
2014-02-23 15:29:36 +01:00
James Almer
1b932eb150 x86: add detection for FMA3 instruction set
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
2014-02-23 15:29:36 +01:00
James Almer
10b0161d78 x86: add missing XOP checks and macros
Signed-off-by: James Almer <jamrial@gmail.com>
2014-02-23 15:29:36 +01:00
James Almer
0bc3de19ff x86: add detection for Bit Manipulation Instruction sets
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-22 17:26:00 +01:00
James Almer
a2af8eddab x86: add detection for FMA3 instruction set
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-22 17:25:52 +01:00
Christophe Gisquet
996697e266 x86: float dsp: unroll SSE versions
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.

Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-20 14:18:05 +01:00
Christophe Gisquet
133b34207c x86: float dsp: unroll SSE versions
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.

Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-15 18:54:21 +01:00
James Almer
23a8c63452 x86inc: Extend FMA_INSTR functionality
Support the cases where the first and last operand of
the XOP instruction are the same.

Also add vpmacsdql emulation.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-13 22:14:24 +01:00
James Almer
6c12b1de06 x86: add missing XOP checks and macros
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-11 03:46:52 +01:00
Loren Merritt
b7d0d10a1d x86inc: Speed up assembling with Yasm
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-01-26 18:40:08 +01:00
Loren Merritt
4d55fe7204 x86inc: speed up compilation with yasm
Work around yasm's inefficiency with handling large numbers of variables
in the global scope.
2014-01-18 01:19:16 +01:00
Michael Niedermayer
c3814ab654 rename new lls code to lls2 to avoid conflict with the old which has a different ABI
also remove failed attempt at a compatibility layer, the code simply cannot work

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-17 16:41:08 +01:00
Michael Niedermayer
bbe66ef912 avutil: rename lls to lls2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-17 16:30:23 +01:00
Michael Niedermayer
a665704402 Merge commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4'
* commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4':
  libavutil: x86: Add AVX2 capable CPU detection.

Conflicts:
	libavutil/cpu.c
	libavutil/cpu.h
	libavutil/x86/cpu.c

See: 865b70bc5d
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:36:36 +02:00
Kieran Kunhya
865b70bc5d Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:34:22 +02:00
Kieran Kunhya
4d6ee07255 libavutil: x86: Add AVX2 capable CPU detection.
Patch based on x264's AVX2 detection

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-25 19:36:55 +01:00
Michael Niedermayer
f9bef2bec9 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: more AVX2 framework

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:13:57 +02:00
Michael Niedermayer
e3e0e3d0c9 Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497'
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
  x86inc: FMA3/4 Support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:06:22 +02:00
Michael Niedermayer
9ac124c889 Merge commit '206895708ea2b464755d340e44501daf9a07c310'
* commit '206895708ea2b464755d340e44501daf9a07c310':
  x86inc: Remove our FMA4 support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:54:23 +02:00
Michael Niedermayer
12e4493f9c Merge commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098'
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
  x86inc: Use VEX-encoded instructions in AVX functions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:48:34 +02:00
Jason Garrett-Glaser
a3fabc6cb3 x86: more AVX2 framework
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser
c6908d6b4b x86inc: FMA3/4 Support
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:54 +01:00
Derek Buitenhuis
206895708e x86inc: Remove our FMA4 support
This is so we can sync to x264's version of FMA4 support.

This partialy reverts commit 79687079a9.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:39:29 +01:00
Henrik Gramner
c108ba0175 x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.

This change makes it easier to extend existing code to use AVX2.

Also add support for AVX emulation of a few instructions that
were missing before.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
Michael Niedermayer
31d0d35560 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86inc: Remove .rodata kludges

Conflicts:
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-09 14:29:42 +02:00
Henrik Gramner
ad7d7d4f6a x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Michael Niedermayer
19c3890819 Merge commit '3e2fa991db7ef172579422accd61624d52777e5a'
* commit '3e2fa991db7ef172579422accd61624d52777e5a':
  x86inc: remove misaligned cpu flag

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:02:21 +02:00
Michael Niedermayer
31d9aa6b2e Merge commit '71155665414b551ad350622d5abed20e58371fbf'
* commit '71155665414b551ad350622d5abed20e58371fbf':
  x86inc: various minor backports from x264

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:57:39 +02:00
Michael Niedermayer
3f965ab95d Merge commit '47f9d7ce5493e119e09d1227d017414feaaf8d97'
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
  x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:37:22 +02:00
Michael Niedermayer
1f17619fe4 Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450'
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
  x86inc: Utilize the shadow space on 64-bit Windows

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:23:00 +02:00
Michael Niedermayer
17d9c7c208 Merge commit '3fb78e99a04d0ed8db834d813d933eb86c37142a'
* commit '3fb78e99a04d0ed8db834d813d933eb86c37142a':
  x86inc: create xm# and ym#, analagous to m#

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:15:17 +02:00
Michael Niedermayer
3352fdb292 Merge commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2'
* commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2':
  x86inc: fix some corner cases of SWAP

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:07:03 +02:00
Michael Niedermayer
006c0fcfea Merge commit '63f0d623100bdb0c6081456127f4b6713e83d3db'
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
  x86inc: Use SSE instead of SSE2 for copying data

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:01:40 +02:00
Michael Niedermayer
faafffaf82 Merge commit 'ad76e6e7e193b98e7335156422d35467816f9ef1'
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
  x86inc: Set ELF hidden visibility for global constants

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:52:51 +02:00
Michael Niedermayer
c1488fab3d Merge commit '25cb0c1a1e66edacc1667acf6818f524c0997f10'
* commit '25cb0c1a1e66edacc1667acf6818f524c0997f10':
  x86inc: activate REP_RET automatically

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:27:30 +02:00
Henrik Gramner
3e2fa991db x86inc: remove misaligned cpu flag
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser
7115566541 x86inc: various minor backports from x264
Small backports that sneaked into other asm commits in x264.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis
47f9d7ce54 x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner
bbe4a6db44 x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt
3fb78e99a0 x86inc: create xm# and ym#, analagous to m#
For when we want to mix simd sizes within one function.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt
49ebe3f9fe x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
Henrik Gramner
63f0d62310 x86inc: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
Henrik Gramner
ad76e6e7e1 x86inc: Set ELF hidden visibility for global constants
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt
25cb0c1a1e x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.

The implementation involves lots of spurious labels, but that's OK
because we strip them.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Ronald S. Bultje
c07ac8d467 VP9 MC (ssse3) optimizations.
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
2013-10-02 21:03:15 -04:00
Michael Niedermayer
361bc70731 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  avutil: Fix compilation with inline asm disabled on mingw

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-09-22 11:51:38 +02:00
Alex Smith
08fa828b3f avutil: Fix compilation with inline asm disabled on mingw
Because of -Werror=implicit-function-declaration the build will fail.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-22 00:50:32 +03:00
Thilo Borgmann
d814a839ac Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
Michael Niedermayer
f0a3562382 Merge commit '79aec43ce813a3e270743ca64fa3f31fa43df80b'
* commit '79aec43ce813a3e270743ca64fa3f31fa43df80b':
  x86: Add and use more convenience macros to check CPU extension availability

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-30 11:57:35 +02:00
Michael Niedermayer
2a60666d1d Merge commit '8410d6e93c2e074881f1c7b7e4cdefd2e497d52e'
* commit '8410d6e93c2e074881f1c7b7e4cdefd2e497d52e':
  avutil: Refactor CPU extension availability macros

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:15:10 +02:00
Michael Niedermayer
c83d794936 Merge commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b'
* commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b':
  avutil: Move internal CPU detection function declarations to private header

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:05:15 +02:00
Diego Biurrun
79aec43ce8 x86: Add and use more convenience macros to check CPU extension availability 2013-08-29 13:07:37 +02:00
Diego Biurrun
8410d6e93c avutil: Refactor CPU extension availability macros 2013-08-28 23:54:14 +02:00
Diego Biurrun
b78b10c4b7 avutil: Move internal CPU detection function declarations to private header 2013-08-28 23:54:14 +02:00
Michael Niedermayer
9d01bf7d66 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Consistently use "cpu_flags" as variable/parameter name for CPU flags

Conflicts:
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/h264dsp_init.c
	libavcodec/x86/hpeldsp_init.c
	libavcodec/x86/motion_est.c
	libavcodec/x86/mpegvideo.c
	libavcodec/x86/proresdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18 09:53:47 +02:00
Diego Biurrun
3ac7fa81b2 Consistently use "cpu_flags" as variable/parameter name for CPU flags 2013-07-18 00:31:35 +02:00
Michael Niedermayer
a478e99a60 avutil/x86: reenable ff_update_lls_avx()
The bug has been fixed in c8b920a9b7 by Loren Merritt

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 12:02:08 +02:00
Michael Niedermayer
d1fa671895 Merge commit 'c8b920a9b7fa534a6141695ace4e8c2dfcd56cee'
* commit 'c8b920a9b7fa534a6141695ace4e8c2dfcd56cee':
  lls/x86: use 3-operator vaddpd in ADDPD_MEM

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 11:40:44 +02:00
Loren Merritt
c8b920a9b7 lls/x86: use 3-operator vaddpd in ADDPD_MEM
Fixes build with yasm-1.1

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-07-02 10:15:09 +02:00
Michael Niedermayer
a6e46ed51a Revert "avutil/x86: disable ff_evaluate_lls_sse2() for 32bit"
This reverts commit 247425241c.
2013-07-01 02:27:47 +02:00
Michael Niedermayer
4e488ac5f5 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: lpc: fix a segfault in av_evaluate_lls_sse2()

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-01 02:26:22 +02:00
Loren Merritt
1221bb6239 x86: lpc: fix a segfault in av_evaluate_lls_sse2() 2013-06-30 23:11:19 +00:00
Michael Niedermayer
247425241c avutil/x86: disable ff_evaluate_lls_sse2() for 32bit
It just segfaults on 32bit, thus its disabled until someone fixes it.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 19:03:57 +02:00
Michael Niedermayer
6e76e6a05a Merge commit 'b545179fdff1ccfbbb9d422e4e9720cb6c6d9191'
* commit 'b545179fdff1ccfbbb9d422e4e9720cb6c6d9191':
  x86: lpc: simd av_evaluate_lls

Conflicts:
	libavutil/x86/lls.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 12:15:12 +02:00
Michael Niedermayer
a285079bc7 lls.asm: disable ff_update_lls_avx
The code doesnt build with yasm from ubuntu 12.04

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 12:12:11 +02:00
Michael Niedermayer
0b40c50508 lls.asm: put avx code under if HAVE_AVX_EXTERNAL
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 12:12:01 +02:00
Michael Niedermayer
78b5479633 Merge commit '502ab21af0ca68f76d6112722c46d2f35c004053'
* commit '502ab21af0ca68f76d6112722c46d2f35c004053':
  x86: lpc: simd av_update_lls

The versions are bumped due to changes in lls.h which is used across
libraries affecting intra library ABI
(This version bump also covers changes to lls.h in the immedeatly previous
 commits)

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 11:35:52 +02:00
Loren Merritt
b545179fdf x86: lpc: simd av_evaluate_lls
1.5x-1.8x faster on sandybridge

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Loren Merritt
502ab21af0 x86: lpc: simd av_update_lls
4x-6x faster on sandybridge

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Michael Niedermayer
3c200aa693 Merge commit '1fda184a85178cfd7b98d9e308d18e1ded76a511'
* commit '1fda184a85178cfd7b98d9e308d18e1ded76a511':
  avutil: Add av_cold attributes to init functions missing them

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-05 12:53:50 +02:00
Diego Biurrun
1fda184a85 avutil: Add av_cold attributes to init functions missing them 2013-05-04 22:48:05 +02:00
Michael Niedermayer
e91339cde2 Merge commit '566b7a20fd0cab44d344329538d314454a0bcc2f'
* commit '566b7a20fd0cab44d344329538d314454a0bcc2f':
  x86: float dsp: butterflies_float SSE

Conflicts:
	libavutil/x86/float_dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-03 11:57:59 +02:00
Christophe Gisquet
566b7a20fd x86: float dsp: butterflies_float SSE
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.
2013-05-03 08:08:02 +02:00
Michael Niedermayer
92218aad00 butterflies_float: replace 2 lea by 2 add
adds are simpler instructions and should be faster or equally fast
on all cpus

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-17 00:10:06 +02:00
Christophe Gisquet
1a4007964c x86: float dsp: butterflies_float SSE
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-17 00:03:25 +02:00
Ronald S. Bultje
b93b27edb0 dsputil: Make dsputil selectable
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-10 11:04:05 +03:00
Christophe Gisquet
2e81acc687 x86inc: Fix number of operands for cmp* instructions
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-09 23:55:30 +02:00
Christophe Gisquet
0b467a6e83 x264asm: fix cmp* number of arguments
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-05 16:42:12 +02:00
Michael Niedermayer
63a97d5674 Merge commit 'b6649ab5037fb55f78c2606f3d23cea0867cdeaa'
* commit 'b6649ab5037fb55f78c2606f3d23cea0867cdeaa':
  cosmetics: Remove unnecessary extern keywords from function declarations

Conflicts:
	libswscale/x86/swscale.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-28 11:20:41 +01:00
Diego Biurrun
b6649ab503 cosmetics: Remove unnecessary extern keywords from function declarations 2013-03-27 14:21:45 +01:00
Ronald S. Bultje
6a701306db dsputil: make selectable.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-12 19:56:58 +01:00
Ronald S. Bultje
0c0828ecc5 x86: Use simple nop codes for <= sse (rather than <= mmx)
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-19 22:33:19 +02:00
James Almer
a56fd9edab lavu: Fix checkheaders for x86/emms.h
internal.h doesn't need to include cpu.h anymore since
the relevant code was moved to x86/emms.h

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-17 00:18:16 +01:00
Michael Niedermayer
61fbb4cd57 Merge commit '4db96649ca700db563d9da4ebe70bf9fc4c7a6ba'
* commit '4db96649ca700db563d9da4ebe70bf9fc4c7a6ba':
  avutil: Ensure that emms_c is always defined, even on non-x86
  configure: Move MinGW CPPFLAGS setting to libc section, where it belongs
  avutil: Move emms code to x86-specific header

Conflicts:
	configure
	libavutil/internal.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-15 12:10:08 +01:00
Diego Biurrun
4db96649ca avutil: Ensure that emms_c is always defined, even on non-x86 2013-02-14 19:29:04 +01:00
Diego Biurrun
ab441e20ff avutil: Move emms code to x86-specific header 2013-02-14 17:37:34 +01:00
Ronald S. Bultje
b582af1ed7 Use simple nop codes for <= sse (rather than <= mmx).
The "CPU: CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.

Change-Id: I7e7c52a2191006df30a9aadbc40d481a1db89106
2013-02-11 23:38:57 +01:00
Michael Niedermayer
8102f27b5b Merge commit '73b704ac609d83e0be124589f24efd9b94947cf9'
* commit '73b704ac609d83e0be124589f24efd9b94947cf9':
  arm: Add some missing header #includes
  floatdsp: move scalarproduct_float from dsputil to avfloatdsp.

Conflicts:
	libavcodec/acelp_pitch_delay.c
	libavcodec/amrnbdec.c
	libavcodec/amrwbdec.c
	libavcodec/ra288.c
	libavcodec/x86/dsputil_mmx.c
	libavutil/x86/float_dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-23 14:31:55 +01:00
Michael Niedermayer
6e6e170898 Merge commit '42d324694883cdf1fff1612ac70fa403692a1ad4'
* commit '42d324694883cdf1fff1612ac70fa403692a1ad4':
  floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.

Conflicts:
	libavcodec/arm/dsputil_init_vfp.c
	libavcodec/arm/dsputil_vfp.S
	libavcodec/dsputil.c
	libavcodec/ppc/float_altivec.c
	libavcodec/x86/dsputil.asm
	libavutil/x86/float_dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-23 14:04:50 +01:00
Michael Niedermayer
b1b870fbd7 Merge commit '55aa03b9f8f11ebb7535424cc0e5635558590f49'
* commit '55aa03b9f8f11ebb7535424cc0e5635558590f49':
  floatdsp: move vector_fmul_add from dsputil to avfloatdsp.

Conflicts:
	libavcodec/dsputil.c
	libavcodec/x86/dsputil.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-23 13:54:34 +01:00
Ronald S. Bultje
42d3246948 floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
55aa03b9f8 floatdsp: move vector_fmul_add from dsputil to avfloatdsp. 2013-01-22 11:55:42 -08:00
Ronald S. Bultje
d56668bd80 floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
This makes the aac decoder and all voice codecs independent of dsputil.
2013-01-22 11:55:42 -08:00
Michael Niedermayer
ed8ff70d9e Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: dsputil: Drop some unused macro definitions
  x86: Add a Yasm-based emms() replacement

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-19 13:20:25 +01:00
Michael Niedermayer
b45e0c2573 Merge commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d'
* commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d':
  x86inc: Add cvisible macro for C functions with public prefix

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-19 13:11:41 +01:00
Michael Niedermayer
1b03e09198 Merge commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b'
* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b':
  x86inc: Rename "program_name" to "private_prefix"
  configure: Run SHFLAGS through ldflags_filter()

Conflicts:
	configure

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-19 13:01:06 +01:00
Martin Storsjö
f4facd2ce7 x86: Add a Yasm-based emms() replacement
This provides a fallback when building with Yasm enabled, but neither
inline assembly, nor the _mm_empty intrinsic are available or enabled.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:13 +01:00
Diego Biurrun
d633d12b2c x86inc: Add cvisible macro for C functions with public prefix
This allows defining externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:03 +01:00
Diego Biurrun
ef5d41a553 x86inc: Rename "program_name" to "private_prefix"
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 20:29:53 +01:00
Michael Niedermayer
17596198ca Merge commit '80ac87c13dc8c6c063e26a464c5c542357c0583f'
* commit '80ac87c13dc8c6c063e26a464c5c542357c0583f':
  lavc: support ZenoXVID custom tag
  libcdio: support recent cdio-paranoia
  float_dsp: Add #ifdef HAVE_INLINE_ASM around vector_fmul_window
  theora: Skip zero-sized headers

Conflicts:
	configure

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-18 13:36:39 +01:00
Martin Storsjö
973b4d44f1 float_dsp: Add #ifdef HAVE_INLINE_ASM around vector_fmul_window
This fixes builds on 64bit MSVC.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-01-17 19:07:35 +02:00
Michael Niedermayer
5c7e9e16c9 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  lavc: Move vector_fmul_window to AVFloatDSPContext
  rtpdec_mpeg4: Check the remaining amount of data before reading

Conflicts:
	libavcodec/dsputil.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-16 12:38:41 +01:00
Michael Niedermayer
68f92a70f1 Merge commit 'dae1d507af94261bafd3b11549884e5d1eca590e'
* commit 'dae1d507af94261bafd3b11549884e5d1eca590e':
  x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
  vf_fps: add final flushed frames to the dropped frame count
  rv34_parser: Adjust #if for disabling individual parsers

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-16 11:44:45 +01:00
Justin Ruggles
e034cc6c60 lavc: Move vector_fmul_window to AVFloatDSPContext
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-16 10:45:45 +01:00
Diego Biurrun
dae1d507af x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags 2013-01-15 17:29:43 +01:00
Michael Niedermayer
b7ede94bbd Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: ABSB2: port to cpuflags

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-15 16:16:18 +01:00
Michael Niedermayer
77041e2474 Merge commit '094a7405e5d8463d7d167d893e04934ec1a84ecd'
* commit '094a7405e5d8463d7d167d893e04934ec1a84ecd':
  x86: ABSB: port to cpuflags
  sdp: Include SRTP crypto params if using the srtp protocol

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-15 16:12:24 +01:00
Michael Niedermayer
cfc40a6aff Merge commit 'd8c772de53d29afb1bada88afa859fce8489c668'
* commit 'd8c772de53d29afb1bada88afa859fce8489c668':
  nutdec: Always return a value from nut_read_timestamp()
  configure: Make warnings from -Wreturn-type fatal errors
  x86: ABS2: port to cpuflags
  vdpau: Remove av_unused attribute from function declaration
  h264: fix ff_generate_sliding_window_mmcos() prototype.

Conflicts:
	configure
	libavformat/nutdec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-15 15:23:20 +01:00
Diego Biurrun
320e1d0df3 x86: ABSB2: port to cpuflags 2013-01-15 11:18:51 +01:00
Diego Biurrun
094a7405e5 x86: ABSB: port to cpuflags 2013-01-15 11:18:51 +01:00
Diego Biurrun
51969a652c x86: ABS2: port to cpuflags 2013-01-14 21:56:55 +01:00
Michael Niedermayer
ea93ccf079 Merge commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0'
* commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0':
  x86: ABS1: port to cpuflags
  v210x: cosmetics, reformat

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-07 01:35:18 +01:00
Diego Biurrun
5b4dfbffc2 x86: ABS1: port to cpuflags 2013-01-06 13:57:01 +01:00
Michael Niedermayer
7e90053822 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mpegvideo: increase edge_emu_buffer size for VC1
  lavc: merge latest x86inc.asm fixes with x264

Conflicts:
	libavcodec/mpegvideo.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-20 02:51:35 +01:00
Ronald S. Bultje
a34d9ad969 lavc: merge latest x86inc.asm fixes with x264
Unbreak NASM support.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-19 07:27:33 +01:00
Michael Niedermayer
a01fe55077 Merge commit 'c0dc57f1264dad1e121772d03abdb9e14ed8857f'
* commit 'c0dc57f1264dad1e121772d03abdb9e14ed8857f':
  asyncts: merge two conditions
  x86inc: fully concatenate tokens to fix macro expansion for nasm
  h264: initialize frame-mt context copies properly

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-14 15:43:46 +01:00
Janne Grunau
0995ad8db4 x86inc: fully concatenate tokens to fix macro expansion for nasm
Fixes build errors with nasm introduced in 6f40e9f070 for stack
memory alignment. Noticed by BugMaster.
2012-12-13 23:57:09 +01:00
Michael Niedermayer
7897919a88 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  aacdec: Fix an off-by-one overwrite when switching to LTP profile from MAIN.
  x86inc: fix stack alignment on win64
  rtpproto: Remove unused defines

Conflicts:
	libavcodec/aacdec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-13 12:23:48 +01:00
Ronald S. Bultje
140367aff9 x86inc: fix stack alignment on win64
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-12-12 21:30:49 +02:00
Ronald S. Bultje
ce58642ed0 x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-12 10:37:52 +01:00
Ronald S. Bultje
6f40e9f070 x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Michael Niedermayer
5c076205a6 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  golomb: use unsigned arithmetics in svq3_get_ue_golomb()
  x86: float_dsp: fix loading of the len parameter on x86-32
  takdec: fix initialisation of LOCAL_ALIGNED array
  takdec: fix initialisation of LOCAL_ALIGNED array

Conflicts:
	libavcodec/rv30.c
	libavcodec/svq3.c
	libavcodec/takdec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-08 16:36:47 +01:00
Justin Ruggles
1c012e6bfb x86: float_dsp: fix loading of the len parameter on x86-32 2012-12-07 21:19:29 -05:00
Michael Niedermayer
af164d7d9f Merge commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152'
* commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152':
  fate: dpcm: Add dependencies
  SBR DSP x86: implement SSE sbr_hf_gen
  AAC SBR: use AVFloatDSPContext's vector_fmul
  fate: image: Add dependencies
  Changelog: add an entry for deprecating the avconv -vol option
  x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32

Conflicts:
	Changelog
	libavutil/x86/float_dsp.asm
	tests/fate/image.mak

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-07 15:21:41 +01:00
Michael Niedermayer
54a71f2e6c Merge commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56'
* commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56':
  pixdesc: fix yuva 10bit bit depth
  avconv: deprecate the -vol option
  x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
  x86: af_volume: add SSE2-optimized s16 volume scaling

Conflicts:
	ffmpeg.c
	tests/ref/lavfi/pixdesc
	tests/ref/lavfi/pixfmts_copy
	tests/ref/lavfi/pixfmts_null
	tests/ref/lavfi/pixfmts_scale
	tests/ref/lavfi/pixfmts_vflip

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-06 15:55:47 +01:00
Michael Niedermayer
15784c2bab Merge commit '9d5c62ba5b586c80af508b5914934b1c439f6652'
* commit '9d5c62ba5b586c80af508b5914934b1c439f6652':
  lavu/opt: do not filter out the initial sign character except for flags
  eval: treat dB as decibels instead of decibytes
  float_dsp: add vector_dmul_scalar() to multiply a vector of doubles

Conflicts:
	libavutil/eval.c
	tests/ref/fate/eval

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-06 14:33:38 +01:00
Justin Ruggles
ecc8b02194 x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-12-06 14:11:15 +01:00
Justin Ruggles
b30a363331 x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling 2012-12-05 11:23:37 -05:00
Justin Ruggles
ac7eb4cb20 float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
Include x86-optimized versions for SSE2 and AVX.
2012-12-05 11:23:36 -05:00
Michael Niedermayer
42d3fea65f Merge commit 'af7d13ee4a4bf8d708f9b0598abb8f6e22b76de1'
* commit 'af7d13ee4a4bf8d708f9b0598abb8f6e22b76de1':
  asink_nullsink: plug a memory leak.
  x86: h264_idct: port to cpuflags
  x86: cpu: Drop unused HAVE_RWEFLAGS condition

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-28 13:32:17 +01:00
Diego Biurrun
490df522c7 x86: cpu: Drop unused HAVE_RWEFLAGS condition
The test for rweflags was dropped in a previous commit.
2012-11-28 00:28:09 +01:00
Michael Niedermayer
b4d4e51027 Merge commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9'
* commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9':
  riff: only warn on a bad INFO chunk code size instead of failing
  configure: Add separate list for libraries and use where appropriate
  x86: float_dsp: add SSE version of vector_fmul_scalar()

Conflicts:
	configure
	libavformat/riff.c
	libavutil/x86/float_dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-27 14:10:05 +01:00
Justin Ruggles
947f933687 x86: float_dsp: add SSE version of vector_fmul_scalar() 2012-11-26 11:30:19 -05:00
Michael Niedermayer
e6d81ce22e Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: h264_intrapred: Fix C function names in comments
  x86: SPLATD: port to cpuflags

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-19 14:24:20 +01:00
Diego Biurrun
87af05c575 x86: SPLATD: port to cpuflags 2012-11-18 18:34:05 +01:00
Michael Niedermayer
a1b5c9634e Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: mmx2 ---> mmxext in asm constructs

Conflicts:
	libavcodec/x86/h264_chromamc_10bit.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264dsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-14 12:34:30 +01:00
Diego Biurrun
26301caaa1 x86: mmx2 ---> mmxext in asm constructs 2012-11-14 00:58:51 +01:00
Michael Niedermayer
da501ea857 Merge commit '802713c4e7b41bc2deed754d78649945c3442063'
* commit '802713c4e7b41bc2deed754d78649945c3442063':
  mss2: prevent potential uninitialized reads
  mss2: reindent after last commit
  mss2: fix handling of unmasked implicit WMV9 rectangles
  configure: add lavu dependency to lavr/lavfi .pc files
  x86inc: Set program_name outside of x86inc.asm

Conflicts:
	configure

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-12 10:57:06 +01:00
Diego Biurrun
2b479bcab0 build: Drop AVX assembly ifdefs
An assembler able to cope with AVX instructions is now required.
2012-11-11 20:43:28 +01:00
Diego Biurrun
f0d124f005 x86inc: Set program_name outside of x86inc.asm
This reduces the local difference to the x264 upstream version.
2012-11-11 11:06:19 +01:00
Michael Niedermayer
2ce64413e2 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: PALIGNR: port to cpuflags
  x86: h264_qpel_10bit: port to cpuflags

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-10 12:44:39 +01:00
Diego Biurrun
4b60fac419 x86: PALIGNR: port to cpuflags 2012-11-09 21:31:31 +01:00
Michael Niedermayer
e859339e7a Merge commit '930e26a3ea9d223e04bac4cdde13697cec770031'
* commit '930e26a3ea9d223e04bac4cdde13697cec770031':
  x86: h264qpel: Only define mmxext QPEL functions if H264QPEL is enabled
  x86: PABSW: port to cpuflags
  x86: vc1dsp: port to cpuflags
  rtmp: Use av_strlcat instead of strncat

Conflicts:
	libavcodec/x86/h264_qpel.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-05 22:36:05 +01:00
Diego Biurrun
dbb37e7711 x86: PABSW: port to cpuflags 2012-11-05 14:51:10 +01:00
Michael Niedermayer
37e81996dc Merge commit '9221efef7968463f3e3d9ce79ea72eaca082e73f'
* commit '9221efef7968463f3e3d9ce79ea72eaca082e73f':
  lavf: fix av_interleaved_write_frame() doxy.
  lavf: clarify the lifetime of demuxed packets.
  avconv: do not free muxed packet on streamcopy.
  crc: move doxy to the header
  vf_drawtext: do not use deprecated av_tree_node_size
  x86: Refactor PSWAPD fallback implementations and port to cpuflags

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-03 14:24:11 +01:00
Michael Niedermayer
1885ffb03d Merge commit '9a07c1332cfe092b57b5758f22b686ca58806c60'
* commit '9a07c1332cfe092b57b5758f22b686ca58806c60':
  parser: Move Doxygen documentation to the header files
  PGS subtitles: Expose forced flag
  x86: PMINUB: port to cpuflags

Conflicts:
	libavcodec/avcodec.h
	libavcodec/pgssubdec.c
	libavcodec/version.h
	libavcodec/x86/ac3dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-03 14:13:45 +01:00
Michael Niedermayer
1dad486714 Merge commit '9ce02e14f01de50fcc6f7f459544b140be66d615'
* commit '9ce02e14f01de50fcc6f7f459544b140be66d615':
  x86: ac3dsp: port to cpuflags
  x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
  x86inc: Only define program_name if the macro is unset

Conflicts:
	libavcodec/x86/ac3dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-03 13:38:38 +01:00
Diego Biurrun
0a7a94f2e5 x86: Refactor PSWAPD fallback implementations and port to cpuflags 2012-11-02 17:05:29 +01:00
Diego Biurrun
26f01bd106 x86: PMINUB: port to cpuflags 2012-11-02 15:38:15 +01:00
Diego Biurrun
61bc2bc7d4 x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
"mmxext" is a more sensible name and more common in outside projects.
2012-11-02 15:22:34 +01:00
Diego Biurrun
012f73e271 x86inc: Only define program_name if the macro is unset
This allows overriding the value from outside of the file.
2012-11-02 14:38:00 +01:00
Michael Niedermayer
28c0678eb7 Merge commit 'be923ed659016350592acb9b3346f706f8170ac5'
* commit 'be923ed659016350592acb9b3346f706f8170ac5':
  x86: fmtconvert: port to cpuflags
  x86: MMX2 ---> MMXEXT in macro names

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 14:16:18 +01:00
Dave Yeo
264f12342c x86: Fix assembly with NASM
Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-10-31 13:50:01 +01:00
Michael Niedermayer
3174616f59 Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
  x86: include x86inc.asm in x86util.asm
  cng: Reindent some incorrectly indented lines
  cngdec: Allow flushing the decoder
  cngdec: Make the dbov variable have the right unit
  cngdec: Fix the memset size to cover the full array
  cngdec: Update the LPC coefficients after averaging the reflection coefficients
  configure: fix print_config() with broke awks

Conflicts:
	libavcodec/x86/ac3dsp.asm
	libavcodec/x86/dct32.asm
	libavcodec/x86/deinterlace.asm
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/fft.asm
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/h264_chromamc.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264_idct.asm
	libavcodec/x86/h264_idct_10bit.asm
	libavcodec/x86/h264_intrapred.asm
	libavcodec/x86/h264_intrapred_10bit.asm
	libavcodec/x86/h264_weight.asm
	libavcodec/x86/vc1dsp.asm
	libavcodec/x86/vp3dsp.asm
	libavcodec/x86/vp56dsp.asm
	libavcodec/x86/vp8dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:43:33 +01:00
Dave Yeo
9c167914a1 x86: Fix assembly with NASM
Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-10-31 10:20:35 +01:00
Diego Biurrun
588fafe7f3 x86: MMX2 ---> MMXEXT in macro names 2012-10-31 01:04:55 +01:00
Diego Biurrun
6860b4081d x86: include x86inc.asm in x86util.asm
This is necessary to allow refactoring some x86util macros with cpuflags.
2012-10-31 00:37:42 +01:00
Ronald S. Bultje
08b028c18d Remove INIT_AVX from x86inc.asm. 2012-10-29 14:51:14 -07:00
Michael Niedermayer
e335658370 Merge commit '9734b8ba56d05e970c353dfd5baafa43fdb08024'
* commit '9734b8ba56d05e970c353dfd5baafa43fdb08024':
  Move avutil tables only used in libavcodec to libavcodec.

Conflicts:
	libavcodec/mathtables.c
	libavutil/intmath.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-12 14:26:46 +02:00
Michael Niedermayer
0ed023275f Merge remote-tracking branch 'qatar/master'
* qatar/master:
  h264: don't touch H264Context->ref_count[] during MB decoding
  x86: get_cpu_flags: add necessary ifdefs around function body
  x86: Drop CPU detection intrinsics
  x86: Add YASM implementations of cpuid and xgetbv from x264

Conflicts:
	configure
	libavcodec/h264_cabac.c
	libavcodec/h264_cavlc.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-05 17:04:15 +02:00
Michael Niedermayer
2a77d4f70b Merge commit '65d12900432ac880d764edbbd36818431484a76e'
* commit '65d12900432ac880d764edbbd36818431484a76e':
  configure: add --enable-lto option
  x86: cpu: Break out test for cpuid capabilities into separate function
  x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection
  build: Factor out mpegaudio dependencies to CONFIG_MPEGAUDIO
  segment: Add comments about calls that only are relevant for some muxers
  segment: Add an option for omitting the first header and final trailer

Conflicts:
	configure
	libavcodec/Makefile
	libavformat/segment.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-05 16:52:13 +02:00
Diego Biurrun
a7329e5fc2 x86: get_cpu_flags: add necessary ifdefs around function body
ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined
elsewhere in the file.  Surrounding the function body with ifdefs allows
building even when cpuid is not defined.  An empty cpuflags mask is
returned in this case.
2012-10-04 19:29:14 +02:00
Diego Biurrun
f6fbce761e x86: Drop CPU detection intrinsics
Now that there is CPU detection in YASM, there will always be one of
inline or external assembly enabled, which obviates the need to fall
back on CPU detection through compiler intrinsics.
2012-10-04 19:29:14 +02:00
Diego Biurrun
1f6d86991f x86: Add YASM implementations of cpuid and xgetbv from x264
This allows detecting CPU features with builds that have neither
gcc inline assembly nor the right compiler intrinsics enabled.
2012-10-04 19:29:14 +02:00
Diego Biurrun
54b243141e x86: cpu: Break out test for cpuid capabilities into separate function 2012-10-04 18:09:21 +02:00
Diego Biurrun
cc5e9e5ff0 x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection 2012-10-04 17:58:42 +02:00
Michael Niedermayer
77aedc77ab Merge remote-tracking branch 'qatar/master'
* qatar/master:
  swscale: Provide the right alignment for external mmx asm
  x86: Replace checks for CPU extensions and flags by convenience macros
  configure: msvc: fix/simplify setting of flags for hostcc
  x86: mlpdsp: mlp_filter_channel_x86 requires inline asm

Conflicts:
	libavcodec/x86/fft_init.c
	libavcodec/x86/h264_intrapred_init.c
	libavcodec/x86/h264dsp_init.c
	libavcodec/x86/mpegaudiodec.c
	libavcodec/x86/proresdsp_init.c
	libavutil/x86/float_dsp_init.c
	libswscale/utils.c
	libswscale/x86/swscale.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-09 13:27:42 +02:00
Diego Biurrun
e0c6cce447 x86: Replace checks for CPU extensions and flags by convenience macros
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
2012-09-08 18:18:34 +02:00
Michael Niedermayer
7beadfe1f7 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mov_chan: Only set the channel_layout if setting it to a nonzero value
  mov_chan: Reindent an incorrectly indented line
  mp2 muxer: mark as AVFMT_NOTIMESTAMPS.
  x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
  x86: more specific checks for availability of required assembly capabilities
  x86: avcodec: Drop silly "_mmx" suffix from dsputil template names
  fate: Drop redundant setting of FUZZ to 1
  cavsdsp: set idct permutation independently of dsputil
  x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov

Conflicts:
	libavcodec/x86/dsputil_mmx.c
	libavformat/mp3enc.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-08 12:53:44 +02:00
Justin Ruggles
7327525997 x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
The SWAP macro does not work for explicit xmm/ymm usage, so instead just move
the scalar value from xmm2 to xmm0.
2012-09-07 14:49:10 -04:00
Michael Niedermayer
9dcc4c30f9 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  configure: add support for bdver1 and bdver2 CPU types.
  avio: make avio_close NULL the freed buffer
  pixdesc: cosmetics
  proresenc: Don't free a buffer not owned by the codec
  proresenc: Write the full value in one put_bits call
  adpcmenc: Calculate the IMA_QT predictor without overflow
  x86: Add convenience macros to check for CPU extensions and flags
  x86: h264dsp: drop some unnecessary ifdefs around prototype declarations
  mss12: merge decode_pixel() and decode_top_left_pixel()
  mss12: reduce SliceContext size from 1067 to 164 KB
  mss12: move SliceContexts out of the common context into the codec contexts

Conflicts:
	libavformat/aviobuf.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-04 17:04:51 +02:00
Diego Biurrun
f82c4fb27f x86: Add convenience macros to check for CPU extensions and flags 2012-09-04 01:44:59 +02:00
Carl Eugen Hoyos
a26789cf9f Fix compilation with yasm-0.6.2. 2012-09-01 10:59:16 +02:00
Michael Niedermayer
c617bed34f Merge remote-tracking branch 'qatar/master'
* qatar/master:
  MSS1 and MSS2: set final pixel format after common stuff has been initialised
  MSS2 decoder
  configure: handle --disable-asm before check_deps
  x86: Split inline and external assembly #ifdefs
  configure: x86: Separate inline from standalone assembler capabilities
  pktdumper: Use a custom define instead of PATH_MAX for buffers
  pktdumper: Use av_strlcpy instead of strncpy
  pktdumper: Use sizeof(variable) instead of the direct buffer length

Conflicts:
	Changelog
	configure
	libavcodec/allcodecs.c
	libavcodec/avcodec.h
	libavcodec/codec_desc.c
	libavcodec/dct-test.c
	libavcodec/imgconvert.c
	libavcodec/mss12.c
	libavcodec/version.h
	libavfilter/x86/gradfun.c
	libswscale/x86/yuv2rgb.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-31 13:34:32 +02:00
Michael Niedermayer
98298eb103 Merge commit 'ec36aa69448f20a78d8c4588265022e0b2272ab5'
* commit 'ec36aa69448f20a78d8c4588265022e0b2272ab5':
  x86: Fix linking with some or all of yasm, mmx, optimizations disabled
  configure: Add more fine-grained SSE CPU capabilities flags
  avfilter: x86: Use more precise compile template names
  x86: cosmetics: Comment some #endifs for better readability
  g723_1: add comfort noise generation
  utvideoenc: Switch to dsputils' median prediction
  utvideoenc: Avoid writing into the input picture
  avtools: remove the distinction between func_arg and func2_arg.
  avconv: make the -passlogfile option per-stream.
  avconv: make the -pass option per-stream.
  cmdutils: make -codecs print lossy/lossless flags.
  lavc: add lossy/lossless codec properties.

Conflicts:
	Changelog
	cmdutils.c
	configure
	doc/APIchanges
	ffmpeg.h
	ffmpeg_opt.c
	ffprobe.c
	libavcodec/codec_desc.c
	libavcodec/g723_1.c
	libavcodec/utvideoenc.c
	libavcodec/version.h
	libavcodec/x86/mpegaudiodec.c
	libavcodec/x86/rv40dsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-31 13:01:30 +02:00
Diego Biurrun
17337f54c0 x86: Split inline and external assembly #ifdefs 2012-08-31 01:53:25 +02:00
Diego Biurrun
a886b279a0 x86: cosmetics: Comment some #endifs for better readability 2012-08-30 18:50:33 +02:00
Michael Niedermayer
17106a7c90 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  audio_frame_queue: Clean up ff_af_queue_log_state debug function
  dwt: Remove unused code.
  cavs: convert cavsdata.h to a .c file
  cavs: Move inline functions only used in one file out of the header
  cavs: Move data tables used in only one place to that file
  fate: Add a single symbol Ut Video decoder test
  vf_hqdn3d: x86 asm
  vf_hqdn3d: support 16bit colordepth
  avconv: prefer user-forced input framerate when choosing output framerate

Conflicts:
	ffmpeg.c
	libavcodec/audio_frame_queue.c
	libavcodec/dwt.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-26 22:40:02 +02:00
Loren Merritt
7a1944b907 vf_hqdn3d: x86 asm
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Michael Niedermayer
bec180e112 Merge commit 'a1bcc76e6036e78f25cbb7323c145056cfca9d93'
* commit 'a1bcc76e6036e78f25cbb7323c145056cfca9d93': (21 commits)
  cmdutils: fix a memleak when specifying an option twice.
  x86: mpegvideo: more sensible names for optimization file and init function
  x86: mpegvideoenc: Split optimizations off into a separate file
  dnxhdenc: x86: more sensible names for optimization file and init function
  svq1/svq3: Move common code out of SVQ1 decoder-specific file
  dirac: add Comments and references to the standard
  lavr: x86: optimized 6-channel flt to fltp conversion
  lavr: x86: optimized 2-channel flt to fltp conversion
  lavr: x86: optimized 6-channel flt to s16p conversion
  lavr: x86: optimized 2-channel flt to s16p conversion
  lavr: x86: optimized 6-channel s16 to fltp conversion
  lavr: x86: optimized 2-channel s16 to fltp conversion
  lavr: x86: optimized 6-channel s16 to s16p conversion
  lavr: x86: optimized 2-channel s16 to s16p conversion
  lavr: x86: optimized 2-channel fltp to flt conversion
  lavr: x86: optimized 6-channel fltp to s16 conversion
  lavr: x86: optimized 2-channel fltp to s16 conversion
  lavr: x86: optimized 6-channel s16p to flt conversion
  lavr: x86: optimized 2-channel s16p to flt conversion
  lavr: x86: optimized 6-channel s16p to s16 conversion
  ...

Conflicts:
	libavcodec/dirac.c
	libavcodec/mpegvideo.h
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-24 14:30:40 +02:00
Justin Ruggles
6092dafb5a lavr: x86: optimized 6-channel s16 to fltp conversion 2012-08-23 20:10:57 -04:00
Mans Rullgard
5b170c0bea x86: remove FASTDIV inline asm
GCC 4.3 and later do the right thing with the plain C code.  Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value.  At least
two gcc configurations miscompile the inline asm in some situations.

In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr.  On Intel i7 and later, this imul is faster 32-bit mul.  On
older Intel and all AMD, it is slightly slower.  On Atom it is much
slower.

Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible.  If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-22 14:29:10 +01:00
Michael Niedermayer
c581cb4e4f Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Fix even more missing includes after the common.h removal
  build: Factor out rangecoder dependencies to CONFIG_RANGECODER
  build: Factor out error resilience dependencies to CONFIG_ERROR_RESILIENCE
  x86: avcodec: Consistently name all init files
  Add more missing includes after removing the implicit common.h
  Add some more missing includes after removing the implicit common.h
  Don't include common.h from avutil.h
  rtmp: Automatically compute the hash for SWFVerification

Conflicts:
	configure
	doc/APIchanges
	doc/examples/decoding_encoding.c
	libavcodec/Makefile
	libavcodec/assdec.c
	libavcodec/audio_frame_queue.c
	libavcodec/avpacket.c
	libavcodec/dv_profile.c
	libavcodec/dwt.c
	libavcodec/libtheoraenc.c
	libavcodec/rawdec.c
	libavcodec/rv40dsp.c
	libavcodec/tiff.c
	libavcodec/tiffenc.c
	libavcodec/v210dec.h
	libavcodec/vc1dsp.c
	libavcodec/x86/Makefile
	libavfilter/asrc_anullsrc.c
	libavfilter/avfilter.c
	libavfilter/buffer.c
	libavfilter/formats.c
	libavfilter/vf_ass.c
	libavfilter/vf_drawtext.c
	libavfilter/vf_fade.c
	libavfilter/vf_select.c
	libavfilter/video.c
	libavfilter/vsrc_testsrc.c
	libavformat/version.h
	libavutil/audioconvert.c
	libavutil/error.h
	libavutil/version.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-16 16:20:30 +02:00
Martin Storsjö
33e112847d Add more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-16 10:49:54 +03:00
Martin Storsjö
70766c2182 Add some more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-15 23:48:48 +03:00
Michael Niedermayer
9f088a1ed4 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mpegvideo: reduce excessive inlining of mpeg_motion()
  mpegvideo: convert mpegvideo_common.h to a .c file
  build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
  Move MASK_ABS macro to libavcodec/mathops.h
  x86: move MANGLE() and related macros to libavutil/x86/asm.h
  x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
  aacdec: Don't fall back to the old output configuration when no old configuration is present.
  rtmp: Add message tracking
  rtsp: Support mpegts in raw udp packets
  rtsp: Support receiving plain data over UDP without any RTP encapsulation
  rtpdec: Remove an unused include
  rtpenc: Remove an av_abort() that depends on user-supplied data
  vsrc_movie: discourage its use with avconv.
  avconv: allow no input files.
  avconv: prevent invalid reads in transcode_init()
  avconv: rename OutputStream.is_past_recording_time to finished.

Conflicts:
	configure
	doc/filters.texi
	ffmpeg.c
	ffmpeg.h
	libavcodec/Makefile
	libavcodec/aacdec.c
	libavcodec/mpegvideo.c
	libavformat/version.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-09 19:31:56 +02:00
Mans Rullgard
070a402b60 x86: move MANGLE() and related macros to libavutil/x86/asm.h
These x86-specific macros do not belong in generic code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard
c318626ce2 x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
This puts x86-specific things in the x86/ subdirectory where they
belong.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Michael Niedermayer
c794acc44e x86inc.asm: remove redundant ifdef __YASM_VER__
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-08 01:14:18 +02:00
Michael Niedermayer
2fc7c818cb Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: fix build with nasm 2.08
  x86: use nop cpu directives only if supported
  x86: fix rNmp macros with nasm
  build: add trailing / to yasm/nasm -I flags
  x86: use 32-bit source registers with movd instruction
  x86: add colons after labels

Conflicts:
	Makefile
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-07 23:04:55 +02:00
Mans Rullgard
edd8226795 x86: fix build with nasm 2.08
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro.  This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard
180d43bc67 x86: use nop cpu directives only if supported
nasm does not support 'CPU foonop' directives.  This adds a configure
test for the directive and uses it only if supported.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard
7238265052 x86: fix rNmp macros with nasm
For some reason, nasm requires this.  No harm done to yasm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Mans Rullgard
a3df4781f4 x86: add colons after labels
nasm prints a warning if the colon is missing.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:20:56 +01:00
Michael Niedermayer
e776ee8f29 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  lavr: fix handling of custom mix matrices
  fate: force pix_fmt in lagarith-rgb32 test
  fate: add tests for lagarith lossless video codec.
  ARMv6: vp8: fix stack allocation with Apple's assembler
  ARM: vp56: allow inline asm to build with clang
  fft: 3dnow: fix register name typo in DECL_IMDCT macro
  x86: dct32: port to cpuflags
  x86: build: replace mmx2 by mmxext
  Revert "wmapro: prevent division by zero when sample rate is unspecified"
  wmapro: prevent division by zero when sample rate is unspecified
  lagarith: fix color plane inversion for YUY2 output.
  lagarith: pad RGB buffer by 1 byte.
  dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.

Conflicts:
	doc/APIchanges
	libavcodec/lagarith.c
	libavfilter/x86/gradfun.c
	libavutil/cpu.h
	libavutil/version.h
	libswscale/utils.c
	libswscale/version.h
	libswscale/x86/yuv2rgb.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-04 23:51:43 +02:00
Michael Niedermayer
a7acab6cda Merge remote-tracking branch 'qatar/master'
* qatar/master:
  vc1dec: Remove separate scaling function for interlaced field MVs
  vc1dec: Invoke edge_emulation regardless of MV precision
  x86: Use consistent 3dnowext function and macro name suffixes
  g723_1: scale output as supposed for the case with postfilter disabled
  g723_1: increase excitation storage by 4
  g723_1: fix upper bound parameter from inverse maximum autocorrelation
  g723_1: make scale_vector() behave like the reference
  g723_1: fix off-by-one error in normalize_bits()
  g723_1: save/restore excitation with offset to store LPC history
  wmapro: prevent division by zero when sample rate is unspecified
  x86: proresdsp: improve SIGNEXTEND macro comments
  x86: h264dsp: K&R formatting cosmetics
  LICENSE: Document all GPL files

Conflicts:
	libavcodec/g723_1.c
	libavcodec/wmaprodec.c
	libavcodec/x86/h264dsp_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-03 23:13:06 +02:00
Diego Biurrun
239fdf1b4a x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
2012-08-03 22:51:05 +02:00
Diego Biurrun
ca844b7be9 x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
2012-08-03 14:00:47 +02:00
Michael Niedermayer
706bd8ea19 Merge remote-tracking branch 'qatar/master'
* qatar/master: (35 commits)
  h264_idct_10bit: port x86 assembly to cpuflags.
  x86inc: clip num_args to 7 on x86-32.
  x86inc: sync to latest version from x264.
  fft: rename "z" to "zc" to prevent name collision.
  wv: return meaningful error codes.
  wv: return AVERROR_EOF on EOF, not EIO.
  mp3dec: forward errors for av_get_packet().
  mp3dec: remove a pointless local variable.
  mp3dec: remove commented out cruft.
  lavfi: bump minor to mark stabilizing the ABI.
  FATE: add tests for yadif.
  FATE: add a test for delogo video filter.
  FATE: add a test for amix audio filter.
  audiogen: allow specifying random seed as a commandline parameter.
  vc1dec: Override invalid macroblock quantizer
  vc1: avoid reading beyond the last line in vc1_draw_sprites()
  vc1dec: check that coded slice positions and interlacing match.
  vc1dec: Do not ignore ff_vc1_parse_frame_header_adv return value
  configure: Move parts that should not be user-selectable to CONFIG_EXTRA
  lavf: remove commented out cruft in avformat_find_stream_info()
  ...

Conflicts:
	Makefile
	configure
	libavcodec/vc1dec.c
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264dsp_mmx.c
	libavfilter/version.h
	libavformat/mp3dec.c
	libavformat/utils.c
	libavformat/wv.c
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-29 02:16:26 +02:00