FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-28 20:53:54 +02:00

Author	SHA1	Message	Date
Jason Garrett-Glaser	19fb234e4a	H.264: split luma dc idct out and implement MMX/SSE2 versions. About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-14 21:34:25 +00:00
Ronald S. Bultje	a52ffc3f54	Move static inline function to a macro, so that constant propagation in inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE breakage after r25254. Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 17:42:26 +00:00
Ronald S. Bultje	cd17285e6c	Merge b_idx and edge variables, and optimize the ASM to directly load variables from memory locations/offsets depending on b_idx plus constants, rather than having gcc do this. This saves several lea calls and together saves about 10 cycles in h264_loop_filter_strength_mmx2(). Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:04:39 +00:00
Ronald S. Bultje	0cc8a5d088	Remove mv_mask variable. Replace the related pand -1/0 instructions by either a pxor, or remove the instruction altogether. Altogether, this saves 1 instruction. Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:03:30 +00:00
Ronald S. Bultje	c0673f2cf4	Remove d_idx as a variable, and instead load it as a constant in the asm. This has no measurable speed effect because the surrounding code doesn't take advantage of this yet. Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:02:32 +00:00
Ronald S. Bultje	2c3135f6d3	Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid of the d_idx variable and therefore allows for future optimizations. No speed difference by this commit itself. Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 13:35:24 +00:00
Ronald S. Bultje	4b81511cab	Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows inlining various constants within the loop code. 20 cycles faster on cathedral sample. Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 13:34:20 +00:00
Ronald S. Bultje	7e117771cd	Remove unused variable. Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-24 15:31:46 +00:00
Måns Rullgård	c0bc8b9afb	x86: disable SSE functions using stack when stack is not aligned. This fixes crashes with ICC 10.1. Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-21 17:57:21 +00:00
Måns Rullgård	f41237c9db	x86: remove hack disabling sse2 h264 loop filter with 32-bit icc Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-18 20:44:32 +00:00
Ronald S. Bultje	1d16a1cf99	Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping. Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%. Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser	8acb554aff	LGPL SSE2 H.264 iDCT. This leaves no more GPL-only H.264 decoding asm code. Approved by Loren. Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-10 02:25:12 +00:00
Stefano Sabatini	c6c98d0897	Move mm_support() from libavcodec to libavutil, make it a public function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-08 15:07:14 +00:00
Stefano Sabatini	7160bb716b	Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_ symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h. Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-04 09:59:08 +00:00
Ronald S. Bultje	2c166c3af1	Port latest x264 deblock asm (before they moved to using NV12 as internal format), LGPL'ed with permission from Jason and Loren. This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly. Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-03 16:52:46 +00:00
Ronald S. Bultje	a33a2562c1	Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square biweight code to sse2/ssse3; add sse2 weight code; and use that same code to create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be removed. OK'ed by Jason on IRC. Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:56:16 +00:00
Ronald S. Bultje	14bc1f2485	Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c, still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c, which represents H264DSPContext and is now compiled on its own. Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:48:59 +00:00
Ronald S. Bultje	de1c253bab	Split intra prediction initialization (i.e. assigning of function pointers) into its own file, it doesn't belong in h264dsp_mmx.c (much less so in dsputil_mmx.c). Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:34:13 +00:00
Ronald S. Bultje	d0eb5a1174	Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1 fate failures on Win64. Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:31:04 +00:00
Ronald S. Bultje	7e7c4b6008	Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx() functions. Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:22:27 +00:00
Måns Rullgård	c0ec9918b0	Remove global mm_flags variable Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-24 17:47:05 +00:00
Jason Garrett-Glaser	4a384de5b8	Split h264dsp and h264pred in configure. Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-07 23:10:25 +00:00
Eli Friedman	c12d6955e2	H.264: SSE2/SSSE3 weighted prediction asm. Patch by Eli Friedman <eli.friedman at gmail dot com>. Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:13:38 +00:00
Jason Garrett-Glaser	17dc7c7a60	Fix h264/vp8 intra pred on Athlon XP. Whose idea was it to have a CPU that didn't SIGILL on an invalid instruction? Originally committed as revision 23927 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-01 10:29:47 +00:00
Jason Garrett-Glaser	29e719377f	Add missing mm_support call to ff_h264_pred_init_x86. I'm not sure if this is supposed to be here, but it can't hurt. Originally committed as revision 23885 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 12:28:06 +00:00
Jason Garrett-Glaser	bc14f04b2f	MMXEXT version of vp8 4x4 vertical pred Originally committed as revision 23876 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-29 00:23:52 +00:00
Jason Garrett-Glaser	fb9927ad7d	Add mmx/mmxext/ssse3 4x4 TM intra pred functions for vp8 Originally committed as revision 23875 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 23:53:07 +00:00
Jason Garrett-Glaser	270a85d259	Fix some intra pred MMX functions that used MMXEXT instructions. Also add predict_4x4_dc MMXEXT function for vp8/h264. Originally committed as revision 23873 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 23:35:17 +00:00
Baptiste Coudurier	50f70541d3	Change MMXEXT to MMX2, MMXEXT is deprecated Originally committed as revision 23865 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 21:12:00 +00:00
Måns Rullgård	1f65b67c46	Fix x86 build with h264dsp disabled Originally committed as revision 23844 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-28 10:02:15 +00:00
Carl Eugen Hoyos	96da2a6967	Cosmetics: Fix indentation. Originally committed as revision 23785 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-25 18:34:03 +00:00
Jason Garrett-Glaser	4af8cdfc3f	16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264 Originally committed as revision 23783 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-06-25 18:25:49 +00:00
Reimar Döffinger	1c71b5c89a	Replace more "m" constraints with MANGLE to fix compilation issues with x86_32 gcc 4.4.4 and -fPIC. Originally committed as revision 23082 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-05-10 21:16:08 +00:00
Reimar Döffinger	27eecec359	Convert two "m" constraints to MANGLE to fix compilation with some compilers. Originally committed as revision 22760 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-04-01 16:52:14 +00:00
Måns Rullgård	84dc2d8afa	Remove DECLARE_ALIGNED_{8,16} macros. These macros are redundant. All uses are replaced with the generic DECLARE_ALIGNED macro instead. Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-06 14:24:59 +00:00
Loren Merritt	900479bb74	optimize h264_loop_filter_strength_mmx2: 244->160 cycles on core2. Originally committed as revision 21462 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-26 17:17:48 +00:00
Måns Rullgård	c67278098d	Move array specifiers outside DECLARE_ALIGNED() invocations Originally committed as revision 21377 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-22 03:25:11 +00:00
David Conrad	1f630b9717	Use two separate memory arguments since 8+() is invalid gas syntax Originally committed as revision 21360 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-21 09:46:57 +00:00
Michael Niedermayer	b4c2ada528	Attempt to fix asm compilation failure. Only tested on gcc 4 & x86_64. Originally committed as revision 21355 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-20 19:23:19 +00:00
David Conrad	c4f2b6dce3	Use constant offsets for memory operands since gcc is unable to. This fixes gcc failing to fit 6 memory locations into 7 registers on x86-32. Originally committed as revision 21337 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-20 00:34:10 +00:00
Michael Niedermayer	9ac4548ff7	Fix h264_loop_filter_strength_mmx2() so it works with b frames. Originally committed as revision 21327 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-19 16:40:36 +00:00
Michael Niedermayer	ebddd2e253	Remove -2 -> -1 remapping, it's not needed anymore as we must remap all references per LUT anyway. Originally committed as revision 21323 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-19 14:28:19 +00:00
Ramiro Polla	74a841af8b	Replace more uses of __attribute__((aligned)) by DECLARE_ALIGNED. Originally committed as revision 19089 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-06-04 23:25:09 +00:00
Alexander Strange	2b9969a945	H264: Fix out of bounds reads in SSSE3 MC. Reading above src[-2] isn't safe, so move loads and palignr ahead 3 pixels to load starting at the first pixel actually used. Fixes issue941. Originally committed as revision 18999 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-05-30 22:19:14 +00:00
David Conrad	8013da7364	VC1: add and use avg_no_rnd chroma MC functions Originally committed as revision 18518 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-14 23:56:10 +00:00
David Conrad	c374691b28	Rename put_no_rnd_h264_chroma* to reflect its usage in VC1 only Originally committed as revision 18517 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-14 23:55:39 +00:00
Baptiste Coudurier	353f87b8d4	fix typo in h264dsp_mmx (no effect currently as the function is not used), approved by Dark Shikari on IRC Originally committed as revision 17046 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-08 06:35:21 +00:00
Aurelien Jacobs	b250f9c66d	Change semantic of CONFIG_, HAVE_ and ARCH_*. They are now always defined to either 0 or 1. Originally committed as revision 16590 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-13 23:44:16 +00:00
Mathieu Velten	21ff7689da	Use H264 MMX chroma functions to accelerate RV40 decoding. Patch by Mathieu Velten (matmaul A gmail) Originally committed as revision 16419 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-04 01:36:11 +00:00
Jason Garrett-Glaser	37fed10087	Add x264 SSE2 iDCT functions to H.264 decoder. Originally committed as revision 16409 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-03 00:46:17 +00:00

51 Commits