FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00

Author	SHA1	Message	Date
Ronald S. Bultje	1d16a1cf99	Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping. Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%. Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser	8acb554aff	LGPL SSE2 H.264 iDCT This leaves no more GPL-only H.264 decoding asm code. Approved by Loren. Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-10 02:25:12 +00:00
Stefano Sabatini	c6c98d0897	Move mm_support() from libavcodec to libavutil, make it a public function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-08 15:07:14 +00:00
Reimar Döffinger	b1c32fb5e5	Use "d" suffix for general-purpose registers used with movd. This increases compatibility with nasm and is also more consistent, e.g. with h264_intrapred.asm and h264_chromamc.asm that already do it that way. Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-05 10:10:16 +00:00
Stefano Sabatini	7160bb716b	Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_ symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h. Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-04 09:59:08 +00:00
Ronald S. Bultje	2c166c3af1	Port latest x264 deblock asm (before they moved to using NV12 as internal format), LGPL'ed with permission from Jason and Loren. This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly. Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-03 16:52:46 +00:00
Eli Friedman	a10a9f5cd0	Fix typo in r25019. Patch by Eli Friedman <eli.friedman at gmail dot com>. Originally committed as revision 25022 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 23:19:36 +00:00
Ronald S. Bultje	615da9b1d9	Unscrew breakage after my last commit because of symbol prefixes. Originally committed as revision 25020 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 21:10:19 +00:00
Ronald S. Bultje	a33a2562c1	Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square biweight code to sse2/ssse3; add sse2 weight code; and use that same code to create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be removed. OK'ed by Jason on IRC. Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:56:16 +00:00
Ronald S. Bultje	14bc1f2485	Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c, still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c, which represents H264DSPContext and is now compiled on its own. Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:48:59 +00:00
Ronald S. Bultje	5929b3a651	Fix vertical align. Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-31 12:32:24 +00:00
Ronald S. Bultje	79ce0f002e	Fix compilation failure if yasm is disabled (missing vp3 symbols). Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 20:30:40 +00:00
Ronald S. Bultje	de1c253bab	Split intra prediction initialization (i.e. assigning of function pointers) into its own file, it doesn't belong in h264dsp_mmx.c (much less so in dsputil_mmx.c). Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:34:13 +00:00
Ronald S. Bultje	d0eb5a1174	Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1 fate failures on Win64. Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:31:04 +00:00
Ronald S. Bultje	e9f5f020c6	Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6 issues on Win64. Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:25:46 +00:00
Ronald S. Bultje	7e7c4b6008	Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx() functions. Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:22:27 +00:00
Loren Merritt	19d929f9a3	cosmetics in imdct_sse Originally committed as revision 24958 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-28 21:03:13 +00:00
Ronald S. Bultje	4eca52ed19	Fix typos when converting inline asm to yasm, fixes MMX-only fate-ea-vp61. Originally committed as revision 24948 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-26 14:33:39 +00:00
Ronald S. Bultje	6697bc33e2	Revert r24931, it broke Win32 and some BSD compiles (yay fate). Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 20:36:35 +00:00
Ronald S. Bultje	72f642400b	Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing to the VP6 fate failures on Win64. Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 19:57:05 +00:00
Måns Rullgård	69dad87c48	VP6: fix vp6_filter_diag4_mmx/sse on 64-bit The stride can be negative and must be sign extended before being used in pointer arithmetic. Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 15:41:11 +00:00
Ronald S. Bultje	89fa3504ed	Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should help in fixing the Win64 fate failures. Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:44:16 +00:00
Ronald S. Bultje	3a0885146c	Move vp6_filter_diag4() from DSPContext to VP56DSPContext. Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:42:28 +00:00
Måns Rullgård	c0ec9918b0	Remove global mm_flags variable Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-24 17:47:05 +00:00
Ronald S. Bultje	3611c45ab7	Mark xmm registers as clobbered in simple loopfilter. Should fix the last two VP8-related fate failures on Win64. Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-24 16:52:27 +00:00
Alex Converse	cb4f12466b	imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits". It generates smaller cleaner code. Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-23 15:51:09 +00:00
Ronald S. Bultje	684d608bde	Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures). Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-23 02:41:22 +00:00
Alex Converse	78b5c97d3e	Convert ff_imdct_half_sse() to yasm. This is to avoid split asm sections that attempt to preserve some registers between sections. Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-22 14:39:58 +00:00
Jason Garrett-Glaser	05c04cdf54	VP5/6/8: ~7% faster arithmetic decoding Grab from the bitstream in 16-bit chunks instead of 8-bit chunks. TODO: grab in 32-bit chunks on 64-bit systems. Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-12 01:11:32 +00:00
Jason Garrett-Glaser	4a384de5b8	Split h264dsp and h264pred in configure. Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-07 23:10:25 +00:00
Jason Garrett-Glaser	98fe09df7b	Add file missing in r24702 Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:49:48 +00:00
Eli Friedman	c12d6955e2	H.264: SSE2/SSSE3 weighted prediction asm Patch by Eli Friedman <eli.friedman at gmail dot com> Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:13:38 +00:00
Måns Rullgård	f079a64aea	Move cavs dsp functions to their own struct Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 20:59:00 +00:00
Jason Garrett-Glaser	8b9b5e085f	VP5/6/8: add one inline missed in r24677 Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 11:21:22 +00:00
Jason Garrett-Glaser	827d43bb9d	VP8: move zeroing of luma DC block into the WHT Lets us do the zeroing in asm instead of C. Also makes it consistent with the way the regular iDCT code does it. Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-02 20:18:09 +00:00
Ronald S. Bultje	6341838f3c	Use word-writing instead of dword-writing (with two cached but otherwise unchanged bytes) in the horizontal simple loopfilter. This makes the filter quite a bit faster in itself (~30 cycles less on Core1), probably mostly because we don't need a complex 4x4 transpose, but only a simple byte interleave. Also allows using pextrw on SSE4, which speeds up even more (e.g. 25% faster on Core i7). Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 23:13:15 +00:00
Vitor Sessak	fa738b3ad1	Remove x86/mmx.h. It is not used anymore and has been deprecated for years. Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 16:20:45 +00:00
Vitor Sessak	de4bc44abb	Convert deinterlacing MMX code to YASM Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 14:50:51 +00:00
Vitor Sessak	740dfe7012	Fix compilation in x86_64. I broke it with r24580. Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:45:21 +00:00
Vitor Sessak	2c3dda6838	Translate libmpeg2 MMX IDCT to plain asm Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:19:54 +00:00
Ronald S. Bultje	ab4d031889	Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster. Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 21:18:19 +00:00
Jason Garrett-Glaser	e25dee602f	VP8: Much faster SSE2 MC 5-10% faster or more on Phenom, Athlon 64, and some others. Helps some on pre-SSSE3 Intel chips as well, but not as much. Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 19:34:00 +00:00
Ronald S. Bultje	48adb7e7a4	Enable no-loop memory/register saving for ssse3/sse4 also. Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:07:57 +00:00
Ronald S. Bultje	2a180c69ea	Save a register (or regsize of stackspace for x86-32) for the no-loop mbedge loopfilter functions, by re-using space that holds a variable that we no longer need. Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:00:15 +00:00
Ronald S. Bultje	bcd4aa6498	Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this construct was always enabled, even for <ssse3 versions). Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:56:51 +00:00
Ronald S. Bultje	2208053bd3	Split pextrw macro-spaghetti into several opt-specific macros, this will make future new optimizations (imagine a sse5) much easier. Also fix a bug where we used the direction (%2) rather than optimization (%1) to enable this, which means it wasn't ever actually used... Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:50:59 +00:00
Ronald S. Bultje	6de5b7c6b8	Fix obvious bug in assignment. Somehow, the test vectors don't test this... Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-25 02:42:40 +00:00
Ronald S. Bultje	e3f7bf774c	Fix SPLATB_REG mess. Used to be an if/elseif/elseif/elseif spaghetti, so this splits it into small optimization-specific macros which are selected for each DSP function. The advantage of this approach is that the sse4 functions now use the ssse3 codepath also without needing an explicit sse4 codepath. Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-24 19:33:05 +00:00
Eli Friedman	3611e7a309	Inline asm for VP56 arith coder This is a lot more reliable to get cmov rather than trying to trick gcc into generating it, useful since it's 2% faster overall. Patch by Eli Friedman <eli.friedman at gmail> Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 21:46:30 +00:00
Jason Garrett-Glaser	3ae079a3c8	VP8: optimize DC-only chroma case in the same way as luma. Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct. Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 06:02:52 +00:00


216 Commits
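
Note on commit 69dad87c48 (VP6: fix vp6_filter_diag4_mmx/sse on 64-bit): the message says a stride "can be negative and must be sign extended before being used in pointer arithmetic". The following is a minimal C sketch of that general issue, not the actual FFmpeg change. The helper names (widen_zero_extend, widen_sign_extend, next_row) are hypothetical and exist only for illustration; the real fix was made in x86 asm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit stride by zero extension.
 * This mimics using the low 32 bits of a register as if they were a
 * full 64-bit value. A negative stride becomes a huge positive offset. */
static uint64_t widen_zero_extend(int32_t stride)
{
    return (uint64_t)(uint32_t)stride;
}

/* Hypothetical helper: widen a 32-bit stride by sign extension.
 * The value keeps its sign, so a negative stride stays negative. */
static int64_t widen_sign_extend(int32_t stride)
{
    return (int64_t)stride;
}

/* Hypothetical helper: step from one row to the next using a signed,
 * pointer-sized offset. With a negative stride this moves backwards in
 * memory, as it does for pictures stored bottom-up. */
static const uint8_t *next_row(const uint8_t *row, int32_t stride)
{
    return row + (ptrdiff_t)stride;
}
```

For a stride of -16, the zero-extended value is 4294967280 while the sign-extended value is still -16, which is why only the latter is safe to add to a 64-bit pointer.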