FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-02 03:06:28 +02:00

Author	SHA1	Message	Date
Ronald S. Bultje	18175baa54	vp9/x86: 16px MC functions (64bit only). Cycle counts for large MCs (old -> new on ped1080p.webm, mx!=0&&my!=0): 16x8: 876 -> 870 (0.7%) 16x16: 1444 -> 1435 (0.7%) 16x32: 2784 -> 2748 (1.3%) 32x16: 2455 -> 2349 (4.5%) 32x32: 4641 -> 4084 (13.6%) 32x64: 9200 -> 7834 (17.4%) 64x32: 8980 -> 7197 (24.8%) 64x64: 17330 -> 13796 (25.6%) Total decoding time goes from 9.326sec to 9.182sec.	2013-12-26 21:05:10 -05:00
Ronald S. Bultje	0d9375fc90	vp9/x86: 16x16 sub-IDCT for top-left 8x8 subblock (eob <= 38). Sub8x8 speed (w/o dc-only case) goes from ~750 cycles (inter) or ~735 cycles (intra) to ~415 cycles (inter) or ~430 cycles (intra). Average overall 16x16 idct speed goes from ~635 cycles (inter) or ~720 cycles (intra) to ~415 cycles (inter) or ~545 (intra) - all measurements done using ped1080p.webm.	2013-12-26 07:40:25 -05:00
Ivan Kalvachev	1c63aed232	Convert XvMC to hwaccel v3 Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-22 22:03:47 +01:00
Michael Niedermayer	ce612fc186	Merge commit 'dfc50ac85e9d68a771b556297b7c411650206f3b' * commit 'dfc50ac85e9d68a771b556297b7c411650206f3b': x86: mpegvideo: move denoise_dct asm to mpegvideoenc Conflicts: libavcodec/x86/mpegvideo.c libavcodec/x86/mpegvideoenc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-20 23:44:31 +01:00
Anton Khirnov	dfc50ac85e	x86: mpegvideo: move denoise_dct asm to mpegvideoenc This function is encoding-only. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-12-20 17:16:11 +01:00
Ronald S. Bultje	8d4c616fc0	vp9/x86: idct_add_16x16_ssse3. Currently only dc-only and full 16x16. Other subforms will follow in the near future. Total decoding time of ped1080p.webm goes from 9.7 to 9.3 seconds. DC-only goes from 957 -> 131 cycles, and the full IDCT goes from ~4050 to ~745 cycles.	2013-12-14 12:13:26 -05:00
Michael Niedermayer	8e70fdab36	Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3' * commit '4958f35a2ebc307049ff2104ffb944f5f457feb3': dsputil: Move apply_window_int16 to ac3dsp Conflicts: libavcodec/arm/ac3dsp_init_arm.c libavcodec/arm/ac3dsp_neon.S libavcodec/x86/ac3dsp_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-09 04:12:40 +01:00
Diego Biurrun	4958f35a2e	dsputil: Move apply_window_int16 to ac3dsp The (optimized) functions are used nowhere else.	2013-12-08 17:57:15 +01:00
Ronald S. Bultje	92436e8ad9	vp9: implement top/left half (4x4) sub-8x8-IDCT. For that specific case (eob>3&&eob<=12), runtime of idct8x8 goes from 668 to 477 cycles. For all idct8x8, runtime goes from 521 to 490 cycles.	2013-12-07 12:39:36 -05:00
Ronald S. Bultje	b2045c44a9	vp9: split pre-load of 11585x2 out of 1d idct macro. This allows us to load it only once, instead of twice, in this function.	2013-12-07 12:39:36 -05:00
Ronald S. Bultje	f9a0d4c6e0	vp9: minor refactorings in idct ssse3 assembly. Make register usage in macros explicit; change mulsub_2w_4x to use 2 instead of 3 temp registers.	2013-12-07 12:39:35 -05:00
Ronald S. Bultje	8729964b99	vp9: split x86 assembly in two files. (And in future, loopfilter or intra pred could be put in their own respective files also.)	2013-12-07 12:39:35 -05:00
Michael Niedermayer	5b4d57455d	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: Initialize mmxext after amd3dnow optimizations Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-05 11:55:41 +01:00
Diego Biurrun	3d7c84747d	x86: Initialize mmxext after amd3dnow optimizations The mmxext optimizations should be at least equally fast if available and amd3dnow optimizations are being deprecated. Thus the former should override the latter, not the other way around.	2013-12-04 18:52:48 +01:00
Michael Niedermayer	be2312aa8f	Merge remote-tracking branch 'qatar/master' * qatar/master: dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo If someone optimizes dct_quantize for non x86 SIMD, then this probably needs to be reverted. Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-02 10:59:48 +01:00
Diego Biurrun	7ffaa19570	dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo The table is MMX-specific and used nowhere else.	2013-12-02 04:05:18 +01:00
Michael Niedermayer	3adb825650	Merge commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5' * commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5': x86: dsputil: Suppress deprecation warnings for XvMC bits Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-28 22:47:37 +01:00
Diego Biurrun	cf7860db60	x86: dsputil: Suppress deprecation warnings for XvMC bits These parts are scheduled for removal on the next version bump. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2013-11-28 16:04:30 +01:00
Clément Bœsch	616da59542	avcodec/x86/vp9dsp: merge a few SWAP together.	2013-11-21 23:06:21 +01:00
Clément Bœsch	e0434cfcfc	avcodec/x86: remove 3 sub in pred4x4_tm_vp8_8. before: 411 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388289 runs, 319 skips after: 389 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388308 runs, 300 skips Tested on i7 920.	2013-11-17 23:12:35 +01:00
Clément Bœsch	d28c79b003	avcodec/x86/vp9dsp: use EXTERNAL_* macros. Original fix by one of these developers: Anton Khirnov <anton@khirnov.net> Diego Biurrun <diego@biurrun.de> Luca Barbato <lu_zero@gentoo.org> Martin Storsjö <martin@martin.st> See `97962b2` / `72ca830` Personal guess is Diego Biurrun.	2013-11-16 17:03:17 +01:00
Michael Niedermayer	91e00c4a78	Merge commit '458446acfa1441d283dacf9e6e545beb083b8bb0' * commit '458446acfa1441d283dacf9e6e545beb083b8bb0': lavc: Edge emulation with dst/src linesize Conflicts: libavcodec/cavs.c libavcodec/h264.c libavcodec/hevc.c libavcodec/mpegvideo_enc.c libavcodec/mpegvideo_motion.c libavcodec/rv34.c libavcodec/svq3.c libavcodec/vc1dec.c libavcodec/videodsp.h libavcodec/videodsp_template.c libavcodec/vp3.c libavcodec/vp8.c libavcodec/wmv2.c libavcodec/x86/videodsp.asm libavcodec/x86/videodsp_init.c Changes to the asm are not merged, they are left for volunteers or in their absence for later. The changes this merge introduces are reordering of the function arguments See: `face578d56` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-15 15:07:10 +01:00
Ronald S. Bultje	72ca830f51	lavc: VP9 decoder Originally written by Ronald S. Bultje <rsbultje@gmail.com> and Clément Bœsch <u@pkh.me> Further contributions by: Anton Khirnov <anton@khirnov.net> Diego Biurrun <diego@biurrun.de> Luca Barbato <lu_zero@gentoo.org> Martin Storsjö <martin@martin.st> Signed-off-by: Luca Barbato <lu_zero@gentoo.org> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2013-11-15 10:16:28 +01:00
Ronald S. Bultje	458446acfa	lavc: Edge emulation with dst/src linesize Allow supporting files for which the image stride is smaller than the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9 file or a 16x16 VP8 file with -fflags +emu_edge.	2013-11-15 10:16:27 +01:00
Michael Niedermayer	5231eecdaf	Merge remote-tracking branch 'qatar/master' * qatar/master: Deprecate obsolete XvMC hardware decoding support Conflicts: libavcodec/mpeg12.c libavcodec/mpeg12dec.c libavcodec/mpegvideo.c libavcodec/options_table.h libavutil/pixdesc.c libavutil/version.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-14 03:26:35 +01:00
Diego Biurrun	19e30a58fc	Deprecate obsolete XvMC hardware decoding support XvMC has long ago been superseded by newer acceleration APIs, such as VDPAU, and few downstreams still support it. Furthermore XvMC is not implemented within the hwaccel framework, but requires its own specific code in the MPEG-1/2 decoder, which is a maintenance burden.	2013-11-13 21:07:45 +01:00
Michael Niedermayer	a30f7918b5	Merge commit '0338c396987c82b41d322630ea9712fe5f9561d6' * commit '0338c396987c82b41d322630ea9712fe5f9561d6': dsputil: Split off H.263 bits into their own H263DSPContext Conflicts: configure libavcodec/mpegvideo.h libavcodec/mpegvideo_enc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-08 17:42:56 +01:00
Diego Biurrun	0338c39698	dsputil: Split off H.263 bits into their own H263DSPContext	2013-11-08 12:40:47 +01:00
Clément Bœsch	87434cf373	avcodec/vp9: add ff_vp9_idct_idct_{4x4,8x8}_ssse3(). 1789 decicycles in idct_idct_4x4_add_c, 262136 runs, 8 skips 1839 decicycles in idct_idct_4x4_add_c, 524270 runs, 18 skips 1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips 529 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 262138 runs, 6 skips 516 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 524282 runs, 6 skips 474 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 1048565 runs, 11 skips (~3.9x faster) 7726 decicycles in idct_idct_8x8_add_c, 1048433 runs, 143 skips 7732 decicycles in idct_idct_8x8_add_c, 2096882 runs, 270 skips 7731 decicycles in idct_idct_8x8_add_c, 4193772 runs, 532 skips 1145 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 1048549 runs, 27 skips 1137 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 2097097 runs, 55 skips 1086 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 4194188 runs, 116 skips (~7.1x faster) Overall decode time before commit: 16.48s user 0.03s system 99% cpu 16.526 total 16.54s user 0.01s system 99% cpu 16.566 total 16.46s user 0.03s system 99% cpu 16.511 total Overall decode time after commit: 16.34s user 0.02s system 99% cpu 16.378 total 16.28s user 0.02s system 99% cpu 16.315 total 16.32s user 0.03s system 99% cpu 16.366 total Tested on i7 920 with 40s 1080p footage.	2013-11-05 19:25:40 +01:00
Michael Niedermayer	934e489ee8	Merge commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930' * commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930': x86: rv40dsp: Use PAVGB instruction macro where appropriate Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-05 10:26:07 +01:00
Diego Biurrun	e2b5b09789	x86: rv40dsp: Use PAVGB instruction macro where appropriate	2013-11-04 21:14:39 +01:00
Mikulas Patocka	694d997afe	x86: hpeldsp: Use PAVGB instruction macro where necessary Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-11-04 01:29:23 +01:00
Mikulas Patocka	074155360d	avcodec/x86/hpeldsp: fix crash on AMD K6-3+ There are instructions pavgb and pavgusb. Both instructions do the same operation but they have different encoding. Pavgb exists in SSE (or MMXEXT) instruction set and pavgusb exists in 3D-NOW instruction set. libavcodec uses the macro PAVGB to select the proper instruction. However, the function avg_pixels8_xy2 doesn't use this macro, it uses pavgb directly. As a consequence, the function avg_pixels8_xy2 crashes on AMD K6-2 and K6-3 processors, because they have pavgusb, but not pavgb. This bug seems to be introduced by commit `71155d7b41`, "dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm" Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-03 19:49:11 +01:00
Michael Niedermayer	7146eacfc5	Merge commit '1700b4e678ed329611a16b20d11e64b7abda4839' * commit '1700b4e678ed329611a16b20d11e64b7abda4839': x86: vp8dsp: Split loopfilter code into a separate file Conflicts: libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-02 10:13:14 +01:00
Diego Biurrun	1700b4e678	x86: vp8dsp: Split loopfilter code into a separate file	2013-11-01 22:05:20 +01:00
Michael Niedermayer	fa6fa2162b	avcodec/cabac: support UNCHECKED_BITSTREAM_READER = 0 Fixes overreads in HEVC Fixes Ticket3070 Also fixed remaining issues from Ticket3075 and Ticket3076 Some lines of code taken from 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/x86/cabac.h and 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/cabac_functions.h Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-31 11:13:27 +01:00
Ronald S. Bultje	960490c0b2	avcodec/x86/videodsp: Small speedups in ff_emulated_edge_mc x86 SIMD. Don't use word-size multiplications if size == 2, and if we're using SIMD instructions (size >= 8), complete leftover 4byte sets using movd, not mov. Both of these changes lead to minor speedups. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-27 15:02:48 +01:00
Ronald S. Bultje	cd86eb265f	avcodec/x86/videodsp: fix a bug in a %if statement where we used '%%' instead of '&&'. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-27 15:02:48 +01:00
Michael Niedermayer	41efb8d9a7	avcodec/x86/cabac: include get_cabac_bypass_sign_x86() under #if !BROKEN_COMPILER this might fix Ticket2999 as well as some fate clients untested as the original patch submitter no longer has the environment to test this should be reverted if it does not fix the issues Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-26 15:06:55 +02:00
Ronald S. Bultje	1b3a7e1f42	avcodec/x86/videodsp: Properly mark sse2 instructions in emulated_edge_mc x86 simd as such. Should fix crashes or corrupt output on pre-SSE2 CPUs when they were using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in hfix or hvar single-edge (left/right) extension functions. Tested-by: Ingo Brückl <ib@wupperonline.de> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-24 13:36:55 +02:00
Michael Niedermayer	c35d29a9c8	avcodec/x86/dsputil_init: move ff_idct_xvid_mmxext init This decreases the diff to libav Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-15 02:06:12 +02:00
Michael Niedermayer	ab8cbfe0dd	avcodec/x86/dsputil_init: remove duplicated sse2 idct init Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-15 01:59:36 +02:00
Michael Niedermayer	1bf8fa75ee	avcodec/x86/dsputil_init: fix cpu flag checks Fixes linking failure with --disable-sse2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-15 01:46:21 +02:00
Ronald S. Bultje	20d78a8606	libavcodec/x86: Fix emulated_edge_mc SSE code to not contain SSE2 instructions on x86-32. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-10 13:36:06 +02:00
Ronald S. Bultje	ad75d2b590	x86: Fix compilation with nasm on PPC & OS/2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 12:36:19 +02:00
Michael Niedermayer	deb5addcff	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: h264_idct: Update comments to match 8/10-bit depth optimization split Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 12:10:02 +02:00
Michael Niedermayer	1f17619fe4	Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450' * commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450': x86inc: Utilize the shadow space on 64-bit Windows Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 11:23:00 +02:00
Ronald S. Bultje	ba9c557b92	avcodec/x86/vp9dsp: Fix compilation with nasm. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 02:27:12 +02:00
Diego Biurrun	6405ca7d4a	x86: h264_idct: Update comments to match 8/10-bit depth optimization split	2013-10-07 21:46:46 +02:00
Henrik Gramner	bbe4a6db44	x86inc: Utilize the shadow space on 64-bit Windows Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:25:35 -04:00

1374 Commits