FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-12 19:18:44 +02:00

Author	SHA1	Message	Date
Michael Niedermayer	3c1378ce0a	Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2' * commit '2d91abade29e43bb45c881d45909b8ee77e904e2': x86: h264_intrapred: Don't treat 32-bit integers as 64-bit Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-10-08 11:48:58 +02:00
Henrik Gramner	2d91abade2	x86: h264_intrapred: Don't treat 32-bit integers as 64-bit The upper halves are not guaranteed to be zero in x86-64. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-10-08 08:15:52 +00:00
Mickaël Raulet	4ba6371a83	x86/hevc: get rid off packusdw for ssse3 compatibility cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2 Fixes out of array access Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-10-04 21:14:15 +02:00
James Almer	0de1d6287e	x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2} 2x to 2.5x faster than the C version. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-02 22:11:55 -03:00
James Almer	acebff8e5d	x86/mpegvideoencdsp: improve ff_pix_sum16_sse2 ~15% faster. Also add an mmxext version that takes advantage of the new code, and build it alongside with the mmx version only on x86_32. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-01 13:07:22 -03:00
Michael Niedermayer	d22e88d120	avcodec/x86/fmtconvert: Fix operand size in ff_int32_to_float_fmul_array8_sse* Fixes acodec-dca2 fate failure Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-28 19:04:06 +02:00
James Almer	26cd7b1e1a	x86/fmtconvert: add ff_int32_to_float_fmul_array8_{sse,sse2} About two times faster than the c wrapper. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-26 20:48:40 -03:00
Carl Eugen Hoyos	c0f9df30dd	lavc/x86/idctdsp.h: Fix make checkheaders.	2014-09-25 22:18:25 +02:00
James Almer	a829870b2f	avcodec/svq1enc: align buffer used by simd functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-25 16:00:20 -03:00
James Almer	4b892e469b	x86/cavsdsp: fix buffer alignment in cavs_idct8_add_mmx() It may be used by ff_add_pixels_clamped_sse2(). Should fix fate-cavs failures on some systems. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-25 16:00:16 -03:00
James Almer	4f4f08e6f0	x86/idctdsp: port {put,add}_pixels_clamped to yasm Also add sse2 versions for both. put_pixels_clamped port and sse2 version originally written by Timothy Gu. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 21:52:13 -03:00
James Almer	c99a882814	avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 21:43:19 -03:00
James Almer	ad26e83f9c	avcodec/x86: use function pointers for {put,add}_pixels_clamped Same behavior as in simple_idct. This way the best optimized versions available will be used instead. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 18:52:32 -03:00
James Almer	70277d1d23	x86/videodsp: add ff_emu_edge_{hfix,hvar}_avx2 ~15% faster than sse2. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 16:12:55 -03:00
James Almer	164d6c7f5b	x86/videodsp: fix warning about discarded 'const' qualifier Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-23 19:59:20 -03:00
James Almer	6b2caa321f	x86/vp9: add AVX and AVX2 MC Roughly 25% faster MC than ssse3 for blocksizes 32 and 64. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-22 22:35:03 -03:00
James Almer	33c752be51	x86/me_cmp: port mmxext vsad functions to yasm Also add mmxext versions of vsad8 and vsad_intra8, and sse2 versions of vsad16 and vsad_intra16. Since vsad8 and vsad16 are not bitexact, they are accordingly marked as approximate. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-19 20:50:20 -03:00
James Almer	77f9a81cca	x86/me_cmp: combine sad functions into a single macro No point in having the sad8 functions separate now that the loop is no longer unrolled. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-17 23:52:36 -03:00
Michael Niedermayer	41d82b85ab	avcodec/x86/vp9lpf: Always include x86util.asm Fixes executable stack Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 23:37:46 +02:00
Michael Niedermayer	85f2c0124d	avcodec/x86/me_cmp: fix sad8xh This adds back support for 8x4 and 8x16 it does not support 8x2, i think nothing uses that Found-by: ubitux Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 14:08:24 +02:00
James Almer	0456d169c4	x86/me_cmp: port mmxext and sse2 sad functions to yasm Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of sad16_x2, sad16_y2 and sad16_xy2 (%15 to %20 faster than mmxext). Since the _xy2 versions are not bitexact, they are accordingly marked as approximate. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 11:12:50 +02:00
James Almer	52ec81c67d	x86/hevc_res_add: add missing guards to hevc_transform_add32_8_avx2 Should fix compilation with old Yasm/Nasm versions. Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-04 23:34:01 -03:00
James Almer	c3d2426cca	x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2 ~20% faster than AVX. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-04 20:21:29 -03:00
James Darnley	46ef45ab59	lavc/x86/v210: give cpuflag to INIT macro This lets the cglobal macro automatically append a suffix to the function name. This means that INIT_XMM avx must be used rather than INIT_AVX. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-05 00:35:07 +02:00
Michael Niedermayer	5b58d79a99	Merge commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453' * commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453': xvid: Add C IDCT Conflicts: libavcodec/dct-test.c libavcodec/xvididct.c See: `298b3b6c1f` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-03 04:09:38 +02:00
Michael Niedermayer	5db23c07a3	Merge commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b' * commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b': idctdsp: Add global function pointers for {add|put}_pixels_clamped functions Conflicts: libavcodec/arm/idctdsp_init_arm.c libavcodec/dct.h libavcodec/idctdsp.c libavcodec/jrevdct.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-03 03:19:40 +02:00
Pascal Massimino	7a1d6ddd2c	xvid: Add C IDCT Thanks to Pascal Massimino and Michael Militzer for relicensing as LGPL. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2014-09-02 14:41:13 -07:00
Diego Biurrun	95c0cec03a	idctdsp: Add global function pointers for {add|put}_pixels_clamped functions These function pointers already existed in the ARM code. Adding them globally allows calls to the function pointers to access arch-optimized versions of the functions transparently.	2014-09-02 14:41:13 -07:00
Reimar Döffinger	d9e2aceb7f	Add missing "const" all over the place. Only "./configure --enable-gpl" on x86 was tested. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2014-08-29 18:57:25 +02:00
Michael Niedermayer	5403a288a7	Merge commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc' * commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc': x86: xvid: K&R formatting cosmetics Conflicts: libavcodec/x86/xvididct_sse2.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:20:39 +02:00
Michael Niedermayer	b3b05a11d3	Merge commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5' * commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5': cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs Conflicts: libavcodec/mpeg4videodec.c libavcodec/x86/Makefile libavcodec/x86/dct-test.c libavcodec/x86/xvididct_sse2.c libavcodec/xvididct.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:09:30 +02:00
Michael Niedermayer	3ff5ca89fc	Merge commit '1f156af4274dc72d588620f6bedb4e9e66023c92' * commit '1f156af4274dc72d588620f6bedb4e9e66023c92': x86: xvid_idct: Drop unused definitions Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:01:54 +02:00
Diego Biurrun	8d27bf1cff	x86: xvid: K&R formatting cosmetics	2014-08-27 05:58:04 -07:00
Diego Biurrun	dcb7c868ec	cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs	2014-08-27 04:54:05 -07:00
Diego Biurrun	1f156af427	x86: xvid_idct: Drop unused definitions	2014-08-27 04:36:41 -07:00
Christophe Gisquet	3e892b2bcd	x86: hevc_mc: split differently calls In some cases, 2 or 3 calls are performed to functions for unusual widths. Instead, perform 2 calls for different widths to split the workload. The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't be processed that way without modifications: some calls use unaligned buffers, and having branches to handle this was resulting in no micro-benchmark benefit. For block_w == 12 (around 1% of the pixels of the sequence): Before: 12758 decicycles in epel_uni, 4093 runs, 3 skips 19389 decicycles in qpel_uni, 8187 runs, 5 skips 22699 decicycles in epel_bi, 32743 runs, 25 skips 34736 decicycles in qpel_bi, 32733 runs, 35 skips After: 11929 decicycles in epel_uni, 4096 runs, 0 skips 18131 decicycles in qpel_uni, 8184 runs, 8 skips 20065 decicycles in epel_bi, 32750 runs, 18 skips 31458 decicycles in qpel_bi, 32753 runs, 15 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 12:05:33 +02:00
Christophe Gisquet	38e2aa3759	x86: hevc_mc: correct unneeded use of SSE4 code Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 11:43:33 +02:00
Christophe Gisquet	2346f2b5db	x86: hevcdsp: use compilation-time-fixed constant The stride for some buffers is known. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 16:26:30 +02:00
Christophe Gisquet	dad7f15567	hevcdsp: remove more instances of compile-time-fixed parameters Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 15:22:42 +02:00
Christophe Gisquet	d4f44b66d3	hevcdsp: remove compilation-time-fixed parameter The dststride parameter is always MAX_PB_SIZE. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 14:57:37 +02:00
Christophe Gisquet	fb1a98ec5b	x86: hevc_mc: assume 2nd source stride is 64 Reviewed-by: Mickaël Raulet <mraulet@gmail.com Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 13:21:37 +02:00
James Almer	54ca4dd43b	x86/hevc_res_add: refactor ff_hevc_transform_add{16,32}_8 * Reduced xmm register count to 7 (As such they are now enabled for x86_32). * Removed four movdqa (affects the sse2 version only). * pxor is now used to clear m0 only once. ~5% faster. Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-21 15:01:33 -03:00
James Almer	76a99d467f	x86/hecv_res_add: add ff_hevc_transform_add{8,16,32}_8_avx ~15% faster than sse2 Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-20 16:54:52 -03:00
James Almer	9f498f4e6f	x86/hevc_res_add: fix register count in hevc_transform_add{16,32}_10_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2014-08-19 21:34:52 -03:00
Pierre Edouard Lepere	a6af4bf64d	x86: hevc: adding transform_add Reviewed-by: James Almer <jamrial@gmail.com> Approved-by: Ronald S. Bultje Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-20 01:28:56 +02:00
Michael Niedermayer	3bb2297351	Merge commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6' * commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6': build: Add explanatory comments to (optimization) blocks in the Makefiles Conflicts: libavcodec/ppc/Makefile libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-15 20:25:12 +02:00
Michael Niedermayer	c1df467d73	Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b' * commit '835f798c7d20bca89eb4f3593846251ad0d84e4b': mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes Conflicts: libavcodec/h261dec.c libavcodec/intrax8.c libavcodec/mjpegenc.c libavcodec/mpeg12dec.c libavcodec/mpeg12enc.c libavcodec/mpeg4videoenc.c libavcodec/mpegvideo.c libavcodec/mpegvideo.h libavcodec/mpegvideo_enc.c libavcodec/rv10.c libavcodec/x86/mpegvideoenc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-15 20:11:56 +02:00
Diego Biurrun	efd26bedec	build: Add explanatory comments to (optimization) blocks in the Makefiles	2014-08-15 02:55:21 -07:00
Diego Biurrun	835f798c7d	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
James Darnley	54a51d3840	lavc/flacenc: partially unroll loop in flac_enc_lpc_16 It now does 12 samples per iteration, up from 4. From 1.8 to 3.2 times faster again. 3.6 to 5.7 times faster overall. Runtime is reduced by a further 2 to 18%. Overall runtime reduced by 4 to 50%. Same conditions as before apply. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-13 03:09:26 +02:00


1846 Commits