FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00

Author	SHA1	Message	Date
Martin Storsjö	b280c6202b	arm: fft_vfp: Unify the behaviour in ff_fft_calc_vfp between arm/thumb Don't include the function pointer table in the code segment in arm mode. This shouldn't have any significant performance effect. It does end up as a few more instructions than before, for ARM, but only at the entry to this function, not within the fft functions themselves. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-12-08 12:29:53 +02:00
Martin Storsjö	ae81576414	arm: fft_vfp: Add a missing "endconst" when building in thumb mode Signed-off-by: Martin Storsjö <martin@martin.st>	2014-12-08 12:29:49 +02:00
Vittorio Giovara	9c12c6ff95	motion_est: convert stride to ptrdiff_t CC: libav-stable@libav.org Bug-Id: CID 700556 / CID 700557 / CID 700558	2014-11-24 01:30:10 +00:00
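One common reason to declare a picture stride as `ptrdiff_t` is that row offsets such as `y * stride` are then computed in pointer-width signed arithmetic. Negative strides (walking upward through a picture) and large offsets on 64-bit targets then behave as ordinary pointer arithmetic. The following is a minimal sketch, assuming a hypothetical 8x8 sum-of-absolute-differences helper `sad_8x8` written in the style of a motion-estimation compare function. It is not the code touched by that commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8x8 SAD helper in the style of a motion-estimation
 * compare function. The stride is ptrdiff_t, matching pointer width. */
static int sad_8x8(const uint8_t *a, const uint8_t *b, ptrdiff_t stride)
{
    int sum = 0;
    for (int y = 0; y < 8; y++) {
        /* y * stride is evaluated as ptrdiff_t, so a negative stride walks
         * upward through the picture with no narrowing int conversion. */
        const uint8_t *ra = a + y * stride;
        const uint8_t *rb = b + y * stride;
        for (int x = 0; x < 8; x++)
            sum += abs(ra[x] - rb[x]);
    }
    return sum;
}
```

Calling it with a pointer to the last row and a negated stride visits the same eight rows in reverse order, which yields the same result.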
Diego Biurrun	95c0cec03a	idctdsp: Add global function pointers for {add|put}_pixels_clamped functions These function pointers already existed in the ARM code. Adding them globally allows calls to the function pointers to access arch-optimized versions of the functions transparently.	2014-09-02 14:41:13 -07:00
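The pattern described in that commit can be sketched as follows. A global function pointer defaults to a portable C implementation, and arch-specific init code may overwrite it. Callers always go through the pointer, so they transparently pick up the optimized version. All names here (`put_pixels_clamped_c`, `put_pixels_clamped`, `select_put_pixels_clamped`, `put_clamped_fn`) are hypothetical, not the actual libav symbols.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*put_clamped_fn)(const int16_t *block, uint8_t *pixels,
                               ptrdiff_t line_size);

/* Portable reference: store an 8x8 block of IDCT output, clamping each
 * value to the 0..255 pixel range. */
static void put_pixels_clamped_c(const int16_t *block, uint8_t *pixels,
                                 ptrdiff_t line_size)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            int v = block[i * 8 + j];
            pixels[i * line_size + j] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
    }
}

/* Global pointer, initialized to the C version. */
put_clamped_fn put_pixels_clamped = put_pixels_clamped_c;

/* Hypothetical init hook: arch-specific code passes its optimized routine
 * when the CPU supports it. NULL keeps the current implementation. */
static void select_put_pixels_clamped(put_clamped_fn optimized)
{
    if (optimized)
        put_pixels_clamped = optimized;
}
```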
Diego Biurrun	efd26bedec	build: Add explanatory comments to (optimization) blocks in the Makefiles	2014-08-15 02:55:21 -07:00
Diego Biurrun	835f798c7d	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
Ben Avison	adf8227cf4	vc-1: Add platform-specific start code search routine to VC1DSPContext. Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2014-08-04 22:22:54 +02:00
Ben Avison	db7f1c7c5a	h264: Move start code search functions into separate source files. This permits re-use with parsers for codecs which use similar start codes. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2014-08-04 22:22:54 +02:00
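Both H.264 and VC-1 delimit syntax units with the three-byte prefix `00 00 01`, so a single search routine can serve several parsers. Below is a minimal byte-scanning sketch. The name `find_start_code` and the skip heuristic are illustrative only, not the functions moved in those commits. The skip works because a prefix starting at `i`, `i+1` or `i+2` requires `buf[i+2]` to be 0 or 1, so any larger byte rules out all three positions at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 00 00 01 prefix in buf[0..size),
 * or size if there is none. */
static size_t find_start_code(const uint8_t *buf, size_t size)
{
    size_t i = 0;
    while (i + 2 < size) {
        if (buf[i + 2] > 1) {
            /* No prefix can start at i, i+1 or i+2. */
            i += 3;
        } else if (buf[i + 1] != 0) {
            /* Prefixes at i and i+1 both need buf[i+1] == 0. */
            i += 2;
        } else if (buf[i] == 0 && buf[i + 2] == 1) {
            return i;
        } else {
            i++;
        }
    }
    return size;
}
```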
Diego Biurrun	7fb993d338	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	2014-07-25 02:52:54 -07:00
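Marking the source pointer `const` in a motion-compensation function-pointer type documents that the reference picture is only read. It also lets callers pass read-only data without casts. The sketch below assumes a hypothetical typedef `mc_func` shaped like a qpel function pointer and a hypothetical full-pel copy `put_pixels8_copy`. It is not the actual `qpel_mc_func` definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical qpel-style function-pointer type: src is const because
 * motion compensation only reads the reference picture. */
typedef void (*mc_func)(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);

/* Full-pel 8x8 copy. It matches mc_func, so it can be stored in a table
 * of such pointers and called with a const reference block. */
static void put_pixels8_copy(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++)
        memcpy(dst + y * stride, src + y * stride, 8);
}
```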
Ben Avison	6869612f5c	arm: Macroize the test for 'setend' CPU instruction support Signed-off-by: Diego Biurrun <diego@biurrun.de>	2014-07-21 15:08:01 -07:00
Diego Biurrun	81b9bf3192	dct-test: Move arch-specific bits into arch-specific subdirectories	2014-07-21 01:10:11 -07:00
Diego Biurrun	4de8b60684	idct: Move arm-specific declarations to a header in the arm directory	2014-07-20 13:02:17 -07:00
Diego Biurrun	8b0dd4942a	idctdsp: prettyprinting cosmetics	2014-07-18 07:51:03 -07:00
Diego Biurrun	b4987f7219	idct: Convert IDCT permutation #defines to an enum Also rename the enum values to be consistent with other DCT permutations.	2014-07-18 07:51:03 -07:00
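Replacing a set of `#define` constants with an enum gives the permutation selector a real type, so function signatures document the valid values and debuggers show symbolic names. The following sketch uses a hypothetical two-value enum and a hypothetical `init_idct_permutation` helper that fills a 64-entry coefficient permutation table (identity or 8x8 transpose). The value names and the helper are not libav's.

```c
#include <stdint.h>

/* Hypothetical enum standing in for former #define constants. */
enum idct_perm {
    IDCT_PERM_NONE,
    IDCT_PERM_TRANSPOSE,
};

/* Fill perm[] so that perm[i] is where coefficient i is stored. */
static void init_idct_permutation(uint8_t perm[64], enum idct_perm type)
{
    for (int i = 0; i < 64; i++) {
        switch (type) {
        case IDCT_PERM_TRANSPOSE:
            /* Swap row (i >> 3) and column (i & 7). */
            perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
            break;
        case IDCT_PERM_NONE:
        default:
            perm[i] = (uint8_t)i;
            break;
        }
    }
}
```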
Martin Storsjö	7e18a727d2	arm: cosmetics: Consistently use lowercase for shift operators Signed-off-by: Martin Storsjö <martin@martin.st>	2014-07-18 11:17:40 +03:00
Martin Storsjö	fe67f3fbb5	arm: cosmetics: Fix a misaligned asm operand Signed-off-by: Martin Storsjö <martin@martin.st>	2014-07-18 11:17:35 +03:00
Ben Avison	87552d54d3	armv6: Accelerate ff_fft_calc for general case (nbits != 4) The previous implementation targeted DTS Coherent Acoustics, which only requires nbits == 4 (fft16()). This case was (and still is) linked directly rather than being indirected through ff_fft_calc_vfp(), but now the full range from radix-4 up to radix-65536 is available. This benefits other codecs such as AAC and AC3. The implementation is based upon the C version, with each routine larger than radix-16 calling a hierarchy of smaller FFT functions, then performing a post-processing pass. This pass benefits a lot from loop unrolling to counter the long pipelines in the VFP. A relaxed calling standard also reduces the overhead of the call hierarchy, and avoiding the excessive inlining performed by GCC probably helps with I-cache utilisation too. I benchmarked the result by measuring the number of gperftools samples that hit anywhere in the AAC decoder (starting from aac_decode_frame()) or specifically in the FFT routines (fft4() to fft512() and pass()) for the same sample AAC stream (mean ± stddev, before → after): Audio decode 2245.5 ± 53.1 → 1599.6 ± 43.8, confidence 100.0%, change +40.4%; FFT routines 940.6 ± 22.0 → 348.1 ± 20.8, confidence 100.0%, change +170.2%. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-07-18 01:34:23 +03:00
Ben Avison	5c22e8e4ad	armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) The previous implementation targeted DTS Coherent Acoustics, which only requires mdct_bits == 6. This relatively small size lent itself to unrolling the loops a small number of times, and encoding offsets calculated at assembly time within the load/store instructions of each iteration. In the more general case (codecs such as AAC and AC3) much larger arrays are used - mdct_bits == [8, 9, 11]. The old method does not scale for these cases, so more integer registers are used with non-unrolled versions of the loops (and with some stack spillage). The postrotation filter loop is still unrolled by a factor of 2 to permit the double-buffering of some VFP registers to facilitate overlap of neighbouring iterations. I benchmarked the result by measuring the number of gperftools samples that hit anywhere in the AAC decoder (starting from aac_decode_frame()) or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same example AAC stream (mean ± stddev, before → after): aac_decode_frame 2368.1 ± 35.8 → 2117.2 ± 35.3, confidence 100.0%, change +11.8%; ff_imdct_half_* 457.5 ± 22.4 → 251.2 ± 16.2, confidence 100.0%, change +82.1%. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-07-18 01:34:08 +03:00
Diego Biurrun	2d60444331	dsputil: Split motion estimation compare bits off into their own context	2014-07-17 09:07:10 -07:00
Diego Biurrun	adff0a8166	arm: dsputil: Coalesce all init files	2014-07-16 06:18:23 -07:00
Diego Biurrun	1173320249	dsputil: Drop unused bit_depth parameter from all init functions	2014-07-11 06:38:26 -07:00
Diego Biurrun	f46bb608d9	dsputil: Split off pixel block routines into their own context	2014-07-09 08:05:26 -07:00
Martin Storsjö	79fce1ec8a	arm: Avoid using the 'setend' instruction on ARMv7 and newer This instruction is deprecated on ARMv8, and it is serializing on some ARMv7 cores as well [1]. [1] http://article.gmane.org/gmane.linux.ports.arm.kernel/339293 CC: libav-stable@libav.org Signed-off-by: Martin Storsjö <martin@martin.st>	2014-07-08 12:09:09 +03:00
Diego Biurrun	c166148409	dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc	2014-07-06 14:26:53 -07:00
Diego Biurrun	e3fcb14347	dsputil: Split off IDCT bits into their own context	2014-06-30 07:58:46 -07:00
Janne Grunau	f23d26a686	h264: avoid using uninitialized memory in NEON chroma mc Adapt commit 982b596ea6640bfe218a31f6c3fc542d9fe61c31 for the arm and aarch64 NEON asm. 5-10% faster on Cortex-A9.	2014-06-23 16:32:15 +02:00
Diego Biurrun	9a9e2f1c8a	dsputil: Split audio operations off into a separate context	2014-06-22 06:20:15 -07:00
Diego Biurrun	e74433a8e6	dsputil: Split clear_block/fill_block off into a separate context	2014-06-18 14:07:23 -07:00
Janne Grunau	896a5bff64	arm: check if AS supports .dn Move the GNU as check before the arch specific asm checks since the .dn check requires gas compatible assembler. Disable the VC-1 motion compensation NEON asm which is the only part using that directive. The integrated assembler in the upcoming clang 3.5 does not support .dn/.qn without plans to change that. Too much effort to implement it while it is rarely used. http://llvm.org/bugs/show_bug.cgi?id=18199.	2014-06-03 14:23:03 +02:00
Diego Biurrun	054013a0fc	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
Anton Khirnov	6a13505c06	mpegvideo: move the MpegEncContext fields used from arm asm to the beginning This should reduce the frequency with which the offsets need to be updated.	2014-04-29 14:49:42 +02:00
Janne Grunau	a88e1d1c59	lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets	2014-04-24 18:28:26 +02:00
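The underlying technique is a compile-time assertion that a struct member sits at the byte offset hard-coded in assembly. Reordering fields in the C struct then fails the build instead of silently breaking the asm. The sketch below uses C11 `_Static_assert` (older code often achieves the same with a negative-size-array trick). The macro `CHECK_OFFSET`, the struct `ExampleContext` and the offset constant are hypothetical; this is not libavutil's actual definition of AV_CHECK_OFFSET.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: compilation fails unless member m of type s sits
 * at byte offset o. */
#define CHECK_OFFSET(s, m, o) \
    _Static_assert(offsetof(s, m) == (o), \
                   "offset of " #s "." #m " no longer matches the asm")

/* Hypothetical context whose layout is also known to assembly code. */
typedef struct ExampleContext {
    int32_t block_last_index[12]; /* 12 * 4 = 48 bytes */
    int32_t mb_x;
} ExampleContext;

/* Offset an assembly file would hard-code (hypothetical value). */
#define EXAMPLE_MB_X_OFFSET 48

CHECK_OFFSET(ExampleContext, mb_x, EXAMPLE_MB_X_OFFSET);
```

Keeping the asm-visible fields at the start of the struct, as the MpegEncContext commit in this log does, means unrelated additions later in the struct do not shift those offsets.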
Diego Biurrun	3dc6272bed	Remove a number of unnecessary dsputil.h #includes	2014-04-04 19:08:05 +02:00
Janne Grunau	f37815b1d5	arm: asm decode_block_coeffs_internal is vp8 specific Unbreaks compilation on arm due to conflicting types for 'ff_decode_block_coeffs_armv6'.	2014-04-04 10:39:29 +02:00
Peter Ross	ac4b32df71	On2 VP7 decoder Further performance improvements and security fixes by Vittorio Giovara, Luca Barbato and Diego Biurrun. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2014-04-04 04:00:11 +02:00
Diego Biurrun	c3a0b3eb64	arm: build: Maintain decoder objects separate from infrastructure objects	2014-03-27 03:00:05 -07:00
Ben Avison	3b5946bcce	truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output. Profiling results for overall decode and the output_data function in particular are as follows (mean ± stddev, before → after; confidence; change): 6:2 total 339.6 ± 15.1 → 329.3 ± 16.0, 95.8%, +3.1% (insignificant); 6:2 function 24.6 ± 6.0 → 9.9 ± 3.1, 100.0%, +148.5%; 8:2 total 324.5 ± 15.5 → 323.6 ± 14.3, 15.2%, +0.3% (insignificant); 8:2 function 20.4 ± 3.9 → 9.9 ± 3.4, 100.0%, +104.7%; 6:6 total 572.8 ± 20.6 → 539.9 ± 24.2, 100.0%, +6.1%; 6:6 function 54.5 ± 5.6 → 16.0 ± 3.8, 100.0%, +240.9%; 8:8 total 741.5 ± 21.2 → 702.5 ± 18.5, 100.0%, +5.6%; 8:8 function 63.9 ± 7.6 → 18.4 ± 4.8, 100.0%, +247.3%. The assembly version has also been tested with a fuzz tester to ensure that any combinations of inputs not exercised by my available test streams still generate mathematically identical results to the C version. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-03-26 19:54:32 +02:00
Ben Avison	483321fe78	truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel. Profiling results for overall audio decode and the rematrix_channels function in particular are as follows (mean ± stddev, before → after; confidence; change): 6:2 total 370.8 ± 17.0 → 348.8 ± 20.1, 99.9%, +6.3%; 6:2 function 46.4 ± 8.4 → 45.8 ± 6.6, 18.0%, +1.2% (insignificant); 8:2 total 343.2 ± 19.0 → 339.1 ± 15.4, 54.7%, +1.2% (insignificant); 8:2 function 38.9 ± 3.9 → 40.2 ± 6.9, 52.4%, -3.2% (insignificant); 6:6 total 658.4 ± 15.7 → 604.6 ± 20.8, 100.0%, +8.9%; 6:6 function 109.0 ± 8.7 → 59.5 ± 5.4, 100.0%, +83.3%; 8:8 total 896.2 ± 24.5 → 766.4 ± 17.6, 100.0%, +16.9%; 8:8 function 223.4 ± 12.8 → 93.8 ± 5.0, 100.0%, +138.3%. The assembly version has also been tested with a fuzz tester to ensure that any combinations of inputs not exercised by my available test streams still generate mathematically identical results to the C version. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-03-26 19:54:10 +02:00
Ben Avison	15a29c39d9	truehd: add hand-scheduled ARM asm version of mlp_filter_channel. Profiling results for overall audio decode and the mlp_filter_channel(_arm) function in particular are as follows (mean ± stddev, before → after; confidence; change): 6:2 total 380.4 ± 22.0 → 370.8 ± 17.0, 87.4%, +2.6% (insignificant); 6:2 function 60.7 ± 7.2 → 36.6 ± 8.1, 100.0%, +65.8%; 8:2 total 357.0 ± 17.5 → 343.2 ± 19.0, 97.8%, +4.0% (insignificant); 8:2 function 60.3 ± 8.8 → 37.3 ± 3.8, 100.0%, +61.8%; 6:6 total 717.2 ± 23.2 → 658.4 ± 15.7, 100.0%, +8.9%; 6:6 function 140.4 ± 12.9 → 81.5 ± 9.2, 100.0%, +72.4%; 8:8 total 981.9 ± 16.2 → 896.2 ± 24.5, 100.0%, +9.6%; 8:8 function 193.4 ± 15.0 → 103.3 ± 11.5, 100.0%, +87.2%. Experiments with adding preload instructions to this function yielded no useful benefit, so these have not been included. The assembly version has also been tested with a fuzz tester to ensure that any combinations of inputs not exercised by my available test streams still generate mathematically identical results to the C version. Signed-off-by: Martin Storsjö <martin@martin.st>	2014-03-26 19:53:52 +02:00
Diego Biurrun	322a1dda97	dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros	2014-03-22 06:17:29 -07:00
Diego Biurrun	82bb304801	dsputil: Use correct type in me_cmp_func function pointer	2014-03-20 05:03:23 -07:00
Diego Biurrun	0e083d7e43	build: Group general components separate from de/encoders in arch Makefiles This is in line with how the top-level libavcodec Makefile is structured.	2014-03-20 05:03:23 -07:00
Diego Biurrun	5169e68895	dsputil: Propagate bit depth information to all (sub)init functions This avoids recalculating the value over and over again.	2014-03-20 05:03:23 -07:00
Diego Biurrun	cf7a216757	arm: dsputil: K&R formatting cosmetics	2014-03-20 05:03:23 -07:00
Diego Biurrun	36b822b8be	arm: dsputil: Drop restrict keyword from add_pixels_clamped_armv6 prototype The function is assigned to a function pointer that does not have the restrict keyword for that parameter. This fixes compilation for MSVC builds that don't recognize "restrict", broken since ed9625eb62.	2014-03-14 13:45:40 +01:00
Diego Biurrun	831a118078	Update dsputil- and SIMD-related comments to match reality more closely	2014-03-13 05:50:29 -07:00
Diego Biurrun	d1184b8110	arm: dsputil: Add a bunch of missing #includes	2014-03-13 05:50:28 -07:00
Diego Biurrun	49676eb730	dsputil: Remove prototypes for nonexisting optimization functions	2014-03-13 05:50:28 -07:00
Janne Grunau	5a7f382a5d	armv6: vp8: use explicit labels in motion compensation asm The integrated arm assembler in clang-503.0.38 (Xcode-5.1) fails to assemble a branch to 'label + offset' in thumb mode.	2014-03-12 15:06:05 +01:00
Janne Grunau	634d9d8b39	arm: get_cabac inline asm Based on the aarch64 asm. CPU cycle counts on cortex-a9 compared to gcc 4.8.2: before: 475 decicycles in get_cabac_noinline, 67106035 runs, 2829 skips; after: 393 decicycles in get_cabac_noinline, 67106474 runs, 2390 skips. Overall speedup is above 2%. Code generated by clang 3.4 is slower on the same hardware and the relative change is a little larger.	2014-03-09 00:45:34 +01:00


428 Commits