FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-03-08 22:39:44 +02:00

Author	SHA1	Message	Date
James Almer	7323c896b2	checkasm: add an exrdsp test Signed-off-by: James Almer <jamrial@gmail.com>	2017-09-17 19:01:40 -03:00
Clément Bœsch	e0d56f097f	checkasm: use perf API on Linux ARM*
On ARM platforms, accessing the PMU registers requires special user access permissions. Since there is no other way to get accurate timers, the current implementation of timers in FFmpeg relies on these registers.
Unfortunately, enabling user access to these registers on Linux is not trivial, and generally involves compiling a random and unreliable github kernel module, or somehow patching your kernel. Such a module is very unlikely to reach the upstream anytime soon. Quoting Robin Murphin from ARM:
> Say you do give userspace direct access to the PMU; now run two or more
> programs at once that believe they can use the counters for their own
> "minimal-overhead" profiling. Have fun interpreting those results...
>
> And that's not even getting into the implications of scheduling across
> different CPUs, CPUidle, etc. where the PMU state is completely beyond
> userspace's control. In general, the plan to provide userspace with
> something which might happen to just about work in a few corner cases,
> but is meaningless, misleading or downright broken in all others, is to
> never do so.
As a result, the alternative is to use the Performance Monitoring Linux API, which makes use of these registers internally (assuming the PMU of your ARM board is supported in the kernel, which is definitely not a given...).
While the Linux API is obviously cross platform, it does have a significant overhead which needs to be taken into account. As a result, that mode is only weakly enabled, and only on ARM platforms.
Note on the inflexibility of the implementation: the timers (native FFmpeg vs Linux API) are selected at compilation time to avoid the need for function calls, which would have a negative impact on the cycle counters.	2017-09-08 18:51:05 +02:00
James Almer	e51073fe00	checkasm/vf_blend: rename addition128 and difference128 to grainmerge and grainextract This was missing from f8d0689d3f. Fixes checkasm.	2017-08-24 23:39:09 -03:00
James Almer	6f205a42d7	checkasm: add hybrid_analysis_ileave and hybrid_synthesis_deint tests to aacpsdsp Signed-off-by: James Almer <jamrial@gmail.com>	2017-07-13 17:03:28 -03:00
James Almer	823cc7e25f	checkasm: add a g722dsp test Signed-off-by: James Almer <jamrial@gmail.com>	2017-07-13 17:00:19 -03:00
James Almer	3d3243577c	checkasm: use declare_func_float() in sbrdsp sum_square test The function returns a float. This fixes the test in x86_32 targets. Signed-off-by: James Almer <jamrial@gmail.com>	2017-07-04 23:02:57 -03:00
Matthieu Bouron	7864e07f4a	checkasm: add sbrdsp tests	2017-07-03 14:28:17 +02:00
James Almer	0eb783eb06	checkasm: randomize the full input buffer in test_hybrid_analysis Missed in the last commit.	2017-06-30 22:49:54 -03:00
James Almer	fb7b477a91	checkasm: fix size of input buffer in test_hybrid_analysis	2017-06-30 20:37:06 -03:00
Clément Bœsch	b12a36170b	lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis	2017-06-28 12:22:39 +02:00
Clément Bœsch	edd041e64c	checkasm: add AAC PS tests This includes various fixes and improvements from James Almer. Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-28 12:22:39 +02:00
James Almer	fa50d9360b	x86/vf_blend: add sse and ssse3 extremity functions Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-27 13:17:23 -03:00
James Almer	a579dbb4f7	checkasm: add missing checks to float_dsp's butterflies_float test	2017-06-23 23:38:07 -03:00
Matthieu Bouron	067e42b851	checkasm/aarch64: fix tests returning a float Avoids overriding the v0 register (which contains the result of the tested function) in checkasm_call_checked.	2017-06-22 09:18:10 +02:00
Diego Biurrun	fd502f4f5f	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler. (Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1) Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-21 17:00:29 -03:00
James Almer	5b10f484e2	checkasm: add float_dsp tests Ported from libavutil/tests/float_dsp.c Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-14 19:20:10 -03:00
James Almer	37388b119c	checkasm: add a checkasm_checked_call function that doesn't issue emms Meant for DSP functions returning a float or double, as they'd fail if emms is called after every run on x86_32. Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-14 19:18:56 -03:00
James Almer	93dc1c1221	checkasm: add _fixed suffix to fixed_dsp tests Should prevent future conflicts with the similarly named floatdsp tests	2017-06-01 13:12:20 -03:00
James Almer	7b3cb953f7	checkasm: add fixed_dsp tests Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2017-04-11 18:05:13 -03:00
Clément Bœsch	210678d3c5	Merge commit '3794062ab1a13442b06f6d76c54dce51ffa54697' * commit '3794062ab1a13442b06f6d76c54dce51ffa54697': Remove Plan 9 support Merged-by: Clément Bœsch <u@pkh.me>	2017-04-09 14:52:00 +02:00
James Almer	6747fc436e	Merge commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27' * commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27': Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately" Merged-by: James Almer <jamrial@gmail.com>	2017-04-04 15:26:18 -03:00
Clément Bœsch	edfa7ac8ec	Merge commit '81d7f0bbca837afda1f7e60d3ae52ab1360ab44b' * commit '81d7f0bbca837afda1f7e60d3ae52ab1360ab44b': checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately Merged-by: Clément Bœsch <u@pkh.me>	2017-04-01 11:54:29 +02:00
Clément Bœsch	b589e83f43	Merge commit '9498237049d15812cecb79df47b196c73013908b' * commit '9498237049d15812cecb79df47b196c73013908b': checkasm: Add --test parameter to check only specific components Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-03-31 10:06:13 +02:00
Clément Bœsch	1c9f4b5078	lavc/vp9: split into vp9{block,data,mvs} This is following Libav layout to ease merges.	2017-03-27 21:38:21 +02:00
James Almer	09ce5519f3	fate/checkasm: fix use of uninitialized memory on hevc_add_res tests	2017-03-24 22:11:34 -03:00
James Almer	36eae45510	fate/checkasm: use LOCAL_ALIGNED_32 on hevc_add_res tests	2017-03-24 22:11:22 -03:00
Clément Bœsch	3d4039f964	Merge commit 'ed48a9d8143d2575a4458589cebde69ec326afd8' * commit 'ed48a9d8143d2575a4458589cebde69ec326afd8': checkasm: Add a test for HEVC add_residual Merged-by: Clément Bœsch <u@pkh.me>	2017-03-24 12:37:09 +01:00
James Almer	0d34473d8e	Merge commit 'dd5d4a0e1e3a30a254d1a57ecbdcedf230c6014b' * commit 'dd5d4a0e1e3a30a254d1a57ecbdcedf230c6014b': checkasm: aarch64: Don't clobber x29 in checkasm_stack_clobber Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 18:31:36 -03:00
James Almer	f23078904f	Merge commit '2816f8a8bb33bd67fec5e94f5d357918caf4e055' * commit '2816f8a8bb33bd67fec5e94f5d357918caf4e055': build: Drop arch-specific checkasm Makefiles Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 18:01:47 -03:00
James Almer	3ddae9eee9	Merge commit '93d5b022a9fd3a1a1f9c521a1eac7f0410e05b81' * commit '93d5b022a9fd3a1a1f9c521a1eac7f0410e05b81': build: Drop duplicate asm recipe Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 17:57:35 -03:00
James Almer	67b639b496	Merge commit 'c91d6a33f872574c95c8784277cf60ffcf6bff4f' * commit 'c91d6a33f872574c95c8784277cf60ffcf6bff4f': checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 17:38:20 -03:00
James Almer	a2d34cc51b	Merge commit 'f1b3e131385176c3c9d9783b25047856a0dcebf6' * commit 'f1b3e131385176c3c9d9783b25047856a0dcebf6': checkasm: aarch64: Clobber the stack before calling functions Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 17:36:53 -03:00
James Almer	cab4c7fa19	Merge commit 'a05cc56124b4f1237f6355784de821e3290ddb44' * commit 'a05cc56124b4f1237f6355784de821e3290ddb44': checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters Merged-by: James Almer <jamrial@gmail.com>	2017-03-23 17:35:38 -03:00
Clément Bœsch	50bbb67472	Merge commit 'e3f941cb03b139b866a0ad6dc95fbe1b247d54af' * commit 'e3f941cb03b139b866a0ad6dc95fbe1b247d54af': checkasm: add a test for HEVC IDCT Merged-by: Clément Bœsch <u@pkh.me>	2017-03-23 12:17:39 +01:00
James Almer	30cadfe071	avcodec/lossless_videodsp: use ptrdiff_t for length parameters Signed-off-by: James Almer <jamrial@gmail.com>	2017-03-22 18:38:35 -03:00
Clément Bœsch	7c2a7f9c11	Merge commit '22c3ab18646924ce24dc6017a9e882ff69689e40' * commit '22c3ab18646924ce24dc6017a9e882ff69689e40': checkasm: Add test for huffyuvdsp add_bytes huffyuvdsp is renamed to llviddsp to be consistent with our codebase. Note: af607b7e07 wasn't actually required for this test since this commit is not actually testing huffyuvdsp. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-22 16:31:38 +01:00
Clément Bœsch	83cd80d10a	Merge commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5'
* commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
  audiodsp/x86: yasmify vector_clipf_sse
  audiodsp: reorder arguments for vector_clipf
Merged the version from Libav after a discussion with James Almer on IRC:
19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94c7d7ebf995d4dbb4f878d06272f9c6
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed
Both commits are merged at the same time to prevent a conflict with our existing yasmified ff_vector_clipf_sse.
Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 22:35:07 +01:00
Clément Bœsch	8414755486	Merge commit 'e9ef6171396dc4106526aaa86b620c61ca3d1017' * commit 'e9ef6171396dc4106526aaa86b620c61ca3d1017': checkasm: add tests for audiodsp Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 19:10:56 +01:00
Clément Bœsch	c50b2164a6	Merge commit '2eb97af66af90ca3978229da151f0b8b3a5d9370' * commit '2eb97af66af90ca3978229da151f0b8b3a5d9370': checkasm: add a test for blockdsp Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 19:05:05 +01:00
Clément Bœsch	e07fa3008b	Merge commit 'de452e503734ebb0fdbce86e9d16693b3530fad3' * commit 'de452e503734ebb0fdbce86e9d16693b3530fad3': pixblockdsp: Change type of stride parameters to ptrdiff_t Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 15:58:32 +01:00
Clément Bœsch	3c8f7a8f6b	Merge commit 'e89cef40506d990a982aefedfde7d3ca4f88c524' * commit 'e89cef40506d990a982aefedfde7d3ca4f88c524': checkasm: Read the unsigned value as it should Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 11:55:20 +01:00
James Almer	e5623aafd8	Merge commit '87c6c78604e4dd16f1f45862b27ca006da010527' * commit '87c6c78604e4dd16f1f45862b27ca006da010527': vp8: Change type of stride parameters to ptrdiff_t Merged-by: James Almer <jamrial@gmail.com>	2017-03-19 15:11:44 -03:00
Clément Bœsch	8b13492c9e	Merge commit '40ad05bab206c932a32171d45581080c914b06ec' * commit '40ad05bab206c932a32171d45581080c914b06ec': checkasm: Cast unsigned to signed Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-03-15 12:32:15 +01:00
Clément Bœsch	92cb9a3869	Merge commit '9064777dbb335ab4809ae09e3fdcc0245f925cdc' * commit '9064777dbb335ab4809ae09e3fdcc0245f925cdc': checkasm: add HEVC test for testing IDCT DC Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-02-02 11:40:58 +01:00
Clément Bœsch	a0860b0a38	Merge commit '6f9e34baea4f6f484392e4e67f606a0835d07b73' * commit '6f9e34baea4f6f484392e4e67f606a0835d07b73': arm: Check for support for the .fpu directive Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-02-02 11:22:04 +01:00
Clément Bœsch	9f1c81e5ec	Merge commit '71a0472114574993df7035f4de9aa007e03817b8' * commit '71a0472114574993df7035f4de9aa007e03817b8': checkasm: arm: report the first clobbered register in checkasm_checked_call Also includes 446353ea18, 59aeed93e4, and 37961044c6 to avoid breaking too much stuff. Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 19:21:29 +01:00
Martin Storsjö	388f6e6715	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.
Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:
                                Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7  16582.3  14207.6  12000.3
By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:
vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5  17157.0  14249.3  12015.1
I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.
In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.
This is cherrypicked from libav commit 9c8bc74c2b40537b0997f646c87c008042d788c2. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-14 21:13:30 +01:00
Ronald S. Bultje	1c8fbd7b90	checkasm/vp9: benchmark all sub-IDCTs (but not WHT or ADST).	2016-12-27 10:02:33 -05:00
Diego Biurrun	3794062ab1	Remove Plan 9 support Supporting the system was a nice joke for the 9 release, but it has run its course. Nowadays Plan 9 receives no testing and has no practical usefulness.	2016-12-03 09:15:01 +01:00
Martin Storsjö	9c8bc74c2b	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.
Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:
                                Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7  16582.3  14207.6  12000.3
By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:
vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5  17157.0  14249.3  12015.1
I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.
In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-30 23:54:07 +02:00
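
The benchmark figures quoted in the vp9itxfm commit message above can be turned into per-core ratios. The Python sketch below is only an illustration of that arithmetic and is not part of FFmpeg or checkasm. The cycle counts are copied from the commit message; the names `BEFORE`, `AFTER`, `speedup` and `full_case_overhead` are hypothetical, and only the sub-8/sub-16 and full-size rows are included because those are the cases the message singles out.

```python
# Cycle counts copied from the commit message; columns are Cortex A7, A8, A9, A53.
CORES = ("A7", "A8", "A9", "A53")

# Runtime of the full transform before the change (every subpartition cost this much).
BEFORE = {
    "16x16": (3188.1, 2435.4, 2499.0, 1969.0),
    "32x32": (18531.7, 16582.3, 14207.6, 12000.3),
}

# Runtime after the change, keyed by (transform size, subpartition size).
AFTER = {
    ("16x16", 8): (2446.7, 1828.7, 1993.6, 1494.7),
    ("16x16", 16): (3211.7, 2475.3, 2523.5, 1983.1),
    ("32x32", 16): (14074.6, 11946.5, 11001.4, 9008.6),
    ("32x32", 32): (18575.5, 17157.0, 14249.3, 12015.1),
}


def speedup(size, sub):
    """Per-core ratio old_runtime / new_runtime; values above 1 mean faster."""
    return {
        core: before / after
        for core, before, after in zip(CORES, BEFORE[size], AFTER[(size, sub)])
    }


def full_case_overhead(size):
    """Per-core relative slowdown of the full subpartition case after the change."""
    full = int(size.split("x")[0])  # 16 for "16x16", 32 for "32x32"
    return {
        core: after / before - 1.0
        for core, before, after in zip(CORES, BEFORE[size], AFTER[(size, full)])
    }


if __name__ == "__main__":
    for size, sub in (("16x16", 8), ("32x32", 16)):
        ratios = ", ".join(f"{c}: {r:.2f}x" for c, r in speedup(size, sub).items())
        print(f"{size} sub{sub} speedup   -> {ratios}")
    for size in ("16x16", "32x32"):
        costs = ", ".join(f"{c}: {o:+.1%}" for c, o in full_case_overhead(size).items())
        print(f"{size} full-case overhead -> {costs}")
```

The sub-8 (16x16) and sub-16 (32x32) rows are the ones the commit message says cover 70-90% of the non-dc-only IDCTs in the inspected clips, so their ratios give a feel for the typical gain, while `full_case_overhead` quantifies the cost of the additional loads and compares in the worst case.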

213 Commits