FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Martin Storsjö	9c8bc74c2b	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.
Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:
Cortex                                       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7  16582.3  14207.6  12000.3
By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:
vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5  17157.0  14249.3  12015.1
I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.
In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-30 23:54:07 +02:00
Ronald S. Bultje	06fec74cac	checkasm: vp9dsp: benchmark all sub-IDCTs (but not WHT or ADST). Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-23 23:55:38 +02:00
Martin Storsjö	effc1430b2	Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately" This reverts commit `81d7f0bbca`. Instead of just benchmarking dc separately, test all relevant subparts (in the next commit). Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-23 23:55:26 +02:00
Martin Storsjö	286ab878bd	fate.sh: Allow setting other make flags for running tests If makeopts_fate is set, these makeopts are used for running the tests instead of the normal makeopts. If it isn't set, the normal makeopts variable is used as before. This is useful if remote testing on a lesser machine where a large number of parallel jobs might be undesirable, while wanting to speed up the build with many parallel processes. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-23 15:05:51 +02:00
Vittorio Giovara	481ff3cf01	fate: Add h264 and hevc extradata reload tests Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-16 11:17:27 -05:00
Martin Storsjö	81d7f0bbca	checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately The dc-only mode is already checked to work correctly above, but this allows benchmarking this mode for performance tuning, and allows making sure that it actually is correctly hooked up. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-16 10:06:32 +02:00
Ronald S. Bultje	0b37cd09a6	checkasm: add vp9dsp.itxfm_add tests. This includes fixes by Henrik Gramner. The forward transforms are derived from the reference encoder. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 11:09:05 +02:00
Diego Biurrun	4537647c04	fate: checkasm: Split monolithic test into individual components	2016-11-08 17:32:25 +01:00
Diego Biurrun	9498237049	checkasm: Add --test parameter to check only specific components Inspired by a patch from Martin Storsjö <martin@martin.st>.	2016-11-08 17:32:25 +01:00
Luca Barbato	ab839054e6	swscale: Add GRAY12	2016-11-07 22:42:00 +01:00
Martin Storsjö	2e55e26b40	vp9: Flip the order of arguments in MC functions This makes it match the pattern already used for VP8 MC functions. This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-03 09:12:02 +02:00
Vittorio Giovara	ecd2ec69ce	mov: Evaluate the movie display matrix This matrix needs to be applied after all others have (currently only display matrix from trak), but cannot be handled in movie box, since streams are not allocated yet. So store it in main context, and apply it when appropriate, that is after parsing the tkhd one. Fate tests are updated accordingly. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-01 12:19:00 -04:00
Vittorio Giovara	b90c8a3d08	fate: Add tests for mov display matrix Rotation, sample/display aspect ratio and pure matrix export. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-01 11:55:54 -04:00
Alexandra Hájková	ed48a9d814	checkasm: Add a test for HEVC add_residual	2016-10-22 17:33:35 +02:00
Diego Biurrun	043b0b9fb1	Replace leftover uses of -aframes|-dframes|-vframes with -frames:a|d|v	2016-10-22 16:50:41 +02:00
Luca Barbato	da4f8c8e35	fate: Update filter-pixfmts-scale gbrap12le hash missing from `be9dba5c8a` Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-10-18 20:34:55 +02:00
Martin Storsjö	dd5d4a0e1e	checkasm: aarch64: Don't clobber x29 in checkasm_stack_clobber x29 (FP) is a callee saved register and should be restored on return. Instead of backing up x29 and restoring it here, back up sp in a register that we are allowed to overwrite. This fixes crashes in checkasm on aarch64 since `f1b3e13138`. For some reason, gcc builds didn't crash, but clang builds do. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-10-18 16:17:12 +03:00
Michael Niedermayer	be9dba5c8a	swscale: Properly load alpha for planar rgb Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-10-18 12:58:14 +02:00
Diego Biurrun	2816f8a8bb	build: Drop arch-specific checkasm Makefiles They only contain one line and will never contain more.	2016-10-17 16:25:38 +02:00
Diego Biurrun	93d5b022a9	build: Drop duplicate asm recipe And move the asm recipe to the top-level Makefile next to the other local pattern rules for .o files.	2016-10-17 16:25:35 +02:00
Martin Storsjö	c91d6a33f8	checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack This, combined with clobbering the stack space prior to the call, increases the chances of finding cases where 32 bit parameters are erroneously treated as 64 bit. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-10-16 23:26:33 +03:00
Martin Storsjö	f1b3e13138	checkasm: aarch64: Clobber the stack before calling functions Signed-off-by: Martin Storsjö <martin@martin.st>	2016-10-16 23:26:22 +03:00
Martin Storsjö	a05cc56124	checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters Even if MAX_ARGS - 2 (for arm) or MAX_ARGS - 7 (for aarch64) parameters are passed on the stack to checkasm_checked_call, we actually only need to store MAX_ARGS - 4 (for arm) or MAX_ARGS - 8 (for aarch64) parameters on the stack when calling the tested function. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-10-16 23:26:15 +03:00
Anton Khirnov	8e2ea69135	lavf: use the new bitstream filter for extracting extradata This also fixes a minor bug introduced in the codecpar conversion, where the termination condition for extracting the extradata does not match the actual extradata setting code. As a result, the packet durations made up by lavf go back to their values before the codecpar conversion. That is of little consequence since that code should eventually be dropped completely.	2016-10-16 20:27:30 +02:00
Luca Barbato	881477c77b	swscale: Add the GBRAP12 output	2016-10-12 21:33:34 +02:00
Luca Barbato	ef3740c3a0	swscale: Enable GBRP12 output	2016-10-12 18:00:24 +02:00
Vittorio Giovara	eb54210602	swscale: Add missing yuv444p12 swapping Missing from `9bd6ea5695`. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-10-12 10:42:38 +02:00
Alexandra Hájková	e3f941cb03	checkasm: add a test for HEVC IDCT Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-11 18:15:40 +02:00
Ronald S. Bultje	c935b54bd6	checkasm: add VP9 loopfilter tests. The randomize_buffer() implementation assures that "most of the time", we'll do a good mix of wide16/wide8/hev/regular/no filters for complete code coverage. However, this is not mathematically assured because that would make the code either much more complex, or much less random. Some fixes and improvements by Rodger Combs <rodger.combs@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:07 +02:00
Vittorio Giovara	dc3fe45fca	fate: Add test for rscc palette	2016-10-02 15:42:03 -04:00
Anton Khirnov	5cc0057f49	lavu: remove the custom atomic API It has been replaced by C11 stdatomic.h and is now unused.	2016-10-02 19:35:55 +02:00
Alexandra Hájková	22c3ab1864	checkasm: Add test for huffyuvdsp add_bytes Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-10-02 17:13:26 +02:00
Diego Biurrun	ba479f3daa	hevc: Change type of array stride parameters to ptrdiff_t ptrdiff_t is the correct type for array strides and similar.	2016-09-29 17:54:23 +02:00
Luca Barbato	9bd6ea5695	pixfmt: Add yuv444p12 pixel format	2016-09-27 18:48:30 +02:00
Luca Barbato	0aebbbd024	pixfmt: Add yuv422p12 pixel format	2016-09-27 18:48:30 +02:00
Luca Barbato	85406e7a8d	pixfmt: Add yuv420p12 pixel format	2016-09-27 18:48:30 +02:00
Anton Khirnov	683da86aab	audiodsp: reorder arguments for vector_clipf This will make the x86 asm simpler. ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau <janne-libav@jannau.net>	2016-09-22 09:47:52 +02:00
Anton Khirnov	e9ef617139	checkasm: add tests for audiodsp	2016-09-22 09:47:52 +02:00
Anton Khirnov	2eb97af66a	checkasm: add a test for blockdsp	2016-09-22 09:47:52 +02:00
Luca Barbato	e89cef4050	checkasm: Read the unsigned value as it should Reading a value larger than int using atoi() may give the wrong result.	2016-09-11 14:12:18 +02:00
Diego Biurrun	3aa9d37d03	build: Fix directory dependencies of tests/pixfmts.mak target	2016-09-05 13:21:13 +02:00
Diego Biurrun	87c6c78604	vp8: Change type of stride parameters to ptrdiff_t ptrdiff_t is the correct type for array strides and similar.	2016-08-26 11:36:53 +02:00
Ronald S. Bultje	e99ecda550	checkasm: add vp9 MC tests. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-08-03 11:07:01 +02:00
Luca Barbato	40ad05bab2	checkasm: Cast unsigned to signed Avoid a warning for passing an unsigned value to abs(), some compilers might optimize away abs().	2016-07-23 08:27:32 +02:00
Alexandra Hájková	9064777dbb	checkasm: add HEVC test for testing IDCT DC Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-07-22 19:08:12 +02:00
Martin Storsjö	6f9e34baea	arm: Check for support for the .fpu directive When targeting COFF (windows), clang doesn't support this directive (while binutils supports it for all targets). Signed-off-by: Martin Storsjö <martin@martin.st>	2016-07-21 12:52:10 +03:00
Martin Storsjö	37961044c6	checkasm: arm: Ignore changes to bits 0-4 and 7 of FPSCR These bits are set by exceptions in NEON instructions. Also print the differing bits when FPSCR is clobbered, and use bic instead of lsl, for clearing the topmost bits. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-07-17 21:48:17 +03:00
Janne Grunau	59aeed93e4	checkasm/arm: remove NEON instructions from checkasm_checked_call_vfp Fixes AS error on non NEON builds introduced in `71a0472114`. Also set the fpu directly to vfp in checkasm.S to cause build errors on NEON builds.	2016-07-17 11:28:21 +02:00
Martin Storsjö	446353ea18	checkasm: arm: Don't start new const blocks for each string Each const block needs to be terminated by one endconst invocation so either call endconst after each, or just declare plain labels to the later strings. This fixes errors such as this, on some binutils versions: checkasm.S:38: Error: Macro `endconst' was already defined Signed-off-by: Martin Storsjö <martin@martin.st>	2016-07-17 12:21:19 +03:00
Janne Grunau	71a0472114	checkasm: arm: report the first clobbered register in checkasm_checked_call	2016-07-16 12:57:18 +02:00
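
Commit `e89cef4050` in the log above notes that reading a value larger than int with atoi() may give the wrong result. A minimal sketch of a safer pattern follows, assuming the value is a plain decimal unsigned number such as a seed. The helper name `parse_u32` is hypothetical and chosen for illustration; this is not the code from that commit.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not taken from checkasm): parse a decimal string
 * into an unsigned int. Returns 0 on success, -1 on any failure. */
static int parse_u32(const char *s, unsigned *out)
{
    char *end;
    unsigned long v;

    /* strtoul() accepts leading whitespace and a sign; reject anything
     * that does not start with a digit so "-1" cannot wrap around. */
    if (*s < '0' || *s > '9')
        return -1;

    errno = 0;
    v = strtoul(s, &end, 10);

    /* Fail on trailing garbage, on overflow of unsigned long, and on
     * values that do not fit in unsigned int. */
    if (*end != '\0' || errno == ERANGE || v > UINT_MAX)
        return -1;

    *out = (unsigned)v;
    return 0;
}
```

With a 32-bit `unsigned int`, `parse_u32("4000000000", &seed)` succeeds and stores the full value, whereas atoi() on the same string has undefined behaviour because the result does not fit in int.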


1997 Commits