FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Martin Storsjö	0c0b87f12d	aarch64: vp9itxfm: Fix incorrect vertical alignment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:06 +02:00
Martin Storsjö	8476eb0d3a	aarch64: vp9itxfm: Update a comment to refer to a register with a different name Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:02 +02:00
Martin Storsjö	3dd7827258	aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:59 +02:00
Martin Storsjö	ed8d293306	aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible The ld1r is a leftover from the arm version, where this trick is beneficial on some cores. Use a single-lane load where we don't need the semantics of ld1r. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:54 +02:00
Martin Storsjö	4da4b2b87f	aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:50 +02:00
Martin Storsjö	3933b86bb9	arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:44 +02:00
Martin Storsjö	a63da4511d	aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32
This work is sponsored by, and copyright, Google.
This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read.
This gives a pretty substantial speedup for the smaller subpartitions.
The code size increases from 14740 bytes to 24292 bytes.
The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy.
Before:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2
Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:03 +02:00
Martin Storsjö	5eb5aec475	arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.
This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read.
This gives a pretty substantial speedup for the smaller subpartitions.
The code size increases from 12388 bytes to 19784 bytes.
The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy.
Before:                               Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:      273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:    2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:    3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:      758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:    10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:    12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:   13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:   14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:   16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:   18696.2  17049.2  14355.6  12089.7
After:
vp9_inv_dct_dct_16x16_sub1_add_neon:      273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:     1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:     1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:    3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:     7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:     7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:     8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:   10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:   11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:   15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.3  17457.0  14298.6  12080.0
Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:00 +02:00
Martin Storsjö	79d332ebbd	aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:56 +02:00
Martin Storsjö	47b3c2c18d	arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:53 +02:00
Martin Storsjö	115476018d	aarch64: vp9itxfm: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from 19496 to 14740 bytes.
This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms.
Before:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7
After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8
Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:45 +02:00
Martin Storsjö	0331c3f5e8	arm: vp9itxfm: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from 15324 to 12388 bytes.
This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms.
Before:                               Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_neon:     2063.4   1516.0   1719.5   1245.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3279.3   2454.5   2525.2   1982.3
vp9_inv_dct_dct_32x32_sub4_add_neon:    10750.0   7955.4   8525.6   6754.2
vp9_inv_dct_dct_32x32_sub32_add_neon:   18574.0  17108.4  14216.7  12010.2
After:
vp9_inv_dct_dct_16x16_sub4_add_neon:     2060.8   1608.5   1735.7   1262.0
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.2   2443.5   2546.1   1999.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10682.0   8043.8   8581.3   6810.1
vp9_inv_dct_dct_32x32_sub32_add_neon:   18522.4  17277.4  14286.7  12087.9
Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:40 +02:00
Martin Storsjö	57ec83e424	omx: Use the EOS flag to handle flushing at the end This avoids having to count the number of frames sent to the codec and the number of output packets received; instead just wait until the encoder returns a buffer with the EOS flag set. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-08 11:50:57 +02:00
Diego Biurrun	a25dac976a	Use bitstream_init8() where appropriate	2017-02-07 18:27:21 +01:00
Alexandra Hájková	f7ec7f546f	wma: Convert to the new bitstream reader	2017-02-06 15:13:34 +01:00
Martin Storsjö	58d87e0f49	aarch64: vp9itxfm: Restructure the idct32 store macros This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. This is also arguably more readable. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-05 13:05:32 +02:00
Martin Storsjö	3bc5b28d5a	arm: vp9itxfm: Avoid .irp when it doesn't save any lines This makes it more readable. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-05 12:59:19 +02:00
Diego Biurrun	7abdd026df	asm: Consistently uppercase SECTION markers	2017-02-03 11:37:53 +01:00
Alexandra Hájková	c29da01ac9	svq3: Convert to the new bitstream reader	2017-02-02 17:06:17 +01:00
wm4	577326d430	lavc: deprecate refcounted_frames field No deprecation guards, because the old decode API (for which this field is needed) doesn't have any either. This field should be removed together with the old decode calls. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-02-01 10:47:46 +01:00
Anton Khirnov	fd9212f2ed	Mark some arrays that never change as const.	2017-02-01 10:42:59 +01:00
Alexandra Hájková	ab2539bd37	ffv1: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	2d72219554	h261dec: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	2b94ed12de	shorten: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	5a6da49dd0	ralf: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	d85b37a955	loco: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	0f94de8a09	fic: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	6b1f559f9a	dirac: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	ffc00df0a6	cavs: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	0c89ff82e9	aic: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Diego Biurrun	d4c2103bd3	golomb: Convert to the new bitstream reader	2017-01-31 17:46:19 +01:00
Andreas Cadhalpun	612cc07128	pgssubdec: reset rle_data_len/rle_remaining_len on allocation error The code relies on their validity and otherwise can try to access a NULL object->rle pointer, causing segmentation faults. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-01-31 09:35:54 +01:00
Mark Thompson	ca62236a89	vaapi_encode: Add VP8 support	2017-01-30 23:03:46 +00:00
Mark Thompson	ff35aa8ca4	vaapi_encode: Pass framerate parameters to driver Only do this when building for a recent VAAPI version - initial driver implementations were confused about the interpretation of the framerate field, but hopefully this will be consistent everywhere once 0.40.0 is released.	2017-01-30 22:52:54 +00:00
Mark Thompson	eddfb57210	vaapi_h264: Enable VBR mode Default to using VBR when a target bitrate is set, unless the max rate is also set and matches the target. Changes to the Intel driver mean that min_qp is also respected in this case, so set a codec default to unset the value rather than using the current default inherited from the MPEG-4 part 2 encoder.	2017-01-30 22:52:54 +00:00
Mark Thompson	f033ba470f	vaapi_encode: Support VBR mode This includes a backward-compatibility hack to choose CBR anyway on old drivers which have no CBR support, so that existing programs will continue to work their options now map to VBR.	2017-01-30 22:52:54 +00:00
Mark Thompson	ca6ae3b77a	vaapi_encode: Add MPEG-2 support	2017-01-29 13:28:31 +00:00
Alexandra Hájková	381a4e31a6	tak: Convert to the new bitstream reader	2017-01-25 11:06:58 +01:00
Diego Biurrun	2e0e150144	magicyuv: Convert to the new bitstream reader	2017-01-25 10:38:43 +01:00
Diego Biurrun	b061f298f7	truemotion2rt: Convert to the new bitstream reader	2017-01-25 09:55:36 +01:00
Alexandra Hájková	e7f24c9ffc	wavpack: Convert to the new bitstream reader	2017-01-25 09:55:35 +01:00
Alexandra Hájková	6668bc80b5	mpc: Convert to the new bitstream reader	2017-01-25 09:55:33 +01:00
Alexandra Hájková	fd8de7f2d8	dxtory: Convert to the new bitstream reader	2017-01-20 10:18:32 +01:00
Alexandra Hájková	4d49a4c550	apedec: Convert to the new bitstream reader	2017-01-20 10:18:32 +01:00
Anton Khirnov	b4a911c189	mpegvideoenc: make a table const	2017-01-19 09:52:21 +01:00
Anton Khirnov	296eff4d9d	zmbvenc: get rid of a global table	2017-01-19 09:52:10 +01:00
Derek Buitenhuis	00b775dda2	hevc: Mark as having threadsafe init Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-01-19 09:51:15 +01:00
Alexandra Hájková	54dcd22885	als: Convert to the new bitstream reader	2017-01-17 09:52:11 +01:00
Luca Barbato	fb59f87ce7	nvenc: Explicitly push the cuda context on encoding Make sure that NVENC does not misbehave if other cuda usages happen in the application.	2017-01-17 07:37:12 +01:00
Alexandra Hájková	4795e4f61f	alac: Convert to the new bitstream reader	2017-01-13 10:27:03 +01:00


21412 Commits