FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-21 10:55:51 +02:00

Author	SHA1	Message	Date
Martin Storsjö	a76bf8cf12	arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling This work is sponsored by, and copyright, Google. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 752.0 459.2 862.2 553.9 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 226.5 145.0 225.1 171.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 721.2 415.7 727.6 475.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:31:52 +02:00
Martin Storsjö	388e0d2515	aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter No measured speedup on a Cortex A53, but other cores might benefit. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:50 +02:00
Martin Storsjö	fea92a4b57	arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter Before: Cortex A7 A8 A9 A53 vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5 After: vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:37 +02:00
Martin Storsjö	5e0c2158fb	aarch64: vp9mc: Simplify the extmla macro parameters Fold the field lengths into the macro. This makes the macro invocations much more readable, since the lines are shorter. This also makes it easier to use only half the registers within the macro. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:29 +02:00
Vittorio Giovara	53ea595eec	mov: Rework stsc index validation To avoid a potential integer overflow, change the comparison and make sure both elements use the same unsigned type.	2017-02-10 16:26:16 -05:00
Vittorio Giovara	ce6d72d107	imgutils: Document av_image_get_buffer_size()	2017-02-10 16:25:58 -05:00
Luca Barbato	b6093e8c72	hlsenc: Correctly write down all 16 bytes in hex Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-10 14:12:16 +01:00
Martin Storsjö	bc25897630	utvideodec: Add a missing include This was missing from `77c23704c7`; adding it fixes the build. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-10 09:31:49 +02:00
Timo Rothenpieler	a52976c0fe	nvenc: make gpu indices independent of supported capabilities Do not allocate a CUDA context for every available gpu. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-09 23:29:32 +01:00
Derek Buitenhuis	77c23704c7	avcodec: Mark some codecs with threadsafe init as such Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-09 23:28:18 +01:00
Martin Storsjö	0c0b87f12d	aarch64: vp9itxfm: Fix incorrect vertical alignment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:06 +02:00
Martin Storsjö	8476eb0d3a	aarch64: vp9itxfm: Update a comment to refer to a register with a different name Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:02 +02:00
Martin Storsjö	3dd7827258	aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:59 +02:00
Martin Storsjö	ed8d293306	aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible The ld1r is a leftover from the arm version, where this trick is beneficial on some cores. Use a single-lane load where we don't need the semantics of ld1r. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:54 +02:00
Martin Storsjö	4da4b2b87f	aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:50 +02:00
Martin Storsjö	3933b86bb9	arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:44 +02:00
Martin Storsjö	a63da4511d	aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 14740 bytes to 24292 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4 vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6 vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3 vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4 vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3 vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3 vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1 vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4 vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4 vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9 vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5 vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6 vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:03 +02:00
Martin Storsjö	5eb5aec475	arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 12388 bytes to 19784 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 212.0 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2102.1 1521.7 1736.2 1265.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 2104.5 1533.0 1736.6 1265.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2484.8 1828.7 2014.4 1506.5 vp9_inv_dct_dct_16x16_sub12_add_neon: 2851.2 2117.8 2294.8 1753.2 vp9_inv_dct_dct_16x16_sub16_add_neon: 3239.4 2408.3 2543.5 1994.9 vp9_inv_dct_dct_32x32_sub1_add_neon: 758.3 456.7 864.5 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10776.7 7949.8 8567.7 6819.7 vp9_inv_dct_dct_32x32_sub4_add_neon: 10865.6 8131.5 8589.6 6816.3 vp9_inv_dct_dct_32x32_sub8_add_neon: 12053.9 9271.3 9387.7 7564.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 13328.3 10463.2 10217.0 8321.3 vp9_inv_dct_dct_32x32_sub16_add_neon: 14176.4 11509.5 11018.7 9062.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 15301.5 12999.9 11855.1 9828.2 vp9_inv_dct_dct_32x32_sub24_add_neon: 16482.7 14931.5 12650.1 10575.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17589.5 15811.9 13482.8 11333.4 vp9_inv_dct_dct_32x32_sub32_add_neon: 18696.2 17049.2 14355.6 12089.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 1203.5 998.2 1035.3 763.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1203.5 998.1 1035.5 760.8 vp9_inv_dct_dct_16x16_sub8_add_neon: 1926.1 1610.6 1722.1 1271.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2873.2 2129.7 2285.1 1757.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 3221.4 2520.3 2557.6 2002.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 753.0 457.5 866.6 554.6 vp9_inv_dct_dct_32x32_sub2_add_neon: 7554.6 5652.4 6048.4 4920.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 7549.9 5685.0 6046.9 4925.7 vp9_inv_dct_dct_32x32_sub8_add_neon: 8336.9 6704.5 6604.0 5478.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 10914.0 9777.2 9240.4 7416.9 vp9_inv_dct_dct_32x32_sub16_add_neon: 11859.2 11223.3 9966.3 8095.1 vp9_inv_dct_dct_32x32_sub20_add_neon: 15237.1 13029.4 11838.3 9829.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 16293.2 14379.8 12644.9 10572.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17424.3 15734.7 13473.0 11326.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.3 17457.0 14298.6 12080.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:00 +02:00
Martin Storsjö	79d332ebbd	aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:56 +02:00
Martin Storsjö	47b3c2c18d	arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:53 +02:00
Martin Storsjö	115476018d	aarch64: vp9itxfm: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from 19496 to 14740 bytes. This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms. Before: vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub32_add_neon: 8095.7 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 1390.1 vp9_inv_dct_dct_32x32_sub4_add_neon: 5199.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8125.8 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:45 +02:00
Martin Storsjö	0331c3f5e8	arm: vp9itxfm: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from 15324 to 12388 bytes. This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub4_add_neon: 2063.4 1516.0 1719.5 1245.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 3279.3 2454.5 2525.2 1982.3 vp9_inv_dct_dct_32x32_sub4_add_neon: 10750.0 7955.4 8525.6 6754.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 18574.0 17108.4 14216.7 12010.2 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 2060.8 1608.5 1735.7 1262.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.2 2443.5 2546.1 1999.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 10682.0 8043.8 8581.3 6810.1 vp9_inv_dct_dct_32x32_sub32_add_neon: 18522.4 17277.4 14286.7 12087.9 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:40 +02:00
Diego Biurrun	c546147db0	configure: Correctly recurse in do_check_deps() Fixes all sorts of configuration problems introduced by `dad7a9c7c0` on non-Linux or non-vanilla configs. Also removes a line made redundant in that commit.	2017-02-08 21:23:41 +01:00
Martin Storsjö	57ec83e424	omx: Use the EOS flag to handle flushing at the end This avoids having to count the number of frames sent to the codec and the number of output packets received; instead just wait until the encoder returns a buffer with the EOS flag set. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-08 11:50:57 +02:00
Diego Biurrun	dad7a9c7c0	configure: Rework dependency handling for conflicting components This makes the feature more visible and obvious.	2017-02-07 19:06:04 +01:00
Diego Biurrun	9127ac5ebc	configure: Add name parameter to require_pkg_config() helper function This allows distinguishing between the internal variable name for external libraries and the pkg-config package name. Having both names available avoids special-casing outside the helper function when the two identifiers do not match.	2017-02-07 19:06:02 +01:00
Diego Biurrun	a25dac976a	Use bitstream_init8() where appropriate	2017-02-07 18:27:21 +01:00
Diego Biurrun	71a49fe25f	configure: Use cppflags check helper functions where appropriate	2017-02-06 15:43:56 +01:00
Diego Biurrun	0ce3761c78	configure: Add stdlib.h #include to CPPFLAGS check helper functions This ensures that added CPPFLAGS are validated against libc headers.	2017-02-06 15:43:56 +01:00
Alexandra Hájková	f7ec7f546f	wma: Convert to the new bitstream reader	2017-02-06 15:13:34 +01:00
Martin Storsjö	58d87e0f49	aarch64: vp9itxfm: Restructure the idct32 store macros This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. This is also arguably more readable. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-05 13:05:32 +02:00
Martin Storsjö	3bc5b28d5a	arm: vp9itxfm: Avoid .irp when it doesn't save any lines This makes it more readable. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-05 12:59:19 +02:00
John Stebbins	8e67039c63	asfdec: Use the ASF stream count when iterating The AVFormat stream count can be larger due to external factors, such as an appended ID3 tag. Avoid an out-of-bounds read. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-04 15:21:12 +01:00
Diego Biurrun	7abdd026df	asm: Consistently uppercase SECTION markers	2017-02-03 11:37:53 +01:00
Diego Biurrun	740b0bf03b	build: Ignore generated .version files	2017-02-03 11:37:53 +01:00
Martin Storsjö	15a92e0c40	rtmp: Correctly handle the Window Acknowledgement Size packets This swaps which field is set when the Window Acknowledgement Size and Set Peer BW packets are received, renames the fields to clarify their roles further, and adds verbose comments explaining their respective roles and to what extent the code currently does what it is supposed to. The Set Peer BW packet tells the receiver of the packet (which can be either client or server) that it should not send more data if it has already sent more than the specified number of bytes without receiving an acknowledgement for them. Actually checking this limit is not currently implemented. To be able to check it properly, one can send the Window Acknowledgement Size packet, which tells the receiver of the packet that it must send an Acknowledgement packet (RTMP_PT_BYTES_READ) no later than after receiving the given number of bytes since the last Acknowledgement. Therefore, a Window Acknowledgement Size packet that we receive sets the maximum number of bytes we can receive without sending an Acknowledgement, so when handling this packet we should set the receive_report_size field (previously client_report_size). Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-03 09:27:41 +02:00
Martin Storsjö	a1a143adb0	rtmp: Rename packet types to closer match the spec Also rename comments and log messages accordingly, and add clarifying comments for some hardcoded values. The previous names were taken from older, reverse-engineered references. The new names match the official public RTMP specification and the names used by Wirecast when annotating captured streams. They also avoid hardcoding the roles of server and client, since the handling of these packets is independent of whether we act as server or client. The RTMP_PT_PING type maps to RTMP_PT_USER_CONTROL. The SERVER_BW and CLIENT_BW types are a bit more intertwined; RTMP_PT_SERVER_BW maps to RTMP_PT_WINDOW_ACK_SIZE and RTMP_PT_CLIENT_BW maps to RTMP_PT_SET_PEER_BW. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-03 09:26:46 +02:00
Diego Biurrun	bcaedef118	configure: Add require_cpp_condition() convenience function Simplifies checking for conditions in external library headers and aborting if said conditions are not met.	2017-02-02 17:49:51 +01:00
Diego Biurrun	aba7fdcc8b	configure: Add require_header() convenience function Simplifies checking for external library headers and aborting if the external library support was requested, but is not available.	2017-02-02 17:49:51 +01:00
Diego Biurrun	a97563c889	configure: Simplify libxcb check	2017-02-02 17:38:50 +01:00
Alexandra Hájková	c29da01ac9	svq3: Convert to the new bitstream reader	2017-02-02 17:06:17 +01:00
Diego Biurrun	acfa7a2178	configure: Drop weak dependencies on external libraries for webm muxer Weak dependencies on external libraries do not obviate having to explicitly enable these libraries, so the weak dependency neither simplifies the configure command line nor has any real effect.	2017-02-02 14:35:44 +01:00
Diego Biurrun	6698832079	configure: Add proper weak dependency of drawtext filter on libfontconfig	2017-02-02 14:35:44 +01:00
Diego Biurrun	24d5680bbc	configure: Simplify inline asm check with appropriate helper function	2017-02-02 14:34:05 +01:00
Diego Biurrun	b3825723dc	configure: Merge compiler/libc/os hacks sections	2017-02-02 14:34:05 +01:00
wm4	577326d430	lavc: deprecate refcounted_frames field No deprecation guards, because the old decode API (for which this field is needed) doesn't have any either. This field should be removed together with the old decode calls. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-02-01 10:47:46 +01:00
wm4	3ad825793a	hwcontext_cuda: implement frames_get_constraints Copied and modified from hwcontext_qsv.c. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-02-01 10:43:00 +01:00
Anton Khirnov	fd9212f2ed	Mark some arrays that never change as const.	2017-02-01 10:42:59 +01:00
Anton Khirnov	b420a27e74	avconv: allow -b to be used with streamcopy In this mode it tells the muxer about the bitrate of the input stream.	2017-02-01 10:42:59 +01:00
Alexandra Hájková	ab2539bd37	ffv1: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
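Several vp9itxfm commits above (a76bf8cf12, a63da4511d, 5eb5aec475) single out the `sub1` case. As the benchmark names suggest, `sub1` is a block whose only nonzero coefficient is DC, so every output of the 2-D inverse DCT has the same value and the transform collapses into adding one constant to the block. The C sketch below illustrates that shortcut; it is not FFmpeg's code. `idct_dc_add` and its helpers are hypothetical names, and the constants (11585 as cos(pi/4) in Q14, a final rounding shift of 6 for 16x16 and 32x32) are assumptions modelled on the VP9 C reference that should be checked against the real decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants: cos(pi/4) scaled by 2^14, and the Q14 shift. */
#define COSPI_16_64    11585
#define DCT_CONST_BITS 14

/* Round-to-nearest right shift (bits > 0). */
static int64_t round_shift(int64_t x, int bits)
{
    return (x + ((int64_t)1 << (bits - 1))) >> bits;
}

static uint8_t clip_uint8(int64_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical DC-only inverse DCT + add for an n x n block of 8-bit
 * pixels. With only the DC coefficient nonzero, every output sample is
 * the same value `a`, so each pixel just gets `a` added, with clipping. */
static void idct_dc_add(uint8_t *dst, ptrdiff_t stride, int n, int16_t dc)
{
    assert(n == 16 || n == 32);

    /* One multiplication by cos(pi/4) per 1-D pass (rows, then columns). */
    int64_t out = round_shift((int64_t)dc * COSPI_16_64, DCT_CONST_BITS);
    out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);

    /* Final scaling, assumed to be a rounding shift by 6 for 16x16/32x32. */
    int64_t a = round_shift(out, 6);

    for (int y = 0; y < n; y++, dst += stride)
        for (int x = 0; x < n; x++)
            dst[x] = clip_uint8(dst[x] + a);
}
```

A full transform touches n*n coefficients in two passes, while this path does three multiplies and one add-and-clip sweep, which is why `sub1` is by far the cheapest entry in the benchmark lists.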
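Commit b6093e8c72 makes hlsenc write all 16 bytes of a value in hex. A 16-byte value needs 32 hex digits plus a terminating NUL, so a buffer or loop bound sized for fewer characters silently truncates it. The helper below, `bytes_to_hex()`, is a hypothetical name rather than an FFmpeg function; the log does not say which 16-byte field the commit concerns.

```c
#include <stddef.h>

/* Hypothetical helper: writes 2*len lowercase hex digits and a
 * terminating NUL into `out`, which must hold at least 2*len + 1 bytes.
 * For a 16-byte value that is 32 digits plus the NUL (33 bytes). */
static void bytes_to_hex(char *out, const unsigned char *in, size_t len)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0f]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```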
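Commit 57ec83e424 stops counting frames sent and packets received when flushing the OMX encoder, and instead waits for an output buffer carrying the EOS flag. The sketch below models the encoder's output as an array of hypothetical `OutBuf` records. A real flush would block waiting for each next buffer. `FLAG_EOS` is a stand-in for the OpenMAX end-of-stream buffer flag, and none of these names come from FFmpeg.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the OpenMAX end-of-stream buffer flag. */
#define FLAG_EOS 0x1u

/* Hypothetical output buffer as returned by the encoder. */
typedef struct OutBuf {
    unsigned flags;  /* buffer flags, possibly including FLAG_EOS */
    size_t   size;   /* payload size in bytes; 0 for an empty buffer */
} OutBuf;

/* Drain output buffers until one carries FLAG_EOS, without tracking how
 * many frames were sent. Returns the number of non-empty buffers emitted
 * as packets, or -1 if the queue ran out before EOS was seen. */
static int drain_until_eos(const OutBuf *queue, size_t n)
{
    int packets = 0;

    assert(queue != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (queue[i].size > 0)
            packets++;                 /* the EOS buffer may carry data too */
        if (queue[i].flags & FLAG_EOS)
            return packets;
    }
    return -1;
}
```

Relying on the flag avoids having the two counters drift out of sync, for example when the encoder does not produce exactly one packet per input frame.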
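Commit 8e67039c63 fixes an out-of-bounds read by iterating over the ASF stream count instead of the AVFormat stream count. The second count can be larger, for example when an ID3 tag is appended. A minimal sketch of the safe loop bound follows. `DemuxState`, `MAX_ASF_STREAMS` and `find_asf_stream()` are hypothetical and do not mirror asfdec's real structures.

```c
#include <assert.h>

#define MAX_ASF_STREAMS 128  /* illustrative capacity */

/* Hypothetical demuxer state with two different stream counts. */
typedef struct DemuxState {
    unsigned nb_container_streams;  /* may include non-ASF streams */
    unsigned nb_asf_streams;        /* entries actually filled below */
    int asf_stream_id[MAX_ASF_STREAMS];
} DemuxState;

/* Returns the slot holding `id`, or -1. The loop is bounded by
 * nb_asf_streams, never by nb_container_streams, so it only reads
 * entries that the ASF headers actually described. */
static int find_asf_stream(const DemuxState *s, int id)
{
    assert(s->nb_asf_streams <= MAX_ASF_STREAMS);
    for (unsigned i = 0; i < s->nb_asf_streams; i++)
        if (s->asf_stream_id[i] == id)
            return (int)i;
    return -1;
}
```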
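Commit 15a92e0c40 explains the acknowledgement window. After a Window Acknowledgement Size packet arrives, the receiver must send an Acknowledgement (RTMP_PT_BYTES_READ) once that many bytes have arrived since the previous Acknowledgement. The receive-side bookkeeping can be sketched as follows. `RTMPAckState` and its functions are hypothetical, as are the `bytes_read`/`last_bytes_read` counters. Only the field name `receive_report_size` is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical receiver state; only receive_report_size is named in the
 * commit message, the other fields are illustrative. */
typedef struct RTMPAckState {
    uint32_t receive_report_size; /* from the peer's Window Ack Size packet */
    uint64_t bytes_read;          /* total bytes received so far */
    uint64_t last_bytes_read;     /* bytes_read at the last Acknowledgement */
    unsigned acks_sent;           /* Acknowledgements we would have sent */
} RTMPAckState;

/* Handle an incoming Window Acknowledgement Size packet. */
static void handle_window_ack_size(RTMPAckState *st, uint32_t size)
{
    assert(size > 0);
    st->receive_report_size = size;
}

/* Account for n received bytes. Returns 1 if an Acknowledgement
 * (RTMP_PT_BYTES_READ) should be sent now, 0 otherwise. */
static int account_received(RTMPAckState *st, uint32_t n)
{
    st->bytes_read += n;
    if (st->bytes_read - st->last_bytes_read >= st->receive_report_size) {
        st->last_bytes_read = st->bytes_read;
        st->acks_sent++;
        return 1;
    }
    return 0;
}
```

The Set Peer BW limit on the sending side would need similar counters for bytes sent versus bytes acknowledged. The commit notes that this check is not implemented.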
