FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-08 13:22:53 +02:00

Author	SHA1	Message	Date
Alexandra Hájková	178b4ea5f9	xsubdec: Convert to the new bitstream reader	2016-11-24 11:22:13 +01:00
Alexandra Hájková	be35ef92a4	xan: Convert to the new bitstream reader	2016-11-24 11:22:13 +01:00
Alexandra Hájková	f9c59f26c8	wnv1: Convert to the new bitstream reader	2016-11-24 11:22:13 +01:00
Alexandra Hájková	0536e7d782	vima: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	e5bdfc6790	vble: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	104a4289f9	utvideodec: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	85f760fedd	twinvq: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	0bea79afa6	tscc2: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	8e4cadea5d	truespeech: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	0ac07d0b8d	tiertex: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	9ab1a3e283	truemotion2: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	9f78e3a46d	svq1dec: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	6efbc88a5c	smacker: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	087bc8d704	sipr: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	f26cbb555b	rtjpeg: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	c60cda7cb4	ra288: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	7d8075cf47	ra144: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Martin Storsjö	79566ec8c7	arm: vp9itxfm: Rename a macro parameter to fit better Since the same parameter is used for both input and output, the name inout is more fitting. This matches the naming used below in the dmbutterfly macro. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-23 23:56:56 +02:00
Martin Storsjö	721bc37522	arm/aarch64: vp9itxfm: Fix indentation of macro arguments Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-23 23:56:16 +02:00
James Almer	aa498c3183	avpacket: fix leak on realloc in av_packet_add_side_data() If realloc fails, the pointer is overwritten and the previously allocated buffer is leaked, which goes against the expected functionality of keeping the packet unchanged in case of error. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-23 13:17:52 +01:00
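The leak described in the commit above comes from assigning the result of realloc() directly to the only pointer that owns the buffer. A minimal sketch of the safe pattern follows; the `SideDataList` type and `side_data_append()` are hypothetical names for illustration, not the actual AVPacket layout or the code in av_packet_add_side_data().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container used only for this sketch. */
typedef struct SideDataList {
    unsigned char *buf;
    size_t         size;
} SideDataList;

/* Append len bytes from src. Returns 0 on success, -1 on failure.
 * On failure the list is left exactly as it was. */
static int side_data_append(SideDataList *list, const void *src, size_t len)
{
    unsigned char *tmp;

    assert(list && (src || len == 0));
    if (len == 0)
        return 0;
    if (len > (size_t)-1 - list->size)   /* size computation would overflow */
        return -1;

    /* Keep the old pointer until realloc() is known to have succeeded. */
    tmp = realloc(list->buf, list->size + len);
    if (!tmp)
        return -1;                       /* list->buf is still valid and owned */

    memcpy(tmp + list->size, src, len);
    list->buf   = tmp;
    list->size += len;
    return 0;
}
```

Writing `list->buf = realloc(list->buf, ...)` instead would overwrite the only reference to the old buffer with NULL on failure, which is the leak the commit message describes.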
Andreas Cadhalpun	f92d7bdfdd	libopusdec: default to stereo for invalid number of channels This fixes an out-of-bounds read if avc->channels is 0. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-23 13:03:15 +01:00
Diego Biurrun	b34c6cd57a	dvbsub: cosmetics: Group all debug code together	2016-11-23 07:40:46 +01:00
Diego Biurrun	b8cd7a3c8d	dvbsub: Check for errors from system() libavcodec/dvbsubdec.c:145:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result] libavcodec/dvbsubdec.c:148:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]	2016-11-23 07:36:32 +01:00
Diego Biurrun	6427379f23	als: Restructure DEBUG ifdefs to avoid unused function parameter warnings	2016-11-22 17:28:17 +01:00
Diego Biurrun	367f95af55	ac3enc: Restructure DEBUG ifdefs to avoid unused function parameter warnings	2016-11-22 17:28:17 +01:00
Diego Biurrun	81a3c42abe	Drop some bogus Doxygen documentation.	2016-11-21 14:29:11 +01:00
Diego Biurrun	a1d9de304f	Fix some mismatches between function parameter and doxygen parameter names.	2016-11-21 14:29:10 +01:00
Martin Storsjö	4d960a1185	aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter The clobbering tests in checkasm are only invoked when testing correctness, so this bug didn't show up when benchmarking the dc-only version. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-18 23:17:33 +02:00
Janne Grunau	e5b0fc170f	arm: vp9itxfm: Simplify the stack alignment code This is one instruction less for thumb, and only have got 1/2 arm/thumb specific instructions. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-18 23:17:26 +02:00
Alexandra Hájková	0b5a26e8bc	qdm2: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:22 +01:00
Alexandra Hájková	0dabd329e8	qcelp: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:18 +01:00
Alexandra Hájková	770406d1e8	pcx: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:14 +01:00
Alexandra Hájková	b3441350fa	opus: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:11 +01:00
Alexandra Hájková	6f94a64bd6	nellymoser: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:08 +01:00
Alexandra Hájková	15d4dbfd4a	jvdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:04 +01:00
Alexandra Hájková	1df549bfa2	hqx: Convert to the new bitstream header Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:43 +01:00
Alexandra Hájková	c5e01d9170	hq_hqa: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:39 +01:00
Alexandra Hájková	b2c56301f9	gsm: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:36 +01:00
Alexandra Hájková	2188d53906	g72x: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:33 +01:00
Alexandra Hájková	799703c3ea	g2meet: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:26 +01:00
Alexandra Hájková	b37b681f77	fraps: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:14 +01:00
Alexandra Hájková	692ba4fe64	flashsv: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:10 +01:00
Alexandra Hájková	418ccdd703	faxcompr: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:07 +01:00
Alexandra Hájková	8df1ac6b78	exr: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:04 +01:00
Alexandra Hájková	2906d8dcb3	escape130: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:01 +01:00
Alexandra Hájková	c43eb73172	escape124: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:57 +01:00
Alexandra Hájková	d8618570be	dvdsubdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:53 +01:00
Alexandra Hájková	928f8c7ce3	dss_sp: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:38 +01:00
Alexandra Hájková	942e84d2a3	cook: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:32 +01:00
Alexandra Hájková	e561146611	cljrdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:29 +01:00
Alexandra Hájková	b4c0daa83c	cdxl: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:24 +01:00
Alexandra Hájková	0977a7c2f6	binkaudio: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:15 +01:00
Alexandra Hájková	9a23b59943	bink: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:10 +01:00
Alexandra Hájková	dae9b0b9c6	avs: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:04 +01:00
Alexandra Hájková	edd4c19a78	atrac3plus: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:59 +01:00
Alexandra Hájková	0272119202	atrac: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:50 +01:00
Alexandra Hájková	41679be1a2	asvdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:45 +01:00
Alexandra Hájková	012c451153	adpcm: Convert to the new bitstream header Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:01 +01:00
Alexandra Hájková	ed006ae4e2	4xm: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:57 +01:00
Alexandra Hájková	b25180801b	on2avc: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:54 +01:00
Alexandra Hájková	7d957b3f47	ea: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:45 +01:00
Alexandra Hájková	adb1ebb36c	eamad: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:40 +01:00
Alexandra Hájková	d182d8a6d3	cllc: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:31:59 +01:00
Alexandra Hájková	dd3d7ddf2a	lavc: add a new bitstream reader to replace get_bits The new bit reader features a simpler API and an implementation without stacks of nested macros. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:31:56 +01:00
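To illustrate what a macro-free bit reader can look like, here is a toy MSB-first reader. It is only a sketch: `ToyBitReader`, `toy_init()` and `toy_read()` are invented for this example and are not the API added by the commit above, and a real implementation would use a cached word rather than a bit-by-bit loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: a byte buffer and a position counted in bits. */
typedef struct ToyBitReader {
    const uint8_t *buf;
    size_t         bits_total;
    size_t         pos;
} ToyBitReader;

static void toy_init(ToyBitReader *br, const uint8_t *buf, size_t size_bytes)
{
    br->buf        = buf;
    br->bits_total = size_bytes * 8;
    br->pos        = 0;
}

/* Read n bits (0..32), most significant bit first.
 * Bits requested past the end of the buffer read as 0. */
static uint32_t toy_read(ToyBitReader *br, unsigned n)
{
    uint32_t v = 0;

    assert(n <= 32);
    while (n--) {
        unsigned bit = 0;
        if (br->pos < br->bits_total) {
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
            br->pos++;
        }
        v = (v << 1) | bit;
    }
    return v;
}
```

Everything is an ordinary function operating on an explicit context, which is the kind of interface the commit message contrasts with stacks of nested macros.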
Luca Barbato	adb0e941c3	avpacket: Mark src pointer as constant	2016-11-17 19:41:12 +01:00
Diego Biurrun	76167140a9	qsvdec: Drop stray extra braces around initializer libavcodec/qsvdec.c:93:5: warning: braces around scalar initializer	2016-11-17 16:53:48 +01:00
Diego Biurrun	715b824346	qsv: Drop some unused variables	2016-11-17 16:53:48 +01:00
Janne Grunau	e7ae8f7a71	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne}' The latter is 1 cycle faster on a cortex-a53 and since the operands are bytewise (or larger) bitmask (impossible to overflow to zero) both are equivalent.	2016-11-16 09:05:18 +01:00
Janne Grunau	d7595de0b2	aarch64: vp9: use alternative returns in the core loop filter function Since aarch64 has enough free general purpose registers use them to branch to the appropriate storage code. 1-2 cycles faster for the functions using loop_filter 8/16, ... on a cortex-a53. Mixed results (up to 2 cycles faster/slower) on a cortex-a57.	2016-11-16 09:05:18 +01:00
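The equivalence claimed in the commit above can be checked by brute force. The sketch below assumes 64-bit operands in which every byte is either 0x00 or 0xFF, and models the two instruction sequences as a zero test on `a | b` and on the wrapping sum `a + b`; all function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an 8-bit selector into a bytewise mask: bit i set -> byte i = 0xFF. */
static uint64_t byte_mask(unsigned sel)
{
    uint64_t m = 0;

    assert(sel < 256);
    for (int i = 0; i < 8; i++)
        if (sel & (1u << i))
            m |= (uint64_t)0xFF << (8 * i);
    return m;
}

/* Zero test on the OR of both operands (models 'orr' followed by 'cbz'). */
static int zero_by_or(uint64_t a, uint64_t b)
{
    return (a | b) == 0;
}

/* Zero test on the wrapping sum (models 'adds' followed by 'b.eq'). */
static int zero_by_add(uint64_t a, uint64_t b)
{
    return (uint64_t)(a + b) == 0;
}

/* Count disagreements over all 256 * 256 pairs of bytewise masks. */
static int count_mismatches(void)
{
    int bad = 0;

    for (unsigned i = 0; i < 256; i++)
        for (unsigned j = 0; j < 256; j++) {
            uint64_t a = byte_mask(i), b = byte_mask(j);
            if (zero_by_or(a, b) != zero_by_add(a, b))
                bad++;
        }
    return bad;
}
```

For bytewise masks the lowest byte of the sum is 0x00 only when both lowest bytes are 0x00, in which case no carry is produced, and the same argument repeats for each higher byte. For arbitrary values the two tests differ, for example with `a = 1` and `b = UINT64_MAX`.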
Gianluigi Tiesi	e17567a831	libilbc: support for latest git of libilbc In the latest git commits of libilbc developers removed WebRtc_xxx typedefs. This commit uses int types instead. It's safe to apply also for previous versions since WebRtc_Word16 was always a typedef of int16_t and WebRtc_UWord16 a typedef of uint16_t. Reviewed-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-11-16 08:21:05 +01:00
Diego Biurrun	f7407f56cb	golomb: Replace __PRETTY_FUNCTION__ with __func__ for tracing The former is a GNU extension while the latter is C99.	2016-11-15 09:41:08 +01:00
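As a small illustration of the portable form, `__func__` is a predefined identifier in C99 that holds the name of the enclosing function. The tracing helper and macro below are hypothetical and only show the mechanism; they are not the tracing code in golomb.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "<function>: <value>" into buf. */
static int trace_value(char *buf, size_t size, const char *func, int value)
{
    assert(buf && func);
    return snprintf(buf, size, "%s: %d", func, value);
}

/* __func__ expands in the function where the macro is used. */
#define TRACE(buf, size, value) trace_value(buf, size, __func__, value)

static void read_code(char *buf, size_t size)
{
    TRACE(buf, size, 42);
}
```

Calling `read_code()` fills the buffer with `read_code: 42` on any C99 compiler, with no dependence on GNU extensions.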
Mark Thompson	e0b164576f	qsv: Add VP8 decoder	2016-11-14 19:38:20 +00:00
Mark Thompson	182cf170a5	vp8: Return stream format information from parser	2016-11-14 19:38:19 +00:00
Mark Thompson	b6582b2927	qsv: Add VC-1 decoder It uses the same code as the MPEG-2 decoder, so the file is renamed to contain all "other" (that is, not H.26[45]) codecs.	2016-11-14 19:38:19 +00:00
Mark Thompson	fea4dc05b4	vc1: Return stream format information from parser	2016-11-14 19:38:19 +00:00
Mark Thompson	0940b748bd	qsvdec: Only warn about unconsumed data if it happens more than once	2016-11-14 19:38:19 +00:00
Mark Thompson	030d84fa2e	qsvdec: Pass field order information to libmfx The VC-1 decoder fails to initialise if this is not set.	2016-11-14 19:38:19 +00:00
Mark Thompson	cd1047f391	qsvdec: Pass the correct profile to libmfx This was correct for H.26[45], because libmfx uses the same values derived from profile_idc and the constraint_set flags, but it is wrong for other codecs. Also avoid passing FF_LEVEL_UNKNOWN (-99) as the level, as this is certainly invalid.	2016-11-14 19:38:19 +00:00
Mark Thompson	3297577f3e	mpegvideo: Return correct coded frame sizes from parser	2016-11-14 19:38:19 +00:00
Janne Grunau	31756abe29	aarch64: vp9: loop_filter: fix typo in skip flatout8 check The 16_16 loop filter functions could miss an early exit before flatout8. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 08:51:58 +02:00
Martin Storsjö	9d2afd1eb8	aarch64: vp9: Implement NEON loop filters
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; thanks to the larger amount of registers available, we can do the loop filters with 16 pixels at a time. The implementation is fully templated, with a single macro which can generate versions for both 8 and 16 pixels wide, for both 4, 8 and 16 pixels loop filters (and the 4/8 mixed versions as well).
For the 8 pixel wide versions, it is pretty close in speed (the v_4_8 and v_8_8 filters are the best examples of this; the h_4_8 and h_8_8 filters seem to get some gain in the load/transpose/store part). For the 16 pixels wide ones, we get a speedup of around 1.2-1.4x compared to the 32 bit version.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                        ARM  AArch64
vp9_loop_filter_h_4_8_neon:           144.0    127.2
vp9_loop_filter_h_8_8_neon:           207.0    182.5
vp9_loop_filter_h_16_8_neon:          415.0    328.7
vp9_loop_filter_h_16_16_neon:         672.0    558.6
vp9_loop_filter_mix2_h_44_16_neon:    302.0    203.5
vp9_loop_filter_mix2_h_48_16_neon:    365.0    305.2
vp9_loop_filter_mix2_h_84_16_neon:    365.0    305.2
vp9_loop_filter_mix2_h_88_16_neon:    376.0    305.2
vp9_loop_filter_mix2_v_44_16_neon:    193.2    128.2
vp9_loop_filter_mix2_v_48_16_neon:    246.7    218.4
vp9_loop_filter_mix2_v_84_16_neon:    248.0    218.5
vp9_loop_filter_mix2_v_88_16_neon:    302.0    218.2
vp9_loop_filter_v_4_8_neon:            89.0     88.7
vp9_loop_filter_v_8_8_neon:           141.0    137.7
vp9_loop_filter_v_16_8_neon:          295.0    272.7
vp9_loop_filter_v_16_16_neon:         546.0    453.7
The speedup vs C code in checkasm tests is around 2-7x, which is pretty much the same as for the 32 bit version. Even if these functions are faster than their 32 bit equivalent, the C version that we compare to also became around 1.3-1.7x faster than the C version in 32 bit.
Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-5x.
Examples of runtimes vs C on a Cortex A57 (for a slightly older version of the patch):
                                 A57 gcc-5.3    neon
loop_filter_h_4_8_neon:                256.6    93.4
loop_filter_h_8_8_neon:                307.3   139.1
loop_filter_h_16_8_neon:               340.1   254.1
loop_filter_h_16_16_neon:              827.0   407.9
loop_filter_mix2_h_44_16_neon:         524.5   155.4
loop_filter_mix2_h_48_16_neon:         644.5   173.3
loop_filter_mix2_h_84_16_neon:         630.5   222.0
loop_filter_mix2_h_88_16_neon:         697.3   222.0
loop_filter_mix2_v_44_16_neon:         598.5   100.6
loop_filter_mix2_v_48_16_neon:         651.5   127.0
loop_filter_mix2_v_84_16_neon:         591.5   167.1
loop_filter_mix2_v_88_16_neon:         855.1   166.7
loop_filter_v_4_8_neon:                271.7    65.3
loop_filter_v_8_8_neon:                312.5   106.9
loop_filter_v_16_8_neon:               473.3   206.5
loop_filter_v_16_16_neon:              976.1   327.8
The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57 is again 30-50% faster than the cortex-a53.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Martin Storsjö	52d196fb30	arm: vp9itxfm: Simplify txfm string comparisons Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Martin Storsjö	3c9546dfaf	aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; thanks to the larger amount of registers available, we can do the 16x16 and 32x32 transforms in slices 8 pixels wide instead of 4. This gives a speedup of around 1.4x compared to the 32 bit version.
The fact that aarch64 doesn't have the same d/q register aliasing makes some of the macros quite a bit simpler as well.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                        ARM  AArch64
vp9_inv_adst_adst_4x4_add_neon:        90.0     87.7
vp9_inv_adst_adst_8x8_add_neon:       400.0    354.7
vp9_inv_adst_adst_16x16_add_neon:    2526.5   1827.2
vp9_inv_dct_dct_4x4_add_neon:          74.0     72.7
vp9_inv_dct_dct_8x8_add_neon:         271.0    256.7
vp9_inv_dct_dct_16x16_add_neon:      1960.7   1372.7
vp9_inv_dct_dct_32x32_add_neon:     11988.9   8088.3
vp9_inv_wht_wht_4x4_add_neon:          63.0     57.7
The speedup vs C code (2-4x) is smaller than in the 32 bit case, mostly because the C code ends up significantly faster (around 1.6x faster, with GCC 5.4) when built for aarch64.
Examples of runtimes vs C on a Cortex A57 (for a slightly older version of the patch):
                                   A57 gcc-5.3     neon
vp9_inv_adst_adst_4x4_add_neon:          152.2     60.0
vp9_inv_adst_adst_8x8_add_neon:          948.2    288.0
vp9_inv_adst_adst_16x16_add_neon:       4830.4   1380.5
vp9_inv_dct_dct_4x4_add_neon:            153.0     58.6
vp9_inv_dct_dct_8x8_add_neon:            789.2    180.2
vp9_inv_dct_dct_16x16_add_neon:         3639.6    917.1
vp9_inv_dct_dct_32x32_add_neon:        20462.1   4985.0
vp9_inv_wht_wht_4x4_add_neon:             91.0     49.8
The asm is around factor 3-4 faster than C on the cortex-a57 and the asm is around 30-50% faster on the a57 compared to the a53.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Diego Biurrun	800d91d348	Drop pointless void* casts	2016-11-13 18:44:01 +01:00
Diego Biurrun	d316f9cefc	aac: Drop pointless cast	2016-11-13 18:44:00 +01:00
Diego Biurrun	3b50dbc51f	ratecontrol: Use correct function pointer casts instead of void* libavcodec/ratecontrol.c:120:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic] libavcodec/ratecontrol.c:121:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]	2016-11-12 16:47:06 +01:00
Martin Storsjö	dd299a2d6d	arm: vp9: Add NEON loop filters
This work is sponsored by, and copyright, Google.
The implementation tries to have smart handling of cases where no pixels need the full filtering for the 8/16 width filters, skipping both calculation and writeback of the unmodified pixels in those cases. The actual effect of this is hard to test with checkasm though, since it tests the full filtering, and the benefit depends on how many filtered blocks use the shortcut.
Examples of relative speedup compared to the C version, from checkasm:
Cortex                                 A7     A8     A9    A53
vp9_loop_filter_h_4_8_neon:           2.72   2.68   1.78   3.15
vp9_loop_filter_h_8_8_neon:           2.36   2.38   1.70   2.91
vp9_loop_filter_h_16_8_neon:          1.80   1.89   1.45   2.01
vp9_loop_filter_h_16_16_neon:         2.81   2.78   2.18   3.16
vp9_loop_filter_mix2_h_44_16_neon:    2.65   2.67   1.93   3.05
vp9_loop_filter_mix2_h_48_16_neon:    2.46   2.38   1.81   2.85
vp9_loop_filter_mix2_h_84_16_neon:    2.50   2.41   1.73   2.85
vp9_loop_filter_mix2_h_88_16_neon:    2.77   2.66   1.96   3.23
vp9_loop_filter_mix2_v_44_16_neon:    4.28   4.46   3.22   5.70
vp9_loop_filter_mix2_v_48_16_neon:    3.92   4.00   3.03   5.19
vp9_loop_filter_mix2_v_84_16_neon:    3.97   4.31   2.98   5.33
vp9_loop_filter_mix2_v_88_16_neon:    3.91   4.19   3.06   5.18
vp9_loop_filter_v_4_8_neon:           4.53   4.47   3.31   6.05
vp9_loop_filter_v_8_8_neon:           3.58   3.99   2.92   5.17
vp9_loop_filter_v_16_8_neon:          3.40   3.50   2.81   4.68
vp9_loop_filter_v_16_16_neon:         4.66   4.41   3.74   6.02
The speedup vs C code is around 2-6x. The numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). Disabling the early-exit in the asm doesn't give a fair comparison either though, since the C code only does the necessary calculations for each row.
Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-9x.
This is pretty similar in runtime to the corresponding routines in libvpx. (This is comparing vpx_lpf_vertical_16_neon, vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon and vp9_loop_filter_v_16_16_neon - note that the naming of horizontal and vertical is flipped between the libraries.)
In order to have stable, comparable numbers, the early exits in both asm versions were disabled, forcing the full filtering codepath.
Cortex                                        A7      A8      A9     A53
vp9_loop_filter_h_16_8_neon:               597.2   472.0   482.4   415.0
libvpx vpx_lpf_vertical_16_neon:           626.0   464.5   470.7   445.0
vp9_loop_filter_v_16_8_neon:               500.2   422.5   429.7   295.0
libvpx vpx_lpf_horizontal_edge_8_neon:     586.5   414.5   415.6   383.2
vp9_loop_filter_v_16_16_neon:              905.0   784.7   791.5   546.0
libvpx vpx_lpf_horizontal_edge_16_neon:   1060.2   751.7   743.5   685.2
Our version is consistently faster on A7 and A53, marginally slower on A8, and sometimes faster, sometimes slower on A9 (marginally slower in all three tests in this particular test run).
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 14:16:42 +02:00
Diego Biurrun	f7d183f084	libxvid: Check return value of write() call libavcodec/libxvid_rc.c:106:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]	2016-11-11 10:17:07 +01:00
Diego Biurrun	e5e8a26dcf	libxvid: Use proper context in av_log() calls	2016-11-11 10:17:07 +01:00
Diego Biurrun	12db2832e4	libxvid: Require availability of mkstemp() The replacement code uses tempnam(), which is dangerous. Such a fringe feature is not worth the trouble.	2016-11-11 10:17:07 +01:00
Martin Storsjö	a67ae67083	arm: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.
For the transforms up to 8x8, we can fit all the data (including temporaries) in registers and just do a straightforward transform of all the data. For 16x16, we do a transform of 4x16 pixels in 4 slices, using a temporary buffer. For 32x32, we transform 4x32 pixels at a time, in two steps of 4x16 pixels each.
Examples of relative speedup compared to the C version, from checkasm:
Cortex                                A7     A8     A9    A53
vp9_inv_adst_adst_4x4_add_neon:      3.39   5.83   4.17   4.01
vp9_inv_adst_adst_8x8_add_neon:      3.79   4.86   4.23   3.98
vp9_inv_adst_adst_16x16_add_neon:    3.33   4.36   4.11   4.16
vp9_inv_dct_dct_4x4_add_neon:        4.06   6.16   4.59   4.46
vp9_inv_dct_dct_8x8_add_neon:        4.61   6.01   4.98   4.86
vp9_inv_dct_dct_16x16_add_neon:      3.35   3.44   3.36   3.79
vp9_inv_dct_dct_32x32_add_neon:      3.89   3.50   3.79   4.42
vp9_inv_wht_wht_4x4_add_neon:        3.22   5.13   3.53   3.77
Thus, the speedup vs C code is around 3-6x.
This is mostly marginally faster than the corresponding routines in libvpx on most cores, tested with their 32x32 idct (compared to vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's favour since their version doesn't clear the input buffer like ours do (although the effect of that on the total runtime probably is negligible.)
Cortex                                    A7       A8       A9      A53
vp9_inv_dct_dct_32x32_add_neon:      18436.8  16874.1  14235.1  11988.9
libvpx vpx_idct32x32_1024_add_neon   20789.0  13344.3  15049.9  13030.5
Only on the Cortex A8, the libvpx function is faster. On the other cores, ours is slightly faster even though ours has got source block clearing integrated.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 11:09:05 +02:00
Mark Thompson	fd0fae6037	pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts When decoding with threads enabled, the get_format callback will be called with one of the per-thread codec contexts rather than with the outer context. If a hwaccel is in use too, this will add a reference to the hardware frames context on that codec context, which will then propagate to all of the other per-thread contexts for decoding. Once the decoder finishes, however, the per-thread contexts are not freed normally, so these references leak.	2016-11-10 20:36:11 +00:00
Martin Storsjö	11623217e3	arm: vp9mc: Use a different helper register for PIC loads This fixes crashes since `557c1675cf` in linux PIC builds. Previously, movrelx silently used r12 as helper register, which doesn't work when r12 is the destination register. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 14:01:04 +02:00
Martin Storsjö	6a62795d40	aarch64: h264idct: Use the offset parameter to movrel Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	557c1675cf	arm: vp9mc: Minor adjustments from review of the aarch64 version
This work is sponsored by, and copyright, Google.
The speedup for the large horizontal filters is surprisingly big on A7 and A53, while there's a minor slowdown (almost within measurement noise) on A8 and A9.
Cortex                                     A7       A8       A9      A53
orig: vp9_put_8tap_smooth_64h_neon:   20270.0  14447.3  19723.9  10910.9
new:  vp9_put_8tap_smooth_64h_neon:   20165.8  14466.5  19730.2  10668.8
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	383d96aa22	aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                    ARM  AArch64
vp9_avg4_neon:                     27.2     23.7
vp9_avg8_neon:                     56.5     54.7
vp9_avg16_neon:                   169.9    167.4
vp9_avg32_neon:                   585.8    585.2
vp9_avg64_neon:                  2460.3   2294.7
vp9_avg_8tap_smooth_4h_neon:      132.7    125.2
vp9_avg_8tap_smooth_4hv_neon:     478.8    442.0
vp9_avg_8tap_smooth_4v_neon:      126.0     93.7
vp9_avg_8tap_smooth_8h_neon:      241.7    234.2
vp9_avg_8tap_smooth_8hv_neon:     690.9    646.5
vp9_avg_8tap_smooth_8v_neon:      245.0    205.5
vp9_avg_8tap_smooth_64h_neon:   11273.2  11280.1
vp9_avg_8tap_smooth_64hv_neon:  22980.6  22184.1
vp9_avg_8tap_smooth_64v_neon:   11549.7  10781.1
vp9_put4_neon:                     18.0     17.2
vp9_put8_neon:                     40.2     37.7
vp9_put16_neon:                    97.4     99.5
vp9_put32_neon/armv8:             346.0    307.4
vp9_put64_neon/armv8:            1319.0   1107.5
vp9_put_8tap_smooth_4h_neon:      126.7    118.2
vp9_put_8tap_smooth_4hv_neon:     465.7    434.0
vp9_put_8tap_smooth_4v_neon:      113.0     86.5
vp9_put_8tap_smooth_8h_neon:      229.7    221.6
vp9_put_8tap_smooth_8hv_neon:     658.9    621.3
vp9_put_8tap_smooth_8v_neon:      215.0    187.5
vp9_put_8tap_smooth_64h_neon:   10636.7  10627.8
vp9_put_8tap_smooth_64hv_neon:  21076.8  21026.9
vp9_put_8tap_smooth_64v_neon:    9635.0   9632.4
These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster.
The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit.
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:15:56 +02:00
Martin Storsjö	a4cfcddcb0	vp9: Make the subpel filters non-static Make them aligned, to allow efficient access to them from simd. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:05:57 +02:00
Anton Khirnov	84f225684c	pthread_frame: properly propagate the hw frame context across frame threads	2016-11-10 09:00:11 +01:00
Diego Biurrun	72a19f4013	mpegaudiodsp: aarch64: Adjust function prototype after `2caa93b813`	2016-11-10 00:13:48 +01:00
Diego Biurrun	67deba8a41	Use avpriv_report_missing_feature() where appropriate	2016-11-08 17:54:34 +01:00
Vittorio Giovara	47a795727f	hevc: Support extradata changes from multiple stsd Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	2fe30b4743	hevc: Allow parsing external extradata buffers	2016-11-08 11:22:29 -05:00
Vittorio Giovara	5be2153111	hevc: Move hevc_decode_extradata before frame decoding Avoids a forward-declaration in the following commit. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	bed2c4b265	lavc: Add hevc main10 profile to avconv cli	2016-11-08 11:22:29 -05:00
Vittorio Giovara	17dac56b8f	lavu: Rename ycgco color space appropriately Planes are ordered as the name suggests now. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Diego Biurrun	0361e4dcb4	h264_qpel: x86: Move function with only one instance out of template macro libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]	2016-11-08 17:21:02 +01:00
Andreas Cadhalpun	43de8b328b	lzf: update pointer p after realloc This fixes heap-use-after-free detected by AddressSanitizer. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-11-07 22:42:00 +01:00
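The use-after-free fixed by the commit above is the classic case of a cursor that still points into a buffer after realloc() has moved it. A minimal sketch of the fix, assuming a hypothetical `OutBuf` type and `outbuf_reserve()` helper rather than the actual lzf.c code, is to save the cursor as an offset and re-derive it from the new base:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable output buffer used only for this sketch. */
typedef struct OutBuf {
    uint8_t *data;
    size_t   cap;
} OutBuf;

/* Make room for `need` more bytes at the write cursor *p, which points
 * into out->data. Returns 0 on success, -1 on failure. On success *p is
 * valid for the (possibly moved) buffer. */
static int outbuf_reserve(OutBuf *out, uint8_t **p, size_t need)
{
    /* Save the cursor as an offset before the buffer can move. */
    size_t   used = out->data ? (size_t)(*p - out->data) : 0;
    size_t   new_cap;
    uint8_t *tmp;

    assert(used <= out->cap);
    if (need <= out->cap - used)
        return 0;                    /* enough room already */
    if (need > SIZE_MAX - used)
        return -1;                   /* size computation would overflow */

    new_cap = used + need;
    tmp = realloc(out->data, new_cap);
    if (!tmp)
        return -1;                   /* old buffer and old cursor stay valid */

    out->data = tmp;
    out->cap  = new_cap;
    *p        = tmp + used;          /* re-derive the cursor from the new base */
    return 0;
}
```

Without the final assignment, `*p` would keep pointing into the old allocation, which realloc() may already have freed.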
Anton Khirnov	4ab61cd983	qsv{enc,dec}: extend the internal frame allocator Handle the internal frame requests, which is required by the HEVC encoding plugin. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:48:00 +01:00
Anton Khirnov	00aeedd841	qsv{dec,enc}: use a struct as a memory id with internal memory allocator This will allow implementing the allocator more fully, which is needed by the HEVC encoder plugin with video memory input. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:54 +01:00
Anton Khirnov	404e51478e	qsv{dec,enc}: always use an internal mfxFrameSurface1 For encoding, this avoids modifying the input surface, which we are not allowed to do. This will also be useful in the following commits. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:46 +01:00
Hendrik Leppkes	fabfbfe571	dxva2: fix surface selection when compiled with both d3d11va and dxva2 Fixes a regression introduced in `be630b1e08` Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-07 10:05:12 +01:00
Derek Buitenhuis	db0b3dccb3	libx265: Add option to force IDR frames This is in the same vein as `380146924e`. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-07 10:16:10 +02:00
Diego Biurrun	3cba09e522	x86: Drop stray semicolons after function definitions libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic] libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]	2016-11-05 12:41:45 +01:00
Martin Storsjö	392caa65df	arm: vp9mc: Insert a literal pool at the middle of the file This fixes errors like this when building non-pic binaries with armv6 as baseline: Error: invalid literal constant: pool needs to be closer Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-04 21:37:53 +02:00
Diego Biurrun	67351924fa	Drop unreachable break and return statements	2016-11-03 20:17:12 +01:00
Diego Biurrun	6354957a95	dnxhdenc: Have function pointer prototype match implementation libavcodec/dnxhdenc.c(326) : warning C4028: formal parameter 1 different from declaration libavcodec/dnxhdenc.c(329) : warning C4028: formal parameter 1 different from declaration	2016-11-03 17:43:55 +01:00
Diego Biurrun	c778eb15b8	pixblockdsp: Have function pointer prototype match implementation libavcodec/pixblockdsp.c(58) : warning C4028: formal parameter 1 different from declaration libavcodec/pixblockdsp.c(63) : warning C4028: formal parameter 1 different from declaration libavcodec/pixblockdsp.c(66) : warning C4028: formal parameter 1 different from declaration	2016-11-03 17:43:55 +01:00
Diego Biurrun	99ddeddc7f	ituh263dec: Have function signature match across declaration and definition libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 1 different from declaration libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 2 different from declaration	2016-11-03 17:43:55 +01:00
Diego Biurrun	13fcdfb976	svq3: Drop unused function dctcoef_get() libavcodec/svq3.c:627:29: warning: unused function 'dctcoef_get' [-Wunused-function]	2016-11-03 15:52:12 +01:00
Diego Biurrun	ee59f05408	intrax8: Have function signature match across declaration and definition libavcodec/intrax8.c(776) : warning C4028: formal parameter 1 different from declaration	2016-11-03 15:50:48 +01:00
Martin Storsjö	1a469a5e42	options_table: Remove a now unnecessary include of config.h The include of config.h was added in 2012 in `1d9c2dc8`, due to the use of CONFIG_SNOW_ENCODER ifdefs within options_table.h. When the snow codec was dropped later (in `a0c5917f8` in 2013), this include no longer served any purpose. options_table.h is included in builds for the host as well, when building documentation. config.h should not be included in code that is built for the host, since it can contain workarounds for the target compiler/environment, like adding a missing define of restrict, or defining getenv(x) to NULL for environments that lack getenv. The seemingly innocent include reordering in `2025d37871` broke builds that have getenv(x) defined to NULL in config.h (Windows CE and Windows Phone/RT), since libavcodec/options_table.h includes config.h, while libavformat/options_table.h ends up bringing in more system headers, and those system headers can contain a proper definition of getenv, which clashes with the getenv define in config.h. This was avoided earlier as long as libavformat/options_table.h (or avformat.h) was included before libavcodec/options_table.h. This fixes builds for Windows Phone/RT and CE. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-03 11:25:50 +02:00
Martin Storsjö	ffbd1d2b00	arm: vp9: Add NEON optimizations of VP9 MC functions This work is sponsored by, and copyright, Google. The filter coefficients are signed values, where the product of the multiplication with one individual filter coefficient doesn't overflow a 16 bit signed value (the largest filter coefficient is 127). But when the products are accumulated, the resulting sum can overflow the 16 bit signed range. Instead of accumulating in 32 bit, we accumulate the largest product (either index 3 or 4) last with a saturated addition. (The VP8 MC asm does something similar, but slightly simpler, by accumulating each half of the filter separately. In the VP9 MC filters, each half of the filter can also overflow though, so the largest component has to be handled individually.) Examples of relative speedup compared to the C version, from checkasm:
                       Cortex      A7     A8     A9    A53
vp9_avg4_neon:                   1.71   1.15   1.42   1.49
vp9_avg8_neon:                   2.51   3.63   3.14   2.58
vp9_avg16_neon:                  2.95   6.76   3.01   2.84
vp9_avg32_neon:                  3.29   6.64   2.85   3.00
vp9_avg64_neon:                  3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:     3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:    3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:     5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:     6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:    6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:     9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:    7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:   6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13   9.46  13.37
vp9_put4_neon:                   1.11   1.47   1.00   1.21
vp9_put8_neon:                   1.23   2.17   1.94   1.48
vp9_put16_neon:                  1.63   4.02   1.73   1.97
vp9_put32_neon:                  1.56   4.92   2.00   1.96
vp9_put64_neon:                  2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:     3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:    3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:     5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:     5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:    6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:     9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:    6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:   6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30   9.43  14.34
For the larger 8tap filters, the speedup vs C code is around 5-14x. This is significantly faster than libvpx's implementation of the same functions, at least when comparing the put_8tap_smooth_64 functions (compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from libvpx). Absolute runtimes from checkasm:
                          Cortex        A7        A8        A9       A53
vp9_put_8tap_smooth_64h_neon:      20150.3   14489.4   19733.6   10863.7
libvpx vpx_convolve8_horiz_neon:   52623.3   19736.4   21907.7   25027.7
vp9_put_8tap_smooth_64v_neon:      14455.0   12303.9   13746.4    9628.9
libvpx vpx_convolve8_vert_neon:    42090.0   17706.2   17659.9   16941.2
Thus, on the A9, the horizontal filter is only marginally faster than libvpx, while our version is significantly faster on the other cores, and the vertical filter is significantly faster on all cores. The difference is especially large on the A7. The libvpx implementation does the accumulation in 32 bit, which probably explains most of the differences. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-03 09:35:38 +02:00
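The accumulation trick described in `ffbd1d2b00` can be illustrated in plain C. This is only a sketch of the idea, not the NEON code from the commit: `example_taps` is a hypothetical kernel chosen for illustration, the function names are made up, and the sketch assumes the seven smaller products together fit in a signed 16 bit value (true for this kernel). Under that assumption, adding the largest product last with saturation gives the same 8 bit output as a 32 bit accumulation, because saturation only happens when the final result would be clipped anyway.

```c
#include <stdint.h>

/* Hypothetical 8-tap kernel summing to 128, for illustration only. */
const int16_t example_taps[8] = { -4, 11, -23, 80, 80, -23, 11, -4 };

/* Round, shift right by 7 and clip to the 0..255 pixel range. */
static uint8_t round_clip_u8(int32_t sum)
{
    int32_t r = sum + 64;
    if (r < 0)
        return 0;
    r >>= 7;
    return r > 255 ? 255 : (uint8_t)r;
}

/* Reference: accumulate all eight products in 32 bits. */
uint8_t filter8_ref32(const uint8_t *src, const int16_t *taps)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += taps[i] * src[i];
    return round_clip_u8(sum);
}

/* 16 bit variant: the largest tap (index 3 or 4) is added last,
 * with a saturating addition. Assumes the partial sum of the other
 * seven products fits in int16_t. */
uint8_t filter8_sat16(const uint8_t *src, const int16_t *taps)
{
    int big = taps[3] >= taps[4] ? 3 : 4;
    int16_t acc = 0;

    for (int i = 0; i < 8; i++)
        if (i != big)
            acc = (int16_t)(acc + taps[i] * src[i]);

    /* Saturating 16 bit add of the largest product. */
    int32_t last = (int32_t)acc + taps[big] * src[big];
    if (last > INT16_MAX)
        last = INT16_MAX;
    if (last < INT16_MIN)
        last = INT16_MIN;

    return round_clip_u8(last);
}
```

For any eight input pixels, `filter8_sat16(src, example_taps)` and `filter8_ref32(src, example_taps)` should return the same value with this kernel.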
Martin Storsjö	2e55e26b40	vp9: Flip the order of arguments in MC functions This makes it match the pattern already used for VP8 MC functions. This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-03 09:12:02 +02:00
Diego Biurrun	baab87c4f3	bink: Have function pointer prototype match implementation libavcodec/binkdsp.c(156) : warning C4028: formal parameter 1 different from declaration	2016-11-02 10:33:39 +01:00
Diego Biurrun	4cf2ffb7c4	idct: Have function pointer prototype match implementation libavcodec/idctdsp.c(175) : warning C4028: formal parameter 2 different from declaration	2016-11-02 10:33:39 +01:00
Diego Biurrun	39cea6570c	aactab: Move extern keyword to the front of array declarations libavcodec/aactab.h:49:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]	2016-11-02 10:33:36 +01:00
Luca Barbato	801ac7156d	qsv: Be informative when reporting that no data has been consumed	2016-10-30 21:55:03 +01:00
Diego Biurrun	30015305f3	Use avpriv_request_sample() where appropriate	2016-10-29 18:32:21 +02:00
Diego Biurrun	3ec6f855d0	srt: Adjust signedness of sscanf format strings Fixes several warnings from -Wformat.	2016-10-28 13:28:36 +02:00
Diego Biurrun	7a2b2b6a92	dxtory: Drop nonsense ISO C printf conversion specifiers for standard types	2016-10-28 13:24:55 +02:00
Diego Biurrun	c454dfcff9	Use ISO C printf conversion specifiers where appropriate	2016-10-28 13:24:44 +02:00
Diego Biurrun	fbe425c8d2	hap: Adjust printf length modifiers to match variable types libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t {aka unsigned int}’ [-Wformat=] libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]	2016-10-28 11:22:22 +02:00
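The `-Wformat=` warning quoted in `fbe425c8d2` arises because `size_t` is not `unsigned long` on every target. A minimal sketch of the portable pattern follows; the function name and message text are hypothetical and this is not the code from the commit. It uses the standard `%zu` length modifier for `size_t` and the `PRIu32` macro for fixed-width integers.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a size_t and a uint32_t without assuming either one is
 * "unsigned long". Returns the snprintf() result. */
int format_sizes(char *buf, size_t cap, size_t have, uint32_t want)
{
    return snprintf(buf, cap, "have %zu, want %" PRIu32, have, want);
}
```

Calling `format_sizes(buf, sizeof buf, 12, 34)` writes `have 12, want 34` into `buf`.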
Diego Biurrun	1263b2039e	Adjust printf conversion specifiers to match variable signedness	2016-10-28 11:22:21 +02:00
Diego Biurrun	47756f51fe	dnxhdenc: Drop pointless, commented-out debug output	2016-10-27 12:21:46 +02:00
Diego Biurrun	0574780d7a	h264_loopfilter: Do not print value of uninitialized variable libavcodec/h264_loopfilter.c:531:111: warning: variable 'edge' is uninitialized when used here [-Wuninitialized]	2016-10-27 12:21:46 +02:00
Diego Biurrun	2555269985	mpegaudio: Do not print value of uninitialized variable libavcodec/mpegaudiodec_template.c:885:97: warning: variable 'x' is uninitialized when used here [-Wuninitialized]	2016-10-27 12:21:46 +02:00
Mark Thompson	0aec37e625	vaapi_decode: Remove vestigial unmap code The buffer map/unmap code was in an early version of this before it was committed, but the unmap was never removed. While wrong, this was harmless (and therefore unnoticed) because the buffers can't be mapped at this point - all drivers just did nothing with the call.	2016-10-24 20:17:47 +01:00
Mark Thompson	5e879b54a3	vaapi_decode: Clear parameter buffers to fix picture reuse When decoding interlaced pictures, the structure is reused to render to the same surface twice. The parameter buffers were not being cleared, which caused the i965 driver to error out.	2016-10-24 20:17:47 +01:00
Gwenole Beauchesne	754b20d7eb	vaapi_h264: fix RefPicList[] field flags. Use new H264Ref.reference field to track field picture flags. The H264Picture.reference flag in DPB is now irrelevant here. This is a regression from git commit `a12d3188`, and that affected multiple interlaced video streams. Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com> Signed-off-by: Mark Thompson <sw@jkqxz.net>	2016-10-24 20:17:47 +01:00
Pierre Edouard Lepere	6d5636ad9a	hevc: x86: Add add_residual() SIMD optimizations Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>, extended by James Almer <jamrial@gmail.com>. Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>	2016-10-22 17:33:35 +02:00
Vittorio Giovara	0d9b9bd37f	lavu: Add JEDEC P22 color primaries	2016-10-21 11:46:21 -04:00
Anton Khirnov	59c90097a0	hevc: factor out a repeated condition	2016-10-21 10:11:20 +02:00
Anton Khirnov	0bfdcce4d4	hevc: move the SliceType enum to hevc.h Those values are decoder-independent and are also used by the VA-API encoder.	2016-10-21 10:11:20 +02:00
Diego Biurrun	788544ff0e	audiodsp: x86: Remove pointless header file Its single forward declaration can be moved to the only place it is used, like is done for all other dsp init files.	2016-10-19 15:20:41 +02:00
Diego Biurrun	b89804da9b	x86: videodsp: Add parentheses to expression to work around warning libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds	2016-10-19 10:13:34 +02:00
Diego Biurrun	58224dc5f3	ppc: avcodec: Drop silly "_ppc" suffixes from files in ppc subdirectories	2016-10-18 00:10:36 +02:00
Mark Thompson	0cf86fabfa	vaapi_encode: Write sequence header as extradata Only works if packed headers are supported, where we can know the output before generating the first frame.	2016-10-17 21:07:25 +01:00
Mark Thompson	f9bb356e0e	vaapi_h265: Include header for slice types The include was changed correctly in `4abe3b049d` but then mistakenly changed back by `c359d624d3` (it's not just the NAL unit types which are used).	2016-10-17 20:53:28 +01:00
Diego Biurrun	6be7944ee2	x86: Add missing colons after assembly labels This fixes many warnings of the sort warning: label alone on a line without a colon might be in error	2016-10-17 16:31:26 +02:00
Anton Khirnov	89b35a139e	lavc: add a bitstream filter for extracting extradata from packets This is intended as a replacement for the 'split' function exported by some parsers.	2016-10-16 20:27:16 +02:00
Anton Khirnov	f6e2f8a9ff	hevcdec: move parameter set parsing into a separate header This code is independent from the decoder, so it makes more sense for it to have its own header.	2016-10-16 20:26:47 +02:00
Anton Khirnov	150c896a9e	hevcdec: split ff_hevc_diag_scan* declarations into a separate header This will be useful in the following commits.	2016-10-16 20:26:40 +02:00
Anton Khirnov	645c6ff423	hevcdec: drop the prototype of a non-existing function	2016-10-16 20:26:35 +02:00
Anton Khirnov	c359d624d3	hevcdec: move decoder-independent declarations into a separate header This way they can be reused by other code without including the whole decoder-specific hevcdec.h. Also, add the HEVC_ prefix to them, since similarly named values exist for H.264 as well and are sometimes used in the same code.	2016-10-16 20:26:28 +02:00
Anton Khirnov	4abe3b049d	hevc: rename hevc.[ch] to hevcdec.[ch] This is more consistent with the rest of libav and frees up the hevc.h name for decoder-independent shared declarations.	2016-10-16 20:26:17 +02:00
Kieran Kunhya	81f1f6c3f6	Add GBRAP12 pixel format support Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-10-12 21:33:34 +02:00
Vittorio Giovara	14e7e19a90	lavc: bsf: Document input/output codecparam alloc/init process	2016-10-12 11:06:58 -04:00
Alexandra Hájková	112cee0241	hevc: Add SSE2 and AVX IDCT Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-11 18:21:04 +02:00
Martin Storsjö	9b2ccafb48	aarch64: Add missing sign extension in ff_h264_idct8_add_neon Signed-off-by: Martin Storsjö <martin@martin.st>	2016-10-10 14:57:53 +03:00
Yogender Gupta	cbd84b8a51	nvenc: Fix error log Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-10-09 20:58:10 +02:00
Yogender Gupta	da2848375a	nvenc: Force high_444 profile for 444 input Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-10-07 10:41:38 +02:00
Anton Khirnov	e4128c08d7	Revert "hevc: x86: Refactor IDCT macro declarations" This reverts commit `d9dccc0389`. There were outstanding objections to this commit.	2016-10-06 15:24:04 +02:00
Diego Biurrun	5801f9ed24	h264_intrapred: x86: Update comments left behind in `95c89da36e`	2016-10-06 12:32:34 +02:00
Diego Biurrun	d9dccc0389	hevc: x86: Refactor IDCT macro declarations	2016-10-06 12:32:34 +02:00
Steve Lhomme	be630b1e08	d3d11va: Use the proper decoding slice index The decoding buffer index expected by D3D11VA is the one from the ID3D11Texture2D, not the one from the ID3D11VideoDecoderOutputView array in AVD3D11VAContext. Otherwise, when providing decoder slices that do not start from 0, pictures appear in bogus order. With an invalid index, crashes and image corruption can occur. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-10-05 18:37:27 +02:00
Ronald S. Bultje	715f139c9b	vp9lpf/x86: make filter_16_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	8915320db9	vp9lpf/x86: make filter_48/84/88_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	725a216481	vp9lpf/x86: make filter_44_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	5bfa96c4b3	vp9lpf/x86: make filter_16_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	b905e8d2fe	vp9lpf/x86: make filter_48/84_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	37637e6590	vp9lpf/x86: make filter_88_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	be10834bd9	vp9lpf/x86: make filter_44_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	7c62891efe	vp9lpf/x86: save one register in SIGN_ADD/SUB. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	c6375a83d1	vp9lpf/x86: store unpacked intermediates for filter6/14 on stack. filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88 goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	4ce8ba72f9	vp9lpf/x86: move variable assigned inside macro branch. The value is not used outside the branch. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	e4961035b2	vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	683da2788e	vp9lpf/x86: remove unused register from ABSSUB_CMP macro. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	6e74e9636b	vp9lpf/x86: slightly simplify 44/48/84/88 h stores. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	6411c328a2	vp9lpf/x86: make cglobal statement more conservative in register allocation. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	a6e288d624	vp9lpf/x86: save one register in loopfilter surface coverage. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	0ed21bdc9e	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	f2e3d706a1	vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}(). Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
James Almer	92d47550ea	vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16 Similar gains as the ssse3 version once again. Additional improvements by Clément Bœsch <u@pkh.me>. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	6bea478158	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
James Almer	1f451eed60	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2(). Similar gains in performance as the SSSE3 version. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	a692724c58	vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	a451324ddd	vp9: ignore reference segmentation map if error_resilience flag is set. Fixes ffvp9_fails_where_libvpx.succeeds.webm. Bug-Id: ffmpeg/3849. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:07 +02:00
Carl Eugen Hoyos	c19830aa2c	rscc: Support palette format Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-10-02 15:42:03 -04:00
Vittorio Giovara	b8d5070db6	avcodec: Document AV_PKT_DATA_PALETTE side data type	2016-10-02 15:42:03 -04:00
Mark Thompson	5a5df90d9c	vaapi_h265: Add main 10 encode support	2016-10-02 20:23:18 +01:00
Mark Thompson	b8cac1e830	vaapi_h265: Fix buffering parameters A decoder may need this to be set correctly to output frames in the right order.	2016-10-02 20:23:18 +01:00
Mark Thompson	fc30a90898	vaapi_h265: Fix slice header writing This was not observed earlier because the only syntax element which it normally misses with the current setup is slice_qp_delta, but that is always going to be zero (in IDR frames QP isn't varied on the slice) which will always exp-golomb code as a single 1 bit. The immediately following part is the byte alignment, which is always a 1 bit followed by 0s which are ignored, so as long as the bitstream is never aligned at that point we will never notice because the only difference is that an ignored bit is a 1 instead of a 0.	2016-10-02 20:23:18 +01:00
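The claim in `fc30a90898` that a zero `slice_qp_delta` "will always exp-golomb code as a single 1 bit" follows from how signed Exp-Golomb codes are built. The sketch below writes such a code as a string of '0'/'1' characters; the function names are hypothetical, it is not the bitstream writer used by the encoder, and it assumes |v| is below 2^30 so the mapping does not overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the unsigned Exp-Golomb code ue(code_num) as '0'/'1' characters.
 * Returns the number of bits written, or 0 if buf is too small. */
size_t ue_golomb_bits(uint32_t code_num, char *buf, size_t cap)
{
    uint64_t value = (uint64_t)code_num + 1;
    int len = 0;                       /* floor(log2(value)) */
    size_t n = 0;

    for (uint64_t t = value; t > 1; t >>= 1)
        len++;
    if ((size_t)(2 * len + 2) > cap)   /* bits plus terminating NUL */
        return 0;

    for (int i = 0; i < len; i++)      /* prefix: len zero bits */
        buf[n++] = '0';
    for (int i = len; i >= 0; i--)     /* value in len + 1 bits */
        buf[n++] = ((value >> i) & 1) ? '1' : '0';
    buf[n] = '\0';
    return n;
}

/* Signed Exp-Golomb se(v): v > 0 maps to 2v - 1, v <= 0 maps to -2v.
 * Assumes |v| < 2^30 (illustrative restriction). */
size_t se_golomb_bits(int32_t v, char *buf, size_t cap)
{
    uint32_t code_num = v > 0 ? 2u * (uint32_t)v - 1u
                              : 2u * (uint32_t)(-(int64_t)v);
    return ue_golomb_bits(code_num, buf, cap);
}
```

With this mapping, `se_golomb_bits(0, ...)` yields the single bit `1`, which is indistinguishable from the first bit of the byte-alignment pattern that the commit message describes.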
Mark Thompson	ec17ab381e	vaapi_h264: Write bitstream restriction fields	2016-10-02 20:23:18 +01:00
Mark Thompson	17a0f9481c	vaapi_h264: Fix CFR mode with frame_rate set in AVCodecContext	2016-10-02 20:23:18 +01:00
Mark Thompson	314b421dd8	vaapi_encode: Decide on GOP setup before initialising sequence parameters This was always too late; several fields related to it have been incorrectly zero since the encoder was added.	2016-10-02 20:23:18 +01:00
Anton Khirnov	59c7022740	pthread_frame: use atomics for frame progress	2016-10-02 19:35:46 +02:00
Anton Khirnov	64a31b2854	pthread_frame: use atomics for PerThreadContext.state	2016-10-02 19:35:34 +02:00
Anton Khirnov	db2733256d	pthread_frame: use a thread-safe way for signalling threads to die Current code uses a plain int in a racy way, which is UB.	2016-10-02 19:35:23 +02:00
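The problem named in `db2733256d`, a plain `int` written by one thread and read by another without synchronisation, can be avoided with a C11 atomic flag. The following is a minimal sketch under stated assumptions: the `Worker` type and all function names are hypothetical, thread creation is left out, and the actual pthread_frame code may combine the flag with mutexes and condition variables.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical worker state; not the PerThreadContext from libavcodec. */
typedef struct Worker {
    atomic_int  die;    /* set to 1 by the owner to request shutdown */
    atomic_long ticks;  /* loop iterations completed, for illustration */
} Worker;

void worker_init(Worker *w)
{
    atomic_init(&w->die, 0);
    atomic_init(&w->ticks, 0);
}

/* Called from the owning thread: a well-defined store instead of a
 * racy write to a plain int. */
void worker_request_stop(Worker *w)
{
    atomic_store_explicit(&w->die, 1, memory_order_release);
}

/* Thread entry point with a pthread_create-compatible signature. */
void *worker_main(void *arg)
{
    Worker *w = arg;

    while (!atomic_load_explicit(&w->die, memory_order_acquire))
        atomic_fetch_add_explicit(&w->ticks, 1, memory_order_relaxed);
    return NULL;
}
```

In a real program `worker_main` would be passed to `pthread_create`, and the owner would call `worker_request_stop` followed by `pthread_join`.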
Anton Khirnov	8385ba53f1	mmaldec: convert to stdatomic	2016-10-02 19:35:12 +02:00
Luca Barbato	b015872c0d	huffyuvdsp: Enable the altivec code for PPC little-endian as well Confirmed to work by checkasm.	2016-10-02 17:13:36 +02:00

21404 Commits