FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-02-20 07:48:15 +02:00

Author	SHA1	Message	Date
Martin Storsjö	52d196fb30	arm: vp9itxfm: Simplify txfm string comparisons Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Martin Storsjö	3c9546dfaf	aarch64: vp9: Add NEON itxfm routines This work is sponsored by, and copyright, Google. These are ported from the ARM version; thanks to the larger amount of registers available, we can do the 16x16 and 32x32 transforms in slices 8 pixels wide instead of 4. This gives a speedup of around 1.4x compared to the 32 bit version. The fact that aarch64 doesn't have the same d/q register aliasing makes some of the macros quite a bit simpler as well. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_inv_adst_adst_4x4_add_neon: 90.0 87.7 vp9_inv_adst_adst_8x8_add_neon: 400.0 354.7 vp9_inv_adst_adst_16x16_add_neon: 2526.5 1827.2 vp9_inv_dct_dct_4x4_add_neon: 74.0 72.7 vp9_inv_dct_dct_8x8_add_neon: 271.0 256.7 vp9_inv_dct_dct_16x16_add_neon: 1960.7 1372.7 vp9_inv_dct_dct_32x32_add_neon: 11988.9 8088.3 vp9_inv_wht_wht_4x4_add_neon: 63.0 57.7 The speedup vs C code (2-4x) is smaller than in the 32 bit case, mostly because the C code ends up significantly faster (around 1.6x faster, with GCC 5.4) when built for aarch64. Examples of runtimes vs C on a Cortex A57 (for a slightly older version of the patch): A57 gcc-5.3 neon vp9_inv_adst_adst_4x4_add_neon: 152.2 60.0 vp9_inv_adst_adst_8x8_add_neon: 948.2 288.0 vp9_inv_adst_adst_16x16_add_neon: 4830.4 1380.5 vp9_inv_dct_dct_4x4_add_neon: 153.0 58.6 vp9_inv_dct_dct_8x8_add_neon: 789.2 180.2 vp9_inv_dct_dct_16x16_add_neon: 3639.6 917.1 vp9_inv_dct_dct_32x32_add_neon: 20462.1 4985.0 vp9_inv_wht_wht_4x4_add_neon: 91.0 49.8 The asm is around factor 3-4 faster than C on the cortex-a57 and the asm is around 30-50% faster on the a57 compared to the a53. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Diego Biurrun	01348e411f	avconv_opt: Consistently iterate through hwaccels array in all cases avconv_opt.c:188:19: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]	2016-11-13 19:06:38 +01:00
Diego Biurrun	800d91d348	Drop pointless void* casts	2016-11-13 18:44:01 +01:00
Diego Biurrun	d316f9cefc	aac: Drop pointless cast	2016-11-13 18:44:00 +01:00
Diego Biurrun	8ddfa5ae5e	vf_drawtext: Drop wrong void* cast libavfilter/vf_drawtext.c:844:49: warning: ISO C forbids conversion of function pointer to object pointer type [-Wpedantic]	2016-11-12 16:47:07 +01:00
Diego Biurrun	fcbdd605b5	nut: Use correct function pointer casts instead of void* Fixes several warnings of the type libavformat/nut.c:207:42: warning: ISO C forbids conversion of function pointer to object pointer type [-Wpedantic]	2016-11-12 16:47:06 +01:00
Diego Biurrun	3b50dbc51f	ratecontrol: Use correct function pointer casts instead of void* libavcodec/ratecontrol.c:120:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic] libavcodec/ratecontrol.c:121:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]	2016-11-12 16:47:06 +01:00
Martin Storsjö	dd299a2d6d	arm: vp9: Add NEON loop filters This work is sponsored by, and copyright, Google. The implementation tries to have smart handling of cases where no pixels need the full filtering for the 8/16 width filters, skipping both calculation and writeback of the unmodified pixels in those cases. The actual effect of this is hard to test with checkasm though, since it tests the full filtering, and the benefit depends on how many filtered blocks use the shortcut. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_loop_filter_h_4_8_neon: 2.72 2.68 1.78 3.15 vp9_loop_filter_h_8_8_neon: 2.36 2.38 1.70 2.91 vp9_loop_filter_h_16_8_neon: 1.80 1.89 1.45 2.01 vp9_loop_filter_h_16_16_neon: 2.81 2.78 2.18 3.16 vp9_loop_filter_mix2_h_44_16_neon: 2.65 2.67 1.93 3.05 vp9_loop_filter_mix2_h_48_16_neon: 2.46 2.38 1.81 2.85 vp9_loop_filter_mix2_h_84_16_neon: 2.50 2.41 1.73 2.85 vp9_loop_filter_mix2_h_88_16_neon: 2.77 2.66 1.96 3.23 vp9_loop_filter_mix2_v_44_16_neon: 4.28 4.46 3.22 5.70 vp9_loop_filter_mix2_v_48_16_neon: 3.92 4.00 3.03 5.19 vp9_loop_filter_mix2_v_84_16_neon: 3.97 4.31 2.98 5.33 vp9_loop_filter_mix2_v_88_16_neon: 3.91 4.19 3.06 5.18 vp9_loop_filter_v_4_8_neon: 4.53 4.47 3.31 6.05 vp9_loop_filter_v_8_8_neon: 3.58 3.99 2.92 5.17 vp9_loop_filter_v_16_8_neon: 3.40 3.50 2.81 4.68 vp9_loop_filter_v_16_16_neon: 4.66 4.41 3.74 6.02 The speedup vs C code is around 2-6x. The numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). Disabling the early-exit in the asm doesn't give a fair comparison either though, since the C code only does the necessary calculations for each row. Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-9x. 
This is pretty similar in runtime to the corresponding routines in libvpx. (This is comparing vpx_lpf_vertical_16_neon, vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon and vp9_loop_filter_v_16_16_neon - note that the naming of horizontal and vertical is flipped between the libraries.) In order to have stable, comparable numbers, the early exits in both asm versions were disabled, forcing the full filtering codepath. Cortex A7 A8 A9 A53 vp9_loop_filter_h_16_8_neon: 597.2 472.0 482.4 415.0 libvpx vpx_lpf_vertical_16_neon: 626.0 464.5 470.7 445.0 vp9_loop_filter_v_16_8_neon: 500.2 422.5 429.7 295.0 libvpx vpx_lpf_horizontal_edge_8_neon: 586.5 414.5 415.6 383.2 vp9_loop_filter_v_16_16_neon: 905.0 784.7 791.5 546.0 libvpx vpx_lpf_horizontal_edge_16_neon: 1060.2 751.7 743.5 685.2 Our version is consistently faster on A7 and A53, marginally slower on A8, and sometimes faster, sometimes slower on A9 (marginally slower in all three tests in this particular test run). Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 14:16:42 +02:00
Diego Biurrun	f7d183f084	libxvid: Check return value of write() call libavcodec/libxvid_rc.c:106:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]	2016-11-11 10:17:07 +01:00
Diego Biurrun	e5e8a26dcf	libxvid: Use proper context in av_log() calls	2016-11-11 10:17:07 +01:00
Diego Biurrun	12db2832e4	libxvid: Require availability of mkstemp() The replacement code uses tempnam(), which is dangerous. Such a fringe feature is not worth the trouble.	2016-11-11 10:17:07 +01:00
Martin Storsjö	a67ae67083	arm: vp9: Add NEON itxfm routines This work is sponsored by, and copyright, Google. For the transforms up to 8x8, we can fit all the data (including temporaries) in registers and just do a straightforward transform of all the data. For 16x16, we do a transform of 4x16 pixels in 4 slices, using a temporary buffer. For 32x32, we transform 4x32 pixels at a time, in two steps of 4x16 pixels each. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_inv_adst_adst_4x4_add_neon: 3.39 5.83 4.17 4.01 vp9_inv_adst_adst_8x8_add_neon: 3.79 4.86 4.23 3.98 vp9_inv_adst_adst_16x16_add_neon: 3.33 4.36 4.11 4.16 vp9_inv_dct_dct_4x4_add_neon: 4.06 6.16 4.59 4.46 vp9_inv_dct_dct_8x8_add_neon: 4.61 6.01 4.98 4.86 vp9_inv_dct_dct_16x16_add_neon: 3.35 3.44 3.36 3.79 vp9_inv_dct_dct_32x32_add_neon: 3.89 3.50 3.79 4.42 vp9_inv_wht_wht_4x4_add_neon: 3.22 5.13 3.53 3.77 Thus, the speedup vs C code is around 3-6x. This is mostly marginally faster than the corresponding routines in libvpx on most cores, tested with their 32x32 idct (compared to vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's favour since their version doesn't clear the input buffer like ours does (although the effect of that on the total runtime probably is negligible.) Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_add_neon: 18436.8 16874.1 14235.1 11988.9 libvpx vpx_idct32x32_1024_add_neon 20789.0 13344.3 15049.9 13030.5 Only on the Cortex A8, the libvpx function is faster. On the other cores, ours is slightly faster even though ours has got source block clearing integrated. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 11:09:05 +02:00
Ronald S. Bultje	0b37cd09a6	checkasm: add vp9dsp.itxfm_add tests. This includes fixes by Henrik Gramner. The forward transforms are derived from the reference encoder. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 11:09:05 +02:00
Mark Thompson	fd0fae6037	pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts When decoding with threads enabled, the get_format callback will be called with one of the per-thread codec contexts rather than with the outer context. If a hwaccel is in use too, this will add a reference to the hardware frames context on that codec context, which will then propagate to all of the other per-thread contexts for decoding. Once the decoder finishes, however, the per-thread contexts are not freed normally, so these references leak.	2016-11-10 20:36:11 +00:00
Martin Storsjö	11623217e3	arm: vp9mc: Use a different helper register for PIC loads This fixes crashes since 557c1675cf in linux PIC builds. Previously, movrelx silently used r12 as helper register, which doesn't work when r12 is the destination register. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 14:01:04 +02:00
Martin Storsjö	824e8c2840	arm: Clear the gp register alias at the end of functions We reset .Lpic_gp to zero at the start of each function, which means that the logic within movrelx for clearing gp when necessary will be missed. This fixes using movrelx in different functions with a different helper register. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 14:01:04 +02:00
Diego Biurrun	905cdcaa9d	examples/decode_audio: Add missing header for av_free()	2016-11-10 10:33:19 +01:00
Martin Storsjö	6a62795d40	aarch64: h264idct: Use the offset parameter to movrel Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	557c1675cf	arm: vp9mc: Minor adjustments from review of the aarch64 version This work is sponsored by, and copyright, Google. The speedup for the large horizontal filters is surprisingly big on A7 and A53, while there's a minor slowdown (almost within measurement noise) on A8 and A9. Cortex A7 A8 A9 A53 orig: vp9_put_8tap_smooth_64h_neon: 20270.0 14447.3 19723.9 10910.9 new: vp9_put_8tap_smooth_64h_neon: 20165.8 14466.5 19730.2 10668.8 Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	383d96aa22	aarch64: vp9: Add NEON optimizations of VP9 MC functions This work is sponsored by, and copyright, Google. These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_neon: 27.2 23.7 vp9_avg8_neon: 56.5 54.7 vp9_avg16_neon: 169.9 167.4 vp9_avg32_neon: 585.8 585.2 vp9_avg64_neon: 2460.3 2294.7 vp9_avg_8tap_smooth_4h_neon: 132.7 125.2 vp9_avg_8tap_smooth_4hv_neon: 478.8 442.0 vp9_avg_8tap_smooth_4v_neon: 126.0 93.7 vp9_avg_8tap_smooth_8h_neon: 241.7 234.2 vp9_avg_8tap_smooth_8hv_neon: 690.9 646.5 vp9_avg_8tap_smooth_8v_neon: 245.0 205.5 vp9_avg_8tap_smooth_64h_neon: 11273.2 11280.1 vp9_avg_8tap_smooth_64hv_neon: 22980.6 22184.1 vp9_avg_8tap_smooth_64v_neon: 11549.7 10781.1 vp9_put4_neon: 18.0 17.2 vp9_put8_neon: 40.2 37.7 vp9_put16_neon: 97.4 99.5 vp9_put32_neon/armv8: 346.0 307.4 vp9_put64_neon/armv8: 1319.0 1107.5 vp9_put_8tap_smooth_4h_neon: 126.7 118.2 vp9_put_8tap_smooth_4hv_neon: 465.7 434.0 vp9_put_8tap_smooth_4v_neon: 113.0 86.5 vp9_put_8tap_smooth_8h_neon: 229.7 221.6 vp9_put_8tap_smooth_8hv_neon: 658.9 621.3 vp9_put_8tap_smooth_8v_neon: 215.0 187.5 vp9_put_8tap_smooth_64h_neon: 10636.7 10627.8 vp9_put_8tap_smooth_64hv_neon: 21076.8 21026.9 vp9_put_8tap_smooth_64v_neon: 9635.0 9632.4 These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit. 
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:15:56 +02:00
Martin Storsjö	c44a8a3eab	aarch64: Add an offset parameter to the movrel macro With apple tools, the linker fails with errors like these, if the offset is negative: ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64 Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:06:08 +02:00
Martin Storsjö	a4cfcddcb0	vp9: Make the subpel filters non-static Make them aligned, to allow efficient access to them from simd. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:05:57 +02:00
James Almer	98cae966c7	matroskaenc: write updated STREAMINFO metadata for FLAC streams if available FLAC streams originating from the FLAC encoder send updated and more complete STREAMINFO metadata as part of the last packet, so write that to CodecPrivate instead of the incomplete one available in extradata during init. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-10 09:15:24 +01:00
James Almer	f4bf236338	matroskaenc: fix muxing AAC streams when using aac_adtstoasc bsf aac_adtstoasc makes the aac extradata available only after the first packet is filtered, and as packet side data. Assume extradata will be available as part of the first packet if avpriv_mpeg4audio_get_config() fails the first time due to missing extradata and reserve space for the OutputSampleRate element in the Tracks master. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-10 09:01:18 +01:00
Anton Khirnov	84f225684c	pthread_frame: properly propagate the hw frame context across frame threads	2016-11-10 09:00:11 +01:00
Diego Biurrun	72a19f4013	mpegaudiodsp: aarch64: Adjust function prototype after 2caa93b813adc5dbb7771dfe615da826a2947d18	2016-11-10 00:13:48 +01:00
Diego Biurrun	2dd464868c	configure: Move license checks directly after command line parsing This will allow to error out immediately if incompatible options are passed on the command line instead of running time-consuming tests.	2016-11-09 20:51:56 +01:00
Diego Biurrun	c78495d1cd	configure: Log name and parameters of all helper functions where it makes sense	2016-11-09 20:51:56 +01:00
Diego Biurrun	8a6e7a67cb	configure: Use check_cpp in CPP flags tests	2016-11-09 20:51:56 +01:00
Diego Biurrun	831005b230	configure: Log correct test name and use correct filter when testing objective C flags	2016-11-09 20:51:56 +01:00
Diego Biurrun	fe7bc1f16a	configure: Do not unconditionally check for (and enable) xlib This avoids unnecessarily linking against xlib.	2016-11-09 20:51:55 +01:00
Diego Biurrun	d1a91ebe49	configure: Print list of enabled programs Also drop a related and now redundant informative output line.	2016-11-09 20:51:55 +01:00
Diego Biurrun	576c9003ae	configure: Improve output wording Also drop a redundant output line.	2016-11-09 20:51:55 +01:00
Diego Biurrun	a3483f7993	avconv: Drop stray leftover debug output	2016-11-09 20:51:55 +01:00
Diego Biurrun	67deba8a41	Use avpriv_report_missing_feature() where appropriate	2016-11-08 17:54:34 +01:00
Diego Biurrun	59d2b00d20	configure: Add --quiet command line parameter to suppress informative output	2016-11-08 17:32:57 +01:00
Diego Biurrun	4537647c04	fate: checkasm: Split monolithic test into individual components	2016-11-08 17:32:25 +01:00
Diego Biurrun	9498237049	checkasm: Add --test parameter to check only specific components Inspired by a patch from Martin Storsjö <martin@martin.st>.	2016-11-08 17:32:25 +01:00
Vittorio Giovara	de6e2ff3dd	mov: Read multiple stsd from DV Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	47a795727f	hevc: Support extradata changes from multiple stsd Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	2fe30b4743	hevc: Allow parsing external extradata buffers	2016-11-08 11:22:29 -05:00
Vittorio Giovara	5be2153111	hevc: Move hevc_decode_extradata before frame decoding Avoids a forward-declaration in the following commit. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	bed2c4b265	lavc: Add hevc main10 profile to avconv cli	2016-11-08 11:22:29 -05:00
Vittorio Giovara	17dac56b8f	lavu: Rename ycgco color space appropriately Planes are ordered as the name suggests now. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Diego Biurrun	0361e4dcb4	h264_qpel: x86: Move function with only one instance out of template macro libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]	2016-11-08 17:21:02 +01:00
Diego Biurrun	88f0cf8cd3	avplay: Correct function pointer assignments in options array avplay.c:2928:5: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]	2016-11-08 17:20:30 +01:00
Diego Biurrun	943533d64c	avconv: Correct function pointer assignments in options array Fixes several warnings of the type avconv_opt.c:2356:5: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]	2016-11-08 16:48:41 +01:00
Andreas Cadhalpun	43de8b328b	lzf: update pointer p after realloc This fixes heap-use-after-free detected by AddressSanitizer. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-11-07 22:42:00 +01:00
Luca Barbato	ab839054e6	swscale: Add GRAY12	2016-11-07 22:42:00 +01:00
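
Commit 43de8b328b in the list above fixes a heap-use-after-free by updating a pointer after a realloc. The C sketch below illustrates the general pattern only; it is not the FFmpeg lzf code. `GrowBuf` and `growbuf_append` are hypothetical names introduced for illustration. The point it shows is that any cursor into the buffer has to be derived from the base pointer after `realloc` returns, because the block may have moved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (illustration only, not FFmpeg code). */
typedef struct {
    uint8_t *data;  /* base pointer, may change on every realloc */
    size_t   size;  /* bytes in use */
    size_t   cap;   /* bytes allocated */
} GrowBuf;

/* Append len bytes from src. Returns 0 on success, -1 on failure. */
static int growbuf_append(GrowBuf *b, const uint8_t *src, size_t len)
{
    assert(b != NULL && src != NULL);

    if (len > b->cap - b->size) {
        size_t new_cap = b->cap ? b->cap : 16;
        while (new_cap - b->size < len) {
            if (new_cap > SIZE_MAX / 2)
                return -1;          /* doubling would overflow */
            new_cap *= 2;
        }
        uint8_t *tmp = realloc(b->data, new_cap);
        if (!tmp)
            return -1;              /* old block is still valid and owned by b */
        b->data = tmp;              /* the block may have moved */
        b->cap  = new_cap;
    }

    /* Derive the write cursor only now, after the possible realloc.
     * A cursor computed before the realloc could point into freed memory. */
    uint8_t *p = b->data + b->size;
    memcpy(p, src, len);
    b->size += len;
    return 0;
}
```

A caller would start from a zero-initialized `GrowBuf`, call `growbuf_append` as often as needed, and `free(b.data)` at the end.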

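Commits fcbdd605b5 and 3b50dbc51f in the list above replace `void*` with proper function pointer casts to silence `-Wpedantic` warnings. The C sketch below shows one way such a table can be written, under assumptions: `generic_fn`, `FuncEntry`, `func_table` and `call_by_index` are hypothetical names, not taken from nut.c or ratecontrol.c. ISO C permits converting one function pointer type to another and back, so a generic function pointer type can stand in where `void *` is not allowed, provided the pointer is cast back to its real type before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function pointer type used only for storage (hypothetical name). */
typedef void (*generic_fn)(void);

typedef struct {
    const char *name;
    generic_fn  fn;     /* stored as a function pointer, never as void * */
} FuncEntry;

static double square(double x) { return x * x; }
static double half(double x)   { return x / 2.0; }

/* Function-pointer-to-function-pointer casts are valid ISO C. */
static const FuncEntry func_table[] = {
    { "square", (generic_fn)square },
    { "half",   (generic_fn)half   },
};

/* Cast back to the real signature before calling; calling through
 * generic_fn directly would be undefined behaviour. */
static double call_by_index(size_t idx, double x)
{
    assert(idx < sizeof(func_table) / sizeof(func_table[0]));
    double (*f)(double) = (double (*)(double))func_table[idx].fn;
    return f(x);
}
```

For example, `call_by_index(0, 3.0)` calls `square` and `call_by_index(1, 3.0)` calls `half`.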
43979 Commits