FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Mark Thompson	89725a8512	vaapi_h264: Scale log2_max_pic_order_cnt_lsb with max_b_frames Before this change, it was possible to overflow pic_order_cnt_lsb and generate a stream with invalid POC numbering. This makes sure that the field is large enough that a single IDR B* P sequence uses fewer than half the available POC lsb values.	2017-01-11 23:03:58 +00:00
Mark Thompson	a3c3a5eac2	vaapi_encode: Support forcing IDR frames via AVFrame.pict_type	2017-01-11 23:03:58 +00:00
Mark Thompson	37fab0661a	vaapi_encode: Fix GOP sizing This change makes the configured GOP size be respected exactly - previously the value could be exceeded slightly due to flaws in the frame type selection logic.	2017-01-11 23:03:58 +00:00
Alexandra Hájková	bd6496fa07	interplayvideo: Convert to the new bitstream reader	2017-01-09 15:21:47 +01:00
Alexandra Hájková	4e25051031	adx: Convert to the new bitstream reader	2017-01-09 15:21:47 +01:00
Alexandra Hájková	9aec009f65	dvbsubdec: Convert to the new bitstream reader	2017-01-09 15:21:47 +01:00
Alexandra Hájková	d7fe11634c	motionpixels: Convert to the new bitstream reader	2017-01-09 15:18:16 +01:00
Anton Khirnov	f1af37b510	h264dec: make ff_h264_decode_init() static It is not called from outside h264dec.c anymore.	2017-01-09 13:21:13 +01:00
Anton Khirnov	e7de05f98f	h264dec: drop a redundant check Cropping parameters are already checked for validity during SPS parsing, no need to check them again.	2017-01-09 13:21:13 +01:00
Steve Lhomme	2835e9a9fd	hevcdec: add P010 support for D3D11VA Given it's the same API as DXVA2, I don't know why the same output was not enabled for both. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-01-09 10:48:54 +01:00
Steve Lhomme	0ac2d86c47	dxva2: Factorize DXVA context validity test into a single macro Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-01-08 16:41:24 +01:00
Steve Lhomme	f8a42d4f26	dxva2: Make ff_dxva2_get_surface() static and drop its name prefix Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-01-08 16:41:07 +01:00
Jun Zhao	9b1db2d338	vaapi_h264: Fix POC on IDR frames In H.264 section 8.2.1, we have that "The bitstream shall not contain data that result in Min(TopFieldOrderCnt, BottomFieldOrderCnt) not equal to 0 for a coded IDR frame". This fixes the encoder to always conform to this - previously the POC values formed an unbroken sequence, not resetting to zero on IDR frames. Signed-off-by: Mark Thompson <sw@jkqxz.net>	2017-01-04 21:52:06 +00:00
Mark Thompson	d08e02d929	vaapi_h265: Fix build failure with old libva without 10-bit surfaces 10-bit surface support was added in libva 1.6.2, earlier versions support H.265 encoding in 8-bit only.	2017-01-04 21:49:41 +00:00
Martin Storsjö	85ad5ea72c	aarch64: vp9mc: Fix a comment to refer to a register with the right name Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-03 14:16:10 +02:00
Martin Storsjö	65074791e8	aarch64: vp9dsp: Fix vertical alignment in the init file Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-03 14:15:58 +02:00
Martin Storsjö	c536e5e869	arm: vp9mc: Fix vertical alignment of operands Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-03 14:15:45 +02:00
Diego Biurrun	53618054b6	parser: Add missing #include for printing ISO C99 conversion specifiers	2016-12-25 13:22:50 +01:00
Diego Biurrun	0b77a59336	Use correct printf conversion specifiers for POSIX integer types	2016-12-23 19:30:00 +01:00
Diego Biurrun	92db508307	build: Generate pkg-config files from Make and not from configure This moves work from the configure to the Make stage where it can be parallelized and ensures that pkgconfig files are updated when library versions change. Bug-Id: 449	2016-12-22 12:30:54 +01:00
Diego Biurrun	f9edc734e0	ratecontrol: Drop xvid-rc-related struct members unused after `a6901b9c6`	2016-12-21 11:13:20 +01:00
Ruta Gadkari	5b26d3b789	nvenc: Update check for lookahead By default it is -1. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-12-21 06:16:52 +01:00
Martin Storsjö	a0c443a398	aarch64: vp9itxfm: Use the offset parameter to movrel This fixes build failures for iOS, broken since `cad42fadcd`. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-12-19 22:49:51 +02:00
Alexandra Hájková	fc322d6a70	tta: Convert to the new bitstream reader	2016-12-19 13:52:36 +01:00
Alexandra Hájková	00c72a1e01	mlp: Convert to the new bitstream reader	2016-12-19 13:22:29 +01:00
Alexandra Hájková	fa64aea12e	unary: Convert to the new bitstream reader	2016-12-19 12:35:05 +01:00
Anton Khirnov	45286a625c	h264dec: make sure to only end a field if it has been started Calling ff_h264_field_end() when the per-field state is not properly initialized leads to all kinds of undefined behaviour. CC: libav-stable@libav.org Bug-Id: 977 978 992	2016-12-19 08:15:58 +01:00
Anton Khirnov	c2fa6bb0e8	mpeg12dec: move setting first_field to mpeg_field_start() For field pictures, first_field is set based on its previous value. Before this commit, first_field is set when reading the picture coding extension. However, in corrupted files there may be multiple picture coding extension headers, so the final value of first_field that is actually used during decoding can be wrong. That can lead to various undefined behaviour, like predicting from a non-existing field. Fix this problem by setting first_field in mpeg_field_start(), which should be called exactly once per field. CC: libav-stable@libav.org Bug-Id: 999	2016-12-19 08:15:49 +01:00
Anton Khirnov	e807491fc6	mpeg12dec: avoid signed overflow in bitrate calculation CC: libav-stable@libav.org Bug-Id: 981 Found-By: Agostino Sarubbo	2016-12-19 08:15:42 +01:00
Anton Khirnov	58405de095	mpegvideo_parser: avoid signed overflow in bitrate calculation CC: libav-stable@libav.org Bug-Id: 981 Found-By: Agostino Sarubbo	2016-12-19 08:15:07 +01:00
Anton Khirnov	cfa4eb4fba	vaapi_decode: use the correct logging context	2016-12-19 08:13:28 +01:00
Anton Khirnov	ea8b730d8e	hevcdec: add a VAAPI hwaccel Partially based on a patch by Timo Rothenpieler <timo@rothenpieler.org>. Additional scaling list handling fix by Jun Zhao <mypopydev@gmail.com>.	2016-12-19 08:13:08 +01:00
Anton Khirnov	d4a91e6534	pthread_frame: do not run hwaccel decoding asynchronously unless it's safe Certain hardware decoding APIs are not guaranteed to be thread-safe, so having the user access decoded hardware surfaces while the decoder is running in another thread can cause failures (this is mainly known to happen with DXVA2). For such hwaccels, only allow the decoding thread to run while the user is inside a lavc decode call (avcodec_send_packet/receive_frame).	2016-12-19 08:10:22 +01:00
Anton Khirnov	8dfba25ce8	pthread_frame: ensure the threads don't run simultaneously with hwaccel	2016-12-19 08:09:19 +01:00
Anton Khirnov	373fd76b4d	hevcdec: do not set decoder-global SPS prematurely It should only be set after the decoder state has been fully initialized for using that SPS. Fixes possible invalid reads on get_format() failure. CC: libav-stable@libav.org	2016-12-19 08:07:15 +01:00
Janne Grunau	2425d7329f	arm64: replace 'bic' with immediate with 'and' with inverted immediate The former is not an official pseudo instruction although gas and llvm's internal assembler support it. Fixes a build error with xcode 6.2 reported by Memphiz on github.	2016-12-14 21:53:05 +01:00
Diego Biurrun	ea7ee4b4e3	ppc: Centralize compiler-specific altivec.h #include handling in one place Also move #includes into canonical order where appropriate.	2016-12-14 14:08:43 +01:00
Diego Biurrun	39929e55eb	ppc: hevcdsp: Use shorthands for vector types This is more consistent and fixes compilation with clang.	2016-12-14 14:08:43 +01:00
Diego Biurrun	554e55bbf0	decode.h: Add missing headers to fix standalone compilation	2016-12-14 14:08:43 +01:00
Wan-Teh Chang	343e283399	pthread_frame: use better memory orders for frame progress This improves commit `59c7022740`. In ff_thread_report_progress(), the fast code path can load progress[field] with the relaxed memory order, and the slow code path can store progress[field] with the release memory order. These changes are mainly intended to avoid confusion when one inspects the source code. They are unlikely to have measurable performance improvement. ff_thread_report_progress() and ff_thread_await_progress() form a pair. ff_thread_await_progress() reads progress[field] with the acquire memory order (in the fast code path). Therefore, one expects to see ff_thread_report_progress() write progress[field] with the matching release memory order. In the fast code path in ff_thread_report_progress(), the atomic load of progress[field] doesn't need the acquire memory order because the calling thread is trying to make the data it just decoded visible to the other threads, rather than trying to read the data decoded by other threads. In ff_thread_get_buffer(), initialize progress[0] and progress[1] using atomic_init(). Signed-off-by: Wan-Teh Chang <wtc@google.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-12-14 11:16:51 +01:00
Derek Buitenhuis	5c7f2cf81d	h264_slice: Wait for refs to be available before we use them in error concealment This could happen when there was a frame number gap and frame threading was used. Debugging-by: Ronald S. Bultje <rsbultje@gmail.com> Debugging-by: Justin Ruggles <justin.ruggles@gmail.com> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> CC:libav-stable@libav.org Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-12-14 10:38:15 +01:00
Anton Khirnov	86157e6db2	hevc: decouple calling get_format() from exporting the SPS parameters This makes sure ff_get_format() does not get called unnecessarily from update_thread_context().	2016-12-14 09:06:45 +01:00
Anton Khirnov	730c023260	binkaudio: switch to the new send/receive API It is more natural for this codec and avoids awkward constructs like "consuming 0 bytes from input". Also, keep a reference to the input packet to avoid unnecessary copying.	2016-12-14 09:06:45 +01:00
Anton Khirnov	fa1749dd34	vp9: split superframes in the filtering stage before actual decoding Significantly increases the efficiency of frame threading, since individual frames in a superframe can now be decoded in parallel.	2016-12-14 09:06:45 +01:00
Anton Khirnov	03a80925ef	lavc: add a bitstream filter for splitting VP9 superframes Partially based on code by Ronald S. Bultje <rsbultje@gmail.com>.	2016-12-14 09:06:45 +01:00
Anton Khirnov	8fb4210ad8	qsvdec_h2645: switch to the new generic filtering mechanism Drop the internal manual conversion from the MP4 format to Annex B.	2016-12-14 09:06:45 +01:00
Anton Khirnov	972c71e9cb	lavc: add support for filtering packets before decoding	2016-12-14 09:06:45 +01:00
Anton Khirnov	061a0c14bb	decode: restructure the core decoding code Currently, the new decoding API is pretty much just a wrapper around the old deprecated one. This is problematic, since it interferes with making full use of the flexibility added by the new API. The old API should also be removed at some future point. Reorganize the code so that the new send_packet/receive_frame functions call the actual decoding directly and change the old deprecated avcodec_decode_* functions into wrappers around the new API. The new internal API for decoders is now changing as well. Before this commit, it mirrors the public API, so the decoders need to implement send_packet() and receive_frame() callbacks. This turns out to require awkward constructs in both the decoders and the generic code. After this commit, the decoders only implement the receive_frame() callback and call a new internal function, ff_decode_get_packet(), to obtain input data, in the same manner as the bitstream filters now work. avcodec will now always make a reference to the input packet, which means that non-refcounted input packets will be copied. Keeping the previous behaviour, where this copy could sometimes be avoided, would make the code significantly more complex and fragile for only dubious gains, since packets are typically small and everyone who cares about performance should use refcounted packets anyway.	2016-12-14 09:06:44 +01:00
Anton Khirnov	549d0bdca5	decode: be more explicit about storing the last packet properties The current code stores a pointer to the packet passed to the decoder, which is then used during get_buffer() for timestamps and side data passthrough. However, since this is a pointer to user data which we do not own, storing it is potentially dangerous. It is also ill defined for the new decoding API with split input/output. Fix this problem by making an explicit internally owned copy of the packet properties.	2016-12-14 09:06:44 +01:00
Anton Khirnov	47e547b321	lavc: add a null bitstream filter It is useful for testing/debugging and will also be used as the default filter in the following commit adding pre-decode filtering to avoid having a separate non-filtered codepath.	2016-12-14 09:06:44 +01:00
Anton Khirnov	0309ddcfb2	lavc: handle MP3 in get_audio_frame_duration()	2016-12-14 09:06:44 +01:00
Diego Biurrun	6aa4ba7131	dxva2: Keep code shared between dxva2 and d3d11va under the correct #if This partially reverts commit `ac648bb835`.	2016-12-12 13:44:25 +01:00
Alexandra Hajkova	b0e6b3f477	hevc: ppc: Add HEVC 4x4 IDCT for PowerPC Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-12-12 09:25:16 +01:00
Diego Biurrun	a6901b9c6b	Drop libxvid rate control support for mpegvideo encoding The feature has outlived its usefulness and complicates the code.	2016-12-11 09:27:40 +01:00
Diego Biurrun	ac648bb835	dxva2: Simplify some ifdefs	2016-12-11 09:27:40 +01:00
Mark Thompson	7d81698b89	vaapi_h265: Fix CFR mode with framerate set in AVCodecContext Same issue as `17a0f9481c`.	2016-12-10 16:55:44 +00:00
Diego Biurrun	932cc6496e	vdpau: Do not #include vdpau_x11.h from the main vdpau header That header should only be included in the special bits that use X11 code.	2016-12-09 08:41:53 +01:00
Diego Biurrun	92e6b31c3b	dxva2: Adjust multiple inclusion guard names to follow convention	2016-12-09 08:41:52 +01:00
Andreas Cadhalpun	fc85646ad4	libopusdec: fix out-of-bounds read Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	2016-12-08 15:53:58 -05:00
Andreas Cadhalpun	dc2ad09493	libschroedingerdec: fix leaking of framewithpts Also preserve the return value from ff_get_buffer(). Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-12-08 15:53:58 -05:00
Andreas Cadhalpun	8c3a643808	libschroedingerdec: don't produce empty frames They are not valid and can cause problems/crashes for API users. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	2016-12-08 15:53:58 -05:00
Timothy Gu	d3da8a0035	omx: Fix allocation check Also use av_mallocz_array(). Bug-Id: CID 1396839 Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-12-08 15:53:58 -05:00
Timothy Gu	d32bdadda8	qsvdec: Fix memory leak on error Bug-Id: CID 1396851 Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-12-08 15:53:58 -05:00
Diego Biurrun	d5759701a8	libkvazaar: Add missing header #includes This fixes compilation after the next version bump.	2016-12-08 21:34:30 +01:00
Diego Biurrun	fbec58daa2	build: Add an internal component for hevc_ps code This allows expressing dependencies in a more correct way.	2016-12-08 20:12:23 +01:00
Vittorio Giovara	2fb6acd9c2	lavc: Add spherical packet side data API Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-12-07 14:34:34 -05:00
Diego Biurrun	624aa8ab22	build: Add missing Makefile entries and ifdefs for QSV hwaccels	2016-12-07 15:46:57 +01:00
Diego Biurrun	e1dc5358af	build: Create a component for MPEG audio header decoding Fixes standalone compilation of the libmp3lame encoder.	2016-12-05 16:13:05 +01:00
Diego Biurrun	0fdc9f81a0	build: Add missing hevc_ps dependency for QSV HEVC encoder	2016-12-05 16:13:04 +01:00
Alexandra Hájková	6c916192f3	mimic: Convert to the new bitstream reader	2016-12-03 14:36:03 +01:00
Alexandra Hájková	cdc6727c3e	metasound: Convert to the new bitstream reader	2016-12-03 14:36:03 +01:00
Alexandra Hájková	6fad5abcad	lagarith: Convert to the new bitstream reader	2016-12-03 14:36:03 +01:00
Alexandra Hájková	c3defda0d8	indeo: Convert to the new bitstream reader	2016-12-03 14:36:03 +01:00
Alexandra Hájková	f5b7bd2a7c	imc: Convert to the new bitstream reader	2016-12-03 14:36:03 +01:00
Alexandra Hájková	39ecf0588f	webp: Convert to the new bitstream reader	2016-12-03 14:36:03 +01:00
James Almer	33a2b73b98	mpeg4audio: correctly propagate meaningful error values Signed-off-by: James Almer <jamrial@gmail.com>	2016-12-02 12:16:30 -05:00
Wan-Teh Chang	d82d5379ca	mmaldec: initialize refcount using atomic_init() This is how we initialize refcount in libavutil/buffer.c. Signed-off-by: Wan-Teh Chang <wtc@google.com> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-12-02 12:16:26 -05:00
Vittorio Giovara	5168026a05	options_table: Do not rely on enum size as option bound Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-12-02 11:36:46 -05:00
Vittorio Giovara	ff9db5cfd1	lavc: Use a stricter check for the color properties values Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-12-02 11:36:42 -05:00
Diego Biurrun	0a35f128f3	cabac: x86: Give optimizations header a more meaningful name	2016-12-01 08:23:54 +01:00
Martin Storsjö	cad42fadcd	aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 This work is sponsored by, and copyright, Google. Previously all subpartitions except the eob=1 (DC) case ran with the same runtime: vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0 By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this: vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_16x16_sub2_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub8_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 1372.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5190.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub8_add_neon: 5183.1 vp9_inv_dct_dct_32x32_sub12_add_neon: 6161.5 vp9_inv_dct_dct_32x32_sub16_add_neon: 6155.5 vp9_inv_dct_dct_32x32_sub20_add_neon: 7136.3 vp9_inv_dct_dct_32x32_sub24_add_neon: 7128.4 vp9_inv_dct_dct_32x32_sub28_add_neon: 8098.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8098.8 I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-30 23:57:05 +02:00
Martin Storsjö	9c8bc74c2b	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 This work is sponsored by, and copyright, Google. Previously all subpartitions except the eob=1 (DC) case ran with the same runtime: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub16_add_neon: 3188.1 2435.4 2499.0 1969.0 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.7 16582.3 14207.6 12000.3 By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this: vp9_inv_dct_dct_16x16_sub1_add_neon: 274.6 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2064.0 1534.8 1719.4 1248.7 vp9_inv_dct_dct_16x16_sub4_add_neon: 2135.0 1477.2 1736.3 1249.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2446.7 1828.7 1993.6 1494.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2832.4 2118.3 2266.5 1735.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.7 2475.3 2523.5 1983.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 756.2 456.7 862.0 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10682.2 8190.4 8539.2 6762.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 10813.5 8014.9 8518.3 6762.8 vp9_inv_dct_dct_32x32_sub8_add_neon: 11859.6 9313.0 9347.4 7514.5 vp9_inv_dct_dct_32x32_sub12_add_neon: 12946.6 10752.4 10192.2 8280.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 14074.6 11946.5 11001.4 9008.6 vp9_inv_dct_dct_32x32_sub20_add_neon: 15269.9 13662.7 11816.1 9762.6 vp9_inv_dct_dct_32x32_sub24_add_neon: 16327.9 14940.1 12626.7 10516.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17462.7 15776.1 13446.2 11264.7 vp9_inv_dct_dct_32x32_sub32_add_neon: 18575.5 17157.0 14249.3 12015.1 I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data. In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-30 23:54:07 +02:00
Martin Storsjö	3c87039a40	arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination This avoids reloading them if they haven't been clobbered, if the first pass also was idct. This is similar to what was done in the aarch64 version. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-30 23:53:52 +02:00
Clément Bœsch	c4c5f5386c	vp9dsp: add DC only versions for idct/idct. before: time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null - real 0m11.125s user 0m11.059s sys 0m0.050s time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null - real 0m10.944s user 0m10.819s sys 0m0.064s after: time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null - real 0m8.153s user 0m8.034s sys 0m0.050s time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null - real 0m8.038s user 0m7.980s sys 0m0.039s Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-30 23:48:28 +02:00
Diego Biurrun	e4382a4ab4	hevc: Eliminate pointless variable indirection	2016-11-30 14:11:44 +01:00
Diego Biurrun	5c89022542	hevc: Drop pointless av_unused attribute	2016-11-30 14:11:43 +01:00
Diego Biurrun	0983f9117f	metasound: Drop unused tables	2016-11-30 13:44:05 +01:00
Diego Biurrun	212c6a1d70	mjpegdec: Check return values of functions that may fail	2016-11-29 13:13:35 +01:00
Diego Biurrun	3ee5f25d37	dxva2: Adjust printf length modifiers where appropriate	2016-11-29 13:13:35 +01:00
Anton Khirnov	3fe2a01df7	lavc: move decoding-related code from utils.c to a new file	2016-11-29 10:39:20 +01:00
Anton Khirnov	328cd2b599	lavc: move encoding-related code from utils.c to a new file	2016-11-29 10:39:20 +01:00
James Almer	45d199d5b0	aac_adtstoasc_bsf: validate and forward extradata if the stream is already ASC Fixes AAC AudioSpecificConfig passthrough. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-29 10:39:20 +01:00
Andreas Cadhalpun	1762a39e09	mss2: only use error correction for matching block counts This fixes a heap-buffer-overflow in ff_er_frame_end when decoding mss2 with coded_width/coded_height larger than width/height. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-11-29 10:38:01 +01:00
Diego Biurrun	eb135516e6	ac3enc: Avoid unnecessary macro indirections	2016-11-28 17:19:30 +01:00
Diego Biurrun	f0d3e43bd7	ac3enc: Reshuffle functions to avoid forward declarations	2016-11-28 17:19:30 +01:00
Diego Biurrun	e22c63ac74	ac3enc: Reshuffle some float/fixed-mode ifdefs to avoid a dummy function	2016-11-28 17:19:30 +01:00
Anton Khirnov	4adbb44ad1	tta: avoid undefined shifts Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-11-25 21:42:33 +01:00
Anton Khirnov	dc4b625028	tta: use get_unary() instead of a custom implementation Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-11-25 21:42:33 +01:00
Martin Storsjö	2f99117f6f	aarch64: vp9itxfm: Don't repeatedly set x9 when nothing overwrites it Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-24 13:39:21 +02:00
Alexandra Hájková	178b4ea5f9	xsubdec: Convert to the new bitstream reader	2016-11-24 11:22:13 +01:00
Alexandra Hájková	be35ef92a4	xan: Convert to the new bitstream reader	2016-11-24 11:22:13 +01:00
Alexandra Hájková	f9c59f26c8	wnv1: Convert to the new bitstream reader	2016-11-24 11:22:13 +01:00
Alexandra Hájková	0536e7d782	vima: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	e5bdfc6790	vble: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	104a4289f9	utvideodec: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	85f760fedd	twinvq: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	0bea79afa6	tscc2: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	8e4cadea5d	truespeech: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	0ac07d0b8d	tiertex: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	9ab1a3e283	truemotion2: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	9f78e3a46d	svq1dec: Convert to the new bitstream reader	2016-11-24 11:22:12 +01:00
Alexandra Hájková	6efbc88a5c	smacker: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	087bc8d704	sipr: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	f26cbb555b	rtjpeg: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	c60cda7cb4	ra288: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Alexandra Hájková	7d8075cf47	ra144: Convert to the new bitstream reader	2016-11-24 11:22:11 +01:00
Martin Storsjö	79566ec8c7	arm: vp9itxfm: Rename a macro parameter to fit better Since the same parameter is used for both input and output, the name inout is more fitting. This matches the naming used below in the dmbutterfly macro. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-23 23:56:56 +02:00
Martin Storsjö	721bc37522	arm/aarch64: vp9itxfm: Fix indentation of macro arguments Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-23 23:56:16 +02:00
James Almer	aa498c3183	avpacket: fix leak on realloc in av_packet_add_side_data() If realloc fails, the pointer is overwritten and the previously allocated buffer is leaked, which goes against the expected functionality of keeping the packet unchanged in case of error. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-23 13:17:52 +01:00
Andreas Cadhalpun	f92d7bdfdd	libopusdec: default to stereo for invalid number of channels This fixes an out-of-bounds read if avc->channels is 0. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-23 13:03:15 +01:00
Diego Biurrun	b34c6cd57a	dvbsub: cosmetics: Group all debug code together	2016-11-23 07:40:46 +01:00
Diego Biurrun	b8cd7a3c8d	dvbsub: Check for errors from system() libavcodec/dvbsubdec.c:145:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result] libavcodec/dvbsubdec.c:148:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]	2016-11-23 07:36:32 +01:00
Diego Biurrun	6427379f23	als: Restructure DEBUG ifdefs to avoid unused function parameter warnings	2016-11-22 17:28:17 +01:00
Diego Biurrun	367f95af55	ac3enc: Restructure DEBUG ifdefs to avoid unused function parameter warnings	2016-11-22 17:28:17 +01:00
Diego Biurrun	81a3c42abe	Drop some bogus Doxygen documentation.	2016-11-21 14:29:11 +01:00
Diego Biurrun	a1d9de304f	Fix some mismatches between function parameter and doxygen parameter names.	2016-11-21 14:29:10 +01:00
Martin Storsjö	4d960a1185	aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter The clobbering tests in checkasm are only invoked when testing correctness, so this bug didn't show up when benchmarking the dc-only version. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-18 23:17:33 +02:00
Janne Grunau	e5b0fc170f	arm: vp9itxfm: Simplify the stack alignment code This is one instruction fewer for thumb, and only has 1/2 arm/thumb specific instructions. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-18 23:17:26 +02:00
Alexandra Hájková	0b5a26e8bc	qdm2: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:22 +01:00
Alexandra Hájková	0dabd329e8	qcelp: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:18 +01:00
Alexandra Hájková	770406d1e8	pcx: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:14 +01:00
Alexandra Hájková	b3441350fa	opus: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:11 +01:00
Alexandra Hájková	6f94a64bd6	nellymoser: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:08 +01:00
Alexandra Hájková	15d4dbfd4a	jvdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:36:04 +01:00
Alexandra Hájková	1df549bfa2	hqx: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:43 +01:00
Alexandra Hájková	c5e01d9170	hq_hqa: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:39 +01:00
Alexandra Hájková	b2c56301f9	gsm: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:36 +01:00
Alexandra Hájková	2188d53906	g72x: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:33 +01:00
Alexandra Hájková	799703c3ea	g2meet: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:26 +01:00
Alexandra Hájková	b37b681f77	fraps: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:14 +01:00
Alexandra Hájková	692ba4fe64	flashsv: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:10 +01:00
Alexandra Hájková	418ccdd703	faxcompr: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:07 +01:00
Alexandra Hájková	8df1ac6b78	exr: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:04 +01:00
Alexandra Hájková	2906d8dcb3	escape130: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:35:01 +01:00
Alexandra Hájková	c43eb73172	escape124: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:57 +01:00
Alexandra Hájková	d8618570be	dvdsubdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:53 +01:00
Alexandra Hájková	928f8c7ce3	dss_sp: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:38 +01:00
Alexandra Hájková	942e84d2a3	cook: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:32 +01:00
Alexandra Hájková	e561146611	cljrdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:29 +01:00
Alexandra Hájková	b4c0daa83c	cdxl: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:24 +01:00
Alexandra Hájková	0977a7c2f6	binkaudio: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:15 +01:00
Alexandra Hájková	9a23b59943	bink: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:10 +01:00
Alexandra Hájková	dae9b0b9c6	avs: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:34:04 +01:00
Alexandra Hájková	edd4c19a78	atrac3plus: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:59 +01:00
Alexandra Hájková	0272119202	atrac: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:50 +01:00
Alexandra Hájková	41679be1a2	asvdec: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:45 +01:00
Alexandra Hájková	012c451153	adpcm: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:33:01 +01:00
Alexandra Hájková	ed006ae4e2	4xm: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:57 +01:00
Alexandra Hájková	b25180801b	on2avc: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:54 +01:00
Alexandra Hájková	7d957b3f47	ea: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:45 +01:00
Alexandra Hájková	adb1ebb36c	eamad: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:32:40 +01:00
Alexandra Hájková	d182d8a6d3	cllc: Convert to the new bitstream reader Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:31:59 +01:00
Alexandra Hájková	dd3d7ddf2a	lavc: add a new bitstream reader to replace get_bits The new bit reader features a simpler API and an implementation without stacks of nested macros. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-18 10:31:56 +01:00
Luca Barbato	adb0e941c3	avpacket: Mark src pointer as constant	2016-11-17 19:41:12 +01:00
Diego Biurrun	76167140a9	qsvdec: Drop stray extra braces around initializer libavcodec/qsvdec.c:93:5: warning: braces around scalar initializer	2016-11-17 16:53:48 +01:00
Diego Biurrun	715b824346	qsv: Drop some unused variables	2016-11-17 16:53:48 +01:00
Janne Grunau	e7ae8f7a71	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne}' The latter is 1 cycle faster on a cortex-a53 and, since the operands are bytewise (or larger) bitmasks (impossible to overflow to zero), both are equivalent.	2016-11-16 09:05:18 +01:00
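The equivalence claimed in the commit above can be checked exhaustively. The following is a minimal sketch, not code from the commit: it assumes each operand is a bytewise mask (every byte is 0x00 or 0xFF) held in a 64-bit register, and the names `byte_mask` and `or_and_add_tests_agree` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand an 8-bit selector into a 64-bit bytewise mask:
 * bit i set in sel -> byte i of the result is 0xFF, otherwise 0x00. */
static uint64_t byte_mask(unsigned sel)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++)
        if (sel & (1u << i))
            m |= (uint64_t)0xFF << (8 * i);
    return m;
}

/* Check that "(a | b) == 0" and "(a + b) == 0" give the same answer for
 * every pair of bytewise masks (256 * 256 combinations).
 * Returns true if the two zero tests never disagree. */
static bool or_and_add_tests_agree(void)
{
    for (unsigned i = 0; i < 256; i++) {
        for (unsigned j = 0; j < 256; j++) {
            uint64_t a = byte_mask(i), b = byte_mask(j);
            bool or_zero  = (a | b) == 0;
            bool add_zero = (uint64_t)(a + b) == 0; /* wraps modulo 2^64 */
            if (or_zero != add_zero)
                return false;
        }
    }
    return true;
}
```

A wrapped sum of zero would need `b == 2^64 - a`, which has a byte equal to 0x01 for any non-zero bytewise mask `a`, so it is never itself a bytewise mask. That is why the check passes for all pairs.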
Janne Grunau	d7595de0b2	aarch64: vp9: use alternative returns in the core loop filter function Since aarch64 has enough free general purpose registers use them to branch to the appropriate storage code. 1-2 cycles faster for the functions using loop_filter 8/16, ... on a cortex-a53. Mixed results (up to 2 cycles faster/slower) on a cortex-a57.	2016-11-16 09:05:18 +01:00
Gianluigi Tiesi	e17567a831	libilbc: support for latest git of libilbc In the latest git commits of libilbc developers removed WebRtc_xxx typedefs. This commit uses int types instead. It's safe to apply also for previous versions since WebRtc_Word16 was always a typedef of int16_t and WebRtc_UWord16 a typedef of uint16_t. Reviewed-by: Timothy Gu <timothygu99@gmail.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-11-16 08:21:05 +01:00
Diego Biurrun	f7407f56cb	golomb: Replace __PRETTY_FUNCTION__ with __func__ for tracing The former is a GNU extension while the latter is C99.	2016-11-15 09:41:08 +01:00
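As a small illustration of the portability point above, `__func__` is a predefined identifier in C99 and expands to the name of the enclosing function. This is a sketch only; the `TRACE` macro and `read_value` function are hypothetical and are not taken from golomb.h.

```c
#include <stdio.h>

/* Hypothetical tracing macro. __func__ is standard C99, whereas
 * __PRETTY_FUNCTION__ is a GNU extension. */
#define TRACE(msg) fprintf(stderr, "%s: %s\n", __func__, (msg))

/* Logs a trace line and returns the function's own name.
 * __func__ has static storage duration, so returning it is safe. */
static const char *read_value(void)
{
    TRACE("called");
    return __func__;
}
```

Calling `read_value()` writes `read_value: called` to stderr on any C99 compiler.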
Mark Thompson	e0b164576f	qsv: Add VP8 decoder	2016-11-14 19:38:20 +00:00
Mark Thompson	182cf170a5	vp8: Return stream format information from parser	2016-11-14 19:38:19 +00:00
Mark Thompson	b6582b2927	qsv: Add VC-1 decoder It uses the same code as the MPEG-2 decoder, so the file is renamed to contain all "other" (that is, not H.26[45]) codecs.	2016-11-14 19:38:19 +00:00
Mark Thompson	fea4dc05b4	vc1: Return stream format information from parser	2016-11-14 19:38:19 +00:00
Mark Thompson	0940b748bd	qsvdec: Only warn about unconsumed data if it happens more than once	2016-11-14 19:38:19 +00:00
Mark Thompson	030d84fa2e	qsvdec: Pass field order information to libmfx The VC-1 decoder fails to initialise if this is not set.	2016-11-14 19:38:19 +00:00
Mark Thompson	cd1047f391	qsvdec: Pass the correct profile to libmfx This was correct for H.26[45], because libmfx uses the same values derived from profile_idc and the constraint_set flags, but it is wrong for other codecs. Also avoid passing FF_LEVEL_UNKNOWN (-99) as the level, as this is certainly invalid.	2016-11-14 19:38:19 +00:00
Mark Thompson	3297577f3e	mpegvideo: Return correct coded frame sizes from parser	2016-11-14 19:38:19 +00:00
Janne Grunau	31756abe29	aarch64: vp9: loop_filter: fix typo in skip flatout8 check The 16_16 loop filter functions could miss an early exit before flatout8. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 08:51:58 +02:00
Martin Storsjö	9d2afd1eb8	aarch64: vp9: Implement NEON loop filters This work is sponsored by, and copyright, Google. These are ported from the ARM version; thanks to the larger amount of registers available, we can do the loop filters with 16 pixels at a time. The implementation is fully templated, with a single macro which can generate versions for both 8 and 16 pixels wide, for both 4, 8 and 16 pixels loop filters (and the 4/8 mixed versions as well). For the 8 pixel wide versions, it is pretty close in speed (the v_4_8 and v_8_8 filters are the best examples of this; the h_4_8 and h_8_8 filters seem to get some gain in the load/transpose/store part). For the 16 pixels wide ones, we get a speedup of around 1.2-1.4x compared to the 32 bit version. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_loop_filter_h_4_8_neon: 144.0 127.2 vp9_loop_filter_h_8_8_neon: 207.0 182.5 vp9_loop_filter_h_16_8_neon: 415.0 328.7 vp9_loop_filter_h_16_16_neon: 672.0 558.6 vp9_loop_filter_mix2_h_44_16_neon: 302.0 203.5 vp9_loop_filter_mix2_h_48_16_neon: 365.0 305.2 vp9_loop_filter_mix2_h_84_16_neon: 365.0 305.2 vp9_loop_filter_mix2_h_88_16_neon: 376.0 305.2 vp9_loop_filter_mix2_v_44_16_neon: 193.2 128.2 vp9_loop_filter_mix2_v_48_16_neon: 246.7 218.4 vp9_loop_filter_mix2_v_84_16_neon: 248.0 218.5 vp9_loop_filter_mix2_v_88_16_neon: 302.0 218.2 vp9_loop_filter_v_4_8_neon: 89.0 88.7 vp9_loop_filter_v_8_8_neon: 141.0 137.7 vp9_loop_filter_v_16_8_neon: 295.0 272.7 vp9_loop_filter_v_16_16_neon: 546.0 453.7 The speedup vs C code in checkasm tests is around 2-7x, which is pretty much the same as for the 32 bit version. Even if these functions are faster than their 32 bit equivalent, the C version that we compare to also became around 1.3-1.7x faster than the C version in 32 bit. Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-5x. 
Examples of runtimes vs C on a Cortex A57 (for a slightly older version of the patch): A57 gcc-5.3 neon loop_filter_h_4_8_neon: 256.6 93.4 loop_filter_h_8_8_neon: 307.3 139.1 loop_filter_h_16_8_neon: 340.1 254.1 loop_filter_h_16_16_neon: 827.0 407.9 loop_filter_mix2_h_44_16_neon: 524.5 155.4 loop_filter_mix2_h_48_16_neon: 644.5 173.3 loop_filter_mix2_h_84_16_neon: 630.5 222.0 loop_filter_mix2_h_88_16_neon: 697.3 222.0 loop_filter_mix2_v_44_16_neon: 598.5 100.6 loop_filter_mix2_v_48_16_neon: 651.5 127.0 loop_filter_mix2_v_84_16_neon: 591.5 167.1 loop_filter_mix2_v_88_16_neon: 855.1 166.7 loop_filter_v_4_8_neon: 271.7 65.3 loop_filter_v_8_8_neon: 312.5 106.9 loop_filter_v_16_8_neon: 473.3 206.5 loop_filter_v_16_16_neon: 976.1 327.8 The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57 is again 30-50% faster than the cortex-a53. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Martin Storsjö	52d196fb30	arm: vp9itxfm: Simplify txfm string comparisons Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Martin Storsjö	3c9546dfaf	aarch64: vp9: Add NEON itxfm routines This work is sponsored by, and copyright, Google. These are ported from the ARM version; thanks to the larger amount of registers available, we can do the 16x16 and 32x32 transforms in slices 8 pixels wide instead of 4. This gives a speedup of around 1.4x compared to the 32 bit version. The fact that aarch64 doesn't have the same d/q register aliasing makes some of the macros quite a bit simpler as well. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_inv_adst_adst_4x4_add_neon: 90.0 87.7 vp9_inv_adst_adst_8x8_add_neon: 400.0 354.7 vp9_inv_adst_adst_16x16_add_neon: 2526.5 1827.2 vp9_inv_dct_dct_4x4_add_neon: 74.0 72.7 vp9_inv_dct_dct_8x8_add_neon: 271.0 256.7 vp9_inv_dct_dct_16x16_add_neon: 1960.7 1372.7 vp9_inv_dct_dct_32x32_add_neon: 11988.9 8088.3 vp9_inv_wht_wht_4x4_add_neon: 63.0 57.7 The speedup vs C code (2-4x) is smaller than in the 32 bit case, mostly because the C code ends up significantly faster (around 1.6x faster, with GCC 5.4) when built for aarch64. Examples of runtimes vs C on a Cortex A57 (for a slightly older version of the patch): A57 gcc-5.3 neon vp9_inv_adst_adst_4x4_add_neon: 152.2 60.0 vp9_inv_adst_adst_8x8_add_neon: 948.2 288.0 vp9_inv_adst_adst_16x16_add_neon: 4830.4 1380.5 vp9_inv_dct_dct_4x4_add_neon: 153.0 58.6 vp9_inv_dct_dct_8x8_add_neon: 789.2 180.2 vp9_inv_dct_dct_16x16_add_neon: 3639.6 917.1 vp9_inv_dct_dct_32x32_add_neon: 20462.1 4985.0 vp9_inv_wht_wht_4x4_add_neon: 91.0 49.8 The asm is around factor 3-4 faster than C on the cortex-a57 and the asm is around 30-50% faster on the a57 compared to the a53. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-14 00:10:13 +02:00
Diego Biurrun	800d91d348	Drop pointless void* casts	2016-11-13 18:44:01 +01:00
Diego Biurrun	d316f9cefc	aac: Drop pointless cast	2016-11-13 18:44:00 +01:00
Diego Biurrun	3b50dbc51f	ratecontrol: Use correct function pointer casts instead of void* libavcodec/ratecontrol.c:120:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic] libavcodec/ratecontrol.c:121:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]	2016-11-12 16:47:06 +01:00
Martin Storsjö	dd299a2d6d	arm: vp9: Add NEON loop filters This work is sponsored by, and copyright, Google. The implementation tries to have smart handling of cases where no pixels need the full filtering for the 8/16 width filters, skipping both calculation and writeback of the unmodified pixels in those cases. The actual effect of this is hard to test with checkasm though, since it tests the full filtering, and the benefit depends on how many filtered blocks use the shortcut. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_loop_filter_h_4_8_neon: 2.72 2.68 1.78 3.15 vp9_loop_filter_h_8_8_neon: 2.36 2.38 1.70 2.91 vp9_loop_filter_h_16_8_neon: 1.80 1.89 1.45 2.01 vp9_loop_filter_h_16_16_neon: 2.81 2.78 2.18 3.16 vp9_loop_filter_mix2_h_44_16_neon: 2.65 2.67 1.93 3.05 vp9_loop_filter_mix2_h_48_16_neon: 2.46 2.38 1.81 2.85 vp9_loop_filter_mix2_h_84_16_neon: 2.50 2.41 1.73 2.85 vp9_loop_filter_mix2_h_88_16_neon: 2.77 2.66 1.96 3.23 vp9_loop_filter_mix2_v_44_16_neon: 4.28 4.46 3.22 5.70 vp9_loop_filter_mix2_v_48_16_neon: 3.92 4.00 3.03 5.19 vp9_loop_filter_mix2_v_84_16_neon: 3.97 4.31 2.98 5.33 vp9_loop_filter_mix2_v_88_16_neon: 3.91 4.19 3.06 5.18 vp9_loop_filter_v_4_8_neon: 4.53 4.47 3.31 6.05 vp9_loop_filter_v_8_8_neon: 3.58 3.99 2.92 5.17 vp9_loop_filter_v_16_8_neon: 3.40 3.50 2.81 4.68 vp9_loop_filter_v_16_16_neon: 4.66 4.41 3.74 6.02 The speedup vs C code is around 2-6x. The numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). Disabling the early-exit in the asm doesn't give a fair comparison either though, since the C code only does the necessary calculations for each row. Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-9x. 
This is pretty similar in runtime to the corresponding routines in libvpx. (This is comparing vpx_lpf_vertical_16_neon, vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon and vp9_loop_filter_v_16_16_neon - note that the naming of horizontal and vertical is flipped between the libraries.) In order to have stable, comparable numbers, the early exits in both asm versions were disabled, forcing the full filtering codepath. Cortex A7 A8 A9 A53 vp9_loop_filter_h_16_8_neon: 597.2 472.0 482.4 415.0 libvpx vpx_lpf_vertical_16_neon: 626.0 464.5 470.7 445.0 vp9_loop_filter_v_16_8_neon: 500.2 422.5 429.7 295.0 libvpx vpx_lpf_horizontal_edge_8_neon: 586.5 414.5 415.6 383.2 vp9_loop_filter_v_16_16_neon: 905.0 784.7 791.5 546.0 libvpx vpx_lpf_horizontal_edge_16_neon: 1060.2 751.7 743.5 685.2 Our version is consistently faster on A7 and A53, marginally slower on A8, and sometimes faster, sometimes slower on A9 (marginally slower in all three tests in this particular test run). Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 14:16:42 +02:00
Diego Biurrun	f7d183f084	libxvid: Check return value of write() call libavcodec/libxvid_rc.c:106:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]	2016-11-11 10:17:07 +01:00
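The warning quoted above comes from discarding the result of `write()`, which may fail or write fewer bytes than requested. Below is a minimal sketch of one common way to handle that on POSIX systems. It is not the change made in libxvid_rc.c, and the helper name `write_all` is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer to fd, retrying after
 * short writes and EINTR. Returns 0 on success, -1 on error (errno is
 * left as set by write()). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;      /* real error: report it to the caller */
        }
        p   += n;           /* advance past the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would then test the result, for example `if (write_all(fd, buf, len) < 0)`, and log or propagate the error instead of ignoring it.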
Diego Biurrun	e5e8a26dcf	libxvid: Use proper context in av_log() calls	2016-11-11 10:17:07 +01:00
Diego Biurrun	12db2832e4	libxvid: Require availability of mkstemp() The replacement code uses tempnam(), which is dangerous. Such a fringe feature is not worth the trouble.	2016-11-11 10:17:07 +01:00
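For context on why `mkstemp()` is preferred: `tempnam()` only returns a name, so another process can create that file before the caller opens it, while `mkstemp()` generates the name and opens the file in a single step. The sketch below assumes a POSIX system. The helper name `open_temp_file` and the `/tmp/example-XXXXXX` template are hypothetical and are not taken from libxvid.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a unique temporary file and return its
 * descriptor. On success the generated name is stored in path.
 * Returns -1 if the buffer is too small or mkstemp() fails. */
static int open_temp_file(char *path, size_t size)
{
    /* The template location and prefix are assumptions for this sketch. */
    int n = snprintf(path, size, "/tmp/example-XXXXXX");
    if (n < 0 || (size_t)n >= size)
        return -1;

    /* mkstemp() replaces the XXXXXX suffix and creates the file
     * exclusively, so there is no gap between naming and opening. */
    return mkstemp(path);
}
```

The caller owns both the descriptor and the file, and would typically `close()` the former and `unlink()` the latter when done.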
Martin Storsjö	a67ae67083	arm: vp9: Add NEON itxfm routines This work is sponsored by, and copyright, Google. For the transforms up to 8x8, we can fit all the data (including temporaries) in registers and just do a straightforward transform of all the data. For 16x16, we do a transform of 4x16 pixels in 4 slices, using a temporary buffer. For 32x32, we transform 4x32 pixels at a time, in two steps of 4x16 pixels each. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_inv_adst_adst_4x4_add_neon: 3.39 5.83 4.17 4.01 vp9_inv_adst_adst_8x8_add_neon: 3.79 4.86 4.23 3.98 vp9_inv_adst_adst_16x16_add_neon: 3.33 4.36 4.11 4.16 vp9_inv_dct_dct_4x4_add_neon: 4.06 6.16 4.59 4.46 vp9_inv_dct_dct_8x8_add_neon: 4.61 6.01 4.98 4.86 vp9_inv_dct_dct_16x16_add_neon: 3.35 3.44 3.36 3.79 vp9_inv_dct_dct_32x32_add_neon: 3.89 3.50 3.79 4.42 vp9_inv_wht_wht_4x4_add_neon: 3.22 5.13 3.53 3.77 Thus, the speedup vs C code is around 3-6x. This is mostly marginally faster than the corresponding routines in libvpx on most cores, tested with their 32x32 idct (compared to vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's favour since their version doesn't clear the input buffer like ours do (although the effect of that on the total runtime probably is negligible.) Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_add_neon: 18436.8 16874.1 14235.1 11988.9 libvpx vpx_idct32x32_1024_add_neon 20789.0 13344.3 15049.9 13030.5 Only on the Cortex A8, the libvpx function is faster. On the other cores, ours is slightly faster even though ours has got source block clearing integrated. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 11:09:05 +02:00
Mark Thompson	fd0fae6037	pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts When decoding with threads enabled, the get_format callback will be called with one of the per-thread codec contexts rather than with the outer context. If a hwaccel is in use too, this will add a reference to the hardware frames context on that codec context, which will then propagate to all of the other per-thread contexts for decoding. Once the decoder finishes, however, the per-thread contexts are not freed normally, so these references leak.	2016-11-10 20:36:11 +00:00
Martin Storsjö	11623217e3	arm: vp9mc: Use a different helper register for PIC loads This fixes crashes since `557c1675cf` in linux PIC builds. Previously, movrelx silently used r12 as helper register, which doesn't work when r12 is the destination register. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 14:01:04 +02:00
Martin Storsjö	6a62795d40	aarch64: h264idct: Use the offset parameter to movrel Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	557c1675cf	arm: vp9mc: Minor adjustments from review of the aarch64 version This work is sponsored by, and copyright, Google. The speedup for the large horizontal filters is surprisingly big on A7 and A53, while there's a minor slowdown (almost within measurement noise) on A8 and A9. Cortex A7 A8 A9 A53 orig: vp9_put_8tap_smooth_64h_neon: 20270.0 14447.3 19723.9 10910.9 new: vp9_put_8tap_smooth_64h_neon: 20165.8 14466.5 19730.2 10668.8 Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	383d96aa22	aarch64: vp9: Add NEON optimizations of VP9 MC functions This work is sponsored by, and copyright, Google. These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_neon: 27.2 23.7 vp9_avg8_neon: 56.5 54.7 vp9_avg16_neon: 169.9 167.4 vp9_avg32_neon: 585.8 585.2 vp9_avg64_neon: 2460.3 2294.7 vp9_avg_8tap_smooth_4h_neon: 132.7 125.2 vp9_avg_8tap_smooth_4hv_neon: 478.8 442.0 vp9_avg_8tap_smooth_4v_neon: 126.0 93.7 vp9_avg_8tap_smooth_8h_neon: 241.7 234.2 vp9_avg_8tap_smooth_8hv_neon: 690.9 646.5 vp9_avg_8tap_smooth_8v_neon: 245.0 205.5 vp9_avg_8tap_smooth_64h_neon: 11273.2 11280.1 vp9_avg_8tap_smooth_64hv_neon: 22980.6 22184.1 vp9_avg_8tap_smooth_64v_neon: 11549.7 10781.1 vp9_put4_neon: 18.0 17.2 vp9_put8_neon: 40.2 37.7 vp9_put16_neon: 97.4 99.5 vp9_put32_neon/armv8: 346.0 307.4 vp9_put64_neon/armv8: 1319.0 1107.5 vp9_put_8tap_smooth_4h_neon: 126.7 118.2 vp9_put_8tap_smooth_4hv_neon: 465.7 434.0 vp9_put_8tap_smooth_4v_neon: 113.0 86.5 vp9_put_8tap_smooth_8h_neon: 229.7 221.6 vp9_put_8tap_smooth_8hv_neon: 658.9 621.3 vp9_put_8tap_smooth_8v_neon: 215.0 187.5 vp9_put_8tap_smooth_64h_neon: 10636.7 10627.8 vp9_put_8tap_smooth_64hv_neon: 21076.8 21026.9 vp9_put_8tap_smooth_64v_neon: 9635.0 9632.4 These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit. 
Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:15:56 +02:00
Martin Storsjö	a4cfcddcb0	vp9: Make the subpel filters non-static Make them aligned, to allow efficient access to them from simd. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:05:57 +02:00
Anton Khirnov	84f225684c	pthread_frame: properly propagate the hw frame context across frame threads	2016-11-10 09:00:11 +01:00
Diego Biurrun	72a19f4013	mpegaudiodsp: aarch64: Adjust function prototype after `2caa93b813`	2016-11-10 00:13:48 +01:00
Diego Biurrun	67deba8a41	Use avpriv_report_missing_feature() where appropriate	2016-11-08 17:54:34 +01:00
Vittorio Giovara	47a795727f	hevc: Support extradata changes from multiple stsd Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00


21503 Commits