FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-21 10:55:51 +02:00

Author	SHA1	Message	Date
Diego Biurrun	6eef263aca	x86: Merge align directives into SECTION_RODATA declarations where possible	2017-03-05 14:26:06 +01:00
Ganapathy Kasi	3303f86467	nvenc: Remove qmin and qmax constraints for nvenc vbr qmin and qmax are not necessary for nvenc vbr. Also fix for using 2 pass vbr mode for slow preset through ctx->flag NVENC_TWO_PASSES. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-03-04 08:23:28 +01:00
Paul B Mahol	aba5b94859	Add Apple Pixlet decoder Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2017-03-01 11:52:29 -05:00
Diego Biurrun	39e208f4d4	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler.	2017-03-01 10:18:15 +01:00
Diego Biurrun	fde7ee8710	x86: hevc: Add missing colons after assembly labels This fixes several warnings of the sort warning: label alone on a line without a colon might be in error	2017-03-01 09:23:42 +01:00
Michael Niedermayer	d7b2bb5391	h264_sei: Check actual presence of picture timing SEI message Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2017-02-28 10:32:50 -05:00
Ben Chang	d8f36a6aa3	nvenc: Fix the preset mapping list The map is a sparse array and does not need a empty element to terminate it. The empty element is stored after the last one inserted in the list, overwriting whichever element was next with zeros. Bug-Id: 1029 Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-28 11:54:02 +01:00
Anton Khirnov	984736dd9e	lavc: make sure not to return EAGAIN from codecs This error is treated specially by the API. CC: libav-stable@libav.org	2017-02-25 09:57:44 +01:00
Anton Khirnov	b2788fe934	svq3: fix the slice size check Currently it incorrectly compares bits with bytes. Also, move the check right before where it's relevant, so that the correct number of remaining bits is used. CC: libav-stable@libav.org	2017-02-25 09:57:43 +01:00
John Stebbins	248dc5c164	h264dec: fix dropped initial SEI recovery point	2017-02-24 08:24:13 -07:00
Martin Storsjö	b8f66c0838	aarch64: vp9itxfm: Reorder iadst16 coeffs This matches the order they are in the 16 bpp version. There they are in this order, to make sure we access them in the same order they are declared, easing loading only half of the coefficients at a time. This makes the 8 bpp version match the 16 bpp version better. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:34 +02:00
Martin Storsjö	08074c092d	arm: vp9itxfm: Reorder iadst16 coeffs This matches the order they are in the 16 bpp version. There they are in this order, to make sure we access them in the same order they are declared, easing loading only half of the coefficients at a time. This makes the 8 bpp version match the 16 bpp version better. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:33 +02:00
Martin Storsjö	09eb88a12e	aarch64: vp9itxfm: Reorder the idct coefficients for better pairing All elements are used pairwise, except for the first one. Previously, the 16th element was unused. Move the unused element to the second slot, to make the later element pairs not split across registers. This simplifies loading only parts of the coefficients, reducing the difference to the 16 bpp version. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:32 +02:00
Martin Storsjö	de06bdfe6c	arm: vp9itxfm: Reorder the idct coefficients for better pairing All elements are used pairwise, except for the first one. Previously, the 16th element was unused. Move the unused element to the second slot, to make the later element pairs not split across registers. This simplifies loading only parts of the coefficients, reducing the difference to the 16 bpp version. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:31 +02:00
Martin Storsjö	65aa002d54	aarch64: vp9itxfm: Avoid reloading the idct32 coefficients The idct32x32 function actually pushed d8-d15 onto the stack even though it didn't clobber them; there are plenty of registers that can be used to allow keeping all the idct coefficients in registers without having to reload different subsets of them at different stages in the transform. After this, we still can skip pushing d12-d15. Before: vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3 After: vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:44 +02:00
Martin Storsjö	402546a172	arm: vp9itxfm: Avoid reloading the idct32 coefficients The idct32x32 function actually pushed q4-q7 onto the stack even though it didn't clobber them; there are plenty of registers that can be used to allow keeping all the idct coefficients in registers without having to reload different subsets of them at different stages in the transform. Since the idct16 core transform avoids clobbering q4-q7 (but clobbers q2-q3 instead, to avoid needing to back up and restore q4-q7 at all in the idct16 function), and the lanewise vmul needs a register in the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5 while doing idct16. While keeping these coefficients in registers, we still can skip pushing q7. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_sub32_add_neon: 18553.8 17182.7 14303.3 12089.7 After: vp9_inv_dct_dct_32x32_sub32_add_neon: 18470.3 16717.7 14173.6 11860.8 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:43 +02:00
Martin Storsjö	575e31e931	arm: vp9lpf: Implement the mix2_44 function with one single filter pass For this case, with 8 inputs but only changing 4 of them, we can fit all 16 input pixels into a q register, and still have enough temporary registers for doing the loop filter. The wd=8 filters would require too many temporary registers for processing all 16 pixels at once though. Before: Cortex A7 A8 A9 A53 vp9_loop_filter_mix2_v_44_16_neon: 289.7 256.2 237.5 181.2 After: vp9_loop_filter_mix2_v_44_16_neon: 221.2 150.5 177.7 138.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:09 +02:00
Martin Storsjö	3bf9c48320	aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 This is one cycle faster in total, and three instructions fewer. Before: vp9_loop_filter_mix2_v_44_16_neon: 123.2 After: vp9_loop_filter_mix2_v_44_16_neon: 122.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:00 +02:00
Martin Storsjö	c582cb8537	arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit The theoretical maximum value of E is 193, so we can just saturate the addition to 255. Before: Cortex A7 A8 A9 A53 A53/AArch64 vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7 vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0 After: vp9_loop_filter_v_4_8_neon: 136.0 125.7 112.6 84.0 83.0 vp9_loop_filter_v_8_8_neon: 234.0 195.5 171.5 136.0 133.7 vp9_loop_filter_v_16_8_neon: 490.0 417.5 377.7 289.0 271.0 vp9_loop_filter_v_16_16_neon: 951.2 814.7 732.3 571.0 446.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:02:36 +02:00
Diego Biurrun	ed6a891c36	Place attribute_deprecated in the right position for struct declarations libavcodec/vaapi.h:58:1: warning: attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]	2017-02-23 12:23:20 +01:00
Diego Biurrun	00b160af11	nvenc: Fix nvec vs. nvenc typo	2017-02-20 09:50:03 +01:00
Mark Thompson	7cb9296db8	webp: Fix alpha decoding This was broken by `4e528206bc` - the webp decoder was assuming that it could set the output pixfmt of the vp8 decoder directly, but after that change it no longer could because ff_get_format() was used instead. This adds an internal get_format() callback to webp use of the vp8 decoder to override the pixfmt appropriately.	2017-02-18 19:53:20 +00:00
Mark Thompson	17aeee5832	vaapi_encode: Discard output buffer if picture submission fails Previously this was leaking, though it actually hit an assert making sure that the buffer had already been cleared when freeing the picture.	2017-02-16 20:58:42 +00:00
Martin Storsjö	030de53e9c	libopenh264dec: Let the framework use the h264_mp4toannexb bitstream filter This avoids a lot of boilerplate code within the decoder wrapper itself. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-15 23:05:58 +02:00
Mark Thompson	5dd9a4b88b	vaapi: Implement device-only setup In this case, the user only supplies a device and the frame context is allocated internally by lavc.	2017-02-13 21:44:43 +00:00
Mark Thompson	44f2eda39f	lavc: Add device context field to AVCodecContext For use by codec implementations which can allocate frames internally.	2017-02-13 20:14:27 +00:00
Martin Storsjö	07b5136c48	aarch64: vp9lpf: Fix broken indentation/vertical alignment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-12 21:57:23 +02:00
Martin Storsjö	b0806088d3	aarch64: vp9lpf: Interleave the start of flat8in into the calculation above This adds lots of extra .ifs, but speeds it up by a couple cycles, by avoiding stalls. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 22:54:18 +02:00
Martin Storsjö	e18c39005a	arm: vp9lpf: Interleave the start of flat8in into the calculation above This adds lots of extra .ifs, but speeds it up by a couple cycles, by avoiding stalls. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 22:54:01 +02:00
Luca Barbato	9c2d36fcaf	dv: Convert to the new bitstream reader	2017-02-11 20:29:44 +01:00
Luca Barbato	ba30b74686	aac: Validate the sbr sample rate before using the value Avoid a floating point exception. Bug-Id: 1027 CC: libav-stable@libav.org	2017-02-11 20:23:11 +01:00
Anton Khirnov	f44ec22e09	lavc: use av_cpu_max_align() instead of hardcoding alignment requirements	2017-02-11 11:37:45 +01:00
Martin Storsjö	435cd7bc99	arm: vp9lpf: Use orrs instead of orr+cmp Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:44:04 +02:00
Martin Storsjö	e1f9de86f4	arm/aarch64: vp9lpf: Calculate !hev directly Previously we first calculated hev, and then negated it. Since we were able to schedule the negation in the middle of another calculation, we don't see any gain in all cases. Before: Cortex A7 A8 A9 A53 A53/AArch64 vp9_loop_filter_v_4_8_neon: 147.0 129.0 115.8 89.0 88.7 vp9_loop_filter_v_8_8_neon: 242.0 198.5 174.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 500.0 419.5 382.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 971.2 825.5 731.5 579.0 453.0 After: vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7 vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:43:59 +02:00
Martin Storsjö	3fcf788fbb	aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling This work is sponsored by, and copyright, Google. Before: Cortex A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 180.2 vp9_inv_dct_dct_32x32_sub1_add_neon: 475.3 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:31:58 +02:00
Martin Storsjö	a76bf8cf12	arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling This work is sponsored by, and copyright, Google. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 752.0 459.2 862.2 553.9 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 226.5 145.0 225.1 171.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 721.2 415.7 727.6 475.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:31:52 +02:00
Martin Storsjö	388e0d2515	aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter No measured speedup on a Cortex A53, but other cores might benefit. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:50 +02:00
Martin Storsjö	fea92a4b57	arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter Before: Cortex A7 A8 A9 A53 vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5 After: vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:37 +02:00
Martin Storsjö	5e0c2158fb	aarch64: vp9mc: Simplify the extmla macro parameters Fold the field lengths into the macro. This makes the macro invocations much more readable, when the lines are shorter. This also makes it easier to use only half the registers within the macro. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:29 +02:00
Martin Storsjö	bc25897630	utvideodec: Add a missing include This was missing from `77c23704c7`, fixing building. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-10 09:31:49 +02:00
Timo Rothenpieler	a52976c0fe	nvenc: make gpu indices independent of supported capabilities Do not allocate a CUDA context for every available gpu. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-09 23:29:32 +01:00
Derek Buitenhuis	77c23704c7	avcodec: Mark some codecs with threadsafe init as such Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-09 23:28:18 +01:00
Martin Storsjö	0c0b87f12d	aarch64: vp9itxfm: Fix incorrect vertical alignment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:06 +02:00
Martin Storsjö	8476eb0d3a	aarch64: vp9itxfm: Update a comment to refer to a register with a different name Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:02 +02:00
Martin Storsjö	3dd7827258	aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:59 +02:00
Martin Storsjö	ed8d293306	aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible The ld1r is a leftover from the arm version, where this trick is beneficial on some cores. Use a single-lane load where we don't need the semantics of ld1r. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:54 +02:00
Martin Storsjö	4da4b2b87f	aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:50 +02:00
Martin Storsjö	3933b86bb9	arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:44 +02:00
Martin Storsjö	a63da4511d	aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 14740 bytes to 24292 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4 vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6 vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3 vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4 vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3 vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3 vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1 vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4 vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4 vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9 vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5 vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6 vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:03 +02:00
Martin Storsjö	5eb5aec475	arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 12388 bytes to 19784 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 212.0 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2102.1 1521.7 1736.2 1265.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 2104.5 1533.0 1736.6 1265.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2484.8 1828.7 2014.4 1506.5 vp9_inv_dct_dct_16x16_sub12_add_neon: 2851.2 2117.8 2294.8 1753.2 vp9_inv_dct_dct_16x16_sub16_add_neon: 3239.4 2408.3 2543.5 1994.9 vp9_inv_dct_dct_32x32_sub1_add_neon: 758.3 456.7 864.5 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10776.7 7949.8 8567.7 6819.7 vp9_inv_dct_dct_32x32_sub4_add_neon: 10865.6 8131.5 8589.6 6816.3 vp9_inv_dct_dct_32x32_sub8_add_neon: 12053.9 9271.3 9387.7 7564.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 13328.3 10463.2 10217.0 8321.3 vp9_inv_dct_dct_32x32_sub16_add_neon: 14176.4 11509.5 11018.7 9062.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 15301.5 12999.9 11855.1 9828.2 vp9_inv_dct_dct_32x32_sub24_add_neon: 16482.7 14931.5 12650.1 10575.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17589.5 15811.9 13482.8 11333.4 vp9_inv_dct_dct_32x32_sub32_add_neon: 18696.2 17049.2 14355.6 12089.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 1203.5 998.2 1035.3 763.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1203.5 998.1 1035.5 760.8 vp9_inv_dct_dct_16x16_sub8_add_neon: 1926.1 1610.6 1722.1 1271.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2873.2 2129.7 2285.1 1757.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 3221.4 2520.3 2557.6 2002.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 753.0 457.5 866.6 554.6 vp9_inv_dct_dct_32x32_sub2_add_neon: 7554.6 5652.4 6048.4 4920.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 7549.9 5685.0 6046.9 4925.7 vp9_inv_dct_dct_32x32_sub8_add_neon: 8336.9 6704.5 6604.0 5478.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 10914.0 9777.2 9240.4 7416.9 vp9_inv_dct_dct_32x32_sub16_add_neon: 11859.2 11223.3 9966.3 8095.1 vp9_inv_dct_dct_32x32_sub20_add_neon: 15237.1 13029.4 11838.3 9829.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 16293.2 14379.8 12644.9 10572.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17424.3 15734.7 13473.0 11326.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.3 17457.0 14298.6 12080.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:00 +02:00

1 2 3 4 5 ...

21454 Commits