FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-19 05:49:09 +02:00

Author	SHA1	Message	Date
Mark Thompson	210dd7bbb2	Merge commit '21962261c74aed4df00ae8348a5e2d1ecb67c52d' * commit '21962261c74aed4df00ae8348a5e2d1ecb67c52d': qsv: handle the semi-packed formats in map_fourcc as well Merged-by: Mark Thompson <sw@jkqxz.net>	2017-03-12 14:21:37 +00:00
Clément Bœsch	5e193daaa2	Merge commit 'f65285aba0df7d46298abe0c945dfee05cbc6028' * commit 'f65285aba0df7d46298abe0c945dfee05cbc6028': lavc: set sw_pix_fmt for hwaccel encoding Merged-by: Clément Bœsch <u@pkh.me>	2017-03-12 13:21:01 +01:00
Clément Bœsch	8d2d817098	Merge commit 'd59641abfd25a1007bdf4723d952887b1e3619c6' * commit 'd59641abfd25a1007bdf4723d952887b1e3619c6': lavc: initialize AVCodecContext.sw_pix_fmt properly Merged-by: Clément Bœsch <u@pkh.me>	2017-03-12 13:20:57 +01:00
Clément Bœsch	15f6e5f2a9	Merge commit '8b7a9729aa162e2bbd571933f1aa40767f1ff47b' * commit '8b7a9729aa162e2bbd571933f1aa40767f1ff47b': avconv_qsv: use the actual pixel format provided by lavc This commit is a noop, see 03cef34aa66 Merged-by: Clément Bœsch <u@pkh.me>	2017-03-12 13:13:55 +01:00
Clément Bœsch	e514309a91	Merge commit '6f40181cad8ac04adff7bd10e1e1ab65f22bc1f0' * commit '6f40181cad8ac04adff7bd10e1e1ab65f22bc1f0': avconv_qsv: align the surface size to 32 This commit is a noop, see 03cef34aa66 Merged-by: Clément Bœsch <u@pkh.me>	2017-03-12 13:13:05 +01:00
Clément Bœsch	993a9a3d72	Merge commit 'b0f36a0043d76436cc7ab8ff92ab99c94595d3c0' * commit 'b0f36a0043d76436cc7ab8ff92ab99c94595d3c0': avconv: stop using setpts for input framerate forced with -r Merged-by: Clément Bœsch <u@pkh.me>	2017-03-12 13:08:04 +01:00
Paul B Mahol	807d5dcde9	avcodec/scpr: use correct linesize for prev frame Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-03-12 12:34:55 +01:00
Michael Niedermayer	ce010655a6	avcodec/dca_xll: Fix runtime error: signed integer overflow: 2147286116 + 6298923 cannot be represented in type 'int' Fixes: 732/clusterfuzz-testcase-4872990070145024 See: [FFmpeg-devel] [PATCH 2/6] avcodec/dca_xll: Fix runtime error: signed integer overflow: 2147286116 + 6298923 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-12 04:38:14 +01:00
Michael Niedermayer	44e2105189	avcodec/amrwbdec: Fix runtime error: left shift of negative value -1 Fixes: 763/clusterfuzz-testcase-6007567320875008 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-12 04:38:14 +01:00
Michael Niedermayer	f4c2302ee2	avcodec/dca_xll: Fix runtime error: signed integer overflow: 1762028192 + 698372290 cannot be represented in type 'int' Fixes: 762/clusterfuzz-testcase-5927683747741696 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-12 04:38:14 +01:00
Michael Niedermayer	47cc9c1d77	avcodec/wavpack: Fix runtime error: signed integer overflow: -2147483648 + -83886075 cannot be represented in type 'int' Fixes: 761/clusterfuzz-testcase-5442222252097536 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-12 04:38:14 +01:00
Muhammad Faiz	0bab78f7e7	avfilter/af_firequalizer: add av_restrict on convolution func slightly improved speed Reviewed-by: wm4 <nfxjfg@googlemail.com> Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>	2017-03-12 03:21:55 +07:00
Przemysław Sobala	89c0fda5f4	lavf/dashenc: update bitrates on dash_write_trailer Provides a way to change bandwidth parameter inside DASH manifest after a non-CBR H.264 encoding. Caller now is able to compute the bitrate by itself, after all packets have been written, and then set that value in AVFormatContext->streams->codecpar->bit_rate before calling av_write_trailer. As a result that value will be set in DASH manifest. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-11 16:43:43 +01:00
Steven Liu	70a9407b50	doc/muxers: move hls_flags temp_file to after SECOND LEVEL hls example the temp_file hls_flags describe text offset is wrong, now move it after example Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	2017-03-11 21:11:38 +08:00
Martin Storsjö	26ee83acc4	aarch64: vp9itxfm: Reorder iadst16 coeffs This matches the order they are in the 16 bpp version. There they are in this order, to make sure we access them in the same order they are declared, easing loading only half of the coefficients at a time. This makes the 8 bpp version match the 16 bpp version better. This is cherrypicked from libav commit b8f66c0838b4c645227f23a35b4d54373da4c60a. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:52 +02:00
Martin Storsjö	b2e20d8984	arm: vp9itxfm: Reorder iadst16 coeffs This matches the order they are in the 16 bpp version. There they are in this order, to make sure we access them in the same order they are declared, easing loading only half of the coefficients at a time. This makes the 8 bpp version match the 16 bpp version better. This is cherrypicked from libav commit 08074c092d8c97d71c5986e5325e97ffc956119d. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:52 +02:00
Martin Storsjö	f952273019	aarch64: vp9itxfm: Reorder the idct coefficients for better pairing All elements are used pairwise, except for the first one. Previously, the 16th element was unused. Move the unused element to the second slot, to make the later element pairs not split across registers. This simplifies loading only parts of the coefficients, reducing the difference to the 16 bpp version. This is cherrypicked from libav commit 09eb88a12e008d10a3f7a6be75d18ad98b368e68. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:52 +02:00
Martin Storsjö	4f693b56bd	arm: vp9itxfm: Reorder the idct coefficients for better pairing All elements are used pairwise, except for the first one. Previously, the 16th element was unused. Move the unused element to the second slot, to make the later element pairs not split across registers. This simplifies loading only parts of the coefficients, reducing the difference to the 16 bpp version. This is cherrypicked from libav commit de06bdfe6c8abd8266d5c6f5c68e4df0060b61fc. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:51 +02:00
Martin Storsjö	2905657b90	aarch64: vp9itxfm: Avoid reloading the idct32 coefficients The idct32x32 function actually pushed d8-d15 onto the stack even though it didn't clobber them; there are plenty of registers that can be used to allow keeping all the idct coefficients in registers without having to reload different subsets of them at different stages in the transform. After this, we still can skip pushing d12-d15. Before: vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3 After: vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3 This is cherrypicked from libav commit 65aa002d54433154a6924dc13e498bec98451ad0. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:51 +02:00
Martin Storsjö	600f4c9b03	arm: vp9itxfm: Avoid reloading the idct32 coefficients The idct32x32 function actually pushed q4-q7 onto the stack even though it didn't clobber them; there are plenty of registers that can be used to allow keeping all the idct coefficients in registers without having to reload different subsets of them at different stages in the transform. Since the idct16 core transform avoids clobbering q4-q7 (but clobbers q2-q3 instead, to avoid needing to back up and restore q4-q7 at all in the idct16 function), and the lanewise vmul needs a register in the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5 while doing idct16. While keeping these coefficients in registers, we still can skip pushing q7. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_sub32_add_neon: 18553.8 17182.7 14303.3 12089.7 After: vp9_inv_dct_dct_32x32_sub32_add_neon: 18470.3 16717.7 14173.6 11860.8 This is cherrypicked from libav commit 402546a17233a8815307df9e14ff88cd70424537. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:51 +02:00
Martin Storsjö	a88db8b9a0	arm: vp9lpf: Implement the mix2_44 function with one single filter pass For this case, with 8 inputs but only changing 4 of them, we can fit all 16 input pixels into a q register, and still have enough temporary registers for doing the loop filter. The wd=8 filters would require too many temporary registers for processing all 16 pixels at once though. Before: Cortex A7 A8 A9 A53 vp9_loop_filter_mix2_v_44_16_neon: 289.7 256.2 237.5 181.2 After: vp9_loop_filter_mix2_v_44_16_neon: 221.2 150.5 177.7 138.0 This is cherrypicked from libav commit 575e31e931e4178e9f1e24407503c9b4ec0ef9ba. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:51 +02:00
Martin Storsjö	f32690a298	aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 This is one cycle faster in total, and three instructions fewer. Before: vp9_loop_filter_mix2_v_44_16_neon: 123.2 After: vp9_loop_filter_mix2_v_44_16_neon: 122.2 This is cherrypicked from libav commit 3bf9c48320f25f3d5557485b0202f22ae60748b0. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:50 +02:00
Martin Storsjö	3fbbad2984	arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit The theoretical maximum value of E is 193, so we can just saturate the addition to 255. Before: Cortex A7 A8 A9 A53 A53/AArch64 vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7 vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0 After: vp9_loop_filter_v_4_8_neon: 136.0 125.7 112.6 84.0 83.0 vp9_loop_filter_v_8_8_neon: 234.0 195.5 171.5 136.0 133.7 vp9_loop_filter_v_16_8_neon: 490.0 417.5 377.7 289.0 271.0 vp9_loop_filter_v_16_16_neon: 951.2 814.7 732.3 571.0 446.7 This is cherrypicked from libav commit c582cb8537367721bb399a5d01b652c20142b756. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:50 +02:00
Martin Storsjö	dda45c087b	aarch64: Add parentheses around the offset parameter in movrel This fixes building with clang for linux with PIC enabled. This is cherrypicked from libav commit 8847eeaa141898850381400000fb2b8a7adc7100. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:50 +02:00
Martin Storsjö	c8d6eec85d	aarch64: vp9lpf: Fix broken indentation/vertical alignment This is cherrypicked from libav commit 07b5136c481d394992c7e951967df0cfbb346c0b. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:49 +02:00
Martin Storsjö	9f3a886364	aarch64: vp9lpf: Interleave the start of flat8in into the calculation above This adds lots of extra .ifs, but speeds it up by a couple cycles, by avoiding stalls. This is cherrypicked from libav commit b0806088d3b27044145b20421da8d39089ae0c6a. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:49 +02:00
Martin Storsjö	83399cf569	arm: vp9lpf: Interleave the start of flat8in into the calculation above This adds lots of extra .ifs, but speeds it up by a couple cycles, by avoiding stalls. This is cherrypicked from libav commit e18c39005ad1dbb178b336f691da1de91afd434e. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:49 +02:00
Martin Storsjö	92ab8374b1	arm: vp9lpf: Use orrs instead of orr+cmp This is cherrypicked from libav commit 435cd7bc99671bf561193421a50ac6e9d63c4266. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:49 +02:00
Martin Storsjö	f0ecbb13cf	arm/aarch64: vp9lpf: Calculate !hev directly Previously we first calculated hev, and then negated it. Since we were able to schedule the negation in the middle of another calculation, we don't see any gain in all cases. Before: Cortex A7 A8 A9 A53 A53/AArch64 vp9_loop_filter_v_4_8_neon: 147.0 129.0 115.8 89.0 88.7 vp9_loop_filter_v_8_8_neon: 242.0 198.5 174.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 500.0 419.5 382.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 971.2 825.5 731.5 579.0 453.0 After: vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7 vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0 This is cherrypicked from libav commit e1f9de86f454861b69b199ad801adc2ec6c3b220. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:48 +02:00
Martin Storsjö	148cc0bb89	aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling This work is sponsored by, and copyright, Google. Before: Cortex A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 180.2 vp9_inv_dct_dct_32x32_sub1_add_neon: 475.3 This is cherrypicked from libav commit 3fcf788fbbccc4130868e7abe58a88990290f7c1. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:48 +02:00
Martin Storsjö	758302e4bc	arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling This work is sponsored by, and copyright, Google. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 752.0 459.2 862.2 553.9 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 226.5 145.0 225.1 171.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 721.2 415.7 727.6 475.0 This is cherrypicked from libav commit a76bf8cf1277ef6feb1580b578f5e6ca327e713c. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:48 +02:00
Martin Storsjö	045e33ae3f	aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter No measured speedup on a Cortex A53, but other cores might benefit. This is cherrypicked from libav commit 388e0d2515bc6bbc9d0c9af1d230bd16cf945fe7. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:48 +02:00
Martin Storsjö	bff0771590	arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter Before: Cortex A7 A8 A9 A53 vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5 After: vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5 This is cherrypicked from libav commit fea92a4b57d1c328b1de226a5f213a629ee63754. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:47 +02:00
Martin Storsjö	ac6cb8ae5b	aarch64: vp9mc: Simplify the extmla macro parameters Fold the field lengths into the macro. This makes the macro invocations much more readable, when the lines are shorter. This also makes it easier to use only half the registers within the macro. This is cherrypicked from libav commit 5e0c2158fbc774f87d3ce4b7b950ba4d42c4a7b8. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:47 +02:00
Martin Storsjö	16ef000799	aarch64: vp9itxfm: Fix incorrect vertical alignment This is cherrypicked from libav commit 0c0b87f12d48d4e7f0d3d13f9345e828a3a5ea32. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:47 +02:00
Martin Storsjö	d0fbf7f34e	aarch64: vp9itxfm: Update a comment to refer to a register with a different name This is cherrypicked from libav commit 8476eb0d3ab1f7a52317b23346646389c08fb57a. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:46 +02:00
Martin Storsjö	6752318c73	aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability This is cherrypicked from libav commit 3dd7827258ddaa2e51085d0c677d6f3b1be3572f. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:46 +02:00
Martin Storsjö	19a0f9529c	aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible The ld1r is a leftover from the arm version, where this trick is beneficial on some cores. Use a single-lane load where we don't need the semantics of ld1r. This is cherrypicked from libav commit ed8d293306e12c9b79022d37d39f48825ce7f2fa. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:46 +02:00
Martin Storsjö	3006e5253a	aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function This is cherrypicked from libav commit 4da4b2b87f08a1331650c7e36eb7d4029a160776. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:27 +02:00
Martin Storsjö	1d8ab576a7	arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function This is cherrypicked from libav commit 3933b86bb93aca47f29fbd493075b0f110c1e3f5. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:26 +02:00
Martin Storsjö	9532a7d4d0	aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 14740 bytes to 24292 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4 vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6 vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3 vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4 vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3 vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3 vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1 vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4 vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4 vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9 
vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5 vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6 vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2 This is cherrypicked from libav commit a63da4511d0fee66695ff4afd264ba1dbf1e812d. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:25 +02:00
Martin Storsjö	824589556c	arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 12388 bytes to 19784 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 212.0 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2102.1 1521.7 1736.2 1265.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 2104.5 1533.0 1736.6 1265.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2484.8 1828.7 2014.4 1506.5 vp9_inv_dct_dct_16x16_sub12_add_neon: 2851.2 2117.8 2294.8 1753.2 vp9_inv_dct_dct_16x16_sub16_add_neon: 3239.4 2408.3 2543.5 1994.9 vp9_inv_dct_dct_32x32_sub1_add_neon: 758.3 456.7 864.5 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10776.7 7949.8 8567.7 6819.7 vp9_inv_dct_dct_32x32_sub4_add_neon: 10865.6 8131.5 8589.6 6816.3 vp9_inv_dct_dct_32x32_sub8_add_neon: 12053.9 9271.3 9387.7 7564.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 13328.3 10463.2 10217.0 8321.3 vp9_inv_dct_dct_32x32_sub16_add_neon: 14176.4 11509.5 11018.7 9062.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 15301.5 12999.9 11855.1 9828.2 vp9_inv_dct_dct_32x32_sub24_add_neon: 16482.7 14931.5 12650.1 10575.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17589.5 15811.9 13482.8 11333.4 vp9_inv_dct_dct_32x32_sub32_add_neon: 18696.2 17049.2 14355.6 12089.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 1203.5 998.2 1035.3 763.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1203.5 998.1 1035.5 760.8 vp9_inv_dct_dct_16x16_sub8_add_neon: 1926.1 
1610.6 1722.1 1271.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2873.2 2129.7 2285.1 1757.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 3221.4 2520.3 2557.6 2002.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 753.0 457.5 866.6 554.6 vp9_inv_dct_dct_32x32_sub2_add_neon: 7554.6 5652.4 6048.4 4920.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 7549.9 5685.0 6046.9 4925.7 vp9_inv_dct_dct_32x32_sub8_add_neon: 8336.9 6704.5 6604.0 5478.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 10914.0 9777.2 9240.4 7416.9 vp9_inv_dct_dct_32x32_sub16_add_neon: 11859.2 11223.3 9966.3 8095.1 vp9_inv_dct_dct_32x32_sub20_add_neon: 15237.1 13029.4 11838.3 9829.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 16293.2 14379.8 12644.9 10572.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17424.3 15734.7 13473.0 11326.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.3 17457.0 14298.6 12080.0 This is cherrypicked from libav commit 5eb5aec475aabc884d083566f902876ecbc072cb. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:25 +02:00
Martin Storsjö	a681c793a3	aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. This is cherrypicked from libav commit 79d332ebbde8c0a3e9da094dcfd10abd33ba7378. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:24 +02:00
Martin Storsjö	3bd9b39108	arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. This is cherrypicked from libav commit 47b3c2c18d1897f3c753ba0cec4b2d7aa24526af. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:23 +02:00
Martin Storsjö	dc47bf3872	aarch64: vp9itxfm: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from 19496 to 14740 bytes. This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms. Before: vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub32_add_neon: 8095.7 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 1390.1 vp9_inv_dct_dct_32x32_sub4_add_neon: 5199.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8125.8 This is cherrypicked from libav commit 115476018d2c97df7e9b4445fe8f6cc7420ab91f. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:22 +02:00
Martin Storsjö	f8fcee0daf	arm: vp9itxfm: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from 15324 to 12388 bytes. This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub4_add_neon: 2063.4 1516.0 1719.5 1245.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 3279.3 2454.5 2525.2 1982.3 vp9_inv_dct_dct_32x32_sub4_add_neon: 10750.0 7955.4 8525.6 6754.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 18574.0 17108.4 14216.7 12010.2 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 2060.8 1608.5 1735.7 1262.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.2 2443.5 2546.1 1999.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 10682.0 8043.8 8581.3 6810.1 vp9_inv_dct_dct_32x32_sub32_add_neon: 18522.4 17277.4 14286.7 12087.9 This is cherrypicked from libav commit 0331c3f5e8cb6e6b53fab7893e91d1be1bfa979c. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:20 +02:00
Martin Storsjö	52c7366c83	aarch64: vp9itxfm: Restructure the idct32 store macros This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. This is also arguably more readable. This is cherrypicked from libav commit 58d87e0f49bcbbc6f426328f53b657bae7430cd2. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:09 +02:00
Martin Storsjö	31e41350d2	arm: vp9itxfm: Avoid .irp when it doesn't save any lines This makes it more readable. This is cherrypicked from libav commit 3bc5b28d5a191864c54bba60646933a63da31656. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 13:14:00 +02:00
Moritz Barsnick	114bbb0b74	libavfilter/avf_showwaves: make sqrt and cbrt scale option values available to showwavespic by name The 'sqrt' and 'cbrt' scalers were added in commit 80262d8c86e94ff9a4bb3a9e3c2d734e04ccb399, but their symbolic option values only made available to the showwaves filter, not showwavespic, despite the scalers working properly by their numerical option values. Signed-off-by: Moritz Barsnick <barsnick@gmx.net>	2017-03-11 11:55:57 +01:00
Steven Liu	51e3501993	ffprobe: add AVCodecContext help message into ffprobe because the ffprobe can use AVCodecContext parameters Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	2017-03-11 11:12:23 +08:00
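
Note on the fuzzing fixes above (ce010655a6, f4c2302ee2, 47cc9c1d77, 44e2105189): the listing shows only the commit messages, not the patches, so the following is not a reproduction of them. It is a minimal sketch of one common way to avoid the two kinds of undefined behaviour those messages name, signed integer overflow and left shift of a negative value, by doing the arithmetic on unsigned types. The helper names add_wrap32 and shl_wrap32 are hypothetical and are not FFmpeg functions.

```c
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg API): add two 32-bit values with
 * wraparound instead of undefined behaviour on overflow. */
static int32_t add_wrap32(int32_t a, int32_t b)
{
    /* Unsigned addition is defined to wrap modulo 2^32. */
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    /* Assumption: converting an out-of-range value back to int32_t wraps
     * (two's complement), as mainstream compilers such as GCC and Clang do. */
    return (int32_t)sum;
}

/* Hypothetical helper: left shift that is also defined for negative v.
 * The caller must keep n below 32. */
static int32_t shl_wrap32(int32_t v, unsigned n)
{
    /* Shifting the unsigned representation avoids shifting a negative value. */
    return (int32_t)((uint32_t)v << n);
}
```

With these helpers, the operands quoted in ce010655a6 (2147286116 + 6298923) wrap to a negative value rather than triggering the sanitizer report. Whether wrapping is the right result for a given decoder is a separate question that the commit messages do not answer.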

83854 Commits
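
Note on 3fbbad2984 (vp9lpf: Keep the comparison to E within 8 bit): the commit message argues that because E cannot exceed 193, the sum being compared against it can be saturated to 255 without changing the outcome. The sketch below checks that reasoning in scalar C. It is not the NEON code from the commit. All function names are hypothetical, and the two generic 8-bit operands stand in for whatever terms the filter actually adds.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit addition that clamps at 255 instead of wrapping. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Reference comparison using the full-width sum. */
static int exceeds_wide(uint8_t a, uint8_t b, uint8_t e)
{
    return (unsigned)a + b > e;
}

/* Same comparison, but keeping the sum within 8 bits by saturating. */
static int exceeds_sat8(uint8_t a, uint8_t b, uint8_t e)
{
    return sat_add_u8(a, b) > e;
}

/* Count operand pairs for which the two comparisons disagree at limit e. */
static unsigned count_mismatches(uint8_t e)
{
    unsigned mismatches = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            mismatches += exceeds_wide((uint8_t)a, (uint8_t)b, e) !=
                          exceeds_sat8((uint8_t)a, (uint8_t)b, e);
    return mismatches;
}
```

For any limit below 255, a sum that saturates is still above the limit, so the two comparisons agree. Calling count_mismatches(193) finds no disagreement, while a limit of exactly 255 would, which is why the bound on E matters.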