FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-08-10 06:10:52 +02:00

Author	SHA1	Message	Date
Martin Storsjö	9f10cff610	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter This work is sponsored by, and copyright, Google. This is similar to the arm version, but due to the larger registers on aarch64, we can do 8 pixels at a time for all filter sizes. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_loop_filter_h_4_8_10bpp_neon: 213.2 172.6 vp9_loop_filter_h_8_8_10bpp_neon: 281.2 244.2 vp9_loop_filter_h_16_8_10bpp_neon: 657.0 444.5 vp9_loop_filter_h_16_16_10bpp_neon: 1280.4 877.7 vp9_loop_filter_mix2_h_44_16_10bpp_neon: 397.7 358.0 vp9_loop_filter_mix2_h_48_16_10bpp_neon: 465.7 429.0 vp9_loop_filter_mix2_h_84_16_10bpp_neon: 465.7 428.0 vp9_loop_filter_mix2_h_88_16_10bpp_neon: 533.7 499.0 vp9_loop_filter_mix2_v_44_16_10bpp_neon: 271.5 244.0 vp9_loop_filter_mix2_v_48_16_10bpp_neon: 330.0 305.0 vp9_loop_filter_mix2_v_84_16_10bpp_neon: 329.0 306.0 vp9_loop_filter_mix2_v_88_16_10bpp_neon: 386.0 365.0 vp9_loop_filter_v_4_8_10bpp_neon: 150.0 115.2 vp9_loop_filter_v_8_8_10bpp_neon: 209.0 175.5 vp9_loop_filter_v_16_8_10bpp_neon: 492.7 345.2 vp9_loop_filter_v_16_16_10bpp_neon: 951.0 682.7 This is significantly faster than the ARM version in almost all cases except for the mix2 functions. Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-3x. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:36:11 +02:00
Martin Storsjö	ceb36b8178	aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm This work is sponsored by, and copyright, Google. Compared to the arm version, on aarch64 we can keep the full 8x8 transform in registers, and for 16x16 and 32x32, we can process it in slices of 4 pixels instead of 2. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_inv_adst_adst_4x4_sub4_add_10_neon: 111.0 109.7 vp9_inv_adst_adst_8x8_sub8_add_10_neon: 914.0 733.5 vp9_inv_adst_adst_16x16_sub16_add_10_neon: 5184.0 3745.7 vp9_inv_dct_dct_4x4_sub1_add_10_neon: 65.0 65.7 vp9_inv_dct_dct_4x4_sub4_add_10_neon: 100.0 96.7 vp9_inv_dct_dct_8x8_sub1_add_10_neon: 111.0 119.7 vp9_inv_dct_dct_8x8_sub8_add_10_neon: 618.0 494.7 vp9_inv_dct_dct_16x16_sub1_add_10_neon: 295.1 284.6 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 2303.2 1883.9 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 2984.8 2189.3 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 3890.0 2799.4 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1044.4 1012.7 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 13333.7 9695.1 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 18531.3 12459.8 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 24470.7 16160.2 vp9_inv_wht_wht_4x4_sub4_add_10_neon: 83.0 79.7 The larger transforms are significantly faster than the corresponding ARM versions. The speedup vs C code is smaller than in 32 bit mode, probably because the 64 bit intermediates in the C code can be expressed more efficiently in aarch64. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:36:08 +02:00
Martin Storsjö	638eceed47	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC This work is sponsored by, and copyright, Google. This has mostly got the same differences to the 8 bit version as in the arm version. For the horizontal filters, we do 16 pixels in parallel as well. For the 8 pixel wide vertical filters, we can accumulate 4 rows before storing, just as in the 8 bit version. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_10bpp_neon: 35.7 30.7 vp9_avg8_10bpp_neon: 93.5 84.7 vp9_avg16_10bpp_neon: 324.4 296.6 vp9_avg32_10bpp_neon: 1236.5 1148.2 vp9_avg64_10bpp_neon: 4639.6 4571.1 vp9_avg_8tap_smooth_4h_10bpp_neon: 130.0 128.0 vp9_avg_8tap_smooth_4hv_10bpp_neon: 440.0 440.5 vp9_avg_8tap_smooth_4v_10bpp_neon: 114.0 105.5 vp9_avg_8tap_smooth_8h_10bpp_neon: 327.0 314.0 vp9_avg_8tap_smooth_8hv_10bpp_neon: 918.7 865.4 vp9_avg_8tap_smooth_8v_10bpp_neon: 330.0 300.2 vp9_avg_8tap_smooth_16h_10bpp_neon: 1187.5 1155.5 vp9_avg_8tap_smooth_16hv_10bpp_neon: 2663.1 2591.0 vp9_avg_8tap_smooth_16v_10bpp_neon: 1107.4 1078.3 vp9_avg_8tap_smooth_64h_10bpp_neon: 17754.6 17454.7 vp9_avg_8tap_smooth_64hv_10bpp_neon: 33285.2 33001.5 vp9_avg_8tap_smooth_64v_10bpp_neon: 16066.9 16048.6 vp9_put4_10bpp_neon: 25.5 21.7 vp9_put8_10bpp_neon: 56.0 52.0 vp9_put16_10bpp_neon/armv8: 183.0 163.1 vp9_put32_10bpp_neon/armv8: 678.6 563.1 vp9_put64_10bpp_neon/armv8: 2679.9 2195.8 vp9_put_8tap_smooth_4h_10bpp_neon: 120.0 118.0 vp9_put_8tap_smooth_4hv_10bpp_neon: 435.2 435.0 vp9_put_8tap_smooth_4v_10bpp_neon: 107.0 98.2 vp9_put_8tap_smooth_8h_10bpp_neon: 303.0 290.0 vp9_put_8tap_smooth_8hv_10bpp_neon: 893.7 828.7 vp9_put_8tap_smooth_8v_10bpp_neon: 305.5 263.5 vp9_put_8tap_smooth_16h_10bpp_neon: 1089.1 1059.2 vp9_put_8tap_smooth_16hv_10bpp_neon: 2578.8 2452.4 vp9_put_8tap_smooth_16v_10bpp_neon: 1009.5 933.5 vp9_put_8tap_smooth_64h_10bpp_neon: 16223.4 15918.6 vp9_put_8tap_smooth_64hv_10bpp_neon: 32153.0 31016.2 vp9_put_8tap_smooth_64v_10bpp_neon: 14516.5 13748.1 
These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is around 4-9x. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:36:05 +02:00
Martin Storsjö	48ad3fe1be	aarch64: vp9dsp: Restructure the bpp checks This work is sponsored by, and copyright, Google. This is more in line with how it will be extended for more bitdepths. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:36:02 +02:00
Martin Storsjö	1e5d87eec3	arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter This work is sponsored by, and copyright, Google. This is pretty much similar to the 8 bpp version, but in some senses simpler. All input pixels are 16 bits, and all intermediates also fit in 16 bits, so there's no lengthening/narrowing in the filter at all. For the full 16 pixel wide filter, we can only process 4 pixels at a time (using an implementation very much similar to the one for 8 bpp), but we can do 8 pixels at a time for the 4 and 8 pixel wide filters with a different implementation of the core filter. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_loop_filter_h_4_8_10bpp_neon: 1.83 2.16 1.40 2.09 vp9_loop_filter_h_8_8_10bpp_neon: 1.39 1.67 1.24 1.70 vp9_loop_filter_h_16_8_10bpp_neon: 1.56 1.47 1.10 1.81 vp9_loop_filter_h_16_16_10bpp_neon: 1.94 1.69 1.33 2.24 vp9_loop_filter_mix2_h_44_16_10bpp_neon: 2.01 2.27 1.67 2.39 vp9_loop_filter_mix2_h_48_16_10bpp_neon: 1.84 2.06 1.45 2.19 vp9_loop_filter_mix2_h_84_16_10bpp_neon: 1.89 2.20 1.47 2.29 vp9_loop_filter_mix2_h_88_16_10bpp_neon: 1.69 2.12 1.47 2.08 vp9_loop_filter_mix2_v_44_16_10bpp_neon: 3.16 3.98 2.50 4.05 vp9_loop_filter_mix2_v_48_16_10bpp_neon: 2.84 3.64 2.25 3.77 vp9_loop_filter_mix2_v_84_16_10bpp_neon: 2.65 3.45 2.16 3.54 vp9_loop_filter_mix2_v_88_16_10bpp_neon: 2.55 3.30 2.16 3.55 vp9_loop_filter_v_4_8_10bpp_neon: 2.85 3.97 2.24 3.68 vp9_loop_filter_v_8_8_10bpp_neon: 2.27 3.19 1.96 3.08 vp9_loop_filter_v_16_8_10bpp_neon: 3.42 2.74 2.26 4.40 vp9_loop_filter_v_16_16_10bpp_neon: 2.86 2.44 1.93 3.88 The speedup vs C code measured in checkasm is around 1.1-4x. These numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). 
Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-4x. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:35:59 +02:00
Martin Storsjö	2ed67eba96	arm: Add NEON optimizations for 10 and 12 bit vp9 itxfm This work is sponsored by, and copyright, Google. This is structured similarly to the 8 bit version. In the 8 bit version, the coefficients are 16 bits, and intermediates are 32 bits. Here, the coefficients are 32 bit. For the 4x4 transforms for 10 bit content, the intermediates also fit in 32 bits, but for all other transforms (4x4 for 12 bit content, and 8x8 and larger for both 10 and 12 bit) the intermediates are 64 bit. For the existing 8 bit case, the 8x8 transform fit all coefficients in registers; for 10/12 bit, when the coefficients are 32 bit, the 8x8 transform also has to be done in slices of 4 pixels (just as 16x16 and 32x32 for 8 bit). The slice width also shrinks from 4 elements to 2 elements in parallel for the 16x16 and 32x32 cases. The 16 bit coefficients from idct_coeffs and similar tables also need to be lengthened to 32 bit in order to be used in multiplication with vectors with 32 bit elements. This leads to the fixed coefficient vectors needing more space, leading to more cases where they have to be reloaded within the transform (in iadst16). This technically would need testing in checkasm for subpartitions in increments of 2, but that slows down normal checkasm runs excessively. 
Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_inv_adst_adst_4x4_sub4_add_10_neon: 4.83 11.36 5.22 6.77 vp9_inv_adst_adst_8x8_sub8_add_10_neon: 4.12 7.60 4.06 4.84 vp9_inv_adst_adst_16x16_sub16_add_10_neon: 3.93 8.16 4.52 5.35 vp9_inv_dct_dct_4x4_sub1_add_10_neon: 1.36 2.57 1.41 1.61 vp9_inv_dct_dct_4x4_sub4_add_10_neon: 4.24 8.66 5.06 5.81 vp9_inv_dct_dct_8x8_sub1_add_10_neon: 2.63 4.18 1.68 2.87 vp9_inv_dct_dct_8x8_sub4_add_10_neon: 4.52 9.47 4.24 5.39 vp9_inv_dct_dct_8x8_sub8_add_10_neon: 3.45 7.34 3.45 4.30 vp9_inv_dct_dct_16x16_sub1_add_10_neon: 3.56 6.21 2.47 4.32 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 5.68 12.73 5.28 7.07 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 4.42 9.28 4.24 5.45 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 3.41 7.29 3.35 4.19 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 4.52 8.35 3.83 6.40 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 5.86 13.19 6.14 7.04 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 4.29 8.11 4.59 5.06 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 3.31 5.70 3.56 3.84 vp9_inv_wht_wht_4x4_sub4_add_10_neon: 1.89 2.80 1.82 1.97 The speedup compared to the C functions is around 1.3 to 7x for the full transforms, even higher for the smaller subpartitions. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:35:56 +02:00
Martin Storsjö	a4d4bad75c	arm: Add NEON optimizations for 10 and 12 bit vp9 MC This work is sponsored by, and copyright, Google. The plain pixel put/copy functions are used from the 8 bit version, for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new copy128 is added. Compared with the 8 bit version, the filters can no longer use the trick to accumulate in 16 bit with only saturation at the end, but now the accumulators need to be 32 bit. This avoids the need to keep track of which filter index is the largest though, reducing the size of the executable code for these filters. For the horizontal filters, we only do 4 or 8 pixels wide in parallel (while doing two rows at a time), since we don't have enough register space to filter 16 pixels wide. For the vertical filters, we still do 4 and 8 pixels in parallel just as in the 8 bit case, but we need to store the output after every 2 rows instead of after every 4 rows. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_avg4_10bpp_neon: 2.25 2.44 3.05 2.16 vp9_avg8_10bpp_neon: 3.66 8.48 3.86 3.50 vp9_avg16_10bpp_neon: 3.39 8.26 3.37 2.72 vp9_avg32_10bpp_neon: 4.03 10.20 4.07 3.42 vp9_avg64_10bpp_neon: 4.15 10.01 4.13 3.70 vp9_avg_8tap_smooth_4h_10bpp_neon: 3.38 6.22 3.41 4.75 vp9_avg_8tap_smooth_4hv_10bpp_neon: 3.89 6.39 4.30 5.32 vp9_avg_8tap_smooth_4v_10bpp_neon: 5.32 9.73 6.34 7.31 vp9_avg_8tap_smooth_8h_10bpp_neon: 4.45 9.40 4.68 6.87 vp9_avg_8tap_smooth_8hv_10bpp_neon: 4.64 8.91 5.44 6.47 vp9_avg_8tap_smooth_8v_10bpp_neon: 6.44 13.42 8.68 8.79 vp9_avg_8tap_smooth_64h_10bpp_neon: 4.66 9.02 4.84 7.71 vp9_avg_8tap_smooth_64hv_10bpp_neon: 4.61 9.14 4.92 7.10 vp9_avg_8tap_smooth_64v_10bpp_neon: 6.90 14.13 9.57 10.41 vp9_put4_10bpp_neon: 1.33 1.46 2.09 1.33 vp9_put8_10bpp_neon: 1.57 3.42 1.83 1.84 vp9_put16_10bpp_neon: 1.55 4.78 2.17 1.89 vp9_put32_10bpp_neon: 2.06 5.35 2.14 2.30 vp9_put64_10bpp_neon: 3.00 2.41 1.95 1.66 vp9_put_8tap_smooth_4h_10bpp_neon: 3.19 5.81 3.31 4.63 vp9_put_8tap_smooth_4hv_10bpp_neon: 3.86 6.22 4.32 5.21 vp9_put_8tap_smooth_4v_10bpp_neon: 5.40 9.77 6.08 7.21 vp9_put_8tap_smooth_8h_10bpp_neon: 4.22 8.41 4.46 6.63 vp9_put_8tap_smooth_8hv_10bpp_neon: 4.56 8.51 5.39 6.25 vp9_put_8tap_smooth_8v_10bpp_neon: 6.60 12.43 8.17 8.89 vp9_put_8tap_smooth_64h_10bpp_neon: 4.41 8.59 4.54 7.49 vp9_put_8tap_smooth_64hv_10bpp_neon: 4.43 8.58 5.34 6.63 vp9_put_8tap_smooth_64v_10bpp_neon: 7.26 13.92 9.27 10.92 For the larger 8tap filters, the speedup vs C code is around 4-14x. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:35:50 +02:00
Martin Storsjö	cda9a3e80b	arm: vp9dsp: Restructure the bpp checks This work is sponsored by, and copyright, Google. This is more in line with how it will be extended for more bitdepths. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-24 22:35:44 +02:00
Clément Bœsch	1400598c0e	Merge commit 'fd5e6a095f69495c558069315d6b36ea410c31fa' * commit 'fd5e6a095f69495c558069315d6b36ea410c31fa': x86util: Extend SPLATW for avx2 This commit is a noop, see `1ace9573dc` (only libavutil/x86/x86util.asm chunk). Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 19:35:33 +01:00
Clément Bœsch	f84ece0a98	Merge commit '37961044c6' * commit '37961044c6': checkasm: arm: Ignore changes to bits 0-4 and 7 of FPSCR checkasm/arm: remove NEON instructions from checkasm_checked_call_vfp checkasm: arm: Don't start new const blocks for each string This merge is a noop: the changes were included in `9f1c81e5ec`. Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 19:32:12 +01:00
Clément Bœsch	727c463ff7	Merge commit '5ece6911010b3464d2fdacfa8031c15b5bd83418' * commit '5ece6911010b3464d2fdacfa8031c15b5bd83418': apichanges: Fill in missing hashes and dates This commit is a noop as we need to fill with our own hashes. Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 19:29:35 +01:00
Clément Bœsch	4181d7741d	Merge commit 'facdfe40805559963b5875931af9406ed5ddcd5c' * commit 'facdfe40805559963b5875931af9406ed5ddcd5c': swscale: Add proper ff_ prefix to init functions This commit is a noop, see `e8c3716064` I'm keeping our ff_sws_ vs ff_ since we use ff_sws_ in other places in swscale. Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 19:26:51 +01:00
Clément Bœsch	4ad5b9363f	Merge commit 'c0fd2fb27bebd1d5ab028e6df6bca9119d269122' * commit 'c0fd2fb27bebd1d5ab028e6df6bca9119d269122': swscale: Rename sws_context_class to ff_sws_context_class This commit is a noop, see `8bfbc8c5e5` Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 19:23:48 +01:00
Clément Bœsch	9f1c81e5ec	Merge commit '71a0472114574993df7035f4de9aa007e03817b8' * commit '71a0472114574993df7035f4de9aa007e03817b8': checkasm: arm: report the first clobbered register in checkasm_checked_call Also includes `446353ea18`, `59aeed93e4`, and `37961044c6` to avoid breaking too much stuff. Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 19:21:29 +01:00
Michael Niedermayer	755933cb5c	avcodec/mjpegdec: Check remaining bitstream in ljpeg_decode_yuv_scan() Fixes timeout Fixes: 445/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_MJPEG_fuzzer Fixes: 456/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_JPEGLS_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-24 17:50:03 +01:00
Clément Bœsch	8504d64bcb	Merge commit 'a8fce24b9c5a87187f5bd864b18f5b3e575f8c3d' * commit 'a8fce24b9c5a87187f5bd864b18f5b3e575f8c3d': avconv_dxva2: support HEVC Main10 decoding This commit is a noop, see `1ec14612a5` Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 16:34:00 +01:00
Clément Bœsch	5f74ce0e4d	Merge commit '33f6690eb4e21acc4b581688eecfc4cc5ea9515e' * commit '33f6690eb4e21acc4b581688eecfc4cc5ea9515e': hevc: offer DXVA2 for 10bit 420 This commit is a noop, see `ccb94789e2` Merged-by: Clément Bœsch <u@pkh.me>	2017-01-24 16:31:10 +01:00
Clément Bœsch	7448019890	Merge commit '38efff92f1ef81f3de20ff0460ec7b70c253d714' * commit '38efff92f1ef81f3de20ff0460ec7b70c253d714': FATE: add a test for H.264 with two fields per packet h264: fix decoding multiple fields per packet with slice threads This merge includes two commits because the FATE test was useful in order to make proper testing. The merge gets rid of the now unused: - SLICE_SINGLETHREAD and SLICE_SKIPED macros - max_contexts - "again" label in decode_nal_units() This commit also includes the fix from `d3e4d406b`. Thanks to wm4 and Michael Niedermayer for their testing. Merged-by: Clément Bœsch <u@pkh.me> Merged-by: Matthieu Bouron <matthieu.bouron@gmail.com>	2017-01-24 16:13:03 +01:00
Steven Liu	1033f56b07	avformat/hlsenc: improve to write m3u8 head block Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	2017-01-24 22:25:29 +08:00
Michael Niedermayer	25f4f08ba5	avcodec/h264dec: Fix regression with "make fate-h264-attachment-631 THREADS=8" This treats the case of no slices like no frames which it basically is. The field is added to the context as other nal related fields are also there and passing the has_slices field per *arguments is ugly and not consistent Found-by: ubitux Approved-by: ubitux Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-24 12:13:59 +01:00
Paul B Mahol	08e5732318	avfilter: add EIA-608 line extractor Signed-off-by: Dave Rice <dave@dericed.com> Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-01-24 10:20:10 +01:00
Steven Liu	1bb192ef6c	avformat/flvenc: refine the flvenc shift_data code refine the flvenc shift_data move data option Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	2017-01-24 12:31:36 +08:00
Steven Liu	2f7cc21b61	avformat/hlsenc: refine the code readable for time unit Reviewed-by: Bodecs Bela <bodecsb@vivanet.hu> Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Steven Liu <lq@chinaffmpeg.org>	2017-01-24 12:29:01 +08:00
Felipe Astroza	b7665642f1	libavformat/tee: tee was passing a wrong option name for fifo's format_options If fifo is enabled on tee muxer, ffmpeg exits because of an unknown option passed to fifo muxer. Option name "format_options" was replaced by "format_opts" on tee muxer. Signed-off-by: Felipe Astroza <felipe@astroza.cl> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-24 02:36:51 +01:00
Pavel Koshevoy	9ea2998586	avcodec/cuvid: fail early if GPU can't handle video resolution CUVID on GeForce GT 730 and GeForce GTX 1060 does not report any error when decoding 8K h264 packets. However, it does return an error during cuvidCreateDecoder call if the indicated video resolution is not supported. Given that stream resolution is typically known as a result of probing it is better to use this information during avcodec_open2 call to fail immediately, rather than proceeding to decode and never receiving any frames from the decoder nor receiving any indication of decode failure. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>	2017-01-23 17:49:35 +01:00
wm4	c16fe1432d	hwcontext_cuda: implement frames_get_constraints Copied and modified from hwcontext_qsv.c.	2017-01-23 16:21:18 +01:00
Rodger Combs	2b20290061	lavf/segment: fix crash when failing to open segment list This happens because segment_end() returns an error, so seg_write_packet never proceeds to segment_start(), and seg->avf->pb is never re-set, so we crash with a null pb when av_write_trailer flushes the packet queue. This doesn't seem to be clearly recoverable, so I'm just failing more gracefully. Repro: ffmpeg -i input.ts -f segment -c copy -segment_list /noaxx.m3u8 test-%05d.ts (assuming you don't have write access to /)	2017-01-23 05:44:49 -06:00
Michael Niedermayer	e371f031b9	avcodec/pngdec: Fix off by 1 size in decode_zbuf() Fixes out of array access Fixes: 444/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_PNG_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-23 01:43:35 +01:00
Michael Niedermayer	a0341b4d74	avcodec/error_resilience: update indentation after last commit Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-22 21:43:06 +01:00
Michael Niedermayer	d9d9fd9446	avcodec/error_resilience: Optimize motion recovery code by using block lists This makes the code 7 times faster with the testcase from libfuzzer and should reduce the amount of timeouts we hit in automated fuzzing. (for example 438/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_RV40_fuzzer) The code is also faster with more realistic input though the difference is small here as that is far from the worst cases the fuzzers pick out Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-22 21:39:43 +01:00
Marton Balint	f1214ad5d9	ffplay: fix indentation after last commit Signed-off-by: Marton Balint <cus@passwd.hu>	2017-01-22 16:17:50 +01:00
Marton Balint	076fc75bdb	ffplay: do not preallocate video texture Since the uploads happen in the main display function, it does not matter much. Signed-off-by: Marton Balint <cus@passwd.hu>	2017-01-22 16:17:50 +01:00
Paul B Mahol	7f9978b0bd	avformat: add MIDI Sample Dump Standard demuxer Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-01-22 13:00:25 +01:00
Jonathan Campbell	d5d474aea5	avcodec/ac3dec: add consistent noise generation option. use av_lfg_init_from_data() to seed AC-3 dithering from the AC-3 frame data to make it consistent given the same AC-3 frame, if option is set. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-22 02:29:16 +01:00
Jonathan Campbell	76c5a69e26	libavutil: add av_lfg_init_from_data() function seeds an AVLFG from binary data. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-22 02:28:53 +01:00
Michael Niedermayer	0a5add45c7	avfilter/af_hdcd: Fix leak of memory allocated by ff_make_format_list() Fixes CID1396265 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-22 02:28:53 +01:00
Mark Thompson	d40a1ae7ec	vaapi_mpeg4: Restore changes overwritten by merge From `2aa8e33d7d`.	2017-01-22 00:07:47 +00:00
Michael Niedermayer	61164112a5	avfilter/avf_showspectrum: Fix memleak of text allocated by av_asprintf() Fixes CID1396261 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-21 23:07:02 +01:00
Michael Niedermayer	e740e9c798	avfilter/vf_palettegen: Fix leak and simplify code Fixes CID1270818 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-21 22:40:14 +01:00
Paul B Mahol	d60f090dd1	avcodec/fraps: add support for PAL8 Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-01-21 18:08:08 +01:00
Michael Niedermayer	cde007dcd3	avcodec: Add FF_CODEC_CAP_SKIP_FRAME_FILL_PARAM to most h263 based codecs Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-21 02:30:38 +01:00
Michael Niedermayer	5f2b360fc0	avfilter/avfiltergraph: Add assert to write down in machine readable form what is assumed about sample rates in swap_samplerates_on_filter() Fixes CID1397292 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-21 01:35:52 +01:00
Matthieu Bouron	cf3affabb4	lavc/h264dec: re-indent after previous commit	2017-01-20 17:29:09 +01:00
Matthieu Bouron	639e262971	lavc/h264dec: make sure a slice is decoded before finishing setup Fixes regression in fate-h264-attachment-631 with THREADS=8 introduced by `bdbbb8f11e`.	2017-01-20 17:28:40 +01:00
Paul B Mahol	8869f5efec	avformat/wavdec: enable seeking with XMA2 Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-01-20 13:58:41 +01:00
Paul B Mahol	18cfcc6458	avcodec/wmaprodec: add xma_flush for seeking in XMA2 Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-01-20 13:58:41 +01:00
Paul B Mahol	5d2609929d	avcodec: add XMA2 parser Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-01-20 13:58:41 +01:00
Paul B Mahol	96fe4432f5	avcodec/wmaprodec: unbreak XMA mono decoding Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-01-20 13:58:36 +01:00
bnnm	cab0f3abc5	avcodec/atrac3: allow 6 channels (non-joint stereo) Raises max channels to 6 (for non joint-stereo only), there is no difference decoding 1 or N discrete channels. Fixes trac issue #5840 Signed-off-by: bnnm <bananaman255@gmail.com>	2017-01-20 12:53:57 +01:00
Daniil Cherednik	9a619bef54	dcaenc: Use Huffman codes for Bit Allocation Index Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2017-01-20 10:03:46 +00:00

83235 Commits