FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-12 19:18:44 +02:00

Author	SHA1	Message	Date
Ben Avison	23c92e14f5	avcodec/vc1: Arm 32-bit NEON unescape fast path checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_unescape_buffer_c: 918624.7 vc1dsp.vc1_unescape_buffer_neon: 142958.0 Signed-off-by: Ben Avison <bavison@riscosopen.org> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-01 10:03:34 +03:00
Ben Avison	c07de58a72	avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C version can still outperform the NEON version in specific cases. The balance between different code paths is stream-dependent, but in practice the best case happens about 5% of the time, the worst case happens about 40% of the time, and the complexity of the remaining cases fall somewhere in between. Therefore, taking the average of the best and worst case timings is probably a conservative estimate of the degree by which the NEON code improves performance. vc1dsp.vc1_h_loop_filter4_bestcase_c: 19.0 vc1dsp.vc1_h_loop_filter4_bestcase_neon: 48.5 vc1dsp.vc1_h_loop_filter4_worstcase_c: 144.7 vc1dsp.vc1_h_loop_filter4_worstcase_neon: 76.2 vc1dsp.vc1_h_loop_filter8_bestcase_c: 41.0 vc1dsp.vc1_h_loop_filter8_bestcase_neon: 75.0 vc1dsp.vc1_h_loop_filter8_worstcase_c: 294.0 vc1dsp.vc1_h_loop_filter8_worstcase_neon: 102.7 vc1dsp.vc1_h_loop_filter16_bestcase_c: 54.7 vc1dsp.vc1_h_loop_filter16_bestcase_neon: 130.0 vc1dsp.vc1_h_loop_filter16_worstcase_c: 569.7 vc1dsp.vc1_h_loop_filter16_worstcase_neon: 186.7 vc1dsp.vc1_v_loop_filter4_bestcase_c: 20.2 vc1dsp.vc1_v_loop_filter4_bestcase_neon: 47.2 vc1dsp.vc1_v_loop_filter4_worstcase_c: 164.2 vc1dsp.vc1_v_loop_filter4_worstcase_neon: 68.5 vc1dsp.vc1_v_loop_filter8_bestcase_c: 43.5 vc1dsp.vc1_v_loop_filter8_bestcase_neon: 55.2 vc1dsp.vc1_v_loop_filter8_worstcase_c: 316.2 vc1dsp.vc1_v_loop_filter8_worstcase_neon: 72.7 vc1dsp.vc1_v_loop_filter16_bestcase_c: 62.2 vc1dsp.vc1_v_loop_filter16_bestcase_neon: 103.7 vc1dsp.vc1_v_loop_filter16_worstcase_c: 646.5 vc1dsp.vc1_v_loop_filter16_worstcase_neon: 110.7 Signed-off-by: Ben Avison <bavison@riscosopen.org> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-04-01 10:03:33 +03:00
Martin Storsjö	a78f136f3f	configure: Use a separate config_components.h header for $ALL_COMPONENTS This avoids unnecessary rebuilds of most source files if only the list of enabled components has changed, but not the other properties of the build, set in config.h. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-03-16 14:12:49 +02:00
J. Dekker	7fc6015de9	Revert "arm: hevc_qpel: Fix the assembly to work with non-multiple of 8 widths" This reverts commit `2589060b92` which was originally to fix the FATE test. The real cause of the test breakage was fixed in `22b7c37275`. Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-01-04 14:31:48 +01:00
J. Dekker	22b7c37275	lavc/arm: don't assign hevc_qpel functions for non-multiple of 8 widths The assembly is written assuming that the width is a multiple of 8. However the real issue is the functions were erroneously assigned to the 2, 4, 6 & 12 widths. This behaviour never broke the decoder as samples which trigger the functions for these widths have not been found in the wild. This relies on the mappings in ff_hevc_pel_weight[]. Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-01-04 14:31:32 +01:00
Martin Storsjö	2d5a7f6d00	arm/aarch64: Improve scheduling in the avg form of h264_qpel Don't use the loaded registers directly, avoiding stalls on in order cores. Use vrhadd.u8 with q registers where easily possible. Signed-off-by: Martin Storsjö <martin@martin.st>	2021-10-18 14:27:36 +03:00
Martin Storsjö	2589060b92	arm: hevc_qpel: Fix the assembly to work with non-multiple of 8 widths This unbreaks the fate-checkasm-hevc_pel test on arm targets. The assembly assumed that the width passed to the DSP functions is a multiple of 8, while the checkasm test used other widths too. This wasn't noticed before, because the hevc_pel checkasm tests (that were added in `9c513edb79` in January) weren't run as part of fate until in `b492cacffd` in August. As this hasn't been an issue in practice with actual full decoding tests, it seems like the actual decoder doesn't call these functions with such widths. Therefore, we could alternatively fix the test to only test things that the real decoder does, and this modification could be reverted. Signed-off-by: Martin Storsjö <martin@martin.st>	2021-08-25 23:24:49 +03:00
Andreas Rheinhardt	afc95a10ac	avcodec/h264dsp, h264idct: Fix lengths of array parameters Fixes many -Warray-parameter warnings from GCC 11. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-08-08 17:44:57 +02:00
Andreas Rheinhardt	7c1f347b18	avcodec: Remove deprecated old encode/decode APIs Deprecated in commits `7fc329e2dd` and `31f6a4b4b8`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> Signed-off-by: James Almer <jamrial@gmail.com>	2021-04-27 10:43:12 -03:00
Andreas Rheinhardt	f3c197b129	Include attributes.h directly Some files currently rely on libavutil/cpu.h to include it for them; yet said file won't include it any more after the currently deprecated functions are removed, so include attributes.h directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2021-04-19 14:34:10 +02:00
James Almer	f1a894f9d3	avcodec: add missing FF_API_OLD_ENCDEC wrappers to xmm clobber functions Signed-off-by: James Almer <jamrial@gmail.com>	2021-02-26 19:26:31 -03:00
Lynne	151b41c8cc	fft: remove 16-bit FFT and MDCT code No longer used by anything. Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's simply too much work for code meant to be all removed anyway.	2021-01-14 01:44:21 +01:00
Lynne	9e05421dbe	ac3enc_fixed: drop unnecessary fixed-point DSP code	2021-01-14 01:44:20 +01:00
Anton Khirnov	e15371061d	lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump They are not properly namespaced and not intended for public use.	2021-01-01 14:14:57 +01:00
Anton Khirnov	c8c2dfbc37	lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h That is a more appropriate place for it.	2021-01-01 14:11:01 +01:00
Martin Storsjö	b252178321	libavcodec: arm: Add a NEON implementation of pixblockdsp Cortex A7 A8 A9 A53 A72 get_pixels_c: 144.7 146.0 143.0 137.7 69.0 get_pixels_armv6: 112.0 106.7 90.2 95.0 72.5 get_pixels_neon: 69.0 29.7 68.7 40.2 19.0 get_pixels_unaligned_c: 144.7 146.2 143.0 137.7 69.0 get_pixels_unaligned_neon: 77.0 36.5 72.5 48.5 19.0 diff_pixels_c: 376.7 319.7 265.5 307.7 148.0 diff_pixels_armv6: 179.0 159.5 205.5 139.0 142.0 diff_pixels_neon: 69.0 40.2 77.5 53.2 26.0 diff_pixels_unaligned_c: 376.7 319.7 265.5 307.7 148.0 diff_pixels_unaligned_neon: 85.0 54.5 93.5 66.7 26.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2020-05-15 23:37:43 +03:00
qoroliang	cacdac819f	lavc/hevcdec: fix the HEVC decoder crash when memory over-read Fix an occasional crash in the HEVC decoder on 32-bit ARM platforms. The root cause is a memory over-read (reading across the memory boundary) in the SAO NEON functions ff_hevc_sao_band_filter_neon_8 and ff_hevc_sao_edge_filter_neon_8. After this fix, the crash no longer appears in the massive Android phone test. Signed-off-by: qoroliang <qoroliang@tencent.com>	2020-04-20 10:28:04 +08:00
Aman Gupta	0e49560806	avcodec/arm/mlpdsp: add missing dependency for truehd Signed-off-by: Aman Gupta <aman@tmm1.net>	2019-11-11 11:29:55 -08:00
James Almer	47e12966b7	Merge commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff' * commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff': arm: Implement a NEON version of 422 h264_h_loop_filter_chroma Merged-by: James Almer <jamrial@gmail.com>	2019-03-22 16:06:04 -03:00
Martin Storsjö	0676de935b	arm: Implement a NEON version of 422 h264_h_loop_filter_chroma Previously, the 420 version was used even for 422. This fixes occasional checkasm failures. Signed-off-by: Martin Storsjö <martin@martin.st>	2019-03-21 22:03:46 +02:00
James Almer	d6b62ce1ac	Merge commit 'cef914e08310166112ac09567e66452a7679bfc8' * commit 'cef914e08310166112ac09567e66452a7679bfc8': arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 Merged-by: James Almer <jamrial@gmail.com>	2019-03-14 16:19:41 -03:00
James Almer	7b9ca44cbc	arm/h264dsp: change loop filter stride argument to ptrdiff_t This was missed in `d5d699ab6e` Signed-off-by: James Almer <jamrial@gmail.com>	2019-02-20 19:38:55 -03:00
Martin Storsjö	cef914e083	arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 This makes it similar to put_epel16_v6, and gives a 10-25% speedup of this function. Before: Cortex A7 A8 A9 A53 A72 vp8_put_epel16_h6v6_neon: 3058.0 2218.5 2459.8 2183.0 1572.2 After: vp8_put_epel16_h6v6_neon: 2670.8 1934.2 2244.4 1729.4 1503.9 Signed-off-by: Martin Storsjö <martin@martin.st>	2019-02-19 11:46:18 +02:00
Meng Wang	3b2fd96048	avcodec/arm/hevcdsp_sao : add NEON optimization for sao Signed-off-by: Meng Wang <wangmeng.kids@bytedance.com> Reviewed-by: Shengbin Meng <shengbinmeng@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-04-09 03:45:15 +02:00
Martin Storsjö	5f83935de4	arm: hevcdsp: Add commas between macro arguments When targeting darwin, clang requires commas between arguments, while the no-comma form is allowed for other targets. Since Xcode 9.3, the bundled clang supports altmacro and doesn't require using gas-preprocessor any longer. Signed-off-by: Martin Storsjö <martin@martin.st>	2018-03-31 21:59:01 +03:00
Martin Storsjö	6660bc034d	arm: hevcdsp: Avoid using macro expansion counters Clang supports the macro expansion counter (used for making unique labels within macro expansions), but not when targeting darwin. Convert uses of the counter into normal local labels, as used elsewhere. Since Xcode 9.3, the bundled clang supports altmacro and doesn't require using gas-preprocessor any longer. Signed-off-by: Martin Storsjö <martin@martin.st>	2018-03-31 21:55:32 +03:00
James Almer	a7109b82c4	Merge commit 'ab05d3934de8e932dbd77979a687e6598e67535c' * commit 'ab05d3934de8e932dbd77979a687e6598e67535c': arm: vc1dsp: Add commas between macro arguments Merged-by: James Almer <jamrial@gmail.com>	2018-03-30 15:47:31 -03:00
Martin Storsjö	ab05d3934d	arm: vc1dsp: Add commas between macro arguments When targeting darwin, clang requires commas between arguments, while the no-comma form is allowed for other targets. Since Xcode 9.3, the bundled clang supports altmacro and doesn't require using gas-preprocessor any longer. Signed-off-by: Martin Storsjö <martin@martin.st>	2018-03-30 15:47:24 +03:00
Aurelien Jacobs	f677718bc8	sbcenc: add armv6 and neon asm optimizations This was originally based on libsbc, and was fully integrated into ffmpeg.	2018-03-07 22:26:53 +01:00
Michael Niedermayer	7dbbb75ee3	avcodec/arm/sbrdsp_neon: Use a free register instead of putting 2 things in one Fixes high pitched shriek Fixes: 25420848_1478428308873746_4255813235963330560_n.mp4 Reported-by: Dale Curtis <dalecurtis@google.com> Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2018-01-12 22:45:02 +01:00
James Almer	36de24d5b7	arm/hevc_idct: fix compilation on Android Compilation error "out of range" fixed for armeabi-v7a. Compilation failed when trying to build libvlc.aar for ARM7 Android on an Ubuntu 16.04 host. The error message is "Offset out of range". The reason for the error is that the assembler LDR directives in function "ff_hevc_transform_luma_4x4_neon_8" need local storage in range <1k, but no such storage is provided. Based on a patch by Ihor Bobalo <bob@eleks.com> Suggested-by: wbs Signed-off-by: James Almer <jamrial@gmail.com>	2017-12-09 21:46:34 +02:00
Alexandra Hájková	7993ec19af	hevc: Add hevc_get_pixel_4/8/12/16/24/32/48/64 Checkasm timings: block size bitdepth C NEON 4 8 bit: 146.7 48.7 10 bit: 146.7 52.7 8 8 bit: 430.3 84.4 10 bit: 430.4 119.5 12 8 bit: 812.8 141.0 10 bit: 812.8 195.0 16 8 bit: 1499.1 268.0 10 bit: 1498.9 368.4 24 8 bit: 4394.2 574.8 10 bit: 3696.3 804.8 32 8 bit: 5108.6 568.9 10 bit: 4249.6 918.8 48 8 bit: 16819.6 2304.9 10 bit: 13882.0 3178.5 64 8 bit: 13490.8 1799.5 10 bit: 11018.5 2519.4 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-12-08 23:41:01 +02:00
James Almer	68e479e3ad	Merge commit 'b487add7ecf78efda36d49815f8f8757bd24d4cb' * commit 'b487add7ecf78efda36d49815f8f8757bd24d4cb': arm: Remove a redundant check in fmtconvert_init_arm.c Merged-by: James Almer <jamrial@gmail.com>	2017-11-11 23:30:31 -03:00
James Almer	640073eceb	Merge commit '9dde6ab06c48f9447cd16f39bee33569cddb7be4' * commit '9dde6ab06c48f9447cd16f39bee33569cddb7be4': arm: Fix SIGBUS on ARM when compiled with binutils 2.29 Merged-by: James Almer <jamrial@gmail.com>	2017-11-11 13:44:07 -03:00
James Almer	921993503b	Merge commit 'd7320ca3ed10f0d35b3740fa03341161e74275ea' * commit 'd7320ca3ed10f0d35b3740fa03341161e74275ea': arm: Avoid using .dn register aliases Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 21:00:51 -03:00
James Almer	62d86c41b7	Merge commit 'ce080f47b8b55ab3d41eb00487b138d9906d114d' * commit 'ce080f47b8b55ab3d41eb00487b138d9906d114d': hevc: Add NEON 32x32 IDCT Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 19:59:01 -03:00
James Almer	e9e7e1cc6b	Merge commit '118dd4a321a2d67f67c21b076abd0b4d939ab642' * commit '118dd4a321a2d67f67c21b076abd0b4d939ab642': hevc: 16x16 NEON idct: Use the right element size for loads/stores Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 19:56:29 -03:00
James Almer	31a4112936	Merge commit 'edbf0fffb15dde7a1de70b05855529d5fc769f14' * commit 'edbf0fffb15dde7a1de70b05855529d5fc769f14': hevc: Add NEON add_residual for bitdepth 10 Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 18:07:31 -03:00
James Almer	05beee44c6	Merge commit 'e1c2453a4fac1f7116244d0d05310935c20887e6' * commit 'e1c2453a4fac1f7116244d0d05310935c20887e6': arm: hevc_idct: Tune the add_res_8x8 and add_res_32x32 functions Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 17:41:08 -03:00
James Almer	999c2271a5	Merge commit '0d4d43513786f1df4d561e1fac924fb0722c6700' * commit '0d4d43513786f1df4d561e1fac924fb0722c6700': hevc: Add NEON add_residual for bitdepth 8 See `03cecf45c1` Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 17:39:37 -03:00
James Almer	f9c3fbc00c	Merge commit '3d69dd65c6771c28d3bf4e8e53a905aa8cd01fd9' * commit '3d69dd65c6771c28d3bf4e8e53a905aa8cd01fd9': hevc: Add support for bitdepth 10 for IDCT DC Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 16:03:27 -03:00
James Almer	cc8c2d3609	Merge commit '358adef0305618219522858e471edf7e0cb4043e' * commit '358adef0305618219522858e471edf7e0cb4043e': hevc: Add NEON IDCT DC functions for bitdepth 8 See `03cecf45c1` Merged-by: James Almer <jamrial@gmail.com>	2017-10-30 15:58:40 -03:00
James Almer	9840ca70e7	Merge commit '89d9869d2491d4209d707a8e7f29c58227ae5a4e' * commit '89d9869d2491d4209d707a8e7f29c58227ae5a4e': hevc: Add NEON 16x16 IDCT Merged-by: James Almer <jamrial@gmail.com>	2017-10-27 18:22:39 -03:00
James Almer	c0683dce89	Merge commit '0b9a237b2386ff84a6f99716bd58fa27a1b767e7' * commit '0b9a237b2386ff84a6f99716bd58fa27a1b767e7': hevc: Add NEON 4x4 and 8x8 IDCT [15:12:59] <@ubitux> hevc_idct_4x4_8_c: 389.1 [15:13:00] <@ubitux> hevc_idct_4x4_8_neon: 126.6 [15:13:02] <@ubitux> our ^ [15:13:06] <@ubitux> hevc_idct_4x4_8_c: 389.3 [15:13:08] <@ubitux> hevc_idct_4x4_8_neon: 107.8 [15:13:10] <@ubitux> hevc_idct_4x4_10_c: 418.6 [15:13:12] <@ubitux> hevc_idct_4x4_10_neon: 108.1 [15:13:14] <@ubitux> libav ^ [15:13:30] <@ubitux> so yeah, we can probably trash our versions here Merged-by: James Almer <jamrial@gmail.com>	2017-10-24 19:10:22 -03:00
Martin Storsjö	b487add7ec	arm: Remove a redundant check in fmtconvert_init_arm.c This was missed in `e2710e790c`, where have_vfp && !have_vfpv3 were converted into have_vfp_vm. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-10-24 09:07:01 +03:00
Martin Storsjö	9dde6ab06c	arm: Fix SIGBUS on ARM when compiled with binutils 2.29 In binutils 2.29, the behavior of the ADR instruction changed so that 1 is added to the address of a Thumb function (previously nothing was added). This allows the loaded address to be passed to a BLX instruction and the correct mode change will occur. See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458 By using adr with a label that isn't annotated as a thumb function, we avoid the new behaviour in binutils 2.29 and get the same behaviour as in prior releases, and as in other assemblers (ms armasm.exe, clang's built in assembler) - an idea that Janne Grunau came up with. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-09-02 22:18:20 +03:00
Muhammad Faiz	0780ad9c68	avcodec/rdft: remove sintable It is redundant with costable. The first half of sintable is identical with the second half of costable. The second half of sintable is negative value of the first half of sintable. The computation is changed to handle sign of sin values, in C code and ARM assembly code. Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>	2017-07-11 13:22:02 +07:00
Clément Bœsch	b12a36170b	lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis	2017-06-28 12:22:39 +02:00
Clément Bœsch	e4a27e2f2d	lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon The code originally pre-multiplied the steps by 2, causing the running sum of the h factors to drift away due to the lack of precision. It quickly causes an inaccuracy > 0.01. I tried diverse approaches such as multiplying by 2.0 (instead of adding the value itself) without success. I'm unable to bench the impact of this change, feel free to compare. This commit fixes the incoming aacpsdsp tests. Following is an alternative simplified function (matching the incoming AArch64 code) that may be used: function ff_ps_stereo_interpolate_neon, export=1 vld1.32 {q0}, [r2] vld1.32 {q1}, [r3] ldr r12, [sp] vmov.f32 q8, q0 vmov.f32 q9, q1 vzip.32 q8, q0 vzip.32 q9, q1 1: vld1.32 {d4}, [r0,:64] vld1.32 {d6}, [r1,:64] vadd.f32 q8, q8, q9 vadd.f32 q0, q0, q1 vmov.f32 d5, d4 vmov.f32 d7, d6 vmul.f32 q2, q2, q8 vmla.f32 q2, q3, q0 vst1.32 {d4}, [r0,:64]! vst1.32 {d5}, [r1,:64]! subs r12, r12, #1 bgt 1b bx lr endfunc	2017-06-28 11:59:34 +02:00
Martin Storsjö	d7320ca3ed	arm: Avoid using .dn register aliases clang now (in the upcoming 5.0 version) is capable of building our arm assembly without relying on gas-preprocessor, although clang/LLVM doesn't support .dn register aliases. The VC1 MC assembly was only built and used if the chosen assembler supported the .dn directives though. This was supported as long as gas-preprocessor was used. This means that VC1 decoding got a speed regression on clang 5.0, unless the user manually chose using gas-preprocessor again. By avoiding using the .dn register aliases, we can build the VC1 MC assembly with the latest clang version. Support for the .dn/.qn directives in clang/LLVM isn't actively planned, see https://bugs.llvm.org/show_bug.cgi?id=18199. This partially reverts `896a5bff64`. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-05-15 09:52:18 +03:00
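
Commit c07de58a72 (Arm 32-bit NEON deblocking filter fast paths) argues that averaging the best-case and worst-case timings gives a conservative estimate of the NEON gain. The following is a minimal sketch of that arithmetic, using only the checkasm figures quoted in that commit message. The names `TIMINGS`, `SPEEDUPS`, and `estimated_speedup` are hypothetical and introduced here for illustration; they are not part of FFmpeg or checkasm.

```python
# checkasm timings copied from commit c07de58a72 (1.5 GHz Cortex-A72).
# Each entry: (bestcase_c, bestcase_neon, worstcase_c, worstcase_neon)
TIMINGS = {
    "vc1_h_loop_filter4": (19.0, 48.5, 144.7, 76.2),
    "vc1_h_loop_filter8": (41.0, 75.0, 294.0, 102.7),
    "vc1_h_loop_filter16": (54.7, 130.0, 569.7, 186.7),
    "vc1_v_loop_filter4": (20.2, 47.2, 164.2, 68.5),
    "vc1_v_loop_filter8": (43.5, 55.2, 316.2, 72.7),
    "vc1_v_loop_filter16": (62.2, 103.7, 646.5, 110.7),
}


def estimated_speedup(best_c, best_neon, worst_c, worst_neon):
    """Hypothetical helper: ratio of the averaged C timing to the
    averaged NEON timing, where each average is the plain mean of the
    best-case and worst-case figures, as the commit message suggests."""
    avg_c = (best_c + worst_c) / 2
    avg_neon = (best_neon + worst_neon) / 2
    return avg_c / avg_neon


# One estimated speedup factor per filter function.
SPEEDUPS = {name: estimated_speedup(*t) for name, t in TIMINGS.items()}

if __name__ == "__main__":
    for name, factor in SPEEDUPS.items():
        print(f"{name}: {factor:.2f}x")
```

A factor above 1 means the NEON path is faster under this estimate. The equal weighting of best and worst case is the commit's simplification; the actual mix of code paths depends on the stream.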


938 Commits