FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Clément Bœsch	100026bed6	Merge commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7' * commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7': configure: Simplify clock_gettime() test nanosleep check also updated. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 11:04:50 +01:00
Clément Bœsch	38343651a8	Merge commit '3aa9d37d03da3c9b482d19b3988659287815280e' * commit '3aa9d37d03da3c9b482d19b3988659287815280e': build: Fix directory dependencies of tests/pixfmts.mak target This might not be necessary given our mkdirs in the configure, but it probably doesn't hurt. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 11:01:02 +01:00
Clément Bœsch	4ae80c3753	Merge commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff' * commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff': configure: Fix --disable-pod2man / --disable-texi2html This commit is a noop, we have dedicated documentation option for this purpose. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 10:47:01 +01:00
Clément Bœsch	d0db00c808	configure: remove pod2man from the config list The configure has the --disable-manpages option for this purpose, and --disable-pod2man is currently ignored due to that. This is also consistent with the other documentation options.	2017-03-20 10:45:48 +01:00
Clément Bœsch	715f781834	Merge commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd' * commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd': configure: Simplify libopenjpeg check This commit is a noop, our libopenjpeg check is already "simpler". Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 09:48:22 +01:00
Clément Bœsch	6d6f79c737	Merge commit '2610c9528f86286e4c6e174411a26ff5b4815cde' * commit '2610c9528f86286e4c6e174411a26ff5b4815cde': configure: Move initial VAAPI check to a more sensible place This commit is a noop, see `17989dcf54` Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 09:46:33 +01:00
Clément Bœsch	7317b69630	Merge commit '5b5ed92d92252a685e891a5d636870e223b63228' * commit '5b5ed92d92252a685e891a5d636870e223b63228': sanm: Change type of array pitch parameters to ptrdiff_t Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 09:43:52 +01:00
Clément Bœsch	64926292a6	lavc/copy_block: style fix	2017-03-20 09:23:15 +01:00
Clément Bœsch	21c18b0878	Merge commit '73f5e17a203713c4ac4e5a821809823b383b195f' * commit '73f5e17a203713c4ac4e5a821809823b383b195f': copy_block: Change type of array stride parameters to ptrdiff_t Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 09:22:36 +01:00
Clément Bœsch	e59d8d030f	Merge commit '21e500ba647aec233d5930d3d1081489d0d53ceb' * commit '21e500ba647aec233d5930d3d1081489d0d53ceb': svq1dec: Change type of array pitch parameters to ptrdiff_t Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 09:17:34 +01:00
Clément Bœsch	bb3ad401fc	Merge commit '746c56b7730ce09397d3a8354acc131285e9d829' * commit '746c56b7730ce09397d3a8354acc131285e9d829': indeo: Change type of array pitch parameters to ptrdiff_t Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 09:07:57 +01:00
Clément Bœsch	3835283293	Merge commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08' * commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08': Drop memalign hack Merged, as this may indeed be unneeded since `46e3936fb0`. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:54:44 +01:00
Clément Bœsch	a5cf6628d6	Merge commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0' * commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0': hwcontext_dxva2: use the special UC copy for downloading frames Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:37:40 +01:00
Clément Bœsch	8200b16a9c	Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5' * commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5': imgutils: add a function for copying image data from GPU mapped memory Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:34:10 +01:00
Clément Bœsch	5d23543277	Merge commit '24da430324735f95880c4a4a54298dc8023125bb' * commit '24da430324735f95880c4a4a54298dc8023125bb': Changelog: mark the release 12 branch This commit is a noop. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:26:09 +01:00
Clément Bœsch	518961bc99	Merge commit '851960f6f8cf1f946fe42fa36cf6598fac68072c' * commit '851960f6f8cf1f946fe42fa36cf6598fac68072c': lavc: Remove old vaapi decode infrastructure avconv_vaapi: Convert to use hw_frames_ctx only vaapi_mpeg4: Convert to use the new VAAPI hwaccel code vaapi_vc1: Convert to use the new VAAPI hwaccel code vaapi_mpeg2: Convert to use the new VAAPI hwaccel code vaapi_h264: Convert to use the new VAAPI hwaccel code lavc: Rewrite VAAPI decode infrastructure This merge is a noop, these commits have already been cherry-picked. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:25:01 +01:00
Clément Bœsch	464fcc979c	Merge commit '72eba6558ee4f10239ba3f472c0b033ec70082a7' * commit '72eba6558ee4f10239ba3f472c0b033ec70082a7': wmavoice: Simplify GetBitContext initialization This commit is a noop. We don't have that code anymore since `3deb4b54a2`. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:21:09 +01:00
Clément Bœsch	e514a1d404	Merge commit '80fc75d51e3312e1890591048eb6a3d499b6e49d' * commit '80fc75d51e3312e1890591048eb6a3d499b6e49d': Changelog: Mention mov with multiple stsd Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:19:03 +01:00
Clément Bœsch	45982bdcd0	Merge commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99' * commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99': High Definition Compatible Digital (HDCD) decoder filter, using libhdcd This commit is a noop, we have that code natively. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:17:09 +01:00
Clément Bœsch	b1a80bdb62	Merge commit '95f80293456d9d4b1b096621260c38bc90325ec0' * commit '95f80293456d9d4b1b096621260c38bc90325ec0': avprobe: Fix memory leak This commit is a noop, ffprobe is not affected. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:12:57 +01:00
Clément Bœsch	5e5e793552	doc/APIchanges: fill date & hash for AV_PIX_FMT_FLAG_BAYER	2017-03-20 08:10:54 +01:00
Clément Bœsch	6557d784d2	Merge commit '8db804e8f549d5b86a1edf62736e0ef80f160da9' * commit '8db804e8f549d5b86a1edf62736e0ef80f160da9': mov: Remove old b-frame/video delay heuristic This commit is a noop, see `425be3c810` Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:09:15 +01:00
Clément Bœsch	64722057b4	Merge commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff' * commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff': mov: Remove ancient heuristic hack This commit is a noop, see `04f8d31287` Merged-by: Clément Bœsch <u@pkh.me>	2017-03-20 08:08:31 +01:00
Clément Bœsch	e811f84a2e	swscale: cosmetics in is{RGB,BGR}inInt Reduce diff with Libav.	2017-03-20 08:02:30 +01:00
Clément Bœsch	d6635daded	swscale: remove unused is{RGB,BGR}inBytes	2017-03-20 08:02:30 +01:00
Clément Bœsch	ff6bc16c5a	swscale: use a (more correct) function for isPacked	2017-03-20 08:02:30 +01:00
Clément Bœsch	2b9a52bcca	swscale: use a function for isAnyRGB	2017-03-20 08:02:30 +01:00
Clément Bœsch	c30875e8b2	swscale: use a function for isBayer	2017-03-20 08:02:30 +01:00
Clément Bœsch	9c2436e1e7	lavu: add AV_PIX_FMT_FLAG_BAYER	2017-03-20 08:02:30 +01:00
Clément Bœsch	f052b1b40f	swscale: use a function for isGray	2017-03-20 08:02:30 +01:00
Clément Bœsch	08e1376d81	fate: add fate-sws-pixdesc-query Test the pixel format querying within libswscale.	2017-03-20 08:02:30 +01:00
Michael Niedermayer	23f3f92361	avcodec/mjpegdec: quant_matrixes can be up to 65535, use uint16_t Fixes invalid shift Fixes: 870/clusterfuzz-testcase-5649105424482304 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-20 01:38:04 +01:00
Michael Niedermayer	656a17e126	avcodec/mjpegdec: Check quant_matrixes values for being non zero Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-20 01:38:02 +01:00
Michael Niedermayer	98da63b3f5	avcodec/vp56: Check avctx->error_concealment before enabling EC Fixes timeout with 847/clusterfuzz-testcase-5291877358108672 Fixes timeout with 850/clusterfuzz-testcase-5721296509861888 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-20 01:33:08 +01:00
Michael Niedermayer	a84d610b37	avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int' Fixes: 864/clusterfuzz-testcase-4774385942528000 See: [FFmpeg-devel] [PATCH 1/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: 2147483647 - -14133 cannot be represented in type 'int' See: [FFmpeg-devel] [PATCH 2/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int' Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-20 01:33:08 +01:00
Michael Niedermayer	5d996b5649	avcodec/tiff: Check stripsize strippos for overflow Fixes: 861/clusterfuzz-testcase-5688284384591872 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-20 01:33:08 +01:00
Martin Storsjö	61b8a9ea29	aarch64: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 21512 bytes to 31400 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: vp9_inv_dct_dct_16x16_sub1_add_10_neon: 284.6 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 1902.7 vp9_inv_dct_dct_16x16_sub4_add_10_neon: 1903.0 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 2201.1 vp9_inv_dct_dct_16x16_sub12_add_10_neon: 2510.0 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 2821.3 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1011.6 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 9716.5 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 9704.9 vp9_inv_dct_dct_32x32_sub8_add_10_neon: 10641.7 vp9_inv_dct_dct_32x32_sub12_add_10_neon: 11555.7 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 12499.8 vp9_inv_dct_dct_32x32_sub20_add_10_neon: 13403.7 vp9_inv_dct_dct_32x32_sub24_add_10_neon: 14335.8 vp9_inv_dct_dct_32x32_sub28_add_10_neon: 15253.6 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 16179.5 After: vp9_inv_dct_dct_16x16_sub1_add_10_neon: 282.8 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 1142.4 vp9_inv_dct_dct_16x16_sub4_add_10_neon: 1139.0 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 1772.9 vp9_inv_dct_dct_16x16_sub12_add_10_neon: 2515.2 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 2823.5 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1012.7 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 6944.4 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 6944.2 vp9_inv_dct_dct_32x32_sub8_add_10_neon: 7609.8 vp9_inv_dct_dct_32x32_sub12_add_10_neon: 9953.4 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 10770.1 vp9_inv_dct_dct_32x32_sub20_add_10_neon: 13418.8 vp9_inv_dct_dct_32x32_sub24_add_10_neon: 14330.7 vp9_inv_dct_dct_32x32_sub28_add_10_neon: 15257.1 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 16190.6 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:37 +02:00
Martin Storsjö	eabc5abf94	arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 14516 bytes to 22484 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_10_neon: 454.0 270.7 418.5 295.4 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 3840.2 3244.8 3700.1 2337.9 vp9_inv_dct_dct_16x16_sub4_add_10_neon: 4212.5 3575.4 3996.9 2571.6 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 5174.4 4270.5 4615.5 3031.9 vp9_inv_dct_dct_16x16_sub12_add_10_neon: 5676.0 4908.5 5226.5 3491.3 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 6403.9 5589.0 5839.8 3948.5 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1710.7 944.7 1582.1 1045.4 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 21040.7 16706.1 18687.7 13193.1 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22197.7 18282.7 19577.5 13918.6 vp9_inv_dct_dct_32x32_sub8_add_10_neon: 24511.5 20911.5 21472.5 15367.5 vp9_inv_dct_dct_32x32_sub12_add_10_neon: 26939.5 24264.3 23239.1 16830.3 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 29419.5 26845.1 25020.6 18259.9 vp9_inv_dct_dct_32x32_sub20_add_10_neon: 31146.4 29633.5 26803.3 19721.7 vp9_inv_dct_dct_32x32_sub24_add_10_neon: 33376.3 32507.8 28642.4 21174.2 vp9_inv_dct_dct_32x32_sub28_add_10_neon: 35629.4 35439.6 30416.5 22625.7 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37269.9 37914.9 32271.9 24078.9 After: vp9_inv_dct_dct_16x16_sub1_add_10_neon: 454.0 276.0 418.5 295.1 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 2336.2 1886.0 2251.0 1458.6 vp9_inv_dct_dct_16x16_sub4_add_10_neon: 2531.0 2054.7 2402.8 1591.1 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 3848.6 3491.1 3845.7 2554.8 vp9_inv_dct_dct_16x16_sub12_add_10_neon: 5703.8 4831.6 5230.8 3493.4 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 6399.5 5567.0 5832.4 3951.5 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1722.1 938.5 1577.3 1044.5 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 15003.5 11576.8 13105.8 9602.2 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 15768.5 12677.2 13726.0 10138.1 vp9_inv_dct_dct_32x32_sub8_add_10_neon: 17278.8 14825.4 14907.5 11185.7 vp9_inv_dct_dct_32x32_sub12_add_10_neon: 22335.7 21544.5 20379.5 15019.8 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 24165.6 23881.7 21938.6 16308.2 vp9_inv_dct_dct_32x32_sub20_add_10_neon: 31082.2 30860.9 26835.3 19711.3 vp9_inv_dct_dct_32x32_sub24_add_10_neon: 33102.6 31922.8 28638.3 21161.0 vp9_inv_dct_dct_32x32_sub28_add_10_neon: 35104.9 34867.5 30411.7 22621.2 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37438.1 39103.4 32217.8 24067.6 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:33 +02:00
Martin Storsjö	d564c9018f	aarch64: vp9itxfm16: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:30 +02:00
Martin Storsjö	0f2705e66b	aarch64: vp9itxfm16: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from 26288 to 21512 bytes. This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms. Before: vp9_inv_dct_dct_16x16_sub4_add_10_neon: 1887.4 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 2801.5 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 9691.4 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 16154.9 After: vp9_inv_dct_dct_16x16_sub4_add_10_neon: 1899.5 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 2827.2 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 9714.7 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 16175.9 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:26 +02:00
Martin Storsjö	0ea603203d	arm: vp9itxfm16: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from 17500 to 14516 bytes. This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub4_add_10_neon: 4237.4 3561.5 3971.8 2525.3 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 6371.9 5452.0 5779.3 3910.5 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22068.8 17867.5 19555.2 13871.6 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37268.9 38684.2 32314.2 23969.0 After: vp9_inv_dct_dct_16x16_sub4_add_10_neon: 4375.1 3571.9 4283.8 2567.2 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 6415.6 5578.9 5844.6 3948.3 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22653.7 18079.7 19603.7 13905.3 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37593.2 38862.2 32235.8 24070.9 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:19 +02:00
Martin Storsjö	b76533f105	aarch64: vp9itxfm16: Restructure the idct32 store macros This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:15 +02:00
Martin Storsjö	d613251622	aarch64: vp9itxfm16: Avoid .irp when it doesn't save any lines This makes the code a bit more readable. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:09 +02:00
Martin Storsjö	25ced1eb1c	aarch64: vp9itxfm16: Fix a typo in a comment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:54:02 +02:00
Martin Storsjö	32e273c111	arm: vp9itxfm16: Avoid reloading the idct32 coefficients Keep the idct32 coefficients in narrow form in q6-q7, and idct16 coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering q0-q3 in the pass1 function, and squeeze the idct16 coefficients into q0-q1 in the pass2 function to avoid reloading them. The idct16 coefficients are clobbered and reloaded within idct32_odd though, since that turns out to be faster than narrowing them and swapping them into q6-q7. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22653.8 18268.4 19598.0 14079.0 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37699.0 38665.2 32542.3 24472.2 After: vp9_inv_dct_dct_32x32_sub4_add_10_neon: 22270.8 18159.3 19531.0 13865.0 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 37523.3 37731.6 32181.7 24071.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:53:57 +02:00
Martin Storsjö	c1619318e5	arm: vp9itxfm16: Fix vertical alignment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:53:48 +02:00
Martin Storsjö	b46d37e93a	arm: vp9itxfm16: Use the right lane size This makes the code slightly clearer, but doesn't make any functional difference. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:53:43 +02:00
Martin Storsjö	21c89f3a26	arm/aarch64: vp9: Fix vertical alignment Align the second/third operands as they usually are. Due to the wildly varying sizes of the written out operands in aarch64 assembly, the column alignment is usually not as clear as in arm assembly. This is cherrypicked from libav commit `7995ebfad1`. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:53:32 +02:00
Martin Storsjö	70317b25aa	arm/aarch64: vp9itxfm: Skip loading the min_eob pointer when it won't be used In the half/quarter cases where we don't use the min_eob array, defer loading the pointer until we know it will be needed. This is cherrypicked from libav commit `3a0d5e206d`. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:53:28 +02:00
Martin Storsjö	b7a565fe71	arm: vp9itxfm: Template the quarter/half idct32 function This reduces the number of lines and reduces the duplication. Also simplify the eob check for the half case. If we are in the half case, we know we at least will need to do the first three slices, we only need to check eob for the fourth one, so we can hardcode the value to check against instead of loading from the min_eob array. Since at most one slice can be skipped in the first pass, we can unroll the loop for filling zeros completely, as it was done for the quarter case before. This allows skipping loading the min_eob pointer when using the quarter/half cases. This is cherrypicked from libav commit `98ee855ae0`. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-19 22:53:22 +02:00

84672 Commits