./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
-s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
32-bit mul, power8 only.
~2x speedup:
rgb24
24431 UNITS in yuv2packed2, 16384 runs, 0 skips
13783 UNITS in yuv2packed2, 16383 runs, 1 skips
bgr24
24396 UNITS in yuv2packed2, 16384 runs, 0 skips
14059 UNITS in yuv2packed2, 16384 runs, 0 skips
rgba
26815 UNITS in yuv2packed2, 16383 runs, 1 skips
12797 UNITS in yuv2packed2, 16383 runs, 1 skips
bgra
27060 UNITS in yuv2packed2, 16384 runs, 0 skips
13138 UNITS in yuv2packed2, 16384 runs, 0 skips
argb
26998 UNITS in yuv2packed2, 16384 runs, 0 skips
12728 UNITS in yuv2packed2, 16381 runs, 3 skips
bgra
26651 UNITS in yuv2packed2, 16384 runs, 0 skips
13124 UNITS in yuv2packed2, 16384 runs, 0 skips
This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
153372 UNITS in postfilter_c, 65536 runs, 0 skips
73164 UNITS in postfilter_neon, 65536 runs, 0 skips -> 2.1x speedup
80591 UNITS in deemphasis_c, 131072 runs, 0 skips
43969 UNITS in deemphasis_neon, 131072 runs, 0 skips -> 1.83x speedup
Total decoder speedup: ~15% on a Raspberry Pi 3 (from 28.1x to 33.5x realtime)
Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;
for (int i = 0; i < len; i += 4) {
y[0] = x[0] + c1*state;
y[1] = x[1] + c2*state + c1*x[0];
y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
state = y[3];
y += 4;
x += 4;
}
Unlike the x86 version, duplication is used instead of pslldq so
the structure and tables are different.
The `opencl_get_plane_format` function was incorrectly determining the
value used to set the image channel order. This resulted in all RGB
pixel formats being set to the `CL_RGBA` pixel format, regardless of
whether or not they actually *were* RGBA.
This patch fixes the issue by using the `offset` and depth of components
rather than the loop index to determine the value of `order`.
Signed-off-by: Jarek Samic <cldfire3@gmail.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Fix memory leak after write trailer for #7827, only store a audio
packet whose buffer has size greater than zero in cur_audio_pkt.
Audio packets with size zero, but with side-data currently lead to
memleaks, in the Matroska muxer, because they are not properly freed:
They are currently put into an AVPacket in the MatroskaMuxContext to
ensure that the necessary audio is always available for a new cluster,
but are only written and freed when their size is > 0.
As the only use we have for such packets consists in updating the
CodecPrivate it makes no sense to store these packets at all and this
is how this commit solves the memleak.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Otherwise, AV1 encodes with FFmpeg trigger use-of-uninitialized-value
warnings under MemorySanitizer, and the output buffer potentially
changes from run to run.
Signed-off-by: James Almer <jamrial@gmail.com>
When asetnsamples uses output samples < input samples, remaining samples build up in the fifo over time.
Fix this by marking the filter as ready again if there are enough samples.
Regression since ef3babb2c7
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
-s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
32-bit mul, power8 only.
1.8-2.3x speedup:
rgb24
18192 UNITS in yuv2packed1, 32767 runs, 1 skips
9983 UNITS in yuv2packed1, 32760 runs, 8 skips
bgr24
18665 UNITS in yuv2packed1, 32766 runs, 2 skips
9925 UNITS in yuv2packed1, 32763 runs, 5 skips
rgba
20239 UNITS in yuv2packed1, 32767 runs, 1 skips
8794 UNITS in yuv2packed1, 32759 runs, 9 skips
bgra
20354 UNITS in yuv2packed1, 32768 runs, 0 skips
8770 UNITS in yuv2packed1, 32761 runs, 7 skips
argb
20185 UNITS in yuv2packed1, 32768 runs, 0 skips
8761 UNITS in yuv2packed1, 32761 runs, 7 skips
bgra
20360 UNITS in yuv2packed1, 32766 runs, 2 skips
8759 UNITS in yuv2packed1, 32764 runs, 4 skips
This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
Fixes ticket #4519.
The metadata starting at 0xe00004 is encrypted
with the password "meta" but zlib does not
support decryption, so no kux metadata is read.
There is a calculation error in xcbgrab_reposition() that breaks
vertical repositioning on follow_mouse. It made the bottom
reposition occur when moving the mouse lower than N pixels after
the capture bottom edge, instead of before.
This commit fixes the calculation to match the documentation.
follow_mouse: centered or number of pixels. The documentation says:
When it is specified with "centered", the grabbing region follows
the mouse pointer and keeps the pointer at the center of region;
otherwise, the region follows only when the mouse pointer reaches
within PIXELS (greater than zero) to the edge of region.
Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself
Fixes: 13999/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AGM_fuzzer-5644405991538688
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The earlier code had three flaws:
1. The case of an unknown-sized element inside a finite-sized element
(which is against the specifications) was not caught.
2. The error message wasn't helpful: It compared the length of the child
with the offset of the end of the parent and claimed that the first
exceeds the latter, although that is not necessarily true.
3. Unknown-sized elements that are not parsed can't be skipped. Given
that according to the Matroska specifications only the segment and the
clusters can be of unknown-size, this is handled by not allowing any
other units to have infinite size whereas the earlier code would seek
back by 1 byte upon encountering an infinite-size element that ought
to be skipped.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 13997/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AGM_fuzzer-5701427252428800
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
GPB is the default type, just contains forward references but the
slice_type is B slice with higher encoding efficiency than regular P
slice, but lower performance.
Add an option to allow user to set regular P slice.
Fix ticket#6870
Test data on Intel Kabylake (i7-7567U CPU @ 3.50GHz):
1. ffmpeg -hwaccel qsv -c:v h264_qsv -i bbb_sunflower_1080p_30fps_normal.mp4 -vsync passthrough
-vframes 1000 -c:v hevc_qsv -gpb 0 -bf 0 -q 25 test_gpb_off_bf0_kbl.mp4
transcoding fps: 85
encoded file size of test_gpb_off_bf0_kbl.mp4: 21960100 (bytes)
2. ffmpeg -hwaccel qsv -c:v h264_qsv -i bbb_sunflower_1080p_30fps_normal.mp4 -vsync passthrough
-vframes 1000 -c:v hevc_qsv -gpb 1 -bf 0 -q 25 test_gpb_on_bf0_kbl.mp4
transcoding fps: 79
encoded file size oftest_gpb_on_bf0_kbl.mp4: 21211449 (bytes)
In this case, enable gpb can bring about 7% performance drop but 3.4% encoding efficiency improvment.
Signed-off-by: Zhong Li <zhong.li@intel.com>
write_tmcd allows tmcd track to be created with any mode but in
mov_write_header, index for first tmcd track is only set for modes
MP4 or MOV, causing a crash if tmcd creation is attempted with other
modes.
This brings the channel order in line with that used in 32-bit mode (BGR0).
24-bit decoding is disabled by default (#ifdef ZMBV_ENABLE_24BPP), and no
prior encoders or sample videos are known to exist for this bit depth, so
I consider this change in implementation is unlikely to affect anyone.
The decision has been made in agreement with the DOSBox Development Team
(dosbox.crew@gmail.com), specifically with harekiet, who wrote the original
codec.
Always exposes low_power option for all qsv encoder, and reports a warning
if VDENC is not supported in current version of MSDK.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Zhong Li <zhong.li@intel.com>
Fixes: Out of array access
Fixes: 13984/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RSCC_fuzzer-5734128093233152
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>