mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-13 21:28:01 +02:00
Ben Avison 701e8b42e1 vc-1: Optimise parser (with special attention to ARM)
The previous implementation of the parser made four passes over each input
buffer (reduced to two if the container format already guaranteed the input
buffer corresponded to frames, such as with MKV). But these buffers are
often 200K in size, certainly enough to flush the data out of L1 cache, and
for many CPUs, all the way out to main memory. The passes were:

1) locate frame boundaries (not needed for MKV etc)
2) copy the data into a contiguous block (not needed for MKV etc)
3) locate the start codes within each frame
4) unescape the data between start codes

After this, the unescaped data was parsed to extract certain header fields,
but because the unescape operation was so large, this was usually also
effectively operating on uncached memory. Most of the unescaped data was
simply thrown away and never processed further. Only step 2 - because it
used memcpy - was using prefetch, making things even worse.

This patch reorganises these steps so that, aside from the copying, the
operations are performed in parallel, maximising cache utilisation. No more
than the worst-case number of bytes needed for header parsing is unescaped.
Most of the data is, in practice, only read in order to search for a start
code, for which optimised implementations already existed in the H264 codec
(notably the ARM version uses prefetch, so we end up doing both remaining
passes at maximum speed). For MKV files, we know when we've found the last
start code of interest in a given frame, so we are able to avoid doing even
that one remaining pass for most of the buffer.
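
As a rough illustration of the reorganised flow described above (not the code from this patch), the sketch below finds each start code and unescapes only a bounded number of bytes after it. Every name here (find_start_code, unescape_bounded, scan_units, UnitInfo) is hypothetical, MAX_HEADER_BYTES is an assumed bound, and a plain byte loop stands in for the optimised start code search mentioned above. It assumes the usual VC-1 convention of a 00 00 01 prefix followed by a type byte, with 00 00 03 xx (xx < 4) as the escape sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_HEADER_BYTES = 16 }; /* assumed worst-case header size, not from the patch */

typedef struct {
    uint8_t type;                     /* byte that follows the 00 00 01 prefix */
    uint8_t header[MAX_HEADER_BYTES]; /* unescaped leading bytes of the unit */
    size_t  header_len;               /* number of valid bytes in header[] */
} UnitInfo;

/* Return the offset of the first 00 00 01 prefix at or after pos,
 * or size if there is none. Naive stand-in for an optimised search. */
static size_t find_start_code(const uint8_t *buf, size_t size, size_t pos)
{
    assert(buf != NULL);
    for (; pos + 3 <= size; pos++) {
        if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1)
            return pos;
    }
    return size;
}

/* Unescape at most max_out bytes of src into dst and return the number
 * of bytes written. In 00 00 03 xx with xx < 4, the 03 is dropped. */
static size_t unescape_bounded(const uint8_t *src, size_t size,
                               uint8_t *dst, size_t max_out)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < size && out < max_out) {
        if (in >= 2 && in + 1 < size && src[in] == 3 &&
            src[in - 1] == 0 && src[in - 2] == 0 && src[in + 1] < 4)
            in++; /* skip the escape byte, then copy the byte after it */
        dst[out++] = src[in++];
    }
    return out;
}

/* Walk the buffer once: for each start code, record the unit type and
 * unescape only the first MAX_HEADER_BYTES of its payload. The rest of
 * the payload is read only by the search for the next start code. */
static size_t scan_units(const uint8_t *buf, size_t size,
                         UnitInfo *units, size_t max_units)
{
    size_t n = 0;
    size_t pos = find_start_code(buf, size, 0);

    while (pos + 4 <= size && n < max_units) {
        size_t payload = pos + 4;
        size_t next = find_start_code(buf, size, payload);

        units[n].type = buf[pos + 3];
        units[n].header_len = unescape_bounded(buf + payload, next - payload,
                                               units[n].header,
                                               MAX_HEADER_BYTES);
        n++;
        pos = next;
    }
    return n;
}
```

A caller would pass one input buffer and an array of UnitInfo to scan_units() and then parse header fields from each header[] array. Stopping early once the last start code of interest has been seen, as described for MKV, would be an extra exit condition in the loop.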

In some use-cases (such as the Raspberry Pi) video decode is handled by the
GPU, but the entire elementary stream is still fed through the parser to
pick out certain elements of the header which are necessary to manage the
decode process. As you might expect, in these cases, the performance of the
parser is significant.

To measure parser performance, I used the same VC-1 elementary stream in
either an MPEG-2 transport stream or an MKV file, and fed it through avconv
with -c:v copy -c:a copy -f null. These are the gperftools counts for
those streams, both filtered to only include vc1_parse() and its callees,
and unfiltered (to include the whole binary). Lower numbers are better:

                Before          After
File  Filtered  Mean   StdDev   Mean   StdDev  Confidence  Change
M2TS  No        861.7  8.2      650.5  8.1     100.0%      +32.5%
MKV   No        868.9  7.4      731.7  9.0     100.0%      +18.8%
M2TS  Yes       250.0  11.2     27.2   3.4     100.0%      +817.9%
MKV   Yes       149.0  12.8     1.7    0.8     100.0%      +8526.3%

Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
case does show a larger absolute improvement though, since it was worse
to begin with.
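
The Change column appears to be the ratio of the Before mean to the After mean, expressed as a percentage gain. That reading is an assumption, and change_percent is a hypothetical helper, but it can be checked against the table with a few lines of C:

```c
#include <assert.h>

/* Percentage by which the "before" count exceeds the "after" count.
 * Assumed definition of the Change column; not stated in the source. */
static double change_percent(double before_mean, double after_mean)
{
    assert(after_mean > 0.0);
    return (before_mean / after_mean - 1.0) * 100.0;
}
```

With the rounded means from the table, the two unfiltered rows come out at about 32.5% and 18.8%, matching the table. The filtered rows come out at roughly 819% and 8665% instead of the listed 817.9% and 8526.3%, presumably because the listed values were computed from unrounded means.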

This patch has been tested with the FATE suite (albeit on x86 for speed).

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2014-08-04 22:22:54 +02:00
compat Work around broken floating point limits on some systems. 2014-03-10 10:27:17 +01:00
doc avutil: add AV_PIX_FMT_YA16 pixel format 2014-08-04 12:55:08 +01:00
libavcodec vc-1: Optimise parser (with special attention to ARM) 2014-08-04 22:22:54 +02:00
libavdevice oss_audio: Split muxer and demuxer 2014-07-18 15:04:25 -07:00
libavfilter avcodec: Deprecate dtg_active_format field in favor of avframe side-data 2014-08-03 15:43:02 -07:00
libavformat Add Icecast protocol 2014-08-04 12:56:42 +03:00
libavresample lavr: Do not change the sample format for mono audio 2014-08-03 23:13:26 +02:00
libavutil avutil: add AV_PIX_FMT_YA16 pixel format 2014-08-04 12:55:08 +01:00
libswscale swscale: support AV_PIX_FMT_YA16 as input 2014-08-04 12:56:05 +01:00
presets
tests fate: Only generate tests/pixfmts.mak if some pixfmts fate test is run 2014-08-04 11:08:35 -07:00
tools ismindex: Add an option for outputting files elsewhere than in the current directory 2014-07-03 20:13:27 +03:00
.gitignore fate: Split fate-pixdesc tests and dispatch them through Make 2014-08-01 01:18:30 -07:00
arch.mak aarch64: add armv8 CPU flag 2014-04-06 21:18:49 +02:00
avconv_dxva2.c avconv_dxva2: define all used GUIDs directly instead of relying on the dxva2api.h header 2014-04-29 16:50:43 +02:00
avconv_filter.c avconv: do not use the stream codec context for encoding 2014-06-01 08:33:21 +02:00
avconv_opt.c video4linux2: Avoid a floating point exception 2014-07-28 13:11:41 -07:00
avconv_vda.c avconv: Support VDA hwaccel 2014-05-11 15:00:03 +02:00
avconv_vdpau.c avconv: add support for VDPAU decoding 2013-11-23 11:55:53 +01:00
avconv.c avconv: set the output stream timebase 2014-07-09 13:30:33 +00:00
avconv.h avconv: do not use the stream codec context for encoding 2014-06-01 08:33:21 +02:00
avplay.c avplay: Handle pixel aspect ratio properly 2014-07-08 21:14:43 +03:00
avprobe.c cmdutils: wrap exit explicitly 2013-07-07 21:43:23 +02:00
Changelog Add Icecast protocol 2014-08-04 12:56:42 +03:00
cmdutils_common_opts.h avplay: Accept cpuflags option 2013-10-22 10:49:31 +02:00
cmdutils.c avconv: Match stream id 2014-03-13 11:59:34 +01:00
cmdutils.h cmdutils: Mark exit_program as av_noreturn 2014-03-28 00:40:43 +01:00
common.mak build: export library dependencies in ${name}_FFLIBS 2014-05-20 00:43:51 +02:00
configure vc-1: Add platform-specific start code search routine to VC1DSPContext. 2014-08-04 22:22:54 +02:00
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
INSTALL
library.mak build: Support executable only ldflags 2014-07-21 22:18:35 +02:00
LICENSE Add libx265 encoder 2014-02-12 13:13:17 +00:00
Makefile configure: add support for neon intrinsics 2014-07-21 23:18:29 +02:00
README
RELEASE Prepare for 11_alpha1 Release 2014-03-13 08:24:11 -04:00
version.sh

Libav README
------------

1) Documentation
----------------

* Read the documentation in the doc/ directory.

2) Licensing
------------

* See the LICENSE file.