mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

44547 Commits

Author SHA1 Message Date
Martin Storsjö
a63da4511d aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
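
The following minimal C sketch illustrates skipping coefficients that are known to be zero. It is not the NEON code from the commit. It implements a 4-point integer inverse DCT, plus a "half" variant that never reads the upper two inputs. The 14-bit constants are round(2^14·cos(π/4)), round(2^14·cos(π/8)) and round(2^14·sin(π/8)). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 14-bit fixed-point trig constants. */
#define C_PI_4 11585 /* round(2^14 * cos(pi/4)) */
#define C_PI_8 15137 /* round(2^14 * cos(pi/8)) */
#define S_PI_8  6270 /* round(2^14 * sin(pi/8)) */

/* Round to nearest and drop the 14 fractional bits. */
static int32_t round_shift14(int64_t x)
{
    return (int32_t)((x + (1 << 13)) >> 14);
}

/* Full 4-point inverse DCT: loads all four coefficients (6 multiplies). */
static void idct4_full(const int32_t in[4], int32_t out[4])
{
    int32_t s0 = round_shift14((int64_t)(in[0] + in[2]) * C_PI_4);
    int32_t s1 = round_shift14((int64_t)(in[0] - in[2]) * C_PI_4);
    int32_t s2 = round_shift14((int64_t)in[1] * S_PI_8 - (int64_t)in[3] * C_PI_8);
    int32_t s3 = round_shift14((int64_t)in[1] * C_PI_8 + (int64_t)in[3] * S_PI_8);
    out[0] = s0 + s3;
    out[1] = s1 + s2;
    out[2] = s1 - s2;
    out[3] = s0 - s3;
}

/* "Half" variant: the caller guarantees in[2] == in[3] == 0, so those
 * coefficients are never loaded and their multiplies disappear
 * (3 multiplies; s1 collapses to s0). */
static void idct4_half(const int32_t in[4], int32_t out[4])
{
    int32_t s0 = round_shift14((int64_t)in[0] * C_PI_4);
    int32_t s2 = round_shift14((int64_t)in[1] * S_PI_8);
    int32_t s3 = round_shift14((int64_t)in[1] * C_PI_8);
    out[0] = s0 + s3;
    out[1] = s0 + s2;
    out[2] = s0 - s2;
    out[3] = s0 - s3;
}
```

For any input with in[2] == in[3] == 0, both functions produce identical output. The half variant does three multiplies instead of six.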

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14740 bytes to 24292 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2
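
The Before/After figures above are cycle counts, so the speedup for a row is before divided by after. A small helper for that, together with the relative code size change, could look like this. The function names are illustrative.

```c
#include <assert.h>

/* Ratio of "Before" to "After" cycle counts; above 1.0 means faster. */
static double speedup(double before_cycles, double after_cycles)
{
    return before_cycles / after_cycles;
}

/* Relative change of a quantity such as object code size, in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For the 16x16 sub2 row, speedup(1051.0, 640.8) is about 1.64. For 32x32 sub2, speedup(5198.5, 3685.5) is about 1.41. The 32x32 sub20 and larger rows stay at roughly 1.0. percent_change(14740, 24292) gives the code size growth, about 64.8%.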

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:03 +02:00
Martin Storsjö
5eb5aec475 arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
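
The half/quarter variants only pay off if the caller can tell which one is safe to use. The hypothetical C dispatcher below makes that choice for a 16x16 block. It scans the block for the extent of its nonzero coefficients. That scan is an assumption made for clarity; the assembly likely bases the decision on cheaper information, such as the end-of-block position.

```c
#include <assert.h>
#include <stdint.h>

enum idct_variant { IDCT_QUARTER, IDCT_HALF, IDCT_FULL };

/* Pick the cheapest 16x16 inverse transform that still covers every
 * nonzero coefficient: "quarter" if all of them lie in the top-left 4x4
 * corner, "half" if they lie in the top-left 8x8, otherwise "full". */
static enum idct_variant pick_idct16_variant(const int16_t coeffs[16 * 16])
{
    int max_row = 0, max_col = 0;
    for (int r = 0; r < 16; r++)
        for (int c = 0; c < 16; c++)
            if (coeffs[r * 16 + c] != 0) {
                if (r > max_row) max_row = r;
                if (c > max_col) max_col = c;
            }
    int extent = max_row > max_col ? max_row : max_col;
    if (extent < 4)
        return IDCT_QUARTER;
    if (extent < 8)
        return IDCT_HALF;
    return IDCT_FULL;
}
```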

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:00 +02:00
Martin Storsjö
79d332ebbd aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function
This allows reusing the macro for a separate implementation of the
pass2 function.
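
In C terms, a load_add_store step adds the rounded output of the second pass to the destination pixels and saturates to 8 bits. The sketch below is a hypothetical scalar equivalent, written once so that several pass-2 implementations could call it. The rounding shift is left as a parameter rather than assuming the value the NEON code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 0..255 range of 8-bit pixels. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical shared helper: round each residual down by `shift` bits
 * (shift >= 1), add it to the prediction already in dst, and store the
 * clipped result. Any pass-2 variant (full, half, quarter) can reuse it. */
static void load_add_store(uint8_t *dst, ptrdiff_t stride,
                           const int16_t *res, int w, int h, int shift)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int r = (res[y * w + x] + (1 << (shift - 1))) >> shift;
            dst[x] = clip_u8(dst[x] + r);
        }
        dst += stride;
    }
}
```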

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:56 +02:00
Martin Storsjö
47b3c2c18d arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function
This allows reusing the macro for a separate implementation of the
pass2 function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:53 +02:00
Martin Storsjö
115476018d aarch64: vp9itxfm: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
19496 to 14740 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   8095.7

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   1390.1
vp9_inv_dct_dct_32x32_sub4_add_neon:    5199.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8125.8

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:45 +02:00
Martin Storsjö
0331c3f5e8 arm: vp9itxfm: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
15324 to 12388 bytes.

This gives a small slowdown of a couple of tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_neon:    2063.4   1516.0   1719.5   1245.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3279.3   2454.5   2525.2   1982.3
vp9_inv_dct_dct_32x32_sub4_add_neon:   10750.0   7955.4   8525.6   6754.2
vp9_inv_dct_dct_32x32_sub32_add_neon:  18574.0  17108.4  14216.7  12010.2

After:
vp9_inv_dct_dct_16x16_sub4_add_neon:    2060.8   1608.5   1735.7   1262.0
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.2   2443.5   2546.1   1999.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10682.0   8043.8   8581.3   6810.1
vp9_inv_dct_dct_32x32_sub32_add_neon:  18522.4  17277.4  14286.7  12087.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:31:40 +02:00
Diego Biurrun
c546147db0 configure: Correctly recurse in do_check_deps()
Fixes all sorts of configuration problems introduced by dad7a9c7c0
on non-Linux or non-vanilla configs. Also removes a line made redundant
in that commit.
2017-02-08 21:23:41 +01:00
Martin Storsjö
57ec83e424 omx: Use the EOS flag to handle flushing at the end
This avoids having to count the number of frames sent to the codec
and the number of output packets received; instead just wait until
the encoder returns a buffer with the EOS flag set.
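
The EOS-based flushing can be sketched as a drain loop. The buffer struct, callback type and fake encoder below are hypothetical stand-ins. They are not the OpenMAX IL API or the libavcodec omx wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical output buffer: payload size plus an end-of-stream marker. */
struct out_buffer {
    size_t size;
    bool eos;
};

/* Hypothetical callback returning the encoder's next output buffer. */
typedef struct out_buffer (*get_output_fn)(void *ctx);

/* After signalling end of input, keep pulling output until a buffer with
 * the EOS flag arrives, instead of comparing frames sent with packets
 * received. Returns the number of non-empty packets collected. */
static int drain_until_eos(get_output_fn get_output, void *ctx)
{
    int packets = 0;
    for (;;) {
        struct out_buffer buf = get_output(ctx);
        if (buf.size > 0)
            packets++; /* an EOS buffer may still carry data */
        if (buf.eos)
            return packets;
    }
}

/* Stand-in encoder for trying the loop out: replays a fixed list of
 * buffers and reports EOS if the list runs out. */
struct fake_encoder {
    const struct out_buffer *bufs;
    size_t n, pos;
};

static struct out_buffer fake_get_output(void *ctx)
{
    struct fake_encoder *enc = ctx;
    if (enc->pos >= enc->n) {
        struct out_buffer done = { 0, true };
        return done;
    }
    return enc->bufs[enc->pos++];
}
```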

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-08 11:50:57 +02:00
Diego Biurrun
dad7a9c7c0 configure: Rework dependency handling for conflicting components
This makes the feature more visible and obvious.
2017-02-07 19:06:04 +01:00
Diego Biurrun
9127ac5ebc configure: Add name parameter to require_pkg_config() helper function
This allows distinguishing between the internal variable name for
external libraries and the pkg-config package name. Having both
names available avoids special-casing outside the helper function
when the two identifiers do not match.
2017-02-07 19:06:02 +01:00
Diego Biurrun
a25dac976a Use bitstream_init8() where appropriate 2017-02-07 18:27:21 +01:00
Diego Biurrun
71a49fe25f configure: Use cppflags check helper functions where appropriate 2017-02-06 15:43:56 +01:00
Diego Biurrun
0ce3761c78 configure: Add stdlib.h #include to CPPFLAGS check helper functions
This ensures that added CPPFLAGS are validated against libc headers.
2017-02-06 15:43:56 +01:00
Alexandra Hájková
f7ec7f546f wma: Convert to the new bitstream reader 2017-02-06 15:13:34 +01:00
Martin Storsjö
58d87e0f49 aarch64: vp9itxfm: Restructure the idct32 store macros
This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.

This is also arguably more readable.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-05 13:05:32 +02:00
Martin Storsjö
3bc5b28d5a arm: vp9itxfm: Avoid .irp when it doesn't save any lines
This makes it more readable.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-05 12:59:19 +02:00
John Stebbins
8e67039c63 asfdec: Use the ASF stream count when iterating
The AVFormat stream count can be larger due to external factors, such as
an appended ID3 tag.

Avoid an out-of-bounds read.
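
The fix amounts to bounding a lookup loop by the number of streams the ASF header actually declared. Below is a minimal sketch of that idea; the context struct and function are hypothetical.

```c
#include <assert.h>

/* Hypothetical demuxer state: stream_ids holds exactly nb_streams
 * entries, one per stream declared in the ASF header. */
struct asf_ctx {
    int nb_streams;
    const int *stream_ids;
};

static int find_asf_stream(const struct asf_ctx *asf, int id)
{
    /* Bound the loop by the ASF header's stream count, not by the total
     * number of streams in the container, which can be larger. */
    for (int i = 0; i < asf->nb_streams; i++)
        if (asf->stream_ids[i] == id)
            return i;
    return -1;
}
```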

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-04 15:21:12 +01:00
Diego Biurrun
7abdd026df asm: Consistently uppercase SECTION markers 2017-02-03 11:37:53 +01:00
Diego Biurrun
740b0bf03b build: Ignore generated .version files 2017-02-03 11:37:53 +01:00
Martin Storsjö
15a92e0c40 rtmp: Correctly handle the Window Acknowledgement Size packets
This swaps which field is set when the Window Acknowledgement Size
and Set Peer BW packets are received, renames the fields in
order to clarify their role further and adds verbose comments
explaining their respective roles and how well the code currently
does what it is supposed to.

The Set Peer BW packet tells the receiver of the packet (which
can be either client or server) that it should not send more data
once it has sent more than the specified number of bytes without
receiving an acknowledgement for them. Actually checking this
limit is currently not implemented.

In order to be able to check that properly, one can send the
Window Acknowledgement Size packet, which tells the receiver of the
packet that it needs to send Acknowledgement packets
(RTMP_PT_BYTES_READ) at least after receiving a given number of bytes
since the last Acknowledgement.

Therefore, when we receive a Window Acknowledgement Size packet,
it sets the maximum number of bytes we can receive without sending
an Acknowledgement, so when handling this packet we should set
the receive_report_size field (previously client_report_size).
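
The acknowledgement rule described above can be modelled as a byte counter on the receiving side. This is a hypothetical sketch. Only receive_report_size and RTMP_PT_BYTES_READ come from the commit message; the other names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-side state. receive_report_size is the value taken
 * from the peer's last Window Acknowledgement Size packet. */
struct rtmp_rx_state {
    uint32_t receive_report_size;
    uint64_t bytes_read;     /* total bytes received so far */
    uint64_t last_ack_bytes; /* bytes_read when we last acknowledged */
};

/* Account for n newly received bytes. Returns true when an
 * Acknowledgement (RTMP_PT_BYTES_READ) should be sent, i.e. once at least
 * receive_report_size bytes have arrived since the previous one. */
static bool rtmp_on_bytes_received(struct rtmp_rx_state *st, uint32_t n)
{
    st->bytes_read += n;
    if (st->receive_report_size > 0 &&
        st->bytes_read - st->last_ack_bytes >= st->receive_report_size) {
        st->last_ack_bytes = st->bytes_read;
        return true;
    }
    return false;
}
```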

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-03 09:27:41 +02:00
Martin Storsjö
a1a143adb0 rtmp: Rename packet types to closer match the spec
Also rename comments and log messages accordingly,
and add clarifying comments for some hardcoded values.

The previous names were taken from older, reverse engineered
references.

These names match the official public RTMP specification, and
match the names used by Wirecast in annotating captured
streams. These names also avoid hardcoding the roles of server
and client, since the handling of them is independent of whether
we act as server or client.

The RTMP_PT_PING type maps to RTMP_PT_USER_CONTROL.

The SERVER_BW and CLIENT_BW types are a bit more intertwined;
RTMP_PT_SERVER_BW maps to RTMP_PT_WINDOW_ACK_SIZE and
RTMP_PT_CLIENT_BW maps to RTMP_PT_SET_PEER_BW.
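
Written out as C declarations, the renaming looks roughly like this. The numeric values are the message type IDs assigned in the public RTMP specification. The declarations themselves are an illustrative sketch, not a copy of FFmpeg's rtmppkt.h.

```c
#include <assert.h>

/* Message type IDs as numbered in the public RTMP specification. */
enum rtmp_packet_type {
    RTMP_PT_BYTES_READ      = 3, /* Acknowledgement */
    RTMP_PT_USER_CONTROL    = 4, /* User Control Message */
    RTMP_PT_WINDOW_ACK_SIZE = 5, /* Window Acknowledgement Size */
    RTMP_PT_SET_PEER_BW     = 6, /* Set Peer Bandwidth */
};

/* Old reverse-engineered names mapped to the new ones, as described above. */
#define RTMP_PT_PING      RTMP_PT_USER_CONTROL
#define RTMP_PT_SERVER_BW RTMP_PT_WINDOW_ACK_SIZE
#define RTMP_PT_CLIENT_BW RTMP_PT_SET_PEER_BW
```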

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-03 09:26:46 +02:00
Diego Biurrun
bcaedef118 configure: Add require_cpp_condition() convenience function
Simplifies checking for conditions in external library headers and
aborting if said conditions are not met.
2017-02-02 17:49:51 +01:00
Diego Biurrun
aba7fdcc8b configure: Add require_header() convenience function
Simplifies checking for external library headers and aborting if
the external library support was requested, but is not available.
2017-02-02 17:49:51 +01:00
Diego Biurrun
a97563c889 configure: Simplify libxcb check 2017-02-02 17:38:50 +01:00
Alexandra Hájková
c29da01ac9 svq3: Convert to the new bitstream reader 2017-02-02 17:06:17 +01:00
Diego Biurrun
acfa7a2178 configure: Drop weak dependencies on external libraries for webm muxer
Weak dependencies on external libraries do not obviate having to
explicitly enable these libraries, so the weak dependency does not
simplify the configure command line nor have any real effect.
2017-02-02 14:35:44 +01:00
Diego Biurrun
6698832079 configure: Add proper weak dependency of drawtext filter on libfontconfig 2017-02-02 14:35:44 +01:00
Diego Biurrun
24d5680bbc configure: Simplify inline asm check with appropriate helper function 2017-02-02 14:34:05 +01:00
Diego Biurrun
b3825723dc configure: Merge compiler/libc/os hacks sections 2017-02-02 14:34:05 +01:00
wm4
577326d430 lavc: deprecate refcounted_frames field
No deprecation guards, because the old decode API (for which this field
is needed) doesn't have any either.

This field should be removed together with the old decode calls.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-02-01 10:47:46 +01:00
wm4
3ad825793a hwcontext_cuda: implement frames_get_constraints
Copied and modified from hwcontext_qsv.c.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-02-01 10:43:00 +01:00
Anton Khirnov
fd9212f2ed Mark some arrays that never change as const. 2017-02-01 10:42:59 +01:00
Anton Khirnov
b420a27e74 avconv: allow -b to be used with streamcopy
In this mode it tells the muxer about the bitrate of the input stream.
2017-02-01 10:42:59 +01:00
Alexandra Hájková
ab2539bd37 ffv1: Convert to the new bitstream reader 2017-01-31 17:54:11 +01:00
Alexandra Hájková
2d72219554 h261dec: Convert to the new bitstream reader 2017-01-31 17:54:11 +01:00
Alexandra Hájková
2b94ed12de shorten: Convert to the new bitstream reader 2017-01-31 17:54:11 +01:00
Alexandra Hájková
5a6da49dd0 ralf: Convert to the new bitstream reader 2017-01-31 17:54:11 +01:00
Alexandra Hájková
d85b37a955 loco: Convert to the new bitstream reader 2017-01-31 17:54:10 +01:00
Alexandra Hájková
0f94de8a09 fic: Convert to the new bitstream reader 2017-01-31 17:54:10 +01:00
Alexandra Hájková
6b1f559f9a dirac: Convert to the new bitstream reader 2017-01-31 17:54:10 +01:00
Alexandra Hájková
ffc00df0a6 cavs: Convert to the new bitstream reader 2017-01-31 17:54:10 +01:00
Alexandra Hájková
0c89ff82e9 aic: Convert to the new bitstream reader 2017-01-31 17:54:10 +01:00
Diego Biurrun
d4c2103bd3 golomb: Convert to the new bitstream reader 2017-01-31 17:46:19 +01:00
Diego Biurrun
ab87af4163 configure: Add proper weak dependency of avformat on network 2017-01-31 15:50:20 +01:00
Andreas Cadhalpun
612cc07128 pgssubdec: reset rle_data_len/rle_remaining_len on allocation error
The code relies on their validity and otherwise can try to access a NULL
object->rle pointer, causing segmentation faults.
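
Below is a minimal sketch of the pattern, assuming a hypothetical object struct. When the buffer cannot be (re)allocated, the length fields are cleared together with the pointer, so no later code trusts a stale length.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subset of a subtitle object: a buffer plus the lengths
 * that later parsing code relies on when reading from rle. */
struct sub_object {
    unsigned char *rle;
    unsigned int rle_data_len;
    unsigned int rle_remaining_len;
};

/* (Re)allocate the RLE buffer. On failure, free the old buffer and reset
 * both lengths to zero so nothing dereferences a NULL rle pointer. */
static int alloc_rle(struct sub_object *obj, unsigned int size)
{
    unsigned char *p = realloc(obj->rle, size);
    if (!p) {
        free(obj->rle);
        obj->rle = NULL;
        obj->rle_data_len = 0;
        obj->rle_remaining_len = 0;
        return -1;
    }
    obj->rle = p;
    obj->rle_data_len = size;
    obj->rle_remaining_len = size;
    return 0;
}
```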

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-01-31 09:35:54 +01:00
Mark Thompson
708e84cda1 mov: Avoid memcmp of uninitialised data
The string codec name need not be as long as the value we are
comparing it to, so memcmp may make decisions derived from
uninitialised data that valgrind then complains about (though the
overall result of the function will always be the same).  Use
strncmp instead, which will stop at the first zero byte and
therefore not encounter this issue.
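
Here is a short illustration of the difference, using a hypothetical helper. Comparing a fixed number of bytes with memcmp can run past the end of a shorter string. strncmp stops at the terminating NUL.

```c
#include <assert.h>
#include <string.h>

/* memcmp(name, tag, strlen(tag)) would also compare bytes past the
 * terminator of a shorter name, which may be uninitialised; strncmp
 * stops at the first NUL, so it never looks beyond the string's end. */
static int codec_name_matches(const char *name, const char *tag)
{
    return strncmp(name, tag, strlen(tag)) == 0;
}
```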
2017-01-30 23:03:52 +00:00
Mark Thompson
ca62236a89 vaapi_encode: Add VP8 support 2017-01-30 23:03:46 +00:00
Mark Thompson
ff35aa8ca4 vaapi_encode: Pass framerate parameters to driver
Only do this when building for a recent VAAPI version - initial
driver implementations were confused about the interpretation of the
framerate field, but hopefully this will be consistent everywhere
once 0.40.0 is released.
2017-01-30 22:52:54 +00:00
Mark Thompson
eddfb57210 vaapi_h264: Enable VBR mode
Default to using VBR when a target bitrate is set, unless the max rate
is also set and matches the target.  Changes to the Intel driver mean
that min_qp is also respected in this case, so set a codec default to
unset the value rather than using the current default inherited from
the MPEG-4 part 2 encoder.
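
The rate control rule described above can be expressed as a small decision function. The enum and function names are hypothetical. The constant-QP fallback when no bitrate is set is an assumption, not something this commit states.

```c
#include <assert.h>
#include <stdint.h>

enum rc_mode { RC_CQP, RC_CBR, RC_VBR };

/* A target bitrate selects VBR, unless a max rate is also set and equals
 * the target, in which case CBR is chosen. Without a target bitrate this
 * sketch assumes constant QP. */
static enum rc_mode choose_rc_mode(int64_t bit_rate, int64_t max_rate)
{
    if (bit_rate <= 0)
        return RC_CQP;
    if (max_rate > 0 && max_rate == bit_rate)
        return RC_CBR;
    return RC_VBR;
}
```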
2017-01-30 22:52:54 +00:00
Mark Thompson
f033ba470f vaapi_encode: Support VBR mode
This includes a backward-compatibility hack to choose CBR anyway on
old drivers which have no VBR support, so that existing programs will
continue to work even though their options now map to VBR.
2017-01-30 22:52:54 +00:00