1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-08 13:22:53 +02:00
Commit Graph

21404 Commits

Author SHA1 Message Date
Alexandra Hájková
178b4ea5f9 xsubdec: Convert to the new bitstream reader 2016-11-24 11:22:13 +01:00
Alexandra Hájková
be35ef92a4 xan: Convert to the new bitstream reader 2016-11-24 11:22:13 +01:00
Alexandra Hájková
f9c59f26c8 wnv1: Convert to the new bitstream reader 2016-11-24 11:22:13 +01:00
Alexandra Hájková
0536e7d782 vima: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
e5bdfc6790 vble: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
104a4289f9 utvideodec: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
85f760fedd twinvq: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
0bea79afa6 tscc2: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
8e4cadea5d truespeech: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
0ac07d0b8d tiertex: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
9ab1a3e283 truemotion2: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
9f78e3a46d svq1dec: Convert to the new bitstream reader 2016-11-24 11:22:12 +01:00
Alexandra Hájková
6efbc88a5c smacker: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
087bc8d704 sipr: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
f26cbb555b rtjpeg: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
c60cda7cb4 ra288: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Alexandra Hájková
7d8075cf47 ra144: Convert to the new bitstream reader 2016-11-24 11:22:11 +01:00
Martin Storsjö
79566ec8c7 arm: vp9itxfm: Rename a macro parameter to fit better
Since the same parameter is used for both input and output,
the name inout is more fitting.

This matches the naming used below in the dmbutterfly macro.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:56:56 +02:00
Martin Storsjö
721bc37522 arm/aarch64: vp9itxfm: Fix indentation of macro arguments
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:56:16 +02:00
James Almer
aa498c3183 avpacket: fix leak on realloc in av_packet_add_side_data()
If realloc fails, the pointer is overwritten and the previously allocated buffer
is leaked, which goes against the expected functionality of keeping the packet
unchanged in case of error.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-23 13:17:52 +01:00
Andreas Cadhalpun
f92d7bdfdd libopusdec: default to stereo for invalid number of channels
This fixes an out-of-bounds read if avc->channels is 0.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-23 13:03:15 +01:00
Diego Biurrun
b34c6cd57a dvbsub: cosmetics: Group all debug code together 2016-11-23 07:40:46 +01:00
Diego Biurrun
b8cd7a3c8d dvbsub: Check for errors from system()
libavcodec/dvbsubdec.c:145:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]
libavcodec/dvbsubdec.c:148:5: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]
2016-11-23 07:36:32 +01:00
Diego Biurrun
6427379f23 als: Restructure DEBUG ifdefs to avoid unused function parameter warnings 2016-11-22 17:28:17 +01:00
Diego Biurrun
367f95af55 ac3enc: Restructure DEBUG ifdefs to avoid unused function parameter warnings 2016-11-22 17:28:17 +01:00
Diego Biurrun
81a3c42abe Drop some bogus Doxygen documentation. 2016-11-21 14:29:11 +01:00
Diego Biurrun
a1d9de304f Fix some mismatches between function parameter and doxygen parameter names. 2016-11-21 14:29:10 +01:00
Martin Storsjö
4d960a1185 aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter
The clobbering tests in checkasm are only invoked when testing
correctness, so this bug didn't show up when benchmarking the
dc-only version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-18 23:17:33 +02:00
Janne Grunau
e5b0fc170f arm: vp9itxfm: Simplify the stack alignment code
This is one instruction less for thumb, and only have got
1/2 arm/thumb specific instructions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-18 23:17:26 +02:00
Alexandra Hájková
0b5a26e8bc qdm2: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:22 +01:00
Alexandra Hájková
0dabd329e8 qcelp: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:18 +01:00
Alexandra Hájková
770406d1e8 pcx: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:14 +01:00
Alexandra Hájková
b3441350fa opus: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:11 +01:00
Alexandra Hájková
6f94a64bd6 nellymoser: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:08 +01:00
Alexandra Hájková
15d4dbfd4a jvdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:36:04 +01:00
Alexandra Hájková
1df549bfa2 hqx: Convert to the new bitstream header
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:43 +01:00
Alexandra Hájková
c5e01d9170 hq_hqa: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:39 +01:00
Alexandra Hájková
b2c56301f9 gsm: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:36 +01:00
Alexandra Hájková
2188d53906 g72x: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:33 +01:00
Alexandra Hájková
799703c3ea g2meet: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:26 +01:00
Alexandra Hájková
b37b681f77 fraps: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:14 +01:00
Alexandra Hájková
692ba4fe64 flashsv: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:10 +01:00
Alexandra Hájková
418ccdd703 faxcompr: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:07 +01:00
Alexandra Hájková
8df1ac6b78 exr: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:04 +01:00
Alexandra Hájková
2906d8dcb3 escape130: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:35:01 +01:00
Alexandra Hájková
c43eb73172 escape124: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:57 +01:00
Alexandra Hájková
d8618570be dvdsubdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:53 +01:00
Alexandra Hájková
928f8c7ce3 dss_sp: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:38 +01:00
Alexandra Hájková
942e84d2a3 cook: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:32 +01:00
Alexandra Hájková
e561146611 cljrdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:29 +01:00
Alexandra Hájková
b4c0daa83c cdxl: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:24 +01:00
Alexandra Hájková
0977a7c2f6 binkaudio: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:15 +01:00
Alexandra Hájková
9a23b59943 bink: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:10 +01:00
Alexandra Hájková
dae9b0b9c6 avs: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:34:04 +01:00
Alexandra Hájková
edd4c19a78 atrac3plus: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:59 +01:00
Alexandra Hájková
0272119202 atrac: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:50 +01:00
Alexandra Hájková
41679be1a2 asvdec: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:45 +01:00
Alexandra Hájková
012c451153 adpcm: Convert to the new bitstream header
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:33:01 +01:00
Alexandra Hájková
ed006ae4e2 4xm: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:57 +01:00
Alexandra Hájková
b25180801b on2avc: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:54 +01:00
Alexandra Hájková
7d957b3f47 ea: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:45 +01:00
Alexandra Hájková
adb1ebb36c eamad: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:32:40 +01:00
Alexandra Hájková
d182d8a6d3 cllc: Convert to the new bitstream reader
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:31:59 +01:00
Alexandra Hájková
dd3d7ddf2a lavc: add a new bitstream reader to replace get_bits
The new bit reader features a simpler API and an implementation without
stacks of nested macros.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-18 10:31:56 +01:00
Luca Barbato
adb0e941c3 avpacket: Mark src pointer as constant 2016-11-17 19:41:12 +01:00
Diego Biurrun
76167140a9 qsvdec: Drop stray extra braces around initializer
libavcodec/qsvdec.c:93:5: warning: braces around scalar initializer
2016-11-17 16:53:48 +01:00
Diego Biurrun
715b824346 qsv: Drop some unused variables 2016-11-17 16:53:48 +01:00
Janne Grunau
e7ae8f7a71 aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};
The latter is 1 cycle faster on a cortex-53 and since the operands are
bytewise (or larger) bitmask (impossible to overflow to zero) both are
equivalent.
2016-11-16 09:05:18 +01:00
Janne Grunau
d7595de0b2 aarch64: vp9: use alternative returns in the core loop filter function
Since aarch64 has enough free general purpose registers use them to
branch to the appropiate storage code. 1-2 cycles faster for the
functions using loop_filter 8/16, ... on a cortex-a53. Mixed results
(up to 2 cycles faster/slower) on a cortex-a57.
2016-11-16 09:05:18 +01:00
Gianluigi Tiesi
e17567a831 libilbc: support for latest git of libilbc
In the latest git commits of libilbc developers removed WebRtc_xxx typedefs.
This commit uses int types instead. It's safe to apply also for previous
versions since WebRtc_Word16 was always a typedef of int16_t and
WebRtc_UWord16 a typedef of uint16_t.

Reviewed-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-11-16 08:21:05 +01:00
Diego Biurrun
f7407f56cb golomb: Replace __PRETTY_FUNCTION__ with __func__ for tracing
The former is a GNU extension while the latter is C99.
2016-11-15 09:41:08 +01:00
Mark Thompson
e0b164576f qsv: Add VP8 decoder 2016-11-14 19:38:20 +00:00
Mark Thompson
182cf170a5 vp8: Return stream format information from parser 2016-11-14 19:38:19 +00:00
Mark Thompson
b6582b2927 qsv: Add VC-1 decoder
It uses the same code as the MPEG-2 decoder, so the file is renamed
to contain all "other" (that is, not H.26[45]) codecs.
2016-11-14 19:38:19 +00:00
Mark Thompson
fea4dc05b4 vc1: Return stream format information from parser 2016-11-14 19:38:19 +00:00
Mark Thompson
0940b748bd qsvdec: Only warn about unconsumed data if it happens more than once 2016-11-14 19:38:19 +00:00
Mark Thompson
030d84fa2e qsvdec: Pass field order information to libmfx
The VC-1 decoder fails to initialise if this is not set.
2016-11-14 19:38:19 +00:00
Mark Thompson
cd1047f391 qsvdec: Pass the correct profile to libmfx
This was correct for H.26[45], because libmfx uses the same values
derived from profile_idc and the constraint_set flags, but it is
wrong for other codecs.

Also avoid passing FF_LEVEL_UNKNOWN (-99) as the level, as this is
certainly invalid.
2016-11-14 19:38:19 +00:00
Mark Thompson
3297577f3e mpegvideo: Return correct coded frame sizes from parser 2016-11-14 19:38:19 +00:00
Janne Grunau
31756abe29 aarch64: vp9: loop_filter: fix typo in skip flatout8 check
The 16_16 loop filter functions could miss an early exit before
flatout8.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 08:51:58 +02:00
Martin Storsjö
9d2afd1eb8 aarch64: vp9: Implement NEON loop filters
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the loop filters with
16 pixels at a time. The implementation is fully templated, with
a single macro which can generate versions for both 8 and
16 pixels wide, for both 4, 8 and 16 pixels loop filters
(and the 4/8 mixed versions as well).

For the 8 pixel wide versions, it is pretty close in speed (the
v_4_8 and v_8_8 filters are the best examples of this; the h_4_8
and h_8_8 filters seem to get some gain in the load/transpose/store
part). For the 16 pixels wide ones, we get a speedup of around
1.2-1.4x compared to the 32 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM AArch64
vp9_loop_filter_h_4_8_neon:          144.0   127.2
vp9_loop_filter_h_8_8_neon:          207.0   182.5
vp9_loop_filter_h_16_8_neon:         415.0   328.7
vp9_loop_filter_h_16_16_neon:        672.0   558.6
vp9_loop_filter_mix2_h_44_16_neon:   302.0   203.5
vp9_loop_filter_mix2_h_48_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_84_16_neon:   365.0   305.2
vp9_loop_filter_mix2_h_88_16_neon:   376.0   305.2
vp9_loop_filter_mix2_v_44_16_neon:   193.2   128.2
vp9_loop_filter_mix2_v_48_16_neon:   246.7   218.4
vp9_loop_filter_mix2_v_84_16_neon:   248.0   218.5
vp9_loop_filter_mix2_v_88_16_neon:   302.0   218.2
vp9_loop_filter_v_4_8_neon:           89.0    88.7
vp9_loop_filter_v_8_8_neon:          141.0   137.7
vp9_loop_filter_v_16_8_neon:         295.0   272.7
vp9_loop_filter_v_16_16_neon:        546.0   453.7

The speedup vs C code in checkasm tests is around 2-7x, which is
pretty much the same as for the 32 bit version. Even if these functions
are faster than their 32 bit equivalent, the C version that we compare
to also became around 1.3-1.7x faster than the C version in 32 bit.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-5x.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                         A57 gcc-5.3  neon
loop_filter_h_4_8_neon:        256.6  93.4
loop_filter_h_8_8_neon:        307.3 139.1
loop_filter_h_16_8_neon:       340.1 254.1
loop_filter_h_16_16_neon:      827.0 407.9
loop_filter_mix2_h_44_16_neon: 524.5 155.4
loop_filter_mix2_h_48_16_neon: 644.5 173.3
loop_filter_mix2_h_84_16_neon: 630.5 222.0
loop_filter_mix2_h_88_16_neon: 697.3 222.0
loop_filter_mix2_v_44_16_neon: 598.5 100.6
loop_filter_mix2_v_48_16_neon: 651.5 127.0
loop_filter_mix2_v_84_16_neon: 591.5 167.1
loop_filter_mix2_v_88_16_neon: 855.1 166.7
loop_filter_v_4_8_neon:        271.7  65.3
loop_filter_v_8_8_neon:        312.5 106.9
loop_filter_v_16_8_neon:       473.3 206.5
loop_filter_v_16_16_neon:      976.1 327.8

The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57
is again 30-50% faster than the cortex-a53.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 00:10:13 +02:00
Martin Storsjö
52d196fb30 arm: vp9itxfm: Simplify txfm string comparisons
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 00:10:13 +02:00
Martin Storsjö
3c9546dfaf aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32 bit version.

The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM  AArch64
vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7

The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                                A57 gcc-5.3   neon
vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8

The asm is around factor 3-4 faster than C on the cortex-a57 and the asm
is around 30-50% faster on the a57 compared to the a53.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-14 00:10:13 +02:00
Diego Biurrun
800d91d348 Drop pointless void* casts 2016-11-13 18:44:01 +01:00
Diego Biurrun
d316f9cefc aac: Drop pointless cast 2016-11-13 18:44:00 +01:00
Diego Biurrun
3b50dbc51f ratecontrol: Use correct function pointer casts instead of void*
libavcodec/ratecontrol.c:120:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]
libavcodec/ratecontrol.c:121:9: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]
2016-11-12 16:47:06 +01:00
Martin Storsjö
dd299a2d6d arm: vp9: Add NEON loop filters
This work is sponsored by, and copyright, Google.

The implementation tries to have smart handling of cases
where no pixels need the full filtering for the 8/16 width
filters, skipping both calculation and writeback of the
unmodified pixels in those cases. The actual effect of this
is hard to test with checkasm though, since it tests the
full filtering, and the benefit depends on how many filtered
blocks use the shortcut.

Examples of relative speedup compared to the C version, from checkasm:
                          Cortex       A7     A8     A9    A53
vp9_loop_filter_h_4_8_neon:          2.72   2.68   1.78   3.15
vp9_loop_filter_h_8_8_neon:          2.36   2.38   1.70   2.91
vp9_loop_filter_h_16_8_neon:         1.80   1.89   1.45   2.01
vp9_loop_filter_h_16_16_neon:        2.81   2.78   2.18   3.16
vp9_loop_filter_mix2_h_44_16_neon:   2.65   2.67   1.93   3.05
vp9_loop_filter_mix2_h_48_16_neon:   2.46   2.38   1.81   2.85
vp9_loop_filter_mix2_h_84_16_neon:   2.50   2.41   1.73   2.85
vp9_loop_filter_mix2_h_88_16_neon:   2.77   2.66   1.96   3.23
vp9_loop_filter_mix2_v_44_16_neon:   4.28   4.46   3.22   5.70
vp9_loop_filter_mix2_v_48_16_neon:   3.92   4.00   3.03   5.19
vp9_loop_filter_mix2_v_84_16_neon:   3.97   4.31   2.98   5.33
vp9_loop_filter_mix2_v_88_16_neon:   3.91   4.19   3.06   5.18
vp9_loop_filter_v_4_8_neon:          4.53   4.47   3.31   6.05
vp9_loop_filter_v_8_8_neon:          3.58   3.99   2.92   5.17
vp9_loop_filter_v_16_8_neon:         3.40   3.50   2.81   4.68
vp9_loop_filter_v_16_16_neon:        4.66   4.41   3.74   6.02

The speedup vs C code is around 2-6x. The numbers are quite
inconclusive though, since the checkasm test runs multiple filterings
on top of each other, so later rounds might end up with different
codepaths (different decisions on which filter to apply, based
on input pixel differences). Disabling the early-exit in the asm
doesn't give a fair comparison either though, since the C code
only does the necessary calcuations for each row.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 4-9x.

This is pretty similar in runtime to the corresponding routines
in libvpx. (This is comparing vpx_lpf_vertical_16_neon,
vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon
to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon
and vp9_loop_filter_v_16_16_neon - note that the naming of horizonal
and vertical is flipped between the libraries.)

In order to have stable, comparable numbers, the early exits in both
asm versions were disabled, forcing the full filtering codepath.

                           Cortex           A7      A8      A9     A53
vp9_loop_filter_h_16_8_neon:             597.2   472.0   482.4   415.0
libvpx vpx_lpf_vertical_16_neon:         626.0   464.5   470.7   445.0
vp9_loop_filter_v_16_8_neon:             500.2   422.5   429.7   295.0
libvpx vpx_lpf_horizontal_edge_8_neon:   586.5   414.5   415.6   383.2
vp9_loop_filter_v_16_16_neon:            905.0   784.7   791.5   546.0
libvpx vpx_lpf_horizontal_edge_16_neon: 1060.2   751.7   743.5   685.2

Our version is consistently faster on on A7 and A53, marginally slower on
A8, and sometimes faster, sometimes slower on A9 (marginally slower in all
three tests in this particular test run).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-11 14:16:42 +02:00
Diego Biurrun
f7d183f084 libxvid: Check return value of write() call
libavcodec/libxvid_rc.c:106:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
2016-11-11 10:17:07 +01:00
Diego Biurrun
e5e8a26dcf libxvid: Use proper context in av_log() calls 2016-11-11 10:17:07 +01:00
Diego Biurrun
12db2832e4 libxvid: Require availability of mkstemp()
The replacement code uses tempnam(), which is dangerous.
Such a fringe feature is not worth the trouble.
2016-11-11 10:17:07 +01:00
Martin Storsjö
a67ae67083 arm: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

For the transforms up to 8x8, we can fit all the data (including
temporaries) in registers and just do a straightforward transform
of all the data. For 16x16, we do a transform of 4x16 pixels in
4 slices, using a temporary buffer. For 32x32, we transform 4x32
pixels at a time, in two steps of 4x16 pixels each.

Examples of relative speedup compared to the C version, from checkasm:
                         Cortex       A7     A8     A9    A53
vp9_inv_adst_adst_4x4_add_neon:     3.39   5.83   4.17   4.01
vp9_inv_adst_adst_8x8_add_neon:     3.79   4.86   4.23   3.98
vp9_inv_adst_adst_16x16_add_neon:   3.33   4.36   4.11   4.16
vp9_inv_dct_dct_4x4_add_neon:       4.06   6.16   4.59   4.46
vp9_inv_dct_dct_8x8_add_neon:       4.61   6.01   4.98   4.86
vp9_inv_dct_dct_16x16_add_neon:     3.35   3.44   3.36   3.79
vp9_inv_dct_dct_32x32_add_neon:     3.89   3.50   3.79   4.42
vp9_inv_wht_wht_4x4_add_neon:       3.22   5.13   3.53   3.77

Thus, the speedup vs C code is around 3-6x.

This is mostly marginally faster than the corresponding routines
in libvpx on most cores, tested with their 32x32 idct (compared to
vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's
favour since their version doesn't clear the input buffer like ours
do (although the effect of that on the total runtime probably is
negligible.)

                           Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_32x32_add_neon:    18436.8  16874.1  14235.1  11988.9
libvpx vpx_idct32x32_1024_add_neon 20789.0  13344.3  15049.9  13030.5

Only on the Cortex A8, the libvpx function is faster. On the other cores,
ours is slightly faster even though ours has got source block clearing
integrated.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-11 11:09:05 +02:00
Mark Thompson
fd0fae6037 pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts
When decoding with threads enabled, the get_format callback will be
called with one of the per-thread codec contexts rather than with the
outer context.  If a hwaccel is in use too, this will add a reference
to the hardware frames context on that codec context, which will then
propagate to all of the other per-thread contexts for decoding.  Once
the decoder finishes, however, the per-thread contexts are not freed
normally, so these references leak.
2016-11-10 20:36:11 +00:00
Martin Storsjö
11623217e3 arm: vp9mc: Use a different helper register for PIC loads
This fixes crashes since 557c1675cf in linux PIC builds.

Previously, movrelx silently used r12 as helper register, which
doesn't work when r12 is the destination register.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 14:01:04 +02:00
Martin Storsjö
6a62795d40 aarch64: h264idct: Use the offset parameter to movrel
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:18:22 +02:00
Martin Storsjö
557c1675cf arm: vp9mc: Minor adjustments from review of the aarch64 version
This work is sponsored by, and copyright, Google.

The speedup for the large horizontal filters is surprisingly
big on A7 and A53, while there's a minor slowdown (almost within
measurement noise) on A8 and A9.

                            Cortex    A7        A8        A9       A53
orig:
vp9_put_8tap_smooth_64h_neon:    20270.0   14447.3   19723.9   10910.9
new:
vp9_put_8tap_smooth_64h_neon:    20165.8   14466.5   19730.2   10668.8

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:18:22 +02:00
Martin Storsjö
383d96aa22 aarch64: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not much
to be gained from having more spare registers here - we only
avoid having to clobber callee-saved registers.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                     ARM   AArch64
vp9_avg4_neon:                      27.2      23.7
vp9_avg8_neon:                      56.5      54.7
vp9_avg16_neon:                    169.9     167.4
vp9_avg32_neon:                    585.8     585.2
vp9_avg64_neon:                   2460.3    2294.7
vp9_avg_8tap_smooth_4h_neon:       132.7     125.2
vp9_avg_8tap_smooth_4hv_neon:      478.8     442.0
vp9_avg_8tap_smooth_4v_neon:       126.0      93.7
vp9_avg_8tap_smooth_8h_neon:       241.7     234.2
vp9_avg_8tap_smooth_8hv_neon:      690.9     646.5
vp9_avg_8tap_smooth_8v_neon:       245.0     205.5
vp9_avg_8tap_smooth_64h_neon:    11273.2   11280.1
vp9_avg_8tap_smooth_64hv_neon:   22980.6   22184.1
vp9_avg_8tap_smooth_64v_neon:    11549.7   10781.1
vp9_put4_neon:                      18.0      17.2
vp9_put8_neon:                      40.2      37.7
vp9_put16_neon:                     97.4      99.5
vp9_put32_neon/armv8:              346.0     307.4
vp9_put64_neon/armv8:             1319.0    1107.5
vp9_put_8tap_smooth_4h_neon:       126.7     118.2
vp9_put_8tap_smooth_4hv_neon:      465.7     434.0
vp9_put_8tap_smooth_4v_neon:       113.0      86.5
vp9_put_8tap_smooth_8h_neon:       229.7     221.6
vp9_put_8tap_smooth_8hv_neon:      658.9     621.3
vp9_put_8tap_smooth_8v_neon:       215.0     187.5
vp9_put_8tap_smooth_64h_neon:    10636.7   10627.8
vp9_put_8tap_smooth_64hv_neon:   21076.8   21026.9
vp9_put_8tap_smooth_64v_neon:     9635.0    9632.4

These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.

The speedup vs C code is pretty much the same as for the 32 bit
case; on the A53 it's around 6-13x for ther larger 8tap filters.
The exact speedup varies a little, since the C versions generally
don't end up exactly as slow/fast as on 32 bit.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:15:56 +02:00
Martin Storsjö
a4cfcddcb0 vp9: Make the subpel filters non-static
Make them aligned, to allow efficient access to them from simd.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-10 11:05:57 +02:00
Anton Khirnov
84f225684c pthread_frame: properly propagate the hw frame context across frame threads 2016-11-10 09:00:11 +01:00
Diego Biurrun
72a19f4013 mpegaudiodsp: aarch64: Adjust function prototype after 2caa93b813 2016-11-10 00:13:48 +01:00
Diego Biurrun
67deba8a41 Use avpriv_report_missing_feature() where appropriate 2016-11-08 17:54:34 +01:00
Vittorio Giovara
47a795727f hevc: Support extradata changes from multiple stsd
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-08 11:22:29 -05:00
Vittorio Giovara
2fe30b4743 hevc: Allow parsing external extradata buffers 2016-11-08 11:22:29 -05:00
Vittorio Giovara
5be2153111 hevc: Move hevc_decode_extradata before frame decoding
Avoids a forward-declaration in the following commit.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-08 11:22:29 -05:00
Vittorio Giovara
bed2c4b265 lavc: Add hevc main10 profile to avconv cli 2016-11-08 11:22:29 -05:00
Vittorio Giovara
17dac56b8f lavu: Rename ycgco color space appropriately
Planes are ordered as the name suggests now.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-11-08 11:22:29 -05:00
Diego Biurrun
0361e4dcb4 h264_qpel: x86: Move function with only one instance out of template macro
libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
2016-11-08 17:21:02 +01:00
Andreas Cadhalpun
43de8b328b lzf: update pointer p after realloc
This fixes heap-use-after-free detected by AddressSanitizer.

Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-11-07 22:42:00 +01:00
Anton Khirnov
4ab61cd983 qsv{enc,dec}: extend the internal frame allocator
Handle the internal frame requests, which is required by the HEVC
encoding plugin.

Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
2016-11-07 12:48:00 +01:00
Anton Khirnov
00aeedd841 qsv{dec,enc}: use a struct as a memory id with internal memory allocator
This will allow implementing the allocator more fully, which is needed
by the HEVC encoder plugin with video memory input.

Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
2016-11-07 12:47:54 +01:00
Anton Khirnov
404e51478e qsv{dec,enc}: always use an internal mfxFrameSurface1
For encoding, this avoids modifying the input surface, which we are not
allowed to do.
This will also be useful in the following commits.

Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
2016-11-07 12:47:46 +01:00
Hendrik Leppkes
fabfbfe571 dxva2: fix surface selection when compiled with both d3d11va and dxva2
Fixes a regression introduced in
be630b1e08

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-11-07 10:05:12 +01:00
Derek Buitenhuis
db0b3dccb3 libx265: Add option to force IDR frames
This is in the same the same vein as 380146924e.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-07 10:16:10 +02:00
Diego Biurrun
3cba09e522 x86: Drop stray semicolons after function definitions
libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
2016-11-05 12:41:45 +01:00
Martin Storsjö
392caa65df arm: vp9mc: Insert a literal pool at the middle of the file
This fixes errors like this when building non-pic binaries with armv6
as baseline:

Error: invalid literal constant: pool needs to be closer

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-04 21:37:53 +02:00
Diego Biurrun
67351924fa Drop unreachable break and return statements 2016-11-03 20:17:12 +01:00
Diego Biurrun
6354957a95 dnxhdenc: Have function pointer prototype match implementation
libavcodec/dnxhdenc.c(326) : warning C4028: formal parameter 1 different from declaration
libavcodec/dnxhdenc.c(329) : warning C4028: formal parameter 1 different from declaration
2016-11-03 17:43:55 +01:00
Diego Biurrun
c778eb15b8 pixblockdsp: Have function pointer prototype match implementation
libavcodec/pixblockdsp.c(58) : warning C4028: formal parameter 1 different from declaration
libavcodec/pixblockdsp.c(63) : warning C4028: formal parameter 1 different from declaration
libavcodec/pixblockdsp.c(66) : warning C4028: formal parameter 1 different from declaration
2016-11-03 17:43:55 +01:00
Diego Biurrun
99ddeddc7f ituh263dec: Have function signature match across declaration and definition
libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 1 different from declaration
libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 2 different from declaration
2016-11-03 17:43:55 +01:00
Diego Biurrun
13fcdfb976 svq3: Drop unused function dctcoef_get()
libavcodec/svq3.c:627:29: warning: unused function 'dctcoef_get' [-Wunused-function]
2016-11-03 15:52:12 +01:00
Diego Biurrun
ee59f05408 intrax8: Have function signature match across declaration and definition
libavcodec/intrax8.c(776) : warning C4028: formal parameter 1 different from declaration
2016-11-03 15:50:48 +01:00
Martin Storsjö
1a469a5e42 options_table: Remove a now unnecessary include of config.h
The include of config.h was added in 2012 in 1d9c2dc8, due to
the use of CONFIG_SNOW_ENCODER ifdefs within options_table.h.
When the snow codec was dropped later (in a0c5917f8 in 2013),
this include no longer served any purpose.

options_table.h is included in builds for the host as well, when
building documentation. config.h should not be included in code
that is built for the host, since it can contain workarounds
for the target compiler/environment, like adding a missing define
of restrict, defining getenv(x) to NULL for environments that lack
getenv.

The seemingly innocent include reordering in 2025d37871 broke
builds that have getenv(x) defined to NULL in config.h (Windows CE
and Windows Phone/RT), since libavcodec/options_table.h include
config.h, while libavformat/options_table.h end up bringing in
more system headers, and those system headers can contain a proper
definition of getenv, which clash with the getenv define in config.h.
This was avoided earlier as long as libavformat/options_table.h (or
avformat.h) was included before libavcodec/options_table.h.

This fixes builds for Windows Phone/RT and CE.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 11:25:50 +02:00
Martin Storsjö
ffbd1d2b00 arm: vp9: Add NEON optimizations of VP9 MC functions
This work is sponsored by, and copyright, Google.

The filter coefficients are signed values, where the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16 bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the resulting sum can
overflow the 16 bit signed range. Instead of accumulating in 32 bit,
we accumulate the largest product (either index 3 or 4) last with a
saturated addition.

(The VP8 MC asm does something similar, but slightly simpler, by
accumulating each half of the filter separately. In the VP9 MC
filters, each half of the filter can also overflow though, so the
largest component has to be handled individually.)

Examples of relative speedup compared to the C version, from checkasm:
                       Cortex      A7     A8     A9    A53
vp9_avg4_neon:                   1.71   1.15   1.42   1.49
vp9_avg8_neon:                   2.51   3.63   3.14   2.58
vp9_avg16_neon:                  2.95   6.76   3.01   2.84
vp9_avg32_neon:                  3.29   6.64   2.85   3.00
vp9_avg64_neon:                  3.47   6.67   3.14   2.80
vp9_avg_8tap_smooth_4h_neon:     3.22   4.73   2.76   4.67
vp9_avg_8tap_smooth_4hv_neon:    3.67   4.76   3.28   4.71
vp9_avg_8tap_smooth_4v_neon:     5.52   7.60   4.60   6.31
vp9_avg_8tap_smooth_8h_neon:     6.22   9.04   5.12   9.32
vp9_avg_8tap_smooth_8hv_neon:    6.38   8.21   5.72   8.17
vp9_avg_8tap_smooth_8v_neon:     9.22  12.66   8.15  11.10
vp9_avg_8tap_smooth_64h_neon:    7.02  10.23   5.54  11.58
vp9_avg_8tap_smooth_64hv_neon:   6.76   9.46   5.93   9.40
vp9_avg_8tap_smooth_64v_neon:   10.76  14.13   9.46  13.37
vp9_put4_neon:                   1.11   1.47   1.00   1.21
vp9_put8_neon:                   1.23   2.17   1.94   1.48
vp9_put16_neon:                  1.63   4.02   1.73   1.97
vp9_put32_neon:                  1.56   4.92   2.00   1.96
vp9_put64_neon:                  2.10   5.28   2.03   2.35
vp9_put_8tap_smooth_4h_neon:     3.11   4.35   2.63   4.35
vp9_put_8tap_smooth_4hv_neon:    3.67   4.69   3.25   4.71
vp9_put_8tap_smooth_4v_neon:     5.45   7.27   4.49   6.52
vp9_put_8tap_smooth_8h_neon:     5.97   8.18   4.81   8.56
vp9_put_8tap_smooth_8hv_neon:    6.39   7.90   5.64   8.15
vp9_put_8tap_smooth_8v_neon:     9.03  11.84   8.07  11.51
vp9_put_8tap_smooth_64h_neon:    6.78   9.48   4.88  10.89
vp9_put_8tap_smooth_64hv_neon:   6.99   8.87   5.94   9.56
vp9_put_8tap_smooth_64v_neon:   10.69  13.30   9.43  14.34

For the larger 8tap filters, the speedup vs C code is around 5-14x.

This is significantly faster than libvpx's implementation of the same
functions, at least when comparing the put_8tap_smooth_64 functions
(compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from
libvpx).

Absolute runtimes from checkasm:
                          Cortex      A7        A8        A9       A53
vp9_put_8tap_smooth_64h_neon:    20150.3   14489.4   19733.6   10863.7
libvpx vpx_convolve8_horiz_neon: 52623.3   19736.4   21907.7   25027.7

vp9_put_8tap_smooth_64v_neon:    14455.0   12303.9   13746.4    9628.9
libvpx vpx_convolve8_vert_neon:  42090.0   17706.2   17659.9   16941.2

Thus, on the A9, the horizontal filter is only marginally faster than
libvpx, while our version is significantly faster on the other cores,
and the vertical filter is significantly faster on all cores. The
difference is especially large on the A7.

The libvpx implementation does the accumulation in 32 bit, which
probably explains most of the differences.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:35:38 +02:00
Martin Storsjö
2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Diego Biurrun
baab87c4f3 bink: Have function pointer prototype match implementation
libavcodec/binkdsp.c(156) : warning C4028: formal parameter 1 different from declaration
2016-11-02 10:33:39 +01:00
Diego Biurrun
4cf2ffb7c4 idct: Have function pointer prototype match implementation
libavcodec/idctdsp.c(175) : warning C4028: formal parameter 2 different from declaration
2016-11-02 10:33:39 +01:00
Diego Biurrun
39cea6570c aactab: Move extern keyword to the front of array declarations
libavcodec/aactab.h:49:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]
2016-11-02 10:33:36 +01:00
Luca Barbato
801ac7156d qsv: Be informative when reporting that no data has been consumed 2016-10-30 21:55:03 +01:00
Diego Biurrun
30015305f3 Use avpriv_request_sample() where appropriate 2016-10-29 18:32:21 +02:00
Diego Biurrun
3ec6f855d0 srt: Adjust signedness of sscanf format strings
Fixes several warnings from -Wformat.
2016-10-28 13:28:36 +02:00
Diego Biurrun
7a2b2b6a92 dxtory: Drop nonsense ISO C printf conversion specifiers for standard types 2016-10-28 13:24:55 +02:00
Diego Biurrun
c454dfcff9 Use ISO C printf conversion specifiers where appropriate 2016-10-28 13:24:44 +02:00
Diego Biurrun
fbe425c8d2 hap: Adjust printf length modifiers to match variable types
libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t {aka unsigned int}’ [-Wformat=]
libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]
2016-10-28 11:22:22 +02:00
Diego Biurrun
1263b2039e Adjust printf conversion specifiers to match variable signedness 2016-10-28 11:22:21 +02:00
Diego Biurrun
47756f51fe dnxhdenc: Drop pointless, commented-out debug output 2016-10-27 12:21:46 +02:00
Diego Biurrun
0574780d7a h264_loopfilter: Do not print value of uninitialized variable
libavcodec/h264_loopfilter.c:531:111: warning: variable 'edge' is uninitialized when used here [-Wuninitialized]
2016-10-27 12:21:46 +02:00
Diego Biurrun
2555269985 mpegaudio: Do not print value of uninitialized variable
libavcodec/mpegaudiodec_template.c:885:97: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
2016-10-27 12:21:46 +02:00
Mark Thompson
0aec37e625 vaapi_decode: Remove vestigial unmap code
The buffer map/unmap code was in an early version of this before it
was committed, but the unmap was never removed.  While wrong, this
was harmless (and therefore unnoticed) because the buffers can't be
mapped at this point - all drivers just did nothing with the call.
2016-10-24 20:17:47 +01:00
Mark Thompson
5e879b54a3 vaapi_decode: Clear parameter buffers to fix picture reuse
When decoding interlaced pictures, the structure is reused to render
to the same surface twice.  The parameter buffers were not being
cleared, which caused the i965 driver to error out.
2016-10-24 20:17:47 +01:00
Gwenole Beauchesne
754b20d7eb vaapi_h264: fix RefPicList[] field flags.
Use new H264Ref.reference field to track field picture flags. The
H264Picture.reference flag in DPB is now irrelevant here.

This is a regression from git commit a12d3188, and that affected
multiple interlaced video streams.

Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
2016-10-24 20:17:47 +01:00
Pierre Edouard Lepere
6d5636ad9a hevc: x86: Add add_residual() SIMD optimizations
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-10-22 17:33:35 +02:00
Vittorio Giovara
0d9b9bd37f lavu: Add JEDEC P22 color primaries 2016-10-21 11:46:21 -04:00
Anton Khirnov
59c90097a0 hevc: factor out a repeated condition 2016-10-21 10:11:20 +02:00
Anton Khirnov
0bfdcce4d4 hevc: move the SliceType enum to hevc.h
Those values are decoder-independent and are also use by the VA-API
encoder.
2016-10-21 10:11:20 +02:00
Diego Biurrun
788544ff0e audiodsp: x86: Remove pointless header file
Its single forward declaration can be moved to the only place
it is used, like is done for all other dsp init files.
2016-10-19 15:20:41 +02:00
Diego Biurrun
b89804da9b x86: videodsp: Add parentheses to expression to work around warning
libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds
2016-10-19 10:13:34 +02:00
Diego Biurrun
58224dc5f3 ppc: avcodec: Drop silly "_ppc" suffixes from files in ppc subdirectories 2016-10-18 00:10:36 +02:00
Mark Thompson
0cf86fabfa vaapi_encode: Write sequence header as extradata
Only works if packed headers are supported, where we can know the
output before generating the first frame.
2016-10-17 21:07:25 +01:00
Mark Thompson
f9bb356e0e vaapi_h265: Include header for slice types
The include was changed correctly in 4abe3b049d
but then mistakenly changed back by c359d624d3
(it's not just the NAL unit types which are used).
2016-10-17 20:53:28 +01:00
Diego Biurrun
6be7944ee2 x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Anton Khirnov
89b35a139e lavc: add a bitstream filter for extracting extradata from packets
This is intended as a replacement for the 'split' function exported by
some parsers.
2016-10-16 20:27:16 +02:00
Anton Khirnov
f6e2f8a9ff hevcdec: move parameter set parsing into a separate header
This code is independent from the decoder, so it makes more sense for it
to to have its own header.
2016-10-16 20:26:47 +02:00
Anton Khirnov
150c896a9e hevcdec: split ff_hevc_diag_scan* declarations into a separate header
This will be useful in the following commits.
2016-10-16 20:26:40 +02:00
Anton Khirnov
645c6ff423 hevcdec: drop the prototype of a non-existing function 2016-10-16 20:26:35 +02:00
Anton Khirnov
c359d624d3 hevcdec: move decoder-independent declarations into a separate header
This way they can be reused by other code without including the whole
decoder-specific hevcdec.h
Also, add the HEVC_ prefix to them, since similarly named values exist
for H.264 as well and are sometimes used in the same code.
2016-10-16 20:26:28 +02:00
Anton Khirnov
4abe3b049d hevc: rename hevc.[ch] to hevcdec.[ch]
This is more consistent with the rest of libav and frees up the hevc.h
name for decoder-independent shared declarations.
2016-10-16 20:26:17 +02:00
Kieran Kunhya
81f1f6c3f6 Add GBRAP12 pixel format support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-12 21:33:34 +02:00
Vittorio Giovara
14e7e19a90 lavc: bsf: Document input/output codecparam alloc/init process 2016-10-12 11:06:58 -04:00
Alexandra Hájková
112cee0241 hevc: Add SSE2 and AVX IDCT
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-11 18:21:04 +02:00
Martin Storsjö
9b2ccafb48 aarch64: Add missing sign extension in ff_h264_idct8_add_neon
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-10 14:57:53 +03:00
Yogender Gupta
cbd84b8a51 nvenc: Fix error log
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-10-09 20:58:10 +02:00
Yogender Gupta
da2848375a nvenc: Force high_444 profile for 444 input
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-10-07 10:41:38 +02:00
Anton Khirnov
e4128c08d7 Revert "hevc: x86: Refactor IDCT macro declarations"
This reverts commit d9dccc0389. There were
outstanding objections to this commit.
2016-10-06 15:24:04 +02:00
Diego Biurrun
5801f9ed24 h264_intrapred: x86: Update comments left behind in 95c89da36e 2016-10-06 12:32:34 +02:00
Diego Biurrun
d9dccc0389 hevc: x86: Refactor IDCT macro declarations 2016-10-06 12:32:34 +02:00
Steve Lhomme
be630b1e08 d3d11va: Use the proper decoding slice index
The decoding buffer index expected by D3D11VA is the one from the
ID3D11Texture2D not the one from the ID3D11VideoDecoderOutputView array
in AVD3D11VAContext.

Otherwise, when providing decoder slices that do not start from 0,
pictures appear in bogus order. For an invalid index crashes and
image corruption can occur.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-10-05 18:37:27 +02:00
Ronald S. Bultje
715f139c9b vp9lpf/x86: make filter_16_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
8915320db9 vp9lpf/x86: make filter_48/84/88_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
725a216481 vp9lpf/x86: make filter_44_h work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
5bfa96c4b3 vp9lpf/x86: make filter_16_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:09 +02:00
Ronald S. Bultje
b905e8d2fe vp9lpf/x86: make filter_48/84_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
37637e6590 vp9lpf/x86: make filter_88_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
be10834bd9 vp9lpf/x86: make filter_44_v work on 32-bit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
7c62891efe vp9lpf/x86: save one register in SIGN_ADD/SUB.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
c6375a83d1 vp9lpf/x86: store unpacked intermediates for filter6/14 on stack.
filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88
goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
4ce8ba72f9 vp9lpf/x86: move variable assigned inside macro branch.
The value is not used outside the branch.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
e4961035b2 vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
683da2788e vp9lpf/x86: remove unused register from ABSSUB_CMP macro.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6e74e9636b vp9lpf/x86: slightly simplify 44/48/84/88 h stores.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
6411c328a2 vp9lpf/x86: make cglobal statement more conservative in register allocation.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
a6e288d624 vp9lpf/x86: save one register in loopfilter surface coverage.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
0ed21bdc9e vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
f2e3d706a1 vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
92d47550ea vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16
Similar gains as the ssse3 version once again

Additional improvements by Clément Bœsch <u@pkh.me>.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
6bea478158 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
James Almer
1f451eed60 vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
Similar gains in performance as the SSSE3 version

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Clément Bœsch
a692724c58 vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:08 +02:00
Ronald S. Bultje
a451324ddd vp9: ignore reference segmentation map if error_resilience flag is set.
Fixes ffvp9_fails_where_libvpx.succeeds.webm.

Bug-Id: ffmpeg/3849.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:07 +02:00
Carl Eugen Hoyos
c19830aa2c rscc: Support palette format
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2016-10-02 15:42:03 -04:00
Vittorio Giovara
b8d5070db6 avcodec: Document AV_PKT_DATA_PALETTE side data type 2016-10-02 15:42:03 -04:00
Mark Thompson
5a5df90d9c vaapi_h265: Add main 10 encode support 2016-10-02 20:23:18 +01:00
Mark Thompson
b8cac1e830 vaapi_h265: Fix buffering parameters
A decoder may need this to be set correctly to output frames in the
right order.
2016-10-02 20:23:18 +01:00
Mark Thompson
fc30a90898 vaapi_h265: Fix slice header writing
This was not observed earlier because the only syntax element which
it normally misses with the current setup is slice_qp_delta, but that
is always going to be zero (in IDR frames QP isn't varied on the
slice) which will always exp-golomb code as a single 1 bit.  The
immediately following part is the byte alignment, which is always a 1
bit followed by 0s which are ignored, so as long as the bitstream is
never aligned at that point we will never notice because the only
difference is that an ignored bit is a 1 instead of a 0.
2016-10-02 20:23:18 +01:00
Mark Thompson
ec17ab381e vaapi_h264: Write bitstream restriction fields 2016-10-02 20:23:18 +01:00
Mark Thompson
17a0f9481c vaapi_h264: Fix CFR mode with frame_rate set in AVCodecContext 2016-10-02 20:23:18 +01:00
Mark Thompson
314b421dd8 vaapi_encode: Decide on GOP setup before initialising sequence parameters
This was always too late; several fields related to it have been incorrectly
zero since the encoder was added.
2016-10-02 20:23:18 +01:00
Anton Khirnov
59c7022740 pthread_frame: use atomics for frame progress 2016-10-02 19:35:46 +02:00
Anton Khirnov
64a31b2854 pthread_frame: use atomics for PerThreadContext.state 2016-10-02 19:35:34 +02:00
Anton Khirnov
db2733256d pthread_frame: use a thread-safe way for signalling threads to die
Current code uses a plain int in a racy way, which is UB.
2016-10-02 19:35:23 +02:00
Anton Khirnov
8385ba53f1 mmaldec: convert to stdatomic 2016-10-02 19:35:12 +02:00
Luca Barbato
b015872c0d huffyuvdsp: Enable the altivec code for PPC little-endian as well
Confirmed to work by checkasm.
2016-10-02 17:13:36 +02:00