1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00

44614 Commits

Author SHA1 Message Date
Martin Storsjö
de06bdfe6c arm: vp9itxfm: Reorder the idct coefficients for better pairing
All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.

This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:04:31 +02:00
Martin Storsjö
65aa002d54 aarch64: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed d8-d15 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

After this, we still can skip pushing d12-d15.

Before:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
After:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:44 +02:00
Martin Storsjö
402546a172 arm: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed q4-q7 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
in the idct16 function), and the lanewise vmul needs a register in
the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
while doing idct16.

While keeping these coefficients in registers, we still can skip pushing
q7.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
After:
vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:43 +02:00
Martin Storsjö
575e31e931 arm: vp9lpf: Implement the mix2_44 function with one single filter pass
For this case, with 8 inputs but only changing 4 of them, we can fit
all 16 input pixels into a q register, and still have enough temporary
registers for doing the loop filter.

The wd=8 filters would require too many temporary registers for
processing all 16 pixels at once though.

Before:                          Cortex A7      A8     A9     A53
vp9_loop_filter_mix2_v_44_16_neon:   289.7   256.2  237.5   181.2
After:
vp9_loop_filter_mix2_v_44_16_neon:   221.2   150.5  177.7   138.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:09 +02:00
Martin Storsjö
3bf9c48320 aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1
This is one cycle faster in total, and three instructions fewer.

Before:
vp9_loop_filter_mix2_v_44_16_neon: 123.2
After:
vp9_loop_filter_mix2_v_44_16_neon: 122.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:03:00 +02:00
Martin Storsjö
c582cb8537 arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit
The theoretical maximum value of E is 193, so we can just
saturate the addition to 255.

Before:                     Cortex A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   965.2   818.7   731.4   579.0        452.0
After:
vp9_loop_filter_v_4_8_neon:     136.0   125.7   112.6    84.0         83.0
vp9_loop_filter_v_8_8_neon:     234.0   195.5   171.5   136.0        133.7
vp9_loop_filter_v_16_8_neon:    490.0   417.5   377.7   289.0        271.0
vp9_loop_filter_v_16_16_neon:   951.2   814.7   732.3   571.0        446.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-24 00:02:36 +02:00
Diego Biurrun
ed6a891c36 Place attribute_deprecated in the right position for struct declarations
libavcodec/vaapi.h:58:1: warning: attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
2017-02-23 12:23:20 +01:00
Luca Barbato
04d2afa93b mkv: Update the seek test to match 5d3953a5dc 2017-02-22 10:15:00 +01:00
John Stebbins
fec3456ce1 fate: Update fate-lavf-mkv after commit 5d3953a5dc 2017-02-21 21:04:25 -07:00
Mark Thompson
156bc0193b fate: Add webp alpha test 2017-02-21 23:19:08 +00:00
John Stebbins
5d3953a5dc matroskaenc: factor ts_offset into block timecode computation
ts_offset was added to cluster timecode, but then effectively subtracted
back off the block timecode

When setting initial_padding for an audio stream, the timestamps are
written incorrectly to the mkv file.  cluster timecode gets written
as pts0 + ts_offset which is correct, but then block timecode gets
written as pts - cluster timecode which expanded is
pts - (pts0 + ts_offset).  Adding cluster and block tc back together:
cluster + block = (pts0 + ts_offset) + (pts - (pts0 + ts_offset)) = pts
But the result should be pts + ts_offset since demux will subtract the
CodecDelay element from pts and set initial_padding to CodecDelay.
This patch gives the correct result.
2017-02-21 14:20:31 -07:00
Diego Biurrun
c95169f0ec build: Move cli tool sources to a separate subdirectory
This unclutters the top-level directory and groups related files together.
2017-02-21 16:10:51 +01:00
Diego Biurrun
ab566cc96b build: Separate logic for building examples from that for building avtools 2017-02-21 16:10:51 +01:00
Diego Biurrun
acb0dea27e build: Split logic for building examples off into a separate Makefile 2017-02-21 16:10:51 +01:00
Diego Biurrun
db4903eb48 build: Avoid duplication in examples lists 2017-02-21 16:10:51 +01:00
Diego Biurrun
533339bdcc build: Drop leftover reference to old EXAMPLES logic 2017-02-21 16:10:51 +01:00
Diego Biurrun
7208e5b5d6 configure: Restructure the way check_pkg_config() operates
Have check_pkg_config() enable variables and set cflags and extralibs
instead of relegating that task to require_pkg_config. This simplifies
require_pkg_config(), is consistent with what other helper functions
like check_lib() do and allows getting rid of some manual variable
setting in places where check_pkg_config() is used.
2017-02-20 20:16:05 +01:00
Diego Biurrun
54e39b102e configure: Explicitly spell out first require_pkg_config() parameter
This is less confusing than encountering "" in the argument list.
2017-02-20 20:16:05 +01:00
Diego Biurrun
00b160af11 nvenc: Fix nvec vs. nvenc typo 2017-02-20 09:50:03 +01:00
John Stebbins
42cf7f91f1 dv: Don't return EIO upon EOF 2017-02-19 21:03:09 -07:00
Mark Thompson
7cb9296db8 webp: Fix alpha decoding
This was broken by 4e528206bc4d968706401206cf54471739250ec7 - the webp
decoder was assuming that it could set the output pixfmt of the vp8
decoder directly, but after that change it no longer could because
ff_get_format() was used instead.  This adds an internal get_format()
callback to webp use of the vp8 decoder to override the pixfmt
appropriately.
2017-02-18 19:53:20 +00:00
Mark Thompson
2d518aec4c vf_deinterlace_vaapi: Create filter buffer after context
The Intel proprietary VAAPI driver enforces the restriction that a
buffer must be created inside an existing context, so just ensure
this is always true.
2017-02-17 23:20:39 +00:00
Mark Thompson
17aeee5832 vaapi_encode: Discard output buffer if picture submission fails
Previously this was leaking, though it actually hit an assert making
sure that the buffer had already been cleared when freeing the picture.
2017-02-16 20:58:42 +00:00
Martin Storsjö
8f5de34c8f vf_fade: Make sure to not miss the last lines of a frame
When slice_h is rounded up due to chroma subsampling, there's
a risk that jobnr * slice_h exceeds frame->height.

Prior to a638e9184d63, this wasn't an issue for the last slice
of a frame, since slice_end was set to frame->height for the last
slice.

a638e9184d63 tried to fix the case where other slices than the
last one would exceed frame->height (which can happen where the
number of slices/threads is very large compared to the frame
height).

However, the fix in a638e9184d63 instead broke other cases,
where slice_h * nb_threads < frame->height. Therefore, make
sure the last slice always ends at frame->height.

CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-16 12:58:50 +02:00
Diego Biurrun
d00a0d8e84 configure: Handle SDL version check through pkg-config 2017-02-16 09:28:33 +01:00
Martin Storsjö
8847eeaa14 aarch64: Add parentheses around the offset parameter in movrel
This fixes building with clang for linux with PIC enabled.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-16 09:56:11 +02:00
Mark Thompson
82989bd98c avconv: Move rescale to stream timebase before monotonisation
If the stream timebase is coarser than the muxing timebase then the
monotonisation process may fail because adding one to the timestamp
need not actually produce a different timestamp after the rescale.
2017-02-15 21:31:15 +00:00
Martin Storsjö
030de53e9c libopenh264dec: Let the framework use the h264_mp4toannexb bitstream filter
This avoids a lot of boilerplate code within the decoder wrapper itself.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-15 23:05:58 +02:00
Alexandra Hájková
0539d84d98 asfdec: Account for different Format Data sizes
Some muxers may use the BMP_HEADER Format Data size instead
of the ASF-specific one.

Bug-Id: 1020
CC: libav-stable@libav.org

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2017-02-15 11:21:11 +01:00
Diego Biurrun
871b4f3654 configure: Check for xcb as well as xcb-shape before enabling libxcb
Newer versions of libxcb have xcb-foo pkg-config files that do not declare
their xcb dependency so that required linker flags will not be generated.
2017-02-15 10:33:34 +01:00
Luca Barbato
b446f0e98f mov: Do not try to parse multiple stsd for the same track
Bug-Id: 1017
CC: libav-stable@libav.org

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-15 00:40:00 +01:00
Mark Thompson
e791b915c7 hwcontext_vaapi: Try to support the VDPAU wrapper
The driver is somewhat bitrotten (not updated for years) but is still
usable for decoding with this change.  To support it, this adds a new
driver quirk to indicate no support at all for surface attributes.

Based on a patch by wm4 <nfxjfg@googlemail.com>.
2017-02-13 21:44:47 +00:00
Mark Thompson
5dd9a4b88b vaapi: Implement device-only setup
In this case, the user only supplies a device and the frame context
is allocated internally by lavc.
2017-02-13 21:44:43 +00:00
Mark Thompson
44f2eda39f lavc: Add device context field to AVCodecContext
For use by codec implementations which can allocate frames internally.
2017-02-13 20:14:27 +00:00
Martin Storsjö
07b5136c48 aarch64: vp9lpf: Fix broken indentation/vertical alignment
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-12 21:57:23 +02:00
Martin Storsjö
b0806088d3 aarch64: vp9lpf: Interleave the start of flat8in into the calculation above
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 22:54:18 +02:00
Martin Storsjö
e18c39005a arm: vp9lpf: Interleave the start of flat8in into the calculation above
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 22:54:01 +02:00
Luca Barbato
9c2d36fcaf dv: Convert to the new bitstream reader 2017-02-11 20:29:44 +01:00
Luca Barbato
ba30b74686 aac: Validate the sbr sample rate before using the value
Avoid a floating point exception.

Bug-Id: 1027
CC: libav-stable@libav.org
2017-02-11 20:23:11 +01:00
Luca Barbato
0ee78020cd configure: Move up the avbuild directory creation
The early check for inconsistent in-source vs out-of-source build
cannot generate a config.log otherwise.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-02-11 15:47:43 +01:00
wm4
c2f97f0508 hwcontext_dxva2: support D3D9Ex
D3D9Ex uses different driver paths. This helps with "headless"
configurations when no user logs in. Plain D3D9 device creation will
fail if no user is logged in, while it works with D3D9Ex.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-02-11 11:37:45 +01:00
wm4
04f3bd3496 AVFrame: add an opaque_ref field
This is an extended version of the AVFrame.opaque field, which can be
used to attach arbitrary user information to an AVFrame.

The usefulness of the opaque field is rather limited, because it can
store only up to 32 bits of information (or 64 bit on 64 bit systems).
It's not possible to set this field to a memory allocation, because
there is no way to deallocate it correctly.

The opaque_ref field circumvents this by letting the user set an
AVBuffer, which makes the user data refcounted.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-02-11 11:37:45 +01:00
Anton Khirnov
4de220d2e3 frame: allow align=0 (meaning automatic) for av_frame_get_buffer()
This will avoid every caller from hardcoding some specific alignment,
which may break in the future with new instruction sets.
2017-02-11 11:37:45 +01:00
Anton Khirnov
f44ec22e09 lavc: use av_cpu_max_align() instead of hardcoding alignment requirements 2017-02-11 11:37:45 +01:00
Anton Khirnov
e6bff23f1e cpu: add a function for querying maximum required data alignment 2017-02-11 11:37:45 +01:00
Anton Khirnov
5c8a5765dc scale_npp: explicitly set the output frames context for passthrough mode
This is no longer done automatically for filters marked as
hwframe-aware.
2017-02-11 11:37:45 +01:00
Anton Khirnov
6f554521af Use the new AVIOContext destructor. 2017-02-11 11:37:45 +01:00
Anton Khirnov
99684f3ae7 avio: add a destructor for AVIOContext
Before this commit, AVIOContext is to be freed with a plain av_free(),
which prevents us from adding any deeper structure to it.
2017-02-11 11:37:44 +01:00
Martin Storsjö
435cd7bc99 arm: vp9lpf: Use orrs instead of orr+cmp
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:44:04 +02:00
Martin Storsjö
e1f9de86f4 arm/aarch64: vp9lpf: Calculate !hev directly
Previously we first calculated hev, and then negated it.

Since we were able to schedule the negation in the middle
of another calculation, we don't see any gain in all cases.

Before:                     Cortex A7      A8      A9     A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     147.0   129.0   115.8    89.0         88.7
vp9_loop_filter_v_8_8_neon:     242.0   198.5   174.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    500.0   419.5   382.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   971.2   825.5   731.5   579.0        453.0
After:
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8    88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7   140.0        136.7
vp9_loop_filter_v_16_8_neon:    497.0   419.5   379.7   293.0        275.7
vp9_loop_filter_v_16_16_neon:   965.2   818.7   731.4   579.0        452.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-11 00:43:59 +02:00