The loops are guaranteed to be at least multiples of 8, so this
unrolling is safe but allows exploiting execution ports.
For int32 version: 68 -> 58c.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '5e1a3ea3ba7bb0c71d931e93e60fb75f51b0cc1a':
lavc: move the vaapi encoders further down in the list of codecs
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
These are all trivial to merge.
* commit '92fdea37477b5a2d1329e5ef0773e24473fa8f12':
vaapi_h265: Add -qp option, use it to replace use of -global_quality
vaapi_h265: Add constant-bitrate encode support
vaapi_h264: Add encode quality option (for quality-speed tradeoff)
vaapi_h264: Add -qp option, use it to replace use of -global_quality
vaapi_encode: Add support for codec-local options
vaapi_h264: Add constant-bitrate encode support
vaapi_encode: Refactor slightly to allow easier setting of global options
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '9d4d9be538faa537440fff37d3b7ecf322911a55':
libavcodec: Document that encoders may use the framerate field in AVCodecContext
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '1bb56abb9b37bd208a66164339c92cad59b1087b':
omx: Add support for zerocopy input of frames
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'f1cd9b03f3fa875eb5e394281b4b688cec611658':
omx: Add support for broadcom OMX on raspberry pi
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'e8919ec486a5559fdcf366e347be0656d904a87f':
libavcodec: Add H264/MPEG4 encoders based on OpenMAX IL
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'd12b5b2f135aade4099f4b26b0fe678656158c13':
build: Split test programs off into separate files
Some conversions done by: James Almer <jamrial@gmail.com>
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '44f05f15d4a34ef2f0d62259e5d5b43371bc0954':
build: Do not check the vaapi_encode.h header if VAAPI is not enabled
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
The problem is that with particularly complex images and especially at
high bit depths and 5-level transforms the coefficients would overflow,
causing huge artifacts to appear. This was discovered thanks to the fate
tests, which will have to be redone as this fixes a multitude of
problems and increases PSNR.
There is a slight performance drop associated with this change, making
the encoder slower by 1.15 times, however this is necessary in order to
avoid undefined behavior and overflows.
It would be worth to template the transforms to keep the performance for
8 bit images as 32 bit coefficients are unnecessary for that case, but
the primary use of the encoder is to encode video at 10 bits.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Commit ca2f19b9cc modified the meaning of
H264SliceContext.gb: it is now initialised at the start of the NAL unit
header, rather than at the start of the slice header. The VAAPI slice
decoder uses the offset after parsing to determine the offset of the
slice data in the bitstream, so with the changed meaning we no longer
need to add the extra byte to account for the NAL unit header because
it is now included directly.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
The unique user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version do not make sense.
Timings: 68 -> 49 cycles
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '9fa888c02801fff2e8817c24068f5296bbe60000':
intrax8: Keep a reference to the decoder blocks
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'c2084ffcbfc11d1b6ed3a4a0df9cafd56fbb896f':
intrax8: Use the generic horizband function
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This ports the fix from 033a533 to the new parser module in prepartion
of using it for the h264 decoder.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
When only one sublayer is present, no information is coded. Only when at least two
are present, all 8 sublayers are written.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>