256 bits is just wide enough to fit all the operands needed to vectorize
the software implementation, but AVX2 is needed to for a couple of
instructions like cross-lane permutation.
Output is bit-for-bit identical to C.
Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com>
The original inline assembly and nasm code have the same fps when called by command.
NASM code almost has no impact on the perfromance.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This affected many FATE-tests: The number of failing tests went down
from 663 to 344. (Both numbers exclude tests that failed because of
unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Variables used in inline assembly need to be marked with attribute((used)).
Static constants already were, via the define of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are const only used in inline assembly, and therefore
appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks
variables as used.
This change makes FFMPEG work with Clang's ThinLTO.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
+ split color conversion from scaling
- disabled gamma correction, until it's refactored too
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The rationale is that you have a packed format in form
<greyscale sample> <alpha sample> <greyscale sample> <alpha sample>
and shortening greyscale to 'G' might make one thing about Greenscale instead.
An alias pixel format and color space name are provided for compatibility.
* commit '46bacb5cc6169ff5e8e982495c4925467c1d8bb7':
x86: Consistently use cpu flag detection macros in places that still miss it
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c16bfb147df8a9d350e8a0dbc01937b78faf5949':
swscale: x86: Consistently use lowercase function name suffixes
Conflicts:
libswscale/x86/rgb2rgb.c
libswscale/x86/swscale.c
See: 1de064e21e
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This simplifies the code and improves quality at the expense of a slight
slowdown of a rarely used function (no fate test uses it).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415':
x86: h264_chromamc_10bit: drop pointless PAVG %define
x86: mmx2 ---> mmxext in function names
swscale: do not forget to swap data in formats with different endianness
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/gradfun.c
libswscale/input.c
libswscale/utils.c
libswscale/x86/swscale.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
swscale: Provide the right alignment for external mmx asm
x86: Replace checks for CPU extensions and flags by convenience macros
configure: msvc: fix/simplify setting of flags for hostcc
x86: mlpdsp: mlp_filter_channel_x86 requires inline asm
Conflicts:
libavcodec/x86/fft_init.c
libavcodec/x86/h264_intrapred_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/mpegaudiodec.c
libavcodec/x86/proresdsp_init.c
libavutil/x86/float_dsp_init.c
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
MSS1 and MSS2: set final pixel format after common stuff has been initialised
MSS2 decoder
configure: handle --disable-asm before check_deps
x86: Split inline and external assembly #ifdefs
configure: x86: Separate inline from standalone assembler capabilities
pktdumper: Use a custom define instead of PATH_MAX for buffers
pktdumper: Use av_strlcpy instead of strncpy
pktdumper: Use sizeof(variable) instead of the direct buffer length
Conflicts:
Changelog
configure
libavcodec/allcodecs.c
libavcodec/avcodec.h
libavcodec/codec_desc.c
libavcodec/dct-test.c
libavcodec/imgconvert.c
libavcodec/mss12.c
libavcodec/version.h
libavfilter/x86/gradfun.c
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ec36aa69448f20a78d8c4588265022e0b2272ab5':
x86: Fix linking with some or all of yasm, mmx, optimizations disabled
configure: Add more fine-grained SSE CPU capabilities flags
avfilter: x86: Use more precise compile template names
x86: cosmetics: Comment some #endifs for better readability
g723_1: add comfort noise generation
utvideoenc: Switch to dsputils' median prediction
utvideoenc: Avoid writing into the input picture
avtools: remove the distinction between func_arg and func2_arg.
avconv: make the -passlogfile option per-stream.
avconv: make the -pass option per-stream.
cmdutils: make -codecs print lossy/lossless flags.
lavc: add lossy/lossless codec properties.
Conflicts:
Changelog
cmdutils.c
configure
doc/APIchanges
ffmpeg.h
ffmpeg_opt.c
ffprobe.c
libavcodec/codec_desc.c
libavcodec/g723_1.c
libavcodec/utvideoenc.c
libavcodec/version.h
libavcodec/x86/mpegaudiodec.c
libavcodec/x86/rv40dsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>