From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.
Moreover, some of the removed code was buggy/not bitexact
and lead to failures involving the f32le and f32be versions of
gray, gbrp and gbrap on x86-32 when SSE2 was not disabled.
See e.g.
https://fate.ffmpeg.org/report.cgi?time=20220609221253&slot=x86_32-debian-kfreebsd-gcc-4.4-cpuflags-mmx
Notice that yuv2yuvX_mmx is not removed, because it is used
by SSE3 and AVX2 as fallback in case of unaligned data and
also for tail processing. I don't know why yuv2yuvX_mmxext
isn't being used for this; an earlier version [1] of
554c2bc7086f49ef5a6a989ad6bc4bc11807eb6f used it, but
the version that was eventually applied does not.
[1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272124.html
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b':
x86inc: Rename "program_name" to "private_prefix"
configure: Run SHFLAGS through ldflags_filter()
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
x86: include x86inc.asm in x86util.asm
cng: Reindent some incorrectly indented lines
cngdec: Allow flushing the decoder
cngdec: Make the dbov variable have the right unit
cngdec: Fix the memset size to cover the full array
cngdec: Update the LPC coefficients after averaging the reflection coefficients
configure: fix print_config() with broke awks
Conflicts:
libavcodec/x86/ac3dsp.asm
libavcodec/x86/dct32.asm
libavcodec/x86/deinterlace.asm
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputilenc.asm
libavcodec/x86/fft.asm
libavcodec/x86/fmtconvert.asm
libavcodec/x86/h264_chromamc.asm
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264_idct.asm
libavcodec/x86/h264_idct_10bit.asm
libavcodec/x86/h264_intrapred.asm
libavcodec/x86/h264_intrapred_10bit.asm
libavcodec/x86/h264_weight.asm
libavcodec/x86/vc1dsp.asm
libavcodec/x86/vp3dsp.asm
libavcodec/x86/vp56dsp.asm
libavcodec/x86/vp8dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
libx264: fix indentation.
vorbis: fix overflows in floor1[] vector and inverse db table index.
win64: add a XMM clobber test configure option.
movdec: Parse the dvc1 atom
ARM: ac3: fix ac3_bit_alloc_calc_bap_armv6
swscale: K&R formatting cosmetics for Blackfin code
frwu: lowercase the FRWU codec name
movdec: fix dts generation in fragmented files
fate: make acodec-ac3_fixed test output raw AC3
APIchanges: add missing commit hashes
swscale: implement MMX, SSE2 and AVX functions for RGB32 input.
ra144enc: drop pointless "encoder" from .long_name
bethsoftvideo: fix palette reading.
mpc7: use av_fast_padded_malloc()
mpc7: simplify handling of packet sizes that are not a multiple of 4 bytes
doc: decoding Forward Uncompressed is supported
Fix a typo in the x86 asm version of ff_vector_clip_int32()
pcmenc: Do not set avpkt->size.
ff_alloc_packet: modify the size of the packet to match the requested size
Conflicts:
doc/APIchanges
libavcodec/libx264.c
libavcodec/mpc7.c
libavformat/isom.h
libswscale/Makefile
libswscale/bfin/yuv2rgb_bfin.c
tests/ref/fate/bethsoft-vid
tests/ref/seek/ac3_ac3
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (22 commits)
frwu: Employ more meaningful return values.
fraps: Use av_fast_padded_malloc() instead of av_realloc()
mjpegdec: use av_fast_padded_malloc()
eatqi: use av_fast_padded_malloc()
asv1: use av_fast_padded_malloc()
avcodec: Add av_fast_padded_malloc().
swscale: enable dithering in MMX functions.
swscale: make rgb24 function macros slightly smaller.
avcodec.h: Remove some disabled cruft.
swscale: remove obsolete comment.
swscale-test: Drop unused argc and argv arguments from main().
zmbv: Employ more meaningful return values.
zmbvenc: Employ more meaningful return values.
vc1: prevent null pointer dereference on broken files
zmbv: check av_realloc() return values and avoid memleaks on ENOMEM
truespeech: align buffer
ac3: Do not read past the end of ff_ac3_band_start_tab.
dv: Fix small stack overread related to CVE-2011-3929 and CVE-2011-3936.
dv: Fix null pointer dereference due to ach=0
dv: check stype
...
Conflicts:
doc/APIchanges
libavcodec/asv1.c
libavcodec/avcodec.h
libavcodec/eatqi.c
libavcodec/fraps.c
libavcodec/frwu.c
libavcodec/zmbv.c
libavformat/dv.c
libswscale/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (22 commits)
rv34: frame-level multi-threading
mpegvideo: claim ownership of referenced pictures
aacsbr: prevent out of bounds memcpy().
ipmovie: fix pts for CODEC_ID_INTERPLAY_DPCM
sierravmd: fix audio pts
bethsoftvideo: Use bytestream2 functions to prevent buffer overreads.
bmpenc: support for PIX_FMT_RGB444
swscale: fix crash in fast_bilinear code when compiled with -mred-zone.
swscale: specify register type.
rv34: use get_bits_left()
avconv: reinitialize the filtergraph on resolution change.
vsrc_buffer: error on changing frame parameters.
avconv: fix -copyinkf.
fate: Update file checksums after the mov muxer change in a78dbada55d6
movenc: Don't store a nonzero creation time if nothing was set by the caller
bmpdec: support for rgb444 with bitfields compression
rgb2rgb: allow conversion for <15 bpp
doc: fix stray reference to FFmpeg
v4l2: use C99 struct initializer
v4l2: poll the file descriptor
...
Conflicts:
avconv.c
libavcodec/aacsbr.c
libavcodec/bethsoftvideo.c
libavcodec/kmvc.c
libavdevice/v4l2.c
libavfilter/vsrc_buffer.c
libswscale/swscale_unscaled.c
libswscale/x86/input.asm
tests/ref/acodec/alac
tests/ref/acodec/pcm_s16be
tests/ref/acodec/pcm_s24be
tests/ref/acodec/pcm_s32be
tests/ref/acodec/pcm_s8
tests/ref/lavf/mov
tests/ref/vsynth1/dnxhd_1080i
tests/ref/vsynth1/mpeg4
tests/ref/vsynth1/qtrle
tests/ref/vsynth1/svq1
tests/ref/vsynth2/dnxhd_1080i
tests/ref/vsynth2/mpeg4
tests/ref/vsynth2/qtrle
tests/ref/vsynth2/svq1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
fate: Add tests for more AAC features.
aacps: Add missing newline in error message.
fate: Add tests for vc1/wmapro in ism.
aacdec: Add a fate test for 5.1 channel SBR.
aacdec: Turn off PS for multichannel files that use PCE based configs.
cabac: remove put_cabac_u/ueg from cabac-test.
swscale: RGB4444 and BGR444 input
FATE: add test for xWMA demuxer.
FATE: add test for SMJPEG demuxer and associated IMA ADPCM audio decoder.
mpegaudiodec: optimized iMDCT transform
mpegaudiodec: change imdct window arrangment for better pointer alignment
mpegaudiodec: move imdct and windowing function to mpegaudiodsp
mpegaudiodec: interleave iMDCT buffer to simplify future SIMD implementations
swscale: convert yuy2/uyvy/nv12/nv21ToY/UV from inline asm to yasm.
FATE: test to exercise WTV demuxer.
mjpegdec: K&R formatting cosmetics
swscale: K&R formatting cosmetics for code examples
swscale: K&R reformatting cosmetics for header files
FATE test: cvid-grayscale; ensures that the grayscale Cinepak variant is exercised.
Conflicts:
libavcodec/cabac.c
libavcodec/mjpegdec.c
libavcodec/mpegaudiodec.c
libavcodec/mpegaudiodsp.c
libavcodec/mpegaudiodsp.h
libavcodec/mpegaudiodsp_template.c
libavcodec/x86/Makefile
libavcodec/x86/imdct36_sse.asm
libavcodec/x86/mpegaudiodec_mmx.c
libswscale/swscale-test.c
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/x86/swscale_template.c
tests/fate/demux.mak
tests/fate/microsoft.mak
tests/fate/video.mak
tests/fate/wma.mak
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>