mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-08-04 22:03:09 +02:00
Shreesh Adiga 26f2f03e0d swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422
Currently, the SIMD loop of the AVX2 version of uyvytoyuv422 does the following:
4 vinsertq to interleave the vector lanes during the load from memory.
4 vperm2i128 inside 4 RSHIFT_COPY calls to achieve the desired layout.

This patch replaces the above 8 instructions with 2 vpermq and
2 vpermd with a vector register, similar to the AVX512ICL version.
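
For orientation, here is a minimal scalar sketch of the data movement that the SIMD versions accelerate. It is not FFmpeg's implementation: the function name uyvy_to_yuv422_row and its signature are hypothetical, and it assumes the usual UYVY byte order (U, Y0, V, Y1 for every two pixels) and an even width.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar reference (not FFmpeg code): split one row of packed
 * UYVY into separate Y, U and V planes. Assumes width is even and that each
 * 4-byte group is laid out as U, Y0, V, Y1, covering two pixels.
 */
static void uyvy_to_yuv422_row(const uint8_t *src, uint8_t *ydst,
                               uint8_t *udst, uint8_t *vdst, size_t width)
{
    for (size_t i = 0; i < width / 2; i++) {
        udst[i]         = src[4 * i + 0]; /* shared chroma U */
        ydst[2 * i + 0] = src[4 * i + 1]; /* luma of first pixel */
        vdst[i]         = src[4 * i + 2]; /* shared chroma V */
        ydst[2 * i + 1] = src[4 * i + 3]; /* luma of second pixel */
    }
}
```

A vectorized version performs the same deinterleave on many bytes at once, which is why the lane-crossing permutes discussed above matter: the deinterleaved bytes must end up in the right order across the 128-bit lanes before they are stored.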

Observed the following numbers on various microarchitectures:

On AMD Zen3 laptop:
Before:
uyvytoyuv422_c:                                      51979.7 ( 1.00x)
uyvytoyuv422_sse2:                                    5410.5 ( 9.61x)
uyvytoyuv422_avx:                                     4642.7 (11.20x)
uyvytoyuv422_avx2:                                    4249.0 (12.23x)

After:
uyvytoyuv422_c:                                      51659.8 ( 1.00x)
uyvytoyuv422_sse2:                                    5420.8 ( 9.53x)
uyvytoyuv422_avx:                                     4651.2 (11.11x)
uyvytoyuv422_avx2:                                    3953.8 (13.07x)

On Intel MacBook Pro 2019:
Before:
uyvytoyuv422_c:                                     185014.4 ( 1.00x)
uyvytoyuv422_sse2:                                   22800.4 ( 8.11x)
uyvytoyuv422_avx:                                    19796.9 ( 9.35x)
uyvytoyuv422_avx2:                                   13141.9 (14.08x)

After:
uyvytoyuv422_c:                                     185093.4 ( 1.00x)
uyvytoyuv422_sse2:                                   22795.4 ( 8.12x)
uyvytoyuv422_avx:                                    19791.9 ( 9.35x)
uyvytoyuv422_avx2:                                   12043.1 (15.37x)

On AMD Zen4 desktop:
Before:
uyvytoyuv422_c:                                      29105.0 ( 1.00x)
uyvytoyuv422_sse2:                                    3888.0 ( 7.49x)
uyvytoyuv422_avx:                                     3374.2 ( 8.63x)
uyvytoyuv422_avx2:                                    2649.8 (10.98x)
uyvytoyuv422_avx512icl:                               1615.0 (18.02x)

After:
uyvytoyuv422_c:                                      29093.4 ( 1.00x)
uyvytoyuv422_sse2:                                    3874.4 ( 7.51x)
uyvytoyuv422_avx:                                     3371.6 ( 8.63x)
uyvytoyuv422_avx2:                                    2174.6 (13.38x)
uyvytoyuv422_avx512icl:                               1625.1 (17.90x)
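
The figure in parentheses in each table is consistent with the C timing divided by the timing of the given variant. The sketch below makes that arithmetic explicit; the helper name speedup is hypothetical and is used here only for illustration.

```c
/*
 * Hypothetical helper: ratio of a baseline timing to a variant timing.
 * Both arguments are timings as printed in the tables above (same unit).
 */
static double speedup(double baseline, double variant)
{
    return baseline / variant;
}
```

For example, speedup(29093.4, 2174.6) is about 13.38, matching the Zen4 "After" row for avx2, while speedup(2649.8, 2174.6) is about 1.22, which is the before/after gain of the avx2 version on that machine.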

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>

FFmpeg README

FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

  • libavcodec provides implementations of a wide range of codecs.
  • libavformat implements streaming protocols, container formats and basic I/O access.
  • libavutil includes hashers, decompressors and miscellaneous utility functions.
  • libavfilter provides means to alter decoded audio and video through a directed graph of connected filters.
  • libavdevice provides an abstraction to access capture and playback devices.
  • libswresample implements audio mixing and resampling routines.
  • libswscale implements color conversion and scaling routines.

Tools

  • ffmpeg is a command line toolbox to manipulate, convert and stream multimedia content.
  • ffplay is a minimalistic multimedia player.
  • ffprobe is a simple analysis tool to inspect multimedia content.
  • Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.

Contributing

Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.
