mirror of https://github.com/FFmpeg/FFmpeg.git synced 2026-05-04 21:08:03 +02:00
Andreas Rheinhardt 9beecb2670 avcodec/x86/qpeldsp: Add SSE2 vertical lowpass functions
Benchmarks ([4], [8] and [12] are pure vertical functions
and therefore show the biggest improvements):

avg_qpel_pixels_tab[0][4]_c:                           844.5 ( 1.00x)
avg_qpel_pixels_tab[0][4]_mmxext:                      225.5 ( 3.74x)
avg_qpel_pixels_tab[0][4]_sse2:                        146.6 ( 5.76x)
avg_qpel_pixels_tab[0][5]_c:                          1915.9 ( 1.00x)
avg_qpel_pixels_tab[0][5]_mmxext:                      499.6 ( 3.83x)
avg_qpel_pixels_tab[0][5]_sse2:                        405.5 ( 4.72x)
avg_qpel_pixels_tab[0][6]_c:                          1775.9 ( 1.00x)
avg_qpel_pixels_tab[0][6]_mmxext:                      484.9 ( 3.66x)
avg_qpel_pixels_tab[0][6]_sse2:                        385.4 ( 4.61x)
avg_qpel_pixels_tab[0][7]_c:                          1937.0 ( 1.00x)
avg_qpel_pixels_tab[0][7]_mmxext:                      501.3 ( 3.86x)
avg_qpel_pixels_tab[0][7]_sse2:                        403.6 ( 4.80x)
avg_qpel_pixels_tab[0][8]_c:                           976.7 ( 1.00x)
avg_qpel_pixels_tab[0][8]_mmxext:                      216.9 ( 4.50x)
avg_qpel_pixels_tab[0][8]_sse2:                        113.1 ( 8.64x)
avg_qpel_pixels_tab[0][9]_c:                          1971.8 ( 1.00x)
avg_qpel_pixels_tab[0][9]_mmxext:                      494.9 ( 3.98x)
avg_qpel_pixels_tab[0][9]_sse2:                        388.3 ( 5.08x)
avg_qpel_pixels_tab[0][10]_c:                         1900.8 ( 1.00x)
avg_qpel_pixels_tab[0][10]_mmxext:                     476.4 ( 3.99x)
avg_qpel_pixels_tab[0][10]_sse2:                       362.4 ( 5.24x)
avg_qpel_pixels_tab[0][11]_c:                         2003.3 ( 1.00x)
avg_qpel_pixels_tab[0][11]_mmxext:                     496.5 ( 4.04x)
avg_qpel_pixels_tab[0][11]_sse2:                       385.9 ( 5.19x)
avg_qpel_pixels_tab[0][12]_c:                          841.8 ( 1.00x)
avg_qpel_pixels_tab[0][12]_mmxext:                     226.7 ( 3.71x)
avg_qpel_pixels_tab[0][12]_sse2:                       143.3 ( 5.87x)
avg_qpel_pixels_tab[0][13]_c:                         1929.0 ( 1.00x)
avg_qpel_pixels_tab[0][13]_mmxext:                     499.6 ( 3.86x)
avg_qpel_pixels_tab[0][13]_sse2:                       412.1 ( 4.68x)
avg_qpel_pixels_tab[0][14]_c:                         1777.9 ( 1.00x)
avg_qpel_pixels_tab[0][14]_mmxext:                     484.8 ( 3.67x)
avg_qpel_pixels_tab[0][14]_sse2:                       385.9 ( 4.61x)
avg_qpel_pixels_tab[0][15]_c:                         1914.8 ( 1.00x)
avg_qpel_pixels_tab[0][15]_mmxext:                     501.8 ( 3.82x)
avg_qpel_pixels_tab[0][15]_sse2:                       405.0 ( 4.73x)
avg_qpel_pixels_tab[1][4]_c:                           203.4 ( 1.00x)
avg_qpel_pixels_tab[1][4]_mmxext:                       64.7 ( 3.14x)
avg_qpel_pixels_tab[1][4]_sse2:                         40.3 ( 5.05x)
avg_qpel_pixels_tab[1][5]_c:                           488.8 ( 1.00x)
avg_qpel_pixels_tab[1][5]_mmxext:                      134.6 ( 3.63x)
avg_qpel_pixels_tab[1][5]_sse2:                        108.5 ( 4.50x)
avg_qpel_pixels_tab[1][6]_c:                           448.2 ( 1.00x)
avg_qpel_pixels_tab[1][6]_mmxext:                      128.8 ( 3.48x)
avg_qpel_pixels_tab[1][6]_sse2:                        102.5 ( 4.37x)
avg_qpel_pixels_tab[1][7]_c:                           489.6 ( 1.00x)
avg_qpel_pixels_tab[1][7]_mmxext:                      134.5 ( 3.64x)
avg_qpel_pixels_tab[1][7]_sse2:                        108.8 ( 4.50x)
avg_qpel_pixels_tab[1][8]_c:                           223.8 ( 1.00x)
avg_qpel_pixels_tab[1][8]_mmxext:                       57.5 ( 3.89x)
avg_qpel_pixels_tab[1][8]_sse2:                         36.3 ( 6.16x)
avg_qpel_pixels_tab[1][9]_c:                           496.6 ( 1.00x)
avg_qpel_pixels_tab[1][9]_mmxext:                      129.8 ( 3.82x)
avg_qpel_pixels_tab[1][9]_sse2:                        105.1 ( 4.72x)
avg_qpel_pixels_tab[1][10]_c:                          466.1 ( 1.00x)
avg_qpel_pixels_tab[1][10]_mmxext:                     123.2 ( 3.78x)
avg_qpel_pixels_tab[1][10]_sse2:                        99.1 ( 4.70x)
avg_qpel_pixels_tab[1][11]_c:                          497.9 ( 1.00x)
avg_qpel_pixels_tab[1][11]_mmxext:                     129.9 ( 3.83x)
avg_qpel_pixels_tab[1][11]_sse2:                       105.4 ( 4.72x)
avg_qpel_pixels_tab[1][12]_c:                          203.5 ( 1.00x)
avg_qpel_pixels_tab[1][12]_mmxext:                      63.8 ( 3.19x)
avg_qpel_pixels_tab[1][12]_sse2:                        38.8 ( 5.25x)
avg_qpel_pixels_tab[1][13]_c:                          487.9 ( 1.00x)
avg_qpel_pixels_tab[1][13]_mmxext:                     134.7 ( 3.62x)
avg_qpel_pixels_tab[1][13]_sse2:                       108.4 ( 4.50x)
avg_qpel_pixels_tab[1][14]_c:                          447.4 ( 1.00x)
avg_qpel_pixels_tab[1][14]_mmxext:                     128.2 ( 3.49x)
avg_qpel_pixels_tab[1][14]_sse2:                       102.4 ( 4.37x)
avg_qpel_pixels_tab[1][15]_c:                          487.5 ( 1.00x)
avg_qpel_pixels_tab[1][15]_mmxext:                     134.0 ( 3.64x)
avg_qpel_pixels_tab[1][15]_sse2:                       109.9 ( 4.44x)

put_no_rnd_qpel_pixels_tab[0][4]_c:                    825.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][4]_mmxext:               242.5 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][4]_sse2:                 136.0 ( 6.07x)
put_no_rnd_qpel_pixels_tab[0][5]_c:                   1837.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][5]_mmxext:               542.5 ( 3.39x)
put_no_rnd_qpel_pixels_tab[0][5]_sse2:                 446.5 ( 4.11x)
put_no_rnd_qpel_pixels_tab[0][6]_c:                   1766.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][6]_mmxext:               493.6 ( 3.58x)
put_no_rnd_qpel_pixels_tab[0][6]_sse2:                 394.6 ( 4.48x)
put_no_rnd_qpel_pixels_tab[0][7]_c:                   1877.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][7]_mmxext:               541.9 ( 3.46x)
put_no_rnd_qpel_pixels_tab[0][7]_sse2:                 447.6 ( 4.19x)
put_no_rnd_qpel_pixels_tab[0][8]_c:                    785.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][8]_mmxext:               206.2 ( 3.81x)
put_no_rnd_qpel_pixels_tab[0][8]_sse2:                 101.6 ( 7.73x)
put_no_rnd_qpel_pixels_tab[0][9]_c:                   1772.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][9]_mmxext:               489.5 ( 3.62x)
put_no_rnd_qpel_pixels_tab[0][9]_sse2:                 394.8 ( 4.49x)
put_no_rnd_qpel_pixels_tab[0][10]_c:                  1711.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][10]_mmxext:              461.2 ( 3.71x)
put_no_rnd_qpel_pixels_tab[0][10]_sse2:                357.9 ( 4.78x)
put_no_rnd_qpel_pixels_tab[0][11]_c:                  1815.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][11]_mmxext:              490.8 ( 3.70x)
put_no_rnd_qpel_pixels_tab[0][11]_sse2:                394.0 ( 4.61x)
put_no_rnd_qpel_pixels_tab[0][12]_c:                   824.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][12]_mmxext:              242.9 ( 3.40x)
put_no_rnd_qpel_pixels_tab[0][12]_sse2:                135.3 ( 6.10x)
put_no_rnd_qpel_pixels_tab[0][13]_c:                  1843.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][13]_mmxext:              545.4 ( 3.38x)
put_no_rnd_qpel_pixels_tab[0][13]_sse2:                444.9 ( 4.14x)
put_no_rnd_qpel_pixels_tab[0][14]_c:                  1758.1 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][14]_mmxext:              497.7 ( 3.53x)
put_no_rnd_qpel_pixels_tab[0][14]_sse2:                393.5 ( 4.47x)
put_no_rnd_qpel_pixels_tab[0][15]_c:                  1861.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[0][15]_mmxext:              545.0 ( 3.42x)
put_no_rnd_qpel_pixels_tab[0][15]_sse2:                445.7 ( 4.18x)
put_no_rnd_qpel_pixels_tab[1][4]_c:                    198.3 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][4]_mmxext:                64.3 ( 3.08x)
put_no_rnd_qpel_pixels_tab[1][4]_sse2:                  39.8 ( 4.98x)
put_no_rnd_qpel_pixels_tab[1][5]_c:                    460.7 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][5]_mmxext:               137.2 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][5]_sse2:                 113.5 ( 4.06x)
put_no_rnd_qpel_pixels_tab[1][6]_c:                    441.4 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][6]_mmxext:               126.7 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][6]_sse2:                 103.7 ( 4.26x)
put_no_rnd_qpel_pixels_tab[1][7]_c:                    465.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][7]_mmxext:               137.7 ( 3.38x)
put_no_rnd_qpel_pixels_tab[1][7]_sse2:                 114.0 ( 4.09x)
put_no_rnd_qpel_pixels_tab[1][8]_c:                    193.8 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][8]_mmxext:                52.1 ( 3.72x)
put_no_rnd_qpel_pixels_tab[1][8]_sse2:                  27.8 ( 6.97x)
put_no_rnd_qpel_pixels_tab[1][9]_c:                    450.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][9]_mmxext:               126.2 ( 3.57x)
put_no_rnd_qpel_pixels_tab[1][9]_sse2:                 104.3 ( 4.32x)
put_no_rnd_qpel_pixels_tab[1][10]_c:                   436.5 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][10]_mmxext:              118.1 ( 3.69x)
put_no_rnd_qpel_pixels_tab[1][10]_sse2:                 92.4 ( 4.73x)
put_no_rnd_qpel_pixels_tab[1][11]_c:                   453.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][11]_mmxext:              128.7 ( 3.52x)
put_no_rnd_qpel_pixels_tab[1][11]_sse2:                103.6 ( 4.38x)
put_no_rnd_qpel_pixels_tab[1][12]_c:                   201.2 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][12]_mmxext:               64.2 ( 3.13x)
put_no_rnd_qpel_pixels_tab[1][12]_sse2:                 39.6 ( 5.08x)
put_no_rnd_qpel_pixels_tab[1][13]_c:                   461.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][13]_mmxext:              137.6 ( 3.36x)
put_no_rnd_qpel_pixels_tab[1][13]_sse2:                113.4 ( 4.07x)
put_no_rnd_qpel_pixels_tab[1][14]_c:                   442.6 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][14]_mmxext:              127.0 ( 3.49x)
put_no_rnd_qpel_pixels_tab[1][14]_sse2:                102.2 ( 4.33x)
put_no_rnd_qpel_pixels_tab[1][15]_c:                   462.9 ( 1.00x)
put_no_rnd_qpel_pixels_tab[1][15]_mmxext:              139.5 ( 3.32x)
put_no_rnd_qpel_pixels_tab[1][15]_sse2:                113.3 ( 4.09x)

put_qpel_pixels_tab[0][4]_c:                           824.6 ( 1.00x)
put_qpel_pixels_tab[0][4]_mmxext:                      220.1 ( 3.75x)
put_qpel_pixels_tab[0][4]_sse2:                        137.8 ( 5.98x)
put_qpel_pixels_tab[0][5]_c:                          1892.0 ( 1.00x)
put_qpel_pixels_tab[0][5]_mmxext:                      508.0 ( 3.72x)
put_qpel_pixels_tab[0][5]_sse2:                        408.6 ( 4.63x)
put_qpel_pixels_tab[0][6]_c:                          1758.0 ( 1.00x)
put_qpel_pixels_tab[0][6]_mmxext:                      476.7 ( 3.69x)
put_qpel_pixels_tab[0][6]_sse2:                        381.4 ( 4.61x)
put_qpel_pixels_tab[0][7]_c:                          1924.3 ( 1.00x)
put_qpel_pixels_tab[0][7]_mmxext:                      495.1 ( 3.89x)
put_qpel_pixels_tab[0][7]_sse2:                        417.2 ( 4.61x)
put_qpel_pixels_tab[0][8]_c:                           772.1 ( 1.00x)
put_qpel_pixels_tab[0][8]_mmxext:                      197.5 ( 3.91x)
put_qpel_pixels_tab[0][8]_sse2:                        118.4 ( 6.52x)
put_qpel_pixels_tab[0][9]_c:                          1778.2 ( 1.00x)
put_qpel_pixels_tab[0][9]_mmxext:                      476.7 ( 3.73x)
put_qpel_pixels_tab[0][9]_sse2:                        379.6 ( 4.68x)
put_qpel_pixels_tab[0][10]_c:                         1714.6 ( 1.00x)
put_qpel_pixels_tab[0][10]_mmxext:                     460.7 ( 3.72x)
put_qpel_pixels_tab[0][10]_sse2:                       386.8 ( 4.43x)
put_qpel_pixels_tab[0][11]_c:                         1819.1 ( 1.00x)
put_qpel_pixels_tab[0][11]_mmxext:                     474.9 ( 3.83x)
put_qpel_pixels_tab[0][11]_sse2:                       404.5 ( 4.50x)
put_qpel_pixels_tab[0][12]_c:                          829.7 ( 1.00x)
put_qpel_pixels_tab[0][12]_mmxext:                     221.5 ( 3.75x)
put_qpel_pixels_tab[0][12]_sse2:                       138.7 ( 5.98x)
put_qpel_pixels_tab[0][13]_c:                         1892.8 ( 1.00x)
put_qpel_pixels_tab[0][13]_mmxext:                     494.4 ( 3.83x)
put_qpel_pixels_tab[0][13]_sse2:                       413.9 ( 4.57x)
put_qpel_pixels_tab[0][14]_c:                         1763.1 ( 1.00x)
put_qpel_pixels_tab[0][14]_mmxext:                     473.4 ( 3.72x)
put_qpel_pixels_tab[0][14]_sse2:                       377.8 ( 4.67x)
put_qpel_pixels_tab[0][15]_c:                         1896.4 ( 1.00x)
put_qpel_pixels_tab[0][15]_mmxext:                     492.5 ( 3.85x)
put_qpel_pixels_tab[0][15]_sse2:                       399.0 ( 4.75x)
put_qpel_pixels_tab[1][4]_c:                           198.6 ( 1.00x)
put_qpel_pixels_tab[1][4]_mmxext:                       60.9 ( 3.26x)
put_qpel_pixels_tab[1][4]_sse2:                         40.1 ( 4.95x)
put_qpel_pixels_tab[1][5]_c:                           471.4 ( 1.00x)
put_qpel_pixels_tab[1][5]_mmxext:                      131.8 ( 3.58x)
put_qpel_pixels_tab[1][5]_sse2:                        107.2 ( 4.40x)
put_qpel_pixels_tab[1][6]_c:                           440.3 ( 1.00x)
put_qpel_pixels_tab[1][6]_mmxext:                      126.3 ( 3.49x)
put_qpel_pixels_tab[1][6]_sse2:                        100.6 ( 4.38x)
put_qpel_pixels_tab[1][7]_c:                           469.2 ( 1.00x)
put_qpel_pixels_tab[1][7]_mmxext:                      131.7 ( 3.56x)
put_qpel_pixels_tab[1][7]_sse2:                        106.9 ( 4.39x)
put_qpel_pixels_tab[1][8]_c:                           194.2 ( 1.00x)
put_qpel_pixels_tab[1][8]_mmxext:                       52.9 ( 3.67x)
put_qpel_pixels_tab[1][8]_sse2:                         28.0 ( 6.95x)
put_qpel_pixels_tab[1][9]_c:                           464.6 ( 1.00x)
put_qpel_pixels_tab[1][9]_mmxext:                      125.1 ( 3.71x)
put_qpel_pixels_tab[1][9]_sse2:                        100.9 ( 4.60x)
put_qpel_pixels_tab[1][10]_c:                          433.8 ( 1.00x)
put_qpel_pixels_tab[1][10]_mmxext:                     118.2 ( 3.67x)
put_qpel_pixels_tab[1][10]_sse2:                        94.5 ( 4.59x)
put_qpel_pixels_tab[1][11]_c:                          463.9 ( 1.00x)
put_qpel_pixels_tab[1][11]_mmxext:                     125.5 ( 3.70x)
put_qpel_pixels_tab[1][11]_sse2:                       102.6 ( 4.52x)
put_qpel_pixels_tab[1][12]_c:                          199.2 ( 1.00x)
put_qpel_pixels_tab[1][12]_mmxext:                      63.7 ( 3.12x)
put_qpel_pixels_tab[1][12]_sse2:                        36.2 ( 5.50x)
put_qpel_pixels_tab[1][13]_c:                          475.6 ( 1.00x)
put_qpel_pixels_tab[1][13]_mmxext:                     139.5 ( 3.41x)
put_qpel_pixels_tab[1][13]_sse2:                       107.3 ( 4.43x)
put_qpel_pixels_tab[1][14]_c:                          441.9 ( 1.00x)
put_qpel_pixels_tab[1][14]_mmxext:                     126.9 ( 3.48x)
put_qpel_pixels_tab[1][14]_sse2:                       101.3 ( 4.36x)
put_qpel_pixels_tab[1][15]_c:                          475.9 ( 1.00x)
put_qpel_pixels_tab[1][15]_mmxext:                     131.9 ( 3.61x)
put_qpel_pixels_tab[1][15]_sse2:                       107.0 ( 4.45x)
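
The factor in parentheses on each line is the time of the C reference divided by the time of the listed implementation. A minimal sketch of that arithmetic follows; the helper names `speedup` and `print_row` are hypothetical and are not part of FFmpeg or of the benchmark tool that produced the figures above.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of an implementation relative to a reference:
 * reference time divided by the implementation's time
 * (a lower time is better, so a larger ratio is better).
 * Hypothetical helper, for illustration only. */
double speedup(double ref_time, double impl_time)
{
    assert(impl_time > 0.0);
    return ref_time / impl_time;
}

/* Print one row in a layout similar to the tables above.
 * Hypothetical helper, for illustration only. */
void print_row(const char *name, double impl_time, double ref_time)
{
    printf("%-48s %8.1f (%5.2fx)\n",
           name, impl_time, speedup(ref_time, impl_time));
}
```

For instance, `speedup(844.5, 146.6)` is about 5.76, which matches the avg_qpel_pixels_tab[0][4]_sse2 row. The same helper can compare SSE2 against MMXEXT directly: `speedup(225.5, 146.6)` is about 1.54 for that entry.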

The new functions (in qpeldsp.asm) occupy 8244B (the MMXEXT functions
which they will replace occupy only 6720B).
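
The size trade-off can be quantified from the two figures quoted above. The following is a small sketch of that calculation; the function names `size_delta` and `size_growth_percent` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Absolute growth in bytes when old_size is replaced by new_size.
 * Hypothetical helper, for illustration only. */
long size_delta(long old_size, long new_size)
{
    return new_size - old_size;
}

/* Relative growth in percent of the old size.
 * Hypothetical helper, for illustration only. */
double size_growth_percent(long old_size, long new_size)
{
    assert(old_size > 0);
    return 100.0 * (double)(new_size - old_size) / (double)old_size;
}
```

With the sizes from this commit message, `size_delta(6720, 8244)` is 1524 bytes and `size_growth_percent(6720, 8244)` is roughly 22.7.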

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-30 10:39:33 +02:00

FFmpeg README

FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

  • libavcodec provides implementations of a wide range of codecs.
  • libavformat implements streaming protocols, container formats and basic I/O access.
  • libavutil includes hashers, decompressors and miscellaneous utility functions.
  • libavfilter provides means to alter decoded audio and video through a directed graph of connected filters.
  • libavdevice provides an abstraction to access capture and playback devices.
  • libswresample implements audio mixing and resampling routines.
  • libswscale implements color conversion and scaling routines.

Tools

  • ffmpeg is a command line toolbox to manipulate, convert and stream multimedia content.
  • ffplay is a minimalistic multimedia player.
  • ffprobe is a simple analysis tool to inspect multimedia content.
  • Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.

Contributing

Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.
