Mirror of https://github.com/FFmpeg/FFmpeg.git (synced 2026-04-24 04:44:54 +02:00)
1a7979a2f8
For pre-AVX2, vpbroadcastw is emulated via a load followed by two shuffles. Yet one always wants to splat multiple pairs of coefficients that are adjacent in memory, so one can do better than that: load all of them at once, perform a punpcklwd of the register with itself and use one pshufd per output register. If the coefficients also have to be sign-extended, one can replace the punpcklwd with a single pmovsxbw (instead of one per register) and use pshufd directly afterwards.

This saved 4816B of .text here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
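
The non-sign-extending variant described above can be sketched with SSE2 intrinsics in C. This is only an illustration of the instruction sequence, not the assembly changed by this commit: the helper names `splat4_words` and `is_splat_of_pair` are hypothetical, and the sketch assumes that each 16-bit unit being splatted holds one pair of 8-bit coefficients.

```c
#include <emmintrin.h> /* SSE2: movq, punpcklwd, pshufd intrinsics */
#include <stdint.h>

/* Hypothetical helper: splat four adjacent 16-bit units (assumed here to be
 * four pairs of 8-bit coefficients) into four separate registers. */
static void splat4_words(const void *coeffs, __m128i out[4])
{
    /* A single 8-byte load (movq) fetches all four 16-bit units at once. */
    __m128i v = _mm_loadl_epi64((const __m128i *)coeffs);

    /* punpcklwd with itself: words become w0 w0 w1 w1 w2 w2 w3 w3,
     * so every dword now holds one unit twice. */
    v = _mm_unpacklo_epi16(v, v);

    /* One pshufd per output register broadcasts one dword to all lanes. */
    out[0] = _mm_shuffle_epi32(v, 0x00);
    out[1] = _mm_shuffle_epi32(v, 0x55);
    out[2] = _mm_shuffle_epi32(v, 0xAA);
    out[3] = _mm_shuffle_epi32(v, 0xFF);
}

/* Hypothetical checker: returns 1 if reg consists of pair[0], pair[1]
 * repeated eight times, 0 otherwise. */
static int is_splat_of_pair(__m128i reg, const int8_t pair[2])
{
    int8_t lanes[16];
    _mm_storeu_si128((__m128i *)lanes, reg);
    for (int i = 0; i < 16; i++)
        if (lanes[i] != pair[i & 1])
            return 0;
    return 1;
}
```

Compared with emulating vpbroadcastw separately for each register (a load plus two shuffles each), this sketch uses one load, one unpack and four shuffles for four registers. For the sign-extending case, the unpack would presumably be replaced by a single `_mm_cvtepi8_epi16` (pmovsxbw, SSE4.1), after which each dword already holds one pair of 16-bit coefficients and the same pshufd immediates apply.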