Before this commit, the (32-bit only) simple idct came in three versions: A pure MMX IDCT and idct-put and idct-add versions which use SSE2 at the put and add stage, but still use pure MMX for the actual IDCT.

This commit ports said IDCT to SSE2; this was entirely trivial for the IDCT1-5 and IDCT7 parts (where one can directly use the full register width) and was easy for IDCT6 and IDCT8 (involving a few movhps and pshufds). Unfortunately, DC_COND_INIT and Z_COND_INIT still use only the lower half of the registers.

This saved 4658B here; the benchmarking option of the dct test tool showed a 15% speedup.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
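
To illustrate the difference between using the full register width and only the lower half, here is a minimal C sketch with SSE2 intrinsics. It is not the code from this commit (which is x86 assembly); the function names are hypothetical, and the choice of pmaddwd (`_mm_madd_epi16`) as the example operation is an assumption made for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar model of pmaddwd: multiply adjacent int16 pairs and add each
 * pair into one int32. n must be even. Ignores the single wraparound
 * case in which all four inputs of a pair are -32768. */
static void madd_scalar(const int16_t *a, const int16_t *b,
                        int32_t *out, int n)
{
    for (int i = 0; i < n; i += 2)
        out[i / 2] = (int32_t)a[i] * b[i] + (int32_t)a[i + 1] * b[i + 1];
}

/* Full register width: one pmaddwd consumes 8 int16 values per operand
 * and yields 4 int32 results. */
static void madd_sse2_full(const int16_t *a, const int16_t *b,
                           int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_madd_epi16(va, vb));
}

/* Lower half only: 64-bit loads fill just the low 4 int16 lanes, as in
 * MMX-sized code moved to XMM registers without widening. Only 2 int32
 * results are produced per instruction. */
static void madd_sse2_half(const int16_t *a, const int16_t *b,
                           int32_t out[2])
{
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    _mm_storel_epi64((__m128i *)out, _mm_madd_epi16(va, vb));
}
```

Both SSE2 variants issue the same instruction, but the full-width one does twice the useful work per instruction, which is the kind of gain a port from 64-bit MMX to 128-bit SSE2 registers can provide.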