Mirror of https://github.com/FFmpeg/FFmpeg.git (synced 2025-11-23 21:54:53 +02:00)
Doubling the register size makes it possible to avoid two pmaddubsw
instructions. The new version is also ABI compliant (the old version
lacked an emms), and the averaging versions no longer rely on padding
(the old versions used pavgb with a memory operand that reads eight
bytes, although only four are needed).

Old benchmarks (the latter four refer to RV40):
avg_h264_chroma_mc4_8_c:        145.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:     32.3 ( 4.51x)
put_h264_chroma_mc4_8_c:        136.1 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:     29.0 ( 4.70x)
avg_chroma_mc4_c:               162.1 ( 1.00x)
avg_chroma_mc4_ssse3:            31.1 ( 5.22x)
put_chroma_mc4_c:               137.5 ( 1.00x)
put_chroma_mc4_ssse3:            28.6 ( 4.81x)

New benchmarks:
avg_h264_chroma_mc4_8_c:        146.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:     26.5 ( 5.53x)
put_h264_chroma_mc4_8_c:        136.8 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:     22.5 ( 6.09x)
avg_chroma_mc4_c:               165.5 ( 1.00x)
avg_chroma_mc4_ssse3:            27.2 ( 6.08x)
put_chroma_mc4_c:               138.1 ( 1.00x)
put_chroma_mc4_ssse3:            23.2 ( 5.96x)

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
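
For readers unfamiliar with the instructions named in this message, here is a rough scalar sketch in C of what a 4-pixel-wide H.264 chroma "put" computes and how pmaddubsw and pavgb fit in. It is not the code from this commit. The function names (maddubs_pair, avg_u8, put_chroma_mc4_ref) are hypothetical, the weights follow the standard H.264 bilinear chroma formula, and the RV40 variants use different rounding that is not modelled here. The sketch assumes the caller supplies h + 1 readable rows of at least 5 bytes each.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one pmaddubsw output lane: unsigned byte times signed
 * byte for two adjacent pairs, summed with signed 16-bit saturation. */
static int16_t maddubs_pair(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1)
{
    int sum = u0 * s0 + u1 * s1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Scalar model of one pavgb lane: rounded average of two unsigned bytes.
 * The "avg" functions blend the prediction into dst this way. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical reference for a 4-wide H.264 chroma put with fractional
 * offsets x, y in 0..7. Reads rows 0..h and columns 0..4 of src. */
static void put_chroma_mc4_ref(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h, int x, int y)
{
    /* Bilinear weights; they sum to 64 and each fits in a signed byte. */
    const int8_t A = (int8_t)((8 - x) * (8 - y));
    const int8_t B = (int8_t)(x * (8 - y));
    const int8_t C = (int8_t)((8 - x) * y);
    const int8_t D = (int8_t)(x * y);

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 4; j++) {
            /* One multiply-add per row of the 2x2 neighbourhood. */
            int top = maddubs_pair(src[j], A, src[j + 1], B);
            int bot = maddubs_pair(src[j + stride], C, src[j + stride + 1], D);
            dst[j] = (uint8_t)((top + bot + 32) >> 6);
        }
        dst += stride;
        src += stride;
    }
}
```

Each output row needs four such 16-bit lanes per source row, which is eight interleaved bytes, so a 64-bit register holds the pairs of only one source row. A 128-bit register has room for twice as many pairs per pmaddubsw, which is presumably where the saving described above comes from. How the actual assembly arranges its operands is not shown here.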