FFmpeg/libavcodec/x86/fmtconvert.asm

;******************************************************************************
;* x86 optimized Format Conversion Utils
;* Copyright (c) 2008 Loren Merritt
;*
;* This file is part of Libav.
;*
;* Libav is free software; you can redistribute it and/or
;* modify it under the terms of the GNU Lesser General Public
;* License as published by the Free Software Foundation; either
;* version 2.1 of the License, or (at your option) any later version.
;*
;* Libav is distributed in the hope that it will be useful,
;* but WITHOUT ANY WARRANTY; without even the implied warranty of
;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;* Lesser General Public License for more details.
;*
;* You should have received a copy of the GNU Lesser General Public
;* License along with Libav; if not, write to the Free Software
;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
;******************************************************************************

%include "libavutil/x86/x86util.asm"

SECTION .text

;------------------------------------------------------------------------------
; void ff_int32_to_float_fmul_scalar(float *dst, const int32_t *src, float mul,
;                                    int len);
;------------------------------------------------------------------------------
%macro INT32_TO_FLOAT_FMUL_SCALAR 1
%if UNIX64
cglobal int32_to_float_fmul_scalar, 3, 3, %1, dst, src, len
%else
cglobal int32_to_float_fmul_scalar, 4, 4, %1, dst, src, mul, len
%endif
%if WIN64
    SWAP 0, 2                   ; mul is the third argument, so it arrives in xmm2
%elif ARCH_X86_32
    movss   m0, mulm            ; mul is passed on the stack
%endif
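    SPLATD  m0                  ; broadcast mul to all four lanes
    shl     lend, 2             ; len in bytes; the 32-bit op zero-extends into lenq
    add     srcq, lenq          ; point src and dst at the end of their buffers
    add     dstq, lenq
    neg     lenq                ; negative byte offset that counts up to zero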
.loop:
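%if cpuflag(sse2)
    ; SSE2: convert four int32 to four floats per instruction
    cvtdq2ps  m1, [srcq+lenq   ]
    cvtdq2ps  m2, [srcq+lenq+16]
%else
    ; SSE: cvtpi2ps converts two int32 into the low half of the register,
    ; so convert in pairs and merge the halves with movlhps
    cvtpi2ps  m1, [srcq+lenq   ]
    cvtpi2ps  m3, [srcq+lenq+ 8]
    cvtpi2ps  m2, [srcq+lenq+16]
    cvtpi2ps  m4, [srcq+lenq+24]
    movlhps   m1, m3
    movlhps   m2, m4
%endif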
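    mulps     m1, m0            ; scale all eight floats by mul
    mulps     m2, m0
    mova  [dstq+lenq   ], m1    ; aligned stores
    mova  [dstq+lenq+16], m2
    add     lenq, 32            ; advance by 8 elements (32 bytes)
    jl .loop                    ; repeat until the offset reaches zero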
%if notcpuflag(sse2)
    ;; According to the documentation, cvtpi2ps switches to MMX state even if
    ;; the source is a memory location. This is possibly an error in the
    ;; documentation, since every tested CPU disagrees with it. Use emms anyway;
    ;; it costs little because the vast majority of machines will use the SSE2
    ;; variant.
    emms
%endif
    RET
%endmacro

INIT_XMM sse
INT32_TO_FLOAT_FMUL_SCALAR 5
INIT_XMM sse2
INT32_TO_FLOAT_FMUL_SCALAR 3
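
For readers who want to check the assembly against something simpler, here is a minimal C sketch of what the macro above computes. The function names are hypothetical and are not part of this file. The second function mirrors the shape of the assembly loop (pointers moved to the end of the buffers, a negative index counting up to zero, eight elements per iteration). It assumes `len` is a positive multiple of 8, which is what the loop structure above appears to require; the assembly additionally uses aligned loads and stores, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain scalar reference: dst[i] = src[i] * mul. */
static void int32_to_float_fmul_scalar_ref(float *dst, const int32_t *src,
                                           float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (float)src[i] * mul;
}

/* Same result, shaped like the assembly loop: point both arrays at their
 * ends, then walk a negative index up to zero, 8 elements per iteration. */
static void int32_to_float_fmul_scalar_blocked(float *dst, const int32_t *src,
                                               float mul, int len)
{
    /* Assumption inferred from the loop structure, not stated in the file. */
    assert(len > 0 && len % 8 == 0);

    const int32_t *src_end = src + len;
    float *dst_end = dst + len;
    ptrdiff_t i = -(ptrdiff_t)len;

    do {
        for (int k = 0; k < 8; k++)
            dst_end[i + k] = (float)src_end[i + k] * mul;
        i += 8;            /* corresponds to "add lenq, 32" (8 * 4 bytes) */
    } while (i < 0);       /* corresponds to "jl .loop" */
}
```

Counting a negative offset up to zero lets a single add both advance the position and set the flags for the loop branch, which is why the assembly needs no separate compare instruction.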

;------------------------------------------------------------------------------
; Change history (from the blame view, oldest first)
;------------------------------------------------------------------------------
; 2011-01-30 17:06:46 +02:00
;   Separate format conversion DSP functions from DSPContext.
;   This will be beneficial for use with the audio conversion API without
;   requiring it to depend on all of dsputil.
;   Signed-off-by: Mans Rullgard <mans@mansr.com>
;
; 2011-03-18 19:35:10 +02:00
;   Replace FFmpeg with Libav in licence headers
;   Signed-off-by: Mans Rullgard <mans@mansr.com>
;
; 2011-05-14 22:32:31 +03:00
;   Fix FSF address copy paste error in some license headers.
;
; 2011-10-10 06:52:03 +03:00
;   fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm
;
; 2011-11-01 23:57:41 +03:00
;   fmtconvert: fix int32_to_float_fmul_scalar() for windows x86_64
;   The calling convention only allows 4 non-stack parameters, with each float
;   or int register being skipped if not used. Fixes Bug 64.
;
; 2012-01-23 12:45:58 +03:00
;   config.asm: change %ifdef directives to %if directives.
;   This allows combining multiple conditionals in a single statement.
;
; 2012-07-15 12:48:21 +03:00
;   x86: yasm: Use complete source path for macro helper %includes
;   This is more consistent with the way we handle C #includes and it
;   simplifies the build system.
;
; 2012-07-15 16:42:17 +03:00
;   x86: fmtconvert: port to cpuflags
;
; 2014-01-28 22:35:58 +03:00
;   x86: Make function prototype comments in assembly code consistent
;   This helps grepping for functions, among other things.
;
; 2015-08-01 17:27:36 +02:00
;   x86inc: Drop SECTION_TEXT macro
;   The .text section is already 16-byte aligned by default on all supported
;   platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
;   Signed-off-by: Anton Khirnov <anton@khirnov.net>
;
; 2015-12-22 23:45:42 +02:00
;   x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
;   This reverts commit 5dfe4edad63971d669ae456b0bc40ef9364cca80.
;
; 2015-12-29 13:08:38 +02:00
;   x86: use emms after ff_int32_to_float_fmul_scalar_sse
;   Intel's Instruction Set Reference (as of September 2015) clearly states
;   that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
;   source is a memory location. The Instruction Set Reference from 1999
;   (Order Number 243191) describes this behaviour, but all later versions
;   I've seen make no distinction whether MMX registers or memory is used as
;   source. The documentation for the matching SSE2 instruction to convert to
;   double (cvtpi2pd) was fixed (see the valgrind bug
;   https://bugs.kde.org/show_bug.cgi?id=210264).
;   It will take time to get a clarification and fixes in place. In the
;   meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to be
;   correct according to the documentation. The vast majority of users will
;   have SSE2, so a change to the SSE version has little effect.
;   Fixes fate-checkasm on x86 valgrind targets. Valgrind 'bug' reported as
;   https://bugs.kde.org/show_bug.cgi?id=357059