1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00
Commit Graph

8 Commits

Author SHA1 Message Date
James Darnley
d7246ea9f2 avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions
Includes add/put functions

Rounding contributed by Ronald S. Bultje
2017-06-28 17:27:35 +02:00
James Darnley
8221c71703 avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients 2017-06-20 16:12:25 +02:00
James Darnley
d2597fb0c1 avcodec/x86: modify simple_idct10 macros to add an action paramter 2017-06-20 13:35:01 +02:00
James Darnley
8781330d80 avcodec/x86: cleanup simple_idct10
Use named arguments for the functions so we can remove a define.  The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
2017-06-20 13:34:38 +02:00
Geza Lore
d39c229e54 x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.

Currently only implemented for ELF.
2016-01-21 23:19:46 +01:00
Christophe Gisquet
74c414202f x86: simple_idct10_template: use const
This avoid going through constants.c while still sharing them
with proresdsp.asm

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 22:52:33 +02:00
Christophe Gisquet
7ece8b50b1 x86: simple_idct: 12bits versions
On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C:         78902 decicycles in idct,  262071 runs,     73 skips
avx:       32478 decicycles in idct,  262045 runs,     99 skips

Difference between the 2:
stddev:    0.39 PSNR:104.47 MAXDIFF:    2

This is unavoidable and due to the scale factors used in the x86
version, which cannot match the C ones.

In addition, the trick of adding an initial bias to the input of a
pass can overflow, as the input coefficients are already 15bits,
which is the maximum this function can handle.

Overall, however, the omse on 12 bits samples goes from 0.16916 to
0.16883. Reducing rowshift by 1 improves to 0.0908, but causes
overflows.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 15:34:32 +02:00
Christophe Gisquet
4369b9dc7b x86: simple_idct(_put): 10bits versions
Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires to add offsets in different places compared to
prores or C, and makes the function approximately 2% slower.

For 16 frames of a DNxHD 4:2:2 10bits test sequence:

C:    60861 decicycles in idct, 1048205 runs,    371 skips
sse2: 27567 decicycles in idct, 1048216 runs,    360 skips
avx:  26272 decicycles in idct, 1048171 runs,    405 skips

The add version is not implemented, so the corresponding dsp
function is set to NULL to make it clear in a code executing it.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 13:32:21 +02:00