Ronald S. Bultje
385a3420d1
vp9/x86: fix overwrite in ipred_vl_4x4_ssse3.
Fixes trac ticket 3717.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-12 04:11:20 +02:00
Christophe Gisquet
508e7a5c16
x86: huffyuv: fix {add,diff}_int16
They used an extra, undeclared register. Fixes a crash in
fate-vsynth3-ffvhuff444p16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-12 00:26:19 +02:00
Michael Niedermayer
1a2ff62859
Merge commit '570d4b21863b6254d6bbca9c528bede471bb4478'
* commit '570d4b21863b6254d6bbca9c528bede471bb4478':
x86: h264: Don't keep data in the redzone across function calls on 64 bit unix
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-10 18:35:49 +02:00
Martin Storsjö
570d4b2186
x86: h264: Don't keep data in the redzone across function calls on 64 bit unix
We know that the called function (ff_chroma_inter_body_mmxext)
doesn't touch the redzone, so the data kept there stays intact;
thus, this doesn't fix any bug per se.
However, valgrind's memcheck tool intentionally assumes that the
redzone is clobbered on every function call and function return
(see a long comment in valgrind/memcheck/mc_main.c). Not keeping data
there avoids false positives in that tool, at the cost of an extra
stack pointer adjustment.
The alternative would be a valgrind suppression for this issue,
but that's an extra burden for everybody who wants to run libavcodec
within valgrind.
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-06-10 16:31:48 +03:00
Michael Niedermayer
06f576c4ab
avcodec/x86/dct_init: fix build failure with clang && disable-optimizations
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-09 19:32:41 +02:00
James Almer
6d408495b5
x86/dct32: don't build ff_dct32_float_sse on x86_64
There's an SSE2 version already, and technically the SSE version
on x86_64 was wrong (using pshufd and pshuflw, SSE2 instructions).
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-09 00:51:43 +02:00
James Almer
fc8db12a73
x86/vp9: initial AVX2 intra_pred
tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz
1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
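The decicycle counts above are easier to compare as ratios. The small sketch below reuses only the numbers from this message. The struct and function names are made up for illustration.

```c
#include <stddef.h>

/* Timings copied from the commit message (decicycles on a Core i5-4200U,
 * 32x32 blocks). Struct and function names are illustrative only. */
struct ipred_timing {
    const char *mode;    /* intra prediction mode */
    double      before;  /* existing SSSE3 or AVX version */
    double      avx2;    /* new AVX2 version */
};

static const struct ipred_timing ipred_timings[] = {
    { "dc",      1219.0,  439.0 },
    { "dc_top",  3570.0, 2494.0 },
    { "dc_left", 1419.0,  717.0 },
    { "tm",      2737.0, 2088.0 },
    { "v",       3090.0, 2226.0 },
    { "h",       1565.0,  922.0 },
};

static const size_t n_ipred_timings =
    sizeof(ipred_timings) / sizeof(ipred_timings[0]);

/* Speedup of AVX2 over the previous version; > 1.0 means AVX2 is faster. */
static double ipred_speedup(const struct ipred_timing *t)
{
    return t->before / t->avx2;
}
```

For example, the DC predictor comes out at roughly 2.8x and TM at roughly 1.3x.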
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 02:37:20 +02:00
James Almer
ec98f80af4
x86/dsputil: move some mmx init code inside dsputil_init_mmx()
This reduces differences with the fork
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-06 05:26:04 +02:00
Christophe Gisquet
ccff45a0d3
apedsp: move to llauddsp
APE is not the sole codec using scalarproduct_and_madd_int16.
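For context, scalarproduct_and_madd_int16 fuses a dot product with a multiply-add update of one of its inputs. The plain-C sketch below shows those semantics as assumed from FFmpeg's scalar reference code. The `_ref` name is hypothetical.

```c
#include <stdint.h>

/* Returns the dot product of v1 and v2 (using the old v1 values) and,
 * in the same pass, updates v1[i] += mul * v3[i]. */
static int32_t scalarproduct_and_madd_int16_ref(int16_t *v1, const int16_t *v2,
                                                const int16_t *v3, int order,
                                                int mul)
{
    int32_t res = 0;
    while (order--) {
        res   += *v1 * *v2++;   /* accumulate before v1 is modified */
        *v1++ += mul * *v3++;   /* then apply the multiply-add update */
    }
    return res;
}
```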
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-05 20:31:59 +02:00
Michael Niedermayer
d5c9d055ea
avcodec/x86/dsputilenc_mmx: fix build without yasm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-04 05:39:03 +02:00
James Almer
625ffa1457
x86/motion_est: sad_{x, y}2_mmxext functions are bitexact
Only the xy2 functions aren't.
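Some background on why only xy2 is affected. pavgb computes the rounded average (a + b + 1) >> 1 exactly, which is all the x2 and y2 cases need. The xy2 case needs a four-sample average, and building that from two pavgb steps rounds twice. The sketch below shows that cascaded rounding can differ from the exact result. Whether the mmxext xy2 code uses exactly this scheme is an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

/* pavgb semantics: rounded average of two unsigned bytes. */
static uint8_t avg2(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Exact four-sample average with a single rounding step. */
static uint8_t avg4_exact(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)((a + b + c + d + 2) >> 2);
}

/* Two cascaded pavgb-style steps: cheap, but rounds up twice. */
static uint8_t avg4_cascaded(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return avg2(avg2(a, b), avg2(c, d));
}
```

With inputs (0, 0, 0, 1), the exact average is 0 but the cascaded version gives 1.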
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-04 00:48:35 +02:00
Timothy Gu
108dec3055
x86: dsputilenc: convert hf_noise*_mmx to yasm
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Several bugfixes by: Christophe Gisquet <christophe.gisquet@gmail.com>
See: [FFmpeg-devel] [WIP] [PATCH 4/4] x86: dsputilenc: convert hf_noise*_mmx to yasm
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-03 23:59:43 +02:00
Christophe Gisquet
dcd2a6ca36
x86: hevc_mc: remove unneeded shift
The immediate value may be 0.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 23:34:33 +02:00
Christophe Gisquet
09fc28aed1
x86: hevcdsp_init: fix macro usage
The macro was not using the parameter but unconditionally using sse4.
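Below is a hypothetical illustration of this class of bug. The first macro takes an instruction-set suffix but hardcodes one variant; the second token-pastes the parameter into the function name. All function names are invented for the sketch.

```c
#include <string.h>

/* Stand-ins for two SIMD versions of the same function (hypothetical). */
static const char *put_pixels_sse4(void) { return "sse4"; }
static const char *put_pixels_avx(void)  { return "avx"; }

/* Buggy: ignores its parameter and always selects the SSE4 version. */
#define PICK_FUNC_BUGGY(opt) put_pixels_sse4

/* Fixed: pastes the parameter into the function name. */
#define PICK_FUNC(opt) put_pixels_ ## opt
```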
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 23:20:07 +02:00
James Almer
e1bd40fe6b
x86/motion_est: enable sad16_sse2 on k10 CPUs
The check is meant for k8 CPUs. sad16_sse2 is ~20% faster than sad16_mmxext on k10.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-01 02:10:32 +02:00
James Almer
f128342df2
build: fix compilation of svq1enc_mmx.c with --disable-mmx
It's needed for ff_svq1enc_init_x86() even if SIMD functions are disabled.
Alternatively, an svq1enc_init.c file could be created and the relevant code moved there.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-31 00:38:24 +02:00
James Almer
4ac41a52e2
x86/huffyuvdsp: fix some prototypes
Remove duplicate prototypes and fix int -> intptr_t in another.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-31 00:29:00 +02:00
Christophe Gisquet
d136fe6fd7
x86: huffyuvdsp: fewer functions for x86_64
When there are two versions of a function that need at most SSE2, only one is needed for x86_64, where SSE2 is always available.
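x86-64 guarantees SSE2, so a pre-SSE2 version that is superseded by an SSE2 one can never be selected on 64-bit builds. The sketch below shows the resulting init pattern under two assumptions: the function names are hypothetical, and the GCC/Clang predefined macro `__x86_64__` stands in for FFmpeg's own ARCH_X86_64 config macro.

```c
#include <stdbool.h>

typedef int (*impl_fn)(void);

/* Hypothetical versions of one DSP function. Each just returns an id so the
 * sketch compiles and runs anywhere. */
static int impl_c(void)    { return 0; }
#if !defined(__x86_64__)
static int impl_mmx(void)  { return 1; }
#endif
static int impl_sse2(void) { return 2; }

static impl_fn select_impl(bool has_mmx, bool has_sse2)
{
    impl_fn fn = impl_c;
#if !defined(__x86_64__)
    /* Only 32-bit builds can run on a CPU without SSE2. */
    if (has_mmx)
        fn = impl_mmx;
#else
    /* SSE2 is part of the x86-64 baseline, so an MMX version would always
     * be overridden; it is not even built. */
    (void)has_mmx;
#endif
    if (has_sse2)
        fn = impl_sse2;
    return fn;
}
```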
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 21:39:06 +02:00
Timothy Gu
154cee9292
x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 16:57:52 +02:00
Timothy Gu
0b6292b7b8
x86: dsputilenc: move all the function prototypes together
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 16:18:10 +02:00
Christophe Gisquet
f743fa9c7f
x86: huffyuvdsp: add_hfyu_left_pred_bgr32
C MMX SSE2
Cycles: 3092 1053 578
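Left prediction on BGR32 runs an independent running sum per byte lane (B, G, R, A), which maps naturally onto packed byte additions. The plain-C sketch below shows the assumed semantics. The function name and the `left` in/out convention are illustrative.

```c
#include <stdint.h>

/* Each output byte is the running sum (mod 256) of the residuals of the
 * same channel; left[] holds the previous pixel on entry and the last
 * reconstructed pixel on return. */
static void left_pred_bgr32_ref(uint8_t *dst, const uint8_t *src,
                                intptr_t w, uint8_t left[4])
{
    uint8_t acc[4] = { left[0], left[1], left[2], left[3] };
    for (intptr_t i = 0; i < w; i++) {
        for (int c = 0; c < 4; c++) {
            acc[c] = (uint8_t)(acc[c] + src[4 * i + c]);
            dst[4 * i + c] = acc[c];
        }
    }
    for (int c = 0; c < 4; c++)
        left[c] = acc[c];
}
```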
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 15:20:36 +02:00
Michael Niedermayer
7be79c76d3
avcodec/huffyuvdsp: Change w to intptr in add_hfyu_median_pred() and add_hfyu_left_pred()
This avoids potential issues with the upper 32 bits of the register being undefined in x86-64 asm.
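On x86-64, the calling convention does not define the upper 32 bits of a register that carries an `int` argument. Asm that uses the full 64-bit register as a counter or offset can therefore read garbage. Declaring the width as `intptr_t` makes the caller pass a full register-sized value. Below is a scalar sketch of left prediction with that signature. The name is hypothetical.

```c
#include <stdint.h>

/* dst[i] is the running sum of the residuals in src, starting from acc.
 * w is intptr_t, so 64-bit asm can use the whole register directly. */
static int left_pred_ref(uint8_t *dst, const uint8_t *src, intptr_t w, int acc)
{
    for (intptr_t i = 0; i < w; i++) {
        acc   += src[i];
        dst[i] = (uint8_t)acc;   /* stored modulo 256 */
    }
    return acc;                  /* returned unmasked for the next row */
}
```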
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 15:12:58 +02:00
Christophe Gisquet
884078d2df
x86: huffyuvdsp: add SSE2 median prediction
From 5010 to 4566 cycles on Lagarith YUY2.
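HuffYUV-style median prediction predicts each pixel as the median of three values: the left pixel, the top pixel, and the gradient left + top - topleft (mod 256). It then adds the coded residual. Below is a scalar sketch modelled on the C reference. The names are hypothetical.

```c
#include <stdint.h>

/* Median of three values: max(min(a, b), min(max(a, b), c)). */
static int mid_pred3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }  /* now a <= b */
    if (b > c) b = c;                        /* b = min(max(a, b), c) */
    return a > b ? a : b;
}

static void add_median_pred_ref(uint8_t *dst, const uint8_t *top,
                                const uint8_t *diff, intptr_t w,
                                int *left, int *left_top)
{
    uint8_t l = (uint8_t)*left, lt = (uint8_t)*left_top;
    for (intptr_t i = 0; i < w; i++) {
        int grad = (l + top[i] - lt) & 0xFF;  /* gradient predictor */
        l      = (uint8_t)(mid_pred3(l, top[i], grad) + diff[i]);
        lt     = top[i];
        dst[i] = l;
    }
    *left     = l;
    *left_top = lt;
}
```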
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 14:57:57 +02:00
Michael Niedermayer
8c891d90ca
avcodec/x86/qpeldsp_init: Restore author attribution
See: 368f50359e
See: 44eb495128, and many others
See:
similarity index 83%
copy from libavcodec/x86/dsputil_init.c
copy to libavcodec/x86/qpeldsp_init.c
index ebbf97f..8f296a1 100644
--- a/libavcodec/x86/dsputil_init.c
+++ b/libavcodec/x86/qpeldsp_init.c
@@ -1,6 +1,5 @@
/*
- * Copyright (c) 2000, 2001 Fabrice Bellard
- * Copyright (c) 2002-2004 Michael Niedermayer <michaelni@gmx.at>
+ * quarterpel DSP functions
*
* This file is part of FFmpeg.
*
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 04:05:29 +02:00
Michael Niedermayer
98a6806fdd
Merge commit '368f50359eb328b0b9d67451f56fda20b3255f9a'
* commit '368f50359eb328b0b9d67451f56fda20b3255f9a':
dsputil: Split off quarterpel bits into their own context
Conflicts:
configure
libavcodec/dsputil.c
libavcodec/h263dec.c
libavcodec/mpegvideo.c
libavcodec/mpegvideo_enc.c
libavcodec/vc1dec.c
libavcodec/vc1dsp.c
libavcodec/x86/dsputil_init.c
libavcodec/x86/qpeldsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 02:43:34 +02:00
Michael Niedermayer
40f3a87c10
Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3'
* commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3':
dsputil: Move APE-specific bits into apedsp
Conflicts:
libavcodec/arm/int_neon.S
libavcodec/x86/dsputil.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 00:59:15 +02:00
Michael Niedermayer
c814a6c778
avcodec/x86/svq1enc_mmx: Add author attribution
See: 5900637219
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 00:30:05 +02:00
Michael Niedermayer
ea0931fb96
Merge commit '65d5d5865845f057cc6530a8d0f34db952d9009c'
* commit '65d5d5865845f057cc6530a8d0f34db952d9009c':
dsputil: Move SVQ1 encoding specific bits into svq1enc
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 00:01:45 +02:00
James Almer
02a3e327f1
x86/dsputilenc: add missing guards to ff_pix_sum16_xop
XOP support was added in Yasm 1.0.0 and Nasm 2.06, and we still
support older versions.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 22:31:28 +02:00
Christophe Gisquet
99a319c4e7
x86: huffyuvdsp: port add_bytes to yasm
C MMX SSE2
Cycles: 2972 587 302
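add_bytes is an element-wise byte add that wraps modulo 256, which is why a packed byte add (paddb) can implement it directly. Below is a scalar sketch. The name is hypothetical.

```c
#include <stdint.h>

/* dst[i] += src[i], wrapping modulo 256 like paddb. */
static void add_bytes_ref(uint8_t *dst, const uint8_t *src, intptr_t w)
{
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```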
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 21:56:00 +02:00
Christophe Gisquet
2267003981
x86: hpeldsp: better factorization
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 21:47:40 +02:00
Michael Niedermayer
7b4c46050e
rename add_hfyu_left_prediction_int16 to add_hfyu_left_pred_int16
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 19:50:44 +02:00
Michael Niedermayer
550ae6c02f
rename add_hfyu_median_prediction_int16 to add_hfyu_median_pred_int16
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 19:49:29 +02:00
Michael Niedermayer
40a4ab8ba4
rename sub_hfyu_median_prediction_int16 to sub_hfyu_median_pred_int16
This makes the naming more consistent with the 8bit variant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 19:48:23 +02:00
James Almer
05de4d3011
x86/dsputilenc: implement XOP version of pix_sum16
SSE2: 137 cycles
XOP: 87 cycles
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 18:40:23 +02:00
Diego Biurrun
368f50359e
dsputil: Split off quarterpel bits into their own context
2014-05-29 06:48:31 -07:00
Diego Biurrun
054013a0fc
dsputil: Move APE-specific bits into apedsp
2014-05-29 06:41:15 -07:00
Diego Biurrun
65d5d58658
dsputil: Move SVQ1 encoding specific bits into svq1enc
2014-05-29 06:41:15 -07:00
Michael Niedermayer
b50559fc0b
libavcodec/x86/dsputilenc: drop the and 0xffff that should have become redundant
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 00:16:52 +02:00
James Almer
561bfc85eb
x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}
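pix_sum16 sums the 256 pixels of a 16x16 block, and pix_norm1 sums their squares. Below is a scalar sketch of both. It assumes those semantics and a `line_size` stride in bytes; the names are hypothetical.

```c
#include <stdint.h>

/* Sum of all pixels in a 16x16 block. */
static int pix_sum16_ref(const uint8_t *pix, int line_size)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += pix[x];
        pix += line_size;
    }
    return sum;
}

/* Sum of squared pixels in a 16x16 block. */
static int pix_norm1_ref(const uint8_t *pix, int line_size)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += pix[x] * pix[x];
        pix += line_size;
    }
    return sum;
}
```

Even for an all-255 block, pix_sum16 is 65280, so the result always fits in 16 bits.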
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 23:29:34 +02:00
Christophe Gisquet
0810608e23
x86: hevc_mc: better register allocation
The XMM register count was incorrect, and loading the GPRs manually
furthermore noticeably reduces the number of registers needed.
The modified functions are used in weighted prediction, so only a
few samples like WP_* exhibit a change. Timings for these on Win64
(some widths omitted because of too few occurrences):
WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w
width:      16     32
before:   2194   3872
after:    2119   3767
WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w
width:      16     32     64
before:   2819   4960   9396
after:    2617   4788   9150
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 17:39:34 +02:00
Michael Niedermayer
48a6916308
Merge commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c'
* commit '512f3ffe9b4bb86767c2b1176554407c75fe1a5c':
dsputil: Split off HuffYUV encoding bits into their own context
Conflicts:
configure
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/huffyuv.h
libavcodec/huffyuvenc.c
libavcodec/pngenc.c
libavcodec/x86/dsputilenc_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 00:03:59 +02:00
Michael Niedermayer
e2abc0d5ca
Merge commit '0d439fbede03854eac8a978cccf21a3425a3c82d'
* commit '0d439fbede03854eac8a978cccf21a3425a3c82d':
dsputil: Split off HuffYUV decoding bits into their own context
Conflicts:
configure
libavcodec/dsputil.c
libavcodec/dsputil.h
libavcodec/huffyuv.h
libavcodec/huffyuvdec.c
libavcodec/lagarith.c
libavcodec/vble.c
libavcodec/x86/Makefile
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputil_init.c
libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 23:16:06 +02:00
Diego Biurrun
512f3ffe9b
dsputil: Split off HuffYUV encoding bits into their own context
Also shorten HuffYUV context member names to avoid clutter.
2014-05-27 08:54:53 -07:00
Diego Biurrun
0d439fbede
dsputil: Split off HuffYUV decoding bits into their own context
Also shorten HuffYUV context member names to avoid clutter.
2014-05-27 08:52:34 -07:00
James Almer
5863207086
x86/dsputilenc: use HADDD in ff_sse16_sse2
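ff_sse16 accumulates squared differences over a 16-pixel-wide block. A SIMD version keeps partial sums in the 32-bit lanes of an XMM register (four per register) and needs a horizontal add at the end, which is the job of a HADDD-style macro. The sketch below models four lanes in plain C and checks that folding them gives the scalar result. All names are hypothetical.

```c
#include <stdint.h>

/* Scalar reference: sum of squared differences over h rows of 16 pixels. */
static int sse16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     int line_size, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d;
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    return sum;
}

/* Horizontal add of four 32-bit lanes: fold the high half onto the low
 * half, then add the remaining pair. */
static int32_t hadd_dwords(const int32_t lane[4])
{
    int32_t lo = lane[0] + lane[2];
    int32_t hi = lane[1] + lane[3];
    return lo + hi;
}

/* Same computation with four interleaved accumulators, reduced once at
 * the end. */
static int sse16_lanes(const uint8_t *pix1, const uint8_t *pix2,
                       int line_size, int h)
{
    int32_t lane[4] = { 0, 0, 0, 0 };
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = pix1[x] - pix2[x];
            lane[x % 4] += d * d;
        }
        pix1 += line_size;
        pix2 += line_size;
    }
    return hadd_dwords(lane);
}
```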
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 15:12:50 +02:00
James Almer
e64e079ece
x86/dsputilenc: implement SSE2 version of diff_pixels
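diff_pixels stores the difference of two 8x8 pixel blocks as signed 16-bit values, e.g. a residual to be transformed. Below is a scalar sketch. The name is hypothetical, and the output block is assumed to be row-major with 8 coefficients per row.

```c
#include <stdint.h>

/* block[y][x] = s1[y][x] - s2[y][x] for an 8x8 block; stride is in bytes. */
static void diff_pixels_ref(int16_t *block, const uint8_t *s1,
                            const uint8_t *s2, int stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = (int16_t)(s1[x] - s2[x]);
        s1 += stride;
        s2 += stride;
    }
}
```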
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 05:55:11 +02:00
Michael Niedermayer
a0c5cd3475
avcodec/x86/dsputilenc: set the count of SSE registers correctly for get_pixels
Found-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-27 05:52:25 +02:00
Christophe Gisquet
86ae0da60c
x86: hpeldsp: propagate changes across codecs
Some codecs still use the MMX versions, so make them use the versions
with newer instruction sets.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-26 15:37:04 +02:00
Michael Niedermayer
a3950a90f6
Revert "x86: dsputilenc: convert ff_sse{8, 16}_mmx() to yasm"
This reverts commit ad733089b0.
It breaks the build with --disable-yasm.
revert requested by: Christophe Gisquet <christophe.gisquet@gmail.com>
2014-05-25 19:42:18 +02:00