1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-12 19:18:44 +02:00
Commit Graph

306 Commits

Author SHA1 Message Date
Martin Storsjö
db54426975 vc1dsp: Change remaining stride parameters to ptrdiff_t
The existing x86 assembly for loop filters uses the stride as a
full register without clearing/sign extending the upper half
of the registers on x86_64.

This avoids crashes if the caller would have passed nonzero bits
in the previously undefined upper 32 bits of the parameters.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-03-30 14:53:55 +03:00
Andreas Rheinhardt
636631d9db Remove unnecessary libavutil/(avutil|common|internal).h inclusions
Some of these were made possible by moving several common macros to
libavutil/macros.h.

While just at it, also improve the other headers a bit.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-24 12:56:49 +01:00
Andreas Rheinhardt
58492ce443 avcodec/mips: Fix checkheaders
mips has several headers that are only meant for inclusion in another
non-arch specific file; they do not even try to be standalone. So don't
test them in checkheaders.

Also fix vp9dsp_mips.h, an ordinary header missing some includes.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-02-21 13:10:10 +01:00
Andreas Rheinhardt
afc95a10ac avcodec/h264dsp, h264idct: Fix lengths of array parameters
Fixes many -Warray-parameter warnings from GCC 11.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-08 17:44:57 +02:00
Andreas Rheinhardt
7bad2a61d8 avcodec/mips/constants: Include intfloat.h in constants.h
Don't rely on the user including it (mostly indirectly).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-05 19:00:53 +02:00
Jiaxun Yang
2323d3a923 avcodec/mips: cabac.h provide fallback for wsbh instruction
wsbh is only avilable for MIPS R2+.
Provide a fallback for older processors.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-28 23:31:48 +02:00
Jiaxun Yang
1042039ccd avcodec/mips: Use MMI marcos to replace Loongson3 instructions
Loongson3's extention instructions (prefixed with gs) are widely used
in our MMI codebase. However, these instructions are not avilable on
Loongson-2E/F while MMI code should work on these processors.

Previously we introduced mmiutils marcos to provide backward compactbility
but newly commited code didn't follow that. In this patch I revised the
codebase and converted all these instructions into MMI marcos to get
Loongson2 supproted again.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-28 23:31:48 +02:00
Jin Bo
2fac1e370c libavcodec/mips: Fix fate errors reported by clang
The data width of gsldrc1/gsldlc1 should be 8 bytes wide.

Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-06-03 13:44:00 +02:00
Jin Bo
fd5fd48659 libavcodec/mips: Fix build errors reported by clang
Clang is more strict on the type of asm operands, float or double
type variable should use constraint 'f', integer variable should
use constraint 'r'.

Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-06-03 13:44:00 +02:00
Jin Bo
ebedd26eef libavcodec/mips: Fix specification of instruction name
1.'xor,or,and' to 'pxor,por,pand'. In the case of operating FPR,
  gcc supports both of them, clang only supports the second type.
2.'dsrl,srl' to 'ssrld,ssrlw'. In the case of operating FPR, gcc
  supports both of them, clang only supports the second type.

Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-05-28 17:31:21 +02:00
gxw
464d28c070 avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa
Using mask to avoid judgment, H264 4K decoding speed
improved about 0.1fps tested on 3A4000

Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-05-07 17:53:23 +02:00
gxw
6458c6bdb4 avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.
Speed of decoding H264 1080P: 5.05x ==> 5.13x

Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-05-07 17:53:23 +02:00
Shiyou Yin
5ab8e8bc92 avcodec/mips: Refine get_cabac_inline_mips.
1. Refined function get_cabac_inline_mips.
2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.

Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-05-07 17:53:23 +02:00
Shiyou Yin
56c57fe68a avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.
The MSA optimization has been refined in commit 93218c2 and ce0a52e.
It is better than MMI version now.
Speed of decoding H264: 4.83x ==> 4.89x (tested on 3A4000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-05-07 17:53:23 +02:00
Andreas Rheinhardt
f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Anton Khirnov
e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Anton Khirnov
c8c2dfbc37 lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Andreas Rheinhardt
ed33bbe678 avcodec/mpegaudiodec: Hardcode tables to save space
The csa_tables (which always consist of 32 entries of four byte each,
but the type depends upon whether the decoder is fixed or
floating-point) are currently initialized once during decoder
initialization; yet it turns out that this is actually no benefit: The
code used to initialize these tables takes up 153 (fixed point) and 122
(floating point) bytes when compiled with GCC 9.3 with -O3 on x64, so it
is better to just hardcode these tables.

Essentially the same applies to the is_tables: They have a size of 128B
each and the code to initialize them occupies 149 (fixed point) resp.
140 (floating point) bytes. So hardcode them, too.

To make the origin of the tables clear, references to the code used to
create them have been added.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2020-12-08 17:51:47 +01:00
Andreas Rheinhardt
b9c1ab8907 avcodec/fft_template, fft_init_table: Make ff_fft_init() thread-safe
Commit 1af615683e put initializing
the ff_fft_offsets_lut (which is typically used if FFT_FIXED_32)
behind an ff_thread_once() to make ff_fft_init() thread-safe; yet
there is a second place where said table may be initialized which
is not guarded by this AVOnce: ff_fft_init_mips(). MIPS uses this LUT
even for ordinary floating point FFTs, so that ff_fft_init() is not
thread-safe (on MIPS) for both 32bit fixed-point as well as
floating-point FFTs; e.g. ff_mdct_init() inherits this flaw and
therefore initializing e.g. the AAC decoders is not thread-safe (on
MIPS) despite them having FF_CODEC_CAP_INIT_CLEANUP set.

This commit fixes this by moving the AVOnce to fft_init_table.c and
using it to guard all initializations of ff_fft_offsets_lut.

(It is not that bad in practice, because every entry of
ff_fft_offsets_lut is never read during initialization and is only once
ever written to (namely to its final value); but even these are
conflicting actions which are (by definition) data races and lead to
undefined behaviour.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2020-11-24 11:35:03 +01:00
gxw
a4f7b09536 avcodec/mips: [loongson] Fixed mmi optimization
Test case fate-checkasm-h264pred failed in latest community code.
This patch fixed the bug.

Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-09-10 23:31:10 +02:00
Shiyou Yin
bd4f37f2eb avcodec/mips: Fix segfault in imdct36_mips_float.
'li.s' is a synthesized instruction, it does not work properly
when compiled with clang on mips, and A segfault occurred.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-30 00:23:45 +02:00
Shiyou Yin
1563b4b4c6 avcodec/mips/cabac: Fix a bug in get_cabac_inline_mips.
Failed fate case: fate-h264-conformance-caba2_sony_e
Clang is more strict in the use of register constraint.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-30 00:23:45 +02:00
Shiyou Yin
44699db6db avcodec/mips: Fix register constraint error reported by clang.
Clang report following error in aacsbr_mips.c,ac3dsp_mips.c and aacdec_mips.c:
"couldn't allocate output register for constraint 'r'"

Use 'f' constraint for float variable.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-30 00:23:45 +02:00
Jiaxun Yang
24911b9244 libavcodec: MIPS: MMI: Move sp out of the clobber list
GCC complains:
warning: listing the stack pointer register ‘$29’ in a clobber
list is deprecated [-Wdeprecated]

Actually stack pointer was restored at the end of the inline assembly
so there is no reason to add it to the clobber list.

Also use $sp insted of $29 to make our intention much more clear.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-23 17:21:58 +02:00
Jiaxun Yang
7a7ed1699c libavcodec: MIPS: MMI: Fix type mismatches
GCC complains about them.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-23 17:21:58 +02:00
Jiaxun Yang
e2fa12e3ae libavcodec: Enable runtime detection for MIPS MMI & MSA
Apply optimized functions according to cpuflags.
MSA is usually put after MMI as it's generally faster than MMI.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-23 17:21:58 +02:00
Jiaxun Yang
d57d6def73 ffbuild: Refine MIPS handling
To enable runtime detection for MIPS, we need to refine ffbuild
part to support buildding these feature together.

Firstly, we fixed configure, let it probe native ability of toolchain
to decide wether a feature can to be enabled, also clearly marked
the conflictions between loongson2 & loongson3 and Release 6 & rest.

Secondly, we compile MMI and MSA C sources with their own flags to ensure
their flags won't pollute the whole program and generate illegal code.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-23 16:30:02 +02:00
Shiyou Yin
12614a589f avcodec/mips: fix type mismatch in h264dsp_msa.c
gcc warning: assignment from incompatible pointer type.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-19 10:59:43 +02:00
Rosen Penev
4fa4ab97f9 avcodec/mips: fix get_cabac_inline_mips function name
On other platforms, the functions are named get_cabac_inline_xxx but not
this one. There's also a define.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-12 16:36:47 +02:00
Rosen Penev
875ba23333 avcodec/aacdec: fix compilation under soft float MIPS
Place HAVE_MIPSFPU further up so that functions that use floating point
ASM are defined away. Otherwise compilation failures result when soft
float in enabled on the toolchain.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
2020-04-11 14:00:32 +02:00
Linjie Fu
bffb9326b6 lavc/mips: simplify the switch code
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-12 19:25:33 +01:00
gxw
648b422e17 avcodec/mips: msa optimizations for vc1dsp
Performance of WMV3 decoding has speed up from 3.66x to 5.23x tested on 3A4000.

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-30 18:09:00 +01:00
gxw
21d19f49b7 avcodec/mips: Fixed four warnings in vc1dsp
Change the stride argument to ptrdiff_t in the following functions:
ff_put_no_rnd_vc1_chroma_mc8_mmi, ff_put_no_rnd_vc1_chroma_mc4_mmi,
ff_avg_no_rnd_vc1_chroma_mc8_mmi, ff_avg_no_rnd_vc1_chroma_mc4_mmi.

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-14 00:48:44 +02:00
gxw
92fc0bfa54 avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros.
Changing details as following:
1. The previous order of parameters are irregular and difficult to
   understand. Adjust the order of the parameters according to the
   rule: (RTYPE, input registers, input mask/input index/..., output registers).
   Most of the existing msa macros follow the rule.
2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead.

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-16 00:04:18 +02:00
Shiyou Yin
de5543d8d4 avcodec/mips: Fix a warnning of indentation not reflect the block structure.
The indentation of code dose not reflect the if block structure in
'apply_ltp_mips', and this will generate a warnning when build with
'-Wall' or '-Wmisleading-indentation'.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-10 17:21:53 +02:00
gxw
a3e572d96f avutil/mips: refine msa macros CLIP_*.
Changing details as following:
1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in
   source vector.
2. Refine the implementation of macro 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
   Performance of VP8 decoding has speed up about 1.1%(from 7.03x to 7.11x).
   Performance of H264 decoding has speed up about 0.5%(from 4.35x to 4.37x).
   Performance of Theora decoding has speed up about 0.7%(from 5.79x to 5.83x).
3. Remove redundant macro 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255'
   instead, because there are no difference in the effect of this two macros.

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-08-13 16:48:38 +02:00
Shiyou Yin
11f99a9a45 avutil/mips: Avoid instruction exception caused by gssqc1/gslqc1.
Ensure the address accesed by gssqc1/gslqc1 are 16-byte aligned.
2019-08-02 19:01:51 +02:00
Shiyou Yin
62e6b634a8 avcodec/mips: [loongson] refine process of setting block as 0 in h264dsp_mmi.
In function ff_h264_add_pixels4_8_mmi, there is no need to reset '%[ftmp0]'
to 0, because it's value has never changed since the start of the asm block.
This patch remove the redundant 'xor' and set src to zero once it was loaded.

In function ff_h264_idct_add_8_mmi, 'block' is seted to zero twice.
This patch removed the first setting zero operation and move the second one
after the load operation of block.

In function ff_h264_idct8_add_8_mmi, 'block' is seted to zero twice too.
This patch just removed the second setting zero operation.

This patch mainly simplifies the implementation of functions above,
the effect on the performance of whole h264 decoding process is not obvious.
According to the perf data, proportion of ff_h264_idct_add_8_mmi decreased from
0.29% to 0.26% and ff_h264_idct8_add_8_mmi decreased from 0.62% to 0.59% when decoding
H264 format on loongson 3A3000(For reference only , not very stable.).

Reviewed-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-07-28 20:50:30 +02:00
Shiyou Yin
153c607525 avutil/mips: refactor msa load and store macros.
Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}.
The old macros are difficult to use because they don't follow the same parameter passing rules.
Changing details as following:
1. remove LD4x4_SH.
2. replace ST2x4_UB with ST_H4.
3. replace ST4x2_UB with ST_W2.
4. replace ST4x4_UB with ST_W4.
5. replace ST4x8_UB with ST_W8.
6. replace ST6x4_UB with ST_W2 and ST_H2.
7. replace ST8x1_UB with ST_D1.
8. replace ST8x2_UB with ST_D2.
9. replace ST8x4_UB with ST_D4.
10. replace ST8x8_UB with ST_D8.
11. replace ST12x4_UB with ST_D4 and ST_W4.

Examples of new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
ST_H4 store four half-word elements in vector 'in' to pdst with stride.
About the macro name:
1) 'ST' means store operation.
2) 'H/W/D' means type of vector element is 'half-word/word/double-word'.
3) Number '1/2/4/8' means how many elements will be stored.
About the macro parameter:
1) 'in0, in1...' 128-bits vector.
2) 'idx0, idx1...' elements index.
3) 'pdst' destination pointer to store to
4) 'stride' stride of each store operation.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-07-19 01:23:23 +02:00
YunQiang Su
925e33b253 avcodec/mips/cabac: replace addi with addiu
addi/daddi are deprecated by MIPS for years, and MIPS r6 remove
them.

They should be replace with addiu:
   ADDIU performs the same arithmetic operation but
   does not trap on overflow.

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-07-10 12:54:57 +02:00
Shiyou Yin
6b67daa326 avcodec/mips: [loongson] fix mpeg4 decoding error on loongson platform.
In function ff_dct_unquantize_mpeg2_intra_mmi,
addr0 shoudn't be changed before storage operation.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-05-26 21:04:18 +02:00
gxw
4571c7c05d avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions
VP9 decoding speed improved about 60.5%(from 38fps to 61fps, tested on loongson 3A3000).

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-27 01:51:40 +01:00
gxw
1466dc140d avcodec/mips: [loongson] optimize theora decoding with mmi.
Optimize theora decoding with mmi in functions:
1. ff_vp3_idct_add_mmi
2. ff_vp3_idct_put_mmi
3. ff_vp3_idct_dc_add_mmi
4. ff_put_no_rnd_pixels_l2_mmi

Theora decoding speed improved about 32%(from 88fps to 116fps, Tested on loongson 3A3000).

Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-16 19:56:57 +01:00
Shiyou Yin
b429c86d84 avcodec/mips: [loongson] optimize put_hevc_qpel_h_8 with mmi.
Optimize put_hevc_qpel_h_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance 2%(2.39x to 2.44x, tested on loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-02 20:17:37 +01:00
Shiyou Yin
dceefb2b84 avcodec/mips: [loongson] optimize put_hevc_qpel_bi_h_8 with mmi.
Optimize put_hevc_qpel_bi_h_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance 2.1%(2.34x to 2.39x, tested on loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-02 20:17:37 +01:00
Shiyou Yin
c0942b7a2c avcodec/mips: [loongson] optimize put_hevc_epel_bi_hv_8 with mmi.
Optimize put_hevc_epel_bi_hv_8 with mmi in the case width=4/8/12/16/24/32.
This optimization improved HEVC decoding performance 1.7%(2.30x to 2.34x, tested on loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-02 20:17:37 +01:00
Shiyou Yin
0c43429210 avcodec/mips: [loongson] optimize put_hevc_qpel_uni_hv_8 with mmi.
Optimize put_hevc_qpel_uni_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance 2.7%(2.24x to 2.30x, tested on loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-02 20:17:37 +01:00
Shiyou Yin
83aa2cd757 avcodec/mips: [loongson] optimize put_hevc_qpel_bi_hv_8 with mmi.
Optimize put_hevc_qpel_bi_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance 11.4%(2.01x to 2.24x, tested on loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-22 00:46:36 +01:00
Shiyou Yin
6d19164811 avcodec/mips: [loongson] optimize put_hevc_qpel_hv_8 with mmi.
Optimize put_hevc_qpel_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance 11%(1.81x to 2.01x, tested on loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-22 00:46:36 +01:00
Shiyou Yin
32421602df avcodec/mips: [loongson] optimize put_hevc_pel_bi_pixels_8 with mmi.
Optimize put_hevc_pel_bi_pixels_8 with mmi in the case width=8/16/24/32/48/64.
This optimization improved HEVC decoding performance 2%(1.77x to 1.81x, tested on loongson 3A3000).

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-20 00:44:02 +01:00