This optimization improved h264 decoding performance about 4%(from 74fps to 77fps, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Combined 1st and 2nd loop into one inline asm in function ff_vc1_inv_trans_8x8_mmi to
reduce memory operation, and made some small optimization in ff_vc1_inv_trans_4x8_mmi.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Failed case: svq3-watermark
When minimum loop count of following functions are greater than parameter h passed to them, svq3-watermark failed.
1. ff_put_pixels4_8_mmi
2. ff_avg_pixels4_8_mmi
3. ff_put_pixels4_l2_8_mmi
4. ff_avg_pixels4_l2_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Constraint "g" means compiler can store variable in memory or register.
When we use constraint "g" for a variable and this variable was operated by
instruction which only support register operands may lead "invalid operands" error.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Simplify the usage of intermediate variable addr and remove unused variable all64
in following functions:
1. ff_put_pixels_clamped_mmi
2. ff_put_signed_pixels_clamped_mmi
3. ff_add_pixels_clamped_mmi
This optimization speed up mpeg4 decode about 2% on loongson platform(tested with 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Simplify the usage of intermediate variable addr in following functions:
1. ff_put_pixels4_8_mmi
2. ff_put_pixels8_8_mmi
3. ff_put_pixels16_8_mmi
4. ff_avg_pixels16_8_mmi.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Failed case: mss2-wmv
In following functions, pmullh was used to multiply two 16-bit data, this will cause data overflow.
1. ff_vc1_inv_trans_8x8_dc_mmi
2. ff_vc1_inv_trans_8x8_mmi
3. ff_vc1_inv_trans_8x4_mmi
4. ff_vc1_inv_trans_4x8_mmi
5. ff_vc1_inv_trans_4x4_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Optimized memset with mmi in following functions:
1. ff_h264_add_pixels4_8_mmi.
2. ff_h264_idct_add_8_mmi.
3. ff_h264_idct8_add_8_mmi.
This optimization improved h264 decoding performance about 1.3%(tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Reoptimize function ff_put_h264_chroma_mc8_mmi and ff_avg_h264_chroma_mc8_mmi.
Performance of h264 decoding improved about 5%(from 69fps to 73fps, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Performance of mpeg4 decoding improved about 23%(from 128fps to 158fps, tested on loongson 3A3000).
Reoptimized following functions with mmi.
1. ff_simple_idct_put_8_mmi
2. ff_simple_idct_add_8_mmi
3. ff_simple_idct_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In commit 975a1a8,function ff_vc1_h_s_overlap_mmi was refactored,
but the declaration in libavcodec/mips/vc1dsp_mips.h was unchanged.
Change-Id: I90beae683511622a0cc1130ab1660ac8669ec3ef
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Reviewed-by: Jerome Borsboom <jerome.borsboom@carpalis.nl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The overlap filter is not correct for vertical edges in frame interlaced
I and P pictures. When filtering macroblocks with different FIELDTX values,
we have to match the lines at both sides of the vertical border. In addition,
we have to use the correct rounding values, depending on the line we are
filtering.
Signed-off-by: Jerome Borsboom <jerome.borsboom@carpalis.nl>
Use mask buffer.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use mask buffer.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Use immediate unsigned saturation for clip to max saving one vector register.
Remove unused macro.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Remove unused macro and table.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Align the mask buffer to 64 bytes.
Load the specific destination bytes instead of MSA load and pack.
Remove unused macros and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Use immediate unsigned saturation for clip to max saving one vector register.
Remove unused macro.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use global mask buffer for appropriate mask load.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Load the specific destination bytes instead of MSA load and pack.
Remove unused macros and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Remove loops and unroll as block sizes are known.
Load the specific destination bytes instead of MSA load and pack.
Remove unused macro and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Pack the data to half word before clipping.
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Replace generic with block size specific function.
Load the specific destination bytes instead of MSA load and pack.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Remove loops and unroll as block sizes are known.
Load the specific destination bytes instead of MSA load and pack.
Remove unused macro and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Replace generic with block size specific function.
Load the specific destination bytes instead of MSA load and pack.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Remove loops and unroll as block sizes are known.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Pack the data to half word before clipping.
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Load the specific destination bytes instead of MSA load and pack.
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Remove loops and unroll as block sizes are known.
Removed unused functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Replace generic with block size specific function.
Load the specific destination bytes instead of MSA load and pack.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Load the specific bytes instead of MSA load.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Load the specific destination bytes instead of MSA load and pack.
Pack the data to half word before clipping.
Use immediate unsigned saturation for clip to max saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Remove loops and unroll as block sizes are known.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Replace generic with block size specific function.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Replace generic with block size specific function.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Remove loops and unroll as block sizes are known.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>