* commit '15a29c39d9ef15b0783c04b3228e1c55f6701ee3':
truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Conflicts:
libavcodec/arm/Makefile
libavcodec/arm/mlpdsp_init_arm.c
See: 87b128d5ef
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:
Before After
Mean StdDev Mean StdDev Confidence Change
6:2 total 380.4 22.0 370.8 17.0 87.4% +2.6% (insignificant)
6:2 function 60.7 7.2 36.6 8.1 100.0% +65.8%
8:2 total 357.0 17.5 343.2 19.0 97.8% +4.0% (insignificant)
8:2 function 60.3 8.8 37.3 3.8 100.0% +61.8%
6:6 total 717.2 23.2 658.4 15.7 100.0% +8.9%
6:6 function 140.4 12.9 81.5 9.2 100.0% +72.4%
8:8 total 981.9 16.2 896.2 24.5 100.0% +9.6%
8:8 function 193.4 15.0 103.3 11.5 100.0% +87.2%
Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.
The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:
Before After
Mean StdDev Mean StdDev Confidence Change
6:2 total 380.4 22.0 370.8 17.0 87.4% +2.6% (insignificant)
6:2 function 60.7 7.2 36.6 8.1 100.0% +65.8%
8:2 total 357.0 17.5 343.2 19.0 97.8% +4.0% (insignificant)
8:2 function 60.3 8.8 37.3 3.8 100.0% +61.8%
6:6 total 717.2 23.2 658.4 15.7 100.0% +8.9%
6:6 function 140.4 12.9 81.5 9.2 100.0% +72.4%
8:8 total 981.9 16.2 896.2 24.5 100.0% +9.6%
8:8 function 193.4 15.0 103.3 11.5 100.0% +87.2%
Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.
The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.
Signed-off-by: Martin Storsjö <martin@martin.st>
The memory allocation for f->diffs was freed multiple times in some
corner cases. Simplify the code so that this doesn't happen.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '09d4389de10b03ea65a84eaf3d6c4b7a7538ad75':
hpeldsp_template: Drop av_unused attribute from *_no_rnd_pixels16_8_c functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '92ba965103d3884609730ba9bf293772dc78a9ef':
dsputil: Move draw_edges and clear_block* out of dsputil_template
Conflicts:
libavcodec/dsputil.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e7373585f827d4ec05d952daa3877e8decfe3c08':
dsputil_template: Move bits that are used templatized into separate file
Conflicts:
libavcodec/dsputil_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd3c3c1664a958923f234283e66fbcbfe69a6927f':
dsputil: Move hpel_template #include out of dsputil_template
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This only uses the first slice, improvement here is welcome
analyzing all slices the trivial way would interfere with threads
Fixes Ticket3185
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'aa499568afc01d59215eef7e5b14b949a9671afc':
avconv: More descriptive message about framedrop
Merged-by: Michael Niedermayer <michaelni@gmx.at>
AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Both of these dithering methods are from http://pippin.gimp.org/a_dither/ for
GIF they can be considered better than bayer (provides more gray-levels), and
spatial stability - often more than twice as good compression and less visual
flicker than error diffusion methods (the methods also avoids error-shadow
artifacts of diffusion dithers).
These methods are similar to blue/green noise type dither masks; but are
simple enough to generate their mask on the fly. They are still research work
in progress; though more expensive to generate masks (which can be used in a
LUT) like 'void and cluster' and similar methods will yield superior results