Michael Niedermayer
37b2b0d6cd
Get rid of a check that cannot be true for one direction in that part
...
of the code.
No measurable speed change.
Originally committed as revision 21566 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-31 02:05:26 +00:00
Michael Niedermayer
2646814897
Split first reference list comparison from mv comparison.
...
about 0.5% faster MBAFF loop filtering
Originally committed as revision 21552 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-30 20:07:37 +00:00
Michael Niedermayer
4e992796a9
Replace h->left_type[0] by the local variable for it we have.
...
No measurable speed effect.
Originally committed as revision 21541 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-30 14:33:25 +00:00
Michael Niedermayer
012dbcce08
slightly faster bit trickery.
...
Originally committed as revision 21540 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-30 14:10:06 +00:00
Michael Niedermayer
77821e11b3
Replace ?: by branchless code.
...
about 0.5% faster loop filtering
Originally committed as revision 21539 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-30 13:40:20 +00:00
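The "?: to branchless" idea in the entry above can be illustrated with a minimal sketch. This is not the code from the commit; `select_branchy` and `select_branchless` are hypothetical names, and the sketch assumes a two's-complement target where converting an unsigned value back to int preserves the bit pattern (true for gcc).

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: the compiler may emit a conditional jump for ?:. */
static inline int select_branchy(int cond, int a, int b)
{
    return cond ? a : b;
}

/* Branchless form (hypothetical helper): turn cond into an all-ones or
 * all-zeros mask, then blend a and b with bitwise operations only. */
static inline int select_branchless(int cond, int a, int b)
{
    unsigned mask = -(unsigned)(cond != 0);  /* ~0u if cond is nonzero, else 0 */
    return (int)(((unsigned)a & mask) | ((unsigned)b & ~mask));
}
```

Whether this is actually faster depends on the compiler and CPU; modern compilers often turn a simple ?: into a conditional move on their own, so it has to be measured, as the commit did.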
Michael Niedermayer
34032e26ab
Factor the first filter call out; this makes the code somewhat
...
smaller without any speed loss.
Originally committed as revision 21514 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 19:44:13 +00:00
Michael Niedermayer
592e03a8da
Change wrapper functions to always inline; they are faster that way now.
...
1% faster MBAFF decoding overall, maybe ~0.1% faster for the cathedral sample.
Originally committed as revision 21507 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 11:37:35 +00:00
Michael Niedermayer
5364db2893
indent
...
Originally committed as revision 21506 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 11:18:06 +00:00
Michael Niedermayer
2cf0d46d4c
Restructure check_mv()
...
~20 cpu cycles faster loopfilter
Originally committed as revision 21505 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 11:12:46 +00:00
Michael Niedermayer
fabd704b37
Restructure if() in check_mv()
...
quite a bit faster
Originally committed as revision 21504 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 10:38:43 +00:00
Michael Niedermayer
ca7c784fdf
Unroll loops in check_mv()
...
~6% faster (slow path) loopfilter (should be ~2% overall)
Originally committed as revision 21503 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 10:34:06 +00:00
Michael Niedermayer
e814817b74
Factor mv/ref compare code out.
...
This is a hair slower (0.15% maybe) but I really don't want to have the
identical code duplicated 3 times because gcc adds odd threaded jumps with
register reshuffling and register save/restore.
Originally committed as revision 21502 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 10:10:02 +00:00
Michael Niedermayer
3b84924516
Simplify first edge filter condition.
...
Originally committed as revision 21497 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 02:41:52 +00:00
Michael Niedermayer
b6302d0c55
Cosmetics, mostly indentation, 2 or so new fixme comments that I was too lazy
...
to split out
Originally committed as revision 21496 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 02:20:31 +00:00
Michael Niedermayer
0a32508d90
Make the fast loop filter path work with unavailable left MBs.
...
This prevents the issue with having to switch between slow and
fast code paths in each row.
0.5% faster loopfilter for cathedral
Originally committed as revision 21495 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 02:15:25 +00:00
Michael Niedermayer
b304767301
get rid of the start variable.
...
a few cycles faster
Originally committed as revision 21494 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 01:31:06 +00:00
Michael Niedermayer
980bcc554d
Unroll main loop so the edge==0 case is separate.
...
This allows many things to be simplified away.
h264 decoder is overall 1% faster with a mbaff sample and
0.1% slower with the cathedral sample, probably because the slow loop
filter code must be loaded into the code cache for each first MB of each
row but isn't used for the following MBs.
Originally committed as revision 21493 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-28 01:24:25 +00:00
Michael Niedermayer
8670f84cf9
Update comment.
...
Originally committed as revision 21479 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-27 13:18:08 +00:00
Michael Niedermayer
e470ef7641
Use table to speed up access to non_zero_count in MBAFF with differing interlacing.
...
~4 cpu cycles speedup
Originally committed as revision 21474 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-27 11:14:29 +00:00
Michael Niedermayer
16e5e39ab4
Optimize loop filtering of the left edge in MBAFF.
...
60 cpu cycles speedup
Originally committed as revision 21467 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-26 22:59:19 +00:00
Michael Niedermayer
6548c939ec
remove unneeded check
...
Originally committed as revision 21460 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-26 15:34:21 +00:00
Michael Niedermayer
18ea2f933c
Use left_mb_xy from fill_caches instead of recalculating it.
...
Originally committed as revision 21459 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-26 14:57:53 +00:00
Michael Niedermayer
d5c30c86d0
Simplify loop filter a little by using top/left_type.
...
Originally committed as revision 21457 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-26 13:39:26 +00:00
Michael Niedermayer
50eb40a799
Remove all uses of slice_type* from the loop filter, also remove its
...
initialization before the loop filter.
Originally committed as revision 21416 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-24 13:20:17 +00:00
Michael Niedermayer
0c32e19d58
Move +52 from the loop filter to the alpha/beta offsets in the context.
...
This should fix a segfault, also it might be faster on systems where the
+52 wasn't free.
Originally committed as revision 21406 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-23 18:05:30 +00:00
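A rough sketch of what moving a constant bias into the stored offsets can look like follows. It is an illustration only, assuming the +52 is a table-index bias that keeps `qp + offset` non-negative; `DemoContext`, `demo_set_offsets` and `demo_alpha_index` are hypothetical names, not FFmpeg's.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BIAS 52  /* assumed bias so that qp + offset never goes negative */

/* Hypothetical context: the offsets are stored with the bias already added. */
typedef struct DemoContext {
    int alpha_offset;  /* raw alpha offset + DEMO_BIAS */
    int beta_offset;   /* raw beta offset  + DEMO_BIAS */
} DemoContext;

/* Add the bias once, when the offsets are set up. */
static void demo_set_offsets(DemoContext *c, int raw_alpha, int raw_beta)
{
    c->alpha_offset = raw_alpha + DEMO_BIAS;
    c->beta_offset  = raw_beta  + DEMO_BIAS;
}

/* The hot path now needs a single addition and no separate "+ 52". */
static unsigned demo_alpha_index(const DemoContext *c, int qp)
{
    return (unsigned)(qp + c->alpha_offset);
}
```

With the bias applied in one place, every user of the stored offset sees the same non-negative index, which removes the chance of one call site forgetting the addition.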
Michael Niedermayer
1cc2d21175
Set edges based on cbp and mv partitioning, not just skipped MBs.
...
This is faster for videos that have lots of MBs that fall in this category.
Originally committed as revision 21400 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-23 15:28:34 +00:00
Michael Niedermayer
6b3661b22d
Optimize filter_mb_mbaff_edge*()
...
Originally committed as revision 21397 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-23 14:50:56 +00:00
Michael Niedermayer
933bea77e5
Optimize 8x8dct check used to skip some borders in the loop filter.
...
4 cpu cycles faster.
Originally committed as revision 21396 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-23 13:54:02 +00:00
Måns Rullgård
c67278098d
Move array specifiers outside DECLARE_ALIGNED() invocations
...
Originally committed as revision 21377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-22 03:25:11 +00:00
Michael Niedermayer
258b60c224
Gcc idiocy fixes related to filter_mb_edge*.
...
Change the order of operands, as gcc seems to use a hardcoded register per operand
even for static functions,
thus reducing unneeded moves (functions now try to pass the same argument in
the same spot).
Change signed int to unsigned int for array indexes, as signed requires sign
extension while unsigned is free.
Move the +52 up and merge it where it will end up as a lea instruction; otherwise gcc
always splits the 52 out there, turning the free +52 into an expensive one.
The changed code becomes a little faster.
Originally committed as revision 21375 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-22 01:59:17 +00:00
Michael Niedermayer
31f6e3c19e
Make the calculation of mask_edge branch-free; faster of course, but probably
...
little effect overall as this is not executed that often.
Originally committed as revision 21366 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-21 16:50:31 +00:00
Alexander Strange
bec358d683
H.264: Declare bS with DECLARE_ALIGNED_8 for uint64_t casts.
...
Originally committed as revision 21345 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 03:28:57 +00:00
Michael Niedermayer
97775235ec
Simplify/Optimize another of the mbaff loop filter cases.
...
It's faster but too rarely used to make a difference.
Originally committed as revision 21344 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 03:00:08 +00:00
Michael Niedermayer
085d9d98e8
Only calculate the second chroma qp if it differs from the first in the main
...
loop filter. (a little faster for the common case where they are equal)
Originally committed as revision 21342 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 01:49:24 +00:00
Michael Niedermayer
948180e7b1
Set bS with 64bits at a time.
...
Originally committed as revision 21341 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 01:38:32 +00:00
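Setting several boundary-strength values with one 64-bit store can be sketched as follows. This is a hedged illustration, not the committed code: it assumes bS is an array of four 16-bit values, and `set_bs4` is a hypothetical helper. It uses memcpy rather than a pointer cast so the sketch stays valid under strict aliasing; the log entry about DECLARE_ALIGNED_8 suggests the real code casts to uint64_t and relies on 8-byte alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill four 16-bit bS values with one 64-bit write (hypothetical helper).
 * Multiplying by 0x0001000100010001 replicates v into every 16-bit lane,
 * so the result is the same on little- and big-endian machines. */
static inline void set_bs4(int16_t bS[4], uint16_t v)
{
    uint64_t packed = (uint64_t)v * 0x0001000100010001ULL;
    memcpy(bS, &packed, sizeof(packed));  /* compilers typically emit one store */
}
```

Usage: `int16_t bS[4]; set_bs4(bS, 3);` leaves all four entries equal to 3, replacing four separate 16-bit assignments.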
Michael Niedermayer
87df989ee3
Merge multiple IS_* macro uses where possible.
...
Originally committed as revision 21340 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 01:15:30 +00:00
Michael Niedermayer
55c54371c4
Simplify and optimize intra code in h264_loopfilter.c
...
Originally committed as revision 21339 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 00:44:03 +00:00
Michael Niedermayer
9528ce7b99
Slightly simplify initialization of int start.
...
No real speed change.
Originally committed as revision 21336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 00:17:16 +00:00
Michael Niedermayer
655a1d57d5
Reenable ff_h264_filter_mb_fast() for all slices it supported before.
...
Originally committed as revision 21328 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-19 16:43:57 +00:00
Michael Niedermayer
2b3649f656
Fix compilation with -O0.
...
Originally committed as revision 21308 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 23:41:12 +00:00
Michael Niedermayer
bffe82f504
Rather call filter_mb_mbaff_edge*v() more often than do extra calculations
...
in the innermost loop. ~150 cpu cycles faster
Originally committed as revision 21299 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 21:22:09 +00:00
Michael Niedermayer
0fe674cb4a
Use h->slice_num where possible.
...
Originally committed as revision 21292 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 20:13:53 +00:00
Michael Niedermayer
bce6a1e7c7
Enable filter_mb_fast for CAVLC P slices.
...
Originally committed as revision 21291 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 19:45:56 +00:00
Michael Niedermayer
42ebca8551
PAFF CABAC P slices seem to work as well, so enable them for ff_h264_filter_mb_fast() too.
...
Originally committed as revision 21289 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 16:29:16 +00:00
Michael Niedermayer
a8f4921595
Reenable filter_mb_fast for I slices and progressive CABAC P slices.
...
Originally committed as revision 21288 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 16:16:22 +00:00
Michael Niedermayer
b6ef858ec7
Move CAVLC 8x8 DCT special case from ff_h264_filter_mb() to fill_caches
...
that way it is also available for ff_h264_filter_mb_fast().
Originally committed as revision 21283 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 13:09:53 +00:00
Michael Niedermayer
6d7e6b2657
Perform reference remapping at fill_cache() time instead of in the
...
loop filter. This removes one obstacle of getting ff_h264_filter_mb_fast()
bitexact. Code is maybe 0.1% faster
Originally committed as revision 21280 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 05:15:31 +00:00
Michael Niedermayer
44a5e7b64c
Move the qp check to skip the loop filter up.
...
Originally committed as revision 21274 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 00:20:44 +00:00
Michael Niedermayer
b6303e6d2a
Reorganize how values are stored in h->non_zero_count.
...
~1% faster
Originally committed as revision 21273 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-17 23:44:23 +00:00
Michael Niedermayer
c988f97566
Rearchitecturing the stitched up goose part 1
...
Run loop filter per row instead of per MB, this also should make it
much easier to switch to per frame filtering and also doing so in a
separate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on pentium dual / cathedral sample)
This change also allows some optimizations to be tried that would not have
been possible before.
Originally committed as revision 21270 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-17 20:35:55 +00:00