Rework it to improve performance. Now mutex is not shared by workers,
instead each worker has its own mutex and condition variable. This
reduces lock contention between workers. Also use atomic variable for
counter.
The interface also allows execute to run special function on main
thread, requested by Ronald.
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>