Whew, I had a serious error in my algorithm (something one couldn't even call a bug :~). It seems I was able to resolve it without a noticeable effect on performance... Now each handle uses four locks to guard the counters & buffer queues (and that's a lot... 16 bytes of memory).
I did some performance tests with a single thread, and so far the results are far beyond my expectations:
With a 10mb/sec transfer over the network, the results are as follows:
Wait: 9.48972774 seconds   Work: 0.43205802 seconds
Which leads to a theoretical maximum output of: 10mb/sec * 9.48972774 seconds / 0.43205802 seconds = ~220mb/sec on a single-CPU system... Assuming that after bandwidth limits, timeouts, etc. I get even half of the theoretical maximum, it should be breaking the 200mb/sec barrier that I set for my computer.
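
For anyone who wants to redo the arithmetic, here is a minimal sketch of the estimate in Python. The function name `theoretical_max_throughput` is made up for illustration, and the sketch assumes throughput scales linearly with the wait-to-work ratio, which is the same simplification as in the estimate above.

```python
def theoretical_max_throughput(rate, wait_seconds, work_seconds):
    """Scale the measured rate by the wait/work ratio.

    Assumes the CPU could do useful work during all of the time it
    currently spends waiting (an idealised, linear-scaling assumption).
    """
    if work_seconds <= 0:
        raise ValueError("work_seconds must be positive")
    return rate * wait_seconds / work_seconds


# Figures taken from the single-thread test above.
measured_rate = 10.0   # mb/sec over the network
wait = 9.48972774      # seconds spent waiting
work = 0.43205802      # seconds spent working

estimate = theoretical_max_throughput(measured_rate, wait, work)
print(f"theoretical maximum: ~{estimate:.0f} mb/sec")
print(f"half of that:        ~{estimate / 2:.0f} mb/sec")
```

Swapping in the wait/work figures from other runs gives a quick way to compare them.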