How is it possible to improve so much performance? I don't think there is a lot of way to read a file and transfer it to a socket (perhaps I'm wrong).
On windows there are several ways of reading from handle; synchronous (ReadFile()), asynchronous , ReadFileEx(), overlapped ReadFile() using events, overlapped ReadFile() using io completion ports.. not to mention options that are available for sockets). New core uses the most advanced method available; single completion port, several simultanous reads/writes per socket/file, and several threads polling for the completion port. It makes it possible, for several cpus to process notifications that arrive in one completion port. I can't give latest functional sources for you to see, but what I can give is:
http://www.ioftpd.com/~darkone/tmp/newcore.txt ... that's the first implementation of async processing line that I did. (latest version is much better, and in most cases it can process the operations inside thread safe areas within one quantum)
What hardware do you use to get transfer so fast (MB, CPU, Hard disk, raid controller, RAM, network cards, ...)?
2x 2.66Ghz Xeon, 2048Mb (Dual channel), cached file, local transfer (no NIC involved, which causes worse results that you'd get with a transfer through NIC)
What windows version do you recommand to get best speed?
Any version of windows is ok, just make sure you have you set windows to use memory for system cache rather than using it for programs. Server editions of windows have longer socket queue, which makes it better suitable for server use. However, io rarely needs listen queue - as it can accept up to 10connections simultanously - 10AcceptEx() requests are pending on each service socket. But in case you're going to run it as httpd server, which gets lots of short lived connections - you might hit the wall using XP or 2000 Professional.
ld you explain how to code such function or do you have a website that explain it:
It's standard function to pointer declaration. In some cases it's slower: if function can be inlined, and there's only few operators in array. However ie. php source code could be optimized this way, I remember seeing switch() operators of like 200 members there.... you can just imagine how long it takes to get to the last operator.