1.0 internals


darkone
07-21-2005, 11:09 PM
This time I felt too lazy to draw an example of the command flow using MS Paint (I wonder if anyone actually understood any of my earlier drawings)... so instead I decided to write an ASCII version :)

There are two common approaches to creating an application that handles a large number of simultaneous tasks equally.

The first, the easy one, is to create a thread for each task and let the OS handle load balancing between the tasks.

The second, the hard one, is to split each large task into several smaller tasks and execute these smaller tasks one by one using a pool of threads (or in parallel when possible). The next sub-task is usually queued, First In First Out, at the end of the current one. Once a thread completes one task, it begins to process the next queued task - if any. Because new tasks are placed at the end of the queue, the application is able to process any number of large tasks simultaneously without using an excessive number of threads. A minimal sketch of this model follows.
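
Purely as an illustration (none of this is 1.0's actual code), here is roughly what the second approach looks like with the plain Win32 API: a fixed pool of threads pulling small tasks from a First In First Out queue. All names are made up.

/* Illustration only - a fixed pool of threads pulling small tasks from a
 * FIFO queue, using the plain Win32 API. None of these names come from 1.0. */
#include <windows.h>
#include <stdlib.h>

typedef struct TASK {
    struct TASK *next;
    void (*proc)(void *);
    void *arg;
} TASK;

static TASK *queueHead, *queueTail;   /* FIFO: take from head, append to tail */
static CRITICAL_SECTION queueLock;
static HANDLE queueSemaphore;         /* counts queued tasks */

/* Append a task to the end of the queue and wake one pool thread */
void QueueTask(void (*proc)(void *), void *arg)
{
    TASK *task = (TASK *)malloc(sizeof(TASK));
    task->proc = proc;
    task->arg  = arg;
    task->next = NULL;

    EnterCriticalSection(&queueLock);
    if (queueTail) queueTail->next = task; else queueHead = task;
    queueTail = task;
    LeaveCriticalSection(&queueLock);

    ReleaseSemaphore(queueSemaphore, 1, NULL);
}

/* Pool thread: process queued tasks one by one, oldest first */
static DWORD WINAPI PoolThread(LPVOID unused)
{
    for (;;) {
        TASK *task;
        WaitForSingleObject(queueSemaphore, INFINITE);

        EnterCriticalSection(&queueLock);
        task = queueHead;
        queueHead = task->next;
        if (!queueHead) queueTail = NULL;
        LeaveCriticalSection(&queueLock);

        task->proc(task->arg);        /* run one small task to completion */
        free(task);
    }
    return 0;
}

/* Call once at startup, before the first QueueTask() */
void StartPool(DWORD threadCount)
{
    DWORD i;
    InitializeCriticalSection(&queueLock);
    queueSemaphore = CreateSemaphore(NULL, 0, MAXLONG, NULL);
    for (i = 0; i < threadCount; i++)
        CloseHandle(CreateThread(NULL, 0, PoolThread, NULL, 0, NULL));
}

A "task" here would be one of the small steps in the table below, and queuing the next step to another pool is roughly what a "Yes" in the thread switch column stands for.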


Example of splitting a large task (from 1.0):
-------------------------------+-----------------------------+---------------+
Task                           | Executing thread type       | Thread switch |
-------------------------------+-----------------------------+---------------+
Start service                  | Worker thread               |               |
Create listen queue            | Transfer processing thread  | Yes           |
Accept new connection         | Transfer thread             |               |
Process new connection        | Transfer processing thread  | Yes           |
Create new listen queue item  | Transfer processing thread  |               |
Call common service callback  | Transfer processing thread  |               |
Hostname lookup                | Worker thread               | Yes           |
Call service specific callback | Worker thread               | Yes           |
...                            | ??                          |               |
-------------------------------+-----------------------------+---------------+

?) On a thread switch, the task is sent from one thread to another.
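
To make the "Yes" rows more concrete, here is a sketch of one such thread switch: the transfer processing thread must not block, so the hostname lookup is queued to the worker thread pool and the task continues there. It reuses the QueueTask() helper sketched earlier; CONNECTION and the function names are made up for illustration, not 1.0's real interfaces.

typedef struct {
    unsigned int socket;
    char         hostName[256];
} CONNECTION;

void QueueTask(void (*proc)(void *), void *arg);   /* FIFO helper from the earlier sketch */

/* Runs in a worker thread: a long blocking reverse DNS lookup is acceptable here */
static void ResolveHostName(void *arg)
{
    CONNECTION *connection = (CONNECTION *)arg;
    connection->hostName[0] = '\0';   /* placeholder for the real (blocking) lookup result */
    /* ... then the service specific callback is called, still in a worker thread ... */
}

/* Runs in a transfer processing thread: queue the blocking step and return at once */
void ProcessNewConnection(CONNECTION *connection)
{
    QueueTask(ResolveHostName, connection);
}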

Thread types in 1.0:
---------------------------+--------------------------------+
Thread type                | Thread count                   |
---------------------------+--------------------------------+
Worker thread              | Logical CPU count * 2 + N (!1) |
Transfer thread            | Physical CPU count * 2 (!2)    |
Transfer scheduler thread  | 1                              |
Transfer processing thread | Physical CPU count             |
Timer thread               | 1                              |
Window message thread      | 1                              |
---------------------------+--------------------------------+


!1) The 'worker thread' pool may grow temporarily if all threads are in use and one or more
threads are performing a long blocking operation (THREAD_FLAG_BLOCKING is set).

!2) Half of the 'transfer threads' may switch type on the fly to 'transfer processing thread'.
'Transfer threads' should not perform blocking or CPU-intensive operations; instead, they
should queue such operations to 'transfer processing threads'.
'Transfer processing threads' revert back to 'transfer threads' when there are no
more queued operations.
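
As a side note, here is a minimal sketch of how counts like these could be derived at startup. GetSystemInfo() only reports the logical processor count; using it as a stand-in for the physical count and picking N = 2 are assumptions made here (the real physical count would need GetLogicalProcessorInformation()).

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SYSTEM_INFO si;
    unsigned long logicalCpus, workerThreads, transferThreads;

    GetSystemInfo(&si);
    logicalCpus = si.dwNumberOfProcessors;     /* logical CPU count */

    workerThreads   = logicalCpus * 2 + 2;     /* "Logical CPU count * 2 + N", N = 2 assumed */
    transferThreads = logicalCpus * 2;         /* "Physical CPU count * 2", logical used as stand-in */

    printf("Worker threads:   %lu\n", workerThreads);
    printf("Transfer threads: %lu\n", transferThreads);
    return 0;
}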


Boring.. hope I can get some sleep after writing this :confused:

FTPServerTools
07-22-2005, 04:03 AM
I assume some of those are single threads and others can have multiple threads available; otherwise the thread switching slows stuff down. And yes, you only have 1088 TLS slots.

darkone
07-22-2005, 06:22 AM
I assume some of those are single threads and others can have multiple threads available; otherwise the thread switching slows stuff down. And yes, you only have 1088 TLS slots.

Thread counts of different types of thread pools were listed, but generally there is more than one thread of each type. Thread switching does indeed cost something (a lot of CPU time), which is why thread pools are used in the first place.


Let's compare:

With the thread-per-client model, X clients performing the example task simultaneously would use resources for the following (a bare sketch of this model follows the two lists):
- X CreateThread()s
- More threads means a higher chance of cache misses.
- Thread-specific memory pooling does not make sense, so memory allocations are costly
- OS thread switches between X active threads
- X ExitThread()s

With a thread pool of size Y and a client count of X:
- No CreateThread()s
- Thread-specific memory pools, so memory allocation may come for free.
- 4 * X manual thread switches between tasks
- OS thread switches between Y active threads
- No ExitThread()s
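
For contrast, here are the bare bones of the thread-per-client model from the first list: one CreateThread() per accepted client and an implicit ExitThread() when the thread function returns. HandleClient() and the context argument are made up.

#include <windows.h>

/* One thread per client: handles the whole session, blocking freely */
static DWORD WINAPI ClientThread(LPVOID clientContext)
{
    /* ... HandleClient(clientContext); the entire task runs here ... */
    return 0;                        /* returning performs the implicit ExitThread() */
}

/* Called for every accepted connection: one CreateThread() per client */
void OnClientAccepted(void *clientContext)
{
    HANDLE thread = CreateThread(NULL, 0, ClientThread, clientContext, 0, NULL);
    if (thread)
        CloseHandle(thread);         /* nobody waits for the client thread */
}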

Each time a new thread is created, existing threads perform less work during one quantum. And if a thread can't finish its task during one quantum, an OS thread switch occurs.

If a task takes exactly one quantum to execute in a single thread, X threads executing the same task simultaneously on a single processor would each require X quanta - X * X thread switches in total.

- X * X thread switches to execute X tasks, one in each thread

When executed in a thread pool of size Y, X tasks divided into 4 parts would require Y * Y + (4 * X) thread switches. If the thread pool size is greater than the number of simultaneous tasks, only X threads are actually active, so it takes X * X + (4 * X) thread switches to execute the tasks.

- (2 * 2) + (4 * X) thread switches to execute X tasks divided into 4 parts, in a thread pool of 2 threads.

Comparison:
X^2 > 4 + 4X
X^2 - 4X > 4

Conclusion:
A thread pool of 2 threads uses fewer resources once there are 5 or more simultaneous tasks executing: X^2 - 4X > 4 first holds at X = 5, since X^2 - 4X - 4 = 0 at X = 2 + 2*sqrt(2) ≈ 4.83. Note that the benefits of memory and other thread-specific pools, the greater chance of cache hits, and the resources saved by not creating/deleting threads are not even taken into account.