oldhouse: Can you confirm from the logfile that the server stopped and started again?
There are situations in which a TCP port can stay held after the application that created it has exited. TIME_WAIT and similar socket states may prevent a quickly restarted application from listening on the same port immediately. I can't say that I've seen this, but it's possible. If this was the problem, the next time the server reconfigs itself (the &ConfigUpdate scheduler event, which runs every 10 minutes or so) should fix everything...
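If you want to check the port theory yourself, here is a rough sketch of a probe that tests whether a fresh socket can bind to the port right now. This is not ioFTPD code. The names can_bind and wait_until_bindable are made up for illustration, and the host and timeout values are assumptions. It deliberately does not set SO_REUSEADDR, whose behaviour differs between Windows and Unix.

```python
import socket
import time


def can_bind(port, host="127.0.0.1"):
    """Return True if a fresh TCP socket can bind to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        # Any bind failure (address in use, access denied, ...) counts as
        # "not available" for the purposes of this probe.
        return False
    finally:
        sock.close()


def wait_until_bindable(port, host="127.0.0.1", timeout=30.0, interval=1.0):
    """Poll can_bind() until it succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if can_bind(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Run it with the server stopped. If can_bind(21) (or whatever port you listen on) returns False while the task manager shows no server process, something else is still holding the port.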
Having said that, I have a w2k3 server running the debug version of the code that appears to lock up every few days. It's incredibly difficult to diagnose, since the application appears to somehow be grabbing the DLL Loader Lock in such a way as to deadlock. Since only the Windows API manipulates this lock internally, I fail to see why this should be happening. The reason I mention this is that the lock is VERY low level. I can execute a site crashnow command and the application can't exit, because the exit code can't acquire the lock! The only way to kill it is via the task manager. This is why I asked you to verify that the logfile entries show the previous server exited and the new one started, and that the task manager confirms this.
For the record, the first sign of the deadlock is that the ident routines start failing and the server starts rejecting logins with user@... hostmasks. Other users can log in and sometimes do things like download, but anything that results in a script triggering tends to lock up. Once the worker/io threads start locking up, the server eventually just freezes.
Of course I have numerous snapshots of the poor process, but since none of it involves ioFTPD code and I don't have source for the libraries, it's difficult to see how or why the situation is occurring.
I've finished the new debug/minidump code and it appears to be working well but I've yet to test it with Vista. So now to finish up the rest of my todo list...