Old 08-09-2010, 01:23 PM  
Yil
Too much time...
 
Join Date: May 2005
Posts: 1,194

opcode: That's sort of interesting. The "classic" lockup bug, where the loader lock gets compromised, should be easily detected now, and had that been the problem the server would have reported it and tried to exit. Unless Restart_On_Deadlock was true it couldn't actually exit/terminate, though, because it's so badly locked up at that point. So if you didn't enable the Restart_On_Deadlock feature and the process still exited after deciding things were broken, it isn't the "classic" lockup bug - which the logfiles would seem to support.
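
Roughly, the idea behind that detection is a watchdog that notices the process has stopped making progress and then either just reports it or relaunches the process before killing itself. Just to illustrate the concept (this is NOT the actual ioFTPD code, and every name in it is made up), a sketch might look like:

[code]
/* Hypothetical sketch of a Restart_On_Deadlock-style watchdog.
 * Worker threads "heartbeat"; if no progress is seen for a while the
 * watchdog assumes a deadlock, optionally relaunches the process, and
 * then terminates it the hard way.
 */
#include <windows.h>
#include <stdio.h>

static volatile LONG g_heartbeat;        /* bumped by healthy worker threads   */
static BOOL g_restartOnDeadlock = TRUE;  /* stands in for Restart_On_Deadlock  */

/* A worker calls this from its main loop to prove it is still alive. */
void Heartbeat(void)
{
    InterlockedIncrement(&g_heartbeat);
}

static DWORD WINAPI WatchdogThread(LPVOID arg)
{
    LONG last = g_heartbeat;
    (void)arg;

    for (;;) {
        Sleep(30 * 1000);                       /* check every 30 seconds      */

        LONG now = g_heartbeat;
        if (now != last) {                      /* progress was made, all good */
            last = now;
            continue;
        }

        /* No progress: the process looks wedged.  A corrupted process
         * can't safely clean itself up, so report and bail out.         */
        fprintf(stderr, "Watchdog: no progress in 30s, assuming deadlock\n");

        if (g_restartOnDeadlock) {
            /* Relaunch ourselves before dying so service continues.     */
            STARTUPINFOA si = { sizeof(si) };
            PROCESS_INFORMATION pi;
            char exePath[MAX_PATH];

            GetModuleFileNameA(NULL, exePath, MAX_PATH);
            if (CreateProcessA(exePath, NULL, NULL, NULL, FALSE,
                               0, NULL, NULL, &si, &pi)) {
                CloseHandle(pi.hThread);
                CloseHandle(pi.hProcess);
            }
        }

        /* TerminateProcess works even when normal shutdown paths are
         * blocked, e.g. by a held loader lock.                          */
        TerminateProcess(GetCurrentProcess(), 1);
    }
    return 0;
}

int main(void)
{
    CreateThread(NULL, 0, WatchdogThread, NULL, 0, NULL);

    for (;;) {            /* stand-in for the real server main loop */
        Heartbeat();
        Sleep(1000);
    }
}
[/code]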

In that case, you have an example of the connection lockup bug, which looks similar but is far less deadly. I haven't seen too many of these because there wasn't a good way to tell them apart before. For some reason winsock isn't happy and things stop working correctly, but the process itself isn't corrupted, so it can still exit if it wants to.

Regarding "site who" (I think you meant "site swho"). You are correct that the server now connects to itself fairly often. Roughly every 30 seconds or so since that reduces the failure detection window to under 2 minutes since it must fail 3 times in a row. However, these connections should be immediately recognized and are thus never considered clients so they don't show up anywhere. It's possible under some serious lag/swap situation (more then 30 seconds without letting ioFTPD any CPU time) that one could slip through, but 3 in a row seems hard to do unless you had a runaway process on a single processor machine or something...

More than likely you just had a regular user trying to connect, and because the server was hosed they couldn't actually log in or get disconnected... Since they got stuck, no login or failure messages could be generated, so it's not surprising you didn't see anything in the logfile. However, the login timer is set to something like 15 seconds by default, so it's really surprising to me to see client login entries with 20+ minute idle times. That must mean the server couldn't close the connection, which isn't surprising, but I hadn't had confirmation of that before...

Can you confirm the 2 releases where you did and did not see this problem? 7.5.8 to 7.5.9 didn't change anything except the auto-ban logic... The 7.5.8 release relaxed the shutdown requirement from 1 "offline" service (a single failed local connection attempt) to all services being marked as "failed" (3-in-a-row failed local connection attempts). This should mean that 7.5.8 is far more forgiving than 7.5.0-7.5.7, where this feature was first introduced.

Prior to upgrading had you been having lockup issues and just restarting it manually? Any trouble logging in at all? If there are problems does file transfer still work fine or does it have issues as well? Any scripts/extensions in use?

Looking forward, we can try defining a 2nd local-only service and see if that makes a difference, since the server may take longer to decide that service is locked up, giving a greater window to look at what is wrong...
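
For reference, a "local only" service is really just a listener bound to 127.0.0.1, so only loopback connections can ever reach it. Something like the sketch below, although in practice it would of course be configured through the .ini rather than written by hand (the port is a placeholder):

[code]
/* Minimal loopback-only listener: the essence of a second "local only"
 * service used purely as a canary while debugging a lockup.
 */
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    SOCKET listener;
    struct sockaddr_in sa = {0};

    WSAStartup(MAKEWORD(2, 2), &wsa);

    listener = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    sa.sin_family      = AF_INET;
    sa.sin_port        = htons(12345);              /* placeholder port   */
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);    /* loopback only      */

    bind(listener, (struct sockaddr *)&sa, sizeof(sa));
    listen(listener, SOMAXCONN);

    for (;;) {
        /* Accept and immediately drop each probe; if this loop stops
         * answering too, that is itself useful evidence of a lockup.   */
        SOCKET c = accept(listener, NULL, NULL);
        if (c != INVALID_SOCKET)
            closesocket(c);
    }
}
[/code]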

The v7.6 async event update may fix this issue entirely. Or at least, that's the hope...

Update: You can try running ioFTPD as a service, since it is actually exiting, and that way you get auto-restart. That should have far less impact if you really are having it exit every couple of hours. How many users on average?
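
If you go the service route, the auto-restart part is just the standard Windows service recovery settings. You can set it from the service's Recovery tab, or programmatically along these lines (a sketch only; the service name below is whatever you register it as):

[code]
/* Sketch: enable automatic restart for a Windows service via the
 * Service Control Manager failure actions.
 */
#include <windows.h>

BOOL EnableAutoRestart(const char *serviceName)
{
    SC_HANDLE scm = OpenSCManagerA(NULL, NULL, SC_MANAGER_CONNECT);
    SC_HANDLE svc;
    SC_ACTION actions[3];
    SERVICE_FAILURE_ACTIONSA sfa = {0};
    BOOL ok = FALSE;

    if (!scm)
        return FALSE;

    svc = OpenServiceA(scm, serviceName,
                       SERVICE_CHANGE_CONFIG | SERVICE_START);
    if (svc) {
        /* Restart 60 seconds after each of the first three failures;
         * the failure count resets after a day.                        */
        actions[0].Type = SC_ACTION_RESTART;  actions[0].Delay = 60 * 1000;
        actions[1].Type = SC_ACTION_RESTART;  actions[1].Delay = 60 * 1000;
        actions[2].Type = SC_ACTION_RESTART;  actions[2].Delay = 60 * 1000;

        sfa.dwResetPeriod = 24 * 60 * 60;     /* seconds */
        sfa.cActions      = 3;
        sfa.lpsaActions   = actions;

        ok = ChangeServiceConfig2A(svc, SERVICE_CONFIG_FAILURE_ACTIONS, &sfa);
        CloseServiceHandle(svc);
    }
    CloseServiceHandle(scm);
    return ok;
}

int main(void)
{
    /* "ioFTPD" here is a placeholder for the installed service name. */
    return EnableAutoRestart("ioFTPD") ? 0 : 1;
}
[/code]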