ioFTPD General New releases, comments, questions regarding the latest version of ioFTPD. |
04-22-2010, 03:34 PM
|
#31
|
Senior Member
Join Date: May 2007
Posts: 692
|
Yil: 10min is too long, should be MAX 2min.
(22:34:46) [2] 550-Init: 1, Aborted: 0, Socket: 1456, LastError: 0, Size: 0, Pos: 2267753480192
Either it doesn't time out transfers at all or it takes 10min, and since there is no way for the client to know whether anything is actually transferring, the client can't know that it should abort the transfer.
__________________
ioNiNJA
|
|
|
04-22-2010, 03:34 PM
|
#32
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
This was win2k3? There isn't a really easy way on that OS. On Vista+ you can create a dump directly from Task Manager. You should, however, be able to install WinDbg and/or Visual Studio Express (the 2010 edition might be out). Both are free, and with either one you can attach the debugger to the process and create a dump at any time with the process in any state.
Windbg used to be downloadable via the "Debugging Tools for Windows" package, but now you need to get the Windows Driver Kit from MS (Link). It's like 700MB but you can just install the Debugging Tools for Windows component without needing any of the other stuff.
1. Launch WinDbg.
2. Click File->Attach to a Process.
3. Select ioFTPD from the list of running processes.
4. In the command window, run: .dump /ma c:\ioFTPD.dmp
5. Zip or rar the dump and upload it to me.
|
|
|
04-22-2010, 03:51 PM
|
#33
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
o_dog: Yup, I'd agree. I didn't pick the timeouts, I just remembered they were long... A 2 minute timeout means a sender has to average more than 1/2 KB/sec to fill up a 64K socket receive buffer in time, or it leaves enough room for a good internet spike and a couple of re-transmissions at the max delay (30 sec or so) while trying to get data through a spike in bandwidth.
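For anyone who wants to check the 2 minute figure, here is a quick back-of-the-envelope sketch in Python. The 64K buffer and the timeouts come from the posts above; the function name is just made up for illustration and none of this is ioFTPD code.

```python
# Back-of-the-envelope check of the timeout figures discussed above.
# Buffer size and timeouts come from the thread; the helper name is hypothetical.
RECV_BUFFER_BYTES = 64 * 1024   # 64K socket receive buffer
TIMEOUT_SECONDS = 2 * 60        # proposed 2 minute timeout


def min_rate_bytes_per_sec(buffer_bytes: int, timeout_seconds: int) -> float:
    """Slowest average rate that still fills the buffer before the timeout."""
    return buffer_bytes / timeout_seconds


rate = min_rate_bytes_per_sec(RECV_BUFFER_BYTES, TIMEOUT_SECONDS)
print(f"2 min:  {rate:.0f} bytes/sec ({rate / 1024:.2f} KB/sec)")

# Same calculation for the existing 10 minute timeout, for comparison.
slow = min_rate_bytes_per_sec(RECV_BUFFER_BYTES, 10 * 60)
print(f"10 min: {slow:.0f} bytes/sec ({slow / 1024:.2f} KB/sec)")
```

So a 2 minute limit only cuts off connections that average roughly half a KB/sec or less over the whole window, which is slow enough that dropping them seems reasonable.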
|
|
|
04-22-2010, 04:16 PM
|
#34
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
It just hit me... You're using nxMyDB right pion? I'm beginning to wonder if the MySQL library might be a factor. It must create sockets to talk to the remote database, and if they aren't being marked as non-inheritable it could trigger some sort of problem. Remember I fixed the TCL library and all of ioFTPD to never spawn child processes while inheritable sockets are being created. It's a SERIOUS mis-feature that Microsoft decided that all sockets by default should be inherited. I would have guessed just one connection was created by mysql's library and maintained as long as possible, but it's definitely something to look into.
I think I'll try and write a simple program that will complain if it finds it inherited handles to files/sockets/etc and we can try running that via a new custom command line in the .ini file... I can tell you that I was able to prevent ioFTPD from accepting new connections before I fixed it and TCL and this is starting to look similar. It's just not that common for applications to spawn thousands of child processes like we do...
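To make the inheritance point concrete, here is a minimal sketch of marking a socket non-inheritable right after creating it. It is in Python rather than the C that ioFTPD and libmysql use, and the helper name is hypothetical. In C on Windows the equivalent step would be calling SetHandleInformation on the socket handle to clear HANDLE_FLAG_INHERIT. Note that Python 3.4+ already creates sockets non-inheritable by default, so the explicit call here is only to show the step.

```python
import socket


def make_private_socket() -> socket.socket:
    """Create a TCP socket that child processes will not inherit.

    Hypothetical helper for illustration only; not part of ioFTPD or libmysql.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Explicitly clear the inheritable flag. There is still a small window
    # between creation and this call, which is why a lock around
    # process/socket creation is needed in a multi-threaded server.
    sock.set_inheritable(False)
    return sock


sock = make_private_socket()
print("inheritable:", sock.get_inheritable())
sock.close()
```

The comment about the window between creation and clearing the flag is the important part: another thread spawning a child in that gap would still leak the handle, so the flag alone is not enough without some form of serialization.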
|
|
|
04-23-2010, 12:58 PM
|
#35
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
I looked into the source for MySQL and it just uses the very simple socket() call and I can find no reference to making that socket/handle un-inheritable anywhere.
I can understand why ioFTPD sockets that are set up to use I/O completion ports might have given us problems when leaked to child processes, but I can't really see why the same would be true for a simple socket. Still, my guess is that this is the problem. It's the same behavior I saw before.
I'll look into building mysql from sources, and then modify the client library piece so we can get it to use the ioFTPD lock for process/socket creation and see if that makes a difference...
|
|
|
04-23-2010, 09:34 PM
|
#36
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
* EXPERIMENTAL v7.4.5 - not for general use yet *
Pion: Here's a new version of ioFTPD, a customized nxMyDB, and a modified libmysql. The nxMyDB is based upon the v2.1 release neoxed was preparing in response to the changes in ioFTPD v7.2's exported functions. It has a couple of new features and requires that you slightly change your configuration file since it supports multiple database servers. It's a simple change though. Details are in the readme/changes file. The database schema/data are all the same though, so nothing to do there.
Make sure you update the system/libmysql.dll file when you update the modules/nxmydb.dll file after upgrading to v7.4.5.
Link: ioFTPD-v7.4.5-exe-only.zip
Link: nxMyDB-v2.1.1-custom.zip
|
|
|
04-23-2010, 11:07 PM
|
#37
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
Hmm, I just noticed that I may have forgotten to save the last change to the libmysql.dll file and the one I put in the nxMyDB might not actually do what it says. It will work fine, but may not mark the socket as non-inheritable... DOH!
Use this one instead:
Link: libMySQL-v5.1.46-ioFTPD.zip
This, btw, just adds a user-definable connection option or two. Thus it is fully compatible with any application. nxMyDB was modified to set these options.
|
|
|
04-24-2010, 09:39 AM
|
#38
|
Senior Member
Join Date: Feb 2006
Posts: 138
|
I am using nxmydb 2.1.0 yeah. I will fire up windbg on 7.4.5, and your customized nxmydb 2.1.1 and produce some dumps for you. Hopefully they'll be helpful
|
|
|
04-24-2010, 04:18 PM
|
#39
|
Senior Member
Join Date: Feb 2006
Posts: 138
|
It locked up with io745 as well, but I produced a dump this time using your description of windbg.
|
|
|
04-25-2010, 02:16 AM
|
#40
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
pion: Having a dump file to debug sure makes things a bit easier.
First off, the "main" ioFTPD thread doesn't really do anything in the server. It just sits around waiting on an event to tell it to exit because the server shut down. The latest versions now occasionally check to make sure another thread is updating a variable to catch the loader lock being compromised. So basically it does nothing...
However, in the dump I'm looking at the main thread's stack trace appears to be running code from the libmysql.dll library and it is stuck waiting for data from the database. The stack trace isn't complete, or is corrupted, so I probably need to rebuild everything with optimizing turned off instead of just generating debug symbols with optimized code. I've found that turning off the optimizer and turning on stack corruption checks often helps recover stack frames so I can see how the function is getting called...
Anyway, without a full stack trace I can't say how the main thread is doing this since it's in a loop using WaitForSingleObject and not WaitForSingleObjectEx so it isn't in an alertable state so it shouldn't be running any type of async procedure call or anything... So I guess something is trashing the stack of the main thread?
pion: This is running as a normal process and not as a real service right? Firedaemon will run it as a normal process I believe. I didn't see the service specific thread running like I would as a real service.
On the other hand, if we take the 7 procedures on the call stack that look valid (all MS except the last) at face value it looks like the yaSSL encryption package in MySQL's library is waiting for input on a socket and I guess it's stuck. It's supposed to be a non-blocking socket, but well, I guess it isn't or something... Or perhaps it's running in a loop and this is just where it was at the time of the snapshot.
* Can people using nxMyDB post what type of configuration they are using here? In particular, are you using encrypted communications and is it via OpenSSL or via yaSSL? And what version of nxMyDB are you using?
pion: you mentioned you were running v2.1 already? I don't think that was released, and perhaps it's got a bug if we hear back that other people using 2.0 don't have problems. Seems unlikely, but you never know...
I noticed one other odd thing. For some reason the secur32.dll library (that MS encryption library I thought I just ditched) was loaded. Not sure who the culprit loading it is, but I don't remember seeing it locally after the switch. Perhaps I missed something. It could be that OpenSSL and/or libmysql's yaSSL load it. They might not need it to encrypt data but to use its really good random number generator or something. Not a big deal, but I'm curious now...
pion: I'll see what I can do about getting another test version put together for you at some point so we can maybe learn more. Now that you have windbg installed we can also try Application Verifier as it uses windbg to generate stack traces when it notices something is wrong. Really useful and might tell us something. It's a simple download from MS, just configure it to watch ioFTPD.exe and turn on all the options...
Here are some "workaround" ideas:
Try switching to OpenSSL encryption in the nxMyDB configuration. If we believe the dump (send more please, let's see if it always shows up this way!) then switching to OpenSSL from yaSSL might get around an issue in the libmysql.dll library.
Also, try running without encryption (and over shared memory if possible to avoid the socket code altogether) on the machine hosting the database to see what that does...
Of course, it's also likely that not using nxMyDB might make it stable. The problem might not even be in nxMyDB but in libmysql, and getting rid of both might help. Of course it sucks losing its features, but it might be worth a try to take one site off it, if that's possible, and see what happens if none of the above ideas work... At least it would help me a lot if I knew it wasn't involved in the lockups you are seeing.
* I'd also really like to hear from other people running v7.4.3 and how it's working for them. Especially people who had problems with the lockup bug.
|
|
|
04-25-2010, 02:26 AM
|
#41
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
pion: Just for the record, the debugger claims that no locks are stuck. Thus whatever is going on is not the classic lockup bug such as other people have encountered. Not sure why you would have trouble issuing "site crashnow" though...
|
|
|
04-25-2010, 05:11 AM
|
#42
|
Senior Member
Join Date: Feb 2006
Posts: 138
|
Not sure what to call it, if not the lockup bug :P The same behaviour was present for io running previous nxmydb versions also, not just 2.1. However it is possible that it became more frequent with io7 and nxmydb 2.
I am using FireDaemon to execute it, running under the SYSTEM user account.
When I set up Application Verifier, I was unable to get io started properly. It never even spawned ioFTPD-Watch.exe.
I turned off the nxmydb encryption in config now, to see if it still locks up.
*Edit: I am now unable to attach windbg to the ioftpd process. io locks up/hangs right away when I hit attach process. I just tested it on another box to be sure I did exactly as before, but without much success. I sent you a couple of dumps from the crashed io, from after I attached the process.
I also noticed something in debug.log from io 7.4.3:
04-24-2010 04:07:28 error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
04-24-2010 04:17:46 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message
04-25-2010 16:29:07 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message
04-25-2010 16:29:08 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message
04-25-2010 16:29:10 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message
in addition to
04-25-2010 16:29:07 AsyncSelectCancel flags: 0
On a side note, do you have some input on my preloading questions?
Last edited by pion; 04-25-2010 at 10:53 AM.
|
|
|
04-25-2010, 02:13 PM
|
#43
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
pion: I think you can ignore the debug.log messages. At the moment it looks like I goofed the test for when to output the AsyncSelectCancel line so I'm printing the un-interesting case. Doh!
The SSL routines complaining about SERVER_HELLO errors are almost certainly caused by transfers where someone didn't enable SSL so it can't perform the handshake right after connect. This is just the type of information I was looking for, but you can safely ignore anything in the GET_SERVER_HELLO category right now.
Regarding the slow start problem. I found 2 bugs.
I said it would process the VFS= file, if defined, and then do the Default.vfs file. It does that, but instead of processing the individually specified directories (i.e. 1 = /dir) only once, it does it for both. The second time should be fast since it's likely in cache, but it's extra work and if stuff gets pushed out of the cache it's a lot of extra work.
Remember when I mentioned that enabling DEBUG=True would cause the server to wait until finished, but it would use lots of threads to work in parallel so it would be faster? That is true only for the first time it's executed. That is correct behavior, BUT with the above bug the 2nd time is for Default.vfs if VFS= was defined...
For the moment try Delay=TRUE, don't define VFS=, and don't define any individual dirs with a depth of 1 already defined in the default VFS file as they should already be done. My guess is the only thing you really want is Delay=TRUE set and loading times should go down.
When the server comes up, login and use "site dircache". What you want to look for is how many buckets have the same size as the max size, which I think defaults to 100. If you see 10% or so of them at 100 then too much stuff is being forced out of the cache during preloading and we can either up the size of each bucket, or the number of buckets to distribute dirs over. I added this command for just this type of tuning. You can post the output of this command here and I'll examine it for you if you want. I'd be interested in it anyway.
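As a rough illustration of that rule of thumb, here is a small Python sketch. It assumes you have already pulled the per-bucket sizes out of the "site dircache" output by hand; the output format itself isn't shown here, so the list of sizes, the function names, and the sample numbers are all made up.

```python
def fraction_full(bucket_sizes, max_size=100):
    """Fraction of buckets that have reached the max bucket size."""
    if not bucket_sizes:
        return 0.0
    full = sum(1 for size in bucket_sizes if size >= max_size)
    return full / len(bucket_sizes)


def needs_tuning(bucket_sizes, max_size=100, threshold=0.10):
    """True when roughly 10% or more of the buckets are full."""
    return fraction_full(bucket_sizes, max_size) >= threshold


# Made-up sample sizes, one number per bucket.
sizes = [100, 42, 100, 7, 63, 100, 18, 55, 91, 30]
print(f"{fraction_full(sizes):.0%} of buckets are full")
print("consider larger or more buckets:", needs_tuning(sizes))
```

The 100 default and the 10% threshold are the values mentioned above, passed as parameters so they can be changed if the real defaults differ.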
Long term I think I might have an idea to help everyone. I think it's possible to write the current directory cache to disk during shutdown and read it back in on the next startup. Thus whatever was in cache and/or popular before will be cached now. Preloading would then work against an already loaded cache and just fill in the blanks. This isn't a simple change, but would be really useful for very large sites. The next change would be to use that same code, but to limit it to just writing out a single dir whenever some threshold is passed. So a directory like a /XVID dir would have a .iofTPD.cache file or something in it which could be loaded, and provided the timestamps match it wouldn't have to examine all the subdirs themselves.
Both of those changes would actually create more work for the server, but it would be spread out and the impact wouldn't be felt by the user so much...
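Here is a very rough Python sketch of the "write the cache out, read it back, trust it only if the timestamps match" idea. This is not how ioFTPD stores its directory cache. The JSON layout, the function names, and keeping everything in one cache file outside the release dirs are all assumptions made just to show the validation step.

```python
import json
import os


def snapshot(dirs):
    """Record each directory's timestamp and listing (done at shutdown)."""
    cache = {}
    for path in dirs:
        # Stat before listing: if the dir changes in between, the stored
        # timestamp will be stale and the entry is rejected on load.
        mtime_ns = os.stat(path).st_mtime_ns
        cache[path] = {"mtime_ns": mtime_ns, "entries": sorted(os.listdir(path))}
    return cache


def save_cache(cache, cache_file):
    """Write the snapshot to a single file (hypothetical JSON format)."""
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)


def load_valid(cache_file):
    """Return only the cached dirs whose timestamp still matches (startup)."""
    try:
        with open(cache_file, encoding="utf-8") as fh:
            stored = json.load(fh)
    except (OSError, ValueError):
        return {}  # no cache or unreadable cache: fall back to a full scan
    valid = {}
    for path, entry in stored.items():
        try:
            if os.stat(path).st_mtime_ns == entry.get("mtime_ns"):
                valid[path] = entry
        except OSError:
            pass  # directory no longer exists
    return valid
```

Anything missing from what load_valid() returns would simply be scanned again by the normal preload. Keep in mind that a directory's timestamp only changes when something directly inside it is added, removed, or renamed, so each directory needs its own entry and its own check.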
|
|
|
04-25-2010, 02:44 PM
|
#44
|
Senior Member
FlashFXP Beta Tester ioFTPD Foundation User
Join Date: Dec 2001
Posts: 306
|
.iofTPD.cache in every release dir?
Ouch... can that file be stored in the /logs dir or something? Mainly for security reasons. See, there are some people that use FTP and e.g. Torrent or DC to share stuff, and when they mount those release dirs, all the .iofTPD.cache files will follow as well... Um... you know what I mean? Yeah, maybe I'm thinking too much. I'm aware of the .ioFTPD.message file as well.
|
|
|
04-25-2010, 02:56 PM
|
#45
|
Too much time...
FlashFXP Beta Tester ioFTPD Administrator
Join Date: May 2005
Posts: 1,194
|
pion: I walked away and then realized I forgot to mention 2 things in my last post. Attaching a debugger to a running process can cause it to crash/fail. I haven't had this happen to me, but I read that it can. It's also possible that Application Verifier and you attaching windbg manually are not a good combo since AV uses windbg underneath to report the stack trace on errors found. Thus your recent problems trying to attach to the process might be solved if you disable AV. However, I'd let AV run and only attach windbg after it's already locked up... If you do want to run it under windbg all the time it's better to use the "Open executable" feature and then hit F5. Starting under the debugger instead of attaching to a running process is safer.
For the record however, the stuck process was in libmysql at the time it got stuck. The main thread was fine this time, but 2 worker threads only gave me partial stack traces but both were waiting for reads from the database. It looks like this was in a different read function but I presume this is from a process after encryption was turned off so it would be different. You might be able to enable that again...
I'm going to have to get to the bottom of why libmysql is having issues, especially in your custom version, but it is...
|
|
|
All times are GMT -5.