04-25-2010, 02:16 AM
Yil
Too much time...
 
Join Date: May 2005
Posts: 1,194

pion: Having a dump file to debug sure makes things a bit easier.

First off, the "main" ioFTPD thread doesn't really do anything in the server. It just sits around waiting on an event to tell it to exit because the server shut down. The latest versions now occasionally check to make sure another thread is updating a variable to catch the loader lock being compromised. So basically it does nothing...
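To make the "wait for exit, but occasionally check that another thread is still updating a variable" idea concrete, here is a minimal sketch in Python. It is not ioFTPD's actual code (the server itself is a native Win32 program); the names `Heartbeat`, `main_wait` and `check_interval` are made up for illustration, and the timed `Event.wait` only stands in for a timed wait on a Win32 event.

```python
import threading


class Heartbeat:
    """Counter that a worker thread bumps to show it is still making progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def beat(self):
        # Called by the worker each time it completes a unit of work.
        with self._lock:
            self._count += 1

    def read(self):
        with self._lock:
            return self._count


def main_wait(shutdown, heartbeat, check_interval=0.05):
    """Idle until `shutdown` is set; return how many stalled intervals were seen.

    An interval counts as stalled when the heartbeat counter did not change
    between two consecutive checks.
    """
    stalls = 0
    last = heartbeat.read()
    # Event.wait returns False on timeout and True once the event is set.
    while not shutdown.wait(timeout=check_interval):
        current = heartbeat.read()
        if current == last:
            stalls += 1  # a real server would log or write a dump here
        last = current
    return stalls
```

The main thread still does nothing useful on its own. The only difference from a plain blocking wait is that it wakes up on a timer, so a frozen worker becomes visible instead of the whole process silently hanging.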

However, in the dump I'm looking at, the main thread's stack trace appears to be running code from the libmysql.dll library, and it is stuck waiting for data from the database. The stack trace isn't complete, or is corrupted, so I probably need to rebuild everything with optimization turned off instead of just generating debug symbols for optimized code. I've found that turning off the optimizer and turning on stack corruption checks often helps recover stack frames so I can see how the function is getting called...

Anyway, without a full stack trace I can't say how the main thread is doing this. It sits in a loop using WaitForSingleObject, not WaitForSingleObjectEx, so it's never in an alertable state and shouldn't be running any kind of asynchronous procedure call (APC) or anything... So I guess something is trashing the stack of the main thread?

pion: This is running as a normal process and not as a real service, right? FireDaemon will run it as a normal process, I believe. I didn't see the service-specific thread running like I would if it were a real service.

On the other hand, if we take the 7 procedures on the call stack that look valid (all MS except the last) at face value, it looks like the yaSSL encryption package in MySQL's library is waiting for input on a socket, and I guess it's stuck. It's supposed to be a non-blocking socket, but well, I guess it isn't or something... Or perhaps it's running in a loop and this is just where it was at the time of the snapshot.
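For anyone following along, the difference between a blocking and a non-blocking socket is what matters here. The sketch below is plain Python, not libmysql or yaSSL code, and `try_recv` and `demo` are made-up names. It shows that a read on a non-blocking socket with no data pending returns control right away instead of parking the thread.

```python
import select
import socket


def try_recv(sock, nbytes=4096):
    """Return received bytes, or None if the read would have blocked."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        # Non-blocking socket with nothing to read: the call returns at once.
        return None


def demo():
    """Read from a non-blocking socket before and after the peer sends data."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        first = try_recv(a)  # nothing has been sent yet
        b.sendall(b"ping")
        # Wait (up to 1 second) until the data is actually readable.
        select.select([a], [], [], 1.0)
        second = try_recv(a)
        return first, second
    finally:
        a.close()
        b.close()
```

If the same first read were done on a blocking socket, the thread would sit inside `recv` until the peer sent something, which is the kind of wait the stack trace appears to show.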

* Can people using nxMyDB post what type of configuration they are using here? In particular, are you using encrypted communications, and is it via OpenSSL or via yaSSL? And what version of nxMyDB are you using?

pion: you mentioned you were running v2.1 already? I don't think that was released, and perhaps it's got a bug if we hear back that other people using 2.0 don't have problems. Seems unlikely, but you never know...

I noticed one other odd thing. For some reason the secur32.dll library (that MS encryption library I thought I had just ditched) was loaded. Not sure who the culprit loading it is. I don't remember seeing it locally after the switch, but perhaps I missed something. It could be that OpenSSL and/or libmysql's yaSSL load it. They might not need it to encrypt data, but rather to use its really good random number generator or something. Not a big deal, but I'm curious now...

pion: I'll see what I can do about getting another test version put together for you at some point so we can maybe learn more. Now that you have windbg installed we can also try Application Verifier as it uses windbg to generate stack traces when it notices something is wrong. Really useful and might tell us something. It's a simple download from MS, just configure it to watch ioFTPD.exe and turn on all the options...

Here are some "workaround" ideas:

Try switching to OpenSSL encryption in the nxMyDB configuration. If we believe the dump (send more please, let's see if it always shows up this way!) then switching to OpenSSL from yaSSL might get around an issue in the libmysql.dll library.

Also, try running without encryption (and over shared memory if possible to avoid the socket code altogether) on the machine hosting the database to see what that does...

Of course, it's also possible that not using nxMyDB at all would make it stable. The problem might not even be in nxMyDB but in libmysql, and getting rid of both might help. Of course it sucks losing its features, but if none of the above ideas work it might be worth taking one site off it, if that's possible, to see what happens... At the least, it would help me a lot to know it wasn't involved in the lockups you are seeing.


* I'd also really like to hear from other people running v7.4.3 about how it's working for them, especially people who had problems with the lockup bug.