PDA

View Full Version : ioFTPD v7.3.3 Released (Beta)


Yil
03-26-2010, 08:27 PM
Changed to dynamic runtime linking just to see if that affects the lockup bug in any way.

If you get "This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem." then you need to install the VS2008 SP1 Runtime (http://www.microsoft.com/downloads/details.aspx?familyid=A5C84275-3B97-4AB7-A40D-3802B2AF5FC2&displaylang=en) from MS which is a free download.

Assuming the lockup bug isn't fix this version may be able to detect it and suicide. This is useful when run as a service since it can be configured to auto restart.

Restored nxMyDB compatibility which was broken in v7.1 but wasn't reported. I also fixed a bug that looks like it was causing all the crashes pion sent me, but only occurs when using nxMyDB.

Latest Version:
Link: ioFTPD-v7.3.3.zip (http://home.comcast.net/~yil/ioFTPD-v7.3.3.zip)

Yil
03-26-2010, 08:30 PM
v7.3.0 Release Notes:

IMPORTANT: The way ioFTPD.exe is linked has changed. If you get the following
popup error dialog: "This application has failed to start because
the application configuration is incorrect. Reinstalling the
application may fix this problem." Then you need to install the
Visual Studio 2008 SP1 runtime from MS, see #4 below!

1) Files in \System:
Changed : ioFTPD.[exe,pdb] - Version 7.3.0.0.
Changed : tcl85t.[dll,pdb] - Version 8.5.2.8 (dynamic CRT, dated 3/24/2010)

2) Files in \:
Deleted : README
Added : README.txt

3) Directories in \lib:
Replace entire reg1.2 directory.
Replace entire dde1.3 directory.


*** New Features

4) The TCL dlls and ioFTPD.exe are now linked against the VS2008 SP1 dynamic
CRT/runtime dlls. This means that they need to be installed on your
computer because they are not distributed via a simple file you can drop
into the directory and ioFTPD doesn't have a fancy installer... I think
Win7 and perhaps Vista have these installed by default, but XP I know does
not, although it is possible that another piece of software installed these
for you are part of their setup.

If you run ioFTPD.exe and see the follow message:

This application has failed to start because the application
configuration is incorrect. Reinstalling the application may fix this
problem.

then you don't have them installed. Goto:

http://www.microsoft.com/downloads/details.aspx?familyid=A5C84275-3B97-4AB7-A40D-3802B2AF5FC2&displaylang=en

to download vcredist_x86.exe for VS2008 SP1 which is a free download.

You may also see entries in the system event log in the System category
if running ioFTPD as a service or if you somehow aren't seeing the error
message popup. They are under the name 'SideBySide' and look like:

Dependent Assembly Microsoft.VC90.CRT could not be found and Last Error
was The referenced assembly is not installed on your system.

Resolve Partial Assembly failed for Microsoft.VC90.CRT. Reference error
message: The referenced assembly is not installed on your system.

Generate Activation Context failed for c:\ioFTPD\system\ioFTPD.exe...

5) Added a new thread that just updates a variable with the current time every
10 seconds after having called a simple function that internally uses the
loader lock. When the window's internal loader lock gets stuck this should
fail to return and within a minute the process will detect this in another
thread, output a log entry that looks like:
"System detected loader lock compromised! Terminating!"
and then try to terminate the process immediately / ungracefully.

6) When run as a service ioFTPD will now return an error code to the service
control manager if the server was unable to start because of an error
during initialization. This should result in a windows system event being
logged which can help alert you to the problem. The Invalid Function
error (#1) is used if the server doesn't have a more appropriate error
code to return. When run interactively the server will continue to just
pop up a dialog box with the error.

7) To restore compatibility with nxMyDB the exported functions Config_Get,
Config_Get_Int, Config_Get_Bool, Config_Get_Path have the old signature
and only work against the server's configuration file, usually ioFTPD.ini.
Also, I've recorded the list of functions nxMyDB looks up and will try to
keep them the same in the future.


*** Bug Fixes:

8) Fixed a bug where the server would crash if a valid user was logging in
and the userfile lock was unable to be acquired, or if in the process of
being acquired and the user was kicked (via site command or !login). The
later is nearly impossible as local locking cannot fail and is very quick,
however the former may be common when using 3rd party user modules such
as nxMyDB and the DB is offline / lagged.

9) Fixed a bug with hostnames forwarded via the IDNT commmand from BNCs.
If the BNC forwarded the IP address as the hostname the server would treat
it as a reverse resolved hostname and apply the host obscuring rule to it.
Now if the IDNT command sees the host parameter is all numbers it converts
it to an IP address and if it matches the actual IP address portion of the
the IDNT argument it doesn't set the hostname. That emulates the regular
meaning of the hostname field where it is only set upon successful reverse
resolving of the IP address, thus when the server needs to display a
hostname it will default to using the IP address in numeric form and apply
the IP obscuring rule to the result.

neoxed
03-26-2010, 10:49 PM
I suppose its a bit late, but I recently added support for the new (now old :p) ConfigReader API to nxMyDB (yesterday in fact).

http://code.google.com/p/nxscripts/source/detail?r=2546

Yil
03-26-2010, 11:15 PM
neoxed: Glad to see you around :) I'll leave the "original" routines in place until the next server change or you get ready for the next release of nxMyDB. Just give me a heads up and I'll switch to the new routines so people can upgrade then.

That link looks like you put the sources up, I didn't think they were there so I couldn't fix it myself. Thanks. In the future that will make it easier if I break something and there isn't an easy way around it.

Yil
03-26-2010, 11:23 PM
Oh, I forgot to mention. If you find you need/want info from the server, an easier way to do something, etc please ask. I'm happy to encourage anyone interested in coding scripts/addons!

Silly
03-27-2010, 02:22 AM
thank you yil. i am going to update my server right now...

monk-
03-27-2010, 06:01 AM
oke, running smooth now with nxmydb
lets wait for some traffic, to see if any lockup occurs

Flow
03-28-2010, 10:35 AM
.... updated :)

Silly
03-29-2010, 01:19 AM
seems this build for me ...do kill itself when met lockup...but never reboot again.....

i am uploading the minidump to ftp...

Yil
03-29-2010, 01:43 PM
Silly. Can you check the Error.log and see if you see the "System detected loader lock compromised! Terminating!" message? I'm assuming you saw that because you said it detected it. On the other hand it appears that trying to kill oneself via TerminateProcess while the loader lock is held and there are outstanding I/O requests doesn't work... I'll have to see about better ways to force an exit, perhaps even using another process to do it...

Yil
03-29-2010, 06:02 PM
I learned a thing or two about how to get the thread ID that holds a lock and using that examined some old minidumps of the lockup bug I had from years ago and discovered something weird. The thread holding the loader lock isn't an ioFTPD created thread (they are all accounted for) and it isn't in the minidump which I presume means it exited without releasing the lock and without calling the ioFTPD CrashLog code (the minidumps were created manually).

Since it isn't an ioFTPD thread it's harder to debug since I'm not sure how I would try to even catch it's creation/exiting since the normal uncaught exception handler that should be called clearly wasn't. I'm guessing it's a winsock thread of some sort, but for the moment it appears to make the most sense to switch over to openSSL as soon as possible to try to remove all that MS encryption library crap that might be causing problems and we don't want to use long term anyway...

P.S. Silly, others. The windbg !locks command is useful only against full minidumps. Tinydump's simply don't include the memory regions that extension scans to enumerate critical sections. So, I'll need the full size large dumps rar'd or zip'd please.

The only other thing I might look into is knowing the lockup bug is caused by a thread that exited without releasing the lock, I might just try to find a way to force the freeing of the lock somehow and then suicide...

anITee
03-31-2010, 01:54 AM
Thank you for sharing your expirence, and I'm gonna try it and find some bugges if I can.

Yil
04-02-2010, 03:42 PM
Two new issues.

monk-: Your dumps indicate I fixed one crash issue with nxMyDB only to expose a related one later in the cleanup function. It's easy to fix, and I'll do so in a bit along with looking into updating the open/lock module functions to print errors to the logfile so we can get a handle on what is failing where. Does nxMyDB have it's own error log that we could look at? This bug only affects sites using a shared user module.

Silly: I learned from your dump that the server was preloading directories during a server shutdown and didn't finish in time so it crashed when the thread didn't end. I'll add some code to test for graceful shutdowns and exit it cleanly. This doesn't affect anything since the FTP was shutting down but may create dumpfiles if preloading takes more than a minute or so.

Zer0Racer
04-05-2010, 12:57 PM
[19:49:22] [L] SITE SYMLINK /DIR/Test/ | Test-link
[19:49:24] [L] Connection lost: TEST (ioFTPD)ioFTPD crashes.

Also a zipscript (exe) that works fine in v6.9.3 and creates symlinks that work. In v7.3.0 (haven't tested older v7.x) the symlinks won't resolve, but when checking the virtual target of the symlink it seems ok:
CWD (incomplete)-Dirname
550 (incomplete)-Dirname: No such file or directory.
SITE CHATTR +l "(incomplete)-Dirname"
200-CHATTR: "/Some_section/Dirname"

Also, that same zipscript (again, that works just fine in v6.9.3) says all is ok but no files remain after upload. It creates the missing-files and progress bar and that non-working symlink but all uploaded files disappear. How much of the EXEC module has changed between given versions and what has to be done to make the script work again?

/ZR

monk-
04-05-2010, 06:17 PM
with 703 it still worked

same here on 730:

[L] SITE CHATTR +l "linktotest" "/MVID/test"
[L] 200 'CHATTR' Command successful.
[L] CWD linktotest
[L] 250 CWD command successful.
[L] PWD
[L] 257 "/linktotest" is current directory.

Yil
04-05-2010, 07:20 PM
zero: The exec module actually changed quite a bit between v6.9 and v7.0 because it started supporting keep-alive timeouts and that required a number of changes. I don't think the conditions for deleting a file after upload changed though. After the changes I fixed a few things and tested with ioZS and ioA/B which I believe both work fine now. Which zipscript is failing?

I'll take a look at the symlink crash though. Have you left Keep_Links_In_Paths enabled? For kicks, switch it off, rehash, re-login, and see if it makes a difference in resolving.

o_dog
04-05-2010, 08:19 PM
is it just me having trouble with "426 Data connection: Broken pipe." site just hangs in uploda and i have to hard abort then reupload, it doesn't time out as it should. Dunno what tha hell is going on or if it's my link

After the timeout it tries to transfer the rest of the files and I get:
550 Active transfer in progress, terminate transfer with ABOR before proceeding.
and it aborts the transfer.

whats going on?

Yil
04-05-2010, 11:06 PM
o_dog: Could you send me an FTP log starting from just before where it gets confused? I can see cases where maybe the client gets confused, but that 550 message means the server thinks it's still got a transfer going and that's kinda weird. It does mean you can use "site crashnow" to generate a dump I can debug if you can find a time where the server is otherwise idle.

I'm getting the next minor release with just a few simple things in it ready, but it won't be for a day or so.

Zer0Racer
04-06-2010, 06:16 AM
zero: The exec module actually changed quite a bit between v6.9 and v7.0 because it started supporting keep-alive timeouts and that required a number of changes. I don't think the conditions for deleting a file after upload changed though. After the changes I fixed a few things and tested with ioZS and ioA/B which I believe both work fine now. Which zipscript is failing?

I'll take a look at the symlink crash though. Have you left Keep_Links_In_Paths enabled? For kicks, switch it off, rehash, re-login, and see if it makes a difference in resolving.

It's a modified version of Project-ZS for ioFTPD. Exactly what changes are required?

Changing any of the EXEC related settings in ioFTPD.ini doesn't help in this case? I've tried but with no luck.

With Keep_Links_In_Paths = False it's still crashing when I try to create a symlink manually with site symlink. Using site chattr +l manually still works. No change for the valid link that the zipscript creates. Though I guess it could be a result of the incompatibility with v7.x.

/ZR

monk-
04-06-2010, 10:46 AM
Keep_Links_In_Paths = False did the trick for me, entering symlinks now show as they used to, with full path

i also have the problem that warchive only runs manualy:
[R] site warchive
[R] 200-- WarChive ------------------------------------------------------------------
[R] 200-Current free space: 0
[R] 200-Deleting: 0325
[R] 200-Printed putlogline
[R] 200--!putlog WARCWIPE: "0325" "43396.04" "/XXX/" "0.00" "43408.00"
[R] 200-Enough space free on device
[R] 200-Current free space: 330155
[R] 200-Enough space free on device
[R] 200-Current free space: 18404
[R] 200-Enough space free on device
[R] 200------------------------------------------------------------------------[1.4]-

but on schedule returns the error:
04-06-2010 17:11:53 Scheduler event 'warchive' returned error.

this is since 7.0.x, prolly also some EXEC module problem

Yil
04-06-2010, 01:13 PM
Zero: there shouldn't be any changes required. However, can you tell if the modified Project-ZS tries to use ! commands like !vfs:add, !putlog, etc. It's possible that it's using one of these that I didn't port correctly and that's causing the errors. If the zipscript is easy to install I'll do that and figure out the problem locally so just PM me about how to get it, or send it to me.

monk: Do you not get that scheduler event returned error message on v6.x releases? Did warchive actually do what it was supposed to when run? If so then it's usually because the process returned a non-zero exit code, and we'll have to figure out why... I can already see the -!putlog line from your example and the leading - I don't think belongs there. That shouldn't cause any issues but it probably isn't going to do anything...

Yil
04-06-2010, 02:22 PM
I forgot to zero out the Overlapped structure in the exec module before re-using it. So if the server didn't read the input all in at once it would sometimes generate an error. This might fix both issues. Working on the rest of the fixes for the next release should be done tonight or tomorrow.

Zer0Racer
04-06-2010, 03:01 PM
Yil, just sent you a PM. Maybe you won't have to try the zipscript if the next release is the solution. Will try it as soon as it will become available. Thanks.

Yil
04-06-2010, 04:23 PM
v7.3.1 Release Notes:

1) Files in \System:
Changed : ioFTPD.[exe,pdb] - Version 7.3.1.0.
Added : ioFTPD-Watch.exe - Version 1.0.0.0.
Changed : ioFTPD.ini - summary of changes by section...
[Threads] : Restart_On_Deadlock added.


*** New Features

2) New ioFTPD.ini option (Restart_On_Deadlock under [Threads]). If this
is enabled the server will start the child process ioFTPD-Watch.exe and
signal it when it thinks the server has locked up so it can terminate
the server since it cannot do it itself when the loader lock has been
compromised. When run as a service it can be configured to automatically
restart. (The ServiceInstaller.exe tool by default configures it to
auto-restart provided it hasn't crashed within the last 10 minutes).

3) If the server is preloading directories or running a recursive action
through a directory tree and the server begins shutting down the operation
will now abort.


*** Bug Fixes

4) Fixed a bug with resolving relative symbolic links (../foo) where the
last component of the symbolic directory path was being removed before
following the link when it shouldn't have been. The non-symbolic path
was be updated correctly though.

5) Fixed a bug in "site symlink" introduced when attempting to preserve the
alternate timestamp / upload speed fields added in v7.2 but referenced the
wrong fileinfo structure.

6) Fixed a bug in the EXEC module where I wasn't zeroing out the Overlapped
structure before re-using it. This would cause it to sometimes generate
an error.

7) Fixed a bug where an Event handle created in the EXEC module wasn't being
closed.

8) Fixed a bug in login code that would cause crashes during logout if the
userfile was unable to be locked during login. This can only happen if
using a shared user module and even then it shouldn't happen.

9) Fixed a potential race condition on some bits in a bit field used by the
data transfer code. In theory if a transfer is aborted in some situations
it's possible that 2 threads may be changing 2 different bits but because
these are bit fields require a read/modify/write sequence they aren't
atomic operations. I simply moved the 1 bit that can be set by more than
1 thread out of the bitfield.


*** Debugging Only Changes:

10) 2 new Error logfile entries to catch problems during logoff that should
no longer be possible with the bugfix above.
* Supposedly logged in user missing userfile!
* Client structure identifies user as logged in but no userfile: %d

11) New Error logfile entry if a userfile cannot be locked.

12) New debug logfile entry if the server attempts to cancel an internal
async select and it appears that it may have already had activity.

13) New debug logfile entry if the timeout function for connected sockets
shows a previous error. Not all errors recorded here are things to
worry about, some may be valid and will be excluded in later releases.


*** Internal non-visible changes:

14) Moved setting socket to non-lingering after disconnect from the socket
close routines to right after creation just in case that helps.

Yil
04-06-2010, 04:25 PM
Link to new version: See top of thread, a new version is out.

Let me know what's still broken.

Yil
04-06-2010, 06:04 PM
v7.3.2 Release Notes:

1) Files in \System:
Changed : ioFTPD.[exe,pdb] - Version 7.3.2.0.


*** Bug Fixes

2) Forgot to remove a line of code used when trying to get the server to
act like it locked up to see if I could get the window's service control
manager to restart it.

Yil
04-06-2010, 06:07 PM
Link to new version: ioFTPD-v7.3.2.zip (http://home.comcast.net/~yil/ioFTPD-v7.3.2.zip)

monk-
04-06-2010, 08:52 PM
what is Restart_On_Deadlock 's default option?
cos if this workarrounds the lockup bug, i say TRUE
anyway, nice release, im gonna test a bit, thx

Yil
04-06-2010, 10:57 PM
monk-: It defaults to false, you have to enable it.

pion
04-07-2010, 11:26 AM
I can confirm that the lockup bug is indeed still present in 7.3.2. The process didn't appear to implode, and io refused to accept connections

Zer0Racer
04-07-2010, 11:39 AM
In v7.3.2 "site symlink" works again.

No change for project-zs though. All files disappear even if the script exit with no error.

And the symlink created by the script is not resolved, even if site chattr +l "link" on the dir/link shows a valid target.

Yil
04-07-2010, 12:11 PM
pion: Can you check the logfile for the "System detected loader lock compromised! Terminating!" message? And just to double check, the .ini file has the new option enabled and you can see the ioFTPD-Watch.exe running when you start the server?

Yil
04-07-2010, 01:04 PM
Zero: I've tried a bunch of symlinks and everything appears to be working fine. I even found that relative symlink bug when I first looked at it. If you view the raw link target via "site chattr +l <dir>" like you did in your example can you CWD to that and have it work? Does restarting the server make the link work? If all that fails, send me the .ioFTPD file from the link directory and I'll check to see if it looks OK. I'm just not sure how the zipscript creates the link, i.e. via a !vfs style command which means the EXEC module, via the shared memory interface, or by manipulating the .ioFTPD file itself... If the last then it might be a stale cache issue which restarting would fix but wouldn't be an issue if the zipscript wasn't crashing and told the server to reload the dir's data... Just a thought...

Zer0Racer
04-07-2010, 02:17 PM
Yil: Restarting ioFTPD didn't work and I can't CWD the symlink/dir. My flashfxp just reports that the dir/file doesn't exist (since ioFTPD tries to CWD into the symlink dir).

I have sent the .ioFTPD file to you. I think the zs uses shared memory.

Zer0Racer
04-07-2010, 02:31 PM
Reverting to v6.9.3 (replacing ioFTPD.exe into the same config and adding php4ts.dll to /ioFTPD/system) and the zs works and no files disappear. Then I tried something as "old" as v7.0.3 and the same thing happens as in v7.3.2. So one of the changes from v7.0.0 or something brakes the compatibility with project-zs (at least). I haven't tried the zr-tools exes (zr-ban.exe etc.) yet.

Yil
04-07-2010, 02:54 PM
v7.3.3 Release Notes:

1) Files in \System:
Changed : ioFTPD.[exe,pdb] - Version 7.3.3.0.


*** Bug Fixes

2) Fixed a race condition during startup where the scheduler subsystem was
started before the services subsystem. This could result in the &reset
scheduler event (which updates defined services) running before they have
been configured for the first time. I believe this is the cause of the
server starting up but not accepting any connections at all.

3) Fixed a bug where '\r' wasn't being removed from the end of a line when
processing ! redir commands issued in the EXEC module.

4) Fixed a bug with the !detach feature in the EXEC module where I flipped
the meaning of the integer argument and was recording an error instead
of success.

Yil
04-07-2010, 02:56 PM
Some good stuff in this one... I think I fixed the server starting up but not responding bug and I think zero's problems with the EXEC module should be fixed.

Link: ioFTPD-v7.3.3.zip (http://home.comcast.net/~yil/ioFTPD-v7.3.3.zip)

Yil
04-07-2010, 03:03 PM
zero: I should mention it because it's not obvious, but the problem with the symlinks turned out to be a creation problem. When the \r wasn't being removed from the end of the ! line in the EXEC module the !vfs:chattr function got a double quoted string with the \r on the end so it failed to remove both the \r and the double quotes so the link was wrong. I should have noticed the target from the site chattr +l command having quotes around it, but just thought that was the way it should be displayed... It's fixed now but the existing links will still be wrong (though with it deleting all files on upload it's unlikely you had a lot of them).

Zer0Racer
04-07-2010, 03:06 PM
Yil, very nice! Now in ioFTPD v7.3.3 project-zs works just as it is supposed to and the zs-created symlinks work too. Solid work! Thanks!

Zer0Racer
04-07-2010, 03:10 PM
zero: I should mention it because it's not obvious, but the problem with the symlinks turned out to be a creation problem. When the \r wasn't being removed from the end of the ! line in the EXEC module the !vfs:chattr function got a double quoted string with the \r on the end so it failed to remove both the \r and the double quotes so the link was wrong. I should have noticed the target from the site chattr +l command having quotes around it, but just thought that was the way it should be displayed... It's fixed now but the existing links will still be wrong (though with it deleting all files on upload it's unlikely you had a lot of them).

Okay. But the correct way of creating a symlink "the old way" is still like this (including the quotes)?site chattr +l "dir" "virtual target"

Edit: Btw, creation problem you say. Does that mean there is a bug in the zipscript on symlink creation or just the EXEC module in ioFTPD?

pion
04-07-2010, 04:50 PM
Nothing in log, and ioFTPD-Watch.exe is running using 996K ram. I did however notice something I haven't seen before when examining event prior to hangup:

sitebot only reported newdir creation on three different releases, as if zipscript stopped working entirely. Three minutes earlier it put out full output with a working zs. Type of files was excactly the same in both before and after. In addition to this, when examining logs, I noticed that io had stopped accepting new connections 20 MINUTES before. (I noticed this due to a user hammering every 20s but getting login refused in log) This particular user had been hammering for a full hour already, and is also the last entry in ioftpd.log so it's unlikely that he stopped hammering at that particular point.

This leads me to believe that already connected users aren't very much affected by the lockup bug, and that it is possibly connected with the execution of zipscript.

Using ioFTPD 7.3.2 - dZSbot - ioninja customized PZS-NG 21.1

Yil
04-07-2010, 06:30 PM
zero: Nothing changed with how you use site chattr, or site symlink. The double quotes are indeed required if the name contains spaces so it's a good idea to always use them. You can check out "site help chattr" for details. The issue was in the EXEC module entirely and the way it improperly handled the arguments to !vfs:chattr which failed to remove the double quotes because of the extra \r at the end. This means that the link was stored as "/link"\r (the quotes are part of the name!) instead of just /link and therefore wouldn't resolve...

pion: Keep an eye out to see if you see that behavior again. If new connections are not being accepted, try to see if you can get directory listings via LIST (not STAT!) or alternatively transfer any files. If so then it's just something weird going on with users logging in. Try a site rehash to see if that helps, and also check the logfiles. You can also try "site crashnow" if you are still logged in to see if that works and send me the dumps.

pion
04-08-2010, 12:10 AM
I don't really have any way of detecting the lockup bug until I am no longer able to connect. Is there a way to trigger crashnow other than actually being logged in?

Yil
04-08-2010, 01:26 AM
pion: No, sorry. However there is something you can try that might help in your situation because I don't think you have the dreaded "lockup bug" but rather something that just stops accepting connections.

First in the .ini file copy the [FTP_Service] section and call the new one [FTP_Service2]. Change the "Port =" line to use a different port number, and then under [Network] change

Active_Services = FTP_Service FTP_Service2

You will now have a server that listens and accepts connections on 2 different ports! If the first gets locked up, try connecting to the second to see if that still works...

o_dog
04-08-2010, 04:07 AM
it's not only me it seems that have problem with the 550 - transfer bla bla bla thing. Seems others have it too, this is kinda important.. The log says nothing since it doesn't crash, timeout or anything like that, it just seems to cling to a connection it likes very much. Can't really generate a crashlog for ya, atleast not right now.

Silly
04-08-2010, 06:46 AM
just upgraded....let's see...how's it going..

Yil
04-08-2010, 10:02 AM
o_dog: I took a look around and didn't see anything obvious. It looked like all possible paths ended up resetting the flag no matter how the connection ended. The only suspicious thing I saw was the race condition on the bit field that could only occur if your client timed it out and tried to abort it which I presume you would see in the logs. I fixed that possibility in the 7.3.2 release just in case. If you're using that or newer and it's still happening I can always gin up a special debug version to record some info to the debug log or you can generate minidumps.

Can you double check that your client didn't try to abort the transfer on it's own before it got stuck? And upgrade to 7.3.3 to rule out the race condition I fixed.

pion
04-08-2010, 10:12 AM
Yil; I tried your extra service configuration, but I had some problems with it. I copied the existing ftp service, renamed it, and change port, then added that Active_Services. At some point it appeared that only the latest service got loaded, essentially only opening one port. I also had problems getting the service to allow connections alltogether, also after reverting config, as if some bindings where still active. Reverting, manually killing the ioftpd.exe process, wait for a few, then starting it again appears to get things back to normal.

It's likely that the above problems is entirely my fault, however I find it curious that nothing (at all) showed up in logs when attempting to start ioftpd.exe. I could clearly see the process running, and spawning ioftpd-watch.exe - but not able to log in, and nothing is written in logs, not even startup events.

Yil
04-08-2010, 01:09 PM
pion: I've seen once or twice locally and had people tell me they would start the server and not be able to connect to it, but restarting it fixed it so it wasn't a .ini file issue. I scratched my head since everything looked OK at first glance when I got it into the debugger but then I got the idea it might have something to do with the scheduler race condition on startup so I fixed that. If you see that problem again and you aren't playing with the config file let me know as it means the problem is still in there...

"Added" Active_Services? You mean replaced/modified the line under [Network] that was already there? If you can login you can use "site services" to see a list of configured services and their status. Use "site devices" to see configured devices although you won't need multiple of those since services can share them.

monk-
04-09-2010, 08:22 AM
i also have problems with io just hanging at the startup
i figured out its hangs on directory caching
im used to putting all paths in the default.vfs
but with these hangs, it seams nessesary to disable cache on some paths

edit: i think it hangs on paths that are already cached
like:
"W:\FTP\MP3-ARCHIVE\OLDIES" /MP3-ARCHIVE/OLDIES
"w:" /STAFF/PROGRAMME/_HDD/W-drive
etc

and another question: is it possible to surpress log errors/warnings like: VFS WARNING: file '..\etc\default.vfs', line #59 has a real path that is invalid (error = No such file or directory): "y:" /STAFF/PROGRAMME/_HDD/Y-drive




edit2: hmm, i thought it would be anough to only put the sections in a seperate vfs file, and remove them from default.vfs
but then the sections dont show up on ftp
so now i wonder what the function is of this extra vfs file, when paths dont get loaded to the ftp anyway

pion
04-09-2010, 08:56 AM
Startup bug appears to be more severe that I thought. I now have it on configs that has been static. ioftpd.exe is running and spawn ioftpd-watch.exe after a few seconds. I can also see io steadily growing in memory size for a few minutes, I'm guessing this is directory cache? After it reaches some final memory consumption it just sits there doing nothing. Not writing logs, not accepting connections.

Error.log are created showing: 04-09-2010 16:19:31 Unable to open/parse pre-load VFS file: ../etc/sections.vfs
04-09-2010 16:19:31 VFS WARNING: file '..\etc\default.vfs', line #45 has a real path that is invalid (error = No such file or directory): "mylocation" /mylocation

ioFTPD.log:
04-09-2010 16:19:31 PRELOAD: "begin" "..\etc\default.vfs"
04-09-2010 16:19:31 PRELOAD: "points=29" "..\etc\default.vfs"
04-09-2010 16:19:47 SSL: "Found certificate" "name=aaaa" "Service=FTP_Service" "(Certificate_name)"
04-09-2010 16:19:47 START: "PID=2836" "CmdLine="

nxMyDb.log:
04-09-2010 16:19:32 ------------------------------------------------------------

and that's it. No more logs to debug what's going on

Using io 7.3.3


I tried creating an empty file called sections.vfs - but same error came in log, however the ftpd started :S Looks somewhat random

Yil
04-09-2010, 09:50 AM
Thanks for the observation. I'll take a quick look at the directory caching during preloading stuff which during startup is handled a little differently because the server isn't actually entirely configured at that point. It's possible I missed something being initialized and that's where the randomness is coming from.

I'll also add an option to completely disable it in the next build to see if that makes a difference as well.

pion
04-09-2010, 12:07 PM
04-09-2010 17:02:23 Color #2 bad in .ini theme: '+ 2 Light-Full -1 2 2 2 -1 -1 -1 -1 -1 -1 2 2 2 2 2 2 2 2 2 -1 -1 -1 -1 -1 -1 -1 -1 5000 -1 5000 -1 5000 -1 5047 1 5 6 3 -1 -1 -1 -1 -1 -1 -1 -1 2 2 -1 -1 2'
04-09-2010 17:06:42 VFS WARNING: file '..\etc\default.vfs', line #28 has a real path that is invalid (error = No such file or direct..

I'm assuming it's directory caching taking so much time?
Is it possible to speed up that directory caching by specifying max folder depth or set some max execution time or similar? That time is on a core2 system with plenty free ram, and idle disk IO..

Basicly the problem is that the server is refusing any connection until that VFS WARNING log entry is written

pion
04-09-2010, 01:59 PM
Not sure if this is nxMyDb or ioftpd problem, but I've seen groups disappear for no aparent reason, and the user is left in NoGroup. Anyone else seen this problem?

pion
04-09-2010, 02:39 PM
Server stopping to accept files: Site 1 = drftpd, site2 = io 7.0.3


(21:17:05) [Site2] MKD /Mydir
(21:17:06) [Site2] 257 "/Mydir" created.
(21:17:06) [Site2] CWD /Mydir
(21:17:06) [Site2] 250 CWD command successful.
(21:17:06) [Site2] PWD
(21:17:06) [Site2] 257 "/Mydir" is current directory.
(21:17:06) [Site2] STAT -l
(21:17:06) [Site2] List Complete: 33 bytes in 0,11 seconds (0,30KB/s)
(21:17:06) [Site1] PRET RETR my.sfv
(21:17:06) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:06) [Site1] CPSV
(21:17:06) [Site1] 227 Entering Passive Mode (1,1,1,1,165,166).
(21:17:06) [Site2] PORT 1,1,1,1,165,166
(21:17:06) [Site2] 200 PORT command successful.
(21:17:06) [Site2] STOR my.sfv
(21:17:06) [Site2] 150 Opening ASCII mode data connection for my.sfv using SSL/TLS.
(21:17:06) [Site1] RETR my.sfv
(21:17:07) [Site1] 150 File status okay; about to open data connection from SLAVE1.
(21:17:17) [Site1] 426- Read timed out
(21:17:17) [Site1] 426 Read timed out
(21:17:17) [Site1] PRET RETR my.sfv
(21:17:17) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:17) [Site1] CPSV
(21:17:17) [Site1] 227 Entering Passive Mode (1,1,1,1,165,108).
(21:17:17) [Site2] PORT 1,1,1,1,165,108
(21:17:17) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:17) [Site1] PRET RETR my.sfv
(21:17:17) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:17) [Site1] CPSV
(21:17:18) [Site1] 227 Entering Passive Mode (1,1,1,1,164,93).
(21:17:18) [Site2] PORT 1,1,1,1,164,93
(21:17:18) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:18) [Site1] PRET RETR my.sfv
(21:17:18) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:18) [Site1] CPSV
(21:17:18) [Site1] 227 Entering Passive Mode (1,1,1,1,166,82).
(21:17:18) [Site2] PORT 1,1,1,1,166,82
(21:17:18) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:18) [Site1] PRET RETR my.sfv
(21:17:18) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:18) [Site1] CPSV
(21:17:18) [Site1] 227 Entering Passive Mode (1,1,1,1,167,198).
(21:17:18) [Site2] PORT 1,1,1,1,167,198
(21:17:18) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:18) [Site1] PRET RETR my.sfv
(21:17:19) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:19) [Site1] CPSV
(21:17:19) [Site1] 227 Entering Passive Mode (1,1,1,1,165,180).
(21:17:19) [Site2] PORT 1,1,1,1,165,180
(21:17:19) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:19) [Site1] PRET RETR my.sfv
(21:17:19) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:19) [Site1] CPSV
(21:17:19) [Site1] 227 Entering Passive Mode (1,1,1,1,165,218).
(21:17:19) [Site2] PORT 1,1,1,1,165,218
(21:17:19) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:19) [Site1] PRET RETR my.sfv
(21:17:19) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:19) [Site1] CPSV
(21:17:19) [Site1] 227 Entering Passive Mode (1,1,1,1,165,123).
(21:17:19) [Site2] PORT 1,1,1,1,165,123
(21:17:19) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:20) [Site1] PRET RETR my.sfv
(21:17:20) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:20) [Site1] CPSV
(21:17:20) [Site1] 227 Entering Passive Mode (1,1,1,1,165,42).
(21:17:20) [Site2] PORT 1,1,1,1,165,42
(21:17:20) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:20) [Site1] PRET RETR my.sfv
(21:17:20) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:20) [Site1] CPSV
(21:17:20) [Site1] 227 Entering Passive Mode (1,1,1,1,167,125).
(21:17:20) [Site2] PORT 1,1,1,1,167,125
(21:17:20) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:21) [Site1] PRET RETR my.sfv
(21:17:21) [Site1] 200 OK, will use SLAVE1 for upcoming transfer
(21:17:21) [Site1] CPSV
(21:17:21) [Site1] 227 Entering Passive Mode (1,1,1,1,166,158).
(21:17:21) [Site2] PORT 1,1,1,1,166,158
(21:17:21) [Site2] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(21:17:21) [Site1] PRET RETR my.sfv
(21:17:21) [Site1] 200 OK, will use SLAVE1 for upcoming transfer


In addition to this, the server site2 is no longer working with PASV (timeout). So no transfers possible at all.. the ABOR message, sticks around for about 5s before it aparently timeout.

I can log in and out, resulting in the same behaviour.

Now if I restart the ioftpd.exe service all transfers behave proper again (both PORT and PASV).

(as a note, waiting 3-5 minutes for io to start (dir caching?) is a real pain for debugging)

I didn't think about issuing 'site crashnow' - but if it happends again I'll do that also

pion
04-09-2010, 02:49 PM
SystemError.log: 04-09-2010 21:46:59 Error converting string: /dir/subdir <-- Where is this error from? How to avoid it?

pion
04-09-2010, 05:02 PM
I finally managed to catch some events leading up to hangup now:

THREAD 1:
(21:51:29) [site] 200 Type set to A.
(21:51:29) [site] PASV
(21:51:29) [site] 227 Entering Passive Mode (1,1,1,1,55,97)
(21:51:29) [site] Opening data connection IP: 1.1.1.1 PORT: 14177
(21:51:29) [site] LIST
(21:51:29) [site] 150 Opening ASCII mode data connection for directory listing using SSL/TLS.

THREAD 2:
(22:43:52) [site] TYPE A
(22:43:52) [site] 200 Type set to A.
(22:43:52) [site] PASV
(22:43:52) [site] 227 Entering Passive Mode (1,1,1,1,67,196)
(22:43:52) [site] Opening data connection IP: 1.1.1.1 PORT: 17348
(22:43:52) [site] LIST
(22:43:52) [site] 150 Opening ASCII mode data connection for directory listing using SSL/TLS.

Thread 1 had a successful transfer just before the hanging dir list.
Thread 2 had one successful transfer at 22:54 which is AFTER thread 1 got stuck. After that, I don't have any successful transfers. In addition, thread had some

(22:43:30) [site] Can't open data connection
(22:43:30) [site] 426 Connection closed: Connection timed out.

entries on list BEFORE it hanged completely

After that command it hanged for 2 and 1 hours consequently, until I now aborted them both manually in client. No new connections are accepted to the site, and I am unable to send any additional commands. ioftpd.exe is running, without any hints it's crashed/hanged/locked up. This is io 7.0.3

o_dog
04-09-2010, 06:23 PM
pion: same error that i got. the 550 one

pion
04-10-2010, 12:24 PM
Another crash/hangup/lockup where IO was dropping all connections more or less at once:

I had the following three threads connected:

IO -> IO

(19:06:05) [i] [IO_703 -> IO_703] myfile1.r02 4,3 Mbytes/8,41(s)/814,95Kbps
(19:06:05) [IO_703] CPSV
(19:06:05) [IO_703] 227 Entering Passive Mode (1,1,1,1,59,178)
(19:06:05) [IO_703] PORT 1,1,1,1,59,178
(19:06:05) [IO_703] 200 PORT command successful.
(19:06:05) [IO_703] STOR myfile1.r01
(19:06:05) [IO_703] 150 Opening BINARY mode data connection for myfile1.r01 using SSL/TLS.
(19:06:05) [IO_703] RETR myfile1.r01
(19:06:06) [IO_703] 150 Opening BINARY mode data connection for myfile1.r01 (14648 bytes) using SSL/TLS.
(19:06:22) [IO_703] 426 Connection closed: The specified network name is no longer available.
(19:06:22) [IO_703] CPSV
(19:06:22) [IO_703] 227 Entering Passive Mode (1,1,1,1,63,77)
(19:06:22) [IO_703] CPSV
(19:06:22) [IO_703] 227 Entering Passive Mode (1,1,1,1,64,179)
(19:06:22) [IO_703] CPSV
(19:06:22) [IO_703] 227 Entering Passive Mode (1,1,1,1,67,161)
(19:06:22) [IO_703] Connecting to IO_703
(19:06:22) [IO_703] Resolving IO_703.dns.com...
(19:06:22) [IO_703] IO_703.dns.com => 2.2.2.2
(19:06:22) [IO_703] Connecting to 2.2.2.2:21
(19:06:22) [IO_703] Connection Reset by Peer(10054)


DR -> IO

(19:05:25) [i] [DRFTPD -> IO_703] myfile2.r45 4,3 Mbytes/3,91(s)/442,39Kbps
(19:05:25) [DRFTPD] PRET RETR myfile2.r44
(19:05:25) [DRFTPD] 200 OK, will use master for upcoming transfer
(19:05:25) [DRFTPD] CPSV
(19:05:25) [DRFTPD] 227 Entering Passive Mode (1,1,1,1,119,108).
(19:05:25) [IO_703] PORT 1,1,1,1,119,108
(19:05:26) [IO_703] 200 PORT command successful.
(19:05:26) [IO_703] STOR myfile2.r44
(19:05:26) [IO_703] 150 Opening BINARY mode data connection for myfile2.r44 using SSL/TLS.
(19:05:26) [DRFTPD] RETR myfile2.r44
(19:05:26) [DRFTPD] 150 File status okay; about to open data connection from master.
(19:06:21) [DRFTPD] 426- Connection reset by peer: socket write error
(19:06:21) [DRFTPD] 426 Connection reset by peer: socket write error
(19:06:21) [DRFTPD] PRET RETR myfile2.r44
(19:06:21) [DRFTPD] 200 OK, will use master for upcoming transfer
(19:06:21) [DRFTPD] CPSV
(19:06:21) [DRFTPD] 227 Entering Passive Mode (1,1,1,1,125,249).
(19:06:21) [IO_703] Connecting to IO_703
(19:06:21) [IO_703] Resolving IO_703.dns.com...
(19:06:21) [IO_703] IO_703.dns.com => 2.2.2.2
(19:06:21) [IO_703] Connecting to 2.2.2.2:21
(19:06:21) [IO_703] Connection Reset by Peer(10054)



Moving directory

(19:04:56) [IO_703] RNFR Subdir
(19:04:56) [IO_703] 350 Directory exists, ready for destination name.
(19:04:56) [IO_703] RNTO /ROOT/Mydir/Subdir
(19:04:56) [IO_703] 250 RNTO command successful.
(19:04:56) [IO_703] PASV
(19:04:57) [IO_703] 227 Entering Passive Mode (2,2,2,2,60,17)
(19:04:57) [IO_703] Opening data connection IP: 2.2.2.2 PORT: 15377
(19:04:57) [IO_703] LIST
(19:04:57) [IO_703] 150 Opening ASCII mode data connection for directory listing using SSL/TLS.
(19:04:57) [IO_703] Encryption algorithm: TLSv1 DES-CBC3-SHA-168
(19:04:59) [IO_703] 226-[UL: 1GB] [DL: 1GB] [Speed: 1.3KB/s] [Free: 1MB]
(19:04:59) [IO_703] 226 [Section: DEFAULT] [Credits: 150.0MB] [Ratio: Leech]
(19:04:59) [IO_703] List Complete: 3*129 bytes in 0,47 seconds (10,67KB/s)
(19:05:05) [IO_703] CWD .
(19:05:06) [IO_703] 250 CWD command successful.
(19:05:06) [IO_703] PWD
(19:05:06) [IO_703] 257 "/ROOT" is current directory.
(19:05:11) [IO_703] RNFR Subdir3
(19:05:11) [IO_703] 350 Directory exists, ready for destination name.
(19:05:11) [IO_703] RNTO /ROOT/Mydir3/Subdir3
(19:05:12) [IO_703] 250 RNTO command successful.
(19:05:12) [IO_703] PASV
(19:05:12) [IO_703] 227 Entering Passive Mode (2,2,2,2,56,136)
(19:05:12) [IO_703] Opening data connection IP: 2.2.2.2 PORT: 14472
(19:05:12) [IO_703] LIST
(19:05:12) [IO_703] 150 Opening ASCII mode data connection for directory listing using SSL/TLS.
(19:05:15) [IO_703] Encryption algorithm: TLSv1 DES-CBC3-SHA-168
(19:05:16) [IO_703] 226-[UL: 1GB] [DL: 1GB] [Speed: 1KB/s] [Free: 1MB]
(19:05:16) [IO_703] 226 [Section: DEFAULT] [Credits: 1MB] [Ratio: 1]
(19:05:16) [IO_703] List Complete: 72*984 bytes in 0,56 seconds (74,77KB/s)
(19:05:36) [IO_703] CWD .
(19:05:36) [IO_703] 250 CWD command successful.
(19:05:36) [IO_703] PWD
(19:05:36) [IO_703] 257 "/ROOT" is current directory.
(19:05:51) [IO_703] CWD .
(19:05:51) [IO_703] 250 CWD command successful.
(19:05:51) [IO_703] PWD
(19:05:51) [IO_703] 257 "/ROOT" is current directory.
(19:06:06) [IO_703] CWD .
(19:06:06) [IO_703] 250 CWD command successful.
(19:06:06) [IO_703] PWD
(19:06:06) [IO_703] 257 "/ROOT" is current directory.
(19:06:17) [IO_703] RNFR Subdir2
(19:06:18) [IO_703] 350 Directory exists, ready for destination name.
(19:06:18) [IO_703] RNTO /ROOT/Mydir2/Subdir2
(19:06:21) [IO_703] Connecting to IO_703
(19:06:21) [IO_703] Resolving IO_703.dns.com...
(19:06:21) [IO_703] IO_703.dns.com => 2.2.2.2
(19:06:21) [IO_703] Connecting to 2.2.2.2:21
(19:06:21) [IO_703] Connection Reset by Peer(10054)


Notice the timestamps, where io decided to just drop all connections, and not allow any new ones to be made..

*Edit: After some minutes, io was back on again - and it had created a crashlog - so now there's something to debug! :D

Zer0Racer
04-10-2010, 01:24 PM
What is the point in spamming logs from v7.0.3 when there a newer versions? Have you tried upgrading?

Yil
04-10-2010, 01:26 PM
Let me take the posts one at a time, starting from before my last post.

pion: I'm guessing the old version is at fault in the preloading taking a while problem. Look carefully at the logfile entries you showed. If preloading must be performed before the server is allowed to accept connections (DELAY=True mode) then the PRELOAD: lines will appear before the START: entry in the logfile. This is clearly true of your example so you need to set the DELAY option in VFS_Preload to FALSE. V7.0 defaulted this to TRUE if undefined and v7.1+ changed this to default to FALSE if undefined. So set it to FALSE explicitly to get the same behavior across those 2 versions. You can examine the logfile to make sure you get the correct behavior.

pion: The color warning error you are seeing probably is because you are running an older version. Perhaps to test with? V7.2+ changed all the theme color codes into letters and got put into Theme.ini instead of ioFTPD.ini so the line you are showing isn't possible. Also, the theme format changed between v6.9 and v7.0 to support sharing subthemes and the "+" sign was added to make sure an error just like the one you showed was displayed to indicate the definitions weren't updated. I'm guessing you tested v6.9 or something just to check behavior and thus you can ignore this error as it will only disable the theme and isn't a big deal.

Directory caching of entries in the mountfile do not decend into the directory tree, however they do have to process all immediate subdirs in order to get their sizes. Thus a large XVID folder will take a little bit of time because it has a lot of entries but a folder with just months like 01xx will be fast. The optional preloading section in the .ini file where you can specify where and to what depth to decend can be used to make it go deeper.

pion (and perhaps o_dog from earlier?): Take a look at that exchange. The FTP Client is at fault not the server!

(21:17:06) [Site2] PORT 1,1,1,1,165,166
(21:17:06) [Site2] 200 PORT command successful.
(21:17:06) [Site2] STOR my.sfv

No ABOR command was EVER sent, the client just kept trying to send new PORT commands and they should be rejected, in fact it even says so in the error reply! After like 20 seconds ioFTPD will timeout the attempt and it would probably continue working from that point. I forget which version of FlashFXP had this bug, but I know many earlier versions of the current beta did and it's possible other clients do/did as well. Newer, or older, versions of FlashFXP work fine and send the ABOR command before trying again...

pion: "Error converting string" errors were fixed in v7.1 when the TCL temporary internal name for the waitobject was increased in size to handle paths instead of just short names. Just use a newer version.

pion: That v7.0.3 lockup isn't that useful to me, but your next post with the minidump/crashlog might be. However, v7.1+ included a number of bugfixes that might have solved the reason for the problem, but I'll take a look at the crashlogs and get back to you if it's something new.

Zer0Racer
04-10-2010, 01:32 PM
Please use pastebin.com

pion
04-10-2010, 03:00 PM
The color warning was not the issue, it was the timestamp on the log for how long it took before login was possible.

Running through 10000 or even 100000 folders should really not take 5 minutes in any case afaik? Especially not with idle IO?

Ideas on where to put the blame for disappearing groups?

My first post describe a COMPLETE lock. I could log out and in, and not get a single transfer through, neither using PASV or PORT. After a log in/out, what the client did in a previous session should be completely irrelevant for the new session? This behaviour lasted for many minutes, through multiple sessions until I restarted the process manually.

As for the second post with similar behaviour, the interesting part is the preceeding commands before the process is refusing all connections. (Or that's what I figured anyway) due to 7.3.3 locking up the very same way

pion
04-10-2010, 04:01 PM
I was able to reproduce the blocking of all transfers using io 7.3.3 now. I also issued 'site crashnow' but that didn't do anything other than blocking the current session and writing 04-10-2010 22:48:55 CRASHNOW: "pion" to log, forcing a reconnect. Just minutes before transfers where progressing just fine. Nothing at all has been done to the ftpd except transfers. It appeared that all transfers (4 different sessions) all stopped at once. There are entries in log about LOGIN, LOGOUT, NEWDIR, after the crashnow, but no file transfers. Debug.log also got written after site crashed: 04-10-2010 22:49:40 AsyncSelectCancel flags: 0

In fact, I tried 'site crashnow' three times, in three different sessions without any noticeable effect.

After about 5 minutes of this - the process locked up completely, not accepting any connections at all.

ioftpd-watch.exe also just sits there without stopping the daemon. Manual restart needed.

Yil
04-10-2010, 05:54 PM
pion: Hmm, that's weird. Crashnow should cause the process to crash. It's a fatal error! If it doesn't something is seriously wrong, and usually it has been the loader lock getting stuck which is now tested for. Perhaps there is another reason, some sort of 3rd party module catching crashes, etc. I may write a simple connectivity tester in the next release similar to the loader lock tester just to keep an eye on things so it can restart if it gets stuck...

I'm about 50% of the way done writing the openSSL conversion and noticed something important. The Overlapped I/O structure is being re-used without being cleared. I failed to do this when I re-wrote the EXEC module and it caused problems and is now fixed, but this is original unchanged by me code deep inside the transfer logic. The important thing here is it's possible not clearing this causes the winsock library to freak out once in a rare while and is causing the lockup bugs. I think I'll put out a 7.3.4 release in a bit just to see if it makes a difference.

Important tidbit. ioFTPD pre-creates 10 sockets so it's ready to accept connections quickly. This means that even when ioFTPD gets stuck it's sometimes possible to connect to the server but not be able to login, or transfer files/listings over new data ports. However, since new sockets cannot be created to replace those 10 sockets it eventually stops accepting connections and you'll get timeouts instead of connections (which may or may not print anything). The only reason I mention this is because often times you'll see the server accept the connection and then timeout versus the server not accepting the connection at all and the client timing out the attempt. If you see the first start happening you can always try to login 10 times and see if the behavior changes which will tell us something about what is going on...

pion
04-11-2010, 06:36 AM
Looking at the log, there's only 4 logins within the timeframe of the errors. 3 of those where after first 'site crashnow' event i log

Yil
04-14-2010, 01:28 AM
Update: I got the openSSL version of ioFTPD with a heavily modified transfer engine to compile and do a key exchange. It seems I haven't figured out some of the trickier aspects of using the undocumented (but faster) non-copying interface though... I learned a bunch of things so I'll have to re-write part of it again, but I did restore full functionality to it without encryption which is actually a big step given all the changes. The new model handles the encryption in just 2 places instead of over lots of code so it should be easier once I figure out how it works...

I've been working really hard the last 7 days or so on this and assuming I can keep the pace for a few more there might be something to play with soon.

Yil
04-15-2010, 04:08 PM
I found a really good reason why the server sometimes won't accept login connections on startup and it's a trivial mistake. I'm not sure what version introduced it but it was v6+ and therefore my bad. It obviously had serious consequences and because it's a race condition it's more likely on multi-core / faster machines. It's also possible this could explain some cases of the server not accepting logins after startup BUT still allowing logged in users to transfer files (i.e. not the bad lockup bug).

I'm not positive yet, but I think I found a bug in FlashFXP and FTPRush transferring files when uploading via the loopback interface. It's not too big of a deal when using SSL transfers, but it truncates the file when using unencrypted transfers. I couldn't believe BOTH of them wouldn't handle this correctly and so it's still possible it's my fault but ioFTPD to another copy of ioFTPD on the same machine via loopback is perfect... Needless to say a lot of hours were lost trying to track this down and then convincing myself they must be at fault and trying to prove that, but let's face it uploading via the loopback interface which means the same machine just isn't that common of a thing unless testing... It's possible this is a win7 x64 new "feature" but my research says this is allowed behavior by the windows library and a somewhat common programming bug.

Now I can get back trying to figure out the openSSL issues :)

pion
04-16-2010, 09:37 AM
How soon do you think you'll have an openssl version or bugfix version ready? Crashing/lockup has gotten so bad that I'm actively checking in here multiple times a day for updates :P

Silly
04-16-2010, 09:42 AM
lol..same here.....but since there is watch...i don't need reboot it manually that offen..

Flow
04-16-2010, 02:09 PM
Dont stress the cook man. You´ll get bad food

pion
04-16-2010, 02:23 PM
If you're hungry enough, bad food is better than no food!

monk-
04-16-2010, 04:57 PM
I found a really good reason why the server sometimes won't accept login connections on startup and it's a trivial mistake. I'm not sure what version introduced it but it was v6+ and therefore my bad.

to be precise, version 693 has no hang on startup, where 701+ will
and i dont think multi-core is the problem, since i also have this problem on single-core's
and its also not an os dependend bug, cos its on xp too, with 2003
i still think the problem is in the dir caching
fyi

pion
04-17-2010, 04:08 AM
# IDNT command restricted to use by these hosts. You may list up to 10 IP
# addresses or hostnames (i.e. BNC_HOST_10) without skipping numbers.
BNC_HOST_1 = 192.168.0.* <-- doesn't work, sais "doesnt resolve" in log
BNC_HOST_2 = my.dns.name <-- doesn't work, tested using ip instead of dns here, which worked fine

In dynamic ip scenarios, both cases would be a welcomed addition to this feature.

As a temp fix, how to disable the ip restriction alltogether? I tried just removing the BNC_HOST entries without much success.

Using io 7.3.3

Yil
04-17-2010, 12:46 PM
pion: My assumption is that BNCs use static IPs. They are basically either local or on a shell account somewhere and in both cases are usually static. I could support using a DNS name and refresh the address after a "site rehash" if you want, but it certainly isn't going to look it up on demand for each connection because of the performance issue. Disabling the restriction is a really bad idea though as it opens up a security hole.

pion
04-17-2010, 02:05 PM
Caching the DNS lookup from a task manually added to the scheduler should be plenty. Entrybncs are not always on static ips, often cases where cuBNC is involved.

o_dog
04-17-2010, 04:06 PM
i'm still having problem with the lockup of a transfer and then denying transfers until you login again. Using latest version.

Yil
04-17-2010, 06:05 PM
o_dog: Check your logs. The example pion posted clearly shows the FTP client at fault. It requested a transfer via a PORT command and after 10 seconds the 2nd site timed it out and the client just starting issuing PORT commands to retry without aborting the still running command. After an additional 10 seconds ioFTPD goes ahead and declares it failed and then things would start working again... If you wait 30 seconds does it start working again for you? It sounds like it doesn't, but just checking. Some logs would be useful showing the problem, send them to the FTP crash site if you can cut/paste an example of the problem. PM me if you need the login info.

I think I got openSSL working. I can login, enter commands, transfer files, etc. Need to finish up a few things but looking good...

o_dog
04-17-2010, 06:17 PM
pm'd you about it, but i can't use this as it stands now. I can't transfer files with ioFTPD right now which is kinda dealbreaking for me. plz try to fix this one fast

Yil
04-17-2010, 06:46 PM
PM'd. Can you confirm from your client logs that this happens only when FXPing? And does it only happen with drftpd as the other side like in pion's example? The FTP client probably handles things a bit differently with the PRET command. My money is still on whatever FTP program you are using... Any chance you upgraded/changed it recently?

o_dog
04-17-2010, 06:54 PM
no, it's gllftpd. I don't go near drftpd. And no, i havn't upgraded recently
or any such thing, except for ioftpd, gonna check for the logs of the ftp client.

pion
04-18-2010, 03:55 AM
Yil; I also got the same behaviour in direct download, with client sending ABOR (unlike fxp, where it didn't send ABOR) - where all transfers got blocked. PORT, PASV, fxp, direct. But the fact that io was blocking both pasv and port for all sessions, including new login, and the fact that io locked up a few minutes later denying all connections makes me assume that it's symptoms of the lockup bug (or the cause), rather than a separate incident.

monk-
04-18-2010, 06:28 PM
BNC_HOST_1 = 192.168.0.* <-- doesn't work, sais "doesnt resolve" in log


what about using wildcard for iprange, can u support this too, yil?

Yil
04-18-2010, 10:56 PM
monk-: If you need IP wildcards for the BNC host address (reverse resolved names probably as well), I can do that. And fully specified names for looked up via site rehash or via a new scheduler event sounds good too so it can handle dynamic DNS names if needed.