ioFTPD v7.4.3 Released (Experimental) [Archive]

View Full Version : ioFTPD v7.4.3 Released (Experimental)

Yil

04-20-2010, 03:33 PM

The OpenSSL port of ioFTPD is now available. It should function the same, but a lot has changed so let me know if you notice anything odd...

NOTE: experimental release... make sure to save old /system dir in case this doesn't work for you. Report all issues here.

Latest Version:
Link: ioFTPD-v7.4.3.zip (http://home.comcast.net/~yil/ioFTPD-v7.4.3.zip)

Source:
Link: ioFTPD-v7.4.0-src.zip (http://home.comcast.net/~yil/ioFTPD-v7.4.0-src.zip)

Yil

04-20-2010, 03:33 PM

v7.4.0 Release Notes:

1) Files in \System:
Changed : ioFTPD.[exe,pdb] - Version 7.4.0.0.
Changed : dgbhelp.dll, symsrv.dll - Version 6.12.2.633
Added : libeay32.[dll,pdb], ssleay32.[dll,pdb] - Version 1.0.0.0
Deleted : makecert.exe
Changed : ioFTPD.ini - summary of changes by section...
[FTP_Service] : BNC_HOST_# comment changed.

*** New features:

2) The server now uses the OpenSSL library to perform SSL/TLS encryption.

New SSL/TLS certificates will be needed and can be created via the
"site makecert" command, and removed via the "site removecert" command.
The server can now do this without the help of an external program.
The certificates are stored as <name>.key and <name>.pem files in the
ioFTPD system directory. If necessary change permissions on the .key
file to make it readable only by the server/you if the machine is shared.

For the moment the following ioFTPD.ini settings do not matter, however
they may in a future build so until I decided one way or the other I'm
not deleting references to them.
[FTP_Service] : Min_Cipher_Strength, Max_Cipher_Strength
[Threads] : Encryption_Threads

3) Make sure you now have at least as many io_Threads under [Threads] as you
do cores as the encryption and CRC computation for incoming files are
performed in the I/O thread now instead of a separate thread.

4) You may now specify wildcards in BNC_HOST_# definitions for a service.
Dynamic DNS hostnames are not supported at this time.

5) AUTH SSL and implicit SSL connections used to imply PROT P but AUTH TLS
didn't... Now none do, as clients are supposed to enable this via PROT P.

*** Bug Fixes

6) Fixed a race condition when a service starts that would cause some or even
all sockets listening for connections to not be created. This should fix
the server sometimes not accepting connections at startup problem.

7) Changed socket and file read/write routines to make sure they clear the
overlapped fields that should be set to zero before using them.

*** Internal changes:

8) The Overlapped socket structures contain an identifier that is incremented
before each call, and cleared in the callback. This can detect bad usage.
If everything looks good it will be removed in a later build.

9) Moved CRC field out of overlapped structure and into ioFile itself.

10) The following functions no longer exist or have changed enough that they
should not be called by external modules: ioSend, ioRecv,
TransmitPackages, ReceiveLine, ioCloseSocket, Secure_SendResult,
Secure_ReceiveResult.

Yil

04-20-2010, 03:38 PM

Grr, forgot to update the ioFTPD.ini file for distribution with this simple comment change:

# IDNT command restricted to use by these hosts. You may list up to
# 10 (i.e. BNC_HOST_10) IP addresses or hostnames without skipping numbers.
# You may use wildcards.
;BNC_HOST_1 = 127.0.0.1

Yil

04-20-2010, 05:11 PM

Hmm... the openSSL libs are built against the debug runtimes... I'm going to need to re-compile them against the regular... Update coming in a few mins.

Yil

04-20-2010, 05:38 PM

v7.4.1 Release Notes:

1) Files in \System:
Changed : libeay32.dll ssleay32.dll - Version 1.0.0.0
Deleted : libeay32.pdb ssleay32.pdb

*** Bug Fixes

2) The OpenSSL libraries were compiled with debugging information and would
fail to find the right libraries when ioFTPD.exe was run and generate the
"The application failed to initialize properly (0xc0150002). Click on
OK to terminate the application."
message.

Yil

04-20-2010, 05:45 PM

I'll figure out how to get it compiled without using the debug library but to output a .pdb for better error reporting later...

I'll also probably add an option to have it auto-generate the cert if it's missing or invalid.

Yil

04-20-2010, 06:51 PM

Looks like an early tester seeing some issues with incomplete files and a script having login issues. And it crashed...

Give me a day or two, I don't suggest upgrading yet.

Yil

04-20-2010, 11:55 PM

v7.4.2 Release Notes:

1) Files in \System:
Changed : ioFTPD.[exe,pdb] - Version 7.4.2.0

*** Bug Fixes

2) Fixed a bug introduced in the port where an internal job was ended too
soon during login and could result in a race condition and the client
being disconnected and/or crashing the server.

Yil

04-20-2010, 11:59 PM

Ok, simple fix to eliminate the easy to reproduce crash problem. Funny enough, it wasn't in any of the OpenSSL code LOL...

See if this works. There have been some issues with received files being truncated. This version should show a "226-Transfered: <bytes>." message after the zipscript output if any. If you are finding things don't transfer correctly please confirm if the bytes written match the size of the file exactly.

Oh, and if you see it writing any lines to the logs/Debug.log file please let me know. There are a few cases I don't expect to happen that I can log there and if they do occur I'll need to handle them.

pion

04-21-2010, 03:16 AM

I tried upgrading io733 to io742. I did not change anything in the ioftpd.ini.

without ssl:

ioftpd.ini:
Host = my.external.ip
Bind = 192.168.10.100

PASV
227 Entering Passive Mode (127,0,0,1,57,248)

and

PWD
257 "/" is current directory.
STAT -l
(hangs)

Both cases are without SSL

site makecert
200-NAME="abcd" [FTP_Service (Certificate_Name)]
500 makecert: The object or property already exists.

however, I could login with ssl after, AND stat -l worked with ssl. In addition, stat -l seems to work without ssl also after the 'site makecert' (?)

with ssl:

For AUTH SSL after the upgrade I got "Network subsystem is unusable(10091)", after the new cert was in place - but I suspect the client to be at fault there. However using AUTH TLS worked up until fxp transfer:

(09:57:00) [IO_703] 150 Opening BINARY mode data connection for file.rar (14648 bytes) using SSL/TLS.
(09:57:01) [IO_703] 426 Connection closed: The specified network name is no longer available.
(09:57:01) [IO_703] CPSV
(09:57:01) [IO_703] 227 Entering Passive Mode (1,1,1,1,65,3)
(09:57:01) [IO_742] 226-Transferred: 0.
(09:57:01) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:57:01) [IO_742] PORT 1,1,1,1,65,3
(09:57:01) [IO_742] 226 ABOR command successful.
(09:57:01) [IO_742] 200 PORT command successful.
(09:57:01) [IO_742] STOR file.rar
(09:57:01) [IO_742] 150 Opening BINARY mode data connection for file.rar using SSL/TLS.
(09:57:01) [IO_703] RETR file.rar
(09:57:02) [IO_703] 150 Opening BINARY mode data connection for file.rar (14648 bytes) using SSL/TLS.
(09:57:03) [IO_703] 426 Connection closed: The specified network name is no longer available.
(09:57:03) [IO_703] CPSV
(09:57:03) [IO_703] 227 Entering Passive Mode (1,1,1,1,68,235)
(09:57:03) [IO_742] 226-Transferred: 0.
(09:57:03) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:57:03) [IO_742] PORT 1,1,1,1,68,235
(09:57:03) [IO_742] 226 ABOR command successful.
(09:57:03) [IO_742] 200 PORT command successful.
(09:57:03) [IO_742] STOR file.rar
(09:57:04) [IO_742] 150 Opening BINARY mode data connection for file.rar using SSL/TLS.
(09:57:04) [IO_703] RETR file.rar
(09:57:04) [IO_703] 150 Opening BINARY mode data connection for file.rar (14648 bytes) using SSL/TLS.
(09:57:05) [IO_703] 426 Connection closed: The specified network name is no longer available.
(09:57:05) [IO_703] CPSV
(09:57:05) [IO_703] 227 Entering Passive Mode (1,1,1,1,60,109)
(09:57:05) [IO_742] 226-Transferred: 0.
(09:57:05) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:57:05) [IO_742] PORT 1,1,1,1,60,109
(09:57:05) [IO_742] 226 ABOR command successful.
(09:57:05) [IO_742] 200 PORT command successful.
(09:57:05) [IO_742] STOR file.rar
(09:57:05) [IO_742] 150 Opening BINARY mode data connection for file.rar using SSL/TLS.
(09:57:05) [IO_703] RETR file.rar
(09:57:06) [IO_703] 150 Opening BINARY mode data connection for file.rar (14648 bytes) using SSL/TLS.
(09:57:07) [IO_703] 426 Connection closed: The specified network name is no longer available.
(09:57:07) [IO_742] 226-Transferred: 0.
(09:57:07) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:57:07) [IO_703] CWD /
(09:57:07) [IO_742] CWD /
(09:57:07) [IO_742] 226 ABOR command successful.

(09:54:47) [IO_742] STOR myfile.r19
(09:54:47) [IO_742] 150 Opening BINARY mode data connection for myfile.r19 using SSL/TLS.
(09:54:47) [glftpd] RETR myfile.r19
(09:54:47) [glftpd] 150 Opening BINARY mode data connection for myfile.r19 (15000000 bytes) using SSL/TLS.
(09:54:47) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_connect()), disconnected: Connection reset by peer.
(09:54:47) [glftpd] CPSV
(09:54:47) [glftpd] 227 Entering Passive Mode (1,1,1,1,38,235)
(09:54:47) [IO_742] 226-Transferred: 0.
(09:54:47) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:54:47) [IO_742] PORT 1,1,1,1,38,235
(09:54:47) [IO_742] 226 ABOR command successful.
(09:54:47) [IO_742] 200 PORT command successful.
(09:54:47) [IO_742] STOR myfile.r19
(09:54:48) [IO_742] 150 Opening BINARY mode data connection for myfile.r19 using SSL/TLS.
(09:54:48) [glftpd] RETR myfile.r19
(09:54:48) [glftpd] 150 Opening BINARY mode data connection for myfile.r19 (15000000 bytes) using SSL/TLS.
(09:54:48) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_connect()), disconnected: Connection reset by peer.
(09:54:48) [glftpd] CPSV
(09:54:48) [glftpd] 227 Entering Passive Mode (1,1,1,1,37,82)
(09:54:48) [IO_742] 226-Transferred: 0.
(09:54:48) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:54:48) [IO_742] PORT 1,1,1,1,37,82
(09:54:48) [IO_742] 226 ABOR command successful.
(09:54:48) [IO_742] 200 PORT command successful.
(09:54:48) [IO_742] STOR myfile.r19
(09:54:49) [IO_742] 150 Opening BINARY mode data connection for myfile.r19 using SSL/TLS.
(09:54:49) [glftpd] RETR myfile.r19
(09:54:49) [glftpd] 150 Opening BINARY mode data connection for myfile.r19 (15000000 bytes) using SSL/TLS.
(09:54:49) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_connect()), disconnected: Connection reset by peer.
(09:54:49) [glftpd] CPSV
(09:54:49) [glftpd] 227 Entering Passive Mode (1,1,1,1,38,72)
(09:54:49) [IO_742] 226-Transferred: 0.
(09:54:49) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:54:49) [IO_742] PORT 1,1,1,1,38,72
(09:54:49) [IO_742] 226 ABOR command successful.
(09:54:49) [IO_742] 200 PORT command successful.
(09:54:49) [IO_742] STOR myfile.r19
(09:54:49) [IO_742] 150 Opening BINARY mode data connection for myfile.r19 using SSL/TLS.
(09:54:49) [glftpd] RETR myfile.r19
(09:54:49) [glftpd] 150 Opening BINARY mode data connection for myfile.r19 (15000000 bytes) using SSL/TLS.
(09:54:49) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_connect()), disconnected: Connection reset by peer.
(09:54:49) [glftpd] CPSV
(09:54:49) [glftpd] 227 Entering Passive Mode (1,1,1,1,42,129)
(09:54:49) [IO_742] 226-Transferred: 0.
(09:54:49) [IO_742] 426 Connection closed: SSL library returned a failure code.
(09:54:49) [IO_742] PORT 1,1,1,1,42,129
(09:54:50) [IO_742] 226 ABOR command successful.
(09:54:50) [IO_742] 200 PORT command successful.
(09:54:50) [IO_742] STOR myfile.r19
(09:54:50) [IO_742] 150 Opening BINARY mode data connection for myfile.r19 using SSL/TLS.
(09:54:50) [glftpd] RETR myfile.r19
(09:54:50) [glftpd] 150 Opening BINARY mode data connection for myfile.r19 (15000000 bytes) using SSL/TLS.
(09:54:50) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_connect()), disconnected: Connection reset by peer.

as pasv doesn't resolve proper, reverse transfer naturally don't work either:

(10:11:00) [io_703] TYPE I
(10:11:00) [io_703] 200 Type set to I.
(10:11:00) [io_742] TYPE I
(10:11:00) [io_742] 200 Type set to I.
(10:11:00) [io_703] Reversed FXP started
(10:11:00) [io_742] CPSV
(10:11:00) [io_742] 227 Entering Passive Mode (127,0,0,1,61,45)
(10:11:00) [io_703] PORT 127,0,0,1,21,45
(10:11:00) [io_703] 200 PORT command successful.
(10:11:00) [io_742] STOR thefile.rar
(10:11:01) [io_742] 150 Opening BINARY mode data connection for thefile.rar using SSL/TLS.
(10:11:01) [io_703] RETR thefile.rar
(10:11:01) [io_703] 150 Opening BINARY mode data connection for thefile.rar (14648 bytes) using SSL/TLS.
(10:11:02) [io_703] 426 Connection closed: No connection could be made because the target machine actively refused it.
(10:11:02) [i] Reversed FXP started
(10:11:02) [io_742] CPSV
(10:11:02) [io_742] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(10:11:02) [i] Reversed FXP started
(10:11:02) [io_742] CPSV
(10:11:03) [io_742] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(10:11:03) [i] Reversed FXP started
(10:11:03) [io_742] CPSV
(10:11:03) [io_742] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.

I tried to manually enter a directory, the ftpd disconnected me, causing my client to reconnect right away:

(10:02:17) [IO_742] CWD mydir
(10:02:17) [IO_742] 250 CWD command successful.
(10:02:17) [IO_742] PWD
(10:02:17) [IO_742] 257 "/mydir" is current directory.
(10:02:17) [IO_742] STAT -l
(10:02:18) [IO_742] Connecting to IO_742
(10:02:18) [IO_742] Resolving IO_742.mydns.com...
(10:02:18) [IO_742] IO_742.mydns.com => 1.1.1.1
(10:02:18) [IO_742] Connecting to 1.1.1.1:21

and after reconnect, the directory listed fine! (??) also for new sessions. (I might add that it's a fairly large directory). Other directories also listed fine in this and later sessions.

And finally, transferring files in AUTH TLS mode as regular download worked fine (except for PASV, which resolves wrong)

*Edit: I might add that I got some debug.log messages:

04-21-2010 09:42:52 AsyncSelectCancel flags: 0
04-21-2010 09:46:18 AsyncSelectCancel flags: 0
04-21-2010 09:46:58 AsyncSelectCancel flags: 0
..

Yil

04-21-2010, 09:58 AM

Hmm. I'm willing to bet that the "large" directory is at fault for your listing issues. If you request a directory listing via "stat -l" that takes longer that 2 minutes to enumerate most FTP clients will time out the command. The most common case is a network mounted drive and I've seen this lots of times in that situation. Just re-connect and do it again and everything from that point works fine which is what you are seeing as well. I haven't found a good way to get around that problem, but it's the entire reason the preloading feature exists and allows for the DELAY option...

You mentioned PASV not resolving correctly. Is that something new or unexpected? I don't believe any of that should be different than before...

I guess I missed something in the SSL case though with the "SSL library returned a failure code" messages... I was hoping for more info from the Debug.log file, but knowing there is an issue means I'll test more and see what happens locally. My simple tests with glftpd/drftpd and back to itself worked, but I guess I got something wrong somewhere. I presume it has to do with the handshake code given no bytes were transferred.

pion

04-21-2010, 10:42 AM

For large directories my client is listing bytes as it recieves them, when it hang it didn't get any bytes at all. As for my previous dirlisting problems, they are occuring at small directories also (<300 subdirs).

And on a sidenote of dirlisting: I put preload=false, but io742 still used a couple of minutes of loading ram before I was able to log in? (connection time out)

I'm not connecting to localhost ftpd... so yeah, the 127.0.0.1 is quite unexpected

Yil

04-21-2010, 12:06 PM

ioFTPD won't display anything until all subdirs are evaluated so that is normal. And I mentioned before that no matter what preloading settings you have set it will load the mountpoints from the default VFS file (usually default.vfs) and any defined via VFS under [VFS_Preload]. However if delay=false it won't stop you from logging in while that is happening. Check the logfile to see the message order and what it means (see the v7.3 thread).

I just tested up/down over SSL and switched the connect/accept role via SSCN between ioFTPD and glFTPD with 2 different FTPs (one using an expired cert even) and all worked fine. I tested downloading from drFTPD as well in both roles and that worked as well.

It's still pretty likely I goofed something up by missing a case (as I said there are a few I was hoping don't happen), but I had no issues with the PASV feature. Double check your .ini file because until it reports that correctly you have other problems to worry about... Swap the .exe with the old without touching the .ini and see if it gets the PASV host correct. Are you using an IP address, a dynamic DNS name, a regular DNS name? Does it resolve correctly outside of ioFTPD?

I know, sometimes silly stuff, but have to check... Firewall? etc...

pion

04-21-2010, 01:16 PM

The 127.0.0.1 problem was my windows hosts file with one dns record too much, so no wonder there was problems!

SSL transfers worked fine after I fixed that bit. I still get that network subsystem unusable in FTPRush, and in FlashFXP I get
[R] 220 -
[R] AUTH SSL
[R] 234 AUTH SSL successful.
[R] Connected. Negotiating SSL session..
[R] Connection failed (Connection lost)
[R] Delaying for 120 seconds before reconnect attempt #1

Where AUTH TLS works fine in both clients. AUTH SSL also work towards previous io versions in both clients.

Not so sure it's solely the clients fault anymore, I attempted to install latest openssl libraries, and tried flashfxp install today, in both win7, and win2k3 with the same result.. any way to look closer on what's happening when negotiating the ssl session?

I have DELAY = FALSE in ioftpd.ini, starting with empty logs

This is from clean logs:

nxmydb.log:
04-21-2010 20:04:16 ------------------------------------------------------------

<unable to log in at this point>

ioftpd.log:
04-21-2010 20:05:40 SSL: "Found certificate" "name=abcd" "Service=FTP_Service" "(Certificate_name)"
04-21-2010 20:05:40 START: "PID=1296" "CmdLine="
04-21-2010 20:05:40 PRELOAD: "begin" "../etc/default.vfs"
04-21-2010 20:06:12 LOGIN: "FTP_Service" "pion"

At that login, I'm stuck waiting for STAT -l (no disconnect, just delay) for a couple of minutes until
(20:09:01) [FTP] List Complete: 426 bytes in 84,33 seconds (0,01KB/s)

This is about 4000 folders, which each have 3 subfolders, so that leaves the total to 16000 folders. Why does it take 4 minutes, 45 seconds before login successful? I find it hard to believe that it takes that long to iterate through the subdirs on idle cpu/hdd IO..

Yil

04-21-2010, 02:00 PM

pion: Just to double check are you using 7.4.2? The previous versions had a bug where it could drop the client right after/during negotiation which would look a lot like the first problem you listed in your latest post.

AUTH SSL and AUTH TLS are the same PROVIDED you haven't set the "Encryption_Protocol" option in the .ini file for the Service. If you have then you are limited to that protocol only... Can you double check that line is commented out? ioFTPD should be smart enough to not advertise/accept protocols it doesn't support, but right now it just ignore the argument to AUTH. The .ini option definitely controls what is allowed though. And AUTH TLS or SSL appeared to work for me locally, but I'll try some other clients!

It really does take a while to enter, enumerate all files/folders, read and parse the .ioFTPD file to get permissions and apply them to files and to figure out if the directory is an ioFTPD symbolic link or not. The only shortcut here is to disable sizing of directories in which case it goes straight to the .ioFTPD file to get only the perms and symbolic link target in each subdir. In the future perhaps I can provide a cache file and store it in directories with large fanouts or something, but for the moment on startup you're kinda stuck waiting for a while for the first listing in large dirs... I will double check that a directory lock isn't being held longer than necessary by the preloader though. I did do some profiling on this, and it's almost entirely disk limited. It just takes windows/NTFS a long time to open and enumerate all the dirs.

I know it sounds weird, but try DELAY=True. That enables a special multi-threaded directory scan that does a lot of things in parallel. It's actually MUCH MUCH faster if you have multiple drives. And as a bonus the server won't respond until it's actually ready which means it looks offline for 2 minutes but is immediately usable instead of answering right away but really won't do anything for 5 minutes using your numbers.

pion

04-21-2010, 02:06 PM

I got some weird transfer errors now, after I upgraded. From one particular site I got a lot of faulty files. There has not been any problems previously in between these two sites, but now it's close to 30% failed files..

(20:55:43) [io742] CPSV
(20:55:43) [io742] 227 Entering Passive Mode (1,1,1,1,60,83)
(20:55:43) [glftpd] PORT 1,1,1,1,60,83
(20:55:43) [glftpd] 200 PORT command successful.
(20:55:43) [io742] STOR file.r12
(20:55:43) [io742] 150 Opening BINARY mode data connection for file.r12 using SSL/TLS.
(20:55:43) [glftpd] RETR file.r12
(20:55:43) [glftpd] 150 Opening BINARY mode data connection for file.r12 (15000000 bytes) using SSL/TLS.
(20:55:49) [glftpd] 426 Data connection: Broken pipe.
(20:55:49) [io742] CWD /dir1/dir3/Subdir/
(20:55:49) [glftpd] STAT -l
(20:55:49) [io742] 426-250- .----== ioNiNJA v0.8 ==----------------------------.
(20:55:49) [io742] 426-250- | + CRC-Check: FaileD! |
(20:55:49) [io742] 426-250- `--------------------------------=====-------------'
(20:55:49) [io742] 226-Transferred: 8863744.
(20:55:49) [io742] 426 Connection closed: SSL library returned a failure code.
(20:55:49) [glftpd] List Complete: 3*742 bytes in 0,08 seconds (47,97KB/s)
(20:55:49) [io742] 226 ABOR command successful.
(20:55:49) [io742] PWD

Why does io think there's an ABOR sendt? Why wouldn't the transfer complete, even if there's packetloss on the network, as it aborts after half the transfer? I might add that I got a io703->io742 transfer running in another session without any problems.

Is there a way to be sure who's at fault here? I can't seem to find any records in logs about what failiure code the ssl library returns..

*Edit: Just updated another server, experiencing the same weird dropped transfers, from multiple sources that never had such problems before update

pion

04-21-2010, 02:16 PM

You where dead on about the Encryption_Protocol. I commented it out now, and both SSL and TLS worked fine. No more complaints from client.

Yil

04-21-2010, 03:07 PM

Ah! Finally! There IS a problem with SSL transferring files and loosing up to the last 100k or so of the file. I can't seem to reproduce it at the moment, but other people trying the release found the same thing. Hence the extra line of output showing how many bytes got written...

I am working on figuring out what is going on right now and hopefully should have a fix soon. At least we figured out all your other issues, and perhaps highlighted the fact that I should double check that SSL or TLS has not been disabled in the config file before responding to the command :)

Yil

04-21-2010, 03:37 PM

I looked over your latest logfile... It's weird. I agree I can't see a reference to the ABOR command anywhere, but why is the client issuing a CWD command after starting a transfer and before it's finished? And the server not complaining? You sure this isn't a combined logfile from more than one session? Like FlashFXP writing to a unified logfile?

Yil

04-21-2010, 03:42 PM

Oh, just for the record. The new transfer engine is picky about errors, whereas the old one silently ignored most of them. The new one records and aborts on all errors. This is a little annoying right now, but once I understand how and where certain errors occur I'll make sure they are handled correctly and silently which will highlight the real ones later on rather than masking them.

I've already found one instance where the error returned isn't supposedly possible in a particular situation according to MS but online people seem to handle it and get it as well so it's not just me. It's that kind of thing I want to uncover...

Yil

04-21-2010, 07:07 PM

v7.4.3 Release Notes:

1) Files in \System:
Changed : ioFTPD.[exe,pdb] - Version 7.4.3.0

*** New Features

2) Modified command (AUTH {TLS|SSL}). The server will no longer accept
"AUTH TLS" or "AUTH SSL" if that particular protocol is not supported by
the service because of the Encryption_Protocol setting in the .ini
configuration file.

*** Bug Fixes

3) Fixed secure socket read routine to consume as much as possible from the
read buffer before requesting more data. This should prevent truncated
and/or corrupt file uploads.

Yil

04-21-2010, 07:11 PM

Try this one. Works fine for me locally. If the OpenSSL library returns an error it will record the details into the Debug.log file...

pion

04-22-2010, 04:04 AM

io 743 appears to work fine :) No broken transfers yet.

On a side note, preload is still strangly slow.. this is in io 743 with delay=true:

04-22-2010 09:30:16 PRELOAD: "begin" "../etc/default.vfs"
04-22-2010 09:34:47 PRELOAD: "points=13" "../etc/default.vfs"

After playing around some with it, I'm still confused as to how it's intended to work.

I tried creating ../etc/sections.vfs, and move some of the mountpoints in there, but they don't show up on site!

I also tried adding 1=/mydir/ for every single dir in the / mount, with it still taking equally long time to start up.

Then I have another site with atleast 100x the ammount of dirs and subdirs, which load in 0s

04-22-2010 10:21:20 PRELOAD: "begin" "../etc/sections.vfs"
04-22-2010 10:21:20 PRELOAD: "points=4" "../etc/sections.vfs"
04-22-2010 10:21:20 PRELOAD: "count=141" "../etc/sections.vfs"
04-22-2010 10:21:33 SSL: "Unable to locate default certificate" "name=ioFTPD" "Service=FTP_Service"
04-22-2010 10:21:33 START: "PID=2960" "CmdLine="

(assuming START is where users are able to log in)

in that sections.vfs file, some of the mountpoints are plainly copied from default.vfs

When going back to my trouble site, I attempted to only add 1 single (small) directory to sections.vfs, and leave a copy + rest of mounts in default.vfs:

04-22-2010 10:55:02 PRELOAD: "begin" "../etc/sections.vfs"
04-22-2010 10:55:15 PRELOAD: "points=1" "../etc/sections.vfs"
04-22-2010 11:00:45 SSL: "Found certificate" "name=abcd" "Service=FTP_Service" "(Certificate_name)"
04-22-2010 11:00:45 START: "PID=528" "CmdLine="
04-22-2010 11:00:47 LOGIN: "FTP_Service" "pion"

As you can see, it would appear as if preload finish in a timeframe that looks much more sane, but I still can't log in.

Anyway - I'm fairly confused as to how I'm suppose to use sections.vfs, and the preload settings.. maybe you could clarify? Like what is correct syntax to only go 1 subdir deep during preloading? And debug what directory/harddrive in particular that's causing this very long startup time?

*Edit: to top it off, I tried removing all but one of the mountpoints in default.vfs, then boot ftpd, and add the rest of dirs after. Relogin works instantly, listing all folders as it should instantly..

pion

04-22-2010, 04:30 AM

Also: This is still a problem during fxp:

(11:25:33) [io743] CPSV
(11:25:33) [io743] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.

You added a workaround for PASV command I see, might add it here aswell? I'm not sure how the client would know when to abort transfer during fxp tho, since it has no idea of the actual data transfer.. except for that 550 ofc. During regular download, the client aborts just fine on problems.

Flow

04-22-2010, 09:48 AM

Duhhh, had to re-do the command Â´site makecertÂ´to make it kickin. Is working for me now :)

Cool, OpenSSL yey

Yil

04-22-2010, 10:06 AM

Really good to hear it's working for people now!!! :)

Here's a special build that is just for o_dog and pion:

Link: ioFTPD-v7.4.4-exe-only.zip (http://home.comcast.net/~yil/ioFTPD-v7.4.4-exe-only.zip)

You will now see something like:
550-Init: 1, Aborted: 0, Socket: 752, LastError: 0, Size: 0, Pos: 0
550 Active transfer in progress, terminate transfer with ABOR before proceeding.

The extra line above is the only change from 7.4.3, but it will show a couple of fields from the data transfer structure whenever you would get that 550 message. In particular if Socket is not 0 or -1 then the system has an open data socket and really does think it's transferring a file or waiting to connect/listening. Size is the size of the file (0 for uploads) and Pos is the current position. If you see that increasing then it really is sending/receiving.

Hopefully this can help figure out what you are seeing. I still believe it's the client not sending an ABOR command when required so try to get client logs if possible.

o_dog

04-22-2010, 01:01 PM

Even if it doesn't send them it should abort if nothing gets sent or if the speed is below 0.1kb/s for a long time. nm, works jsut fine, missed the openssl things.

pion

04-22-2010, 01:11 PM

Bad news: Hangup bug still present...

(19:43:34) [i] [io703 -> io744] file.r24 14,3 Mbytes/13,52(s)/1*109,80Kbps
(19:43:34) [io703] Reversed FXP started
(19:43:34) [io744] CPSV
(19:43:35) [io744] 227 Entering Passive Mode (1,1,1,1,61,236)
(19:43:35) [io703] PORT 1,1,1,1,61,236
(19:43:35) [io703] 200 PORT command successful.
(19:43:35) [io744] STOR file.r23
(19:43:35) [io744] 150 Opening BINARY mode data connection for file.r23 using SSL/TLS.
(19:43:35) [io703] RETR file.r23
(19:43:36) [io703] 150 Opening BINARY mode data connection for file.r23 (14648 bytes) using SSL/TLS.
(19:44:32) [io703] 200 NOOP command successful.
(19:44:37) [io703] 426 Connection closed: The network connection was aborted by the local system.
(19:44:37) [io703] Reversed FXP started
(19:44:37) [io744] CPSV
(19:47:57) [io744] Timeout, Connection closed
(19:47:57) [io703] Reversed FXP started
(19:47:57) [io744] CPSV
(19:51:17) [io744] Timeout, Connection closed
(19:51:17) [io703] Reversed FXP started
(19:51:17) [io744] CPSV
(19:54:37) [io744] Timeout, Connection closed
(19:54:37) [io703] Reversed FXP started

(19:58:02) [glftpd] PORT 1,1,1,1,66,83
(19:58:03) [glftpd] 200 PORT command successful.
(19:58:03) [io744] STOR myfile.r46
(19:58:03) [io744] 150 Opening BINARY mode data connection for myfile.r46 using SSL/TLS.
(19:58:03) [glftpd] RETR myfile.r46
(19:58:03) [glftpd] 150 Opening BINARY mode data connection for myfile.r46 (15000000 bytes) using SSL/TLS.
(19:58:22) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_accept()), disconnected: Connection reset by peer.
(19:58:22) [i] [glftpd -> io744] Transfer Failed: myfile.r46
(19:58:22) [glftpd] Reversed FXP started
(19:58:22) [io744] CPSV
(19:58:22) [io744] 226-Transferred: 0.
(19:58:22) [io744] 426 Connection closed: Connection timed out.
(19:58:22) [io744] 226 ABOR command successful.
(19:58:23) [io744] 227 Entering Passive Mode (1,1,1,1,58,93)
(19:58:23) [glftpd] PORT 1,1,1,1,58,93
(19:58:23) [glftpd] 200 PORT command successful.
(19:58:23) [io744] STOR myfile.r45
(19:58:23) [io744] 150 Opening BINARY mode data connection for myfile.r45 using SSL/TLS.
(19:58:23) [glftpd] RETR myfile.r45
(19:58:23) [glftpd] 150 Opening BINARY mode data connection for myfile.r45 (15000000 bytes) using SSL/TLS.
(20:00:12) [i] (User Abort)
(20:00:12) [glftpd] ABOR
(20:00:12) [io744] ABOR
(timeout)

ioftpd-watch.exe just sits there doing nothing, and manual restart needed. And nothing in logs as usual.

Yil

04-22-2010, 02:07 PM

I'll add some more debugging for you pion... But what you are seeing is not the hardcore lockup bug else it would be caught after like 3 minutes. Does "site crashnow" work for you when your connected but it won't do stuff? I forget if you answered that one already.

o_dog: It should timeout after 20 seconds without a connection happening. It will not however timeout a connection that sends REALLY slow. But I think 10 minutes without any data at all and it does. I remember it being rather long.

pion

04-22-2010, 02:54 PM

site crashnow doesn't produce any dumps, or restart site. At the latest crash I had just now, I was able to log in once, and one session even responded to ABOR. But after login+ABOR, no new sessions where accepted. The first thread that hanged, was still dead. Site crashnow did nothing.

It's a pain to debug it, since I have to catch it at the excact couple of minutes it's crashing to be able to issue commands. Would be very helpful to be able to crash/debug externally, for when io stops accepting connections.. which is multiple times a day on some boxes :(

o_dog

04-22-2010, 03:34 PM

Yil: 10min is to long, should be MAX 2min.
(22:34:46) [2] 550-Init: 1, Aborted: 0, Socket: 1456, LastError: 0, Size: 0, Pos: 2267753480192

it doesn't timeout transfers or it takes 10min and since there is no way for the client to know if something is actually transfering, it makes it impossible for the client to know that it should abor the transfer.

Yil

04-22-2010, 03:34 PM

This was win2k3? There isn't a really easy way on that OS. Vista+ and you can create a dump directly from task manager. You should however be able to install windbg and/or Visual Studio Express (2010 edition might be out), both are free, and then you can attach the debugger to the process and create a dump at any time with the process in any state.

Windbg used to be downloadable via the "Debugging Tools for Windows" package, but now you need to get the Windows Driver Kit from MS (Link) (http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=36a2630f-5d56-43b5-b996-7633f2ec14ff). It's like 700MB but you can just install the Debugging Tools for Windows component without needing any of the other stuff.

Launch WinDbg
Click File->Attach to a Process
select ioFTPD from the list of running processes
.dump /ma c:\ioFTPD.dmp
zip or rar and upload to me

Yil

04-22-2010, 03:51 PM

o_dog: Yup, I'd agree. I didn't pick the timeouts, I just remembered it was long... 2 minute means > 1/2 KB/sec average when filling up 64K socket receive buffer, or a good internet spike and a couple re-transmissions at the max delay (30 sec or so) trying to get data through a spike in bandwidth.

Yil

04-22-2010, 04:16 PM

It just hit me... You're using nxMyDB right pion? I'm beginning to wonder if the MySQL library might be a factor. It must create sockets to talk to the remote database, and if they aren't being marked as non-inheritable it could trigger some sort of problem. Remember I fixed the TCL library and all of ioFTPD to never spawn child processes while inheritable sockets are being created. It's a SERIOUS mis-feature that Microsoft decided that all sockets by default should be inherited. I would have guessed just one connection was created by mysql's library and maintained as long as possible, but it's definitely something to look into.

I think I'll try and write a simple program that will complain if it finds it inherited handles to files/sockets/etc and we can try running that via a new custom command line in the .ini file... I can tell you that I was able to prevent ioFTPD from accepting new connections before I fixed it and TCL and this is starting to look similar. It's just not that common for applications to spawn thousands of child processes like we do...

Yil

04-23-2010, 12:58 PM

I looked into the source for MySQL and it just uses the very simple socket() call and I can find no reference to making that socket/handle un-inheritable anywhere.

I can understand why ioFTPD sockets that are setup to use I/O completion ports might have given us problems when leaked to child processes, but I can't really see why the same is true for a simple socket, but my guess is this is the problem. It's the same behavior I saw before.

I'll look into building mysql from sources, and then modify the client library piece it so we can get it to use the ioFTPD lock for process/socket creation and see if that makes a difference...

Yil

04-23-2010, 09:34 PM

* EXPERIMENTAL v7.4.5 - not for general use yet *

Pion: Here's a new version of ioFTPD, a customized nxMyDB, and a modified libmysql. The nxMyDB is based upon his v2.1 release neoxed was preparing in response to the changes in ioFTPD v7.2's exported functions. It has a couple of new features and requires that you slightly change you configuration file since it supports multiple database servers. It's a simple change though. Details are in the readme/changes file. The database schema/data are all the same though so nothing to do there.

Make sure you update the system/libmysql.dll file when you update the modules/nxmydb.dll file after upgrading to v7.4.5.

Link: ioFTPD-v7.4.5-exe-only.zip (http://home.comcast.net/~yil/ioFTPD-v7.4.5-exe-only.zip)
Link: nxMyDB-v2.1.1-custom.zip (http://home.comcast.net/~yil/nxMyDB-v2.1.1-custom.zip)

Yil

04-23-2010, 11:07 PM

Hmm, I just noticed that I may have forgotten to save the last change to the libmysql.dll file and the one I put in the nxMyDB might not actually do what it says. It will work fine, but may not mark the socket as non-inheritable... DOH!

Use this one instead:
Link: libMySQL-v5.1.46-ioFTPD.zip (http://home.comcast.net/~yil/libMySQL-v5.1.46-ioFTPD.zip)

This btw, just adds a user definable connection option or two. Thus it is fully compatible with any application. nxMyDB was modified to set these options.

pion

04-24-2010, 09:39 AM

I am using nxmydb 2.1.0 yeah. I will fire up windbg on 7.4.5, and your customized nxmydb 2.1.1 and produce some dumps for you. Hopefully they'll be helpful

pion

04-24-2010, 04:18 PM

It locked up with io745 aswell, but produced a dump now using your description of windbg.

Yil

04-25-2010, 02:16 AM

pion: This having a dumpfile to debug sure makes things I bit easier.

First off, the "main" ioFTPD thread doesn't really do anything in the server. It just sits around waiting on an event to tell it to exit because the server shut down. The latest versions now occasionally check to make sure another thread is updating a variable to catch the loader lock being compromised. So basically it does nothing...

However, in the dump I'm looking at the main thread's stack trace appears to be running code from the libmysql.dll library and it is stuck waiting for data from the database. The stack trace isn't complete, or is corrupted, so I probably need to rebuild everything with optimizing turned off instead of just generating debug symbols with optimized code. I've found that turning off the optimizer and turning on stack corruption checks often helps recover stack frames so I can see how the function is getting called...

Anyway, without a full stack trace I can't say how the main thread is doing this since it's in a loop using WaitForSingleObject and not WaitForSingleObjectEx so it isn't in an alertable state so it shouldn't be running any type of async procedure call or anything... So I guess something is trashing the stack of the main thread?

pion: This is running as a normal process and not as a real service right? Firedaemon will run it as a normal process I believe. I didn't see the service specific thread running like I would as a real service.

On the other hand, if we take the 7 procedures on the call stack that look valid (all MS except the last) at face value it looks like the yaSSL encryption package in MySQL's library is waiting for input on a socket and I guess it's stuck. It's supposed to be a non-blocking socket, but well, I guess it isn't or something... Or perhaps it's running in a loop and this is just where it was at the time of the snapshot.

* Can people using nxMyDB post what type of configuration they are using here? In particular are you using encrypted communications and is it via OpenSSL or via yaSSL? And what version or nxMyDB are you using?

pion: you mentioned you were running v2.1 already? I don't think that was released, and perhaps it's got a bug if we hear back that other people using 2.0 don't have problems. Seems unlikely, but you never know...

I noticed one other odd thing. For some reason the secur32.dll library (that MS encryption library I thought I just ditched) was loaded. Not sure who the culprit loading it is, but I don't remember seeing it locally after the switch but perhaps I missed something. Could be openSSL and/or libmysql's yaSSL load it. They might not need it to encrypt data but to use it's really good random number generator or something. Not a big deal, but I'm curious now...

pion: I'll see what I can do about getting another test version put together for you at some point so we can maybe learn more. Now that you have windbg installed we can also try Application Verifier as it uses windbg to generate stack traces when it notices something is wrong. Really useful and might tell us something. It's a simple download from MS, just configure it to watch ioFTPD.exe and turn on all the options...

Here are some "workaround" ideas:

Try switching to OpenSSL encryption in the nxMyDB configuration. If we believe the dump (send more please, let's see if it always shows up this way!) then switching to OpenSSL from yaSSL might get around an issue in the libmysql.dll library.

Also, try running without encryption (and over shared memory if possible to avoid the socket code altogether) on the machine hosting the database to see what that does...

Of course, it's also likely that not using nxMyDB might make it stable. The problem might not even be in nxMyDB but in libmysql and getting rid of both might help. Of course it sucks loosing it's features, but it might be worth a try to take one site off it, if that's possible, and see what happens if none of the above ideas work... At least it would help me a lot if I knew it wasn't involved in the lockups you are seeing.

* I'd also really like to here from other people running v7.4.3 and how it's working for them. Especially people who had problems with the lockup bug.

Yil

04-25-2010, 02:26 AM

pion: Just for the record, the debugger claims that no locks are stuck. Thus whatever is going on is not the classic lockup bug such as other people have encountered. Not sure why you would have trouble issuing "site crashnow" though...

pion

04-25-2010, 05:11 AM

Not sure what to call it, if not the lockup bug :P The same behaviour was present for io running previous nxmydb versions also, not just 2.1. However it is possible that it became more frequent with io7 and nxmydb 2.

I am using FireDaemon to execute it, running under the SYSTEM user account.

When I up Application Verifier, I was unable to get io starter proper. It never spawned ioFTPD-Watch.exe even.

I turned off the nxmydb encryption in config now, to see if it still locks up.

*Edit: I am now unable to attach windbg to ioftpd process. io locks up/hangs right away when I hit attach process. I just tested it on another box to be sure I did excactly as before, but without much success. I sent you a couple of dumps from the crashed io, from after I attached the process..

I also noticed something in debug.log from io 7.4.3:

04-24-2010 04:07:28 error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
04-24-2010 04:17:46 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message

04-25-2010 16:29:07 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message
04-25-2010 16:29:08 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message
04-25-2010 16:29:10 error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message

in addition to

04-25-2010 16:29:07 AsyncSelectCancel flags: 0

On a side note, do you have some input on my preloading questions?

Yil

04-25-2010, 02:13 PM

pion: I think you can ignore the debug.log messages. At the moment it looks like I goofed the test for when to output the AsyncSelectCancel line so I'm printing the un-interesting case. Doh!

The SSL routines complaining about SERVER_HELLO errors are almost certainly caused by transfers where someone didn't enable SSL so it can't perform the handshake right after connect. This is just the type of information I was looking for, but you can safely ignore anything in the GET_SERVER_HELLO category right now.

Regarding the slow start problem. I found 2 bugs.

I said it would process the VFS= file, if defined, and then do the Default.vfs file. It does that, but instead of processing the individually specified directories (i.e. 1 = /dir) only once, it does it for both. The second time should be fast since it's likely in cache, but it's extra work and if stuff gets pushed out of the cache it's a lot of extra work.

Remember when I mentioned that enabling DEBUG=True would cause the server to wait until finished, but it would use lots of threads to work in parallel so it would be faster? That is true only for the first time it's executed. That is correct behavior, BUT with the above bug the 2nd time is for Default.vfs if VFS= was defined...

For the moment try Delay=TRUE, don't define VFS=, and don't define any individual dirs with a depth of 1 already defined in the default VFS file as they should already be done. My guess is the only thing you really want is Delay=TRUE set and loading times should go down.

When the server comes up, login and use "site dircache". What you want to look for is how many buckets have the same size as the max size which I think defaults to 100. If you see 10% or so them at 100 then too much stuff is being forced out of the cache during preloading and we can either up the size of each bucket, or the number of buckets to distribute dirs over. I added this command for just this type of tuning. You can post the output of this command here and I'll examine it for you if you want. I'd be interested in it anyway.

Long term I think I might have an idea to help everyone. I think it's possible to write the current directory cache to disk during shutdown and read it back in on the next startup. Thus whatever was in cache and/or popular before will be cached now. Preloading would then work against an already loaded cache and just fill in the blanks. This isn't a simple change, but would be really useful for very large sites. The next change would be to use that same code, but to limit it to just write out a single dir whenever some threshold is past. So a directory like a /XVID dir would have a .iofTPD.cache file or something in it which could be loaded and provided the timestamps match it wouldn't have to examine all the subdirs themselves.

Both of those changes would actually create more work for the server, but it would be spread out and the impact wouldn't be felt by the user so much...

Flow

04-25-2010, 02:44 PM

.iofTPD.cache in every release dir?

Ouchh.... can that file be stored in the /logs dir or somethin. Mainly maybe is a security reason. See, thare are some people that use FTP and e.g Torrent or DC to share stuff. And when they mount those release dirs. All .iofTPD.cache will follow aswell.,,, Um...you know what i mean?.... yea, maybe im thinking to much. Im aware of .ioFTPD.message file aswell.

Yil

04-25-2010, 02:56 PM

pion: I walked away and then realized I forgot to mention 2 things in my last post. Attaching a debugger to a running process can cause it to crash/fail. I haven't had this happen to me, but I read that it can. It's also possible that Application Verifier and you attaching windbg manually are not a good combo since AV uses windbg underneath to report the stack trace on errors found. Thus your recent problems trying to attach to the process might be solved if you disable AV. However, I'd let AV run and only attach windbg after it's already locked up... If you do want to run it under windbg all the time it's better to use the "Open executable" feature and then hit F5. Starting under the debugger instead of attaching to a running process is safer.

For the record however, the stuck process was in libmysql at the time it got stuck. The main thread was fine this time, but 2 worker threads only gave me partial stack traces but both were waiting for reads from the database. It looks like this was in a different read function but I presume this is from a process after encryption was turned off so it would be different. You might be able to enable that again...

I'm going to have to get to the bottom of why libmysql is having issues, especially in your custom version, but it is...

Yil

04-25-2010, 03:01 PM

Flow: Well, creating the .cache type files would be optional of course. It would also only happen when a configurable threshold of say 100 subdirs were in a directory which would mean it wouldn't happen for most dirs just large daily dirs or something like /XVID. This is far trickier than the start/stop cache bit anyway because you wouldn't want to update these .cache files on every change, but say every 15 minutes if there had been any changes to the dir...

Update, I see your trailer says 7.4.3. How's it working for you?

pion

04-25-2010, 03:24 PM

This box use 5 minutes+ before allowing connections

(22:02:14) [io_745] 200- Buckets (num=100, size=1000):
(22:02:14) [io_745] 200- 37 39 46 50 37 48
(22:02:14) [io_745] 200- 42 53 45 34 52 47
(22:02:14) [io_745] 200- 42 56 45 45 45 49
(22:02:14) [io_745] 200- 33 48 55 49 38 47
(22:02:14) [io_745] 200- 53 43 45 46 35 37
(22:02:14) [io_745] 200- 46 50 44 41 50 52
(22:02:14) [io_745] 200- 62 53 39 42 48 48
(22:02:14) [io_745] 200- 52 45 41 47 46 40
(22:02:14) [io_745] 200- 36 52 55 39 44 52
(22:02:14) [io_745] 200- 35 57 58 49 48 48
(22:02:14) [io_745] 200- 49 44 61 55 51 50
(22:02:14) [io_745] 200- 41 47 50 47 34 54
(22:02:14) [io_745] 200- 35 49 59 50 45 48
(22:02:14) [io_745] 200- 51 51 51 50 41 46
(22:02:14) [io_745] 200- 56 50 39 64 48 38
(22:02:14) [io_745] 200- 45 39 51 43 44 47
(22:02:14) [io_745] 200- 46 49 52 46
(22:02:14) [io_745] 200-
(22:02:14) [io_745] 200- Total: Dirs = 4676, InUse = 0, Locked = 0, Shared = 0
(22:02:14) [io_745] 200- Total: FileInfos = 173337, InfosUsed = 0, NoInfo = 0
(22:02:14) [io_745] 200 'dircache' Command successful.

I fail to attach to process on two different boxes, one where I had AV installed (tried to remove without success), and one without. Both places I also fail to start it using "open executable". io never starts proper, nor spawn ioftpd-watch.exe, and they both crash right away (refuse to allow any connections) when I attach to process.

Yil

04-25-2010, 03:35 PM

100 buckets of 1000 length and your length is 70ish tops. Nothing being flushed from the cache, it's fine. The real piece of info is the total FileInfos in use though. Perhaps I should separate out file, directory, and fake dir fileinfos into multiple counters so we can get a better handle on what is going on. If it's mostly dirs and fake dirs then 100K directory opens under windows is going to take a bit no matter what but I would have guess more like a minute or two...

What kind of machine is this? Are these just plain drives? Not remote filesystems? This whole thing isn't running inside a VM? Drives aren't truecrypted? How much memory is in the machine?

Flow

04-25-2010, 03:37 PM

Yil, is a clean 7.4.3 version with 2 script addon. nxTools and nxHelp. Right now im satisfied with no complains :p

Are thare anymore biggies to be fixed till -stable release? I feel like after this lockup SSL buggsquash hunt is over would be a time for a development line. Maybe you could posibly re-open the ioYil case again?

Yil

04-25-2010, 04:20 PM

Flow: I'm REALLY hoping that this switch to OpenSSL will fix the loader lock problem. I'm not aware of any other showstopping issues like that left in the plain no-addon server configuration. In fact I think nxTools and ioNiNJA are pretty happy these days as well. Pion's nxMyDB troubles from libmysql is annoying because I was hoping to offer MySQL as an option in ioYil for larger sites so I really do have an incentive to figure out what is going on...

pion

04-26-2010, 02:47 AM

Athlon XP 2600+ on 512 ram, but I've never seen io take up more than 80mb?. The harddrives are new sata in a hardware raid5 array.

I tried turning off SSL for nxmydb in config, and I hit another error:

(09:33:59) [io_745] CPSV
(09:33:59) [io_745] 226-Transferred: 0.
(09:33:59) [io_745] 426 Connection closed: Connection timed out.
(09:33:59) [io_745] 226 ABOR command successful.
(09:33:59) [io_745] 227 Entering Passive Mode (1,1,1,1,60,255)
(09:33:59) [glftpd] PORT 1,1,1,1,60,255
(09:33:59) [glftpd] 200 PORT command successful.
(09:33:59) [io_745] STOR file21.rar
(09:33:59) [io_745] 150 Opening BINARY mode data connection for file21.rar using SSL/TLS.
(09:33:59) [glftpd] RETR file21.rar
(09:34:00) [glftpd] 150 Opening BINARY mode data connection for file21.rar (15000000 bytes) using SSL/TLS.
(09:34:19) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_accept()), disconnected: Connection reset by peer.
(09:34:19) [glftpd] Reversed FXP started
(09:34:19) [io_745] CPSV
(09:34:19) [io_745] 226-Transferred: 0.
(09:34:19) [io_745] 426 Connection closed: Connection timed out.
(09:34:19) [io_745] 226 ABOR command successful.
(09:34:19) [io_745] 227 Entering Passive Mode (1,1,1,1,65,229)
(09:34:19) [glftpd] PORT 1,1,1,1,65,229
(09:34:19) [glftpd] 200 PORT command successful.
(09:34:19) [io_745] STOR file21.rar
(09:34:20) [io_745] 150 Opening BINARY mode data connection for file21.rar using SSL/TLS.
(09:34:20) [glftpd] RETR file21.rar
(09:34:20) [glftpd] 150 Opening BINARY mode data connection for file21.rar (15000000 bytes) using SSL/TLS.
(09:34:39) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_accept()), disconnected: Connection reset by peer.

then io hanged like the lockup bug. The weird part about this, is that it hanged more or less excactly at the same time I logged in through remote desktop to look up box details.. most likely random, but the process is really stuck like always. And windbg is having trouble connecting:

Break-in sent, waiting 30 seconds...
WARNING: Break-in timed out, suspending.
This is usually caused by another thread holding the loader lock

But I was able to connect and create a dump after. I also noticed io eating up 30mb more ram when I ran the dump, I've seen it before also.. is it supposed to do that? And io implodes, and process dies when I shut down windbg again. That happends regardless of what state the ioftpd.exe process is in atm.

Yil

04-26-2010, 11:50 AM

pion: Is App Verifier running and all the options it can check enabled? I'd really like to see if it's spotting something...

Regarding extra memory usage during dumps... Most processes in the computer don't have all their pages in memory all the time. Unused or rarely used memory pages are swapped to disk to make room for more commonly used pages in other applications. The size of the memory used by looking at the default "memory" column in task manager is not the applications true size. There are at least 5 ways to measure memory but committed pages should be closest to its complete memory footprint. So what happens when you create a dump? You have to write to the dump file all the memory of the process which means the system will swap in all those unused pages from disk and even the shared .dll files. Thus the memory footprint during dump creation is essentially it's committed size and would appear to you to be growing if you just looked at task manager's default column. On a low memory computer it might actually have to swap in some pages and out some already written pages so it's memory size might never equal it's commit size at any point in time, but trust me it would if it could.

During the generation of CrashLog.txt however ioFTPD loads parts of the .pdb files so it can print function names instead of memory addresses on the stack traces. So it's true that it also consumes more memory during dump generation from within ioFTPD. This should not happen when done via windbg or any other debugger though.

I'll have to look into why windbg is having issues connecting to the process. It's interesting to me that it mentions the loader lock, but obviously it isn't completely stuck as that should be caught now. I should also point out that if the loader lock really was stuck we would see that by doing an analysis of the dumpfile and this is not the case in any of your dumps thus far unlike other people who really did have this problem.

My guess is this is a windows socket/handle problem. ioFTPD could be doing something wrong and I'll try and review the code, but Application Verifier should be the first line of attack to determine this... It's also possible my attempt to fix libmysql didn't work or I missed something.

pion

04-26-2010, 12:53 PM

I might add that ioftpd hangs with ssl disabled in nxmydb config aswell.

I was having some problems with application verifier. I couldn't get io started proper due to io taking up all the ram on the box (about 400mb, and then it started paging), so I decided to kill the process. I tried it on a faster box, there it has the same behaviour, but the process dies during startup.. it does however spawn ioftpd-watch.exe, and start consuming ram, before it dies.

Yil

04-26-2010, 05:51 PM

Application verifier with some of the memory features enables does dramatically increase the application's memory footprint. You can try turning off some of the memory related settings and it should make far less of an impact...

pion

04-27-2010, 07:04 AM

All the boxes I've tried it on behaves somewhat the same, by crashing. So not really easy to debug :P

Yil

04-27-2010, 10:35 AM

Hey pion, I've got something for you to try.

Can you see if Log_Exiting_Worker_Threads under [Threads] is enabled, and if not enable it. Then check the debug log to see if you see any "Worker exit: ..." lines. I noticed you only had 5 worker threads and under load that might not be enough (don't change that number now, let's test first). If we find that threads are exiting then that gives me an idea of some things to look into. I already fixed a bug in the original timer design that broke if threads exited...

pion

04-27-2010, 12:13 PM

I'll enable it. I also got a couple of crashlogs, but ran out of space when writing them.. anyhow I uploaded what I got, in addition to some dumps using windbg.

pion

04-28-2010, 02:22 AM

04-27-2010 22:44:07 Worker exit: total=5, free=5, blocking=0, initial=5.
04-27-2010 23:06:35 AsyncSelectCancel flags: 0
04-27-2010 23:37:46 Worker exit: total=5, free=5, blocking=0, initial=5.
..
04-27-2010 22:59:00 Worker exit: total=5, free=5, blocking=0, initial=5.
04-27-2010 22:59:24 AsyncSelectCancel flags: 0
04-27-2010 22:59:34 AsyncSelectCancel flags: 0
04-27-2010 23:01:16 Worker exit: total=5, free=5, blocking=0, initial=5.
04-27-2010 23:02:28 error:14094417:SSL routines:SSL3_READ_BYTES:sslv3 alert illegal parameter
04-27-2010 23:02:39 error:14094417:SSL routines:SSL3_READ_BYTES:sslv3 alert illegal parameter
04-28-2010 00:32:53 Worker exit: total=5, free=4, blocking=0, initial=5.
04-28-2010 00:40:27 Worker exit: total=5, free=4, blocking=0, initial=5.

Also got some new symptoms on lockup bug. It transferred 0 on all transfers, but still appeared to accept the connection. When connecting windbg I got:

WARNING: Break-in timed out, suspending.
This is usually caused by another thread holding the loader lock

then after a minute or two it connected, and I was able to create a dump. After creating dump with windbg, it locked up completely denying login, and implode after I exit windbg as usual.

Yil

04-28-2010, 10:28 AM

pion: I might have forgot to mention this last time, but if you just exit windbg the process you attached to will exit as well. You need to detach from it to allow it to continue running. There might not be much point in doing that, since the process might be unstable and may have timed out transfers, users, etc, but thought I'd mention the most likely reason for what you were seeing when you exit windbg.

Now that we know that 5 worker threads is too small for you (I was sure it was but wanted proof) go ahead and bump that up to 10 or 15 and see if that makes a difference. Besides consuming a little more memory, it's probably going to be faster for you since the TCL interpreter on the extra thread you appear to need from time to time won't have to keep being created/destroyed.

I still need to put together the test for inherited handles as well. Turns out it was a bit trickier than I thought and I need to install the window's driver kit to do it.

pion

04-29-2010, 04:26 AM

Some new symptoms on hangup: It appeared to happend just as I was compressing the dumpfile, using 100% cpu.. low resource emulation could be something to try, to reproduce?

(11:23:26) [io745] 211 END
(11:23:26) [io745] CLNT FTPRush 1.1.2.19U
(11:23:26) [io745] 200 Noted.
(11:23:26) [io745] PWD
(11:23:26) [io745] 257 "/" is current directory.
(11:23:26) [io745] CWD /mydir/
(11:23:26) [io745] 250 CWD command successful.
(11:23:26) [io745] PWD
(11:23:26) [io745] 257 "/mydir" is current directory.
(11:23:26) [io745] STAT -l
(11:23:26) [io745] List Complete: 1*121 bytes in 0,08 seconds (14,37KB/s)
(11:23:26) [io745] SITE XDUPE 3
(11:23:26) [io745] 200-Extended dupe mode 3 is enabled.
(11:23:26) [io745] 200 Command successful.
(11:23:26) [io745] TYPE I
(11:23:26) [io745] 200 Type set to I.
(11:23:26) [io745] PASV
(11:23:26) [io745] 227 Entering Passive Mode (1,1,1,1,55,147)
(11:23:26) [io745] REST 0
(11:23:26) [io745] 350 Restarting at 0. Send STORE or RETRIEVE to initiate transfer.
(11:23:26) [io745] RETR ioftpd745_hanged_up_state.rar
(11:23:28) [io745] 150 Opening BINARY mode data connection for ioftpd745_hanged_up_state.rar (24619924 bytes) using SSL/TLS.
(11:23:42) [io745] ABOR
(11:23:42) [io745] Can't open data connection
(11:23:42) [io745] PASV
(11:23:42) [io745] 550-Init: 1, Aborted: 1, Socket: -1, LastError: 20500, Size: 24619924, Pos: 240518168576
(11:23:42) [io745] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(11:23:42) [io745] Connecting to io745

Yil

04-29-2010, 11:37 AM

pion: That looked interesting for a minute, but it looks like nothing more than whatever problem you are having with socket communications. First, I'm curious about that "SITE XDUPE" line. Do you have an addon for that? I was surprised to see the command was accepted...

The interesting part is the ABOR line. The client issued the command and never received a response. I'm guessing the server hung up trying to force close the socket for some reason. That could turn out to be a really useful clue to me though, so keep them coming.

The part that caught my eye originally was the 550 debug line + Active Transfer in progress err, but it turns out that is correct behavior. The server has not returned acknowledgment of the ABOR command or returned success/failure of the transfer itself. The client is not allowed to start another transfer until one of those 3 things happen. Now obviously the server is stuck and not happy, but what made that log look really interesting to me just isn't...

o_dog: Any ABOR lines like the above showing up? Still having that issue that I was blaming on the client?

o_dog

04-29-2010, 05:36 PM

Nah, i got problems with 10min timeouts it seems. thats my only problem at the moment. Dont really know why the transfer is stopping though.

Yil

04-30-2010, 12:18 AM

pion: I examined a number of the dump files you sent me and most of them are hung in closesocket. This is familiar territory for me. I eventually figured out that what was happening is we passed a socket handle onto a child process (which is easy to do unfortunately) and then tried to force the socket closed. This can result in the server hanging until the child exits or perhaps blocks in an alertable state, but scripts don't expect to need to do this and so they often never do. I discovered this by accident back in late v6 and fixed the server to protect socket creation. It wasn't until v7.0 that I realized I also had to prevent the TCL library from creating sockets while ioFTPD was creating processes. So as I mentioned above I had a thought that libmysql probably has the same problem which is why your configuration is different than others and they aren't seeing any problems...

Anyway, I setup nxMyDB and performed some tests against the modified library that I thought should have fixed this. Evidently I didn't do something right because I can confirm that it looks like a socket handle is being passed onto child processes by nxMyDB but which really means libmysql. Debugging through several layers of dynamically loaded libraries really sucks, but I'll figure it out for you.

I can't prove this is the problem, but given past history I'm pretty sure this is it. So there is hope yet :)

Flow

04-30-2010, 07:19 AM

hi Yil, can you check if IDENT port always open?. Can it be close if no one is using it?

Yil

04-30-2010, 10:29 AM

ioFTPD doesn't respond to IDENT requests, it only sends them. You can disable the sending of them by using a timeout of 0 in the config file if you want. As far as ports go it just uses one of the defined out_ports and thus is the same range as active transfers from the server would use.

Yil

04-30-2010, 03:11 PM

pion: I might have spoken too soon about the socket issue. I just noticed that the above test wasn't against the custom build I made for you. When using the new builds it does indeed look like I fixed the problem. So that means I no longer have a clue on how to fix this... Time to go bang the head against the wall some more and hope something else comes to mind.

pion

05-01-2010, 05:41 AM

I attempted some more stresstesting, and fired up 15-20 transfers at once. Then I managed to hang the site after about 15 minutes. Could there be some unresolved race condition or similar?

But in any case, just have some logic that can detect when transfers start acting up, and restart it would be a huge help also.. manual restarts can be really painful...

SSL transfer also seem to be a problem still from some sources:

GL: 426 Data connection: Broken pipe.
DR: 426- Unexpected end of handshake data
(13:54:04) [DR] 426- Connection reset
(13:54:04) [DR] 426 Connection reset

No idea about the reason, but I noticed this in the event log:

An SSL connection request was received from a remote client application, but none of the cipher suites supported by the client application are supported by the server. The SSL connection request has failed.

Not sure if it's related or not, but the box doesn't have any other functions

Yil

05-01-2010, 11:16 AM

I'm ahead of you on this :) I spent last night making the server try to connect to itself every minute to detect when something goes wrong so it can restart itself. I also changed the watcher program to detect a locked up server better. I'll roll those and a few more changes out soon.

I've had another person report issues with drFTPD over SSL getting that "Unexpected end of handshake data" message as well. That log message info is new though and almost certainly related. If you can connect to the drFTPD site and download from it directly what does the client say the encryption mechanism chosen was? I.e. was is aes256, aes128, or something else. I am aware that some Java implementations have a crippled crypto suite installed by default to use short keys and I wonder if that could be part of the issue.

I'm also not convinced that OpenSSL v1.0 isn't slightly different than older beta releases based on comments elsewhere.

Do any transfers from that site work, or does it randomly not work?

pion

05-01-2010, 11:37 AM

Transfers generally work fine from that site, but I randomly get "Network subsystem is unusable(10091)" in client after AUTH SSL command. But on reconnect, I get Encryption algorithm: TLSv1 DHE-RSA-AES256-SHA-256

Yil

05-03-2010, 02:06 PM

Good news, I can confirm that 2 other people who had issues with ioFTPD running under win2k3 are now doing fine. The switch to OpenSSL seems to have made things FAR more stable than before for them as they are crashfree now. Pion's use of nxMyDB however is still unstable...

I did some more research into OpenSSL and found some useful info. I see how to order/limit which ciphers we want to use, so I think I'll see if I can't let users in the .ini file order them if people want that. More importantly though I can apply the key bit length min/max fields from before to weed out unwanted crappy ciphers automatically which is something far more likely for people to actually use.

I also see that I am using default/pure OpenSSL, and that there are a handful of "bugs" in other SSL libraries that it can be told to work around. I might try enabling all those workarounds to see if that increases interoperability.

We are also using the latest v1.0 release of OpenSSL which was released in March 2010. I found this tidbit in their list of changes:
*) If no SSLv2 ciphers are used don't use an SSLv2 compatible client hello:
this allows the use of compression and extensions. Change default cipher
string to remove SSLv2 ciphersuites. This effectively avoids ancient SSLv2
by default unless an application cipher string requests it.

Thus as of the 1.0 release SSLv2 is almost dead and the type of client hello is different than before. This may be a cause for slightly different behavior with some sites when FXPing.

pion

05-04-2010, 03:24 AM

Bad news - io 745 is not working proper without nxmydb either.

(10:08:02) [glftpd] Reversed FXP started
(10:08:02) [io745] CPSV
(10:08:02) [io745] 226-Transferred: 0.
(10:08:02) [io745] 426 Connection closed: Connection timed out.
(10:08:02) [io745] 226 ABOR command successful.
(10:08:02) [io745] 227 Entering Passive Mode (1,1,1,1,65,54)
(10:08:02) [glftpd] PORT 1,1,1,1,65,54
(10:08:02) [glftpd] 200 PORT command successful.
(10:08:02) [io745] STOR file.r27
(10:08:03) [io745] 150 Opening BINARY mode data connection for file.r27 using SSL/TLS.
(10:08:03) [glftpd] RETR file.r27
(10:08:03) [glftpd] 150 Opening BINARY mode data connection for file.r27 (15000000 bytes) using SSL/TLS.
(10:08:22) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_accept()), disconnected: Connection reset by peer.
(10:08:22) [glftpd] Reversed FXP started
(10:08:22) [io745] CPSV
(10:08:22) [io745] 226-Transferred: 0.
(10:08:22) [io745] 426 Connection closed: Connection timed out.
(10:08:22) [io745] 226 ABOR command successful.
(10:08:22) [io745] 227 Entering Passive Mode (1,1,1,1,66,140)
(10:08:22) [glftpd] PORT 1,1,1,1,66,140
(10:08:22) [glftpd] 200 PORT command successful.
(10:08:22) [io745] STOR file.r27
(10:08:23) [io745] 150 Opening BINARY mode data connection for file.r27 using SSL/TLS.
(10:08:23) [glftpd] RETR file.r27
(10:08:23) [glftpd] 150 Opening BINARY mode data connection for file.r27 (15000000 bytes) using SSL/TLS.
(10:08:42) [glftpd] 435 Failed TLS negotiation on data channel (using SSL_accept()), disconnected: Connection reset by peer.

(10:03:22) [io703] Reversed FXP started
(10:03:22) [io745] 226-Transferred: 0.
(10:03:22) [io745] 426 Connection closed: Connection timed out.
(10:03:22) [io745] CPSV
(10:03:22) [io745] 226 ABOR command successful.
(10:03:22) [io745] 227 Entering Passive Mode (1,1,1,1,61,26)
(10:03:22) [io703] PORT 1,1,1,1,61,26
(10:03:22) [io703] 200 PORT command successful.
(10:03:22) [io745] STOR myfile.r09
(10:03:23) [io745] 150 Opening BINARY mode data connection for myfile.r09 using SSL/TLS.
(10:03:23) [io703] RETR myfile.r09
(10:03:23) [io703] 150 Opening BINARY mode data connection for myfile.r09 (14648 bytes) using SSL/TLS.
(10:03:43) [io703] 426 Connection closed: The specified network name is no longer available.
(10:03:43) [io703] Reversed FXP started
(10:03:43) [io745] 226-Transferred: 0.
(10:03:43) [io745] 426 Connection closed: Connection timed out.
(10:03:43) [io745] CPSV
(10:03:43) [io745] 226 ABOR command successful.
(10:03:43) [io745] 227 Entering Passive Mode (1,1,1,1,63,54)
(10:03:43) [io703] PORT 1,1,1,1,63,54
(10:03:43) [io703] 200 PORT command successful.
(10:03:43) [io745] STOR myfile.r09
(10:03:44) [io745] 150 Opening BINARY mode data connection for myfile.r09 using SSL/TLS.
(10:03:44) [io703] RETR myfile.r09
(10:03:44) [io703] 150 Opening BINARY mode data connection for myfile.r09 (14648 bytes) using SSL/TLS.
(10:04:04) [io703] 426 Connection closed: The specified network name is no longer available.

(10:18:38) [io745] STAT -l
(10:18:38) [io745] List Complete: 4*405 bytes in 0,11 seconds (40,05KB/s)
(10:18:42) [io745] PASV
(10:18:42) [io745] 227 Entering Passive Mode (1,1,1,1,63,227)
(10:18:43) [io745] RETR 6kbfile.nfo
(10:18:43) [io745] 150 Opening ASCII mode data connection for 6kbfile.nfo (5857 bytes) using SSL/TLS.
(10:18:57) [io745] ABOR
(10:18:57) [io745] Can't open data connection
(10:18:57) [io745] PASV
(10:18:58) [io745] 550-Init: 1, Aborted: 1, Socket: -1, LastError: 20500, Size: 5857, Pos: 240518168576
(10:18:58) [io745] 550 Active transfer in progress, terminate transfer with ABOR before proceeding.
(10:18:58) [io745] Connecting to io745

This behaviour repeats itself over and over and over and over for hours and hours - so it's clearly crashed. This is in fact, the excact same behaviour as the hangup bug, where it becomes impossible to connect to the ftpd after a while, except this time I can still connect, just not transfer anything. Upload, download, fxp to and from, from any type of daemon at the other side.

When issuing 'site crashnow' it does nothing other than write an entry in ioftpd.log (I could log in just fine in a new session)

When I attached windbg, io stopped accepting connections. However, once detached, io accepted connections again, but not a single transfer goes through.

Only upside to this is that I got a new dump for you..

Yil

05-04-2010, 11:25 PM

I don't know what to say pion. I'm hoping someone like Silly will report whether 7.4.3 is stable or not for them as it clearly wasn't before, but the other reports I'm getting are all "stable" votes. The only downside so far is SSL traffic from some sites is having issues and I think I figured out how to fix that by enabling the bug workarounds and turning off SSL tickets. Bigstar and FlashFXP seems to have run into the ticket issue with the newer OpenSSL builds and older java SSL libraries as well so I'm pretty sure that will fix the issue.

Is there anything else in common on these systems that isn't standard? Common firewall? I'll take a look again at the loaded modules from the dumps you sent me, but I didn't notice anything weird the first time I looked. I.e. netlimiter, nvidia firewall, nod32's internet module (v3, I think v4 is ok), etc. All of those are known issues... It looked like nxmydb or the libmysql library it used were the likely suspect but if you've removed reference to them then you got me... I'll keep looking, but shy of the next round of changes coming which I don't expect to do much I'm running out of things to try.

Oh, and remember to make sure you have enough worker threads. I can't say for sure that rare cases don't crop up if worker threads exit. There aren't any I'm aware of, but the fact that I had to fix one in the past means there could be more.

monk-

05-05-2010, 01:43 AM

First, I'm curious about that "SITE XDUPE" line. Do you have an addon for that? I was surprised to see the command was accepted...

ioPRECHECK, it supports XDUPE, an odog script, discontinued, though theres nothing to fix

plz dont give up on that lockup bug
(10:04:04) [io703] 426 Connection closed: The specified network name is no longer available.
(10:18:38) [io745] STAT -l
(10:18:38) [io745] List Complete: 4*405 bytes in 0,11 seconds (40,05KB/s)
u can see that stat -l is still transfering, so its prolly not a winsock or ssl problem

pion

05-05-2010, 02:57 AM

I can also add, that while in this state, transfers without ssl is giving the exact same behaviour when I'm connecting to a new session. In fact, when transferring without ssl it crashed just the same!! And it doesn't last too long either, 1 hour or less in some cases. (like the case without ssl)

*Edit: The excact same behaviour also present with encryption disabled alltogether.

And as usual, just restarting the process fixes the problem.

Yil

05-05-2010, 11:48 PM

I've been looking over the code and found a potential issue with aborting in-use connections which might explain why some people have a bigger issue than others. It also depends upon how graceful closesocket is which might vary upon the windows release so win2k3 vs XP might act differently...

Give me a few days to wrap up all the OpenSSL changes and a few other things and we'll see if 7.5 doesn't do the trick :)

o_dog

05-12-2010, 12:37 AM

hmmm, it seems i'm not alone with my problem on windows7 32bit. and it does seem more and more like this is a ioFTPD bug since no other application lockup like this no matter how big the transfer.

is this going to be fixed anytime soon?

irccat

05-12-2010, 06:17 AM

hi.Yil

hey.man. ver.ioFTPD v7.4.3

Is the use possible from window 2003?

my.computer.

Does not become from window 2003.....

Solution plan..Solution plan....

but.. use Is possible... ( IoFTPD.v5.8.5r )..

Yil

05-12-2010, 01:29 PM

irccat: I'm not sure what you're asking. What OS are you trying to use?

I've still got a few things to finish up before the v7.5 release but hopefully it should be out in a few days...

mr.babek

05-13-2010, 05:49 AM

Hey Guys,

I wanted to upgrade to the latest ioFTPD build, but it won;t start on w2008r2 server 64bits.
I get this error:

Log Name: Application
Source: SideBySide
Date: 5/13/2010 12:45:17 PM
Event ID: 33
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: ACHILLES
Description:
Activation context generation failed for "C:\FtpMain\ioFTPD\system\ioFTPD.exe". Dependent Assembly Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8" could not be found. Please use sxstrace.exe for detailed diagnosis.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="SideBySide" />
<EventID Qualifiers="49409">33</EventID>
<Level>2</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2010-05-13T10:45:17.000000000Z" />
<EventRecordID>2125</EventRecordID>
<Channel>Application</Channel>
<Computer>ACHILLES</Computer>
<Security />
</System>
<EventData>
<Data>Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8"</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>C:\FtpMain\ioFTPD\system\ioFTPD.exe</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
<Data>
</Data>
</EventData>
</Event>

Yil

05-13-2010, 11:35 AM

mr.babek: That is easily solved. I switched to dynamically linked runtimes instead of static ones. This is usually considered a good thing and for applications with installers they handle this behind the scenes. ioFTPD doesn't have an installer, so you'll have to do this part manually.

Grab the runtime from Microsoft: Visual Studio 2008 SP1 Runtime (http://www.microsoft.com/downloads/details.aspx?familyid=A5C84275-3B97-4AB7-A40D-3802B2AF5FC2&displaylang=en)

After installing it you should be good to go.

pion

05-14-2010, 12:44 PM

chmod = 1M

with flag 1
[1] site chmod 777 /TV-DIVX-ARCHIVE/_ARCHIVE01/Royal.Pains/
[1] 500 /TV-DIVX-ARCHIVE/_ARCHIVE01/Royal.Pains/: Permission denied (directory mode).

with flag M
[1] site chmod 777 /TV-DIVX-ARCHIVE/_ARCHIVE01/Royal.Pains/
[1] 200 'chmod' Command successful.

pion

05-14-2010, 01:37 PM

Reporting queued error: faulting application ioFTPD.exe, version 7.3.3.0, faulting module tcl85t.dll, version 8.5.2.8, fault address 0x0007ad1f.

Then 7 seconds later:

Subprocess monitoring failed due to subprocess is no longer active. The subprocess is probably dead. Restarting the process. Error detail: N/A

Yil

05-14-2010, 02:49 PM

pion: What are the permissions on the directory? Non-VM flagged users must have write permission to the directory if they wish to modify its permissions. The "Permission Denied (directory mode)" includes the reason why. The user probably isn't the owner and the mask is 755 or not a member of the group and the mask is 775 or something. Unless the directory is 777 a SiteOp who is a non-owner non-group member who isn't a VM flagged user cannot change that directories permissions. While not specifically applicable the "site perms [<what>]" command might be useful.

I have zero clue about that TCL error. I'll have to poke around but my guess is it failed to start a process for some reason... Server out of memory? Kernel memory? Handles? Check for something in the windows event log around the same time. I've never seen that before though.

pion

05-14-2010, 03:54 PM

Why cannot +1 users issue chmod, when config cleary sais they're allowed? Or better yet - how can +1 users chmod the directory to 777 ? I also sendt you some crash logs that might be helpful..

Yil

05-14-2010, 05:49 PM

I looked up the test for chmod. It's a bit more restrictive than I just said. You must be the owner (group isn't good enough) and have write access to the parent of the item via owner, group, or other. M flagged users are immune from all access checks so they can always do anything, but VFS Admins (V flag) are only immune from read/visibility checks so they would still require +w to the parent of the item, but are immune from the owner test. Of course VFS Admins could always change the permission of the parent(s) to grant themselves +w provided the entire directory tree isn't read-only all the way up to root.

This prevents everyone but Masters from deleting stuff in a directory marked read-only which is a safety feature. Note that if you are in a directory you can delete subdirectories that are read-only such as might happen with a zipscript. Normally you would use a Wipe script to delete stuff in read-only directories and it would confirm that you weren't doing something stupid...

But back to chmod. I can see why you might want SiteOps to be able to chmod anything they can see but normally things like that were done by VFS Admins as they are the people who can modify the filesystem. SiteOps were for everything else but generally dealt with user account management.

But I'm persuaded that there are cases where you might want to give someone the ability to chmod something they don't own without making them a VFS Admin. I'll add a .ini option to remove the owner test so SiteOps will work like VFS Admins which I think should solve your problem.

pion

05-16-2010, 04:51 AM

Any timeframe on when your new fixit all release will be out? Because I have to manually restart sites on an hourly (!!) basis, just to maintain some level of uptime..

o_dog

05-16-2010, 08:57 AM

YiL: any news on my bug? since there isn't much i can do about it, it seems?

pion

05-16-2010, 09:54 AM

You have the same bug as me o_dog.. the one that locks up the process if you're running nxmydb.

o_dog

05-16-2010, 11:08 AM

no, my transfers just stops for some reason and never resume. seems like it's ioftpd since no other app has this behavior. The thing is i have no idea what is happening since no error message is produced.

Yil

05-17-2010, 12:10 AM

o_dog: Because you only see this issue with FXP transfers it makes diagnosing it harder. I'm still not convinced there is a server error. However, there are 2 possible issues I'm looking at right now. The first is a slow sender and ioFTPD thinks the connection is still active. V7.5 will offer you the option to change the per-packet timeout to whatever you want in case that helps. On the to-do list for a v7.6+ release is FXP progress reporting so you could verify that it's still in progress but because that might affect my socket stability work I decided against it for the next release.

I've also found a huge bug! In my 10 connection at a time upload case for stefano I found that I was getting some weird results. It turns out the server was passing the port re-use flag to the bind function which means it could give out the same port to 2 different PASV calls with the possible result that 2 files would get switched when uploaded! I verified this goes back to 5.8.5 so it's nothing new. If you randomize the passive port range and use say 50 ports the chances are slim but it can happen. If you don't randomize (Random = FALSE for a Device) it's nearly impossible with a large range so that's a good work-around right now. My test case with just 2 randomize passive ports and 10 brand new connections all at the same time had issues, but that's why I try weird config test cases. I just don't personally use a FTP client that does multiple connections all at once so it never popped up before...

The reason this might be important is if you have a small passive port range and someone or many people all starting uploads at the same time it's possible the server thinks the transfer is in progress because it actually is! Just not yours... You might be able to confirm this if you can check the xferlog to see if the file that you got stuck on eventually reports itself in the transfer log. You can manually try to abort the transfer, which should work, and then check the log right away as well to see what it says. Check that it was being sent from the host you were FXPing from or that it was still in progress by looking at the total transfer time field. Another clue would be uploads to a .sfv verified dir and the server reporting a crc failure.

This is a bad error for non-sfv dirs to handle though. I can't believe we've gone this long without people diagnosing the problem and *****ing about it...

pion: I'm pretty convinced at this point that using my custom nxmydb and libmysql probably means neither are at fault. I don't know why you have this problem so often yet, but I've got 2-3 really good changes in v7.5 and we'll have to hope it helps. The restart logic is VASTLY better so even if not fixed it will be all automatic soon.

I still need to finish up a couple of things but should be out in a day or so provided I don't find anything new. I've probably put over 60 hours in the last 2 weeks alone on this, so I'm working as fast as I can :)

pion

05-17-2010, 03:55 AM

That's great news Yil, looking forward to test a new version now. Just got a bit worried when you said "a few days" the first time, then no updates for considerably longer than that, hehe.

Does your vastly improved restart logic mean that the daemon can restart in 1s instead of 10 minutes (as dir cache, or whatever is going on, is dragging its feet) ?

Flow

05-17-2010, 04:04 AM

/me drewl ...