01-18-2008, 02:25 AM
Yil
Too much time...
 
Join Date: May 2005
Posts: 1,194

I had a fragile solution that I used for a while a few years ago. I wrote a Perl script that would connect to an FTP server and request a recursive directory listing of the server (anybody remember me adding that into the very first 6.0 release? hehe). If that failed because it wasn't supported or wasn't allowed, it would manually walk the FTP to create the listing. I ended up putting a lot of work into making that latter case very efficient. It would compare the previous listing to what was on the server and skip re-listing directories whose timestamps matched and which had no subdirectories (or whose subdirs were all excluded via a regex, so you could make it ignore complete tags, etc). It also listed directories by path without CWDing into them, and it remembered dirs that had permission problems before, so if nothing had changed it would avoid trying them again. Finally, it would recreate the remote directory tree locally and copy matching files (like *.nfo, *.diz, etc) into it. I eventually wanted to turn it into a true mirror script where the copy could go to a remote machine, but I never finished that bit.
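
Something like this rough Python sketch shows the shape of that walk (it is not the original Perl script; the host, login, and exclude regex are placeholders, and the timestamp diffing plus the *.nfo/*.diz copying are left out to keep it short):

[code]
# Rough sketch only: placeholder host/credentials, crude UNIX-listing parsing,
# and no timestamp comparison or file copying.
import re
from ftplib import FTP, error_perm

EXCLUDE = re.compile(r'(?i)/complete$')   # example: ignore "complete" tag dirs
denied = set()                            # dirs that were permission-denied before

def full_listing(ftp):
    """Ask for one recursive listing (LIST -R); many servers refuse this."""
    lines = []
    ftp.retrlines('LIST -R', lines.append)
    return lines

def walk(ftp, path='/', out=None):
    """Fallback: LIST each directory by path (no CWD), pruning excluded and
    previously denied dirs, and recursing into anything that looks like a dir."""
    out = [] if out is None else out
    if path in denied or EXCLUDE.search(path):
        return out
    lines = []
    try:
        ftp.retrlines('LIST ' + path, lines.append)
    except error_perm:
        denied.add(path)                  # remember it, skip it next time
        return out
    for line in lines:
        name = line.split(maxsplit=8)[-1]
        if name in ('.', '..'):
            continue
        full = path.rstrip('/') + '/' + name
        out.append(full)
        if line.startswith('d'):          # UNIX-style listings mark dirs with 'd'
            walk(ftp, full, out)
    return out

ftp = FTP('ftp.example.org')              # placeholder host
ftp.login('user', 'pass')                 # placeholder credentials
try:
    listing = full_listing(ftp)           # one shot if the server allows it
except error_perm:
    listing = walk(ftp)                   # otherwise walk it manually
ftp.quit()
[/code]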

I then created a simple search index from the local mirrors of the remote servers, keyed by just the last directory name, to get a name-to-location index. I also wanted to infer the release date of something from its path and find a way to store everything in memory efficiently, but I didn't implement either.
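
Building that name-to-location index over the mirror tree is only a few lines; here's a minimal Python sketch (MIRROR_ROOT is a placeholder, and the release-date and memory-layout ideas aren't attempted):

[code]
# Minimal sketch of the name-to-location index over the local mirror tree.
# MIRROR_ROOT is a placeholder for wherever the per-site mirrors live.
import os
from collections import defaultdict

MIRROR_ROOT = r'D:\mirrors'

def build_index(root):
    index = defaultdict(list)             # last directory name -> full paths
    for dirpath, dirnames, filenames in os.walk(root):
        name = os.path.basename(dirpath)
        if name:
            index[name.lower()].append(dirpath)
    return index

index = build_index(MIRROR_ROOT)
hits = index.get('some.release', [])      # every location holding that name
[/code]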

The other piece of the puzzle I wanted was driving updates externally rather than on a timer, for even faster updating. The timer event only checked a few dirs for changes, so it was fast, but driving it from events would catch the rare changes as well. My plan was to pull the new-dir and delete-dir events that had been added to the ioFTPD logfile since the last check, and to watch messages in a spam channel. Combined with a periodic refresh to catch files/folders moved manually behind the FTP's back, that should work well and couldn't hurt.
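
The log-driven part could look something like this sketch: remember the offset from the last pass and only parse lines appended since then. The log path and line format below are assumptions, not what ioFTPD actually writes, so the regex would need adjusting to the real format:

[code]
# Hedged sketch of the incremental log-follow idea: remember the byte offset
# from the last pass and only parse lines appended since then.  The log path
# and the line format in the regex are assumptions, not the real ioFTPD format.
import re

LOGFILE = r'C:\ioFTPD\logs\ioFTPD.log'    # assumed location
EVENT = re.compile(r'(NEWDIR|DELDIR):\s+"([^"]+)"')   # assumed line format

def poll_events(state):
    """Return (event, path) pairs appended since the offset stored in state."""
    events = []
    with open(LOGFILE, 'rb') as f:
        f.seek(state.get('offset', 0))
        for raw in f:
            m = EVENT.search(raw.decode('utf-8', 'replace'))
            if m:
                events.append((m.group(1), m.group(2)))
        state['offset'] = f.tell()        # resume from here next time
    return events

state = {}
for event, path in poll_events(state):
    print(event, path)                    # hook: update the mirror/index here
[/code]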

I probably should dig that stuff up from a few years ago, but it also makes a template for a solution for you. Use a similar tool that creates a full listing (which is actually cool because it could verify that dirs and files are the same across multiple FTPs), or use any mirroring script (there are many, including things like SuperFlexibleFileSynchronizer) to duplicate the remote directory tree, and then search that via a prepared listing such as one generated with find under cygwin or a similar Windows tool.
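
The "prepared listing" step is basically find-plus-grep; a tiny Python equivalent over a mirrored tree (paths are placeholders) would be:

[code]
# Tiny sketch of the prepared-listing approach: dump the mirrored tree to a
# flat text file once, then search that file instead of touching the FTPs.
import os

def dump_listing(root, outfile):
    with open(outfile, 'w', encoding='utf-8') as out:
        for dirpath, dirnames, filenames in os.walk(root):
            out.write(dirpath + '\n')
            for fn in filenames:
                out.write(os.path.join(dirpath, fn) + '\n')

def search_listing(listing, needle):
    needle = needle.lower()
    with open(listing, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f if needle in line.lower()]

dump_listing(r'D:\mirrors\site-a', 'site-a.lst')
print(search_listing('site-a.lst', 'some.release'))
[/code]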

Neoxed's solution would work very well for cooperating sites. Having each site update the database with its own changes, and periodically run around and double-check that nobody moved anything, is a great way to do things since everything is pretty much already there, and it could easily update two databases. If you added a way to support non-participating sites, that would cover all the bases.
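
Purely as an illustration of that cooperating-sites idea (not a description of how Neoxed's solution actually works), a shared table that every site writes its own add/remove events into, plus the periodic re-walk, could be as simple as:

[code]
# Purely illustrative: one table that every participating site writes its own
# dir add/remove events into; a periodic re-walk corrects anything moved
# behind the FTP's back.  The schema and names are made up for the example.
import sqlite3

db = sqlite3.connect('shared_index.db')   # stand-in for a real shared DB server

db.executescript("""
CREATE TABLE IF NOT EXISTS releases (
    site    TEXT NOT NULL,
    name    TEXT NOT NULL,
    path    TEXT NOT NULL,
    seen_at INTEGER NOT NULL,
    PRIMARY KEY (site, path)
);
""")

def newdir(site, name, path, ts):
    db.execute("INSERT OR REPLACE INTO releases VALUES (?, ?, ?, ?)",
               (site, name, path, ts))
    db.commit()

def deldir(site, path):
    db.execute("DELETE FROM releases WHERE site = ? AND path = ?", (site, path))
    db.commit()

def lookup(name):
    return db.execute("SELECT site, path FROM releases WHERE name = ?",
                      (name,)).fetchall()
[/code]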