Yil
04-17-2010, 02:24 PM
I know it's possible to leave feedback on google's code stuff, but I'm lazy and this got kind of long... I just thought I'd post a couple of bugs, ideas, and some lessons learned in my work with ioYil.
1) Moving directory trees doesn't update subdirs in the file/dir databases. Thus if you move a dated dir like 01xx or something none of the subdirs returned by search/dupe/etc are accurate until a rebuild is done.
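One way to fix this is a single prefix-rewriting UPDATE over the stored paths whenever a tree is moved. A minimal sketch, assuming a hypothetical one-column dirs table (the real dupeDB schema will differ):

```python
import sqlite3

# Hypothetical, minimal dupeDB-style table; the table/column names are
# assumptions, not nxTools' actual schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dirs (path TEXT PRIMARY KEY)")
db.executemany("INSERT INTO dirs VALUES (?)",
               [("/0110/foo",), ("/0110/foo/cd1",), ("/other/bar",)])

def move_tree(db, old, new):
    """Rewrite the stored path prefix for a moved directory tree,
    so subdirs stay accurate without a full rebuild."""
    db.execute("UPDATE dirs SET path = ? || substr(path, ?) "
               "WHERE path = ? OR path LIKE ?",
               (new, len(old) + 1, old, old + "/%"))
    db.commit()

move_tree(db, "/0110/foo", "/archive/0110/foo")
print(sorted(p for (p,) in db.execute("SELECT path FROM dirs")))
```

The `WHERE path = ? OR path LIKE ?` form updates the moved dir itself plus everything under it while leaving unrelated paths alone.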
2) There isn't a way to tell nxTools to rescan a directory tree for changes. This is useful after something like warchive goes and deletes stuff; otherwise you either have to rebuild the entire database (possibly scanning hundreds of thousands of dirs in an archive-type folder) or live with the results being off for a while. I used a mark-and-sweep approach: waitobjects prevented more than one rescan at a time, and a simple UPDATE call set a "rescan" column to 1 on all matching paths. Then I'd process all the dirs, setting the column back to 0 for each one found (or adding it with a 0), and when the scan finished I'd wipe all records under that path that still had a 1. This allowed normal operations to continue, so it didn't matter how long the rescan took. I intended to make the column a bitmask so multiple simultaneous runs could go forward, but left that for a later update. Also, without something like this, all new dirs created while the rescan is running are lost, as happens now.
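The mark-and-sweep rescan above can be sketched in three SQL steps. This is a minimal illustration (table/column names `dirs`, `path`, `stale` are assumptions, and the waitobject serialization is omitted):

```python
import sqlite3

# Sketch of the mark-and-sweep rescan: mark everything under the root as
# possibly stale, clear the mark for dirs found on disk, sweep the rest.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dirs (path TEXT PRIMARY KEY, stale INTEGER DEFAULT 0)")
db.executemany("INSERT INTO dirs (path) VALUES (?)",
               [("/arc/a",), ("/arc/b",), ("/arc/gone",)])

def rescan(db, root, found_on_disk):
    # 1) Mark: flag every record under the root.
    db.execute("UPDATE dirs SET stale = 1 WHERE path LIKE ?", (root + "/%",))
    # 2) Scan: clear the flag for dirs that still exist; add new ones with 0.
    for path in found_on_disk:
        db.execute("INSERT INTO dirs (path, stale) VALUES (?, 0) "
                   "ON CONFLICT(path) DO UPDATE SET stale = 0", (path,))
    # 3) Sweep: anything still flagged was deleted on disk.
    db.execute("DELETE FROM dirs WHERE path LIKE ? AND stale = 1", (root + "/%",))
    db.commit()

rescan(db, "/arc", ["/arc/a", "/arc/b", "/arc/new"])
print(sorted(p for (p,) in db.execute("SELECT path FROM dirs")))
```

Because only the final sweep deletes anything, normal inserts and lookups can continue while the scan runs, which is what makes long rescans tolerable.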
3) The new [resolve list] feature is insanely useful at walking directory trees! It does absolutely everything. Just keep calling it as you recursively descend its results. This is the perfect solution to #1 and #2.
4) See my comment in the general section post by isteana about auto-wiping and how it might be done. I assumed 2 or 3 new columns in the dupeDB with an index across them. The first was the drive letter, the second was whether it was wipeable (must have +x bit on all dirs in path), and the last was directory wipe order which could just be the creation time or some fudged value of that.
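The extra columns and index might look like this; the schema below is purely illustrative (column names are my invention), but it shows how the composite index makes "oldest wipeable dir on a full drive" a single cheap query:

```python
import sqlite3

# Illustrative dupeDB extension for auto-wiping: drive letter, wipeable
# flag, and wipe order, with one index across all three.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE dupe (
    path TEXT PRIMARY KEY,
    drive TEXT,          -- drive letter the dir lives on
    wipeable INTEGER,    -- 1 only if every dir in the path has the +x bit
    wipe_order INTEGER   -- creation time, or some fudged value of it
)""")
db.execute("CREATE INDEX wipe_idx ON dupe (drive, wipeable, wipe_order)")
db.executemany("INSERT INTO dupe VALUES (?,?,?,?)",
               [("/a/old", "D", 1, 100),
                ("/a/new", "D", 1, 300),
                ("/a/keep", "D", 0, 50)])

# When drive D fills up, pick the oldest wipeable dir first.
victim = db.execute("SELECT path FROM dupe WHERE drive = ? AND wipeable = 1 "
                    "ORDER BY wipe_order LIMIT 1", ("D",)).fetchone()
print(victim[0])
```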
5) I made use of a single small loader script in ioYil which was the entry point for every call from ioFTPD.ini. It would test for a global variable that indicated it was initialized and if found then it had no need to source anything. This created a big performance improvement over having to read/parse large scripts on every command invocation. I compared [config counter] to the stored value to catch when to reload stuff.
6) Going along with the above, ioYil was an all-or-nothing script: either it was configured and ready to run correctly, or it refused to do anything, which meant configuration errors were caught immediately rather than later. When it had to initialize itself it would read the config file and, for each module/feature the user wanted, load and verify what was needed. In my implementation, modules registered the script keywords they handled (called from the ioFTPD.ini file) as simple entries in an array. Loading all the modules at once almost never made things take longer from a user's perspective, because the TCL interpreter preloading feature pre-creates interpreters for each worker thread on startup and after rehashes. Thus initialization was almost always done before a user actually needed it. I just called the loader script with an initialization-type command via init.itcl.
7) Global "state" for the script was kept in [var] variables (the result of [array get]) and protected via a [waitobject]. There were a few instances where this made more sense than using the database.
8) I used OnNewDir coupled with OnFailedDir/MKD(Post event) to catch race conditions on directory creation and failure. Without the new OnFailedDir you either had to hope the directory actually got created right after adding it to the dupeDB in OnNewDir (in practice that's a good bet) or do it via MKD which was trickier and allows race conditions. It's now possible to cover all bases.
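The optimistic add-then-rollback pairing can be sketched like this (the handler names mirror the events above, but the dupeDB table and wiring are assumptions):

```python
import sqlite3

# Sketch of the OnNewDir/OnFailedDir pairing: record the dir optimistically
# so concurrent dupe checks see it, and roll back if the MKD actually fails.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dupe (path TEXT PRIMARY KEY)")

def on_new_dir(path):
    # Fired as the dir is created: insert immediately, closing the window
    # where a second racing MKD could slip past the dupe check.
    db.execute("INSERT OR IGNORE INTO dupe VALUES (?)", (path,))

def on_failed_dir(path):
    # Fired when the creation ultimately failed: undo the optimistic insert.
    db.execute("DELETE FROM dupe WHERE path = ?", (path,))

on_new_dir("/x/good")
on_new_dir("/x/bad")
on_failed_dir("/x/bad")
print([p for (p,) in db.execute("SELECT path FROM dupe")])
```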
9) I would suggest eliminating the customized sqlite implementation so it can be updated easier as sqlite fixes their own bugs. I was able to use either newer features of sqlite, or modified tables to do everything without changes. I think I kept the dirpath, the name, and the full path as separate entries so I didn't need the length based comparison function I think you used. There is also a new way to register your own operators at runtime which is cool but not sure of the performance. I think I also used the NOCASE modifier for the table column so case didn't matter.
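Both tricks mentioned here, separate dirpath/name/fullpath columns and case-insensitive matching, work with stock sqlite. A minimal sketch (column names are illustrative, not nxTools' schema):

```python
import sqlite3

# COLLATE NOCASE on the name column gives case-insensitive dupe lookups
# with stock sqlite, so no custom comparison function is required.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE dupe (
    dirpath  TEXT,                 -- parent directory (VFS)
    name     TEXT COLLATE NOCASE,  -- entry name, compared case-insensitively
    fullpath TEXT                  -- kept separately, so no length-based
                                   -- comparison function is needed
)""")
db.execute("INSERT INTO dupe VALUES "
           "('/mp3/', 'Some.Release', '/mp3/Some.Release')")

# The lookup matches regardless of the case the uploader used.
hit = db.execute("SELECT fullpath FROM dupe WHERE name = ?",
                 ("some.release",)).fetchone()
print(hit[0])
```

Note that NOCASE only folds ASCII A-Z by default, which is usually fine for release names; equality and ORDER BY on that column both honor the collation.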
10) I would REALLY recommend not making the user define something for use by the script that already exists elsewhere. In particular with the [mountpoints] and [sections] commands it's trivial to infer the configuration so no drive paths will ever be required. Everything in ioYil was via VFS path against the default.vfs file. That meant a user just had to edit the .vfs path and the script would immediately work.
11) Use the ioArgs variable! It's a HUGE help since you no longer have to parse OnNewDir style quoted args! And it never gets it wrong now! This was a big change and cut out a lot of complexity from ioYil once I decided to make the server do the work for us :)
12) I forget if nxTools used physical dir paths or VFS paths, but ioYil was a weird hybrid: it handled everything as VFS paths until the last minute and then resolved them to real paths. I'd be happy to discuss the pros/cons of both approaches.
13) I'd recommend against symlinks that need to be updated all the time like latest dirs, etc. Use the virtual directory feature now. I wrote the nxSearch.itcl script as a quick example of how to create search results on demand and it just pulled the info from your dupeDB! It's really cool. I intended to use similar scripts for my own /Latest, /Incomplete, /Search, etc.
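The core of an on-demand /Latest is just a query against the dupeDB, generated per request so there are no symlinks to keep updated. A sketch under an assumed two-column schema:

```python
import sqlite3

# Sketch of an on-demand virtual /Latest listing pulled straight from a
# dupeDB, in the spirit of nxSearch.itcl; the schema is an assumption.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dupe (path TEXT, created INTEGER)")
now = 1000
db.executemany("INSERT INTO dupe VALUES (?,?)",
               [("/mp3/a", now - 30),
                ("/mp3/b", now - 10),
                ("/mp3/c", now - 20)])

def latest(db, limit=10):
    """Newest dirs first; regenerated on every listing, never stale."""
    return [p for (p,) in db.execute(
        "SELECT path FROM dupe ORDER BY created DESC LIMIT ?", (limit,))]

print(latest(db, 2))
```

/Incomplete and /Search are the same idea with a different WHERE clause, which is why one small virtual-directory script per view covers all of them.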
14) Assume drive spanning is possible. You could use [Resolve list] or just [resolve mount] to get the real paths in case there is more than one. This is important, for instance, when determining whether a directory has subdirs, files, etc.
15) And please... don't hesitate to ask for a feature. Check out the new iTCL.txt file for all the commands and Events.txt for the events. Almost all the new features were because I wanted to do something in ioYil. Originally I stupidly made the script do something which took a lot of work and then got smart later on and made the server do it... No sense you doing the same.
As an example, the server currently has a real-to-VFS path converter which it applies to the target of NTFS junctions in "symlink" mode so it can show those dirs as the links they really act like. I think I found a way around needing this in ioYil, but if you do need it, it's simple enough for me to wrap and export the functionality so you don't have to write it, and the server/script will stay consistent in providing this function.