Can't seem to get the skiplist to work

brackebuschtino

04-03-2014, 12:39 PM

Hello, I am trying to get the skiplist to work, without luck. I checked the help file and used several regex testers to validate my regexes, and they came out valid. But the folders are neither highlighted nor skipped as expected. After reading several threads here in the forum, I checked the points you repeatedly raise in your responses:

no per site skiplist in use (yet, but planned to have)
global skiplist:
"Enable skiplist" > checked
"Skip 0 byte files" > not checked
"Skip empty folders" > checked
"Display skip items in" > red
"When to skip" > Both
Mask > .*[^(substring)]$
Compare > Name of Folder

I want to match all folders that DO NOT MATCH the substring. I plan to extend this substring with other substrings via the pipe symbol, which is why I used the grouping. This pattern doesn't work.

This mask doesn't match either.

Mask > .*substring$
Compare > Name of Folder

I also tried to use 'Path of Folder' for comparison, since on the server I am at the 3rd or 4th sublevel below root. But this didn't fix the issue.

I don't understand why, and I feel pretty helpless since I cannot debug the transfer. Where is my mistake?

bigstar

04-03-2014, 01:56 PM

FlashFXP uses a tiny subset of the regular expression syntax.

It doesn't support a reverse evaluation, such as matching everything but <pattern>

See FlashFXP Help File - Pattern Matching (https://oss.azurewebsites.net/webhelp/Topics/pattern_matching.htm) for more information.

brackebuschtino

04-03-2014, 02:23 PM

With the help of that site I created my non-working regexp. I know it does not support reverse evaluation, which is why I tried to combine the substrings as groups inside the [^<char set>] notation, which means to match if not containing any of the characters in <char set>. So, if this is supported, why don't groups of substrings like [^<(group1|group2|group3)>] work? Matching a single substring and matching a group of substrings both work. Somehow it should be possible to negate that expression, shouldn't it?
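Editor's note: the question above can be illustrated with standard regex semantics. The sketch below uses Python's `re` module purely as a stand-in; FlashFXP's own pattern subset may behave differently, and the folder names are made up for the demonstration. Inside square brackets, `(`, `)` and `|` are ordinary characters, so `[^(substring)]` means "one character that is none of ( s u b t r i n g )" rather than "not the word substring".

```python
import re

# Inside [...] the parentheses are literal characters, not a group.
# The class therefore excludes the single characters ( s u b t r i n g ).
mask = re.compile(r".*[^(substring)]$")

# Hypothetical folder names, chosen only to show the effect.
names = ["folder-substring", "folder-other", "folder-data"]

# Only the LAST character decides: it must not be one of the listed letters.
results = {name: mask.search(name) is not None for name in names}

for name, hit in results.items():
    print(f"{name}: {'match' if hit else 'no match'}")
```

Note that "folder-other" is rejected as well, simply because it ends in "r", one of the letters in the class.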

Anyway, taking your suggestion to use *.*substring* instead of .*substring$ doesn't work either. The folder containing the substring doesn't get highlighted in red - neither when comparing by Name of Folder nor Path of Folder.

Please, can you tell me what to configure for a server path of /level1/Data_From_(2014)-Substring, taking care of the data inside, which can have any extension, and also further directories?

bigstar

04-03-2014, 03:24 PM

If the pattern you're trying to match ends with "-Substring" then you could do *-substring and compare by any

If that works you can try to refine it by using name of folder instead of any and see if that provides you with satisfactory results.

brackebuschtino

04-03-2014, 03:38 PM

That works. Now the final problem is to negate this expression one way or another, just to drop these.
If this is really impossible, would you mind implementing an allowlist, so one can save oneself from writing hundreds of entries into the skiplist just to filter for a single match? An allowlist would work around the unsupported regexes.

What do you say?

bigstar

04-03-2014, 09:17 PM

I think it would be more productive to add PCRE (Perl Compatible Regular Expressions) support.

This is something I can implement right now in FlashFXP v5.

My thought is to add support using the following syntax "regex:<pattern>", this way both can co-exist.

Here's a v5 build with PCRE support using the regex:<pattern> syntax as mentioned above.
http://get.flashfxp.com/5.0/FlashFXP50_3724_Setup.exe

Currently regex is limited to the skip list and highlights, but I will add support for everywhere else in the next update, probably tomorrow or Monday.

brackebuschtino

04-04-2014, 05:10 AM

PCRE support sounds great. Thanks for providing a beta that fast. I take it positive and negative lookahead (http://www.regular-expressions.info/lookaround.html) is supported, then? I installed the linked beta version and added the filter .*-(?!(pattern1|pattern2)) to match all subdirectory names that do not match any of the grouped patterns, but I get no results.

A general thought regarding your solution:

Actually I highly appreciate extended or full RegExp support. But, I suppose, this is the faster way to implement my suggestion regarding an allowlist. However, I'm afraid that extended regex capability will also require more support for inexperienced users who want some kind of allowlist. Logically, skipping content means to me skipping "everything that DOES NOT MATCH a rule" specified as a filter, whereas an allowlist means allowing "everything that DOES MATCH a rule" specified as a filter.

From the UX point of view, one never knows what kind of content a directory may contain. So making use of the skiplist requires that a server contains semantically uniform content, so that rules, once defined, skip content as wished. But since many public servers host content one can never know in advance, the better way to filter is to define everything one would like to allow for download. This would dramatically decrease the effort required to define skip rules, since one would not need to sit by and watch whether some content slips through because its pattern was not added to the skiplist.

Of course I do understand that coding another configuration level, globally and per site, requires more effort from you, but I'm pretty sure that in the end users, especially non-developers, would be much more thankful for this feature than for the PCRE support alone. I think you get what I mean.

bigstar

04-04-2014, 10:59 AM

Make sure that the line you add into the skip list or highlights is added as
regex: .*-(?!(pattern1|pattern2))
As I previously indicated, regex support in this build only works in the skip list or highlights.

I plan to expand that to the rest of the program today.

What might be more suited for what you desire is to use the Selective Transfer feature.

FlashFXP > Tools > Selective Transfer > Edit

This allows you to create multiple rule sets. A rule set is an individual set of rules that determine whether a file is skipped or transferred. Rule sets are completely independent of the skip list.

You can quickly switch between active rule sets via the dropdown arrow to the left of the "transfer queue" toolbar button.

You can also bind selective transfer rule sets to individual queue items or scheduled tasks.

The important thing to remember about selective transfer rules is that by default the default rule set is always used unless you manually switch to another rule set or change the rule set for the queue items.

The next build will allow regex in the selective transfer rules.

brackebuschtino

04-04-2014, 11:54 AM

Make sure that the line you add into the skip list or highlights is added as
regex: .*-(?!(pattern1|pattern2))
I added this expression once including regex: and once without it, and neither pattern makes folders appear highlighted in red. I don't know if I am doing something wrong or if the expression is not processed correctly. I use the exact same settings as in my opening post, only with a different regex, and nothing is highlighted. :question:

The only time a folder was highlighted was when I applied your previous suggestion
If the pattern you're trying to match ends with "-Substring" then you could do *-substring and compare by any
which matched the wrong folder.

Regarding the "Selective Transfer":

I tried to cover my need with this feature a long time ago but dropped it since I couldn't figure out how the rules must be expressed (language/pattern/etc.). I also checked the help file but couldn't get any results. I don't remember anymore whether it was for the same reason (no pattern negation).

Can you provide me with an example for my case, please?

bigstar

04-04-2014, 04:31 PM

Pattern:
regex: .*-(?!(one|two))
Set the Scope to "Name of folder" to skip the folder, or to "any" to skip the folder and all content within it.

and the result would be
dir-one
dir-two
dir-three
dir-four
something

If it's not working like that, please give me a little more time to finish these changes and I'll link you an update.

You'll be able to use the pre-existing pattern matching or the new regex pattern matching anywhere within FlashFXP that uses pattern matching.

brackebuschtino

04-04-2014, 06:04 PM

Thanks for the response. It is not (yet) working. I will wait now for your next release. Thanks for your effort! :)

bigstar

04-04-2014, 07:15 PM

Here's the update
http://get.flashfxp.com/5.0/FlashFXP50_3725_Setup.exe (http://get.flashfxp.com/ftp/client/download/5.0/FlashFXP50_3725_Setup.exe)

I've made a small change to the syntax prefix
rx .*(txt|log)$

notice instead of regex:<space> the prefix has changed to just rx<space>

bigstar

04-04-2014, 10:37 PM

I discovered a couple minor bugs.
http://get.flashfxp.com/5.0/FlashFXP50_3726_Setup.exe

brackebuschtino

04-05-2014, 06:24 AM

Thanks for the update. I get at least more matches, but not (yet) the desired behavior. When testing the pattern with an online PCRE regex tester (http://www.regexplanet.com/advanced/php/index.html) I discovered a potential reason for that.
Given these two folders
/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1
/subdir/subdir/.../Author2_-_Title2_(5678)-Publisher2

And given this regex
rx .*-(?!publisher1)
I would expect that the second folder would be highlighted in red, but in fact both folders are highlighted because the whole path
/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1
/subdir/subdir/.../Author2_-_Title2_(1234)-Publisher2
matches this pattern.

The desired behavior is this:
/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1
/subdir/subdir/.../Author2_-_Title2_(1234)-Publisher2
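Editor's note: a likely reason both folders match is backtracking. The names contain an earlier hyphen (in "_-_"), so when the lookahead rejects the last hyphen, `.*-` can fall back to the earlier one, which is followed by "_Title..." and therefore passes. The sketch below shows this with Python's `re` as a stand-in for PCRE; the case-insensitive flag is an assumption added here so that the difference between "publisher1" and "Publisher1" does not mask the effect, and FlashFXP's actual matching mode is not confirmed by the thread.

```python
import re

paths = [
    "/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1",
    "/subdir/subdir/.../Author2_-_Title2_(1234)-Publisher2",
]

# The pattern from the post, compiled case-insensitively (an assumption).
pattern = re.compile(r".*-(?!publisher1)", re.IGNORECASE)


def matched_text(path):
    """Return the part of the path the pattern consumed, or None."""
    m = pattern.search(path)
    return m.group(0) if m else None


for p in paths:
    # For the first path the match stops at the hyphen inside "_-_".
    print(repr(matched_text(p)))
```

Anchoring the alternative inside the lookahead to the end of the name, as later replies in this thread do, removes that escape route.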

Also, is there an explanation of how the different regex scopes are evaluated? I would love to understand how the evaluating routine expects to see the line being evaluated. Put differently: from where to where does the path that is evaluated for the "Name of File" scope run? The same goes for the other scopes. Understanding this would help to build working patterns. Sometimes a single missing or superfluous character is enough to prevent a pattern from working. And since there is no way to debug the program, searching for the reason can become a search for a needle in a haystack. :dizzy:

bigstar

04-05-2014, 09:18 AM

The compare scope is outlined in the help file; pressing F1 on the skip list tab takes you to where it's explained.
FlashFXP Help (https://oss.azurewebsites.net/webhelp/Index.htm?context=740)

I think what you need is something like this
((.*)-!?(publisher1$|publisher2$))
I found the following website very good for visualizing the logic path used for the comparison
https://www.debuggex.com/
And they have a very nice cheatsheet
https://www.debuggex.com/cheatsheet/regex/pcre

For quickly testing regex within FlashFXP I found the Mask Select feature (Ctrl+S) works very well and a big plus is that you can test against the file listing to see exactly what is matched based on the selection.

I had thought about a way to test and evaluate expressions and while I could add a way to do this within the filters dialog it would limit the functionality to this single area, since the goal is to have regex supported everywhere there is no simple way to add a test for each situation. Right now I think using the Mask Select is a good way to evaluate the pattern.

I have discovered a couple more places where the new regex style pattern matching isn't working.
Tools > Server file search > result sub-search
View > Active Edits > search
Options > File Associations > File Patterns

As far as I know regex is working everywhere else; if not, please let me know. I will re-test once these have been corrected, however this probably won't be until Monday at the earliest.

One thing I forgot to mention is that currently the regular expressions are case-sensitive.

This can be changed using the i flag to ignore case. I am not sure whether this should be changed to use case-insensitive matching by default or not. However, since the original pattern matching is case insensitive, it would make sense for regex matching to be as well.

brackebuschtino

04-05-2014, 11:05 AM

Beginning with the last point:

I was already wondering whether the regex evaluation is case (in)sensitive. Since I am used to writing regular expressions like /^(the|ex|pre|ssion)/i, I didn't have the impression that the format used in the filters dialog allows for any switches. At least I had no idea where to add them. Simply append them? Or use delimiters, as is generally done?

Regarding the filter box:

I found that it's pretty cumbersome to open the filter box, type the regex and then have to close it to see whether it matches, only to open it again and try another version, especially as there seems to be no hotkey for it. Much handier would be immediate feedback in the underlying window: a real-time 'onChange' evaluation, so to speak. Do you think that could be realised?

Regarding other places having issues with regexes:

I don't think every section needs to support regexes. Take the 'Options > File Associations > File Patterns' section. File type associations typically look like
*.ext1, *.ext2, *.ext3, ...
In my opinion the only thing a regex could do here is to group these like so
\.(ext1|ext2|ext3|...)
In the end all extensions must be listed anyway, which to me rules this section out for regex support.

Regarding the testing recommendations:

Thanks for these hints. I'll definitely check them out. I didn't know about the 'Mask Select' feature. There seems to be much more under the hood that I didn't know of so far. Thank you! :cool:

Basically, the whole topic is not so urgent that you should spend all your spare time on it. It's the weekend. Enjoy it!

bigstar

04-09-2014, 09:00 AM

I made some additional improvements, you can download the latest build from within FlashFXP via the main menu under Help > Check for new version.

The main improvement is that if you enter an invalid regex syntax the field background color turns light red and I added an Apply button to the filter box so that you can apply the changes without closing the dialog.

I also changed the regex to ignore case by default.

I think you might be right about the File Associations, for now I have held off on implementing regex for this.

brackebuschtino

04-09-2014, 09:36 AM

Thanks a lot for your effort in this!

I also changed the regex to ignore case by default.

I'm afraid this is not a good idea. In my case in fact it makes a difference. Better to make it case sensitive by default and allow for adding a switch, because one can turn off the case sensitivity rather than turning it on.

Just one thing i'd like to come back to:

Given the following example regex, how, or rather where, would the common regex switches (i, m, u, etc.) be placed in the specific notation for this app? Appended?

rx .*-(?!publisher1)

bigstar

04-09-2014, 01:52 PM

I did not realize that PCRE does not have a way to turn off case sensitivity. This puts some kinks into my plan; I will need to come up with another way of handling "ignore case", perhaps a global setting where this can be toggled off for those who need it.

In this type of situation I imagine that 9 out of 10 times you would not want case sensitive matching. I am not even sure if I can come up with a good example where I'd need case sensitivity.

The ideal solution would be to make this part of the entry settings but this is going to require some design changes. Something I am not sure if I can justify at this time.

It took me some time to figure out the proper way to ignore case with PCRE, I am not 100% sure if this is correct.
rx (?i).*-(?!publisher1)

DayCuts

04-10-2014, 10:57 AM

Wrote quite a lengthy/detailed response breaking down the problems with your attempts, the misunderstanding about how lookarounds work, and how to design a pattern that works but my browser crashed so you will just have to settle for the footnotes and research yourself to get a better understanding.

Pure PCRE solution:
(?im)^(?!.+-(publisher1|publisher2)$).+$
/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1
/subdir/subdir/.../Author2_-_Title2_(1234)-PublisherX
/subdir/subdir/.../Author2_-_Title2_(1234)-Publisher2
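Editor's note: the pattern above can be checked against the three listed paths. This is a minimal sketch with Python's `re` as a stand-in for PCRE; the inline `(?im)` flags are supported by both, but FlashFXP's behaviour is assumed rather than verified here.

```python
import re

# Match any line that does NOT end in -publisher1 or -publisher2
# (case-insensitive, multi-line anchors).
skip_all_but = re.compile(r"(?im)^(?!.+-(publisher1|publisher2)$).+$")

paths = [
    "/subdir/subdir/.../Author1_-_Title1_(1234)-Publisher1",
    "/subdir/subdir/.../Author2_-_Title2_(1234)-PublisherX",
    "/subdir/subdir/.../Author2_-_Title2_(1234)-Publisher2",
]

# Paths the pattern matches, i.e. the ones that would be highlighted/skipped.
flagged = [p for p in paths if skip_all_but.search(p)]
print(flagged)
```

Because the look-ahead is anchored at the start and its alternatives are tied to `$`, the earlier hyphen in "_-_" can no longer produce a match by backtracking.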

Other notes:
What might be more suited for what you desire is to use the Selective Transfer feature.
I agree with this suggestion, regex pattern matching was not designed for 'non-matching'. Although it can be done the internal processing is more expensive for anything other than use with single characters, as is the use of lookarounds, etc. While the above pattern should work in the Skip List if PCRE matching is now also possible within selective transfer rule sets I would highly suggest ditching the expensive 'non-match' style negative lookahead pattern and opting for a normal 'match' style pattern.

I've made a small change to the syntax prefix
Can I suggest you reinstate the colon as part of the prefix? There should be no circumstances in which somebody might try (or be able) to match 'rx:<space>...' as a literal (non regex) pattern, however there is the possibility of somebody trying to match 'rx<space>...'.

It took me some time to figure out the proper way to ignore case with PCRE, I am not 100% sure if this is correct.
rx (?i).*-(?!publisher1)
Your use of (?i) here is correct. Given that FlashFXP is a Windows client and Windows (and the users thereof) mostly think in a case-insensitive manner, it might be okay to make it case-insensitive by default. Just so long as the case-sensitive modifier can be used within the pattern. (?-i) would force case sensitivity.

Modifiers/switches can be used anywhere in a pattern. When a modifier is seen, it is applied to the remainder of the pattern, or until switched by another modifier. The basic form of a modifier is (?[onswitches][-offswitches][:regex]). This support for :regex means you can do things like (?i)^x(?-i:Y)z to match any case form of xyz as long as the Y is capitalized, where (?-i:Y) is equivalent to (?-i)Y(?i).
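Editor's note: the scoped-modifier example can be tried out as follows. This is a minimal sketch using Python's `re` (version 3.6 or later) instead of PCRE. As far as I know, Python accepts only the scoped `(?-i:...)` form for switching a flag off, so the unscoped `(?-i)Y(?i)` spelling mentioned above is PCRE syntax and is not exercised here.

```python
import re

# (?i) makes the whole pattern case-insensitive; (?-i:Y) switches that
# off again for the letter Y only.
pat = re.compile(r"(?i)^x(?-i:Y)z")

candidates = ["xYz", "XYZ", "xyz", "XyZ"]
checks = {s: bool(pat.search(s)) for s in candidates}

for s, ok in checks.items():
    print(f"{s}: {'match' if ok else 'no match'}")
```

Only the variants with an upper-case Y match, regardless of the case of x and z.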

A great regex introductory tutorial can be found at Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns (http://www.regular-expressions.info/)

brackebuschtino

04-10-2014, 11:21 AM

Thanks for your reply and the suggested pattern. In fact I did my homework and searched the web as well as asked other developers, which resulted in a negative lookbehind rather than a lookahead.
rx .*(?<!-PublisherX)$
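Editor's note: a quick check of the lookbehind pattern above, using Python's `re` as a stand-in for PCRE and leaving out the FlashFXP-specific `rx ` prefix. The two folder names are taken from earlier examples in this thread; which of them FlashFXP would actually highlight also depends on the compare scope, which this sketch does not model.

```python
import re

# Match any name that does NOT end in "-PublisherX".
keep_only = re.compile(r".*(?<!-PublisherX)$")

names = [
    "Author1_-_Title1_(1234)-Publisher1",
    "Author2_-_Title2_(1234)-PublisherX",
]

# Names the pattern matches, i.e. the ones a skip list would drop.
skipped = [n for n in names if keep_only.search(n)]
print(skipped)
```

Because the lookbehind sits directly in front of `$`, it only ever inspects the last characters of the name, so earlier hyphens cannot cause a false match.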

I would highly suggest ditching the expensive 'non-match' style negative lookahead pattern and opting for a normal 'match' style pattern.
The issue with this solution is that one is forced to manually select all highlighted results and put them into the queue, while using the skiplist with the above pattern (or yours) allows for putting a complete directory into the queue and leaving the rest to the application, which will reliably drop all non-matching queue items. This is exactly what I want. If I were satisfied with the manual way of scanning a folder and picking the cherries, I wouldn't have asked for the skiplist improvement.

Regarding the "expensiveness":
I think that with today's computing power this plays no role. Furthermore I think that a little more time for regex processing results in a less intensive server workload. ;) Also I think that not everybody using FFXP has an active skiplist that might have an impact on the transfer speed.

In fact I'm OK with any implementation (skiplist, selective transfer) that keeps the current state (PCRE support with lookahead/lookbehind) and allows matching as exactly as wished.

[...]reinstate the colon as part of the prefix?[...]
I agree with this suggestion. I also found the blank alone to be potentially more confusing than having the colon visually mark the delimitation. Maybe the blank could be dropped completely, as the colon could satisfy the requirement for a delimiter?

Thanks a bunch for the on/off-switch lesson. I didn't know that yet. With this feature available I absolutely agree with your suggestion to make the pattern matching case insensitive by default.

bigstar

04-10-2014, 04:06 PM

Can I suggest you reinstate the colon as part of the prefix? There should be no circumstances in which somebody might try (or be able) to match 'rx:<space>...' as a literal (non regex) pattern, however there is the possibility of somebody trying to match 'rx<space>...'.

Both rx<space> and rx:<space> can be used depending on your own preference.

It made more sense to me to simplify the prefix to rx<space> because in most instances trailing spaces are automatically stripped off.

Just so long as the case-sensitive modifier can be used within the pattern. (?-i) would force case sensitivity.

Thank you for the clarification, I was not aware of using - to reverse the modifier.

I don't use regexp as much as one might think and most of this is new to me as well :)

DayCuts

04-12-2014, 05:36 AM

Thanks for your reply and the suggested pattern. In fact I did my homework and searched the web as well as asked other developers, which resulted in a negative lookbehind rather than a lookahead.
rx .*(?<!-PublisherX)$
Yep, in fact a lookbehind is the more appropriate selection in this case since the part of the string you're most interested in is at the end. Less expensive as well.

The issue with this solution is that one is forced to manually select all highlighted results and put them into the queue, while using the skiplist with the above pattern (or yours) allows for putting a complete directory into the queue and leaving the rest to the application, which will reliably drop all non-matching queue items. This is exactly what I want. If I were satisfied with the manual way of scanning a folder and picking the cherries, I wouldn't have asked for the skiplist improvement.
I was referring to use in the Selective Transfer rules when I suggested simplifying, which already gives the option to Transfer or Skip and a choice of File and Folder matching. You could use a combination of the skip list for most rules, and a selective transfer ruleset for those that require negating.

Regarding the "expensiveness":
I think that with today's computing power this plays no role. Furthermore I think that a little more time for regex processing results in a less intensive server workload. ;) Also I think that not everybody using FFXP has an active skiplist that might have an impact on the transfer speed.
Abundant resources are no excuse not to do things in the most efficient way possible. While in a normal regex matching situation (one pattern against one string or file) the cost may be negligible, in a situation where you may end up with multiple look-around rules among a list of dozens of other rules that all have to be checked against a potentially huge list of files/directories, the expense can add up quickly to a noticeable delay. Admittedly you would likely need a complex skip list and a huge directory listing to notice anything on the average system these days.

Both rx<space> and rx:<space> can be used depending on your own preference.

It made more sense to me to simplify the prefix to rx<space> because in most instances trailing spaces are automatically stripped off.
My concern here was more to do with the difference between something like "rx abc*.mp?" being processed as a basic glob or as PCRE. The results would be vastly different due to how the wildcard and period function in regular expressions. Requiring the colon would be a way of ensuring that somebody not familiar with regular expressions (or the support for them in the program) does not try to use a simple glob rule that is misinterpreted.
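Editor's note: the difference described above is easy to reproduce. The sketch below feeds the same rule text, `abc*.mp?`, to Python's `fnmatch` (glob semantics) and to Python's `re` (regex semantics). Both modules are stand-ins for FlashFXP's two matchers, and the file names are invented for the demonstration.

```python
import fnmatch
import re

rule = "abc*.mp?"

# Hypothetical file names: one a typical glob target, one not.
names = ["abcdef.mp3", "abXm"]

# Glob reading: "abc", anything, ".mp", then exactly one character.
as_glob = [n for n in names if fnmatch.fnmatchcase(n, rule)]

# Regex reading: "ab", zero or more "c", any one character, "m", optional "p".
as_regex = [n for n in names if re.search(rule, n)]

print("glob :", as_glob)
print("regex:", as_regex)
```

The two readings select disjoint sets of names here, which is the kind of silent misinterpretation a distinctive prefix is meant to prevent.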

brackebuschtino

04-12-2014, 10:36 AM

Yep, in fact a lookbehind is the more appropriate selection in this case since the part of the string you're most interested in is at the end. Less expensive as well.
Unfortunately this doesn't seem to allow for grouping or appending an additional group that might exist. At least it didn't work for me with:

rx .*(?<!-PublisherX(_int)?)$
rx .*(?<!-(PublisherX|OtherY))$

DayCuts

04-12-2014, 07:29 PM

Learned something when trying to figure out why (_int)? worked in a look-ahead but not a look-behind. The answer is that in almost all regex flavors (language implementations) a look-behind must be a fixed-width expression. Not only does this mean you cannot include ? + *, which rules out anything like (_int)?, but you also cannot include optionals of different lengths like (pub|longpublishername). Ultimately this means that a look-behind is not a viable option for your purposes unless the language the program is developed in is .NET or ABA.
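Editor's note: one common way around the fixed-width restriction is to chain several fixed-width look-behinds, one per ending to be excluded, since each of them is checked at the same position. The sketch below shows the idea with Python's `re`, which also enforces fixed-width look-behinds. Whether FlashFXP's PCRE build accepts the chained form was not tested in this thread, and the short names are invented.

```python
import re


def compiles(pattern: str) -> bool:
    """Report whether Python's regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The optional group makes the look-behind variable-width, so it is rejected.
variable_width_ok = compiles(r".*(?<!-PublisherX(_int)?)$")

# Workaround: one fixed-width look-behind per excluded ending.
chained = re.compile(
    r".*(?<!-PublisherX)(?<!-PublisherX_int)(?<!-OtherY)$"
)

names = ["a-PublisherX", "a-PublisherX_int", "a-OtherY", "a-Someone"]
flagged = [n for n in names if chained.search(n)]

print(variable_width_ok)
print(flagged)
```

The price is that every variant has to be spelled out, which is why a look-ahead with alternation stays more compact for longer lists.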

Now onto a solution... first of all one reason optionals were not working for you is that you are forgetting part of the expression. The modifiers and anchors are important. I did come up with a working solution using a look-behind, but my test list used equal length publisher names. It failed thereafter due to the fixed-length requirement but here it is anyway...
(?im)(?(DEFINE)(?<publist>(?:publisher1|publisher2)))^.+(?(?<=_int$)(?<!-(?&publist)_int)|(?<!-(?&publist))$)

An updated version of the original pattern...
(?im)^(?!.+-(?:publisher1|publisher2)(?:_int)?$).+$

brackebuschtino

04-14-2014, 10:50 AM

Thanks a lot for this lesson. It turns out that using lookarounds is very tricky. I would never have been able to adapt this pattern on my own. :confused: