
List Manipulate

This statement performs a function on an entire list to produce a new list. The lists used by the statement must all be stored in named variables or in track variables. When stored in named variables the statement is executed once in stepwise mode. When stored in track variables the statement is executed once for every file being executed in the current execution mode.

Where appropriate the function can be performed case insensitive and/or diacritic insensitive. Note that when case or diacritic insensitivity is specified you cannot make assumptions as to the representation of the values returned. The options control how values are matched, not how they are returned. Book and book may be treated as equivalent when case insensitive is specified but either can be returned.

Text fields may contain any of the escape sequences described in Escape Sequences.

All delimiter fields default to \~ if omitted.

In the function description a function followed by * implies that the source lists are preprocessed as follows unless the Do not compress option is checked:

Examples use a comma (,) for all list delimiters.


Functions

Union*
The two source lists are combined such that the resultant list contains a single instance of all items in either of the source lists. The order of items in the result list is first source list items then second source list items.

example 1:
    source list 1 --> 1,1,1,  2,3,4,5
    source list 2 --> 1,6,  7,

  Do not compress not checked
    result --> 1,2,3,4,5,6,7
  Do not compress checked
    result --> 1,  2,3,4,5,6,  7,

example 2:
    source list 1 --> 1,2
    source list 2 --> empty
  Do not compress not checked
    result --> 1,2
  Do not compress checked
    result --> 1,2,
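
The two Union examples can be approximated with the following Python sketch. It is an illustration only, not Yate's implementation: the function names are hypothetical, the compress step (trim each item, drop empty items) is an assumption inferred from the examples, and the case and diacritic options are not modelled.

```python
def split_list(text, delimiter=",", compress=True):
    """Split a delimited string into items.

    Assumption: compressing trims surrounding whitespace from each item
    and drops empty items, as the examples suggest.
    """
    items = text.split(delimiter)
    if compress:
        items = [item.strip() for item in items]
        items = [item for item in items if item]
    return items


def union(first, second, delimiter=",", compress=True):
    """Return each distinct item once: first-list items, then second-list items."""
    combined = split_list(first, delimiter, compress) + split_list(second, delimiter, compress)
    seen = set()
    result = []
    for item in combined:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return delimiter.join(result)
```

Calling union("1,2", "", compress=False) treats the empty second list as a single empty item, which is why the uncompressed result in example 2 ends with a trailing delimiter.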

Intersection*
The two source lists are combined such that the resultant list contains a single instance of all items which are in both of the source lists. The order of items in the result list preserves the order in the first source list.

example:
    source list 1 --> 1,1,1,2,,3, 4,5
    source list 2 --> 1,6,7,,4
  Do not compress not checked
    result --> 1,4
  Do not compress checked
    result --> 1,

If the fuzzy option is set, every item in the first list which fuzzy matches an item in the second list will be in the result list. Note that the Case Insensitive and Diacritic Insensitive options are used to determine duplicates in the first list. The fuzzy test uses Yate's fuzzy comparison logic to compare the two text strings. Various factors are used in a fuzzy comparison. The test always ignores alphabetic case, punctuation characters, diacritical marks and Unicode width differences. The Weight Exception Set is used to determine a set of words to be ignored. A fuzzy test returns a percentage representing the success of the comparison. By default this function returns a true value for a success of 80 or higher. You can change the threshold of the test by setting the named variable Fuzzy Threshold to an integer value in the range of 1-100.

The fuzzy test can be further configured to ignore the position of words when comparing. To force non positional testing, set the named variable Fuzzy Threshold to a negative number in the range of -1 to -100. The actual threshold used is the absolute value of the specified number. Using a negative threshold will result in a comparison of "John and Bill" being a 100% match to "Bill and John".

If the named variable Fuzzy Threshold is set to a value less than -100, 0, or greater than 100, the default of 80 will be used.
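
The threshold rules above can be summarised in a short Python sketch. The function name and the returned tuple are hypothetical, and the sketch assumes that an out-of-range value falls back to a positional comparison along with the default of 80. The fuzzy comparison itself is not reproduced here.

```python
DEFAULT_THRESHOLD = 80


def resolve_fuzzy_threshold(value):
    """Return (threshold, positional) for a Fuzzy Threshold setting."""
    if 1 <= value <= 100:
        # Positive values: positional comparison with the given threshold.
        return value, True
    if -100 <= value <= -1:
        # Negative values: word position is ignored, the absolute value is used.
        return abs(value), False
    # 0, less than -100 or greater than 100: fall back to the default.
    # Assumption: the fallback comparison is positional.
    return DEFAULT_THRESHOLD, True
```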

Remove All Matches*
The resultant list contains items in the first source list which are not in the second source list. The order of items in the result list preserves the order in the first source list.

example:
    source list 1 --> 1,1,1 ,2,3,4,5
    source list 2 --> 1, 3
  Do not compress not checked
    result --> 2,4,5
  Do not compress checked
    result --> 1 ,2,3,4,5

Remove One/Match*
The two lists are combined such that for each item in the second list, a single occurrence of the item in the first list is removed. The order of items in the result list is the same as the items appeared in the first source list.

example:
    source list 1 --> 1,1 ,1,2,3,4,5
    source list 2 --> 1,1, 3
  Do not compress not checked
    result --> 1,2,4,5
  Do not compress checked
    result --> 1 ,2,3,4,5
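
A minimal Python sketch of the one-removal-per-match behaviour, working on lists that have already been split. The function name is hypothetical, and removing the earliest occurrence is an assumption; the description only states that a single occurrence is removed.

```python
def remove_one_per_match(first, second):
    """For each item in `second`, remove one occurrence of it from `first`."""
    result = list(first)
    for item in second:
        if item in result:
            # list.remove deletes only the first matching occurrence.
            result.remove(item)
    return result
```

With the uncompressed lists from the example, the items "1 " and " 3" keep their spaces, so "1 " survives and " 3" removes nothing.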

Set*
The results list contains a single instance of each unique item in the source list. The order of items in the result list is the same as the items appeared in the source list.

example:
    source list 1 --> 1,1 ,1,2,,3,4,5,1,2
  Do not compress not checked
    result --> 1,2,3,4,5
  Do not compress checked
    result --> 1,1 ,2,,3,4,5

Sublist
The results list contains a range of elements from the source list. The range is specified as:

    index{,count}

The first list item is at index 0. If the index is negative, it positions from the end of the list, with -1 being the last list element. The count may be omitted, in which case it is assumed to be 1. If the resolved index is out of range, an empty string is returned. If the count extends past the end of the list, it is truncated.

example:
    source list 1 --> 0,1,2,3,4,5,6,7,8,9
    Specified range of 2,3
    result --> 2,3,4
    Specified range of -5,2
    result --> 5,6
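
The index and count rules can be sketched in Python as follows. The function name is hypothetical, and returning an empty list for a count of zero or less is an assumption, since the description does not cover that case.

```python
def sublist(items, index, count=1):
    """Return `count` items starting at `index`; negative indexes count from the end."""
    if index < 0:
        index += len(items)  # -1 resolves to the last element
    if index < 0 or index >= len(items) or count <= 0:
        return []  # resolved index out of range: nothing is returned
    # Slicing truncates automatically when the count runs past the end.
    return items[index:index + count]
```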

Combine*
The results list contains a single instance of each run of contiguous identical items in the source list. Identical items which are not adjacent are kept as separate entries. The order of items in the result list is the same as in the source list.

example:
    source list 1 --> 1,1 ,1,2,,3,4,5,1,2,2
  Do not compress not checked
    result --> 1,2,3,4,5,1,2
  Do not compress checked
    result --> 1,1 ,1,2,,3,4,5,1,2

Combine Counted*
The results list contains a single instance of each run of contiguous identical items in the source list, followed by the number of items in that run. The default key value separator, \k, is always used. The order of items in the result list is the same as in the source list.

example:
    source list 1 --> A,A ,A,B,,C,D,E,A,B,B
  Do not compress not checked
    result --> A≔3,B≔1,C≔1,D≔1,E≔1,A≔1,B≔2
  Do not compress checked
    result --> A≔1,A ≔1,A≔1,B≔1,≔1,C≔1,D≔1,E≔1,A≔1,B≔2
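
Counting runs of adjacent identical items maps directly onto itertools.groupby in Python. The sketch below is an illustration under assumptions: the function name is hypothetical, the input is an already split list, and the character ≔ stands in for \k only because that is how the separator is displayed in the examples.

```python
from itertools import groupby

KEY_VALUE_SEPARATOR = "≔"  # stand-in for \k, as displayed in the examples


def combine_counted(items):
    """Collapse each run of adjacent identical items into 'item≔count'."""
    return [
        f"{key}{KEY_VALUE_SEPARATOR}{len(list(run))}"
        for key, run in groupby(items)
    ]
```

Because only adjacent items are grouped, the later A and B runs are counted separately from the earlier ones.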

Reverse
The results list contains the reversed contents of the source list. No modifications are made to the items in the list.

example:
    source list --> A,B,C,D,E
    result --> E,D,C,B,A

Integer*
The results list contains a single instance of every item in the source list treated as an integer value. The order of items in the result list is the same as the items appeared in the source list.

example 1:
    source list 1 --> 1st field,, 2nd field, third field5
  Do not compress not checked
    result --> 1,2,0
  Do not compress checked
    result --> 1,0,2

example 2:
    source list 1 --> empty
  Do not compress not checked
    result --> empty
  Do not compress checked
    result --> 0

Filter
The results list contains only those items which match the specified filter test and filter data. The order of items in the result list is the same as the items appeared in the source list. The Case Insensitive option is ignored unless a Text test is chosen. The Diacritic Insensitive option is ignored except for Text tests other than the regular expression tests. The Match Words option is ignored unless a Starts With, Ends With or Contains test is chosen. For the purposes of this function a match whole word test fails if:
the first character in the from pattern is alphanumeric and the character preceding the match is alphanumeric.
   or if:
the last character in the from pattern is alphanumeric and the character following the match is alphanumeric.
The Is Audio File and Is Image File filters work on the presence of a matching filename extension in each list element.

The Text Matched by Regex and Text Not Matched by Regex functions only pre-process Yate escape sequences for named or track variables. All other escape sequences are passed to the regular expression parser. If the regular expression is invalid, the match fails; in that case Text Not Matched by Regex always succeeds. Typically, the text inserted by these escape sequences is further escaped so that it is treated as a sequence of literal characters by the regular expression parser. If you are inserting portions of the regular expression itself, you can disable this via the Do not escape inserted variable contents setting. The term variable applies to both track and named variables. You can validate and preview the regular expression with the Regular Expression Tester by using the appropriate buttons. Note that these functions are essentially contains tests, i.e. a list element succeeds if it contains text which matches the regular expression. To test for a match against an entire element, anchor the regular expression with the ^ and $ metacharacters.

example using filter case insensitive text contains "one":
    source list --> one,not One again,two,three
    result --> one,not One again
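
The difference between a contains style regular expression test and an anchored one can be illustrated in Python. This is a sketch only: the function name is hypothetical, and Python's re module stands in for Yate's regular expression engine, whose syntax may differ in detail.

```python
import re


def filter_regex(items, pattern, negate=False):
    """Keep items containing a match (or, with negate=True, items that do not)."""
    try:
        regex = re.compile(pattern)
    except re.error:
        # An invalid pattern never matches, so the "not matched" test keeps everything.
        return list(items) if negate else []
    return [item for item in items if (regex.search(item) is not None) != negate]
```

For the list one,not One again,two,three the pattern o keeps the first three items because each contains a lowercase o, while ^one$ keeps only the element that is exactly one.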

key-value
A key-value list is constructed where the first source list provides the keys and the second source list provides the values. All keys have leading and trailing whitespace characters removed. If a key is entirely empty it and its associated value are ignored. If there are more items in the first list than the second, empty values will be provided. If there are more items in the second list than the first, they will be ignored. The default key value separator, \k, is always used. If there is more than one instance of a key, the first representation of the key and the last value will be preserved. The order of items in the result list is the same as the items appeared in the first source list (barring eliminated duplicate keys).

example:
    source list 1 --> key1,key2,key3
    source list 2 --> value1,value2,value3
    result --> key1≔value1,key2≔value2,key3≔value3
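
The pairing rules can be sketched in Python with an ordinary dictionary, which keeps the position of a key's first insertion while later assignments replace its value. The function name is hypothetical, keys are compared exactly (the case and diacritic options are not modelled), and ≔ stands in for \k as displayed in the example.

```python
def key_value(keys, values, separator="≔"):
    """Pair keys with values; trim keys, skip empty keys, last value wins."""
    pairs = {}
    for position, raw_key in enumerate(keys):
        key = raw_key.strip()
        if not key:
            continue  # an empty key and its value are ignored
        # Missing values are treated as empty; surplus values are never read.
        value = values[position] if position < len(values) else ""
        pairs[key] = value
    return [f"{key}{separator}{value}" for key, value in pairs.items()]
```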

Join
A list is constructed by joining two lists on an item by item basis. The resultant list is always the same size as the larger of the two lists. Nonexistent items are treated as being empty. The items are separated by the specified join delimiter.

example: with join delimiter +
    source list 1 --> item1,item2,item3
    source list 2 --> alt1,alt2,alt3
    result --> item1+alt1,item2+alt2,item3+alt3
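
An item by item join in which the shorter list is padded with empty items corresponds to itertools.zip_longest in Python. The function name below is hypothetical and the inputs are already split lists.

```python
from itertools import zip_longest


def join_lists(first, second, join_delimiter):
    """Join two lists item by item; missing items are treated as empty."""
    return [
        left + join_delimiter + right
        for left, right in zip_longest(first, second, fillvalue="")
    ]
```

When the second list is longer, the extra results begin with the join delimiter because the missing first-list item is empty.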

Common Prefix
The longest common prefix in the first source list is returned. The result is always a list of at most one item. Leading spaces in list components are significant.

example:
    source list 1 --> /volumes/music,/volumes/music/test1,/volumes/music/test2/file
    result --> /volumes/music

Trim Spaces
Each element in the first source list will have leading and trailing whitespace characters removed. The order of items in the result list is the same as in the source list. The Trim statement is a more powerful alternative, as it provides additional options.

example:
    source list 1 -->    test1   ,   test2
    result --> test1,test2

Limit Item Count
The count field is treated as an integer at runtime. The count represents the maximum number of items allowed in the source list. If the source list contains more items than count, the list is truncated to the maximum number of elements allowed. A negative value for count is treated as zero.

example using a limit of 5:
    source list 1 --> 1,2,3,4,5,6,7,8,9,10
    result --> 1,2,3,4,5
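
In Python terms the limit is a slice with the count clamped at zero. The function name is hypothetical and the count is assumed to have already been converted to an integer.

```python
def limit_item_count(items, count):
    """Keep at most `count` items; a negative count behaves like zero."""
    return items[:max(count, 0)]
```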

Identify Duplicates
This function is used to identify duplicates in the source list. Each item in the returned list is formatted as follows:

tag\k{list of duplicates separated by \:}

examples:

paco de lucia≔Paco de Lucía●Paco De Lucía
clarence frogman henry≔Clarence "Frogman" Henry●Clarence 'Frogman' Henry
sonny boy williamson ii≔Sonny Boy Williamson [II]●Sonny Boy Williamson [Ii]●Sonny Boy Williamson II

The tag is the reduced common identifier for all the duplicates. Any source list item which only occurs once is not present in the result list. There is no order to the returned list or any sublist.

Source list items which consist only of spaces or are empty are ignored. Items which differ only in leading or trailing spaces or in sequences of spaces are always considered to be duplicates. Unicode character width is ignored. Eight settings further control duplicate identification:
Case insensitive
When set, items which differ only in alphabetic case are considered duplicates.

Diacritic insensitive
When set, items which differ only in diacritic marks are considered duplicates.

Ignore punctuation
When enabled, all punctuation characters are ignored when comparing items. This means that Gary Clark jr. would be considered a duplicate of Gary Clark, jr.

Ignore leading article
A leading article found in the Natural Sort Exception set is ignored when comparing items. This means The Rolling Stones would be a duplicate of Rolling Stones as long as the set contained 'the'. Note that the leading article test is always case insensitive. The test is only performed if there is more than one space separated component in the item.

Ignore Sort Form Suffix List
A suffix which is found in the Sort Form Suffix Replacement set is ignored when comparing items. Note that set elements marked as + are not considered. If the set contained the word 'band', Dave Matthews Band would be a duplicate of Dave Matthews. Note that the suffix test is always case insensitive. The test is only performed if there is more than one space separated component in the item.

Ignore all spaces
As opposed to removing leading and trailing spaces and compressing sequences of spaces to a single space, all spaces are removed.

Ignore filename extensions
Filename extensions are ignored. Note that this setting only makes sense when you are comparing paths or filenames. In other cases the presence of a period may produce undesired results.

Ignore paths
File path components are ignored, i.e. only the filename is retained. Note that this setting only makes sense when you are comparing paths or filenames. In other cases the presence of a / character may produce undesired results. Note that this option works as expected with Ignore filename extensions.
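
How a reduced tag groups near-identical items can be sketched in Python. This is not Yate's algorithm: the function names are hypothetical, the sketch behaves as if only Case insensitive, Diacritic insensitive and Ignore punctuation were set, and it assumes that every occurrence of an item is listed, including exact repeats.

```python
import unicodedata
from collections import defaultdict


def reduce_item(text):
    """Build a comparison tag: no diacritics, no punctuation, folded case and spaces."""
    # NFKD separates accents from base letters and folds Unicode width variants.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for char in decomposed:
        if unicodedata.combining(char):
            continue  # drop diacritic marks
        if unicodedata.category(char).startswith("P"):
            continue  # drop punctuation characters
        kept.append(char)
    # split/join trims the ends and compresses runs of whitespace.
    return " ".join("".join(kept).casefold().split())


def identify_duplicates(items):
    """Map each tag to its items, keeping only tags shared by two or more items."""
    groups = defaultdict(list)
    for item in items:
        tag = reduce_item(item)
        if tag:  # empty or space-only items are ignored
            groups[tag].append(item)
    return {tag: members for tag, members in groups.items() if len(members) > 1}
```

Applied to the artist names in the examples above, the three tags shown there are produced, and an item with no counterpart is left out of the result.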

If you want to remove the tag from the returned list, you can do so by using a Find and Remove statement. Example assuming the output is in named variable Results and the output delimiter was the Default List Delimiter (\~):

    Find "\k" in named variable 'Results', as list with delimiter "⏎", remove it and everything before