Duplicate Identification

Yate can identify duplicate tracks from within track databases. A track database covering your entire collection can be built with a single Create Track Database function statement in an action run via batch processing.

This can be used to find multiple versions of a song as well as true duplicates.

To start, open a track database and select Look for Duplicates... from the context menu.

A panel appears in which you select the criteria used to identify duplicates.

The Title field is always used. The other fields that can be analysed are Album, Artist, Track Number, Disc Number, Year and Length. Each of these is only available if the corresponding column is contained in the track database.

The textual fields Title, Album and Artist have the following settings. Each setting includes the behaviour of every setting listed before it.

Exact: An exact match is required.
Case Insensitive: Alphabetic case is ignored. Accents are also ignored, and similar Unicode characters are folded (as described in Fold Characters Substitutions).
Key Words: Words in the Weight Exception Set and punctuation are ignored.
Fuzzy: An approximate match is performed. You must select the minimum accuracy that still counts as a match; experiment until you find a value that suits your collection. The default of 80% gives fairly accurate results, but be aware that fuzzy matching can produce false matches. Setting the threshold to 100% effectively disables fuzzy matching, so that the Truncated setting can be used without it.
Truncated: Specifies one or more truncate points, entered one per line. For example, a truncate point of featuring removes the word featuring and everything that follows it.
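Because each setting includes the ones before it, the settings can be pictured as a chain of normalisation steps applied to both titles before they are compared. The Python sketch below is purely illustrative and is not how Yate is implemented: `WEIGHT_EXCEPTIONS` is a hypothetical stand-in for the Weight Exception Set, accent stripping uses Unicode NFKD decomposition rather than Yate's Fold Characters Substitutions, and the `difflib` similarity ratio is an assumed substitute for whatever fuzzy measure Yate actually uses.

```python
import difflib
import re
import unicodedata

# Settings in the order listed; each one includes everything before it.
LEVELS = ["exact", "case_insensitive", "key_words", "fuzzy", "truncated"]

# Hypothetical stand-in for Yate's Weight Exception Set.
WEIGHT_EXCEPTIONS = {"a", "an", "the", "of", "and"}


def normalise(text, level, truncate_points=()):
    rank = LEVELS.index(level)
    if rank >= LEVELS.index("truncated"):
        # Cut the text at each truncate point found (case-insensitive search).
        for point in truncate_points:
            match = re.search(re.escape(point), text, re.IGNORECASE) if point else None
            if match:
                text = text[:match.start()]
    if rank >= LEVELS.index("case_insensitive"):
        # Fold case, then strip accents by dropping combining marks after NFKD.
        decomposed = unicodedata.normalize("NFKD", text.casefold())
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if rank >= LEVELS.index("key_words"):
        # Remove punctuation and excepted words, collapsing whitespace.
        cleaned = "".join(c for c in text if c.isalnum() or c.isspace())
        text = " ".join(w for w in cleaned.split() if w not in WEIGHT_EXCEPTIONS)
    return text


def titles_match(a, b, level, threshold=0.80, truncate_points=()):
    na = normalise(a, level, truncate_points)
    nb = normalise(b, level, truncate_points)
    if LEVELS.index(level) >= LEVELS.index("fuzzy") and threshold < 1.0:
        # Approximate match: the similarity ratio must reach the threshold.
        return difflib.SequenceMatcher(None, na, nb).ratio() >= threshold
    # Exact comparison of the normalised text (also used when threshold is 100%).
    return na == nb


print(titles_match("Café del Mar", "CAFE DEL MAR", "case_insensitive"))  # True
print(titles_match("Song Featuring Someone", "Song", "truncated",
                   truncate_points=["featuring"]))  # True
```

Note how a threshold of 1.0 falls through to a plain equality test on the normalised text, mirroring the way a 100% threshold lets Truncated be used without fuzzy matching.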

The Track Number, Disc Number and Year options require an exact match.

The Length option restricts matches to sets of tracks in which no track is more than five seconds longer than the previous track in the same matched set. The database must contain a Duration or Time column to perform this test; a Length column will not be used.
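One possible reading of this rule is sketched in Python below: the candidate tracks are sorted by length, and a new set is started whenever a track is more than five seconds longer than the one before it. The sorting step, the helper names and the accepted time formats (plain seconds, "m:ss" or "h:mm:ss") are assumptions made for illustration, not a description of Yate's internals.

```python
def to_seconds(value):
    """Convert a Duration in seconds or a Time string such as "3:07" to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    seconds = 0.0
    for part in str(value).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def split_by_length(lengths, tolerance=5.0):
    """Split one matched set into runs where each track is within `tolerance`
    seconds of the previous (shorter) track; single tracks are dropped."""
    runs = []
    for seconds in sorted(to_seconds(v) for v in lengths):
        if runs and seconds - runs[-1][-1] <= tolerance:
            runs[-1].append(seconds)
        else:
            runs.append([seconds])
    return [run for run in runs if len(run) > 1]


print(split_by_length(["3:00", "3:03", "3:07", "4:00", 242, "5:00"]))
# [[180.0, 183.0, 187.0], [240.0, 242.0]]
```

Under this reading the comparison is always against the previous track, so 3:00 and 3:07 land in the same set even though they differ by seven seconds, because 3:03 bridges them.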

A Results column containing an integer value is created automatically. Items identified as potential duplicates under the selected criteria are displayed in a filtered view. Items with the same Results value belong to the same duplicate set.
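Conceptually, filling the Results column amounts to grouping tracks by a match key and numbering only the groups that contain more than one track. The Python sketch below assumes an exact key (here, case-folded Title plus Artist) and starts numbering at 1; both are illustrative choices, and Yate's actual numbering may differ. A fuzzy criterion is not transitive, so it would need a proper clustering step rather than this simple bucketing.

```python
from collections import defaultdict


def assign_results(tracks, key):
    """Return {track index: Results number} for tracks in sets of two or more."""
    buckets = defaultdict(list)
    for index, track in enumerate(tracks):
        buckets[key(track)].append(index)
    results = {}
    number = 0
    for indices in buckets.values():
        if len(indices) < 2:
            continue  # a track with no match is left out of the filtered view
        number += 1
        for index in indices:
            results[index] = number
    return results


tracks = [
    {"Title": "Help!", "Artist": "The Beatles"},
    {"Title": "Yesterday", "Artist": "The Beatles"},
    {"Title": "HELP!", "Artist": "the beatles"},
]
# Hypothetical key: case-insensitive Title plus Artist.
print(assign_results(tracks, lambda t: (t["Title"].casefold(), t["Artist"].casefold())))
# {0: 1, 2: 1}
```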

If you save the track database, the Results column is saved as well.

Because the results are displayed in the track database, items can be selected and then revealed in the Finder or opened in the Yate main window.

Duplicate identification works well together with the Query Capability. For example, if you ran a duplicate search based on a fuzzy match of titles, a query constructed as:

Key Column Results
Containing Value Any Value
Data Column Results
Display data for column Title

would list each cluster of results, showing the name and count of each title in that cluster.
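The grouping this query performs can be mirrored in a short Python sketch. It assumes the rows are available as dictionaries with hypothetical `Results` and `Title` keys; it is not a Yate API, only an illustration of counting titles per cluster.

```python
from collections import Counter, defaultdict


def summarise_clusters(rows):
    """Map each Results value to a Counter of the titles in that cluster."""
    clusters = defaultdict(Counter)
    for row in rows:
        clusters[row["Results"]][row["Title"]] += 1
    return dict(clusters)


rows = [
    {"Results": 1, "Title": "Yesterday"},
    {"Results": 1, "Title": "Yesterdays"},
    {"Results": 1, "Title": "Yesterday"},
    {"Results": 2, "Title": "Help!"},
    {"Results": 2, "Title": "Help"},
]
for cluster, titles in summarise_clusters(rows).items():
    print(cluster, dict(titles))
# 1 {'Yesterday': 2, 'Yesterdays': 1}
# 2 {'Help!': 1, 'Help': 1}
```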