[Solved] [Discogs Wizard] Show differences between titles
Powder, posted February 16, 2014, 08:58
Pro
Posts: 116
Registered: March 4, 2013, 18:53

When I am tagging a release using the Discogs Wizard, I look at every track and compare the original title (filename) with the track title from Discogs. Mostly everything is the same, but sometimes there are differences. If I didn’t check, Yate would match the wrong files and I would end up with false tags.

To make this manual task easier, it would be nice if Yate did a string comparison between the original filename / title and the Discogs title, so I would just have to look for visual highlights.
Here is a graphic which illustrates what I mean – the more different both titles are, the more red its background becomes.

Image

2MR2, posted February 17, 2014, 20:44
Administrator
Posts: 2084
Registered: August 23, 2012, 19:27
Re: [Discogs Wizard] Show differences between titles

Sorry for the delay...I missed this somehow.

I understand what you're requesting. I'll look at it. The phrase 'the more different titles are' is somewhat dependent on the number of words in a title. Yate already does word matching/counting when doing its initial matching.

If I ignore leading digits:
Track 1 is 100% correct
Track 2 is 100% incorrect as there is a single word and it's different. (I'm afraid that detecting partial word matches is out of scope.)
Track 3 is 33% correct
Track 4 is also 100% incorrect as it shares no common words.

Also, gradients of colours are difficult for some users to tell apart. I could divide the range into 3-4 percentage bands. Anything finer than that would be difficult to detect.

I'll add it to my list 🙂

2MR2, posted February 18, 2014, 20:43
Administrator
Posts: 2084
Registered: August 23, 2012, 19:27
Re: [Discogs Wizard] Show differences between titles

We released v2.6 today so I thought that I'd play with this a little.

There is a new Discogs preference called 'Color code file mappings'. There is also an associated preference where you can select a File to Tag template so that the Title can be extracted. If no template is specified it is assumed that the filename is the title. For the attached image I had a template of <track> - <title>.

I'll more than likely have to play with the weight algorithm a little more, but this is how it currently stands.

If the filename and title are an exact, case-insensitive match, it's 100% (green).

Here's where it gets complicated. If not 100%, the two strings (filename and title) are decomposed into words.
All words in the weight exceptions set are tossed.

same = (count of words common to both strings) * 2
total = the total count of words in both strings
unused = total - same
percentage = ((same / total) * 100) - unused

0% --> red
1%-33% --> orange
34%-66% --> yellow
67%-99% --> clear (or gray)
100% --> green
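
For readers who want to try the numbers, here is a rough Python sketch of the weighting described above. It is not Yate's actual code. The contents of `WEIGHT_EXCEPTIONS`, the function names, the clamping of negative results to 0, and the handling of fractional percentages between bands are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical exception words; the thread does not list Yate's actual set.
WEIGHT_EXCEPTIONS = {"the", "a", "an", "and", "&"}


def split_words(text):
    """Lowercase, split on whitespace, and toss words in the exception set."""
    return [w for w in text.lower().split() if w not in WEIGHT_EXCEPTIONS]


def weight_percentage(file_title, discogs_title):
    """Score two titles with the formula from the post, clamped to 0..100."""
    # An exact case-insensitive match is 100% without further work.
    if file_title.lower() == discogs_title.lower():
        return 100.0
    a, b = split_words(file_title), split_words(discogs_title)
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # assumption: nothing left to compare counts as no match
    # Words common to both strings, counted once per matching pair.
    common = sum((Counter(a) & Counter(b)).values())
    same = common * 2
    unused = total - same
    percentage = (same / total) * 100 - unused
    # Assumption: the raw formula can go below zero, so clamp it.
    return max(0.0, min(100.0, percentage))


def color_for(percentage):
    """Map a percentage to the five bands listed in the post."""
    if percentage <= 0:
        return "red"
    if percentage < 34:
        return "orange"
    if percentage < 67:
        return "yellow"
    if percentage < 100:
        return "clear"
    return "green"
```

With these assumptions, two three-word titles sharing one word give same = 2, total = 6, unused = 4, so roughly 33.3 - 4 = 29.3%, which lands in the orange band.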

Image

Powder, posted February 19, 2014, 05:07
Pro
Posts: 116
Registered: March 4, 2013, 18:53
Re: [Discogs Wizard] Show differences between titles

Thanks for getting your hands on this so fast!

I think two colors may be enough: green and yellow. In most cases (enriching a tagged release with Discogs metadata) everything should be okay already (green).
I think it’s important to do an OR comparison with title and filename. It would be really inconvenient to have to edit the matching template all the time. It’s unlikely that all the files you want to tag have the same naming structure, so I think it’s better to just use whichever one gives the higher comparison score.

A more advanced version could use three colors (green = 100%, yellow = 80-99%, red = 0-79%). There are algorithms which do intelligent string comparison (tolerant of misspelled words) and return a value in [0..1].
I think clear (grey) may be misleading together with the other three colors: it looks more like inactive or not compared.

Looking forward to using your implementation 🙂

2MR2, posted February 19, 2014, 16:17
Administrator
Posts: 2084
Registered: August 23, 2012, 19:27
Re: [Discogs Wizard] Show differences between titles

I've implemented a fuzzy string compare and it seems to work fine.

I can see restricting the colors to Red, Yellow and Green. I'm not sure what you mean by OR comparison. Do you mean testing with and without a supplied template? That's easy to do. With no template, if your naming convention is anything other than <title>, you will never get Green when matching.

Do you think it makes sense to display the 'score' next to the color? (0.00 -> 1.00 or 0%-100%)

I'm also assuming that you would want to ignore alphabetic case while scoring.

2MR2, posted February 19, 2014, 18:01
Administrator
Posts: 2084
Registered: August 23, 2012, 19:27
Re: [Discogs Wizard] Show differences between titles

The screenshot shows some results with the fuzzy string matching. I score the initial filename as-is and then score the title extracted by every File to Tag template, keeping the highest score. I stop early if I'm lucky enough to get 100%.

I've reduced the colors to Red (< 50%), Yellow (< 100%) and Green (100%).
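
As an illustration only, the "try every template, keep the best score, stop at 100%" idea could be sketched in Python like this. The thread does not say which fuzzy algorithm Yate uses, so `difflib.SequenceMatcher` is swapped in here as a stand-in. The helper names and the way a `<track> - <title>` style template is turned into a regular expression are assumptions, not Yate's template engine.

```python
import os
import re
from difflib import SequenceMatcher


def template_to_regex(template):
    """Turn a template such as '<track> - <title>' into an anchored regex."""
    parts = re.split(r"<(\w+)>", template)  # literal, field, literal, ...
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)       # literal text between fields
        else:
            pattern += f"(?P<{part}>.+?)"    # one named group per field
    return re.compile(f"^{pattern}$")


def fuzzy_score(a, b):
    """Case-insensitive similarity in 0.0..1.0 (stand-in algorithm)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_score(filename, discogs_title, templates):
    """Score the bare filename, then each template's title; keep the best."""
    stem = os.path.splitext(filename)[0]
    best = fuzzy_score(stem, discogs_title)  # no template: filename is the title
    for template in templates:
        if best == 1.0:
            break  # already a perfect match, no need to try more templates
        match = template_to_regex(template).match(stem)
        if match and "title" in match.groupdict():
            best = max(best, fuzzy_score(match.group("title"), discogs_title))
    return best


def color_for(score):
    """Red below 50%, Yellow below 100%, Green at 100%."""
    if score == 1.0:
        return "green"
    return "yellow" if score >= 0.5 else "red"
```

Calling `best_score("03 - Time Of Day.mp3", "Time of Day", ["<track> - <title>"])` extracts the title through the template and reaches a perfect score, whereas the same call with an empty template list can only score the whole filename stem and falls short of Green.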

Initially I thought I'd have to put in a new 'include - in auto - matching' column in the File To Tag Templates but it certainly seems fast enough without it. I currently have 12 File to Tag Templates and there is no noticeable delay.

I'm ignoring alphabetic case because the results are much better and the score is only used as an indication of a correct match, not the correctness of the filename.

I think I may be able to use this stuff to increase the accuracy of the weight algorithm used for matching.

One thing that has become clear is that the quality of the length (duration) data in Discogs is pretty poor. I'm considering putting in an option to ignore a track's length while matching tracks.

Image

2MR2, posted February 20, 2014, 18:12
Administrator
Posts: 2084
Registered: August 23, 2012, 19:27
Re: [Discogs Wizard] Show differences between titles

I've improved the fuzzy matching significantly by removing all words in the weight exception list.

'time of day.mp3' vs. 'the time of day' --> 100%
'more & more.mp3' vs. 'more and more' --> 100%
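
A small Python sketch of why dropping exception words turns both examples into 100%. The exception list shown is a guess (the thread never lists Yate's real one), and `difflib.SequenceMatcher` stands in for whatever fuzzy compare Yate uses.

```python
import os
from difflib import SequenceMatcher

# Hypothetical weight exception list; Yate's real list is not given here.
WEIGHT_EXCEPTIONS = {"the", "a", "an", "and", "&"}


def normalize(text):
    """Lowercase the text and drop every word found in the exception list."""
    words = text.lower().split()
    return " ".join(w for w in words if w not in WEIGHT_EXCEPTIONS)


def fuzzy_percent(filename, title):
    """Compare filename (minus extension) with title after normalization."""
    stem = os.path.splitext(filename)[0]
    ratio = SequenceMatcher(None, normalize(stem), normalize(title)).ratio()
    return round(ratio * 100)
```

Under this assumed list, 'the time of day' and 'time of day.mp3' both reduce to 'time of day', and 'more & more.mp3' and 'more and more' both reduce to 'more more', so each pair compares as identical.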

🙂
