Last.fm recently unveiled their new fingerprinting lookup mechanism. They have aggregated quite a lot of fingerprints (650 million) using their fingerprinting software, which is a nice basis for such a lookup, and perhaps a viable alternative to MusicDNS. I gave it a try (I just had to build a 64-bit Linux version of the lookup software), and was quite surprised by the results. The quality of the fingerprinting does indeed look good, but here are the results for a particular song:

<?xml version="1.0"?>
<!DOCTYPE metadata SYSTEM "http://fingerprints.last.fm/xml/metadata.dtd">
<metadata fid="281948" lastmodified="1205776219">
<track confidence="0.622890">
    <artist>Leftover Crack</artist>
    <title>Operation: M.O.V.E.</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation%3A+M.O.V.E.</url>
</track>
<track confidence="0.327927">
    <artist>Left&ouml;ver Crack</artist>
    <title>Operation: M.O.V.E.</title>
    <url>http://www.last.fm/music/Left%C3%B6ver+Crack/_/Operation%3A+M.O.V.E.</url>
</track>
<track confidence="0.007860">
    <artist>Leftover Crack</artist>
    <title>Operation MOVE</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation+MOVE</url>
</track>
<track confidence="0.006180">
    <artist>Leftover Crack</artist>
    <title>Operation M.O.V.E.</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation+M.O.V.E.</url>
</track>
<track confidence="0.004883">
    <artist>Leftover Crack</artist>
    <title>Operation; M.O.V.E.</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation%3B+M.O.V.E.</url>
</track>
<track confidence="0.004826">
    <artist>Left&ouml;ver Crack</artist>
    <title>Operation M.O.V.E.</title>
    <url>http://www.last.fm/music/Left%C3%B6ver+Crack/_/Operation+M.O.V.E.</url>
</track>
<track confidence="0.004717">
    <artist>Left&ouml;ver Crack</artist>
    <title>13 - operation m.o.v.e</title>
    <url>http://www.last.fm/music/Left%C3%B6ver+Crack/_/13+-+operation+m.o.v.e</url>
</track>
....
</metadata>

And it goes on and on... There are 21 results for this single track, and all of them actually correspond to the same track.

So, what bothers me here? After all, the first result holds textual metadata that I could consider more or less correct (even if that's not the way I would spell this band's name, but they plan to put a voting system in place to solve this sort of issue).

The real problem is that there are 21 URIs in last.fm for the same thing. The last.fm metadata system therefore seems to put the emphasis on textual metadata: two different ways of spelling the name of a band = two bands. I think this is wrong: for example, how would you handle the fact that the Russian band Ария is spelled Aria in English? Both spellings are correct, and they correspond to one unique band.

In my opinion, the important thing is the identifier. As long as you have one identifier for one single thing (an artist, an album, a track), you're safe. The relationship between a thing (a band, an artist, a track, etc.) and its labels is clearly one-to-many: the quest for a canonical spelling will never end... And what worries me even more is that it tends to kill spellings in all languages but English (especially if a voting system is in place?).

Once you have a single identifier for a single thing within your system, you can start attaching labels to it, perhaps with a language tag. Then, it is up to the presentation layer to show you the label matching your preferences. And if you lean towards such a model, MusicBrainz (centralised and moderated) or RDF and the Music Ontology (decentralised and not moderated) are probably the way to go.
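To make the idea a bit more concrete, here is a minimal sketch in Python of one identifier carrying several language-tagged labels, with the choice of label left to the presentation layer. Everything in it is an assumption made for illustration: the `Artist` class, the `label_for` method and the `artist:0001` identifier are hypothetical and do not come from last.fm, MusicBrainz or the Music Ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Artist:
    """One thing, one identifier, many labels (hypothetical model)."""
    identifier: str                              # opaque, made-up ID
    labels: dict = field(default_factory=dict)   # language tag -> label

    def label_for(self, preferred_languages):
        """Return the label for the first preferred language available."""
        for lang in preferred_languages:
            if lang in self.labels:
                return self.labels[lang]
        # No preferred language matched: fall back to the label with the
        # alphabetically smallest language tag, or to the identifier itself.
        if self.labels:
            return self.labels[min(self.labels)]
        return self.identifier


# Two correct spellings, one band, one identifier.
aria = Artist("artist:0001", {"ru": "Ария", "en": "Aria"})

print(aria.label_for(["ru", "en"]))  # a Russian-speaking user's preference
print(aria.label_for(["fr", "en"]))  # falls through to English
```

Neither spelling has to win here: a vote could at most decide the default label for one language, and it would never create a second band.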

I guess this emphasis on textual metadata is mainly due to the legacy of ID3 and other embedded metadata formats, which allowed just one single title for the track, the album and the artist to be associated with an audio file?

I think that the real problem for last.fm will now be to match all the different identifiers they have for a single thing in their system, which is known as the record linkage problem in the database/Semantic Web community. But I also think this is not too far-fetched, as they have already begun to link their database to the MusicBrainz one?
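As a rough illustration of what a first pass at this record linkage could look like, here is a small Python sketch that groups the seven results listed above under a normalised key. This is only a naive blocking step that I am assuming for the sake of the example, not what last.fm actually does; the function names (`normalise`, `linkage_key`, `group_records`) are made up.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(text):
    """Lower-case, strip diacritics and drop everything but letters/digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return "".join(c for c in without_marks.casefold() if c.isalnum())


def linkage_key(artist, title):
    """Build a comparison key; a leading track number such as '13 - ' is dropped."""
    title = re.sub(r"^\s*\d+\s*-\s*", "", title)
    return (normalise(artist), normalise(title))


def group_records(records):
    """Group (artist, title) pairs that share the same normalised key."""
    groups = defaultdict(list)
    for artist, title in records:
        groups[linkage_key(artist, title)].append((artist, title))
    return dict(groups)


# The seven (artist, title) pairs shown in the lookup result above.
RECORDS = [
    ("Leftover Crack", "Operation: M.O.V.E."),
    ("Leftöver Crack", "Operation: M.O.V.E."),
    ("Leftover Crack", "Operation MOVE"),
    ("Leftover Crack", "Operation M.O.V.E."),
    ("Leftover Crack", "Operation; M.O.V.E."),
    ("Leftöver Crack", "Operation M.O.V.E."),
    ("Leftöver Crack", "13 - operation m.o.v.e"),
]

for key, members in group_records(RECORDS).items():
    print(key, len(members))
```

String normalisation like this only catches spelling and punctuation variants. It would not link Ария to Aria, which is exactly where links to an external identifier (such as a MusicBrainz one) become necessary.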