DBTune blog


Tag - last.fm


Thursday 27 March 2008

The Quest for Canonical Spelling in music metadata

Last.fm recently unveiled their new fingerprinting lookup mechanism. They have aggregated quite a lot of fingerprints (650 million) using their fingerprinting software, which is a nice basis for such a lookup, and perhaps a viable alternative to MusicDNS. I gave it a try (I just had to build a 64-bit Linux version of the lookup software), and was quite surprised by the results. The quality of the fingerprinting does look good, but here are the results for one particular song:

<?xml version="1.0"?>
<!DOCTYPE metadata SYSTEM "http://fingerprints.last.fm/xml/metadata.dtd">
<metadata fid="281948" lastmodified="1205776219">
<track confidence="0.622890">
    <artist>Leftover Crack</artist>
    <title>Operation: M.O.V.E.</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation%3A+M.O.V.E.</url>
</track>
<track confidence="0.327927">
    <artist>Left&ouml;ver Crack</artist>
    <title>Operation: M.O.V.E.</title>
    <url>http://www.last.fm/music/Left%C3%B6ver+Crack/_/Operation%3A+M.O.V.E.</url>
</track>
<track confidence="0.007860">
    <artist>Leftover Crack</artist>
    <title>Operation MOVE</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation+MOVE</url>
</track>
<track confidence="0.006180">
    <artist>Leftover Crack</artist>
    <title>Operation M.O.V.E.</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation+M.O.V.E.</url>
</track>
<track confidence="0.004883">
    <artist>Leftover Crack</artist>
    <title>Operation; M.O.V.E.</title>
    <url>http://www.last.fm/music/Leftover+Crack/_/Operation%3B+M.O.V.E.</url>
</track>
<track confidence="0.004826">
    <artist>Left&ouml;ver Crack</artist>
    <title>Operation M.O.V.E.</title>
    <url>http://www.last.fm/music/Left%C3%B6ver+Crack/_/Operation+M.O.V.E.</url>
</track>
<track confidence="0.004717">
    <artist>Left&ouml;ver Crack</artist>
    <title>13 - operation m.o.v.e</title>
    <url>http://www.last.fm/music/Left%C3%B6ver+Crack/_/13+-+operation+m.o.v.e</url>
</track>
....
</metadata>

And it goes on and on... There are 21 results for this single track, and all of them actually correspond to this track.

So, what is disturbing me here? After all, the first result holds textual metadata that I could consider somewhat correct (even if that's not the way I would spell this band's name, but they plan to put a voting system in place to solve this sort of issue).

The real problem is that there are 21 URIs in last.fm for the same thing. The emphasis of the last.fm metadata system is therefore probably on the textual metadata: two different ways of spelling the name of a band = two bands. But I do think this is wrong: for example, how would you handle the fact that the Russian band Ария is spelled Aria in English? Both spellings are correct, and they correspond to one unique band.

In my opinion, the important thing is the identifier. As long as you have one identifier for one single thing (an artist, an album, a track), you're saved. The relationship between a thing (a band, an artist, a track, etc.) and its labels is clearly one-to-many: the quest for a canonical spelling will never end... And what worries me even more is that it tends to kill the spellings in all languages but English (especially if a voting system is in place?).

Once you have a single identifier for a single thing within your system, you can start attaching labels to it, perhaps with a language tag. Then, it is up to the presentation layer to show you the label matching your preferences. And if you lean towards such a model, Musicbrainz (centralised and moderated) or RDF and the Music Ontology (decentralised and not moderated) are probably the way to go.
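As a rough illustration of this model, here is a minimal Python sketch, not tied to Musicbrainz or the Music Ontology. The `Entity` class, its method names and the `urn:example:` identifier are all made up for the example; the only point is that labels hang off one identifier and the presentation layer picks among them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Entity:
    """One thing (artist, album, track): one identifier, many labels."""
    identifier: str
    # language tag -> spellings recorded for that language
    labels: Dict[str, List[str]] = field(default_factory=dict)

    def add_label(self, text: str, lang: str) -> None:
        self.labels.setdefault(lang, []).append(text)

    def label_for(self, preferred_langs: Sequence[str]) -> str:
        """Presentation layer: first label matching the reader's preferences."""
        for lang in preferred_langs:
            if self.labels.get(lang):
                return self.labels[lang][0]
        # No preferred language available: fall back to any recorded label,
        # and to the bare identifier if there is none at all.
        for spellings in self.labels.values():
            if spellings:
                return spellings[0]
        return self.identifier


# Hypothetical identifier, for illustration only.
aria = Entity("urn:example:artist:aria")
aria.add_label("Ария", "ru")
aria.add_label("Aria", "en")

print(aria.label_for(["en", "ru"]))
print(aria.label_for(["ru"]))
```

Both spellings stay attached to the same band, so neither has to win a vote; a reader who prefers Russian simply gets the Russian label.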

I guess this emphasis on textual metadata is mainly due to the legacy of ID3 and other embedded metadata formats, which allowed just one single title for the track, the album and the artist to be associated with an audio file?

I think that the real problem for last.fm will now be to match all the different identifiers they have for a single thing in their system, which is known as the record linkage problem in the database/Semantic Web community. But I also think this is not too far-fetched, as they have already begun to link their database to the Musicbrainz one?
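To give a feel for what a first pass at that linkage could look like, here is a deliberately naive Python sketch that groups the spellings from the lookup result above under a normalised key. This is an assumption-laden illustration, not how last.fm does (or should do) it; the function names are invented, and real record linkage would use fuzzier matching and more evidence than two strings.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(text, strip_track_number=False):
    """Reduce a spelling to a crude comparison key."""
    # Decompose accented characters and drop the combining marks (ö -> o).
    decomposed = unicodedata.normalize("NFKD", text)
    key = "".join(c for c in decomposed if not unicodedata.combining(c))
    key = key.casefold()
    if strip_track_number:
        # Drop a leading track number such as "13 - ".
        key = re.sub(r"^\d+\s*-\s*", "", key)
    # Remove punctuation, whitespace and underscores.
    return re.sub(r"[\W_]+", "", key)


def group_candidates(records):
    """Group (artist, title) pairs that share the same normalised key."""
    groups = defaultdict(list)
    for artist, title in records:
        key = (normalise(artist), normalise(title, strip_track_number=True))
        groups[key].append((artist, title))
    return dict(groups)


# The seven spellings shown in the lookup result above.
results = [
    ("Leftover Crack", "Operation: M.O.V.E."),
    ("Leftöver Crack", "Operation: M.O.V.E."),
    ("Leftover Crack", "Operation MOVE"),
    ("Leftover Crack", "Operation M.O.V.E."),
    ("Leftover Crack", "Operation; M.O.V.E."),
    ("Leftöver Crack", "Operation M.O.V.E."),
    ("Leftöver Crack", "13 - operation m.o.v.e"),
]

for key, spellings in group_candidates(results).items():
    print(key, len(spellings))
```

All seven spellings collapse to a single key here, but the same trick does nothing for Ария versus Aria, which is exactly why string clean-up alone cannot replace a shared identifier.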

Tuesday 22 January 2008

Pushing your Last.FM friends in the FOAF-O-Sphere

I just committed some changes to the last.fm linked data service. As well as your last scrobbled tracks linked to corresponding Musicbrainz URIs, it now spits out your list of last.fm friends (using their URIs on this service).

This makes it quite nice to explore the last scrobbles of the friends of your friends (hello Kurt and Ben!) :)

The friends of my friends on last.fm

Friday 11 January 2008

Your AudioScrobbler data as Linked Data

I just put online a small service, which converts your AudioScrobbler data to RDF, designed using the Music Ontology: it exposes your last 10 scrobbled tracks.

The funny thing is that it links the tracks, records, and artists to corresponding dereferenceable URIs in the Musicbrainz dataset. So the tracks you were last listening to are part of the Data Web!

Just try it by getting this URI:

http://dbtune.org/last-fm/<last.fm username>

For example, mine is:

http://dbtune.org/last-fm/moustaki
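"Getting" the URI from a script might look like the Python sketch below. It only builds the request; the `Accept: application/rdf+xml` header is an assumption about what the service is happy to serve, and `build_request` is just a name made up for the example.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "http://dbtune.org/last-fm/"


def build_request(username: str) -> Request:
    """Build (but do not send) a request for a user's scrobbles as RDF."""
    # Assumption: the service answers with RDF/XML when asked for it.
    headers = {"Accept": "application/rdf+xml"}
    return Request(BASE + quote(username, safe=""), headers=headers)


request = build_request("moustaki")
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the actual fetch, provided the service is reachable.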

Of course, because the data is linked to dereferenceable URIs in Musicbrainz, you are able to access the birth dates of the artists you last listened to, or use the links published by DBPedia to plot your last played artists on a map, just by crawling the Data Web a little.

Then, you can link that to your FOAF URI. Mine now holds the following statement:

<http://moustaki.org/foaf.rdf#moustaki> owl:sameAs <http://dbtune.org/last-fm/moustaki>.

Now, my URI looks quite nice, in the Tabulator generic data browser!

Me and my scrobbles