For the Mashed event this week end, the BBC released some really interesting data. This includes playcount data, stating how much an artist is featured within a particular BBC programmes (at the brand or episode level).

During the event, I wrote some RDF translators for this data, linking web identifiers in the DBTune Musicbrainz linked data to web identifiers in the BBC Programmes linked data. We used it with Kurt and Ben in our hack. Ben made a nice write-up about it. By finding web identifiers for tracks in a collection and following links to the BBC Programmes data, and finally connecting this Programmes data to the box holding all recorded BBC radio programmes over a year that was available at the event, we can quite easily generate playlists from an audio collection. Two python scripts implementing this mechanism are available there. The first one uses solely brands data, whereas the second one uses episodes data (and therefore helps to get fewer and more accurate items in the resulting playlist). Finally, the thing we spent the most time on was the SQLite storage for our RDF cache :-)

This morning, I published the playcount data as linked data. I wrote a new DBTune service for that. It publishes a set of web identifiers for playcount data, interlinking Musicbrainz and BBC Programmes. I also put online a SPARQL end-point holding all this playcount data along with aggregated data from Musicbrainz and the BBC Programmes linked data (around 2 million triples overall).

For example, you can try the following SPARQL query:

SELECT ?brand ?title ?count
WHERE {
   ?artist a mo:MusicArtist;
      foaf:name "The Beatles". 
   ?pc pc:object ?artist;
       pc:count ?count.
   ?brand a po:Brand;
       pc:playcount ?pc;
       dc:title ?title 
    FILTER (?count>10)}

This will return every BBC brand that has featured The Beatles more than 10 times.

Thanks to Nicholas and Patrick for their help!