DBTune blog


Tag - bbc


Tuesday 13 July 2010

First BBC microsite powered by a triple-store

Jem Rayfield wrote a very interesting post on the technologies used by the BBC World Cup web site, which also got covered by Read Write Web.

All this is very exciting: the World Cup website proved that triple-store technologies can be used to drive a production website with significant traffic. I expect many more parts of the BBC web infrastructure to evolve in the same way :-)

There are two issues we are still trying to solve, though:

  • We need to be able to cluster our triples along several dimensions. For example, we may want a graph for a particular programme, and a much larger graph for a whole dataset (e.g. programme data, Wildlife Finder data, World Cup data). The smaller graph keeps our updates relatively cheap (we replace the whole graph whenever we receive an update), while the bigger graph gives some degree of isolation between the different sources of data. For that, we need graphs within graphs. This can be done with N3-style graph literals, but is impossible to achieve in a standard quad-store setup, where a single triple can't be part of several graphs.
  • With regards to programme data, the main bottleneck we're facing is the number of updates per second we need to be able to process, which most available triple stores struggle to keep up with. The 4store instance on DBTune does keep up, but at a cost to query performance, as the write operations block the reads. We were quite surprised to see that the available triple-store benchmarks don't take write throughput into account!

Wednesday 19 May 2010

DBpedia and BBC Programmes

We just put live an exciting new feature on BBC Programmes: programme aggregations powered by DBpedia. For example, you can look at:

Of course, the RDF representations are linked up to DBpedia. Try loading adolescence in the Tabulator, for example - you will get an immediate mashup of BBC data, DBpedia data, and Freebase data. Or if you're not afraid of getting overloaded with data, try the California one.

One of the most interesting things about using web identifiers as tags for our programmes (apart from being able to automatically generate those aggregation pages, of course) is that we can use ancillary information about those tags to create new sorts of aggregations, and new visualisations of our data. We could for example plot all our Radio 3 programmes on a map, depending on the geolocation of the people associated with these programmes. Or we could create an aggregation of BBC programmes featuring artists living in the cities with the highest rainfall (why not?). And, of course, this will be a fantastic new source of data for the MusicBore! The possibilities are basically endless, and we are very excited about it!

Thursday 14 January 2010

Live SPARQL end-point for BBC Programmes

Update: We seem to have an issue with the 4store instance hosting the dataset, so the data has been stale since the end of February. Update 2: All should be back to normal and in sync. Please comment on this post if you spot any issues, or general slowness.

Last year, we got OpenLink and Talis to crawl BBC Programmes and provide two SPARQL end-points on top of the aggregated data. However, getting the data by crawling means that the end-points never had all of it, and that it could get quite out of date -- especially as our programme data changes a lot.

At the moment, our data comes from two sources: PIPs (the central programme database at the BBC) and PIT (our content management system for programme information). In order to populate the /programmes database, we monitor changes on these two sources and replicate them on our database. We have a small piece of Ruby/ActiveRecord software (which we call the Tapp) that handles this process.

I made a small experiment, converting our ActiveRecord objects to RDF and firing an HTTP POST or an HTTP DELETE request at a 4store instance for each change we receive. This means that this 4store instance is kept in sync with the upstream data sources.
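In outline, such a hook might look like the sketch below. This is illustrative Python, not the actual Ruby/ActiveRecord code, and the endpoint URL and the /data/<graph> path convention are assumptions about the store's HTTP interface:

```python
from urllib import parse, request

# Hypothetical 4store HTTP front-end; URL and path layout are assumptions.
FOURSTORE = "http://localhost:8000"

def replace_graph(graph_uri, rdf_xml):
    """Build the HTTP POST that pushes a fresh RDF/XML serialisation
    of one resource's graph whenever an upstream change arrives."""
    url = FOURSTORE + "/data/" + parse.quote(graph_uri, safe="")
    req = request.Request(url, data=rdf_xml.encode("utf-8"), method="POST")
    req.add_header("Content-Type", "application/rdf+xml")
    return req

def delete_graph(graph_uri):
    """Build the HTTP DELETE that drops a resource's graph entirely."""
    url = FOURSTORE + "/data/" + parse.quote(graph_uri, safe="")
    return request.Request(url, method="DELETE")
```

On each change event the resource's graph is re-serialised and POSTed; a deletion upstream maps to a DELETE of the whole graph.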

It took a while to backfill, but it is now up-to-date. Check out the SPARQL end-point, a test SPARQL query form and the size of the endpoint (currently about 44 million triples).

The end-point holds all information about services, programmes, categories, versions, broadcasts, ondemands, time intervals and segments, as defined within the Programme Ontology. All of these resources are held within their own named graph, which means we have a very large number of graphs (about 5 million). It makes it far easier to update the endpoint, as we can just replace the whole graph whenever something changes for a resource.

This is still highly experimental though, and I already found a few bugs: some episodes seem to be missing (for example, some Strictly Come Dancing episodes, for some reason). I've also encountered some really weird crashes of the machine hosting the end-point when concurrently pushing a large number of RDF documents at it - I still haven't managed to identify the cause. To summarise: it might die without notice :-)

Here are some example SPARQL queries:

  • All programmes related to James Bond:
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
WHERE {
  ?uri po:category
    <http://www.bbc.co.uk/programmes/people/bmFtZS9ib25kLCBqYW1lcyAobm8gcXVhbGlmaWVyKQ#person> ;
    rdfs:label ?label
}
  • Find all EastEnders broadcast dates after 2009-01-01, along with the type of the version that was broadcast:
PREFIX event: <http://purl.org/NET/c4dm/event.owl#> 
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#> 
PREFIX po: <http://purl.org/ontology/po/> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?version_type ?broadcast_start
{ <http://www.bbc.co.uk/programmes/b006m86d#programme> po:episode ?episode .
  ?episode po:version ?version .
  ?version a ?version_type .
  ?broadcast po:broadcast_of ?version .
  ?broadcast event:time ?time .
  ?time tl:start ?broadcast_start .
  FILTER ((?version_type != <http://purl.org/ontology/po/Version>) && (?broadcast_start > "2009-01-01T00:00:00Z"^^xsd:dateTime))}
  • Find all programmes that featured both of two given artists (identified here by their BBC Music identifiers):
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
SELECT DISTINCT ?programme ?label
WHERE {
  ?event1 po:track ?track1 .
  ?track1 foaf:maker ?maker1 . ?maker1 owl:sameAs <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
  ?event2 po:track ?track2 .
  ?track2 foaf:maker ?maker2 . ?maker2 owl:sameAs <http://www.bbc.co.uk/music/artists/fb7272ba-f130-4f0a-934d-6eeea4c18c9a#artist> .
  ?event1 event:time ?t1 .
  ?event2 event:time ?t2 .
  ?t1 tl:timeline ?tl .
  ?t2 tl:timeline ?tl .
  ?version po:time ?t .
  ?t tl:timeline ?tl .
  ?programme po:version ?version .
  ?programme rdfs:label ?label .
}

Tuesday 27 October 2009

Music recommendation and Linked Data

Yesterday we presented a tutorial at ISMIR about Linked Data for music-related information. More information is available on the tutorial website, and the slides are also available.

In particular, we had two sets of slides dealing with the relationship between music recommendation and linked data. As this is something we're investigating within the NoTube project, I thought I would write up a bit more about it.

Let's focus on artist to artist recommendation for now. If we look at last.fm for recommendations for New Order, here is what we get.

Artists similar to New Order, from last.fm

Similarly, using the Echonest API for similar artists, we get back an ordered list of artists similar to New Order, including Orchestral Manoeuvres in the Dark, Depeche Mode, etc.

Now, let's play word associations for a few bands and musical genres. My colleague Michael Smethurst took the Sex Pistols, Acid House and Public Enemy, and drew the following associations:

Sex Pistols associated words

Acid House word associations

Public Enemy word associations

We can see that among the different terms in these diagrams, some refer to people, TV programmes, fashion styles, drugs, music hardware, places, laws, political groups, record labels, etc. Just a couple of these terms are actually other bands or tracks. If you were to describe these artists in purely musical terms, you'd probably be missing the point. And all these things are also linked to each other: you could play word associations for any of them and see what the connections are between Public Enemy and the Sex Pistols. So how does that relate to recommendations? When recommending an artist from another artist, the context is key. You need to provide an explanation of why they actually relate to each other, whether it's through common members, drugs, belonging to the same independent record label, acoustic similarity (and if so, how exactly), etc. The main hypothesis here is that users are much more likely to accept a recommendation that is explicitly backed by some contextual information.

On the BBC website, we cover quite a few domains, and we try to create as many links as possible between these domains, by following the Linked Data principles. From our BBC Music site, we can explore much more information, from other BBC content (programmes, news etc.) to other Linked Data sources, e.g. DBpedia, Freebase and Musicbrainz. This provides us with a wealth of structured information that we would ultimately want to use for driving and backing up our recommendations.

The MusicBore I've described earlier on this blog uses much the same approach. Playlists are generated by following paths in Linked Data. Each artist is introduced by a sentence generated from the path leading from the seed artist to the target artist. The prototype described in that paper from last year's SDOW workshop also illustrates this approach.

So we developed a small prototype of this kind of idea, rqommend (and when I say small, it is very small :) ). Basically, we define "relatedness rules" in the form of SPARQL queries, like "two artists born in Detroit in the 60s are related". We could go for very general rules, e.g. "any path between two artists makes them related", but it would be very hard to generate an accurate textual explanation for that, and it might give some, ahem, not very interesting connections. Then, we just run these rules over an aggregation of Linked Data, and generate recommendations from them. Here is a greasemonkey script injecting such recommendations into BBC Music (see for example the Fugazi page). It injects Linked Data based recommendations, along with the associated explanation, within BBC artist pages. For example, for New Order:

BBC Music recs for New Order
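As a sketch of the idea (the rule format and all names below are invented for illustration, not rqommend's actual code), a relatedness rule can simply pair a SPARQL pattern with an explanation template:

```python
# Each rule pairs a SPARQL graph pattern (to be run over an aggregation
# of Linked Data) with a template for its human-readable justification.
# The example rule is illustrative; prefixes are assumed bound elsewhere.
RULES = [
    {
        "name": "born-in-detroit-60s",
        "sparql": """
            SELECT ?a ?b WHERE {
              ?a dbo:birthPlace dbr:Detroit ; dbo:birthYear ?ya .
              ?b dbo:birthPlace dbr:Detroit ; dbo:birthYear ?yb .
              FILTER (?a != ?b &&
                      ?ya >= 1960 && ?ya < 1970 &&
                      ?yb >= 1960 && ?yb < 1970)
            }""",
        "explanation": "{a} and {b} were both born in Detroit in the 60s.",
    },
]

def explain(rule, bindings):
    """Turn one result row of a rule into the textual explanation that
    gets injected next to the recommendation."""
    return rule["explanation"].format(**bindings)
```

Running each rule over the aggregated data and rendering one explanation per result row is then all the recommender does.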

To conclude, I think there is a really strong influence of traditional information retrieval systems on the music information retrieval community. But what makes Google, for example, particularly successful is that it exploits links, not just the documents themselves. We definitely need to move towards the same sort of model: exploiting the links surrounding music, and all the cross-domain information that makes it so rich, to create better music recommendation systems which combine the what of a recommendation with the why.

Thursday 10 September 2009

Linked Data London event screencasts and London Web Standards meetup

Tom Scott and I presented a talk on contextualising BBC programmes using linked data at the Linked Data London event. For the occasion, I made a couple of screencasts.

The first one shows some browsing of the linked data we expose on the BBC website, using the Tabulator Firefox extension. I start from a Radio 2 programme, move to its segmentation into musical tracks, then to another programme featuring one of the tracks, and finally to another artist featured in that programme. The Tabulator ends up displaying data aggregated from BBC Programmes, BBC Music and DBpedia.

Exploring BBC programmes and music data using the Tabulator

The second one shows what you can do with these programmes/artists and artists/programmes links. We built some very straightforward programme-to-programme recommendations using them. On the right-hand side of the programme page, there are recommendations based on artists played in common. The recommendations are scoped by the availability of the programme on iPlayer or by the fact that it has an upcoming broadcast. If you hover over a recommendation, it displays what allowed us to derive it: here, a list of artists played in both programmes. This work is part of our investigations within the NoTube European project.

Artist-based programme to programme recommendations
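Stripped of the RDF plumbing, the recommendation step described above amounts to something like this (a sketch with invented data structures, not the production code):

```python
def recommend(seed_artists, candidates, available):
    """Rank candidate programmes by the artists they share with the
    seed programme. `candidates` maps programme -> set of artists
    played in it; `available` holds programmes on iPlayer or with an
    upcoming broadcast, which is how recommendations are scoped."""
    recs = []
    for programme, artists in candidates.items():
        common = seed_artists & artists
        if programme in available and common:
            # keep the shared artists: they become the hover explanation
            recs.append((programme, sorted(common)))
    # programmes sharing the most artists come first
    return sorted(recs, key=lambda r: len(r[1]), reverse=True)
```

Each recommendation carries its own evidence, so the "why" can be displayed alongside the "what".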

Also, as Michael already posted on Radio Labs, we gave a presentation on Linked Data to the London Web Standards group. It was a very nice event, especially as mainly web developers turned up. Linked data events tend to be mostly linked data evangelists talking to other linked data evangelists (which is great too!), so this was quite different :-) Lots of interesting questions about provenance and trustworthiness of data were asked, which are always a bit difficult to answer, apart from the usual: it's just the Web, so you can deal with it as you do (or don't) currently with Web data, e.g. by keeping track of provenance information and filtering based on it. Somebody suggested computing statistics on how many times a particular statement is repeated in order to derive its trustworthiness, but this sounds a bit harmful... Currently on the Linked Data cloud, lots of information gets repeated. For example, if a statement about an artist is available on DBpedia, there is a fair chance it will get repeated in BBC Music, just because we also use Wikipedia as an information source. The fact that a statement gets repeated doesn't make it more valid.
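To see why repetition is a weak trust signal, here is a minimal sketch of the counting approach (source names invented):

```python
from collections import Counter

def repetition_counts(sources):
    """Count how many sources assert each (subject, predicate, object)
    triple. `sources` maps a source name to its set of triples."""
    counts = Counter()
    for triples in sources.values():
        counts.update(set(triples))
    return counts

# Two "independent" sources that both derive from Wikipedia will repeat
# the same statement, inflating its count without adding any evidence.
```

A triple asserted by DBpedia and copied into BBC Music counts as two, even though both copies trace back to a single Wikipedia edit.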

Skim-read introduction to linked data slides

Monday 13 July 2009

Music Hack Day and the MusicBore

This week-end was the Music Hack Day in London. The event was great, with everything a hack day needs: pizza, beer, and technical glitches during the demos :-) And, of course, lots of awesome hacks (including that amazing visualisation, which didn't make it onto the list for some reason).

With Christopher, Patrick and Nick, we created the MusicBore. The MusicBore is a completely automated radio DJ, which, well, tends to be really boring :-) We actually won two prizes with it! The Last.fm one and the best hack one! We were really, really happy :-) But let the musicbore introduce itself:

Hello. I am the Music Bore. I play music and I like to tell you ALL about the music I play. I live on IRC. I get my information from BBC Music, BBC Programmes, last fm, the Echo Nest, Yahoo Weather and the web of Linked Data. To find out more, please visit bit.ly/musicbore. You can dissect my disgusting innards on github. Now let me play you some music.

Here it is in action, the first one walks through Soul-ish tunes, whereas the second one goes into French punk-rock:

The Music Bore - Video 2 from Nicholas Humfrey on Vimeo.

The Music Bore - Video 1 from Nicholas Humfrey on Vimeo.

The MusicBore is powered by a new and exciting messaging technology: IRC :-) Lots of bots sit around in an IRC channel and talk to each other to create a radio show. The show is entirely created live, each bot contributing a specific ability to it. Just before the hack presentation, we had 10 bots in the same channel (all the bots sources are on github):
  • controller: In charge of starting the show by playing an introduction and choosing a new song drawn from the BBC Music charts. Also in charge of drawing a new song if the other bots get stuck.
  • thebore: Renders information about a particular artist from BBC Music, BBC Programmes, Wikipedia, Last.fm and other Linked Data.
  • connectionfinder: Given a seed artist, gives the next one in the playlist, along with an explanation of how it was chosen. Basically walks through Linked Data to discover new artists.
  • placefinder: Given a seed artist, gives the next one in the playlist, along with an explanation of how it was chosen. This bot is constrained to go through places, so it will give connections like Did you know that David Guetta was born in Paris, and that Georges Garvarentz died in the same place?
  • musicfinder: Finds music content for an artist, using BBC Programmes segment data and the Surge Radio RDF.
  • trackfinder: Finds music content for an artist, using the Echonest API.
  • irc2play: Says sentences and plays tracks mentioned on IRC, mixing them using Jack, Madjack and JackMiniMix.
  • weatherbot: Renders weather information from Yahoo.
  • imagebot: Finds an image from BBC Music for a particular artist.
  • hotnessfinder: Finds the hotness for an artist using the Echonest API, and constructs a sentence from it.

Here is a small diagram of how the different bots interact with each other (if you don't understand that diagram, that's fine, we don't either, but it does seem to work :-) ).
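The bots' channel protocol is as simple as it looks in the logs below: a message of the form `target:payload` either addresses another bot (e.g. `connectionfinder:<artist URI>`) or, with the pseudo-target `say`, hands a sentence to irc2play. A minimal parser for that convention might look like this (a sketch, not the actual bot code from github):

```python
def parse_message(line):
    """Split a channel message into (target, payload) at the first
    colon, matching the convention visible in the IRC logs."""
    target, sep, payload = line.partition(":")
    if not sep:
        return None  # not a bot command, ignore
    return target.strip(), payload.strip()
```

Splitting at the first colon only is what lets URIs (which contain colons themselves) pass through intact as payloads.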


As I thought they were quite fun, here are the logs of the bbcmusicbore channel over the week-end. Here is a small best-of:

14:05 < yvesr> trackfinder: weezer
14:05 -!- trackfinder [n=trackfin@] has quit
[Remote closed the connection]
This happens a lot at the beginning of the log, with bots just throwing exceptions and quitting IRC.
11:25 < thebore> connectionfinder:http://dbpedia.org/resource/David_Guetta
11:25 < connectionfinder> say:Did you know that David Guetta was born in
France which has leader name Nicolas Sarkozy, which has spouse Carla
11:25 < thebore> say:Carla Bruni has released 3 records. No Promises was
just terrible.
God. Nicolas Sarkozy is really in all the media, even the randomly generated ones! thebore got enlightened on that one.
10:41 < yvesr> connectionfinder:http://dbpedia.org/resource/Barry_White
10:42 < connectionfinder> say:Did you know that Barry White died in the
same city as Chico Hamilton, Los Angeles?
A morbid playlist generation :-)
11:07 < thebore> say:Colin Murray on BBC Radio One is a massive fan. They
have a myspace at myspace.com/metallica. They sound a bit like Megadeth.
They have released 524 records. I really liked 2008-11-01: Rose Garden,
Portland, OR, USA.
Bootlegs FTW!
11:43 < yvesr> hotnessfinder: michael jackson
11:43 < hotnessfinder> say: Wow, michael jackson is really, really hot
right now! Did he die recently?
No comment.
12:50 < thebore> connectionfinder:http://dbpedia.org/resource/Snoop_Dogg
12:50 < connectionfinder> say:Did you know that Dogg, Snoop is signed on
Death Row Records which was founded by Dr. Dre, which used to be quite
close to World Class Wreckin' Cru?
No, I didn't!
13:56 < thebore> say:Beverley Knight sounds a bit like Eternal.
In September 2005 Knight joined the likes of Bob Dylan and Joni
Mitchell by being made a Doctor of Music after receiving the honorary
degree from the University of Wolverhampton.
No, I didn't know either!

Right now, all the bots are offline (they were running on our laptops during the week end), but we plan to use the Amazon Web Services vouchers we won to make them run for a while :-)

Monday 15 June 2009

And another fun BBC SPARQL query

This query returns BBC programmes featuring artists originating from France (this is just a straight adaptation of the last query in my previous post).

The results are quite fun! Apparently, the big French hits on the BBC are from Jean-Michel Jarre, Air, Modjo, Phoenix (are they known in France? I've only heard of them in the UK) and Vanessa Paradis.

Note that the tracklisting data we expose in our RDF just goes back a couple of months, so that might explain why the list is not bigger.

Thursday 11 June 2009

BBC SPARQL end-points

We recently announced on the BBC backstage blog the availability of two SPARQL end-points, one hosted by Talis and one by OpenLink. These two companies aggregated the RDF data we publish at http://www.bbc.co.uk/programmes and http://www.bbc.co.uk/music. This opens up quite a lot of fascinating SPARQL queries. Talis already compiled a small list, and here are a couple I just designed:

  • Give me programmes that deal with the fictional character James Bond - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
WHERE {
  ?uri po:person
    <http://www.bbc.co.uk/programmes/people/bmFtZS9ib25kLCBqYW1lcyAobm8gcXVhbGlmaWVyKQ#person> ;
    rdfs:label ?label
}
  • Give me artists that were featured in the same programme as the Foo Fighters - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
SELECT DISTINCT ?artist2 ?label2
WHERE {
  ?event1 po:track ?track1 .
  ?track1 foaf:maker <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
  ?event2 po:track ?track2 .
  ?track2 foaf:maker ?artist2 .
  ?artist2 rdfs:label ?label2 .
  ?event1 po:time ?t1 .
  ?event2 po:time ?t2 .
  ?t1 tl:timeline ?tl .
  ?t2 tl:timeline ?tl .
  FILTER (?t1 != ?t2)
}
  • Give me programmes that featured both Al Green and the Foo Fighters (yes! there is one result!!) - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
SELECT DISTINCT ?programme ?label
WHERE {
  ?event1 po:track ?track1 .
  ?track1 foaf:maker <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
  ?event2 po:track ?track2 .
  ?track2 foaf:maker <http://www.bbc.co.uk/music/artists/fb7272ba-f130-4f0a-934d-6eeea4c18c9a#artist> .
  ?event1 po:time ?t1 .
  ?event2 po:time ?t2 .
  ?t1 tl:timeline ?tl .
  ?t2 tl:timeline ?tl .
  ?version po:time ?t .
  ?t tl:timeline ?tl .
  ?programme po:version ?version .
  ?programme rdfs:label ?label .
}
  • All programmes that featured an artist originating from Northern Ireland - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?programme ?label ?artistlabel ?dbpmaker
WHERE {
  ?event1 po:track ?track1 .
  ?track1 foaf:maker ?maker .
  ?maker rdfs:label ?artistlabel .
  ?maker owl:sameAs ?dbpmaker .
  ?dbpmaker dbprop:origin <http://dbpedia.org/resource/Northern_Ireland> .
  ?event1 po:time ?t1 .
  ?t1 tl:timeline ?tl .
  ?version po:time ?t .
  ?t tl:timeline ?tl .
  ?programme po:version ?version .
  ?programme rdfs:label ?label .
}

(Note that we need the owl:sameAs in the above query only because the Talis end-point doesn't support inference.)

Let us know what kind of queries you can come up with over this data! :-)

Tuesday 12 May 2009

Yahoo Hackday 2009


We went to the Yahoo Hackday this week end, with a couple of people from the C4DM and the BBC. Apart from a flaky wireless connection on the Saturday, it was a really great event, with lots of interesting talks and interesting hacks.

On the Saturday, we learned about Searchmonkey. I tried to create a small Searchmonkey application during the talk, but eventually got frustrated. Apparently, Searchmonkey indexes RDFa and eRDF, but doesn't follow <link rel="alternate"/> links towards RDF representations (nor does it try content negotiation). So in order to create a Searchmonkey application for BBC Programmes, I needed either to include RDFa in all the pages (which, ahem, was difficult to do in an hour :-) ) or to write an XSLT against our RDF/XML representations, which would just be Wrong, as there are lots of different ways to serialise the same RDF in an RDF/XML document.
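For what it's worth, the autodiscovery step Searchmonkey skips is cheap to implement. A sketch of a pass over a page's <link> elements (the class name is mine):

```python
from html.parser import HTMLParser

class RDFLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> targets advertising an RDF
    serialisation -- the autodiscovery step discussed above."""
    RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in self.RDF_TYPES):
            self.rdf_links.append(a.get("href"))
```

Feed it an HTML page and `rdf_links` ends up holding the URLs of the page's RDF representations, ready to be fetched and indexed.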

We also learned about the Guardian Open Platform and Data Store, which holds a huge amount of interesting information. The license terms are also really permissive, even allowing commercial uses of this data. I can't even imagine how useful this data would be if it were linked to other open datasets, e.g. DBpedia, Geonames or Eurostat.

I also got a bit confused by YQL, which seems really similar to SPARQL, at least in the underlying concept ("a query language for the web"). However, it is backed by lots of interesting data: almost all of Yahoo's services, and a few third-party wrappers, e.g. for Last.fm. I wonder how hard it would be to write a SPARQL end-point that wraps YQL queries?

Finally, on Saturday evening and Sunday morning, we got some time to actually hack :-) Kurt made a nice MySpace hack, which does an artist lookup on MySpace using BOSS and exposes relevant information extracted using the DBTune RDF wrapper, without having to look at an overloaded MySpace page. It uses the Yahoo Media Player to play the audio files this page links to.

At the same time, we got around to trying out some of the things that can be built using the linked data we publish at the BBC, especially the segment RDF I announced on the linked data mailing list a couple of weeks ago. We built a small application which, from a place, gives you BBC programmes featuring an artist related in some way to that place. For example, Cardiff, Bristol, London or Lancashire. It might be a bit slow (and the number of results is limited) as I didn't have time to implement any sort of caching: the application crawls from DBpedia to BBC Music to BBC Programmes on each request. I just put the (really hacky) code online.

And we actually won the Backstage prize with these hacks! :-)

This last hack illustrates to some extent the things we are investigating as part of the BBC use-cases of the NoTube project. Using these rich connections between things (programmes, artists, events, locations, etc.), it begins to be possible to provide data-rich recommendations backed by real stories (and not only "if you like this, you may like that"). I mentioned these issues in the last chapter of my thesis, and will try to follow up on that here!

Friday 17 April 2009

Brands, series, categories and tracklists on the new BBC Programmes

I just posted a small article on the BBC Radio Labs blog about the new features of the BBC Programmes website. Hopefully that makes some sense and highlights some of the things we've been working on over the last six months! Spoiler: lots of nice RDF :-)

Monday 15 December 2008


Last week, around 10 people from BBC A&Mi, including myself, gathered for two days of hardware hacking. The goal was to build a Rockterscale -- a device able to measure how much a band rocks. Since I hadn't done any real-time audio processing in a long time, I decided to give that a go: analysing a live audio input and extracting some of its characteristics. I used Paul Brossier's Aubio library, as it seemed relatively easy to hack, and it already did something we thought was great for visualisation purposes: beat tracking from a live audio input. After the first day, we had a bit of C code that extracted the loudness, the spectral centroid and the spectral spread from the live audio input. Then, we sent the normalised data over Open Sound Control to the visualisation components.
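For reference, the two simpler features are straightforward to compute per frame. A pure-Python sketch (not the Aubio-based C code we actually used):

```python
import math

def loudness(frame):
    """Root-mean-square level of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def spectral_centroid(magnitudes, sample_rate):
    """Magnitude-weighted mean frequency of a half-spectrum: a rough
    'brightness' measure. `magnitudes` holds the magnitude of each
    FFT bin, from DC up to Nyquist."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    bin_hz = (sample_rate / 2.0) / (len(magnitudes) - 1)
    return sum(i * bin_hz * m for i, m in enumerate(magnitudes)) / total
```

The spectral spread is then just the magnitude-weighted variance of frequency around that centroid.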

But, of course, the audio signal is not the only thing to consider in order to determine how much a band rocks! We used a number of sensors to capture the reactions of the crowd:

  • The Hat of Rock, capturing some headbanging data:


  • An accelerometer under the dance-floor/mosh-pit, and a force sensor hooked on the crash barrier:


  • A webcam capturing how much movement there is in the crowd:


All the data fed by these different components was visualised on a screen:

and on a physical rockterscale (yes, it does go up to 11 :-))

Here is a small video of all that in action! (I think the best part is the BBC A&Mi people dancing to Ace of Spades to try out the system :-) ).

Wednesday 3 September 2008

Good-bye C4DM, hello BBC!

I've been rather quiet for the last month: intense PhD writing. I have been trying to get it fully written by the end of September. Indeed, I will be joining BBC Audio & Music at the end of the month. I am really really excited about that! Of course, I am a bit sad to leave the Centre for Digital Music, after three fantastic years spent there: great people, great work, great projects, great art and great beer :-)

Monday 28 July 2008

Music Ontology linked data on BBC.co.uk/music

Just a couple of minutes ago on the Music Ontology mailing list, Nicholas Humfrey from the BBC announced the availability of linked data on BBC Music.

$ rapper -o turtle \

   a mo:MusicGroup;
   foaf:name "Coldplay";
   owl:sameAs <http://dbpedia.org/resource/Coldplay>;

This is just really, really, really great... Congratulations to the /music team!

Update: Tom Scott just wrote a really nice post about the new BBC music site, explaining what the BBC is trying to achieve by going down the linked data path.

Wednesday 9 July 2008


We learned yesterday that DBTune was nominated for the Triplify Challenge! The other seven projects are really interesting as well, so I guess the competition will be stiff! The final results will be announced at the I-Semantics conference in early September.

Also, Tim Berners-Lee gave a great talk about linked data and the semantic web on Radio 4 earlier today. The first use-case he mentions sounds quite familiar: finding bands based on geo-location data. He had already mentioned it in one of his blog posts, linking to this screencast.

An interesting discussion took place on the Linking Open Data mailing list just afterwards, to gather use-cases for explaining to a general public what linked data can be useful for.

Wednesday 25 June 2008

Linking Open Data: BBC playcount data as linked data

For the Mashed event this week-end, the BBC released some really interesting data. This includes playcount data, stating how often an artist is featured within a particular BBC programme (at the brand or episode level).

During the event, I wrote some RDF translators for this data, linking web identifiers in the DBTune Musicbrainz linked data to web identifiers in the BBC Programmes linked data. Kurt, Ben and I used it in our hack, and Ben made a nice write-up about it. By finding web identifiers for tracks in a collection, following links to the BBC Programmes data, and finally connecting this Programmes data to the box holding all BBC radio programmes recorded over a year that was available at the event, we can quite easily generate playlists from an audio collection. Two Python scripts implementing this mechanism are available there. The first one uses solely brands data, whereas the second one uses episodes data (and therefore yields fewer, more accurate items in the resulting playlist). Finally, the thing we spent the most time on was the SQLite storage for our RDF cache :-)
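The core of that playlist mechanism can be sketched like this (invented data structures; the real scripts walk the linked data to build the playcount table):

```python
def playlist_from_collection(collection_artists, playcounts):
    """Rank BBC programmes (brands or episodes) by how often they
    have played artists found in a local music collection.
    `playcounts` maps (programme, artist) -> number of plays."""
    scores = {}
    for (programme, artist), count in playcounts.items():
        if artist in collection_artists:
            scores[programme] = scores.get(programme, 0) + count
    return sorted(scores, key=scores.get, reverse=True)
```

Using episode-level rather than brand-level playcounts simply makes the `playcounts` table finer-grained, which is why the second script produces fewer, more accurate items.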

This morning, I published the playcount data as linked data. I wrote a new DBTune service for that. It publishes a set of web identifiers for playcount data, interlinking Musicbrainz and BBC Programmes. I also put online a SPARQL end-point holding all this playcount data along with aggregated data from Musicbrainz and the BBC Programmes linked data (around 2 million triples overall).

For example, you can try the following SPARQL query:

# prefix declarations for mo, po, pc, foaf and dc omitted
SELECT ?brand ?title ?count
WHERE {
   ?artist a mo:MusicArtist;
       foaf:name "The Beatles".
   ?pc pc:object ?artist;
       pc:count ?count.
   ?brand a po:Brand;
       pc:playcount ?pc;
       dc:title ?title
   FILTER (?count > 10)
}

This will return every BBC brand that has featured The Beatles more than 10 times.
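For readers who want to run such queries programmatically, here is a minimal Python sketch using only the standard library. The endpoint URL is a placeholder (substitute the actual DBTune SPARQL endpoint), and the query string is elided; it assumes the endpoint returns the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# The query from the post would go here, with its prefix declarations.
QUERY = "SELECT ?brand ?title ?count WHERE { ... }"  # elided

def parse_bindings(results_json):
    """Turn SPARQL JSON results into (brand, title, count) tuples."""
    data = json.loads(results_json)
    return [(b["brand"]["value"], b["title"]["value"], int(b["count"]["value"]))
            for b in data["results"]["bindings"]]

def query_endpoint(endpoint, query):
    """POST a SPARQL query and parse the JSON results (needs network access)."""
    body = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        endpoint, data=body,
        headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return parse_bindings(resp.read().decode())
```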

Thanks to Nicholas and Patrick for their help!


I was at Mashed (the former Hack Day) this weekend - a really good and geeky event, organised by the BBC at Alexandra Palace. We arrived on the Saturday morning for some talks detailing the different things we'd be able to play with over the weekend. Amongst these: a full DVB-T multiplex (apparently, it was the first time since 1956 that a TV signal was broadcast from Alexandra Palace), lots of data from the BBC Programmes team, and a box full of radio content recorded over the last year.

After these presentations, the 24-hour hacking session began. Kurt, Ben and I sat down and wrote a small hack which basically starts from a personal music collection and creates a playlist of recorded BBC programmes for you. I will write a bit more about this later today.

During the 24-hour hack, we had a Rock Band session on a big screen, a real-world Tron game (basically, two guys running with GPS phones, guided by two people watching their trails on a Google satellite map :-) ), a rocket launch...

Finally, at 2pm on the Sunday, people presented their hacks. Almost 50 hacks were presented, all extremely interesting. Take a look at the complete list of hacks! On the music side, Patrick's recommender was particularly interesting. It used Latent Semantic Analysis on playcount data for artists in BBC brands and episodes to recommend brands from artists or artists from artists. It gave some surprising results :-) Jamie Munroe resurrected the FPFF Musicbrainz fingerprinting algorithm (which was apparently due to replace the old TRM one before MusicIP offered their services to Musicbrainz) to identify tracks played several times in BBC programmes. The WeDoID3 team talked about creating RSS feeds from embedded metadata in audio and video, but the demo didn't work.
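As a much simpler cousin of the LSA approach Patrick used, a plain cosine-similarity sketch over artist-by-brand playcount vectors already conveys the "artists from artists" idea. The artists and numbers below are invented for illustration; this is not Patrick's actual recommender.

```python
from math import sqrt

# Toy playcount vectors: artist -> plays per BBC brand (illustrative numbers).
playcounts = {
    "The Beatles": [12, 0, 3],
    "The Kinks":   [10, 1, 2],
    "Aphex Twin":  [0, 8, 0],
}

def cosine(a, b):
    """Cosine similarity between two playcount vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similar_artists(name):
    """Rank the other artists by similarity of their brand playcounts."""
    ref = playcounts[name]
    ranked = sorted(((cosine(ref, v), k) for k, v in playcounts.items()
                     if k != name), reverse=True)
    return [k for _, k in ranked]
```

LSA goes further by factorising the playcount matrix first, so that similarity is computed in a lower-dimensional "topic" space rather than over raw brand counts.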

My personal highlight was the hack (which actually won a prize) from Team Bob. Here is a screencast of it:

BBC Dylan - News 24 Revisited (Clip) from James Adam on Vimeo.

Thanks to Matthew Cashmore and the rest of the BBC backstage team for this great event! (and thanks to the sponsors for all the free stuff - I think I have enough T-shirts for about a year now :-))

Tuesday 4 December 2007

Linking open data: interlinking the BBC John Peel sessions and the DBPedia datasets

For the last Hackday, the BBC released data about the John Peel sessions. In June, I published them as Linked Data, using a SWI-Prolog representation of this data bundled with a custom P2R mapping. But so far, it was not interlinked with any other dataset, making it a small island :-)

In order to enrich my interlinking experiments, I wanted to tackle a dataset I had never really tried to link to: DBPedia, which holds structured data extracted from Wikipedia (well, the Magnatune RDF dataset is linked to it, but just for geographical locations, which was fairly easy).

And my conclusion is... it is not that easy :-)

Try 1 - Matching on labels:

The first thing I tried was matching directly on labels, using an algorithm which can be summarised as follows:

1 - For all items I_i in the John Peel sessions dataset we want to link (instances of foaf:Agent or mo:MusicalWork), take the label L_i ({I_i rdfs:label L_i})

2 - Issue the following SPARQL query to the DBPedia dataset:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?u WHERE { ?u rdfs:label L_i }

3 - For all results R_i_j of this query, assert I_i owl:sameAs R_i_j
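In toy Python form, with dicts standing in for the two datasets (URIs and labels invented for illustration; the real code issued SPARQL queries against DBpedia), the naive loop looks like this:

```python
# Hypothetical John Peel sessions items: URI -> label.
peel_items = {
    "http://dbtune.org/bbc/peel/artist/42": "Jules Verne",
}

# Hypothetical DBpedia resources: URI -> label. Note the two distinct
# resources sharing one label - the source of the ambiguity discussed below.
dbpedia_labels = {
    "http://dbpedia.org/resource/Jules_Verne": "Jules Verne",
    "http://dbpedia.org/resource/Category:Jules_Verne": "Jules Verne",
}

def naive_links(items, labels):
    """Assert owl:sameAs for every exact label match (over-generates!)."""
    links = []
    for item_uri, label in items.items():
        for dbp_uri, dbp_label in labels.items():
            if label == dbp_label:
                links.append((item_uri, "owl:sameAs", dbp_uri))
    return links
```

Even in this toy case, one Peel item ends up owl:sameAs two unrelated DBpedia resources, which is exactly the failure mode described next.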

Just for fun, here are the results of such a query. You can't imagine how many people are called exactly the same... For example Jules Verne in the BBC John Peel sessions dataset is quite different from Jules Verne... Also, Jules Verne is quite different from the Jules Verne category, though they share the same label.

Try 2 - Still matching on labels, but with restrictions:

In the agent case, the disambiguation appears easy to achieve, by just expressing ''I am actually looking for someone who could be somehow related to the John Peel sessions''. But, err, Wikipedia (hence, DBPedia) is a bit messy sometimes, and it is quite difficult to find a reliable and consistent way of expressing this criterion. So I had to sample the John Peel data (taking some producers, some engineers, some artists, some bands) and work out manually how I could restrict the range of resources I was looking for in DBPedia and still be able to retrieve all my linked agents. This led to the following SPARQL query (involving L_i as defined earlier):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?u WHERE {
  { {?u <http://dbpedia.org/property/name> L_i} UNION
    {?u rdfs:label L_i} UNION
    {?u <http://dbpedia.org/property/bandName> L_i} }
  { {?u <http://dbpedia.org/property/wikiPageUsesTemplate> <http://dbpedia.org/resource/Template:infobox_musical_artist>} UNION
    {?u a <http://dbpedia.org/class/yago/Group100031264>} UNION
    {?u a ?mus. ?mus rdfs:subClassOf <http://dbpedia.org/class/yago/Musician110340312>} UNION
    {?u a ?artist. ?artist rdfs:subClassOf <http://dbpedia.org/class/yago/Creator109614315>} }
}

And for musical works?

Of course, this does not hold as soon as I broaden the range of resources I want to link. I first tried to use exactly the same methodology (basically restricting the resources I was looking for to be related to the Yago song concept). But, err, it did not work that well :-) You can't imagine how many songs have the same name! Just look at the results - this is enlightening! So far, the best I found was Walked Away, which appears to be the most popular title :-)

So what did I do to disambiguate? I took these results, fetched the RDF corresponding to the DBPedia resources, and did a literal search on the abstracts using the SWI-Prolog literal index module, looking for the name of the artist involved in the performance of the work. That's a bit hacky, but it worked well even with cover songs (like Nirvana's Love Buzz).
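A toy sketch of that disambiguation step, with invented abstracts (the real code used SWI-Prolog's literal indexing rather than Python string search):

```python
# Hypothetical candidate DBpedia resources sharing a work's title, with
# invented abstract texts.
candidates = {
    "http://dbpedia.org/resource/Love_Buzz":
        "Love Buzz is a song by Shocking Blue, later covered by Nirvana.",
    "http://dbpedia.org/resource/Love_Buzz_(album)":
        "A hypothetical unrelated record sharing the same name.",
}

def disambiguate(candidates, artist):
    """Keep candidates whose abstract mentions the artist (case-insensitive)."""
    return [uri for uri, abstract in candidates.items()
            if artist.lower() in abstract.lower()]
```

Because the abstract of a cover song usually names both the original artist and the covering one, this test also keeps the right resource for covers.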


Surprisingly, the results do not seem too bad! I still have to check them more thoroughly, but nothing seems obviously wrong at first glance (I went through all the links manually, looking for anything that would not make sense). The links are available here.

The SWI-Prolog code doing the trick is available here. Sorry, the code is a bit messy, and got increasingly hacky...

Wednesday 11 July 2007

John Peel sessions available as RDF

Yesterday, I put the John Peel sessions online as linked data (dereferenceable identifiers, content negotiation, RDF, etc.).

It uses the data the BBC released for Hackday a few weeks ago. I wrote a SWI-Prolog wrapper for this data, which is then made accessible through SPARQL using P2R (which I have updated to handle dynamic construction of literals, by the way) and this mapping. The URIs are then made dereferenceable through UriSpace.

Some documentation is available there.

Here are a bunch of URIs that you can try:

And then, for example

$ curl -L -H "Accept: application/rdf+xml" http://dbtune.org/bbc/peel/artist/1036
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rdf:RDF [
    <!ENTITY foaf 'http://xmlns.com/foaf/0.1/'>
    <!ENTITY mo 'http://purl.org/ontology/mo/'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
    <!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>
]>
<rdf:RDF xmlns:foaf="&foaf;" xmlns:mo="&mo;" xmlns:rdf="&rdf;" xmlns:rdfs="&rdfs;">

<mo:MusicArtist rdf:about="http://dbtune.org/bbc/peel/artist/1036">
  <rdfs:label rdf:datatype="&xsd;string">King Crimson</rdfs:label>
  <foaf:img rdf:resource="http://bbc.co.uk/music/king_crimson.jpg"/>
  <foaf:name rdf:datatype="&xsd;string">King Crimson</foaf:name>
</mo:MusicArtist>

<rdf:Description rdf:about="http://dbtune.org/bbc/peel/session/1788">
  <mo:performer rdf:resource="http://dbtune.org/bbc/peel/artist/1036"/>
</rdf:Description>

<rdf:Description rdf:about="http://dbtune.org/bbc/peel/session/1789">
  <mo:performer rdf:resource="http://dbtune.org/bbc/peel/artist/1036"/>
</rdf:Description>
</rdf:RDF>


So far, this dataset is not linked to anything external! But I plan to link it to Musicbrainz, Geonames, and Last.fm snippets soon.