After a very long time writing it, we finally have a BBC Semantic Web use-case on the W3C website! It describes work we did around BBC Programmes, BBC Music, BBC Wildlife Finder and Search+. I hope it all makes a bit of sense :-) For a more detailed write-up of these issues, Patrick's Linked Data on the BBC is very good.
Thursday 14 January 2010
Live SPARQL end-point for BBC Programmes
By Yves on Thursday 14 January 2010, 12:30
Update: We seem to have an issue with the 4store hosting the dataset, so the data has been stale since the end of February.
Last year, we got OpenLink and Talis to crawl BBC Programmes and provide two SPARQL end-points on top of the aggregated data. However, getting the data by crawling it means that the end-points do not have all the data, and that the data can get quite outdated -- especially as our programme data changes a lot.
At the moment, our data comes from two sources: PIPs (the central programme database at the BBC) and PIT (our content management system for programme information). In order to populate the /programmes database, we monitor changes on these two sources and replicate them on our database. We have a small piece of Ruby/ActiveRecord software (that we call the Tapp) which handles this process.
I made a small experiment, converting our ActiveRecord objects to RDF and hooking an HTTP POST or an HTTP DELETE request to a 4store instance for each change we receive. This means that this 4store instance is kept in sync with upstream data sources.
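As a rough illustration of that hook, here is a minimal Python sketch (the real code is Ruby/ActiveRecord, so this is not the actual implementation). It only builds the HTTP request for a change and does not send it. The base URL, the `/data/` path and the `graph`, `data` and `mime-type` form fields are assumptions that should be checked against the 4store documentation; `request_for_change` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL of the store's HTTP interface (an assumption).
STORE = "http://localhost:8000/data/"


def request_for_change(kind, graph_uri, rdf_xml=None):
    """Build (but do not send) the HTTP request mirroring one upstream change."""
    if kind == "delete":
        # Drop the named graph holding the deleted resource.
        return Request(STORE + graph_uri, method="DELETE")
    # Created or updated resource: push its RDF into its own graph.
    # The form field names below are assumed, not confirmed.
    body = urlencode({
        "graph": graph_uri,
        "data": rdf_xml,
        "mime-type": "application/rdf+xml",
    }).encode("utf-8")
    return Request(STORE, data=body, method="POST")
```

Passing the resulting request to `urllib.request.urlopen` would actually perform the update; keeping construction and sending separate makes the mapping from change to request easy to inspect.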
It took a while to backfill, but it is now up-to-date. Check out the SPARQL end-point, a test SPARQL query form and the size of the endpoint (currently about 44 million triples).
The end-point holds all information about services, programmes, categories, versions, broadcasts, ondemands, time intervals and segments, as defined within the Programme Ontology. Each of these resources is held within its own named graph, which means we have a very large number of graphs (about 5 million). It makes it far easier to update the endpoint, as we can just replace the whole graph whenever something changes for a resource.
This is still highly experimental though, and I already found a few bugs: some episodes seem to be missing (for example, some Strictly Come Dancing episodes, for some reason). I've also encountered some really weird crashes of the machine hosting the end-point when concurrently pushing a large number of RDF documents at it - I still haven't managed to identify the cause. To summarise: it might die without notice :-)
Here are some example SPARQL queries:
- All programmes related to James Bond:
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
WHERE {
?uri po:category
<http://www.bbc.co.uk/programmes/people/bmFtZS9ib25kLCBqYW1lcyAobm8gcXVhbGlmaWVyKQ#person> ; rdfs:label ?label
}
- Find all EastEnders broadcast dates after 2009-01-01, along with the type of the version that was broadcast:
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX po: <http://purl.org/ontology/po/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?version_type ?broadcast_start
WHERE
{ <http://www.bbc.co.uk/programmes/b006m86d#programme> po:episode ?episode .
?episode po:version ?version .
?version a ?version_type .
?broadcast po:broadcast_of ?version .
?broadcast event:time ?time .
?time tl:start ?broadcast_start .
FILTER ((?version_type != <http://purl.org/ontology/po/Version>) && (?broadcast_start > "2009-01-01T00:00:00Z"^^xsd:dateTime))}
- Find all programmes that featured both the Foo Fighters and Al Green:
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?programme ?label
WHERE {
?event1 po:track ?track1 .
?track1 foaf:maker ?maker1 . ?maker1 owl:sameAs <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
?event2 po:track ?track2 .
?track2 foaf:maker ?maker2 . ?maker2 owl:sameAs <http://www.bbc.co.uk/music/artists/fb7272ba-f130-4f0a-934d-6eeea4c18c9a#artist> .
?event1 event:time ?t1 .
?event2 event:time ?t2 .
?t1 tl:timeline ?tl .
?t2 tl:timeline ?tl .
?version po:time ?t .
?t tl:timeline ?tl .
?programme po:version ?version .
?programme rdfs:label ?label .
}
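Queries like the ones above can also be sent from a script. The following Python sketch assumes the end-point speaks the standard SPARQL protocol over HTTP GET and can return JSON results; the `ENDPOINT` URL is a placeholder, not the real address, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint URL (an assumption, not the real address).
ENDPOINT = "http://example.org/sparql/"


def build_query_request(query, endpoint=ENDPOINT):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run(query, endpoint=ENDPOINT):
    """Send the query and return its rows (needs network access)."""
    with urlopen(build_query_request(query, endpoint)) as response:
        return rows(response.read().decode("utf-8"))
```

Calling `run(...)` with one of the queries above and the real end-point address should return one dict per result row, keyed by variable name.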
Tuesday 27 October 2009
Music recommendation and Linked Data
By Yves on Tuesday 27 October 2009, 02:43
Yesterday at ISMIR, we presented a tutorial about Linked Data for music-related information. More information on the tutorial is available on the tutorial website, and the slides are also available.
In particular, we had two sets of slides dealing with the relationship between music recommendation and linked data. As this is something we're investigating within the NoTube project, I thought I would write up a bit more about it.
Let's focus on artist to artist recommendation for now. If we look at last.fm for recommendations for New Order, here is what we get.
Similarly, using the Echonest API for similar artists, we get back an ordered list of artists similar to New Order, including Orchestral Manoeuvres in the Dark, Depeche Mode, etc.
Now, let's play word associations for a few bands and musical genres. My colleague Michael Smethurst took the Sex Pistols, Acid House and Public Enemy, and drew the following associations:
We can see that among the different terms in these diagrams, some refer to people, to TV programmes, to fashion styles, to drugs, to music hardware, to places, to laws, to political groups, to record labels, etc. Just a couple of these terms are actually other bands or tracks. If you were to describe these artists just in musical terms, you'd probably be missing the point. And all these things are also linked to each other: you could play word associations for any of them and see what the connections are between Public Enemy and the Sex Pistols.
So how does that relate to recommendations? When recommending one artist from another, the context is key. You need to provide an explanation of why they actually relate to each other, whether it's through common members, drugs, belonging to the same independent record label, being acoustically similar (if so, how exactly), etc. The main hypothesis here is that users are much more likely to accept a recommendation that is explicitly backed by some contextual information.
On the BBC website, we cover quite a few domains, and we try to create as many links as possible between these domains, by following the Linked Data principles. From our BBC Music site, we can explore much more information, from other BBC content (programmes, news etc.) to other Linked Data sources, e.g. DBpedia, Freebase and Musicbrainz. This provides us with a wealth of structured information that we would ultimately want to use for driving and backing up our recommendations.
The MusicBore I've described earlier on this blog uses much the same approach. Playlists are generated by following paths in Linked Data. Each artist is introduced by a sentence generated from the path leading from the seed artist to the target artist. The prototype described in that paper from the SDOW workshop last year also illustrates that approach.
So we developed a small prototype of these kinds of ideas, rqommend (and when I say small, it is very small :) ). Basically, we define "relatedness rules" in the form of SPARQL queries, like "Two artists born in Detroit in the 60s are related". We could go for very general rules, e.g. "Any path between two artists makes them related", but it would be very hard to generate an accurate textual explanation for them, and they might give some, hem, not very interesting connections. Then, we just go through these rules on an aggregation of Linked Data, and generate recommendations from them. Here is a greasemonkey script injecting such recommendations into BBC Music (see for example the Fugazi page). It injects Linked Data based recommendations, along with the associated explanation, within BBC artist pages. For example, for New Order:
To conclude, I think there is a really strong influence of traditional information retrieval systems on the music information retrieval community. But what makes Google, for example, particularly successful is that it exploits links, not the documents themselves. We definitely need to go towards the same sort of model: exploiting links surrounding music, and all the cross-domain information that makes it so rich, to create better music recommendation systems which combine the "what is recommended" with the "why it is recommended".
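The "relatedness rules" idea from this post can be sketched in a few lines of Python. This is not how rqommend works internally (its rules are SPARQL queries run over aggregated Linked Data); here each rule is just a predicate plus a sentence template, and the facts, identifiers and function names are all invented for illustration.

```python
# Toy facts as (subject, predicate, object) triples; all values are invented.
FACTS = {
    ("artist:a", "born_in", "place:detroit"),
    ("artist:b", "born_in", "place:detroit"),
    ("artist:a", "label", "label:x"),
    ("artist:c", "label", "label:x"),
}

# Each rule pairs a predicate with the sentence template that explains it.
RULES = [
    ("born_in", "{a} and {b} were both born in {shared}."),
    ("label", "{a} and {b} were both signed to {shared}."),
]


def recommend(seed, facts=FACTS, rules=RULES):
    """Return (artist, explanation) pairs related to the seed by some rule."""
    results = []
    for predicate, template in rules:
        # Values the seed artist has for this rule's predicate.
        shared_values = {o for s, p, o in facts if s == seed and p == predicate}
        for s, p, o in sorted(facts):
            if p == predicate and s != seed and o in shared_values:
                results.append((s, template.format(a=seed, b=s, shared=o)))
    return results
```

Because every recommendation comes out of a specific rule, the explanation is available for free, which is the point of keeping rules narrow rather than accepting any path between two artists.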
Thursday 10 September 2009
Linked Data London event screencasts and London Web Standards meetup
By Yves on Thursday 10 September 2009, 12:03
With Tom Scott, we presented a talk on contextualising BBC programmes using linked data for the Linked Data London event. For the occasion, I made a couple of screencasts.
The first one shows some browsing of the linked data we expose on the BBC website, using the Tabulator Firefox extension. I start by getting to a Radio 2 programme, to get to its segmentation in musical tracks, to get to another programme featuring one of the tracks, to get to another artist featured in that programme. The Tabulator ends up displaying data aggregated from BBC Programmes, BBC Music and DBpedia.
Exploring BBC programmes and music data using the Tabulator
The second one shows what you can do by using these programmes/artists and artists/programmes links. We built some very straightforward programme to programme recommendations using them. On the right-hand side of the programme page, there are recommendations, based on artists played in common. The recommendations are scoped by the availability of the programme on iPlayer or by the fact it has an upcoming broadcast. If you hover over one of those recommendations, it will display what allowed us to derive it: here, a list of common artists played in the two programmes. This work is part of our investigations within the NoTube European project.
Artist-based programme to programme recommendations
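The recommendation logic described above boils down to intersecting tracklists. Here is a minimal Python sketch under assumed data structures (a dict of artist sets per programme and a set of currently recommendable programmes); the function name and the ranking by number of shared artists are illustrative choices, not a description of the production code.

```python
def recommend_programmes(seed, tracklists, available):
    """Rank other available programmes by the artists they share with seed.

    tracklists maps a programme id to the set of artists it played;
    available is the set of programme ids that can be recommended right now
    (e.g. on iPlayer or with an upcoming broadcast).
    """
    seed_artists = tracklists[seed]
    scored = []
    for programme, artists in tracklists.items():
        common = seed_artists & artists
        if programme != seed and programme in available and common:
            # Keep the shared artists: they are the explanation shown on hover.
            scored.append((programme, sorted(common)))
    # Most shared artists first; ties broken by programme id for stability.
    return sorted(scored, key=lambda item: (-len(item[1]), item[0]))
```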
Also, as Michael already posted on Radio Labs, we gave a presentation to the London Web Standards group on Linked Data. It was a very nice event, especially as mainly web developers turned up. Linked data events tend to be mostly about linked data evangelists talking to other linked data evangelists (which is great too!), so this was quite different :-) Lots of interesting questions about provenance and trustworthiness of data were asked, which are always a bit difficult to answer, apart from the usual "it's just the Web, you can deal with it as you do (or don't) currently with Web data, e.g. by keeping track of provenance information and filtering based on that".
Somebody suggested that you could compute statistics on how many times a particular statement is repeated in order to derive its trustworthiness, but this sounds a bit harmful... Currently on the Linked Data cloud, lots of information gets repeated. For example, if a statement about an artist is available on DBpedia, there is a fair chance it will get repeated in BBC Music, just because we also use Wikipedia as an information source. The fact that this statement gets repeated doesn't make it more valid.
Friday 14 August 2009
4Store stuff
By Yves on Friday 14 August 2009, 12:45
I've been playing a lot with Garlik's 4store recently, and I have been building a few things around it. I just finished building packages for Ubuntu Jaunty, which you can get by adding the following lines to your /etc/apt/sources.list:
deb http://moustaki.org/apt jaunty main
deb-src http://moustaki.org/apt jaunty main
And then, an apt-get update && apt-get install 4store should do the trick. The packages are available for i386 and amd64. It is also one of my first packages, so feedback is welcome (I may have gotten it completely wrong). After being installed, you can create a database and start a SPARQL server.
I've also been writing two client libraries for 4store, both available on Github:
- 4store-php, a PHP library to interact with 4store over HTTP (so not exactly similar to Alexandre's PHP library, which interacts with 4store through the command-line tools);
- 4store-ruby, a Ruby library to interact with 4store over HTTP or HTTPS.
Monday 13 July 2009
Music Hack Day and the MusicBore
By Yves on Monday 13 July 2009, 10:58
This weekend was the Music Hackday in London. The event was great, with everything a hack day needs: pizza, beer, and technical glitches during the demos :-) And, of course, lots of awesome hacks (including that amazing visualisation which didn't make it into the list for some reason).
With Christopher, Patrick and Nick, we created the MusicBore. The MusicBore is a completely automated radio DJ, which, well, tends to be really boring :-) We actually won two prizes with it! The Last.fm one and the best hack one! We were really, really happy :-) But let the musicbore introduce itself:
Hello. I am the Music Bore. I play music and I like to tell you ALL about
the music I play. I live on IRC. I get my information from BBC Music, BBC
Programmes, last fm, the Echo Nest, Yahoo Weather and the web of Linked Data.
To find out more, please visit bit.ly/musicbore. You can dissect my disgusting
innards on github.
Now let me play you some music.
Here it is in action: the first video walks through Soul-ish tunes, whereas the second one goes into French punk-rock:
The Music Bore - Video 2 from Nicholas Humfrey on Vimeo.
The Music Bore - Video 1 from Nicholas Humfrey on Vimeo.
The MusicBore is powered by a new and exciting messaging technology: IRC :-) Lots of bots sit around in an IRC channel and talk to each other to create a radio show. The show is entirely created live, each bot contributing a specific ability to it. Just before the hack presentation, we had 10 bots in the same channel (all the bots' sources are on github):
- controller: In charge of starting the show by playing an introduction and choosing a new song drawn from the BBC Music charts. Also in charge of drawing a new song in case the other bots get stuck.
- thebore: Renders information about a particular artist from BBC Music, BBC Programmes, Wikipedia, Last.fm and other Linked Data.
- connectionfinder: Given a seed artist, gives the next one in the playlist, along with an explanation of how it was chosen. Basically walks through Linked Data to discover new artists.
- placefinder: Given a seed artist, gives the next one in the playlist, along with an explanation of how it was chosen. This bot is constrained to go through places, so will give connections like "Did you know that David Guetta is born in Paris, and that Georges Garvarentz died in the same place?"
- musicfinder: Finds music content for an artist, using BBC Programmes segment data and the Surge Radio RDF.
- trackfinder: Finds music content for an artist, using the Echonest API.
- irc2play: Says sentences and plays tracks mentioned on IRC, mixing them using Jack, Madjack and JackMiniMix.
- weatherbot: Renders weather information from Yahoo
- imagebot: Finds an image from BBC Music for a particular artist.
- hotnessfinder: Finds the hotness for an artist using the Echonest API, and constructs a sentence from it.
Here is a small diagram of how the different bots interact with each other (if you don't understand that diagram, that's fine, we don't either, but it does seem to work :-) ).

As I thought they were quite fun, here are the logs of the bbcmusicbore channel over the week-end. Here is a small best-of:
14:05 < yvesr> trackfinder: weezer
14:05 -!- trackfinder [n=trackfin@77.91.248.124] has quit
[Remote closed the connection]
This happens a lot at the beginning of the log, bots just throwing exceptions and quitting irc.
11:25 < thebore> connectionfinder:http://dbpedia.org/resource/David_Guetta
11:25 < connectionfinder> say:Did you know that David Guetta was born in
France which has leader name Nicolas Sarkozy, which has spouse Carla
Bruni?
11:25 < thebore> say:Carla Bruni has released 3 records. No Promises was
just terrible.
God. Nicolas Sarkozy is really in every media, even the randomly generated ones! thebore got enlightened on that one.
10:41 < yvesr> connectionfinder:http://dbpedia.org/resource/Barry_White
10:42 < connectionfinder> say:Did you know that Barry White died in the
same city as Chico Hamilton, Los Angeles?
A morbid playlist generation :-)
11:07 < thebore> say:Colin Murray on BBC Radio One is a massive fan. They
have a myspace at myspace.com/metallica. They sound a bit like Megadeth.
They have released 524 records. I really liked 2008-11-01: Rose Garden,
Portland, OR, USA.
Bootlegs FTW!
11:43 < yvesr> hotnessfinder: michael jackson
11:43 < hotnessfinder> say: Wow, michael jackson is really, really hot
right now! Did he die recently?
No comments
12:50 < thebore> connectionfinder:http://dbpedia.org/resource/Snoop_Dogg
12:50 < connectionfinder> say:Did you know that Dogg, Snoop is signed on
Death Row Records which was founded by Dr. Dre, which used to be quite
close to World Class Wreckin' Cru?
No, I didn't!
13:56 < thebore> say:Beverley Knight sounds a bit like Eternal.
In September 2005 Knight joined the likes of Bob Dylan and Joni
Mitchell by being made a Doctor of Music after receiving the honorary
degree from the University of Wolverhampton.
No, I didn't know either!
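Judging from the log extracts, the bots seem to address each other with lines of the form `name:payload` (for example `connectionfinder:<artist URI>` or `say:<sentence>`). The Python sketch below imitates that dispatch with an in-memory stand-in for the IRC channel. It is only an illustration of the message flow: the `Channel` class, the handler signature and treating `say` as a handler name are all assumptions, and the real bots are separate IRC clients.

```python
def parse(line):
    """Split 'target:payload' into its two parts, trimming whitespace."""
    target, _, payload = line.partition(":")
    return target.strip(), payload.strip()


class Channel:
    """In-memory stand-in for the IRC channel the bots share."""

    def __init__(self):
        self.bots = {}
        self.log = []

    def join(self, name, handler):
        """Register a handler called with the payload of lines sent to name."""
        self.bots[name] = handler

    def post(self, sender, line):
        """Record a line and hand it to the bot it is addressed to, if any."""
        self.log.append((sender, line))
        target, payload = parse(line)
        handler = self.bots.get(target)
        if handler is not None:
            # A bot may answer with a new line, which is posted in turn.
            reply = handler(payload)
            if reply:
                self.post(target, reply)
```

Only the first colon separates the target from the payload, so a URI payload such as `http://dbpedia.org/resource/Barry_White` survives intact.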
Right now, all the bots are offline (they were running on our laptops during the weekend), but we plan to use the Amazon Web Services vouchers we won to make them run for a while :-)
Monday 15 June 2009
And another fun BBC SPARQL query
By Yves on Monday 15 June 2009, 13:53
This query returns BBC programmes featuring artists originating from France (this is just a straight adaptation of the last query in my previous post).
The results are quite fun! Apparently, the big French hits on the BBC are from Jean-Michel Jarre, Air, Modjo, Phoenix (are they known in France? I've only heard of them in the UK) and Vanessa Paradis.
Note that the tracklisting data we expose in our RDF just goes back a couple of months, so that might explain why the list is not bigger.
Thursday 11 June 2009
BBC SPARQL end-points
By Yves on Thursday 11 June 2009, 00:15
We recently announced on the BBC backstage blog the availability of two SPARQL end-points, one hosted by Talis and one by OpenLink. These two companies aggregated the RDF data we publish at http://www.bbc.co.uk/programmes and http://www.bbc.co.uk/music. This opens up quite a lot of fascinating SPARQL queries. Talis already compiled a small list, and here are a couple I just designed:
- Give me programmes that deal with the fictional character James Bond - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
WHERE {
?uri po:person
<http://www.bbc.co.uk/programmes/people/bmFtZS9ib25kLCBqYW1lcyAobm8gcXVhbGlmaWVyKQ#person> ; rdfs:label ?label
}
- Give me artists that were featured in the same programme as the Foo Fighters - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
SELECT DISTINCT ?artist2 ?label2
WHERE {
?event1 po:track ?track1 .
?track1 foaf:maker <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
?event2 po:track ?track2 .
?track2 foaf:maker ?artist2 .
?artist2 rdfs:label ?label2 .
?event1 po:time ?t1 .
?event2 po:time ?t2 .
?t1 tl:timeline ?tl .
?t2 tl:timeline ?tl .
FILTER (?t1 != ?t2)
}
- Give me programmes that featured both Al Green and the Foo Fighters (yes! there is one result!!) - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
SELECT DISTINCT ?programme ?label
WHERE {
?event1 po:track ?track1 .
?track1 foaf:maker <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
?event2 po:track ?track2 .
?track2 foaf:maker <http://www.bbc.co.uk/music/artists/fb7272ba-f130-4f0a-934d-6eeea4c18c9a#artist> .
?event1 po:time ?t1 .
?event2 po:time ?t2 .
?t1 tl:timeline ?tl .
?t2 tl:timeline ?tl .
?version po:time ?t .
?t tl:timeline ?tl .
?programme po:version ?version .
?programme rdfs:label ?label .
}
- All programmes that featured an artist originating from Northern Ireland - results
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?programme ?label ?artistlabel ?dbpmaker
WHERE {
?event1 po:track ?track1 .
?track1 foaf:maker ?maker .
?maker rdfs:label ?artistlabel .
?maker owl:sameAs ?dbpmaker .
?dbpmaker dbprop:origin <http://dbpedia.org/resource/Northern_Ireland> .
?event1 po:time ?t1 .
?t1 tl:timeline ?tl .
?version po:time ?t .
?t tl:timeline ?tl .
?programme po:version ?version .
?programme rdfs:label ?label .
}
(Note that we only need the owl:sameAs pattern in the above query because the Talis end-point doesn't support inference.)
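To see why the explicit owl:sameAs hop matters, here is a toy Python illustration over a handful of invented triples. Without inference, the store only matches patterns against the URI that literally appears in the data, so the origin attached to the DBpedia URI is invisible from the BBC URI unless the query walks the sameAs link itself. The identifiers and function names are made up for the example.

```python
SAME_AS = "owl:sameAs"

# Invented triples: the BBC artist URI and the DBpedia URI are distinct nodes.
TRIPLES = {
    ("bbc:artist1", "rdfs:label", "Example Band"),
    ("bbc:artist1", SAME_AS, "dbpedia:Example_Band"),
    ("dbpedia:Example_Band", "dbprop:origin", "dbpedia:Northern_Ireland"),
}


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def origins_direct(artist):
    """What a store without inference sees on the BBC URI itself."""
    return objects(artist, "dbprop:origin")


def origins_via_same_as(artist):
    """Follow owl:sameAs explicitly, as the query's extra pattern does."""
    found = set()
    for alias in objects(artist, SAME_AS):
        found |= objects(alias, "dbprop:origin")
    return found
```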
Let us know what kind of queries you can come up with using this data! :-)
Tuesday 12 May 2009
Yahoo Hackday 2009
By Yves on Tuesday 12 May 2009, 16:56

We went to the Yahoo Hackday this weekend, with a couple of people from the C4DM and the BBC. Apart from a flaky wireless connection on the Saturday, it was a really great event, with lots of interesting talks and interesting hacks.
On the Saturday, we learned about Searchmonkey. I tried to create a small searchmonkey application during the talk, but eventually got frustrated. Apparently, Searchmonkey indexes RDFa and eRDF, but doesn't follow <link rel="alternate"/> links towards RDF representations (nor does it try to do content negotiation). So in order to create a searchmonkey application for BBC Programmes, I needed to either include RDFa in all the pages (which, hem, was difficult to do in an hour :-) ) or write an XSLT against our RDF/XML representations, which would just be Wrong, as there are lots of different ways to serialise the same RDF in an RDF/XML document.
We also learned about the Guardian Open Platform and Data Store, which holds a huge amount of interesting information. The license terms are also really permissive, even allowing commercial uses of this data. I can't even imagine how useful this data would be if it were linked to other open datasets, e.g. DBpedia, Geonames or Eurostat.
I also got a bit confused by YQL, which seems to be really similar to SPARQL, at least in the underlying concept ("a query language for the web"). However, it seems to be backed by lots of interesting data: almost all of Yahoo services, and a few third-party wrappers, e.g. for Last.fm. I wonder how hard it would be to write a SPARQL end-point that would wrap YQL queries?
Finally, on Saturday evening and Sunday morning, we got some time to actually hack :-) Kurt made a nice MySpace hack, which does an artist lookup on MySpace using BOSS and exposes relevant information extracted using the DBTune RDF wrapper, without having to look at an overloaded MySpace page. It uses the Yahoo Media Player to play the audio files this page links to.
At the same time, we got around to trying out some of the things that can be built using the linked data we publish at the BBC, especially the segment RDF I announced on the linked data mailing list a couple of weeks ago. We built a small application which, from a place, gives you BBC programmes that feature an artist that is related in some way to that place. For example, Cardiff, Bristol, London or Lancashire. It might be a bit slow (and the number of results is limited) as I didn't have time to implement any sort of caching. The application is crawling from DBpedia to BBC Music to BBC Programmes at each request. I just put the (really hacky) code online.
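The missing caching could be as simple as memoising each lookup. The Python sketch below shows the idea with a simulated fetch in place of the real HTTP crawl; `remote_lookup`, the made-up URIs it returns and the two-hop walk are all stand-ins for illustration, not the code of the hack.

```python
from functools import lru_cache

calls = []  # records every simulated remote lookup, for illustration


def remote_lookup(uri):
    """Stand-in for an HTTP fetch; returns made-up linked URIs."""
    calls.append(uri)
    return (uri + "/linked-1", uri + "/linked-2")


@lru_cache(maxsize=1024)
def cached_lookup(uri):
    """Memoised lookup: repeated requests for one URI hit the cache."""
    return remote_lookup(uri)


def programmes_for_place(place_uri):
    """Two-hop walk (place -> artists -> programmes) over cached lookups."""
    programmes = []
    for artist in cached_lookup(place_uri):
        programmes.extend(cached_lookup(artist))
    return programmes
```

With this in place, asking for the same place twice triggers the remote lookups only once. An in-process cache like this never expires entries on its own, so fast-changing programme data would need some invalidation policy on top.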
And we actually won the Backstage prize with these hacks! :-)
This last hack illustrates to some extent the things we are investigating as part of the BBC use-cases of the NoTube project. Using these rich connections between things (programmes, artists, events, locations, etc.), it begins to be possible to provide data-rich recommendations backed by real stories (and not only "if you like this, you may like that"). I mentioned these issues in the last chapter of my thesis, and will try to follow up on that here!
Friday 17 April 2009
Brands, series, categories and tracklists on the new BBC Programmes
By Yves on Friday 17 April 2009, 17:04
I just posted a small article on the BBC Radio Labs blog about the new features of the BBC Programmes website. Hopefully that makes some sense and highlights some of the things we've been working on over the last six months! Spoiler: lots of nice nice RDF :-)
Tuesday 24 March 2009
A sneak peek at the BBC Music RDF
By Yves on Tuesday 24 March 2009, 10:15
The new BBC Music website was launched yesterday, with a lot of Linked Data and RDF goodness. BBC Music provides a truly RESTful API. Congratulations to the whole team, they did amazing work! In short, that means that you can build applications on top of BBC Music data quite easily.
Each artist in BBC Music has an RDF representation. For example, Nirvana has an RDF representation, which exposes the aggregated BBC data about this band. The site also supports content negotiation, so doing
$ curl -L -H "Accept: application/rdf+xml" http://www.bbc.co.uk/music/artists/5b11f4ce-a62d-471e-81fc-a69a8278c7da
will lead you to the RDF representation.
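The same content negotiation can be done from a script. This Python sketch mirrors the curl call above using only the standard library; `urlopen` follows redirects by default, much like curl's `-L`. The helper names are hypothetical, and whether the URL still serves RDF today is not something this sketch can promise.

```python
from urllib.request import Request, urlopen

ARTIST = "http://www.bbc.co.uk/music/artists/5b11f4ce-a62d-471e-81fc-a69a8278c7da"


def rdf_request(uri):
    """Build a request that asks for RDF/XML rather than HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri):
    """Fetch the RDF/XML document (needs network access)."""
    with urlopen(rdf_request(uri)) as response:
        return response.read().decode("utf-8")
```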
Note that this representation includes links to further URIs, allowing you to discover more data, e.g. about members of that band. It also includes an owl:sameAs link to the corresponding DBpedia resource, allowing you to aggregate more data about that band, extracted from Wikipedia's infoboxes.
As an example of a "linked data journey", you can get from Nirvana to Krist Novoselic to the corresponding Krist Novoselic in DBpedia to Compton, California to N.W.A. Lots of really rich data to do interesting things, like, say, a music recommender :-)
BBC Music also includes RDF representation of reviews, e.g. that one. It also includes an RDF representation of the A to Z, and a search interface returning RDF links to matched artists. For example, here are the results of a search for "Bad Religion", which include a link to an RDF document about it on BBC Music.
Congrats again to Patrick and Nicholas, who did this work on the RDF side of BBC Music!
Tuesday 10 February 2009
Thesis uploaded!
By Yves on Tuesday 10 February 2009, 11:30
I just uploaded my PhD thesis, entitled A Distributed Music Information System, which I defended on the 22nd of January. My examiners were David de Roure from University of Southampton and Nicolas Gold from King's College. My PhD supervisor was Mark Sandler.
Here is the abstract:
Information management is an important part of music technologies today, covering the management of public and personal collections, the construction of large editorial databases and the storage of music analysis results. The information management solutions that have emerged for these use-cases are still isolated from each other. The information one of these solutions manages does not benefit from the information another holds.
In this thesis, we develop a distributed music information system that aims at gathering music-related information held by multiple databases or applications. To this end, we use Semantic Web technologies to create a unified information environment. Web identifiers correspond to any items in the music domain: performance, artist, musical work, etc. These web identifiers have structured representations permitting sophisticated reuse by applications, and these representations can quote other web identifiers leading to more information.
We develop a formal ontology for the music domain. This ontology allows us to publish and interlink a wide range of structured music-related data on the Web. We develop an ontology evaluation methodology and use it to evaluate our music ontology. We develop a knowledge representation framework for combining structured web data and analysis tools to derive more information. We apply these different technologies to publish a large amount of pre-existing music-related datasets on the Web. We develop an algorithm to automatically relate such datasets among each other. We create automated music-related Semantic Web agents, able to aggregate musical resources, structured web data and music processing tools to derive and publish new information. Finally, we describe three of our applications using this distributed information environment. These applications deal with personal collection management, enhanced access to large audio streams available on the Web and music recommendation.
So far, just a PDF is available, as I am still fighting with LaTeX2HTML, but there will be an HTML version some time soon :-) I am also planning to upload, at the same place, some extra annexes and extra results I didn't include in the main document. I think I will also blog here about some of the things included in this thesis.
In case you just want to jump to a particular chapter, here are some keywords for each of the thesis chapters:
- Introduction
- Knowledge Representation and Semantic Web technologies: FOL, Description Logics, RDF, Linked Data, OWL, N3.
- Conceptualisation of music-related information: web ontologies, music ontology, time ontology, event ontology, workflow-based modelling
- Evaluation of the Music Ontology framework: ontology evaluation, data-driven evaluation, task-based evaluation, latent dirichlet allocation
- Music processing workflows on the Web: workflows, concurrent transaction logic, N3, N3-Tr, DLP, publication of dynamically generated results, Semantic Web Services
- A web of music-related data: linking open data, dbtune, automated interlinking, quantification of structured web data
- Automated music processing agents: N3-Tr, Henry, music analysis, workflows, prolog
- Case studies: gnat, gnarql, personal music collection management, zempod, music recommendation
- Conclusion
Thursday 29 January 2009
Prolog message queue
By Yves on Thursday 29 January 2009, 12:19
It's been a long time since I last posted anything here, but things have been pretty hectic recently (I am a doctor, now!! I'll post my thesis here soon).
I've just hacked a really small implementation of an HTTP-driven SWI-Prolog message queue. I've often found myself doing quite expensive computation in Prolog, and the best way to easily distribute it is to have a message queue on which you post messages to process (in that case, Prolog terms), and from which a pool of workers picks messages and processes them. Then, if you find your program is still too slow, you can easily add a couple of workers to speed things up.
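The queue-plus-workers pattern is easy to show in miniature. The Python sketch below uses an in-process queue and threads, so it is neither HTTP-driven nor Prolog; it only illustrates the shape of the idea, and `run_pool` is a hypothetical helper name. One known limitation: `None` is used as the stop signal, so it cannot itself be a message.

```python
import queue
import threading


def run_pool(messages, handle, workers=2):
    """Process every message with a pool of worker threads; return results."""
    pending = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            message = pending.get()
            if message is None:  # sentinel: no more work for this worker
                break
            outcome = handle(message)
            with lock:
                results.append((message, outcome))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for message in messages:
        pending.put(message)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

Adding capacity is then just a matter of raising `workers`; results come back in completion order, not submission order.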
Monday 22 December 2008
New server for DBTune
By Yves on Monday 22 December 2008, 10:55
I completed the move of DBTune to a new shiny server yesterday. Things should go way faster, and the server should have a much better uptime. Overall, our experience with 1and1 hosting has been pretty bad: random server reboots, configuration files erased for no reason, and extremely long delays in getting customer support...
Many many thanks to the Centre for Digital Music for hosting the new DBTune!
Monday 15 December 2008
Rockterscale!
By Yves on Monday 15 December 2008, 18:06
Last week, around 10 people from BBC A&Mi, including myself, gathered for two days of hardware hacking. The goal was to build a Rockterscale -- a device able to measure how much a band rocks. Since I haven't done any real-time audio processing in a long time, I decided to give that a go - analysing a live audio input and extracting some of its characteristics. I used Paul Brossier's Aubio library to do so, as it seemed relatively easy to hack, and was already doing something we thought was great for visualisation purposes: beat tracking from a live audio input. After the first day, we had a bit of C code that extracted the loudness, the spectral centroid and the spectral spread from the live audio input. Then, we sent the normalised data over Open Sound Control to the visualisation components.
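For readers unfamiliar with these three features, here is a rough sketch of how they can be computed for a single audio frame. It is not the C/Aubio code from the hack days: it is plain Python with a naive DFT, it uses RMS energy as a stand-in for loudness, and it weights frequencies by magnitude (other definitions weight by power). The function names are made up for the example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (only sensible for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def frame_features(frame, sample_rate):
    """Return (rms, centroid_hz, spread_hz) for one frame of samples."""
    n = len(frame)
    # RMS energy, used here as a simple proxy for loudness (an assumption).
    rms = math.sqrt(sum(x * x for x in frame) / n)
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:  # silent frame: no meaningful spectral shape
        return rms, 0.0, 0.0
    freqs = [k * sample_rate / n for k in range(len(mags))]
    # Centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    # Spread: magnitude-weighted standard deviation around the centroid.
    spread = math.sqrt(
        sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    )
    return rms, centroid, spread
```

A pure tone should give a centroid at the tone's frequency and a spread close to zero, while noisy, distorted material pushes both values up.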
But, of course, the audio signal is not the only thing to consider in order to determine how much a band rocks! We used a number of sensors to capture the reactions of the crowd:
- The Hat of Rock, capturing some headbanging data
- An accelerometer under the dance-floor/mosh-pit, and a force sensor hooked on the crash barrier
- A webcam capturing how much movement there is in the crowd
All the data fed by these different components was visualised on a screen and on a physical rockterscale (yes, it does go up to 11 :-))

Here is a small video of all that in action! (I think the best part is the BBC A&Mi people dancing to Ace of Spades to try out the system :-) ).
Friday 14 November 2008
Reuters OpenCalais joins the linked data cloud
By Yves on Friday 14 November 2008, 10:22
Still more fancy linked data to play with - just a couple of weeks after Freebase announced that they publish linked data, OpenCalais just announced that they are going to publish linked data as well, by joining up the results of their entity extraction service to DBpedia URIs.
Wednesday 29 October 2008
SPARQLing a funk legend
By Yves on Wednesday 29 October 2008, 11:33
Freebase does linked data!
By Yves on Wednesday 29 October 2008, 08:53
Just a small post, live from ISWC: Freebase does linked data!
You can try it there, and you can try this instance, for example.
Added to David Huynh's wonderful Parallax, that's a lot of great news coming from the other side of the Atlantic :-)
Now, to see whether their linked data actually uses the Web! Do they link to other web identifiers, available outside Freebase?
I just noticed something weird, also: the read/write permissions are attached to the tracks/films/whatever resources, instead of being attached to the RDF document itself.
Friday 17 October 2008
Next week conferences
By Yves on Friday 17 October 2008, 09:34
I'll be travelling to Vienna next week for the Web of Data practitioners days, where Keith Alexander and I will be giving the first talk (3 hours, yay!). We plan to do quite an exhaustive introduction to linked data and to what happened over the last few years, with quite a few interesting (hopefully!) examples. I'll also give a small introduction to the Music Ontology and to DBTune in the Multimedia session, on the Thursday. The other speakers are truly amazing, so if you're in Vienna next week, please come along! :-)
Next, Patrick and I will be travelling to Karlsruhe, to attend ISWC 2008. This will be my first ISWC, so I am really looking forward to it! And I just noticed the SWI-Prolog folks are presenting a paper there (Thesaurus-based search in large heterogeneous collections), so this will be the perfect occasion for thanking them for the software framework underlying DBTune :-)
Tuesday 7 October 2008
Vocabulary interlinkage diagram
By Yves on Tuesday 7 October 2008, 11:00
The UMBEL people (Fred and Mike) just released a new interlinkage diagram. This time, it doesn't represent the different links amongst datasets made available within the Linking Open Data project, but rather a map of the links amongst vocabularies used on the data web.
The Music Ontology sits right between FOAF, FRBR and the Event ontology (although I would have added the Timeline ontology as well). There is also the Programmes Ontology, at the bottom (which is also interlinked with the Event ontology, btw).
This diagram really helps to see how the current web ontology landscape is structured, and I hope we can keep using it to keep track of the evolution of web ontologies, a bit like what has been done for the available datasets and interlinks.