Live SPARQL end-point for BBC Programmes
By Yves on Thursday 14 January 2010, 12:30 - Permalink
Update: We seem to have an issue with the 4store hosting the dataset, so the data has been stale since the end of February. Update 2: All should be back to normal and in sync. Please comment on this post if you spot any issue, or general slowness.
Last year, we got OpenLink and Talis to crawl BBC Programmes and provide two SPARQL end-points on top of the aggregated data. However, getting the data by crawling it meant that the end-points did not have all the data, and that the data could get quite outdated, especially as our programme data changes a lot.
At the moment, our data comes from two sources: PIPs (the central programme database at the BBC) and PIT (our content management system for programme information). In order to populate the /programmes database, we monitor changes on these two sources and replicate them on our database. We have a small piece of Ruby/ActiveRecord software (that we call the Tapp) which handles this process.
I made a small experiment, converting our ActiveRecord objects to RDF and hooking an HTTP POST or an HTTP DELETE request to a 4store instance for each change we receive. This means that this 4store instance is kept in sync with upstream data sources.
It took a while to backfill, but it is now up-to-date. Check out the SPARQL end-point, a test SPARQL query form and the size of the endpoint (currently about 44 million triples).
The end-point holds all information about services, programmes, categories, versions, broadcasts, ondemands, time intervals and segments, as defined within the Programme Ontology. All of these resources are held within their own named graph, which means we have a very large number of graphs (about 5 million). It makes it far easier to update the endpoint, as we can just replace the whole graph whenever something changes for a resource.
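With one named graph per resource, each upstream change can map to a single HTTP request against the store. The following is only a minimal sketch of that idea in Python, not the actual Tapp code (which is Ruby/ActiveRecord). The base URL, the `/data/` path layout and the form parameter names are assumptions about the store's HTTP interface, and whether a POST replaces or appends to an existing graph depends on the store.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL of the store's HTTP interface (not from the post).
STORE = "http://localhost:8000"


def change_to_request(graph_uri, rdf_xml=None, store=STORE):
    """Build the HTTP request for one upstream change.

    graph_uri names the resource's own graph. rdf_xml is the serialised
    resource, or None when the resource was deleted upstream.
    """
    if rdf_xml is None:
        # Resource removed upstream: drop its whole graph.
        # Assumed URL layout: <store>/data/<graph-uri>
        return Request(f"{store}/data/{quote(graph_uri, safe=':/')}",
                       method="DELETE")
    # Resource created or changed: send its RDF for that graph.
    # Assumed form parameters: graph, data, mime-type.
    body = urlencode({
        "graph": graph_uri,
        "data": rdf_xml,
        "mime-type": "application/rdf+xml",
    })
    return Request(f"{store}/data/",
                   data=body.encode("utf-8"),
                   method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

The function only builds the request; passing the result to `urllib.request.urlopen` would actually send it. Keeping the two steps apart makes the mapping from change to request easy to check without a running store.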
This is still highly experimental though, and I have already found a few bugs: some episodes seem to be missing (for example, some Strictly Come Dancing episodes, for some reason). I've also encountered some really weird crashes of the machine hosting the end-point when concurrently pushing a large number of RDF documents at it, and I still haven't managed to identify the cause. To summarise: it might die without notice :-)
Here are some example SPARQL queries:
- All programmes related to James Bond:
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?label
WHERE {
  ?uri po:category <http://www.bbc.co.uk/programmes/people/bmFtZS9ib25kLCBqYW1lcyAobm8gcXVhbGlmaWVyKQ#person> ;
       rdfs:label ?label .
}
- Find all EastEnders broadcast dates after 2009-01-01, along with the type of the version that was broadcast:
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX po: <http://purl.org/ontology/po/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?version_type ?broadcast_start
WHERE {
  <http://www.bbc.co.uk/programmes/b006m86d#programme> po:episode ?episode .
  ?episode po:version ?version .
  ?version a ?version_type .
  ?broadcast po:broadcast_of ?version .
  ?broadcast event:time ?time .
  ?time tl:start ?broadcast_start .
  FILTER ((?version_type != <http://purl.org/ontology/po/Version>) && (?broadcast_start > "2009-01-01T00:00:00Z"^^xsd:dateTime))
}
- Find all programmes that featured both the Foo Fighters and Al Green:
PREFIX po: <http://purl.org/ontology/po/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?programme ?label
WHERE {
  ?event1 po:track ?track1 .
  ?track1 foaf:maker ?maker1 .
  ?maker1 owl:sameAs <http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d#artist> .
  ?event2 po:track ?track2 .
  ?track2 foaf:maker ?maker2 .
  ?maker2 owl:sameAs <http://www.bbc.co.uk/music/artists/fb7272ba-f130-4f0a-934d-6eeea4c18c9a#artist> .
  ?event1 event:time ?t1 .
  ?event2 event:time ?t2 .
  ?t1 tl:timeline ?tl .
  ?t2 tl:timeline ?tl .
  ?version po:time ?t .
  ?t tl:timeline ?tl .
  ?programme po:version ?version .
  ?programme rdfs:label ?label .
}
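If you would rather run queries like these from a script than from the query form, here is a rough Python sketch. It assumes the end-point accepts the query as a `query` parameter on a GET request (as the SPARQL protocol describes) and can return results in the SPARQL JSON results format. The `ENDPOINT` URL is a placeholder, not the real address, and the response shown is canned data rather than real output from the store.

```python
import json
from urllib.parse import urlencode

# Placeholder URL; substitute the real SPARQL end-point address.
ENDPOINT = "http://example.org/sparql/"


def query_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the query in a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Canned response (made-up values) standing in for a live call.
sample = json.dumps({
    "head": {"vars": ["uri", "label"]},
    "results": {"bindings": [
        {"uri": {"type": "uri", "value": "http://example.org/programme#programme"},
         "label": {"type": "literal", "value": "Example programme"}},
    ]},
})

for row in rows(sample):
    print(row["uri"], row["label"])
```

For a live call, fetch `query_url(...)` with `urllib.request.urlopen`, asking for `application/sparql-results+json` in the `Accept` header, and pass the decoded body to `rows`.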
Comments
Live is live! The project is on a good way.
Cheers
Very intriguing stuff. Well done.
Amazing coding, got to give it a try to find some Foo Fighters, thanks!
The endpoint is very slow and often returns error HTTP 0.
If you have a cool SPARQL query (like "When is the next episode of Doctor Who?"), you can use en.sparql.pro to share it (in French: fr.sparql.pro).
Bye.
Hello Karima!
Yes, I am not maintaining that end-point anymore, it was taking too much time, as it's not hosted on the BBC network and everything had to go through a proxy which failed over quite a lot.
It did demonstrate a few useful things (and helped identify a few bugs in 4store too), so I still think it was useful though.
Best,
y