We went to the Yahoo Hackday this week end, with a couple of people from the C4DM and the BBC. Apart from a flaky wireless connection on the Saturday, it was a really great event, with lots of interesting talks and interesting hacks.
On the Saturday, we learned about Searchmonkey. I tried to create
a small searchmonkey application during the talk, but eventually got
frustrated. Apparently, Searchmonkey indexes RDFa and eRDF , but
<link rel="alternate"/> links towards RDF
representations (neither does it try to do content negotiation). So
in order to create a searchmonkey application for BBC Programmes, I needed to either
include RDFa in all the pages (which, hem, was difficult to do in an hour :-) )
or write an XSLT against our RDF/XML representations, which would just be
Wrong, as there are lots of different
ways to serialise the same RDF in an RDF/XML document.
We also learned about the Guardian Open Platform and Data Store, which holds a huge amount of interesting information. The license terms are also really permissive, even allowing commercial uses of this data. I can't even imagine how useful this data would be if it were linked to other open datasets, e.g. DBpedia, Geonames or Eurostat.
I got also a bit confused by YQL, which seems to be really similar to SPARQL, at least in the underlying concept ("a query language for the web"). However, it seems to be backed by lots of interesting data: almost all of Yahoo services, and a few third-party wrappers, e.g. for Last.fm. I wonder how hard it would be to write a SPARQL end-point that would wrap YQL queries?
Finally, on Saturday evening and Sunday morning, we got some time to actually hack :-) Kurt made a nice MySpace hack, which does an artist lookup on MySpace using BOSS and exposes relevant information extracted using the DBTune RDF wrapper, without having to look at an overloaded MySpace page. It uses the Yahoo Media Player to play the audio files this page links to.
At the same time, we got around to try out some of the things that can be built using the linked data we publish at the BBC, especially the segment RDF I announced on the linked data mailing list a couple of weeks ago. We built a small application which, from a place, gives you BBC programmes that feature an artist that is related in some way to that place. For example, Cardiff, Bristol, London or Lancashire. It might be bit slow (and the number of results are limited) as I didn't have time to implement any sort of caching. The application is crawling from DBpedia to BBC Music to BBC Programmes at each request. I just put the (really hacky) code online.
And we actually won the Backstage price with these hacks! :-)
This last hack illustrates to some extent the things we are investigating as part of the BBC use-cases of the NoTube project. Using these rich connections between things (programmes, artists, events, locations, etc.), it begins to be possible to provide data-rich recommendations backed by real stories (and not only "if you like this, you may like that"). I mentioned these issues in the last chapter of my thesis, and will try to follow up on that here!