
We went to the Yahoo Hackday this weekend,
with a couple of people from the C4DM and the BBC. Apart from a flaky wireless connection on the
Saturday, it was a really great event, with lots of interesting talks and
interesting hacks.
On the Saturday, we learned about Searchmonkey. I tried to create
a small searchmonkey application during the talk, but eventually got
frustrated. Apparently, Searchmonkey indexes RDFa and eRDF, but
doesn't follow <link rel="alternate"/> links towards RDF
representations (nor does it try to do content negotiation). So
in order to create a searchmonkey application for BBC Programmes, I needed to either
include RDFa in all the pages (which, ahem, was difficult to do in an hour :-) )
or write an XSLT against our RDF/XML representations, which would just be
Wrong, as there are lots of different
ways to serialise the same RDF in an RDF/XML document.
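
To make the serialisation point concrete, here is a minimal sketch. The example.org URI and the title are made up for illustration and are not actual BBC Programmes data. Both documents encode the same single triple, once with the property as a child element and once as a property attribute, yet a fixed path expression (the kind of thing an XSLT template would match on) only finds the title in the first one.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical data: the same triple, with dc:title written as a child element.
AS_ELEMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/programme/1">
    <dc:title>Example programme</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# The same triple again, with dc:title written as a property attribute.
AS_ATTRIBUTE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/programme/1"
                   dc:title="Example programme"/>
</rdf:RDF>"""


def titles_by_path(rdf_xml: str) -> List[str]:
    """Look up titles with a fixed element path, as a naive XSLT would."""
    root = ET.fromstring(rdf_xml)
    return [el.text for el in root.findall("rdf:Description/dc:title", NS)]


print(titles_by_path(AS_ELEMENT))    # finds the title
print(titles_by_path(AS_ATTRIBUTE))  # finds nothing, although the RDF is the same
```

A proper RDF parser would give the same triple for both documents, which is why working at the XML level felt wrong.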
We also learned about the Guardian Open Platform and Data Store, which holds a huge
amount of interesting information. The license terms are also really
permissive, even allowing commercial uses of this data. I can't even imagine
how useful this data would be if it were linked to other open datasets, e.g.
DBpedia, Geonames or Eurostat.
I also got a bit confused by YQL, which seems to be really similar to
SPARQL, at least in the
underlying concept ("a query language for the web"). However, it seems to be
backed by lots of interesting data: almost all of Yahoo's services, and a few
third-party wrappers, e.g. for Last.fm. I wonder
how hard it would be to write a SPARQL end-point that would wrap YQL
queries?
Finally, on Saturday evening and Sunday morning, we got some time to
actually hack :-) Kurt made a
nice MySpace hack,
which does an artist lookup on MySpace using BOSS and exposes relevant
information extracted using the DBTune RDF
wrapper, without having to look at an overloaded MySpace page. It uses the
Yahoo Media Player to play the
audio files this page links to.
At the same time, we got around to trying out some of the things that can be
built using the linked data we publish at
the BBC, especially the segment RDF I
announced on the linked data mailing list a couple of weeks ago. We built a
small application which, from a place, gives you BBC programmes that feature an
artist that is related in some way to that place. For example, try Cardiff, Bristol, London or Lancashire. It might be a bit
slow (and the number of results is limited) as I didn't have time to implement
any sort of caching. The application crawls from DBpedia to BBC
Music to BBC Programmes on
each request. I just put the (really hacky) code online.
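
For what it's worth, the missing caching could be as simple as remembering what each URL returned. The following is only a sketch of that idea and not the code of the hack itself: `make_cached_fetcher` and `fake_fetch` are hypothetical names, and the fetcher is faked so that the example runs without touching the network.

```python
from typing import Callable, Dict, List


def make_cached_fetcher(fetch: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a fetch function so that each URL is only retrieved once."""
    store: Dict[str, str] = {}

    def cached(url: str) -> str:
        if url not in store:
            store[url] = fetch(url)  # first request for this URL: really fetch
        return store[url]            # later requests: served from memory

    return cached


# Stand-in for an HTTP GET, recording every URL that is actually fetched.
fetched: List[str] = []


def fake_fetch(url: str) -> str:
    fetched.append(url)
    return "RDF for " + url


get = make_cached_fetcher(fake_fetch)
get("http://example.org/artist/1")
get("http://example.org/artist/1")  # second call does not hit fake_fetch
print(len(fetched))
```

An in-memory dictionary like this never expires anything, so a real deployment would also need some way of refreshing stale entries.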
And we actually won the Backstage
prize with these hacks! :-)
This last hack illustrates to some extent the things we are investigating as
part of the BBC use-cases of the NoTube
project. Using these rich connections between things (programmes, artists,
events, locations, etc.), it begins to be possible to provide data-rich
recommendations backed by real stories (and not only "if you like this, you may
like that"). I mentioned these issues in the last chapter of my thesis, and will try to follow up on that
here!