I just got back from Beijing (I took a two-week trip around China after the actual conference), where I attended the Linked Data on the Web workshop and the WWW conference.

The workshop was really good, gathering lots of people from the Linking Open Data community (it was the first time I met most of these people, after more than a year of working with them :-) ).

The attendance was much higher than expected, with around 100 people registered for the workshop.


It started well with this sentence by Tim Berners-Lee in the workshop introduction:

Linked Data is the Semantic Web done right, and the Web done right.

That's a pretty good way to start a day :-) Then, Chris Bizer gave a good overview of what the community has achieved in one year, illustrated by the different versions of Richard's diagram.


All the talks and papers were of extremely high quality. I was particularly interested in some of them, including Tim's presentation on the new SPARQL/Update capabilities of the Tabulator data browser. This allows easy interaction with data wikis, where everyone can add or correct information.


I really liked Alexandre Passant's presentation on the Flickr exporter, which highlights a mechanism that I used for the Last.fm linked data exporter: linking several identities on several websites is just an owl:sameAs link away. Alexandre also gave another presentation on MOAT (Meaning of a Tag), a really interesting project that lets you relate tags to Semantic Web URIs. For example, it makes it easy to link my tag "paris texas" to the movie Paris, Texas in DBpedia.
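To make the "one owl:sameAs link away" point concrete, here is a minimal sketch that writes such a link as a single N-Triples line. The two profile URIs are made up for illustration (they are not the identifiers the Flickr or Last.fm exporters actually mint), and `same_as_triple` is just a helper name I picked.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(uri_a: str, uri_b: str) -> str:
    """Return one N-Triples line stating that two URIs identify the same thing."""
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


# Hypothetical identifiers for the same person on two different sites.
flickr_me = "http://example.org/flickr/people/alice#me"
lastfm_me = "http://example.org/lastfm/user/alice#me"

link = same_as_triple(flickr_me, lastfm_me)
print(link)
```

Publishing that one line next to either profile is enough for a linked data client to treat both descriptions as being about the same person.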

I got a bit confused by Paul Miller's presentation about licensing open data. I have been aware of these efforts mainly through the work of the Open Knowledge Foundation and the Open Data Commons project, and I think these are truly crucial issues: we need open data, and explicit licensing. But perhaps the audience was not so well chosen: most (if not all) of us in the Linking Open Data community do not own the data we publish as RDF and interlink. DBpedia exports data extracted from Wikipedia, DBTune exports data from different music-related sources such as Jamendo or Last.fm, etc. The only data that we can possibly explicitly license are the links (the only thing we actually own), and they do not have any value without any data :-) So I guess the outreach should mainly be aimed at raw data publishers rather than Semantic Web translators? But hopefully, in the near future, the two communities will be the same!


One of my personal highlights was also Christian Becker's presentation about DBpedia Mobile: a location-enabled linked data browser for mobile devices, giving you nearby sights and detailed descriptions, restaurants, hotels, etc. We chatted a bit after the workshop with Alexandre and Christian about adding Last.fm events to the DBTune exporter to also display nearby gigs (with optional filtering based on your foaf:interests, of course :-) ).

Jun Zhao's presentation about linked data and provenance for biological resources was extremely interesting: they are dealing with problems very similar to ours in a Music Information Retrieval context. How can we trust a particular statement (for example, a structural segmentation of a particular track) found on the web? We need to know whether it was written by a human or derived through a set of algorithms, and in the latter case, we might want to choose timbre-based instead of chroma-based workflows for rock music, for example. This is the sort of thing we implemented within our Henry software (more to come on that later, including an online demo as soon as I put it on better hardware, and (hopefully) a PhD :-D ).
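As a rough illustration of that kind of provenance-based filtering, here is a small sketch. It is not how Henry works: the `Segmentation` record, its field names, and the `trusted` function are all invented for this example, and the only preference taken from the text above is "timbre for rock".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segmentation:
    """A structural segmentation of a track, plus where it came from (hypothetical record)."""
    track: str
    boundaries: Tuple[float, ...]      # segment boundaries in seconds
    author_kind: str                   # "human" or "algorithm"
    feature: Optional[str] = None      # e.g. "timbre" or "chroma" for algorithms


def trusted(statements, genre):
    """Keep human annotations, and algorithmic ones whose workflow suits the genre."""
    # Illustrative preference table: only the rock/timbre pairing comes from the post.
    preferred_feature = {"rock": "timbre"}
    wanted = preferred_feature.get(genre)
    kept = []
    for s in statements:
        if s.author_kind == "human":
            kept.append(s)
        elif wanted is None or s.feature == wanted:
            # No known preference for this genre, or the workflow matches it.
            kept.append(s)
    return kept


candidates = [
    Segmentation("track-1", (0.0, 31.5, 62.0), "human"),
    Segmentation("track-1", (0.0, 30.9, 61.2), "algorithm", "timbre"),
    Segmentation("track-1", (0.0, 15.0, 45.0), "algorithm", "chroma"),
]
kept = trusted(candidates, "rock")
print([(s.author_kind, s.feature) for s in kept])
```

The point is only that the decision needs the provenance fields to exist alongside the statement; without them, the three segmentations above are indistinguishable.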

Wolfgang Halb gave a presentation about our Riese project, but more on that later as I wrote the back-end software powering it and I'd like to give it a full blog entry soon.

I gave a presentation about automatic interlinking algorithms on the data web, with a focus on music-related datasets. I detailed an algorithm we developed for this purpose, which keeps propagating similarity measures around web data until it can make an interlinking decision (creating a bunch of owl:sameAs links). This algorithm is good in the sense that it gives a really low rate of false positives: on the test set detailed in the paper, it made no wrong decisions. I blogged about this algorithm earlier.
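The general idea can be sketched in a few lines. This is a simplified toy, not the algorithm from the paper: it propagates only one step (from a resource to the labels of its related resources), uses plain string similarity from Python's difflib, and the thresholds, the dictionary layout, and the sample data are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbour_similarity(labels_a, labels_b) -> float:
    """Average, over labels_a, of the best match found in labels_b."""
    if not labels_a or not labels_b:
        return 0.0
    best = [max(label_similarity(a, b) for b in labels_b) for a in labels_a]
    return sum(best) / len(best)


def decide(scores, min_score, margin):
    """Pick the top candidate only if it is good enough and clearly ahead."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if not ranked or scores[ranked[0]] < min_score:
        return None
    if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < margin:
        return None
    return ranked[0]


def interlink(resource, candidates, min_score=0.8, margin=0.2):
    """Return the candidate id to link to, or None if no safe decision exists."""
    # Step 1: compare the resources themselves.
    direct = {cid: label_similarity(resource["label"], c["label"])
              for cid, c in candidates.items()}
    choice = decide(direct, min_score, margin)
    if choice is not None:
        return choice
    # Step 2: still ambiguous, so propagate the comparison to related resources.
    combined = {cid: (direct[cid]
                      + neighbour_similarity(resource["related"], c["related"])) / 2
                for cid, c in candidates.items()}
    return decide(combined, min_score, margin)


# Hypothetical data: one artist, and two same-named candidates in another dataset.
artist = {"label": "The Echoes", "related": ["Night Drive", "Paper Boats"]}
candidates = {
    "mb:1": {"label": "The Echoes", "related": ["Night Drive", "Paper Boats (EP)"]},
    "mb:2": {"label": "The Echoes", "related": ["Greatest Polka Hits"]},
}
match = interlink(artist, candidates)
print(match)
```

Returning `None` whenever the best candidate is not clearly ahead is what keeps false positives low in a scheme like this: the names alone cannot separate the two candidates here, so the decision is deferred until the related records are compared.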


Some people expressed concerns about the proliferation of owl:sameAs links (highlighted in this presentation by Paolo Bouquet). But I truly think it is a necessary thing, as long as web identifiers are tied to their actual representation. I need to be able to have a web identifier for a song in Jamendo and a web identifier for the same song in MusicBrainz, and I need a way to link these together: owl:sameAs is perfect for that. I wouldn't trust a centralised "identity" system (what actually is identity anyway? :-) ), as it would break the nice decentralised information paradigm we're implementing within the Linking Open Data project.
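On the consuming side, no central registry is needed either: a client can group identifiers by following the owl:sameAs links it has found. Here is a small sketch of that grouping using a union-find structure; the function name and the example.org URIs are placeholders, not real Jamendo or MusicBrainz identifiers.

```python
def same_as_closure(links):
    """Group URIs into sets of identifiers connected by owl:sameAs links."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each link merges the two groups its endpoints belong to.
    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical identifiers for songs published by different datasets.
links = [
    ("http://example.org/jamendo/track/1", "http://example.org/musicbrainz/track/a"),
    ("http://example.org/musicbrainz/track/a", "http://example.org/other/track/x"),
    ("http://example.org/jamendo/track/2", "http://example.org/musicbrainz/track/b"),
]
groups = same_as_closure(links)
print(sorted(len(g) for g in groups))
```

Each dataset keeps minting its own identifiers, and the groups are computed from whatever links a client happens to trust.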

Anyway, lots of great people, a great time, lots of interesting discussions and new ideas... I am really looking forward to WWW 2009 in Madrid and the next workshop!!!