I just got back from Beijing (I did a two-week trip around China after the
actual conference), where I attended the Linked Data on the Web workshop
and the WWW conference.
The workshop was really good, gathering lots of people from the Linking Open
Data community (it was the first time I had met most of them in person, after
more than a year of working with them :-)).
The attendance was much higher than expected, with around 100 people
registered for the workshop.

The day started well, with this sentence by Tim Berners-Lee in the workshop
introduction:
"Linked Data is the Semantic Web done right, and the Web done right."
That's a pretty good way to start a day :-) Then
Chris Bizer gave a good overview of what the community has achieved in one
year, illustrated by the successive versions of Richard's diagram.

All the talks and papers were of extremely high quality. Some of them
particularly caught my interest, including Tim's presentation on the new
SPARQL/Update capabilities of the Tabulator data browser, which make it easy
to interact with data wikis, where anyone can add or correct information.

I really liked Alexandre Passant's presentation on the Flickr exporter, which
highlights a mechanism I also used for the Last.fm linked data exporter:
linking identities across several websites is just an owl:sameAs link away.
Alexandre also gave a presentation on MOAT (Meaning of a Tag), a really
interesting project that relates tags to Semantic Web URIs. For example, it
makes it easy to link my tag "paris texas" to the movie Paris, Texas
in DBpedia.

I was a bit confused by Paul Miller's presentation about licensing open data.
I have been aware of these efforts mainly through the work of the Open
Knowledge Foundation and the Open Data Commons project, and I think these are
truly crucial issues: we need open data, and explicit licensing. But perhaps
the audience was not so well chosen: most (if not all) of us in the Linking
Open Data community do not own the data we publish as RDF and interlink.
DBpedia exports data extracted from Wikipedia, DBTune exports data from
different music-related sources such as Jamendo or Last.fm, etc. The only
things we can explicitly license are the links (the only thing we actually
own), and they have no value without the data :-) So I guess the outreach
should mainly target raw data publishers rather than Semantic Web
translators? But hopefully, in the near future, the two communities will be
the same!

Another of my personal highlights was Christian Becker's presentation on
DBpedia Mobile, a location-enabled linked data browser for mobile devices
that shows you nearby sights and their detailed descriptions, restaurants,
hotels, etc. After the workshop, Alexandre, Christian and I chatted a bit
about adding Last.fm events to the DBTune exporter, so that nearby gigs could
be displayed too (with optional filtering based on your foaf:interests, of
course :-)).
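
To make the nearby-gigs idea a bit more concrete, here is a minimal sketch of
the filtering step. It assumes the events have already been pulled out of the
exporter as plain dictionaries with a title, a latitude, a longitude and a set
of performer URIs, and that the user's foaf:interest objects are available as
a set of URIs; all field names and URIs are hypothetical. Distances use the
haversine formula on a spherical Earth.

```python
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional, Set

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlam = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))  # clamp float error


def nearby_gigs(events: List[dict], lat: float, lon: float, radius_km: float,
                interests: Optional[Set[str]] = None) -> List[str]:
    """Titles of events within radius_km of (lat, lon), closest first.

    If `interests` (hypothetically: artist URIs taken from foaf:interest) is
    given, keep only events featuring at least one of those artists.
    """
    hits = []
    for event in events:
        d = distance_km(lat, lon, event["lat"], event["lon"])
        if d > radius_km:
            continue
        if interests is not None and not (interests & event["artists"]):
            continue
        hits.append((d, event["title"]))
    return [title for _, title in sorted(hits)]
```

Passing `interests=None` gives the unfiltered "what's on nearby" view, while
passing a set turns the same call into the personalised one.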

Jun Zhao's presentation about linked data and provenance for biological
resources was extremely interesting: they are dealing with problems very
similar to the ones we face in Music Information Retrieval. How can we trust a
particular statement (for example, a structural segmentation of a track) found
on the web? We need to know whether it was written by a human or derived by a
set of algorithms; in the latter case, we might, for example, prefer
timbre-based over chroma-based workflows for Rock music. This is the sort of
thing we implemented in our Henry software (more on that later, including an
online demo as soon as I put it on better hardware, and (hopefully) a
PhD :-D).
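
The kind of provenance-aware choice described above can be sketched in a few
lines. This is not how Henry works internally; it is a hypothetical
illustration, assuming each statement carries a provenance record saying
whether it was written by a human or derived by an algorithmic workflow (and,
if so, which one), plus a per-genre preference list such as "timbre before
chroma" for Rock. The class, field and policy names are all made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Statement:
    """A statement found on the web, e.g. a structural segmentation of a track."""
    track: str
    boundaries: List[float]         # hypothetical: segment boundaries in seconds
    derived_by: str                 # "human" or "algorithm"
    workflow: Optional[str] = None  # e.g. "timbre" or "chroma" for algorithmic ones


# Hypothetical policy: preferred workflows per genre, best first.
WORKFLOW_PREFERENCES: Dict[str, List[str]] = {"Rock": ["timbre", "chroma"]}


def rank_derived(statements: List[Statement], genre: str,
                 preferences: Optional[Dict[str, List[str]]] = None) -> List[Statement]:
    """Order algorithm-derived statements by the genre's workflow preferences.

    Human-written statements are filtered out here: how far to trust them is a
    separate policy decision. Unknown workflows go last, and ties keep their
    original order because sorted() is stable.
    """
    prefs = preferences if preferences is not None else WORKFLOW_PREFERENCES
    order = prefs.get(genre, [])
    derived = [s for s in statements if s.derived_by == "algorithm"]
    return sorted(derived,
                  key=lambda s: order.index(s.workflow) if s.workflow in order else len(order))
```

Keeping the policy as data rather than code means the same ranking function
can serve other genres, or other kinds of statements, just by changing the
preference table.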

Wolfgang Halb gave a presentation about our Riese project; more on that later,
as I wrote the back-end software powering it and would like to give it a full
blog entry soon.

I gave a presentation on automatic interlinking algorithms for the data web,
focusing on music-related datasets. I detailed an algorithm we developed for
this purpose, which propagates similarity measures through web data for as
long as it cannot yet make an interlinking decision (i.e. create a set of
owl:sameAs links). Its main strength is a really low rate of false positives:
on the test set detailed in the paper, it made no wrong decisions. I blogged
about this algorithm earlier.
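
For readers who have not seen the paper, the general idea (keep propagating
similarity until a decision is safe, otherwise abstain) can be illustrated
with a toy sketch. It is deliberately simplified and is not the algorithm from
the paper: resources are plain dictionaries with a label and a list of
neighbours, label similarity is difflib's `SequenceMatcher` ratio, each
propagation step averages a pair's label similarity with the best-matching
neighbours' scores from the previous step, and a link is only returned when
the best candidate beats the runner-up by a fixed margin. Resource types are
ignored, and the margin, number of rounds and data are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Toy dataset: resource id -> {"label": str, "neighbours": [resource ids]}
Graph = Dict[str, dict]


def label_sim(x: str, y: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def match(graph_a: Graph, graph_b: Graph, node: str,
          max_rounds: int = 3, margin: float = 0.2) -> Optional[str]:
    """Return the resource in graph_b to link `node` to, or None if still ambiguous."""
    base = {(a, b): label_sim(graph_a[a]["label"], graph_b[b]["label"])
            for a in graph_a for b in graph_b}
    score = dict(base)
    for _ in range(max_rounds + 1):
        ranked = sorted(((score[(node, b)], b) for b in graph_b), reverse=True)
        best_score, best = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        if best_score - runner_up >= margin:
            return best  # unambiguous enough: this would become an owl:sameAs link
        # Not decided yet: propagate similarity one step through the neighbours.
        new_score = {}
        for (a, b), s in score.items():
            na, nb = graph_a[a]["neighbours"], graph_b[b]["neighbours"]
            if not na:
                evidence = s  # nothing to propagate from
            elif not nb:
                evidence = 0.0
            else:
                evidence = sum(max(score[(x, y)] for y in nb) for x in na) / len(na)
            new_score[(a, b)] = (base[(a, b)] + evidence) / 2
        score = new_score
    return None  # never confident enough: better no link than a wrong one


# Two artists called "Paris" in B cannot be told apart by name alone.
A = {
    "a:artist": {"label": "Paris", "neighbours": ["a:record"]},
    "a:record": {"label": "Blue", "neighbours": ["a:artist"]},
}
B = {
    "b:artist1": {"label": "Paris", "neighbours": ["b:record1"]},
    "b:record1": {"label": "Blue", "neighbours": ["b:artist1"]},
    "b:artist2": {"label": "Paris", "neighbours": ["b:record2"]},
    "b:record2": {"label": "Red", "neighbours": ["b:artist2"]},
}
print(match(A, B, "a:artist"))  # → b:artist1
```

After one propagation step, the candidate whose record is also called "Blue"
pulls ahead. If both records in B were called "Blue", the scores would stay
tied and the sketch would return None rather than guess, which is the property
that keeps false positives low.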

Some people expressed concerns about the proliferation of owl:sameAs links
(highlighted in a presentation by Paolo Bouquet). But I truly think these
links are necessary, as long as web identifiers are tied to their actual
representations. I need to be able to have a web identifier for a song in
Jamendo and a web identifier for the same song in Musicbrainz, and I need a
way to link the two: owl:sameAs is perfect for that. I wouldn't trust a
centralised "identity" system (what is identity, anyway? :-)), as it would
break the nice decentralised information paradigm we're implementing within
the Linking Open Data project.
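
Because owl:sameAs is an equivalence relation in OWL (reflexive, symmetric and
transitive), a consumer can gather every identifier for one song from a
handful of pairwise links, with no central identity authority involved. Here
is a minimal union-find sketch; the example.org URIs are hypothetical
stand-ins for Jamendo, Musicbrainz and other identifiers.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_classes(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into the equivalence classes implied by owl:sameAs pairs."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes: Dict[str, Set[str]] = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return sorted(classes.values(), key=sorted)
```

Each data source only has to publish the links it knows about; the closure is
computed by whoever consumes the data, which fits the decentralised model.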

Anyway, lots of great people, a great time, lots of interesting discussions
and new ideas... I am really looking forward to WWW 2009 in Madrid and the
next workshop!!!