There seems to be an overall consensus in the Linking Open Data community that we need a way to describe in RDF the different datasets published and interlinked within the project. There is already a Wiki page detailing some aspects of the corresponding vocabulary, called voiD (vocabulary of interlinked datasets).

One thing I would really like this vocabulary to do is describe exactly the inner content of a dataset: what can we find in this SPARQL end-point or in this RDF document? I have thought quite a lot about this recently, as I am starting to really need it. Indeed, when you have RDF documents describing lots of audio annotations, whose generation is really computation-intensive, you want to pick just the one that fits your request. There have been quite a lot of similar efforts in the past. However, most of them rely on one sort of reification or another, which makes them quite hard to actually use.

After some failed tries, I came up with the following, which I hope is easy and expressive enough :-)

It relies on a single property void:example, which links a resource identifying a particular dataset to a small RDF document holding an example of what you could find in that dataset. Then, with just a bit of SPARQL magic, you can easily query for datasets having a particular capability. Easy, isn't it? :-)

Here is a real-world example of that. A first RDF document describes one of the DBtune datasets:

:ds1
        a void:Dataset;
        rdfs:label "Jamendo end-point on DBtune";
        dc:source <http://jamendo.com/>;
        foaf:maker <http://moustaki.org/foaf.rdf#moustaki>;
        void:sparql_end_point <http://dbtune.org:2105/sparql/>;
        void:example <http://moustaki.org/void/jamendo_example.n3>;
        .

The void:example property points towards a small RDF file, giving an example of what you can find within this dataset.
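
As a rough sketch of what such an example document might contain (this is not the actual content of jamendo_example.n3, and the ex: resources are made up for illustration), a single record with one download location and one tag would be enough to advertise that kind of data:

```turtle
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix tags: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/> .
@prefix ex: <http://example.org/> .

# Hypothetical resources: only the shape of the description matters,
# not the identifiers themselves.
ex:record1
        a mo:Record ;
        mo:available_as ex:download1 ;
        tags:taggedWithTag ex:tag1 .
```

The point of keeping the example this small is that it only needs to show which classes and properties the dataset uses, not any of its real data.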

Then, the following SPARQL query asks whether this dataset has a SPARQL end-point and holds information about music records, associated tags, and places to download them.

PREFIX void: <http://purl.org/ontology/void#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX tags: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>

ASK
FROM NAMED <http://moustaki.org/void/void.n3>
FROM NAMED <http://moustaki.org/void/jamendo_example.n3>
{
        GRAPH <http://moustaki.org/void/void.n3> {
                ?ds a void:Dataset;
                        void:sparql_end_point ?sparql;
                        void:example ?ex.
        }
        GRAPH ?ex {
                ?r a mo:Record;
                        mo:available_as ?l;
                        tags:taggedWithTag ?t.
        }
}

I tried this query with ARQ, and it works perfectly :-)

$ sparql --query=void.sparql
Ask => Yes

Update: It also works with ARC2, although ARC2 does not automatically load the documents named in the SPARQL FROM NAMED clauses. You can try the same query on this SPARQL end-point, which has already loaded the two documents (the voiD description and the example).

Update 2: A nice blog post about automatically generating the data you need for describing an end-point - thanks shellac for the pointer!

Update 3: Following discussion on the #swig IRC channel.