Some weeks ago, I released a linked data representation of the Jamendo dataset (a large collection of Creative Commons licensed songs), based on the Music Ontology.

I already had some experience publishing such datasets, having published a dump of the Magnatune collection with D2R Server and a D2RQ mapping. Through the publishingLocation property, the Magnatune dump is linked to the dbpedia dataset. That part was in fact really easy: the geographical location in the Magnatune database is just a string (France, USA, etc.), and the dbpedia URIs I link to are simply http://dbpedia.org/resource/ followed by that string. All of this (pointing towards custom URI patterns) can be done quite easily with D2R.
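As a minimal sketch in Prolog, the language P2R builds on, this kind of linking needs nothing more than an atom concatenation. The predicate name dbpedia_location/2 is hypothetical. It only mimics what the D2RQ mapping does and is not taken from it.

```prolog
% Hypothetical predicate mimicking the Magnatune linking: the dbpedia URI
% is the plain concatenation of the resource prefix and the location string.
dbpedia_location(Location, URI) :-
    atom_concat('http://dbpedia.org/resource/', Location, URI).

% ?- dbpedia_location('France', URI).
% URI = 'http://dbpedia.org/resource/France'.
```

Applied to a Jamendo string such as France (75), the same concatenation would produce a "URI" containing a space and parentheses that names no dbpedia resource. This is why the trick does not carry over.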

However, it was a bit more difficult for the Jamendo dataset...

  • They release their dump in a custom XML schema, and their database evolves quite fast, so to stay up to date you have to query their API. This makes a relational database publishing approach difficult to use.
  • Geographical information is also represented as a string, but it can take forms such as France (75) (for Paris, France) or Madrid, Spain. This makes it difficult to find a canonical way of constructing dbpedia or Geonames URIs.

Therefore, I released a small program, P2R, which uses a declarative mapping to publish a SWI-Prolog knowledge base on the Semantic Web.

With Prolog as a back-end, you can do much more than with a plain relational database. As an example, I'll describe how I linked the Jamendo dataset to the Geonames one.

Prolog-to-RDF

P2R handles declarative mappings that associate a Prolog term (a plain predicate, or a logical formula combining several predicates) with a set of RDF triples. The resulting RDF is made available through a SPARQL end-point.

For example, the following mapping associates the predicate artist_dispname with the triple {<artist uri> foaf:name "name"^^xsd:string.}:

match:
        (artist_dispname(Id,Name))
                eq
        [
                rdf(pattern(['http://dbtune.org/jamendo/resource/artist/',Id]),foaf:name,literal(type('http://www.w3.org/2001/XMLSchema#string',Name)))
        ].
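The pattern([...]) term describes a URI made of a constant prefix followed by the value of Id. Below is a rough sketch of how such a pattern can be used in both directions: building a URI from an Id, or recovering the Id from a URI that appears in a query. The name artist_uri/2 is hypothetical and is not P2R's internal predicate. The sketch also assumes that Jamendo artist identifiers are integers.

```prolog
% Hypothetical helper: the artist URI pattern used in the mapping,
% usable in both directions (at least one argument must be bound).
% Assumes Jamendo artist ids are integers.
artist_prefix('http://dbtune.org/jamendo/resource/artist/').

artist_uri(Id, URI) :-
    artist_prefix(Prefix),
    (   nonvar(URI)
    ->  % URI given (e.g. the subject of a triple pattern):
        % strip the prefix and read the remainder as a number
        atom_concat(Prefix, IdAtom, URI),
        atom_number(IdAtom, Id)
    ;   % Id given: build the URI by concatenation
        atomic_list_concat([Prefix, Id], URI)
    ).
```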

Then, when the SPARQL end-point processes a triple pattern such as:

<http://dbtune.org/jamendo/resource/artist/5> foaf:name ?name.

It binds the variable Id to 5 and tries to prove artist_dispname(5,Name). This predicate is defined along the following lines:

artist_dispname(Id,Name) IF 
        query Jamendo API for names associated to Id AND
        Name is one of these names

(Alternatively, instead of querying the Jamendo API, it can simply parse the XML dump.)

It therefore queries the Jamendo API, binds Name to the name of the artist, and sends back a binding between ?name and "Both"^^xsd:string. If the subject had been ?artist in our query, we would have retrieved every artist URI / name pair.
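A toy Prolog version of this evaluation is sketched below. It replaces the API answer with a local fact: jamendo_api_names/2 is hypothetical and holds only the one artist shown in this post, and artist_name_pairs/1 is an illustrative name. With Id bound to 5, proving artist_dispname(5, Name) binds Name to 'Both'. With the subject unbound, the same predicate enumerates every pair.

```prolog
:- use_module(library(lists)).

% Hypothetical stand-in for the Jamendo API call: the list of names
% associated with an artist id. A real implementation would query the
% web service (or parse the XML dump) instead of reading this fact.
jamendo_api_names(5, ['Both']).

% artist_dispname(?Id, ?Name): Name is one of the names the API
% associates with Id, following the pseudocode definition above.
artist_dispname(Id, Name) :-
    jamendo_api_names(Id, Names),
    member(Name, Names).

% Unbound subject: enumerate every (artist URI, name) pair, building
% each URI from the prefix used in the mapping's pattern.
artist_name_pairs(Pairs) :-
    findall(URI-Name,
            ( artist_dispname(Id, Name),
              atomic_list_concat(['http://dbtune.org/jamendo/resource/artist/', Id], URI)
            ),
            Pairs).
```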

You then have a SPARQL end-point able to answer such queries by querying the Jamendo API.

UriSpace

Then, all you have to do is redirect every URI in your URI space (here, http://dbtune.org/jamendo/resource/) to a DESCRIBE query on the SPARQL end-point exposed by P2R.

I published another piece of code that does the trick, UriSpace, which also works through a declarative mapping.
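The predicate below is a rough sketch of what such a redirection computes: it turns a resource URI into the URL of a DESCRIBE query against the end-point. Several parts are assumptions. The end-point address is hypothetical. The query parameter name follows the usual SPARQL protocol convention. describe_url/2 is a hypothetical name and does not reflect UriSpace's actual mapping syntax.

```prolog
:- use_module(library(uri)).

% Assumed SPARQL end-point address (hypothetical for this sketch).
sparql_endpoint('http://dbtune.org/jamendo/sparql/').

% describe_url(+ResourceURI, -RedirectURL): the URL a 303 redirect for
% ResourceURI could point to, i.e. a DESCRIBE query on the end-point.
describe_url(ResourceURI, RedirectURL) :-
    sparql_endpoint(Endpoint),
    format(atom(Query), 'DESCRIBE <~w>', [ResourceURI]),
    uri_encoded(query_value, Query, Encoded),   % percent-encode the query
    atomic_list_concat([Endpoint, '?query=', Encoded], RedirectURL).
```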

Linking the Jamendo data set to the Geonames one

As we saw earlier, it is not possible to directly construct a URI from a string denoting a geographical location in the Jamendo dataset. But we are not limited in what we can do inside our mappings! Here is the part of the P2R mapping that exposes the foaf:based_near property:

match:
        (artist_geo(Id,GeoString),geonames(GeoString,URI))
                eq
        [
                rdf(pattern(['http://dbtune.org/jamendo/resource/artist/',Id]),foaf:based_near,URI)
        ].

Here, the geonames(GeoString,URI) predicate is defined as:

geonames(GeoString,URI) IF
        clean GeoString (basically, remove "(" and ")") AND
        query the Geonames web service with the cleaned GeoString and take the first matching URI
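The cleaning step of this definition is plain string manipulation. A minimal Prolog sketch follows. The names clean_geo/2 and geonames_search_url/2 are hypothetical. The Geonames search URL and its q/maxRows parameters are assumptions about the web service and are not part of the original mapping. Fetching the URL and extracting the first URI from the answer are left out.

```prolog
:- use_module(library(apply)).
:- use_module(library(uri)).

is_paren('(').
is_paren(')').

% clean_geo(+GeoString, -Clean): drop parentheses and normalise spacing,
% e.g. 'France (75)' becomes 'France 75'.
clean_geo(GeoString, Clean) :-
    atom_chars(GeoString, Chars),
    exclude(is_paren, Chars, Kept),
    atom_chars(Raw, Kept),
    normalize_space(atom(Clean), Raw).

% geonames_search_url(+GeoString, -URL): search request asking for the
% first matching feature (host and parameters are assumptions).
geonames_search_url(GeoString, URL) :-
    clean_geo(GeoString, Clean),
    uri_encoded(query_value, Clean, Q),
    atomic_list_concat(['http://ws.geonames.org/search?maxRows=1&q=', Q], URL).
```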

And that's it! Now, when you dereference a Jamendo artist URI, you can see the link to the Geonames dataset:

$ curl -L -H "Accept: application/rdf+xml" http://dbtune.org/jamendo/resource/artist/5
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rdf:RDF [
    <!ENTITY foaf 'http://xmlns.com/foaf/0.1/'>
    <!ENTITY mo 'http://purl.org/ontology/mo/'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>
]>
<rdf:RDF
    xmlns:foaf="&foaf;"
    xmlns:mo="&mo;"
    xmlns:rdf="&rdf;"
    xmlns:xsd="&xsd;"
>
<mo:MusicArtist rdf:about="http://dbtune.org/jamendo/resource/artist/5">
  <foaf:made rdf:resource="http://dbtune.org/jamendo/resource/record/174"/>
  <foaf:made rdf:resource="http://dbtune.org/jamendo/resource/record/33"/>
  <foaf:based_near rdf:resource="http://sws.geonames.org/2991627/"/>
  <foaf:homepage rdf:resource="http://www.both-world.com"/>
  <foaf:img rdf:resource="http://img.jamendo.com/artists/b/both.jpg"/>
  <foaf:name rdf:datatype="&xsd;string">Both</foaf:name>
</mo:MusicArtist>

<rdf:Description rdf:about="http://dbtune.org/jamendo/resource/record/174">
  <foaf:maker rdf:resource="http://dbtune.org/jamendo/resource/artist/5"/>
</rdf:Description>

<rdf:Description rdf:about="http://dbtune.org/jamendo/resource/record/33">
  <foaf:maker rdf:resource="http://dbtune.org/jamendo/resource/artist/5"/>
</rdf:Description>

</rdf:RDF>

And you can plot some Jamendo artists on a map, using the Tabulator generic data browser.

Some Jamendo artists on a map, using the Tabulator