DBTune blog

To content | To menu | To search

Tag - swi-prolog

Entries feed

Thursday 29 January 2009

Prolog message queue

It's been a long time since I last posted anything here, but things have been pretty hectic recently (I am a doctor, now!! I'll post my thesis here soon).

I've just hacked a really small implementation of an HTTP-driven SWI-Prolog message queue. I've often find myself doing quite expensive computation in Prolog and the best way to easily distribute it is to have a message queue on which you post messages to process (in that case, Prolog terms), and a pool of workers pick messages and process them. Then, if you find your program is still too slow, you can easily add a couple of workers to help going faster.

Wednesday 12 December 2007

HENRY: A small N3 parser/reasoner for SWI-Prolog

Yesterday, I finally took some time to package the little hack I've written last week, based on the SWI N3 parser I wrote back in September.

Update: A newer version with lots of bug fixes is available there.

It's called Henry, it is really small, hopefully quite understandable, and it does N3 reasoning. The good thing is that you can embed such reasoning in the SWI Semantic Web Server, and then access a N3-entailed RDF store using SPARQL.

How to use it?

Just get this tarball, extract it, and make sure you have SWI-Prolog installed, with its Semantic Web library.

Then, the small swicwm.pl script provides a small command-line tool to test it (roughly equivalent, in CWM terms, to cwm $1 -think -data -rdf).

Here is a simple example (shipped in the package, along other funnier examples).

The file uncle.n3 holds:

prefix : <http://example.org/> .
:yves :parent :fabienne.
:fabienne :brother :olivier.
{?c :parent ?f. ?f :brother ?u} => {?c :uncle ?u}.

And:

$ ./swicwm.pl examples/uncle.n3

<!DOCTYPE rdf:RDF [
    <!ENTITY ns1 'http://example.org/'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
]>

<rdf:RDF
    xmlns:ns1="&ns1;"
    xmlns:rdf="&rdf;"
>
<rdf:Description rdf:about="&ns1;fabienne">
  <ns1:brother rdf:resource="&ns1;olivier"/>
</rdf:Description>

<rdf:Description rdf:about="&ns1;yves">
  <ns1:parent rdf:resource="&ns1;fabienne"/>
  <ns1:uncle rdf:resource="&ns1;olivier"/>
</rdf:Description>

</rdf:RDF>

How does it work?

Henry is built around my SWI N3 parser, which basically translates the N3 in a quad form, that can be stored in the SWI RDF store. The two tricks to reach such a representation are the following:

  • Each formulae is seen as a named graph, identified by a blank node (there exists a graph, where the following is true...);
  • Universal quantification is captured through a specific set of atoms (a bit like __bnode1 captures an existentially quantified variable).

For example:

prefix : <http://example.org/> .
{?c :parent ?f. ?f :brother ?u} => {?c :uncle ?u}.

would be translated to:

subject||predicate||object||graph
__uqvar_c||http://example.org/parent||__uqvar_f||__graph1
__uqvar_f||http://example.org/brother||__uqvar_u||__graph1
__uqvar_c||http://example.org/uncle||__uqvar_u||__graph2
__graph1||log:implies||__graph2||uncle.n3

Then, when running the compile predicate, such a representation is translated into a bunch of Prolog clauses, such as, in our example:

rdf(C,'http://example.org/uncle',U) :- rdf(C,'http://example.org/parent',F), rdf(F,'http://example.org/brother',U).

Such rules are defined in a particular entailment module, allowing it to be plugged in the SWI Semantic Web server. Of course, rules can get into an infinite loop, and this will make the whole thing crash :-)

I tried to make the handling of lists and builtins as clear and simple as possible. Defining a builtin is as simple as registering a new predicate, associating a particular URI to a prolog predicate (see builtins.pl for an example).

An example using both lists and builtins is the following one. By issuing this SPARQL query to a N3-entailed store:

PREFIX list: <http://www.w3.org/2000/10/swap/list#>

SELECT ?a
WHERE
{?a list:in ("a" "b" "c")}

You will get back a, b and c (you have no clue how much I struggled to make this thing work :-) ).

But take care!

Of course, there are lots of bugs and issues, and I am sure there are lots of cases where it'll crash and burn :-) Anyway, I hope it will be useful.

Friday 16 November 2007

Sindice module for SWI-Prolog

I just committed in the motools sourceforge project a small SWI-Prolog module for accessing the Sindice Semantic Web search engine.

Just about 20 lines of code, and it handles Sindice URI lookups (find documents referencing a particular URI), and keyword lookups (find documents mentioning a similar literal). I guess it sort of proves how well designed the Sindice service is!

Anyway, a typical SWI-Prolog session making use of this module would look like:

?- use_module(sindice).
Yes
?- sindice_q(uri('http://dbtune.org/jamendo/artist/5')).
% Parsed "http://sindice.com/query/lookup?uri=http%3a%2f%2fdbtune.org%2fjamendo%2fartist%2f5" in 0.00 sec; 2 triples
Yes
?- sindice_r(Q,U).
Q = 'http://sindice.com/query/lookup?uri=http://dbtune.org/jamendo/artist/5',
U = 'http://dbtune.org:2105/sparql/?query=describe <http://dbtune.org/jamendo/artist/5>' ;

Q = 'http://sindice.com/query/lookup?uri=http://dbtune.org/jamendo/artist/5',
U = 'http://moustaki.org/resources/jamendo_mbz.rdf'

Then, up to you to crawl further (using rdf_load). By replacing uri by keyword, a keyword search is performed, which results are accessible in the same way.

Thursday 23 August 2007

Small Musicbrainz library for SWI-Prolog

I just put online a really small SWI-Prolog module, allowing to do some queries on the Musicbrainz web service. It provides the following predicates:

  • find_artist_id(+Name,-ID,-Score), which find artist ids given a name, along with a Lucene score
  • find_release_id(+Name,-ID,-Score), which provides the same thing for a release
  • find_track_id(+Name,-ID,-Score), same thing for a track

I wrote only three predicates, because to identify a track, I often found the best way was not to do one single Musicbrainz query with the track name, the artist name, and the release name if it is available, but to do the following:

* Try to identify the artist
* For each artist found, try to identify the release (if it's available)
* For each release try to identify the track

(Which is in fact really similar to what I have done for linking the Jamendo dataset to the Musicbrainz one).

Indeed, when you do a single query, it seems like the Musicbrainz web service does an exact match on the extra arguments, which fails if the album or the artist is badly spelled. And I did not succeed to write a good Lucene query that was doing the identification with such accuracy... I will detail that a bit when the next generation GNAT is in a releasable state:) But well, take care you do not flood the Musicbrainz web service! No more than one query per second!

Thursday 5 July 2007

Specification generation script

I just put online a small Prolog script, allowing to generate XHTML specification out of a RDF vocabulary (it should work for both RDFS and OWL). It is really similar to the Python script the SIOC community uses (thanks Uldis for the code:-) ), regarding the external behavior of the script. It provides a few enhancement though, like support of the status of terms, OWL constructs, and a few other things. You can check the Music Ontology specification to see one example output, generated from a RDFS/OWL file.

Here is the script.

Saturday 26 May 2007

Linking open data: publishing and linking the Jamendo dataset

Some weeks ago, I released a linked data representation of the Jamendo dataset, a large collection of Creative Commons licensed songs, according to the Music Ontology.

I had some experience with publishing such datasets, through the dump of the Magnatune collection, which I have done through D2R Server, and this D2RQ mapping. The Magnatune dump, through the publishingLocation property, is linked to the dbpedia dataset. Well, it was in fact really easy: the geographical location in the Magnatune database is just a string: France, USA, etc. And the dbpedia URIs I am linking to are just a plain concatenation of such strings and http://dbpedia.org/resource/. All of that (pointing towards custom URI patterns) can be done quite easily through D2R.

However, it was a bit more difficult for the Jamendo dataset...

  • They release their dump in some custom XML schema, and their database is evolving quite fast, so in order to be up-to-date, you have to query their API, which makes it difficult to use a relational database publishing approach.
  • Geographical information is also represented as a string, but it could be France (75) (for Paris, France), Madrid, Spain, etc., which makes it difficult to find a canonical way of constructing dbpedia or Geonames URIs.

Therefore, I released a small program, P2R, making use of a declarative mapping to export a SWI-Prolog knowledge base on the Semantic Web.

With Prolog as a back-end, you can do a lot more stuff than with a plain relational database. I'll try to give an example of this, by describing how I have done to link the Jamendo dataset to the Geonames one.

Prolog-to-RDF

P2R handles declarative mappings associating a Prolog term (just a plain predicate, or a logical formulae combining some predicates) to a set of RDF triples. The resulting RDF is made available through a SPARQL end-point.

For example, the following example maps the predicate artist_dispname to {<artist uri> foaf:name "name"^^xsd:string.}:

match:
        (artist_dispname(Id,Name))
                eq
        [
                rdf(pattern(['http://dbtune.org/jamendo/resource/artist/',Id]),foaf:name,literal(type('http://www.w3.org/2001/XMLSchema#string',Name)))
        ].

Then, when the SPARQL end-point processes a triple pattern such as:

<http://dbtune.org/jamendo/resource/artist/5> foaf:name ?name.

It will bind the term ID to 5, and try to prove artist_dispname(5,Name). This predicate will in fact be defined by the following:

artist_dispname(Id,Name) IF 
        query Jamendo API for names associated to Id AND
        Name is one of these names

(or, instead of querying Jamendo API, it can just parse the XML dump).

Therefore, it will query the Jamendo API, bind Name to the name of the artist, and send back a binding between ?name and "both"^^xsd:string. If the subject was ?artist in our query, we would have retrieved every pairs of artist URI / name.

You then have a SPARQL end point able to answer such queries by asking Jamendo API.

UriSpace

Then, all you have to do is to redirect every URI in your URI space (here, http://dbtune.org/jamendo/resource/) to DESCRIBE queries on the SPARQL end-point that P2R exposes.

I published another piece of code that does the trick, UriSpace, also through a declarative mapping

Linking the Jamendo data set to the Geonames one

As we saw earlier, it is not possible to directly construct an URI from a string denoting a geographical location in the Jamendo dataset. But well, we are not limited on what we can do inside our mappings! Here is the part of the P2R mapping that exposes the foaf:based_near property:

match:
        (artist_geo(Id,GeoString),geonames(GeoString,URI))
                eq
        [
                rdf(pattern(['http://dbtune.org/jamendo/resource/artist/',Id]),foaf:based_near,URI)
        ].

Where, in fact, the geonames(GeoString,URI) predicate is defined as:

geonames(GeoString,URI) IF
        clean GeoString (remove "(" and ")", basically) AND
        query Geonames web service to retrieve the first matching URI with GeoString

And it is done! Now, you can see the link to the Geonames dataset, when getting a Jamendo artist URI:

$ curl -L -H "Accept: application/rdf+xml" http://dbtune.org/jamendo/resource/artist/5
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rdf:RDF [
    <!ENTITY foaf 'http://xmlns.com/foaf/0.1/'>
    <!ENTITY mo 'http://purl.org/ontology/mo/'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>
]>
<rdf:RDF
    xmlns:foaf="&foaf;"
    xmlns:mo="&mo;"
    xmlns:rdf="&rdf;"
    xmlns:xsd="&xsd;"
>
<mo:MusicArtist rdf:about="http://dbtune.org/jamendo/resource/artist/5">
  <foaf:made rdf:resource="http://dbtune.org/jamendo/resource/record/174"/>
  <foaf:made rdf:resource="http://dbtune.org/jamendo/resource/record/33"/>
  <foaf:based_near rdf:resource="http://sws.geonames.org/2991627/"/>
  <foaf:homepage rdf:resource="http://www.both-world.com"/>
  <foaf:img rdf:resource="http://img.jamendo.com/artists/b/both.jpg"/>
  <foaf:name rdf:datatype="&xsd;string">Both</foaf:name>
</mo:MusicArtist>

<rdf:Description rdf:about="http://dbtune.org/jamendo/resource/record/174">
  <foaf:maker rdf:resource="http://dbtune.org/jamendo/resource/artist/5"/>
</rdf:Description>

<rdf:Description rdf:about="http://dbtune.org/jamendo/resource/record/33">
  <foaf:maker rdf:resource="http://dbtune.org/jamendo/resource/artist/5"/>
</rdf:Description>

</rdf:RDF>

And you can plot some Jamendo artists on a map, using the Tabulator generic data browser.

Some Jamendo artists on a map, using the Tabulator