I just put online a really small SWI-Prolog module for running a few queries against the MusicBrainz web service. It provides the following predicates:

  • find_artist_id(+Name,-ID,-Score), which finds artist IDs given a name, each with its Lucene score
  • find_release_id(+Name,-ID,-Score), which does the same thing for a release
  • find_track_id(+Name,-ID,-Score), which does the same thing for a track
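
As a quick illustration of how these predicates might be used, here is a minimal sketch that keeps only the best-scoring artist candidate. It assumes the module is already loaded, that find_artist_id/3 returns one candidate per solution on backtracking, and that Score is a number. best_artist/3 is a hypothetical helper name, not part of the module.

```prolog
% best_artist(+Name, -ID, -Score)
% Hypothetical helper (not part of the module): the candidate with the
% highest Lucene score among all solutions of find_artist_id/3.
% Assumes Score is numeric, so that standard order sorts it correctly.
best_artist(Name, ID, Score) :-
    % Collect every candidate as a Score-ID pair.
    findall(S-I, find_artist_id(Name, I, S), Pairs),
    % Sort on the score (first argument), descending, keeping duplicates.
    % Fails if no candidate was found, because [] has no first element.
    sort(1, @>=, Pairs, [Score-ID|_]).
```

The same pattern should work for find_release_id/3 and find_track_id/3, since they share the same argument layout.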

I wrote only three predicates because, to identify a track, I often found that the best way was not to send one single MusicBrainz query with the track name, the artist name, and the release name (if it is available), but to do the following:

* Try to identify the artist
* For each artist found, try to identify the release (if a release name is available)
* For each release found, try to identify the track

(This is in fact very similar to what I did to link the Jamendo dataset to the MusicBrainz one.)
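
The three steps above could be sketched roughly as follows, for the case where a release name is known. This is only an illustration under assumptions: the three find_* predicates do not take an artist or a release as input, so the sketch relies on two hypothetical helpers, release_of_artist/2 and track_on_release/2, which stand for whatever check ties a release to an artist and a track to a release. Neither is part of the module as described here.

```prolog
% identify_track(+Artist, +Release, +Track, -TrackID)
% Sketch of the artist -> release -> track cascade.
% release_of_artist/2 and track_on_release/2 are HYPOTHETICAL helpers,
% not provided by the module; supply your own implementation.
identify_track(Artist, Release, Track, TrackID) :-
    % Step 1: candidate artists for the given name.
    find_artist_id(Artist, ArtistID, _),
    % Step 2: candidate releases, kept only if they belong to that artist.
    find_release_id(Release, ReleaseID, _),
    release_of_artist(ReleaseID, ArtistID),
    % Step 3: candidate tracks, kept only if they appear on that release.
    find_track_id(Track, TrackID, _),
    track_on_release(TrackID, ReleaseID).
```

Because each step matches a single name on its own, a misspelling in one field only affects that step. Note that this naive version repeats the release and track lookups for every candidate of the previous step, so it can issue many web service queries.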

Indeed, when you do a single query, the MusicBrainz web service seems to do an exact match on the extra arguments, which fails if the album or the artist is misspelled. And I did not manage to write a good Lucene query that did the identification with the same accuracy... I will go into more detail once the next-generation GNAT is in a releasable state :) In the meantime, take care not to flood the MusicBrainz web service: no more than one query per second!
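
One way to respect the one-query-per-second limit is to wrap each call in a small throttle. The sketch below is not part of the module: throttled/1, wait_for_slot/0 and last_query_time/1 are hypothetical names, and it assumes each call to a find_* predicate sends a single HTTP request when first called (not one per solution on backtracking).

```prolog
% Time stamp of the last throttled call (hypothetical bookkeeping fact).
:- dynamic last_query_time/1.

:- meta_predicate throttled(0).

% throttled(:Goal)
% Run Goal, first sleeping if needed so that at least one second
% separates the start of consecutive throttled calls.
throttled(Goal) :-
    wait_for_slot,
    call(Goal).

wait_for_slot :-
    get_time(Now),
    (   last_query_time(Last),
        Wait is 1.0 - (Now - Last),
        Wait > 0
    ->  sleep(Wait)          % too soon: wait out the rest of the second
    ;   true                 % first call, or more than a second has passed
    ),
    % Record when this call was allowed to proceed.
    get_time(Stamp),
    retractall(last_query_time(_)),
    assertz(last_query_time(Stamp)).
```

A query would then be written as, for example, `throttled(find_artist_id('Some Artist', ID, Score))`. Backtracking into the goal for further solutions does not wait again; only a fresh call to throttled/1 does.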