GNAT 0.1 released
By Yves on Thursday 30 August 2007, 13:39 - Permalink
Chris Sutton and I have done some work on GNAT since its first release, and it is now in a releasable state!
You can get it here.
What does it do?
As mentioned in my previous blog post, GNAT is a small piece of software that links your personal music collection to the Semantic Web. It finds dereferenceable identifiers, available somewhere on the web, for the tracks in your collection. Basically, GNAT crawls through your collection and tries, by several means, to find the corresponding Musicbrainz identifier for each track, which is then used to find the corresponding dereferenceable URI in Zitgist. RDF/XML files are then written to the corresponding folders:
/music
/music/Artist1
/music/Artist1/AlbumA/info_metadata.rdf
/music/Artist1/AlbumA/info_fingerprint.rdf
/music/Artist1/AlbumB/info_metadata.rdf
/music/Artist1/AlbumB/info_fingerprint.rdf
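To check what an identification run left behind, the layout above can be walked with a few lines of Python. This is only a sketch: the function name find_identification_files is made up for this example and is not part of GNAT; the only thing taken from GNAT is the two file names shown in the listing.

```python
import os

# File names taken from the folder listing; everything else is illustrative.
RDF_NAMES = ("info_metadata.rdf", "info_fingerprint.rdf")


def find_identification_files(root):
    """Yield the paths of GNAT output files found anywhere under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in RDF_NAMES:
            if name in filenames:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Print every identification file found in the collection.
    for path in find_identification_files("/music"):
        print(path)
```

Running it on /music should print one line per RDF file GNAT has written so far.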
What next?
These files hold a number of <http://zitgist.com/music/track/...> mo:available_as <local file> statements. They can then be used by a tool such as GNARQL (which will be properly released next week), which swallows them, exposes a SPARQL endpoint, and provides some linked data crawling facilities (to gather more information about the artists in our collection, for example), making it possible to use the links pictured here (yes, sorry, I didn't know how to properly introduce the new linking-open-data schema - it looks good, and keeps on growing! :-) ):

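To make the mo:available_as statements a bit more concrete, here is a rough sketch that builds one of them as a single N-Triples line. A few assumptions: GNAT itself writes RDF/XML (N-Triples is used here only because it fits on one line), the local file is assumed to be given as a file:// URI, and both the function name available_as_triple and the EXAMPLE-ID track identifier are placeholders. The mo: prefix is the Music Ontology namespace, http://purl.org/ontology/mo/.

```python
from urllib.parse import quote

# Music Ontology namespace, which the mo: prefix stands for.
MO = "http://purl.org/ontology/mo/"


def available_as_triple(track_uri, local_path):
    """Return an N-Triples line linking a track URI to a local audio file.

    Assumption: the local file is expressed as a file:// URI built from an
    absolute POSIX path; GNAT's actual output may differ.
    """
    file_uri = "file://" + quote(local_path)  # percent-encode spaces etc.
    return f"<{track_uri}> <{MO}available_as> <{file_uri}> ."


if __name__ == "__main__":
    print(available_as_triple(
        "http://zitgist.com/music/track/EXAMPLE-ID",  # placeholder identifier
        "/music/Artist1/AlbumA/01 Track.mp3",
    ))
```

The subject is the dereferenceable track URI, so anything that follows it on the web can be joined with the file sitting on your disk.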
Two identification strategies
GNAT can use two different identification strategies:
- Metadata lookup: in this mode, only the available tags are used to identify the track. We chose an identification algorithm which is slower (although if you try to identify, for example, a collection with lots of releases, you won't notice it too much, as only the first track of a release will be slower to identify), but works a bit better than Picard's metadata lookup. Basically, the algorithm used is the same as the one I used to link the Jamendo dataset to the Musicbrainz one.
- Fingerprinting: in this mode, the MusicIP fingerprinting client is used to find a PUID for the track, which is then used to get back to the Musicbrainz ID. This mode is obviously better when the tags are crap :-)
The two strategies can be run in parallel, and most of the time the best identification results are obtained by combining the two...
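One possible way of combining the two results for a single track is sketched below. This is not GNAT's actual logic, just an illustration of the idea: the function name combine_identifications and the status labels are invented for this example, and each argument is assumed to be the Musicbrainz ID returned by one strategy (or None when that strategy found nothing).

```python
from typing import Optional, Tuple


def combine_identifications(
    metadata_id: Optional[str],
    fingerprint_id: Optional[str],
) -> Tuple[Optional[str], str]:
    """Merge the results of the two lookups for one track.

    Returns (musicbrainz_id, status). Hypothetical policy: trust the result
    when both strategies agree, fall back to whichever one answered, and
    refuse to pick a winner when they disagree.
    """
    if metadata_id and fingerprint_id:
        if metadata_id == fingerprint_id:
            return metadata_id, "confirmed"
        return None, "conflict"
    if metadata_id:
        return metadata_id, "metadata-only"
    if fingerprint_id:
        return fingerprint_id, "fingerprint-only"
    return None, "unidentified"
```

Tracks that come back as "conflict" are the ones worth checking by hand; the other cases can be accepted as they are.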
Usage
- To perform a metadata lookup for the music collection available at /music:
./AudioCollection.py metadata /music
- To perform a fingerprint-based lookup for the music collection available at /music:
./AudioCollection.py fingerprint /music
- To remove all previously performed identifications:
./AudioCollection.py clean /music
Dependencies
- MOPY (included) - Music Ontology PYthon interface
- genpuid (included) - MusicIP fingerprinting client
- rdflib - easy_install rdflib
- mutagen - easy_install mutagen
- Musicbrainz2 - You need a version later than 02.08.2007 (sorry)
Comments
You provided the wrong address for rdflib. The correct one is http://rdflib.net/
Ooops, sorry - should be fixed now.
Thanks for noticing that!
Hey Yves, if you haven't already, take a look at http://developer.songbirdnest.com/d... and http://wiki.songbirdnest.com/index....
It'd be really neat to have:
1. GNAT, which does all of the crawling and tagging in the background
2. A small webservice which you host locally (provides a neat wrapper around a SPARQL endpoint)
3. A songbird extension to interact with that webservice ('tag currently playing file', 'find all tracks by this artist')
Ok, I'm on it :-) Can you drop me an email at yves _at_ dbtune _dot_ org, so that we can keep in touch about it?
Songbird is really, really great...
Cheers,
y