DBTune blog

Tag - motools

Wednesday 6 February 2008

Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird

Today, I made a small screencast about mixing the following ingredients:

All of that was extremely easy to set up. It actually took me more time to figure out how to make a screencast on a Linux box :-) (which I finally did using vnc2swf). Basically, it just took some tweaked configuration files for ClioPatria and a small CSS hack, and that was it...

The result is there:

Songbird, Linked Data, Mazzle and Jamendo

(Note that only a few Jamendo artists are displayed now... Otherwise, Google Maps would just crash my laptop :-) ).

Friday 16 November 2007

Sindice module for SWI-Prolog

I just committed to the motools SourceForge project a small SWI-Prolog module for accessing the Sindice Semantic Web search engine.

Just about 20 lines of code, and it handles Sindice URI lookups (find documents referencing a particular URI), and keyword lookups (find documents mentioning a similar literal). I guess it sort of proves how well designed the Sindice service is!

Anyway, a typical SWI-Prolog session making use of this module would look like:

?- use_module(sindice).
Yes
?- sindice_q(uri('http://dbtune.org/jamendo/artist/5')).
% Parsed "http://sindice.com/query/lookup?uri=http%3a%2f%2fdbtune.org%2fjamendo%2fartist%2f5" in 0.00 sec; 2 triples
Yes
?- sindice_r(Q,U).
Q = 'http://sindice.com/query/lookup?uri=http://dbtune.org/jamendo/artist/5',
U = 'http://dbtune.org:2105/sparql/?query=describe <http://dbtune.org/jamendo/artist/5>' ;

Q = 'http://sindice.com/query/lookup?uri=http://dbtune.org/jamendo/artist/5',
U = 'http://moustaki.org/resources/jamendo_mbz.rdf'

Then it is up to you to crawl further (using rdf_load). Replacing uri with keyword performs a keyword search, whose results are accessible in the same way.
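For readers outside Prolog, the lookup URL that appears in the session log can be rebuilt in a few lines of Python. This is only a sketch of the URL construction, not the module's code: the function name is made up for illustration, and the keyword query parameter is an assumption modelled on the uri one shown in the log. Python emits upper-case percent escapes where the log shows lower-case ones; the two forms are equivalent.

```python
from urllib.parse import quote

# Endpoint as it appears in the session log above.
SINDICE_LOOKUP = "http://sindice.com/query/lookup"


def sindice_lookup_url(kind, value):
    """Build a Sindice lookup URL (hypothetical helper).

    kind is "uri" (shown in the log) or "keyword" (assumed parameter name).
    """
    if kind not in ("uri", "keyword"):
        raise ValueError("kind must be 'uri' or 'keyword'")
    # safe="" percent-encodes ':' and '/' too, as in the logged URL.
    return f"{SINDICE_LOOKUP}?{kind}={quote(value, safe='')}"


print(sindice_lookup_url("uri", "http://dbtune.org/jamendo/artist/5"))
```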

Thursday 30 August 2007

GNAT 0.1 released

Chris Sutton and I have done some work since the first release of GNAT, and it is now in a releasable state!

You can get it here.

What does it do?

As mentioned in my previous blog post, GNAT is a small piece of software that links your personal music collection to the Semantic Web. It finds dereferenceable identifiers available somewhere on the web for the tracks in your collection. Basically, GNAT crawls through your collection and tries by several means to find the corresponding Musicbrainz identifier, which is then used to find the corresponding dereferenceable URI in Zitgist. Then, RDF/XML files are put in the corresponding folder:

/music
/music/Artist1
/music/Artist1/AlbumA/info_metadata.rdf
/music/Artist1/AlbumA/info_fingerprint.rdf
/music/Artist1/AlbumB/info_metadata.rdf
/music/Artist1/AlbumB/info_fingerprint.rdf
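If you want to pick these files up from your own scripts, a directory walk is enough. The following is a minimal sketch, assuming only the layout and the two file names shown above; find_gnat_files is a made-up helper name, not part of GNAT.

```python
from pathlib import Path

# File names taken from the layout shown above.
GNAT_FILES = ("info_metadata.rdf", "info_fingerprint.rdf")


def find_gnat_files(collection_root):
    """Return a sorted list of GNAT output files found under collection_root."""
    root = Path(collection_root)
    # rglob descends into every artist/album folder below the root.
    return sorted(p for p in root.rglob("*.rdf") if p.name in GNAT_FILES)
```

Calling find_gnat_files("/music") would return the paths of the per-album RDF files, ready to be handed to an RDF parser.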

What next?

These files hold a number of <http://zitgist.com/music/track/...> mo:available_as <local file> statements. They can then be used by a tool such as GNARQL (which will be properly released next week), which swallows them, exposes a SPARQL end point, and provides some linked data crawling facilities (to gather more information about the artists in our collection, for example). This lets you use the links pictured here (yes, sorry, I didn't know how to properly introduce the new linking-open-data schema - it looks good, and keeps on growing! :-) ):

Linking-Open-Data

Two identification strategies

GNAT can use two different identification strategies:

  • Metadata lookup: in this mode, only the available tags are used to identify the track. We chose an identification algorithm which is slower, but works a bit better than Picard's metadata lookup. You won't notice the slowdown too much if you identify, for example, a collection with lots of releases, as only the first track of each release is slower to identify. Basically, the algorithm is the same as the one I used to link the Jamendo dataset to the Musicbrainz one.
  • Fingerprinting: in this mode, the MusicIP fingerprinting client is used to find a PUID for the track, which is then used to get back to the Musicbrainz ID. This mode is obviously better when the tags are crap :-)

The two strategies can be run in parallel, and most of the time the best identification results are obtained by combining the two...

Usage

  • To perform a metadata lookup for the music collection available at /music:

./AudioCollection.py metadata /music

  • To perform a fingerprint-based lookup for the music collection available at /music:

./AudioCollection.py fingerprint /music

  • To clean up all previously performed identifications:

./AudioCollection.py clean /music

Dependencies

  • MOPY (included) - Music Ontology PYthon interface
  • genpuid (included) - MusicIP fingerprinting client
  • rdflib - easy_install rdflib
  • mutagen - easy_install mutagen
  • Musicbrainz2 - You need a version later than 02.08.2007 (sorry)

Wednesday 23 May 2007

Find dereferenceable URIs for tracks in your personal music collection

Things have been moving fast since my last post. Indeed, Frederick just put the Musicbrainz RDF dump online, with dereferenceable URIs, SPARQL end-point, everything. Great job Fred!!

This data set will surely be a sort of hub for music-related data on the Semantic Web, as it gives URIs for a large number of artists, tracks, albums, but also timelines, performances, recordings, etc. Well, almost everything defined in the Music Ontology.

I am happy to announce the first hack using this dataset :-) It is called GNAT (for GNAT is not a tagger). It is just a few lines of Python code which, given an audio file in your music collection, give you the corresponding dereferenceable URI.

It also puts this URI into the ID3v2 Unique File Identifier (UFID) frame. I am not sure it is the right place to put such information though, as it is an identifier of the manifestation, not of the item itself. Maybe I should use the user-defined link frames in the ID3v2 header...
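To make the UFID question a bit more concrete, here is a small sketch of what the body of such a frame looks like according to the ID3v2 specification: a null-terminated owner identifier followed by the identifier itself, which the specification limits to 64 bytes. This is not the code GNAT uses (it relies on a tagging library); ufid_frame_body is a hypothetical helper, and it only builds the frame body, not the frame header.

```python
# Limit on the identifier part, as stated in the ID3v2 specification.
MAX_UFID_IDENTIFIER = 64


def ufid_frame_body(owner, identifier):
    """Return (body, within_limit) for a UFID frame body (hypothetical helper).

    body is the owner identifier, a null byte, then the identifier bytes.
    within_limit says whether the identifier fits in 64 bytes.
    """
    if not owner:
        raise ValueError("the owner identifier must be non-empty")
    data = identifier.encode("ascii")
    body = owner.encode("latin-1") + b"\x00" + data
    return body, len(data) <= MAX_UFID_IDENTIFIER
```

Note that a full Zitgist track URI is longer than 64 bytes, so within_limit comes out False for it. That is one more reason to wonder whether UFID is the right frame for this.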

So it is actually the first step of the application mentioned here!

It is quite easy to use:

$ python trackuri.py 7-don\'t_look_back.mp3

 - ID3 tags

Artist:  Artemis
Title:  Don't Look Back
Album:  Undone


 - Zitgist URI

http://zitgist.com/music/track/2b78923b-c260-44c1-b333-2caa020df172

Then:

$ eyeD3 7-don\'t_look_back.mp3

7-don't_look_back.mp3   [ 3.23 MB ]
--------------------------------------------------------------------------------
Time: 3:31      MPEG1, Layer III        [ 128 kb/s @ 44100 Hz - Stereo ]
--------------------------------------------------------------------------------
ID3 v2.4:
title: Don't Look Back          artist: Artemis
album: Undone           year: 2000
track: 7                genre: Trip-Hop (id 27)
Unique File ID: [http://zitgist.com/music/] http://zitgist.com/music/track/2b78923b-c260-44c1-b333-2caa020df172
Comment: [Description: http] [Lang: ]
//www.magnatune.com/artists/artemis
Comment: [Description: ID3v1 Comment] [Lang: XXX]
From www.magnatune.com

You can also output the corresponding RDF, in RDF/XML or N3:

$ python trackuri.py 1-i\'m_alive.mp3 xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:_3="http://purl.org/ontology/mo/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="http://zitgist.com/music/track/67a1fab6-aea4-47f4-891d-6d42bb856a40">
    <_3:availableAs rdf:resource=""/>
  </rdf:Description>
</rdf:RDF>
$ python trackuri.py 1-i\'m_alive.mp3 n3

@prefix _3: <http://zitgist.com/music/track/67>.
@prefix _4: <http://purl.org/ontology/mo/>.

 _3:a1fab6-aea4-47f4-891d-6d42bb856a40 _4:availableAs <>. 

... although I still have to put the proper Item URI in place of <>.
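One possible way to fill in that empty <> is to use a file: URI for the local audio file. The sketch below is an illustration under that assumption, not what trackuri.py does: the helper name and the example path are made up, the property URI is the one from the RDF/XML output above, and the result is written as a single N-Triples line.

```python
from urllib.parse import quote

# Namespaces as they appear in the output above.
ZITGIST_TRACK = "http://zitgist.com/music/track/"
MO_AVAILABLE_AS = "http://purl.org/ontology/mo/availableAs"


def available_as_triple(mbid, audio_path):
    """Return an N-Triples line linking a track URI to a local file (hypothetical helper)."""
    if not audio_path.startswith("/"):
        raise ValueError("audio_path must be an absolute POSIX path")
    # Percent-encode everything except '/', e.g. the apostrophe in the file name.
    item_uri = "file://" + quote(audio_path)
    return f"<{ZITGIST_TRACK}{mbid}> <{MO_AVAILABLE_AS}> <{item_uri}> ."


# The folder below is an assumed location for the example file.
print(available_as_triple(
    "67a1fab6-aea4-47f4-891d-6d42bb856a40",
    "/music/Artemis/Undone/1-i'm_alive.mp3",
))
```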

Get it!

You can download the code here, and it is GPL licensed.

The dependencies are:

  • python-id3
  • python-musicbrainz2
  • RDFLib (easy_install -U rdflib)
  • mutagen (easy_install -U mutagen)