DBTune blog


Tag - music-ontology


Monday 28 July 2008

Music Ontology linked data on BBC.co.uk/music

Just a couple of minutes ago on the Music Ontology mailing list, Nicholas Humfrey from the BBC announced the availability of linked data on BBC Music.

$ rapper -o turtle \

   a mo:MusicGroup;
   foaf:name "Coldplay";
   owl:sameAs <http://dbpedia.org/resource/Coldplay>;

This is just really, really, really great... Congratulations to the /music team!

Update: Tom Scott just wrote a really nice post about the new BBC music site, explaining what the BBC is trying to achieve by going down the linked data path.

Tuesday 1 July 2008

Echonest Analyze XML to Music Ontology RDF

I wrote a small XSL stylesheet to transform the XML results of the Echonest Analyze API to Music Ontology RDF. The Echonest Analyze API is a really great (and simple) web service to process audio files and get back an XML document describing some of their features (rhythm, structure, pitch, timbre, etc.). A lot of people already did really great things with it, from collection management to visualisation.

The XSL is available on that page. The resulting RDF can be queried using SPARQL. For example, the following query selects the boundaries of structural segments (chorus, verse, etc.):

PREFIX af: <http://purl.org/ontology/af/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>

SELECT ?start ?duration
FROM <http://dbtune.org/echonest/analyze-example.rdf>
WHERE {
  ?e      a af:StructuralSegment;
          event:time ?time.
  ?time   tl:start ?start;
          tl:duration ?duration.
}

I also added on that page the small bit to add to the Echonest Analyze XML to make it GRDDL-ready. That means that the XML document can be automatically translated to actual RDF data (which can then be aggregated, stored, linked to, queried, etc.).

<Analysis    xmlns:grddl="http://www.w3.org/2003/g/data-view#" 

This provides a lot more data to aggregate when describing my music collection!

If there is one thing I really wish could be integrated in the Echonest API, it would be a Musicbrainz lookup... Right now, I have to manually link the data I get from it to the rest of my aggregated data. If the Echonest results could include a link to the corresponding Musicbrainz resource, it would really simplify this step :-)

Wednesday 14 May 2008

Data-rich music collection management

I just put a live demo of something I showed earlier on this blog.

You can explore the Creative Commons-licensed part of my music collection (mainly coming from Jamendo) using aggregated Semantic Web data.

For example, here is what you get after clicking on "map" on the right-hand side and "MusicArtist" on the left-hand side:

Data-rich music collection management

The aggregation is done using the GNAT and GNARQL tools available in the motools sourceforge project. The data comes from datasets within the Linking Open Data project. The UI is done by the amazing ClioPatria software, with very little configuration.

An interesting thing is to load this demo into Songbird, as it can aggregate and play the audio as you crawl around.

Check the demo!

Update: It looks like it doesn't work with IE, but it is fine with Opera and FF2 or FF3. If the map doesn't load at first, just try again and it should be ok.

Monday 7 April 2008

D2RQ mapping for Musicbrainz

I just started a D2R mapping for Musicbrainz, which makes it fairly easy to set up a SPARQL end-point and to provide linked data access on top of Musicbrainz. A D2R instance loaded with the mapping as it is now is also available (be gentle, it is running on a cheap computer :-) ).

On top of what is available within the Zitgist mapping, it adds:

  • A SPARQL end-point;
  • Support for tags;
  • Support for a couple of advanced relationships (still working my way through them, though);
  • An instrument taxonomy directly generated from the db, and related to performance events;
  • Support for orchestras;
  • Links to DBpedia for places and Lingvoj for languages.

There is still a lot to do, though: it is really a start. The mapping is available on the motools sourceforge project. I hope to post a follow-up soon! (including examples of funny SPARQL queries :-) ).
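In the meantime, here is a rough idea of how such an end-point can be queried from a script. This is only a sketch: the endpoint address is a hypothetical placeholder (point it at wherever the D2R server is running), the query simply assumes that artists are typed as mo:MusicArtist and named with foaf:name, and asking for JSON results assumes the server supports that format.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request, urlopen

# Hypothetical endpoint address: replace with the real D2R server location.
ENDPOINT = "http://localhost:2020/sparql"

QUERY = """
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?artist ?name
WHERE {
  ?artist a mo:MusicArtist;
          foaf:name ?name.
}
LIMIT 10
"""

def build_sparql_request(endpoint, query):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def run_query(endpoint, query):
    """Send the request and return the raw response body (needs a live endpoint)."""
    with urlopen(build_sparql_request(endpoint, query)) as response:
        return response.read().decode("utf-8")

# Building the request does not touch the network; run_query() does.
request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Only run_query() needs the server to be up, so the request can be inspected before anything is sent.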

Update: For some obscure port-forwarding reasons, the SNORQL interface to the SPARQL end point does not work on the test server.

Update 2: This is fixed (thanks to the anonymous SPARQL crash tester who helped me find the bug, by the way :-) ).

Thursday 27 March 2008

The Quest for Canonical Spelling in music metadata

Last.fm recently unveiled their new fingerprinting lookup mechanism. They have aggregated quite a lot of fingerprints (650 million) using their fingerprinting software, which is a nice basis for such a lookup, and perhaps a viable alternative to Music DNS. I gave it a try (I just had to build a 64-bit Linux version of the lookup software), and was quite surprised by the results. The quality of the fingerprinting does look good, but here are the results for a particular song:

<?xml version="1.0"?>
<!DOCTYPE metadata SYSTEM "http://fingerprints.last.fm/xml/metadata.dtd">
<metadata fid="281948" lastmodified="1205776219">
<track confidence="0.622890">
    <artist>Leftover Crack</artist>
    <title>Operation: M.O.V.E.</title>
</track>
<track confidence="0.327927">
    <artist>Left&ouml;ver Crack</artist>
    <title>Operation: M.O.V.E.</title>
</track>
<track confidence="0.007860">
    <artist>Leftover Crack</artist>
    <title>Operation MOVE</title>
</track>
<track confidence="0.006180">
    <artist>Leftover Crack</artist>
    <title>Operation M.O.V.E.</title>
</track>
<track confidence="0.004883">
    <artist>Leftover Crack</artist>
    <title>Operation; M.O.V.E.</title>
</track>
<track confidence="0.004826">
    <artist>Left&ouml;ver Crack</artist>
    <title>Operation M.O.V.E.</title>
</track>
<track confidence="0.004717">
    <artist>Left&ouml;ver Crack</artist>
    <title>13 - operation m.o.v.e</title>
</track>

And it goes on and on... There are 21 results for this single track, which all actually correspond to this track.

So, what is bothering me here? After all, the first result holds textual metadata that I could consider as somewhat correct (even if that's not the way I would spell this band's name, but they plan to put a voting system in place to solve this sort of issue).

The real problem is that there are 21 URIs in last.fm for the same thing. The emphasis of the last.fm metadata system is therefore probably on the textual metadata: two different ways of spelling the name of a band = two bands. But I do think this is wrong: for example, how would you handle the fact that the Russian band Ария is spelled Aria in English? The two spellings are correct, and they correspond to one unique band.

In my opinion, the important thing is the identifier. As long as you have one identifier for one single thing (an artist, an album, a track), you're saved. The relationship between a band, an artist, a track, etc. and its label is clearly a one-to-many one: the quest for a canonical spelling will never end... And what worries me even more is that it tends to kill the spellings in all languages but English (especially if a voting system is in place?).

Once you have a single identifier for a single thing within your system, you can start attaching labels to it, perhaps with a language tag. Then, it is up to the presentation layer to show you the label matching your preferences. And if you lean towards such a model, Musicbrainz (centralised and moderated) or RDF and the Music Ontology (decentralised and not moderated) are probably the way to go.
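To make the "one identifier, many labels" idea a bit more concrete, here is a minimal sketch of what the presentation layer could do. The identifier is a made-up placeholder (not a real Musicbrainz URI), and the fallback rule is just one possible choice.

```python
# One identifier, many labels: labels are keyed by language tag.
# The identifier below is a made-up placeholder, not a real Musicbrainz URI.
LABELS = {
    "http://example.org/artist/aria": {"ru": "Ария", "en": "Aria"},
}

def pick_label(labels_by_lang, preferred_langs):
    """Return the first label matching the user's language preferences."""
    for lang in preferred_langs:
        if lang in labels_by_lang:
            return labels_by_lang[lang]
    # No preferred language available: fall back to the label whose
    # language tag sorts first, so the choice is at least deterministic.
    return labels_by_lang[sorted(labels_by_lang)[0]]

artist = "http://example.org/artist/aria"
print(pick_label(LABELS[artist], ["ru", "en"]))  # Ария
print(pick_label(LABELS[artist], ["fr", "en"]))  # Aria
```

Both users are looking at the same band: only the label shown to them differs.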

I guess this emphasis on textual metadata is mainly due to the legacy of ID3 and other embedded metadata formats, which allowed just one single title for the track, the album and the artist to be associated with an audio file?

I think that the real problem for last.fm will now be to match all the different identifiers they have for a single thing in their system, which is known as the record linkage problem in the database/Semantic Web community. But I also think this is not too far-fetched, as they already began to link their database to the Musicbrainz one?
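As a very naive illustration of that matching step, the sketch below groups the spellings from the lookup result above under a crude normalised key. Real record linkage is much harder than this (I have no idea how last.fm would actually do it), and the normalisation rules here are assumptions that happen to work for this one example.

```python
import re
import unicodedata
from collections import defaultdict

def normalise(text):
    """Reduce a label to a crude comparison key."""
    # Decompose accented characters and drop the combining marks (ö -> o).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.lower()
    # Drop a leading track number such as "13 - ".
    text = re.sub(r"^\d+\s*-\s*", "", text)
    # Remove punctuation, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())

def group_records(records):
    """Group (artist, title) pairs sharing the same normalised key."""
    groups = defaultdict(list)
    for artist, title in records:
        groups[(normalise(artist), normalise(title))].append((artist, title))
    return dict(groups)

# The seven spellings listed in the lookup result above.
RESULTS = [
    ("Leftover Crack", "Operation: M.O.V.E."),
    ("Leftöver Crack", "Operation: M.O.V.E."),
    ("Leftover Crack", "Operation MOVE"),
    ("Leftover Crack", "Operation M.O.V.E."),
    ("Leftover Crack", "Operation; M.O.V.E."),
    ("Leftöver Crack", "Operation M.O.V.E."),
    ("Leftöver Crack", "13 - operation m.o.v.e"),
]

groups = group_records(RESULTS)
print(len(groups))  # 1
```

All seven spellings end up under a single key, which is where a single identifier could then be attached.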

Tuesday 18 March 2008

Describing a recording session in RDF

Danny Ayers just posted a new Talis Platform application idea, dealing with music/audio equipment. As I was thinking that it would actually be nice to have a set of web identifiers and corresponding RDF representations for audio equipment, I remembered a small Music Ontology example I wrote about a year ago. In fact, the Music Ontology (along with the Event ontology) is expressive enough to handle the description of recording sessions. Here is a small excerpt of such a description:

@prefix mo: <http://purl.org/ontology/mo/>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix event: <http://purl.org/NET/c4dm/event.owl#>.
@prefix rd: <http://example.org/audioequipment/>.
@prefix : <#>.

:rec a mo:Recording;
   rdfs:label "live recording of my band in studio";
   event:sub_event :guitar1, :guitar2, :drums1, :kick1, :sing.

:sing a mo:Recording;
   rdfs:label "Voice recorded with a SM57";
   event:factor rd:sm57;
   event:place [rdfs:label "Middle of the room-I could be more precise here"].

:kick1 a mo:Recording;
   rdfs:label "Kick drum using a Shure PG52";
   event:factor rd:pg52;
   event:place [rdfs:label "Kick drum microphone location"].

Well, it would indeed be nice if the rd namespace could point to something real! Who would fancy RDFising Harmony Central? :-)

Wednesday 6 February 2008

Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird

Today, I made a small screencast about mixing the following ingredients:

All of that was extremely easy to set up (it actually took me more time to figure out how to make a screencast on a Linux box :-) which I finally did using vnc2swf). Basically, just some tweaked configuration files for ClioPatria, and a small CSS hack, and that was it...

The result is there:

Songbird, Linked Data, Mazzle and Jamendo

(Note that only a few Jamendo artists are displayed now... Otherwise, Google Maps would just crash my laptop :-) ).

Friday 11 January 2008

Your AudioScrobbler data as Linked Data

I just put online a small service, which converts your AudioScrobbler data to RDF, designed using the Music Ontology: it exposes your last 10 scrobbled tracks.

The funny thing is that it links the tracks, records and artists to the corresponding dereferencable URIs in the Musicbrainz dataset. So the tracks you were last listening to are part of the Data Web!

Just try it by getting this URI:

http://dbtune.org/last-fm/<last.fm username>

For example, mine is:


Of course, by being linked to dereferencable URIs in Musicbrainz, you are able to access the birth dates of the artists you last listened to, or use the links published by DBPedia to plot the artists you last played on a map, just by crawling the Data Web a little.

Then, you can link that to your FOAF URI. Mine now holds the following statement:

<http://moustaki.org/foaf.rdf#moustaki> owl:sameAs <http://dbtune.org/last-fm/moustaki>.

Now, my URI looks quite nice, in the Tabulator generic data browser!

Me and my scrobbles

Tuesday 30 October 2007

Specifications for the Event and the Timeline ontologies

It has been a long time since my last post, but I was busy traveling (ISMIR 2007, ACM Multimedia, and AES), and also took some holidays afterwards (first ones since last Xmas... it was great :-) ).

Anyway, in my slow process of getting back to work, I finally wrote specification documents for the Timeline ontology and the Event ontology, which Samer Abdallah and I worked on three years ago. These are really early documentation drafts though, and might be a bit unclear, so don't hesitate to send me comments about them!

The Timeline ontology, extending some OWL-Time concepts, allows you to address time points and intervals on multiple timelines, backing signals, video, performances, works, scores, etc. For example, using this ontology, you can express "from 1 minute and 21 seconds to 1 minute and 55 seconds on this signal".
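Here is a small sketch of how that particular interval could be written down, using only the tl:start and tl:duration properties that appear in the Echonest query elsewhere on this blog. The node name, the typing of the literals as xsd:duration, and the omission of the prefix declarations and of the link to the timeline itself are all simplifying assumptions.

```python
def to_xsd_duration(seconds):
    """Format a non-negative number of whole seconds as an xsd:duration."""
    if seconds < 0:
        raise ValueError("negative durations are not handled in this sketch")
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    parts = "PT"
    if hours:
        parts += f"{hours}H"
    if minutes:
        parts += f"{minutes}M"
    # Always emit at least one component, so zero becomes "PT0S".
    if secs or parts == "PT":
        parts += f"{secs}S"
    return parts

def interval_turtle(node, start_s, end_s):
    """Describe the interval [start_s, end_s] with tl:start and tl:duration."""
    start = to_xsd_duration(start_s)
    duration = to_xsd_duration(end_s - start_s)
    return (
        f'{node} tl:start "{start}"^^xsd:duration;\n'
        f'    tl:duration "{duration}"^^xsd:duration.'
    )

# From 1 minute 21 seconds (81 s) to 1 minute 55 seconds (115 s).
print(interval_turtle(":chorus", 81, 115))
```

The interval is stored as a start point plus a duration (34 seconds here) rather than as two end points, which matches the way the query mentioned above reads it back.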

Timeline ontology

The Event ontology allows you to deal with, well, events. In it, events are seen as arbitrary classifications of space/time regions. This definition makes it extremely flexible: it covers everything from music festivals to conferences, meeting notes or even annotations of a signal. It is extremely simple, and defines one single concept (event) and five properties (agent, factor, product, place and time).

Event ontology

The following representations are available for these ontology resources:


  • RDF/XML

$ curl -L -H "Accept: application/rdf+xml" http://purl.org/NET/c4dm/event.owl

  • RDF/Turtle

$ curl -L -H "Accept: text/rdf+n3" http://purl.org/NET/c4dm/event.owl

  • Default (XHTML)

$ curl -L http://purl.org/NET/c4dm/event.owl
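The same content negotiation can be done from a script. The sketch below mirrors the curl commands above using only the Python standard library; the FORMATS table and the function names are mine, and whether the server still answers this way is of course not guaranteed.

```python
from urllib.request import Request, urlopen

EVENT_ONTOLOGY = "http://purl.org/NET/c4dm/event.owl"

# Media types taken from the curl commands above.
FORMATS = {
    "rdfxml": "application/rdf+xml",
    "n3": "text/rdf+n3",
}

def build_request(url, fmt=None):
    """Return a request for `url`, optionally asking for an RDF serialisation."""
    headers = {"Accept": FORMATS[fmt]} if fmt else {}
    return Request(url, headers=headers)

def fetch(url, fmt=None):
    """Fetch the representation; urlopen follows redirects, like `curl -L`."""
    with urlopen(build_request(url, fmt)) as response:
        return response.headers.get("Content-Type"), response.read()

# Building the request is offline; fetch() is what actually hits the network.
request = build_request(EVENT_ONTOLOGY, "n3")
print(request.get_header("Accept"))  # text/rdf+n3
```

Calling fetch(EVENT_ONTOLOGY, "rdfxml") would then return the content type and body the server chose to send back.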

And also, make sure you check out the Chord ontology designed by Chris Sutton, and the associated URI service (e.g. A major with an added 7th). All the code (RDF, specification, specification generation scripts, URI parsing, 303 stuff, etc.) is available in the motools sourceforge project.

Tuesday 14 August 2007

New revision of the Music Ontology

The latest revision of the Music Ontology (1.12) is finally out - it did take some time to get through all the suggested changes on the TODO list! So, what's new in this release?

  • The Instrument concept is now linked to Ivan's taxonomy of Musical Instrument expressed in SKOS, and extracted from the Musicbrainz instrument taxonomy ;
  • Some peer-2-peer related concepts (Bittorrent and ED2K, two new subconcepts of MusicalItem) ;
  • Large amount of URI refactoring for predicates: camelCase becomes camel_case, and nouns are used instead of verbs, to be more N3 compliant. The older predicates are still in the ontology, but marked as deprecated, and declared as being owl:sameAs the newer predicates - so datasets still using the old ones won't hold dead links ;
  • A large number of term descriptions have been rewritten to clearly state in which cases they should be used, where this could be a bit ambiguous. For example, available_as and free_download: one links to an item (something like ed2k://...), and the other one links to a web page giving access to the song (perhaps through a Flash player) ;
  • Terms are annotated with a mo:level property, specifying which level (1, 2 or 3) they belong to. Terms in level 1 describe simple editorial information (ID3v1-like), terms in level 2 describe workflow information (this work was composed by Schubert, performed 10 times, but only 2 of these performances have been recorded), and terms in level 3 describe the inner structure of the different events composing this workflow (at this time, this performer was playing in this particular key) ;
  • But surely, this release's main improvement lies in the infrastructure for maintaining the code and the documentation. MO now has a dedicated SourceForge project, with a subversion repository holding the up-to-date RDF code, the whole tool chain for generating the specification, and a couple of related projects (which I will describe in more detail in later posts). Drop me a line if you want to be registered as a developer on the project!
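For datasets still using the old predicate names, the camelCase to camel_case part of the refactoring listed above is mechanical enough to script. The sketch below only covers that case conversion (not the verb-to-noun renamings), and freeDownload is assumed here to be the older spelling of free_download.

```python
import re

MO = "http://purl.org/ontology/mo/"

def camel_to_snake(name):
    """Turn a camelCase local name into camel_case."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()

def deprecation_triple(old_name):
    """Link a deprecated predicate to its renamed form, as an N3 statement."""
    new_name = camel_to_snake(old_name)
    return f"<{MO}{old_name}> owl:sameAs <{MO}{new_name}>."

print(camel_to_snake("availableAs"))       # available_as
print(deprecation_triple("freeDownload"))
```

Names that are already lowercase, or already in the new style, pass through unchanged.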

Still, there are a couple of things I'd like to do before the end of the week, like replacing the examples (some of which are pretty outdated, or just wrong) with real-world MO data (as there is beginning to be quite a lot available out there :-) ).

Anyway, thanks to everyone who contributed to this release (especially Fred and Chris, and all the people on the mailing list who suggested changes)!!

Wednesday 18 July 2007

Music Ontology: Some thoughts about ontology design

Today, I came across this blog post by Seth Ladd, which has actually nothing to do with ontology design, but with a RESTful way of designing an account activation system. Anyway, the last paragraph of it says:

In summary, I love working with REST because it forces me to think in nouns, which are classes. I find it easier to model the world with nouns than with verbs. Plus, the problem with verbs is you can’t say anything about them, so you lose the ability to add metadata to the events in the system.

This particular sentence reminded me of a lot of discussion on the MO mailing list, which happened when we started looking towards the description of the music production workflow (an idea coming from the older Music Production Ontology) and the Event ontology as a foundation for it. Indeed, the ontology started with only a small number of concepts (well, basically, only the 4 standard FRBR terms), but with many relationships trying to cover a wide range: from this expression is in fact an arrangement of this work to this person is playing this instrument. But, once you want to be more expressive, you are stuck. For example, you can't express things such as this person is playing this instrument in this particular performance anymore---you can't say anything about verbs (unless you go into RDF reification, but, well, who really wants to go into it? :-) ).


When you start talking about a workflow of interconnected events (composition/arrangement/performance/recording, for example), you limit the number of relationships you have to provide (ultimately, relations between things are all held by an event - so you just need the five relationships defined in the Event ontology) in favor of some event concepts and some concepts covering your objects (musical work, score, signal, etc.). Now, you can actually attach any information you want to any of these events, allowing a large number of possible extensions to be built on top of your ontology. For example, we can refer to a custom recording device taxonomy by just stating something like ex:myrecording event:factor ex:shureSM58.

Moreover, the Event ontology also provides a way to break down events, so you can even break complex events (such as a group performance) into simpler events (a particular performer playing a particular instrument at a particular time).
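A toy version of this, with triples as plain Python tuples, shows why the event-centric view helps: "who used what in this particular performance" becomes a simple walk over sub-events. Everything in the ex: namespace is made up for the example, and treating an instrument or a microphone as an event:factor is an assumption of this sketch.

```python
EVENT = "http://purl.org/NET/c4dm/event.owl#"

# Triples as plain (subject, predicate, object) tuples. Everything in the
# ex: namespace is a made-up identifier for the sake of the example.
TRIPLES = [
    ("ex:gig", EVENT + "sub_event", "ex:gig_guitar"),
    ("ex:gig", EVENT + "sub_event", "ex:gig_vocals"),
    ("ex:gig_guitar", EVENT + "agent", "ex:alice"),
    ("ex:gig_guitar", EVENT + "factor", "ex:guitar"),
    ("ex:gig_vocals", EVENT + "agent", "ex:bob"),
    ("ex:gig_vocals", EVENT + "factor", "ex:shureSM58"),
]

def objects(triples, subject, predicate):
    """Return every object of (subject, predicate, ?) in insertion order."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def who_used_what(triples, performance):
    """Answer 'who used which factor in this particular performance?'."""
    answer = {}
    for sub in objects(triples, performance, EVENT + "sub_event"):
        for agent in objects(triples, sub, EVENT + "agent"):
            answer[agent] = objects(triples, sub, EVENT + "factor")
    return answer

print(who_used_what(TRIPLES, "ex:gig"))
```

With a direct "plays" relationship between a person and an instrument, there would be nowhere to hang the "in this particular performance" part; here it is simply the event the sub-events belong to.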

(Actually, there are lots of papers on this sort of subject, like these ones on the ABC/Harmony project, this one on token reification in temporal reasoning or this one on modularisation of domain ontologies.)

Wednesday 23 May 2007

Find dereferencable URIs for tracks in your personal music collection

Things are moving fast, since my last post. Indeed, Frederick just put online the Musicbrainz RDF dump, with dereferencable URIs, SPARQL end-point, everything. Great job Fred!!

This data set will surely be a sort of hub for music-related data on the Semantic Web, as it gives URIs for a large number of artists, tracks, albums, but also timelines, performances, recordings, etc. Well, almost everything defined in the Music Ontology.

I am happy to announce the first hack using this dataset :-) This is called GNAT (for GNAT is not a tagger). It is just a few lines of Python code which, given an audio file in your music collection, give you the corresponding dereferencable URI.

It also puts this URI into the ID3v2 Unique File Identifier (UFID) frame. I am not sure it is the right place to put such information though, as it is an identifier of the manifestation, not the item itself. Maybe I should use the user-defined link frames in the ID3v2 header...

So it is actually the first step of the application mentioned here!

It is quite easy to use:

$ python trackuri.py 7-don\'t_look_back.mp3

 - ID3 tags

Artist:  Artemis
Title:  Don't Look Back
Album:  Undone

 - Zitgist URI



$ eyeD3 7-don\'t_look_back.mp3

7-don't_look_back.mp3   [ 3.23 MB ]
Time: 3:31      MPEG1, Layer III        [ 128 kb/s @ 44100 Hz - Stereo ]
ID3 v2.4:
title: Don't Look Back          artist: Artemis
album: Undone           year: 2000
track: 7                genre: Trip-Hop (id 27)
Unique File ID: [http://zitgist.com/music/] http://zitgist.com/music/track/2b78923b-c260-44c1-b333-2caa020df172
Comment: [Description: http] [Lang: ]
Comment: [Description: ID3v1 Comment] [Lang: XXX]
From www.magnatune.com

You can also output the corresponding RDF, in RDF/XML or N3:

$ python trackuri.py 1-i\'m_alive.mp3 xml
<?xml version="1.0" encoding="UTF-8"?>
    <_3:availableAs rdf:resource=""/>
$ python trackuri.py 1-i\'m_alive.mp3 n3

@prefix _3: <http://zitgist.com/music/track/67>.
@prefix _4: <http://purl.org/ontology/mo/>.

 _3:a1fab6-aea4-47f4-891d-6d42bb856a40 _4:availableAs <>. 

... even though I still have to put the right Item URI in, instead of <>.
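One possible way of filling in that Item URI is sketched below. It is not what trackuri.py currently does: using a file: URI for the item is an assumption, the path is a made-up location, and the helper names are mine. The track identifier is the one shown in the eyeD3 output above.

```python
from urllib.parse import quote

ZITGIST_TRACK = "http://zitgist.com/music/track/"

def file_uri(absolute_path):
    """Build a file: URI from an absolute POSIX path."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path")
    # quote() keeps "/" as-is and percent-encodes spaces and the like.
    return "file://" + quote(absolute_path)

def available_as(track_id, absolute_path):
    """Return an N3 statement linking the track URI to the local item."""
    return (
        f"<{ZITGIST_TRACK}{track_id}> "
        f"<http://purl.org/ontology/mo/availableAs> "
        f"<{file_uri(absolute_path)}>."
    )

# The track identifier comes from the eyeD3 output above;
# the path is a made-up location for the audio file.
print(available_as("2b78923b-c260-44c1-b333-2caa020df172",
                   "/music/artemis/dont_look_back.mp3"))
```

A file: URI only makes sense on the machine holding the file, so this is really a placeholder for whatever item identifier ends up being used.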

Get it!

You can download the code here, and it is GPL licensed.

The dependencies are:

  • python-id3
  • python-musicbrainz2
  • RDFLib (easy_install -U rdflib)
  • mutagen (easy_install -U mutagen)

Tuesday 22 May 2007

Music Ontology - 1st project idea

Well, now that the Music Ontology is beginning to be in a usable state (as shown by Jamendo, Magnatune, but also Frederick Giasson's Musicbrainz dump and the EASAIER dump of the RSAMD HOTBED database), we have to ask ourselves the question: what next?

I'll try to post some ideas about that, and about potential applications of Music Ontology data (and in particular the Musicbrainz dump, as it will surely be a sort of hub for music-related data on the Semantic Web: it will give URIs for a number of tracks, artists, etc.).

The first thing I'd like to see would be embedding some RDF into an ID3v2 tag. Basically, I'd just like to put one single RDF statement:

<Musicbrainz track URI> mo:availableAs <>.

Well, that's not much, is it? Now, a Semantic-Web-enabled music player could follow this link, and get access to all the information available on the Semantic Web, live... A good thing could be to embed the Tabulator into Songbird, thus allowing you to browse the web of data starting from a particular item (your audio file) in your collection. Then, here are the free lunches you may get (just trying to think of fun applications - this could evolve):

  • Place your collection on a map, according to the publication location of your track, or the performance location, or the composition location

My jamendo artists on a map

  • Generate playlists from a particular location
  • Place your collection on a timeline

My jamendo artists on a timeline

  • Generate playlists according to the composition date
  • Explore relationships between artists, generate playlists according to such relationships

Relationships between Metallica and Megadeth

Well, I hope these few examples demonstrate what can be done by interlinking geonames, dbpedia, musicbrainz and jamendo!!

Monday 21 May 2007

"Music and the Web" workshop, AES 122 Vienna Convention

At the beginning of the month, I was invited to speak at the Music and the Web workshop, at the Audio Engineering Society convention, in Vienna.

The first talk was from Scott Cohen, co-founder of The Orchard (btw, I just noticed he was also talking at the WWW conference, last year). He spoke about The death of digital music sales (which is a bit ironic, from the founder of the leading digital music distributor). His main argument was that the music industry will never get enough money by selling digital music, and that it needs to understand the need for an alternative economic model, based on a global license (as was discussed by the French parliament for a really short time, during the DADVSI debates, last year).


The second talk was from Mark Sandler, the head of the Centre for Digital Music, at Queen Mary, University of London. He talked about the OMRAS2 project (OMRAS stands for Online Music Recognition and Searching), and some of the technologies that it will use. Basically, OMRAS2 is about creating a decentralised research environment for musicologists and music information retrieval researchers. Therefore, the Semantic Web definitely seems to fit quite nicely into it :-)


The third talk was from Oscar Celma, working at the Music Technology Group in Barcelona. He is the creator of the FOAFing-the-music music recommender, which actually won the 2nd prize in last year's Semantic Web Challenge. His talk was about music recommendation (the oh, if you like this, you should like that! problem), and the choice of different technologies (collaborative filtering, content-based) for different needs. He was terribly sick, but managed to get through his 40-minute talk without his voice failing!


The fourth talk was, well, myself :-) I thought it would be a non-expert audience, so I tried to give a not too technical talk. I just did a quick introduction to some Semantic Web concepts, and then dived into the Music Ontology, explaining its foundations (Timeline, Event, FRBR, FOAF), the different levels of expressiveness it allows, etc. Then, I talked about linked data. As a conclusion (not much time left), I just highlighted a few bullet points, all related to this Semantic media player which keeps taking up a large space in my brain these days.


I had some pretty good feedback, and I was really pleased to see a reference to the Music Ontology on Lucas Gonze's slides, who was speaking just after me :-) Lucas (too many things to say about him, just check his website, and realise you surely use something that he developed every day) was giving his talk from California, through Skype, and was talking about the Semantic Album - new means of packaging and distributing complex, multi-faceted content. It was a really interesting talk, even though there were some bandwidth problems from time to time.


Finally, there was some time at the end of the workshop for discussion, which went really well. There was a lot of discussion with someone from an intellectual property agency, mostly reacting to Scott Cohen's talk. Well, I won't go into details here, because I think this discussion deserves a post of its own...

Here is a picture of the audience during the panel.