DBTune blog

Tag - dbtune

Tuesday 10 February 2009

Thesis uploaded!

I just uploaded my PhD thesis entitled A Distributed Music Information System, which I defended on the 22nd of January. My examiners were David de Roure from University of Southampton and Nicolas Gold from King's College. My PhD supervisor was Mark Sandler.

Here is the abstract:

Information management is an important part of music technologies today, covering the management of public and personal collections, the construction of large editorial databases and the storage of music analysis results. The information management solutions that have emerged for these use-cases are still isolated from each other. The information one of these solutions manages does not benefit from the information another holds.

In this thesis, we develop a distributed music information system that aims at gathering music-related information held by multiple databases or applications. To this end, we use Semantic Web technologies to create a unified information environment. Web identifiers correspond to any items in the music domain: performance, artist, musical work, etc. These web identifiers have structured representations permitting sophisticated reuse by applications, and these representations can quote other web identifiers leading to more information.

We develop a formal ontology for the music domain. This ontology allows us to publish and interlink a wide range of structured music-related data on the Web. We develop an ontology evaluation methodology and use it to evaluate our music ontology. We develop a knowledge representation framework for combining structured web data and analysis tools to derive more information. We apply these different technologies to publish a large amount of pre-existing music-related datasets on the Web. We develop an algorithm to automatically relate such datasets among each other. We create automated music-related Semantic Web agents, able to aggregate musical resources, structured web data and music processing tools to derive and publish new information. Finally, we describe three of our applications using this distributed information environment. These applications deal with personal collection management, enhanced access to large audio streams available on the Web and music recommendation.

So far, just a PDF is available, as I am still fighting with LaTeX2HTML, but there will be an HTML version some time soon :-) I am also planning to upload, at the same place, some extra annexes and extra results I didn't include in the main document. I think I will also blog here about some of the things included in this thesis.

In case you just want to jump to a particular chapter, here are some keywords for each of the thesis chapters:

  1. Introduction
  2. Knowledge Representation and Semantic Web technologies: FOL, Description Logics, RDF, Linked Data, OWL, N3.
  3. Conceptualisation of music-related information: web ontologies, music ontology, time ontology, event ontology, workflow-based modelling
  4. Evaluation of the Music Ontology framework: ontology evaluation, data-driven evaluation, task-based evaluation, latent dirichlet allocation
  5. Music processing workflows on the Web: workflows, concurrent transaction logic, N3, N3-Tr, DLP, publication of dynamically generated results, Semantic Web Services
  6. A web of music-related data: linking open data, dbtune, automated interlinking, quantification of structured web data
  7. Automated music processing agents: N3-Tr, Henry, music analysis, workflows, prolog
  8. Case studies: gnat, gnarql, personal music collection management, zempod, music recommendation
  9. Conclusion

Monday 22 December 2008

New server for DBTune

I completed the move of DBTune to a new shiny server yesterday. Things should go way faster, and the server should have a much better uptime. Overall, our experience with 1and1 hosting has been pretty bad: random server reboots, configuration files erased for no reason, and extremely long delays in getting customer support...

Many many thanks to the Centre for Digital Music for hosting the new DBTune!

Sunday 7 September 2008

DBTune wins the second prize in the Triplify challenge!

I submitted DBTune to the Triplify challenge, a couple of months ago. The text of the submission is there. The results of the challenge were given on Friday, at the i-semantics conference. Many many thanks to Michael Hausenblas for representing DBTune there!

And, DBTune won the second prize! Here is a picture of the prize ceremony:

Congratulations to the winners, LinkedMDB, for their amazing work and well-deserved prize, and many thanks to Sören Auer for organizing the challenge!

Thursday 31 July 2008

Semantic search on aggregated music data

I just moved the semantic search demo to a faster server, so it should hopefully be a lot more reliable. This demo uses the amazing ClioPatria on top of an aggregation of music-related data. This aggregation was simply constructed by taking a bunch of Creative Commons MP3s, running GNAT on them, and crawling linked data starting from the web identifiers output by GNAT.

I also set up the search tab to work correctly. For example, when you search for "punk", you get the following results.

Punk search 1

Punk search 2

Note that the results are explained: "punk" might be related to the title, the biography, a tag, the lyrics, content-based similarity to something tagged as punk (although it looks like Henry crashed in the middle of the aggregation, so not a lot of such data is available yet), etc. Moreover, you get back different types of resources: artists, records, tracks, lyrics, performances etc.

For example, if you click on one of the records, you get the following.

Punk search 3

This record is available under a Creative Commons license, so you can get a direct access to the corresponding XSPF playlist, Bittorrent items etc., by following the Music Ontology "available as" property. For example, you can click on an XSPF playlist, and listen to the selected record.

Punk search 4

Of course, you can still do the previous things - plotting music artists (or search results, just take a look at the "view" drop-down box) on a map, on a time-line, browse using facets, etc.

Btw, if you like DBTune, please vote for it in the Triplify Challenge! :-)

Wednesday 30 July 2008

Last.fm events and DBpedia mobile

For a recent event at the Dana Centre, I was asked to make a small demo of some nice things you can do with Semantic Web technologies. As it is no fun to re-use demos, I decided to go for something new. So after two hours of hacking and skyping with Christian Becker, we added support for recommended events to the last.fm linked data exporter. I also implemented a bit of geo-coding on the server side (although, with the new last.fm API, I guess this part is becoming useless).

Then, thanks to RDF goodness, it was really straightforward to make that work with DBpedia mobile. DBpedia mobile is a service that gets your geo-location from your mobile device and shows you a map of nearby sights, using data from DBpedia. DBpedia mobile also uses the RDF cache of a really nice linked data browser called Marbles.

So, after browsing your DBTune last-fm URI in Marbles, you can go to DBpedia mobile and see recommended events alongside nearby sights. To do so, select the Performances (by moustaki) filter. Here is what I get for my profile, when at the university:

DBpedia mobile and last.fm events

Sunday 27 July 2008

Musicbrainz RDF updated

Well, I guess everything is in the title :-) The dump used is now that of the 26th of July. I also moved everything to a much faster server. Also, the D2R mapping is still not 100% complete - I am really slowly getting through it, as PhD writing takes almost all my time these days. I recently added owl:sameAs links to the DBTune Myspace service, so you can easily get from Musicbrainz artists to the corresponding MP3s available on MySpace and their social networks. See for example Madonna, linked through owl:sameAs to the corresponding DBpedia artist and to the corresponding Myspace artist.

Friday 25 July 2008

List of accepted ISMIR 2008 papers

Just spotted through Paul's blog: the list of accepted ISMIR 2008 papers is now available online. All the papers sound really interesting, so I guess it will be a really good ISMIR!! I am especially glad to see that the Variations3 people will present their work on FRBR-based musical metadata. They seem to have done a lot of interesting things over the last year! I also hope we can make things connect in some ways with MO, thanks to this common FRBR backbone.

Anyway, I can't wait for the actual proceedings which, apparently, will be available online prior to the conference. Quite a few of the selected papers are already available on the Web as pre-prints, though (this really interesting one from Patrick Rabbat and Francois Pachet, for example).

I should have uploaded it earlier, but here is the paper we wrote with Mark Sandler. It describes all the structured data publishing and interlinking work we've been doing over the last year, based on the Music Ontology framework we described last year. We tried to illustrate that by (hopefully) fun examples (Mozart and Metallica are closer than you think... :-) ). It also describes a SPARQL-based web service for feature extraction, driven by workflows written in N3.

Thursday 17 July 2008

Literal search using the Jamendo SPARQL end-point

I just wrote a small SWI-Prolog module for literal search using the ClioPatria SPARQL end-point. It uses the rdf_litindex module, and performs a metaphone search on existing literals in the database. All of that is triggered through a built-in RDF predicate.

Here is an example query you can perform on the Jamendo SPARQL end-point (make sure you select lit as the entailment - it will be the default one soon):

SELECT ?o
WHERE
{"punk jazz" <http://purl.org/ontology/swi#soundslike> ?o}

This query binds ?o to all resources within the end-point that are associated with matching literals. For example, you would get back:

The module is available there.

Wednesday 9 July 2008

Nominated!

We learned yesterday that DBTune was nominated for the Triplify Challenge! The other seven projects are really interesting as well, so I guess the competition will be really tough! The final results will be given at the I-Semantics conference in early September.

Also, Tim Berners-Lee gave a great talk about linked data and the semantic web on Radio 4 earlier today. The first use-case he mentions sounds quite familiar: finding bands based on geo-location data. He already mentioned that in one of his blog posts, linking to this screencast.

An interesting discussion took place on the Linking Open Data mailing list just afterwards, to gather use-cases for explaining to a general public what linked data can be useful for.

Wednesday 2 April 2008

13.1 billion triples

After a rough estimation, it looks like the services hosted on DBTune provide access to 13.1 billion triples, therefore making a significant addition to the data web!

Here is the break-down of such an estimation:

  • MySpace: 250 million people * 50 triples (on average) = 12.5 billion triples ;
  • AudioScrobbler: 1.5 million users (only Europe?) * 400 triples = 600 million ;
  • Jamendo: 1.1 million triples + 5000 links to other data sources ;
  • Magnatune: 322 000 triples + 233 links ;
  • BBC John Peel sessions: 277 000 triples + 2100 links ;
  • Chord URI service: I don't count it, as it is potentially infinite (the RDF descriptions are generated from the chord symbol in the URI).
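As a quick sanity check of the arithmetic, here is a minimal Python sketch that adds up the figures quoted in the list above. The dictionary and function names are made up for illustration; the interlinks and the Chord URI service are left out of the sum, as in the estimate itself.

```python
# Rough triple-count estimate, using only the figures quoted in the list above.
estimates = {
    "MySpace": 250_000_000 * 50,        # people * average triples per person
    "AudioScrobbler": 1_500_000 * 400,  # users * triples per user
    "Jamendo": 1_100_000,
    "Magnatune": 322_000,
    "BBC John Peel sessions": 277_000,
}


def total_triples(per_service):
    """Sum the per-service estimates (interlinks are not counted)."""
    return sum(per_service.values())


total = total_triples(estimates)
print(f"{total:,} triples, i.e. about {total / 1e9:.1f} billion")
```

MySpace and AudioScrobbler dominate the total; the three dumped datasets together add less than two million triples.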

However, SPARQL end-points are not available for AudioScrobbler and MySpace, as the RDF is generated on-the-fly, from the XML feeds for the former, and from scraping for the latter.

Now, I wish linked data could be provided directly by the data sources themselves :-) (Again, all the code used to run the DBTune services is available in the motools project on Sourceforge).

Wednesday 12 March 2008

MySpace RDF service

Thanks to the amazing work of Kurt and Ben on the MyPySpace project, members of the MySpace social network can have a Semantic Web URI!

This small service provides such URIs and corresponding FOAF (top friends, depiction, name) and Music Ontology (URIs of available tracks in the streaming audio cache) RDF representations.

That means I can add such statements to my FOAF profile:

<http://moustaki.org/foaf.rdf#moustaki> foaf:knows <http://dbtune.org/myspace/lesversaillaisesamoustache>.

And then, using the Tabulator Firefox extension:

MySpace friends

PS: The service is still a bit slow and can be highly unstable, though - it is slightly faster with URIs using MySpace UIDs.

PS2: We don't host any data - everything is scraped on the fly using the MyPySpace tools.

Wednesday 6 February 2008

Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird

Today, I made a small screencast about mixing the following ingredients:

All of that was extremely easy to set up (it actually took me more time to figure out how to make a screencast on a Linux box :-) which I finally did using vnc2swf). Basically, just some tweaked configuration files for ClioPatria, and a small CSS hack, and that was it...

The result is there:

Songbird, Linked Data, Mazzle and Jamendo

(Note that only a few Jamendo artists are displayed now... Otherwise, Google Maps would just crash my laptop :-) ).

Tuesday 22 January 2008

Pushing your Last.FM friends in the FOAF-O-Sphere

I just committed some changes to the last.fm linked data service. It now spits out, as well as your last scrobbled tracks linked to corresponding Musicbrainz URIs, your list of last.fm friends (using their URI on this service)

This is quite nice to explore the last scrobbles of the friends of your friends (hello Kurt and Ben!) :)

The friends of my friends on last.fm

Friday 11 January 2008

Your AudioScrobbler data as Linked Data

I just put online a small service, which converts your AudioScrobbler data to RDF, designed using the Music Ontology: it exposes your last 10 scrobbled tracks.

The funny thing is that it links the track, records, and artists to corresponding dereferencable URIs in the Musicbrainz dataset. So the tracks you were last listening to are part of the Data Web!

Just try it by getting this URI:

http://dbtune.org/last-fm/<last.fm username>

For example, mine is:

http://dbtune.org/last-fm/moustaki

Of course, since the data is linked to dereferencable URIs in Musicbrainz, you are able to access the birth dates of the artists you last listened to, or use the links published by DBPedia to plot the artists you last played on a map, by just crawling the Data Web a little.

Then, you can link that to your FOAF URI. Mine now holds the following statement:

<http://moustaki.org/foaf.rdf#moustaki> owl:sameAs <http://dbtune.org/last-fm/moustaki>.

Now, my URI looks quite nice, in the Tabulator generic data browser!

Me and my scrobbles

Thursday 29 November 2007

Jamendo RDF updated

I finally updated the Jamendo dataset. Indeed, the previous version was based on a dump from about 5 months ago.

During these few months, their dataset increased a lot (Jamendo rocks... It is clearly my favorite music source)! The corresponding RDF is now just a bit more than one million triples (the whole RDF dump is available).

While updating the dataset, I also fixed a number of issues:

  • Added mo:available_as links towards playlists in XSPF and M3U formats - this is a really cool feature, and fixed the Bittorrent and ED2K links;
  • Fixed some bugs in the Geonames linking - now, almost every artist is linked to the corresponding Geonames URI ;
  • Fixed some Musicbrainz links, but there is still some work to do on that side (I would need to relaunch my record linkage algorithm, but it is a bit slow, and it is a bit late :) ) ;

Thursday 8 November 2007

Finally a Web page for DBTune...

I finally took some time to write a short web page for DBTune, enumerating all the datasets published so far: Jamendo, Magnatune and the BBC John Peel sessions.

It also details the different interlinking, making these datasets part of the Linking Open Data cloud.

I also submitted the different datasets to Sindice (a really cool Semantic Web search engine!), so hopefully they will be indexed soon!

Wednesday 11 July 2007

John Peel sessions available as RDF

Yesterday, I put online the John Peel sessions as linked data (dereferencable identifiers, content negotiation, RDF, etc.).

It uses the data the BBC has released for the Hackday, some weeks ago. I wrote a SWI-Prolog wrapper for this data, which is then made accessible through SPARQL using P2R (which I have updated to handle dynamic construction of literals, by the way) and this mapping. The URIs are then made dereferencable through UriSpace.

Some documentation is available there.

Here are a bunch of URIs that you can try:

And then, for example

$ curl -L -H "Accept: application/rdf+xml" http://dbtune.org/bbc/peel/artist/1036
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rdf:RDF [
    <!ENTITY foaf 'http://xmlns.com/foaf/0.1/'>
    <!ENTITY mo 'http://purl.org/ontology/mo/'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
    <!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>
]>

<rdf:RDF
    xmlns:foaf="&foaf;"
    xmlns:mo="&mo;"
    xmlns:rdf="&rdf;"
    xmlns:rdfs="&rdfs;"
    xmlns:xsd="&xsd;"
>
<mo:MusicArtist rdf:about="http://dbtune.org/bbc/peel/artist/1036">
  <rdfs:label rdf:datatype="&xsd;string">King Crimson</rdfs:label>
  <foaf:img rdf:resource="http://bbc.co.uk/music/king_crimson.jpg"/>
  <foaf:name rdf:datatype="&xsd;string">King Crimson</foaf:name>
</mo:MusicArtist>

<rdf:Description rdf:about="http://dbtune.org/bbc/peel/session/1788">
  <mo:performer rdf:resource="http://dbtune.org/bbc/peel/artist/1036"/>
</rdf:Description>

<rdf:Description rdf:about="http://dbtune.org/bbc/peel/session/1789">
  <mo:performer rdf:resource="http://dbtune.org/bbc/peel/artist/1036"/>
</rdf:Description>

</rdf:RDF>
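To give an idea of how a client could use such a response, here is a minimal Python sketch that lists the sessions in which a given artist performs. It only uses the standard library and a trimmed copy of the output above, with the entity references expanded by hand. The function name is made up for illustration, and the code only handles this particular RDF/XML layout; a real client would rather use a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MO = "http://purl.org/ontology/mo/"

# Trimmed copy of the response above, with entities expanded by hand.
SAMPLE = """<rdf:RDF
    xmlns:mo="http://purl.org/ontology/mo/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://dbtune.org/bbc/peel/session/1788">
  <mo:performer rdf:resource="http://dbtune.org/bbc/peel/artist/1036"/>
</rdf:Description>
<rdf:Description rdf:about="http://dbtune.org/bbc/peel/session/1789">
  <mo:performer rdf:resource="http://dbtune.org/bbc/peel/artist/1036"/>
</rdf:Description>
</rdf:RDF>"""


def sessions_for(rdf_xml, artist_uri):
    """Return the URIs of the descriptions whose mo:performer is artist_uri."""
    root = ET.fromstring(rdf_xml)
    sessions = []
    for description in root.findall(f"{{{RDF}}}Description"):
        for performer in description.findall(f"{{{MO}}}performer"):
            if performer.get(f"{{{RDF}}}resource") == artist_uri:
                sessions.append(description.get(f"{{{RDF}}}about"))
    return sessions


for uri in sessions_for(SAMPLE, "http://dbtune.org/bbc/peel/artist/1036"):
    print(uri)
```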

So far, this dataset is not linked to anything external! But I plan to link it to Musicbrainz, Geonames, and Last.fm snippets soon.

Saturday 26 May 2007

Linking open data: publishing and linking the Jamendo dataset

Some weeks ago, I released a linked data representation of the Jamendo dataset, a large collection of Creative Commons licensed songs, according to the Music Ontology.

I had some experience with publishing such datasets, through the dump of the Magnatune collection, which I have done through D2R Server, and this D2RQ mapping. The Magnatune dump, through the publishingLocation property, is linked to the dbpedia dataset. Well, it was in fact really easy: the geographical location in the Magnatune database is just a string: France, USA, etc. And the dbpedia URIs I am linking to are just a plain concatenation of such strings and http://dbpedia.org/resource/. All of that (pointing towards custom URI patterns) can be done quite easily through D2R.
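The concatenation itself is trivial. Here it is as a small Python sketch, purely for illustration: the actual linking is done inside the D2RQ mapping, not in Python, and the function name is made up.

```python
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_uri(location):
    """Plain concatenation of the DBpedia prefix and the location string."""
    return DBPEDIA_PREFIX + location


# The two example location strings mentioned above.
for location in ["France", "USA"]:
    print(location, "->", dbpedia_uri(location))
```

This only works because the location strings in the Magnatune database are simple country names.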

However, it was a bit more difficult for the Jamendo dataset...

  • They release their dump in some custom XML schema, and their database is evolving quite fast, so in order to be up-to-date, you have to query their API, which makes it difficult to use a relational database publishing approach.
  • Geographical information is also represented as a string, but it could be France (75) (for Paris, France), Madrid, Spain, etc., which makes it difficult to find a canonical way of constructing dbpedia or Geonames URIs.

Therefore, I released a small program, P2R, making use of a declarative mapping to export a SWI-Prolog knowledge base on the Semantic Web.

With Prolog as a back-end, you can do a lot more than with a plain relational database. I'll try to give an example of this, by describing how I linked the Jamendo dataset to the Geonames one.

Prolog-to-RDF

P2R handles declarative mappings associating a Prolog term (just a plain predicate, or a logical formula combining some predicates) to a set of RDF triples. The resulting RDF is made available through a SPARQL end-point.

For example, the following example maps the predicate artist_dispname to {<artist uri> foaf:name "name"^^xsd:string.}:

match:
        (artist_dispname(Id,Name))
                eq
        [
                rdf(pattern(['http://dbtune.org/jamendo/resource/artist/',Id]),foaf:name,literal(type('http://www.w3.org/2001/XMLSchema#string',Name)))
        ].

Then, when the SPARQL end-point processes a triple pattern such as:

<http://dbtune.org/jamendo/resource/artist/5> foaf:name ?name.

It will bind the variable Id to 5, and try to prove artist_dispname(5,Name). This predicate will in fact be defined by the following:

artist_dispname(Id,Name) IF 
        query Jamendo API for names associated to Id AND
        Name is one of these names

(or, instead of querying Jamendo API, it can just parse the XML dump).

Therefore, it will query the Jamendo API, bind Name to the name of the artist, and send back a binding between ?name and "both"^^xsd:string. If the subject was ?artist in our query, we would have retrieved every pair of artist URI / name.

You then have a SPARQL end point able to answer such queries by asking Jamendo API.
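To make the resolution steps above a bit more concrete, here is a rough Python analogue. It is not how P2R is implemented (P2R is a SWI-Prolog program); it only mimics the idea: match the subject URI against the pattern, extract the Id, then call the backing predicate. The dictionary is a hypothetical stand-in for the Jamendo API, and the function names are made up.

```python
ARTIST_PREFIX = "http://dbtune.org/jamendo/resource/artist/"

# Hypothetical stand-in for the Jamendo API (or the XML dump).
FAKE_JAMENDO_NAMES = {"5": ["Both"]}


def artist_dispname(artist_id):
    """Return the names associated with artist_id (none if unknown)."""
    return FAKE_JAMENDO_NAMES.get(artist_id, [])


def answer_foaf_name(subject_uri):
    """Answer the triple pattern <subject_uri> foaf:name ?name."""
    if not subject_uri.startswith(ARTIST_PREFIX):
        return []  # the URI does not match the pattern in the mapping
    artist_id = subject_uri[len(ARTIST_PREFIX):]  # binds Id
    return [{"name": name} for name in artist_dispname(artist_id)]


print(answer_foaf_name(ARTIST_PREFIX + "5"))
```

The Prolog version gets the reverse direction (unbound subject, enumerate every artist) for free through backtracking, which this sketch does not attempt.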

UriSpace

Then, all you have to do is to redirect every URI in your URI space (here, http://dbtune.org/jamendo/resource/) to DESCRIBE queries on the SPARQL end-point that P2R exposes.

I published another piece of code that does the trick, UriSpace, also through a declarative mapping.
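The redirection boils down to building a DESCRIBE query for the requested URI. Here is a minimal Python sketch of that step. The end-point address is a placeholder I made up (the real one is not given here), the function name is hypothetical, and UriSpace itself is configured through its declarative mapping rather than written this way. The `query` parameter name is the one defined by the SPARQL protocol.

```python
from urllib.parse import urlencode

URI_SPACE = "http://dbtune.org/jamendo/resource/"
SPARQL_ENDPOINT = "http://example.org/sparql"  # assumption: placeholder end-point


def describe_redirect(requested_uri):
    """Return the URL to redirect to, or None if the URI is outside the space."""
    if not requested_uri.startswith(URI_SPACE):
        return None
    query = f"DESCRIBE <{requested_uri}>"
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


print(describe_redirect(URI_SPACE + "artist/5"))
```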

Linking the Jamendo data set to the Geonames one

As we saw earlier, it is not possible to directly construct a URI from a string denoting a geographical location in the Jamendo dataset. But well, we are not limited in what we can do inside our mappings! Here is the part of the P2R mapping that exposes the foaf:based_near property:

match:
        (artist_geo(Id,GeoString),geonames(GeoString,URI))
                eq
        [
                rdf(pattern(['http://dbtune.org/jamendo/resource/artist/',Id]),foaf:based_near,URI)
        ].

Where, in fact, the geonames(GeoString,URI) predicate is defined as:

geonames(GeoString,URI) IF
        clean GeoString (remove "(" and ")", basically) AND
        query Geonames web service to retrieve the first matching URI with GeoString
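The two steps of this predicate can be sketched in Python as follows. Again, this is only an illustration: the real predicate is written in Prolog and calls the Geonames web service, whereas here the search function is a hypothetical stand-in returning placeholder URIs, and all names are made up.

```python
def clean_geo_string(geo_string):
    """Remove "(" and ")" and collapse the remaining whitespace."""
    cleaned = geo_string.replace("(", " ").replace(")", " ")
    return " ".join(cleaned.split())


def geonames(geo_string, search):
    """Return the first URI that search() finds for the cleaned string."""
    matches = search(clean_geo_string(geo_string))
    return matches[0] if matches else None


def fake_search(text):
    """Hypothetical stand-in for the Geonames web service."""
    table = {"France 75": ["http://example.org/geo/1", "http://example.org/geo/2"]}
    return table.get(text, [])


print(clean_geo_string("France (75)"))
print(geonames("France (75)", fake_search))
```

Taking the first match is a blunt heuristic, so a few artists may well end up linked to the wrong place.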

And it is done! Now, you can see the link to the Geonames dataset, when getting a Jamendo artist URI:

$ curl -L -H "Accept: application/rdf+xml" http://dbtune.org/jamendo/resource/artist/5
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rdf:RDF [
    <!ENTITY foaf 'http://xmlns.com/foaf/0.1/'>
    <!ENTITY mo 'http://purl.org/ontology/mo/'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>
]>
<rdf:RDF
    xmlns:foaf="&foaf;"
    xmlns:mo="&mo;"
    xmlns:rdf="&rdf;"
    xmlns:xsd="&xsd;"
>
<mo:MusicArtist rdf:about="http://dbtune.org/jamendo/resource/artist/5">
  <foaf:made rdf:resource="http://dbtune.org/jamendo/resource/record/174"/>
  <foaf:made rdf:resource="http://dbtune.org/jamendo/resource/record/33"/>
  <foaf:based_near rdf:resource="http://sws.geonames.org/2991627/"/>
  <foaf:homepage rdf:resource="http://www.both-world.com"/>
  <foaf:img rdf:resource="http://img.jamendo.com/artists/b/both.jpg"/>
  <foaf:name rdf:datatype="&xsd;string">Both</foaf:name>
</mo:MusicArtist>

<rdf:Description rdf:about="http://dbtune.org/jamendo/resource/record/174">
  <foaf:maker rdf:resource="http://dbtune.org/jamendo/resource/artist/5"/>
</rdf:Description>

<rdf:Description rdf:about="http://dbtune.org/jamendo/resource/record/33">
  <foaf:maker rdf:resource="http://dbtune.org/jamendo/resource/artist/5"/>
</rdf:Description>

</rdf:RDF>

And you can plot some Jamendo artists on a map, using the Tabulator generic data browser.

Some Jamendo artists on a map, using the Tabulator