Jem Rayfield wrote a very interesting post on the technologies used by the BBC World Cup website, which was also covered by Read Write Web.

All this is very exciting: the World Cup website proved that triple store technologies can be used to drive a production website with significant traffic. I expect many more parts of the BBC web infrastructure to evolve in the same way :-)

There are two issues we are still trying to solve, though:

  • We need to be able to cluster our triples along several dimensions. For example, we may want a graph for a particular programme, and a much larger graph for a particular dataset (e.g. programme data, Wildlife Finder data, World Cup data). The smaller graph keeps our updates relatively cheap (we replace the whole graph whenever we receive an update). The bigger graph gives some degree of isolation between the different sources of data. For that, we need graphs within graphs. This can be done with N3-type graph literals, but it is impossible to achieve in a standard quad-store setup, where a single triple can't be part of several graphs.
  • With regard to programme data, the main bottleneck we're facing is the number of updates per second we need to process, which most available triple stores struggle to keep up with. The 4store instance on DBTune does keep up, but at the cost of query performance, as write operations block reads. We were quite surprised to see that the available triple store benchmarks do not take write throughput into account!
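
To make the first issue a bit more concrete, here is a minimal sketch in Python of the "replace the whole graph on update" pattern. It is a toy in-memory model, not what we run in production: the `QuadStore` class, the `dataset_of` side index and all the identifiers (`programme/b0001`, `worldcup/match1`, and so on) are made up for illustration. Each triple is filed under exactly one graph name, as in a plain quad store, so the bigger per-dataset grouping has to be emulated outside the graph model, which is exactly the kind of workaround that graphs within graphs would make unnecessary.

```python
from collections import defaultdict


class QuadStore:
    """Toy in-memory quad store: each triple is filed under one graph name."""

    def __init__(self):
        self.graphs = defaultdict(set)  # graph name -> set of (s, p, o) triples
        self.dataset_of = {}            # graph name -> dataset label (side index, an assumption)

    def replace_graph(self, graph, triples, dataset):
        # Cheap update: throw away the old small graph and load the new version.
        self.graphs[graph] = set(triples)
        self.dataset_of[graph] = dataset

    def triples_in_dataset(self, dataset):
        # The "bigger graph" is emulated as the union of its member graphs.
        result = set()
        for graph, label in self.dataset_of.items():
            if label == dataset:
                result |= self.graphs[graph]
        return result


store = QuadStore()
# Hypothetical programme graph, updated twice: the second call replaces the first.
store.replace_graph("programme/b0001", [("b0001", "title", "Old title")], "programmes")
store.replace_graph("programme/b0001", [("b0001", "title", "New title")], "programmes")
# Hypothetical graph from another data source, kept isolated by its dataset label.
store.replace_graph("worldcup/match1", [("match1", "type", "Match")], "worldcup")

print(store.triples_in_dataset("programmes"))
```

The update stays cheap because it only touches one small graph, but the dataset-level isolation lives in `dataset_of` rather than in the store's own graph structure, so queries scoped to a dataset have to go through that extra index.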