All this is very exciting: the World Cup website proved that triple store technologies can be used to drive a production website with significant traffic. I am expecting lots more parts of the BBC web infrastructure to evolve in the same way :-)
There are two issues we are still trying to solve, though:
- We need to be able to cluster our triples in several dimensions. For example, we may want a graph for a particular programme, and a much larger graph for a particular dataset (e.g. programme data, wildlife finder data, world cup data). The smaller graph keeps our updates relatively cheap (we replace the whole graph whenever we receive an update), while the bigger graph gives some degree of isolation between the different sources of data. For that, we need graphs within graphs. This can be done with N3-style graph literals, but is impossible to achieve in a standard quad-store setup, where a single triple can't be part of several graphs.
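To make the constraint concrete, here is a minimal sketch of a standard quad store, where each triple lives under exactly one named graph. The usual workaround is to compute the "bigger graph" as a union of the per-programme graphs at query time, rather than nesting graphs in the store. The graph names and triples below are purely illustrative, not the actual BBC URIs.

```python
# Toy quad store: graph name -> set of (subject, predicate, object) triples.
# Because each triple is keyed under a single graph, "graphs within graphs"
# cannot be represented directly; a dataset-level view has to be a union.

class QuadStore:
    def __init__(self):
        self.graphs = {}

    def replace_graph(self, name, triples):
        # Cheap update: drop the whole named graph and re-insert it.
        self.graphs[name] = set(triples)

    def query_graph(self, name):
        return self.graphs.get(name, set())

    def query_dataset(self, graph_names):
        # "Bigger graph" = union of the smaller graphs, computed at query time.
        result = set()
        for name in graph_names:
            result |= self.graphs.get(name, set())
        return result

store = QuadStore()
store.replace_graph("prog:b00x", {("prog:b00x", "dc:title", "Match of the Day")})
store.replace_graph("prog:b00y", {("prog:b00y", "dc:title", "Newsnight")})
# Updating one programme's graph never touches the rest of the dataset:
store.replace_graph("prog:b00x", {("prog:b00x", "dc:title", "MOTD2")})
dataset = store.query_dataset(["prog:b00x", "prog:b00y"])
```

The union view gives you the per-dataset isolation, but at query cost rather than storage cost, which is exactly the trade-off that real nested graphs would avoid.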
- With regards to programme data, the main bottleneck we're facing is the number of updates per second we need to process, which most available triple stores struggle to keep up with. The 4store instance on DBTune does keep up, but at a cost to query performance, as write operations block reads. We were quite surprised to see that the available triple store benchmarks do not take write throughput into account!
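Measuring write throughput is not hard to bolt onto a benchmark: time a burst of whole-graph replacements and divide. The sketch below runs against a toy in-memory store rather than a real SPARQL endpoint, so the numbers mean nothing in themselves; the point is only the shape of the harness.

```python
import time

# Stand-in for a triple store; a real harness would POST SPARQL Updates
# to the store's endpoint instead.
class ToyStore:
    def __init__(self):
        self.graphs = {}

    def replace_graph(self, name, triples):
        self.graphs[name] = set(triples)

store = ToyStore()
n_updates = 10_000
start = time.perf_counter()
for i in range(n_updates):
    # Each update replaces a whole named graph, mirroring the update
    # pattern described above (100 graphs cycled over, chosen arbitrarily).
    store.replace_graph(f"graph:{i % 100}", {(f"s{i}", "p", f"o{i}")})
elapsed = time.perf_counter() - start
writes_per_sec = n_updates / elapsed
```

A fuller benchmark would also run read queries concurrently, since the interesting failure mode here is precisely writes blocking reads.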