I was recently looking again at the Named Graphs paper, and was wondering why the consensus for implementing Named Graphs support in triple stores was to store quads.

Named Graphs are defined as follows. We consider U as being the set of all web identifiers, L as being the set of all literals and B the set of all blank nodes. U , L and B are pairwise disjoint. Let N = ULB, then the set T = N × U × N is the set of all RDF triples. The set of RDF graphs G is the power set of T, and a named graph is a pair ng = (u, g), where uU and gG.

As mentioned in one my previous post, one thing that I am really keen on is the idea of having triples that belong to multiple graphs. This situation is already happening in the wild a fair bit. If you look at a /programmes aggregation feed (all available Radio 4 programmes starting with g), it mentions multiple programmes. All of the statements about each of these programmes also belong to the corresponding programme graph (e.g. Gardener's Question).

Nothing in the above definition forbids that from happening - each graph and sub-graph can be named. ( (a,b,c), (d,f,g) ) can be named x and ( (a,b,c) ) can be named y.

Now, most triple stores implement Named Graphs by, in fact, storing quads. They store tuples of the form (subject, predicate, object, graph). Which means that when you try to store the two graphs above, they will in fact store (a,b,c,x), (d,f,g,x) and (a,b,c,y) --- the triples will be replicated in each graph they are mentioned in. For the above /programmes example, it is terribly inefficient. If you crawl /programmes, you will end up hitting lots of such aggregation URLs, leading in a tremendous amount of data duplication - lots of triples will be repeated across graphs. Perhaps it would be better to store something like (a,b,c,(x,y)) and (d,f,g,(x))?

I would be curious to know if anyone else is hitting this issue. I can think of multiple use-cases for triples belonging to multiple graphs: access control, versioning, attribution at different granularities...