PEG ref_uris--benja: Storm blocks with metadata

Author: Benja Fallenstein
Created: 2003-08-18
Changed: 2003-08-19
Status: Current
Scope: Major
Type: Architecture

Storm blocks are very simple: each is just a sequence of octets paired with a MIME media type. There are good reasons for this design, explained elsewhere.

However, sometimes you want to include more information with a document, for example the author; a title; copyright information; the creation date; the natural language; licensing information (e.g. copyleft or micropayment) that can be automatically processed by your computer; other files that should be downloaded if you download this one, e.g. images used on a Web page; for audio data, the artist and album; for an image, where it was taken, what is shown on it, and an alternative description for the blind. The list goes on.

Also, you may want to create something more complex than a document represented by a single octet stream. For example, a Web page may be available in different languages, and an image may be available as both image/png and image/svg+xml.

This PEG proposes an extensible architecture that allows for all this.

Issues

Reference URIs

So far, we have used one type of URI in Storm: Block URIs, preliminarily of the form:

urn:x-storm:block:<media-type>,<hash>

This PEG introduces a new kind of URI, for now of the form:

urn:x-storm:ref:<hash>

where <hash> is the hash of a Storm block in Sub-RDF/XML format (see sub_rdf_xml--benja). This block contains metadata about the resource identified by the ref URI.

It follows that the metadata graph itself has a URI in Storm, namely:

urn:x-storm:block:application/rdf+xml,<hash>

This graph defines authoritative metadata for the ref URI.
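
As a rough illustration of how the two URIs relate, the sketch below derives both from the same metadata block. This PEG does not specify the hash function or its text encoding, so SHA-1 in hexadecimal is used purely as a stand-in; the function names and the placeholder bytes are hypothetical.

```python
import hashlib


def storm_hash(data: bytes) -> str:
    # ASSUMPTION: SHA-1 hex is only a stand-in; the PEG does not name the hash.
    return hashlib.sha1(data).hexdigest()


def block_uri(media_type: str, data: bytes) -> str:
    # Block URI form quoted in this PEG: urn:x-storm:block:<media-type>,<hash>
    return f"urn:x-storm:block:{media_type},{storm_hash(data)}"


def ref_uri(metadata_block: bytes) -> str:
    # Ref URI form quoted in this PEG: urn:x-storm:ref:<hash>
    return f"urn:x-storm:ref:{storm_hash(metadata_block)}"


# Placeholder bytes, not a valid Sub-RDF/XML document.
metadata = b"<rdf:RDF>...</rdf:RDF>"

print(ref_uri(metadata))
print(block_uri("application/rdf+xml", metadata))
```

Both URIs end in the same hash, which is what lets a resolver go from a ref URI to the block holding its metadata graph.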

How to make statements about the ref URI

The metadata RDF graph contains a triple of the following form:

<>  storm:refDefines  _:foo

i.e., with the empty URI as the subject; with storm:refDefines as the property (the storm namespace is defined in this PEG, below), and with a blank (aka anonymous) node as the object.

The empty URI is a relative URI which identifies "this document." Strictly speaking, RDF graphs do not contain relative URIs, only serializations of RDF graphs do; the actual triple in the graph is:

<urn:x-storm:block:application/rdf+xml,<hash>> storm:refDefines _:foo

What this triple means is: the subject (an RDF graph) is a metadata graph that defines, through the mechanism outlined in this PEG, a resource, called _:foo in the graph. That is, _:foo is the same resource as urn:x-storm:ref:<hash>. We cannot write the ref URI itself in the Sub-RDF/XML graph, because it contains the hash of that very graph: a block cannot include its own hash, since adding the hash would change the octets being hashed.

So whenever we want to make a statement about urn:x-storm:ref:<hash>, instead we make a statement about _:foo. Storm knows that the two nodes represent the same resource.

For example, we can state something like:

_:foo  dc:creator  <http://example.org/~alice>
_:foo  cc:license  <http://www.gnu.org/licenses/gpl.html>

(cc is Creative Commons, and dc is Dublin Core.)
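
The substitution described above can be sketched as follows. Triples are modelled as plain tuples of strings rather than with an RDF library. The function name, the hash value "abc123", and the rule that exactly one refDefines triple must be present are all assumptions made for illustration, not requirements stated in this PEG.

```python
REF_DEFINES = "http://purl.oclc.org/NET/storm/vocab/ref-uri/refDefines"


def statements_about_ref(triples, graph_hash):
    """Rewrite the blank node named by refDefines into the ref URI."""
    block = f"urn:x-storm:block:application/rdf+xml,{graph_hash}"
    ref = f"urn:x-storm:ref:{graph_hash}"

    # Find the node that this metadata graph says it defines.
    defined = {o for s, p, o in triples if s == block and p == REF_DEFINES}
    if len(defined) != 1:
        # ASSUMPTION: the PEG does not say how to handle zero or several.
        raise ValueError("expected exactly one refDefines triple for this graph")
    (node,) = defined

    def swap(term):
        return ref if term == node else term

    return [(swap(s), p, swap(o)) for s, p, o in triples]


graph_hash = "abc123"  # hypothetical hash of the metadata block
triples = [
    ("urn:x-storm:block:application/rdf+xml,abc123", REF_DEFINES, "_:foo"),
    ("_:foo", "cc:license", "http://www.gnu.org/licenses/gpl.html"),
]

for triple in statements_about_ref(triples, graph_hash):
    print(triple)
```

After the rewrite, every statement that was made about _:foo reads as a statement about the ref URI.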

Documents

Now we have a way to store arbitrary metadata about our document-- but how do we tell Storm what the content of our document is?

For this, we use a special RDF property, repr:instance:

_:foo  repr:instance  <urn:x-storm:block:<type2>,<hash2>>

This triple tells Storm that when the user requests _:foo (i.e., the resource denoted by the ref URI), then Storm can serve urn:x-storm:block:<type2>,<hash2>.

The repr is for "representation."

In the Web architecture, there are resources, denoted by URIs; for example, "The home page of Amazon, Inc.," or "An image of Sandro Hawke's dog, Taiko."

These resources can have multiple representations, octet streams with media types and other metadata. For example, the home page can have versions in English and French; the image can be available in JPEG or PNG.

A triple with property repr:instance says that the subject is some sort of "document"-- both the home page and the image are documents, but the city of Hameln or the Fenfire project are not-- and that the object is one representation of this document.

Or, more precisely, since the object is itself a resource rather than a representation: the subject is some sort of document, and all representations of the object are also representations of the subject.

The object may be a Storm URI or any other kind of URI; a Storm implementation is not obligated to support anything else but Storm URIs, though. (In fact, it might warn the user when a ref URI is used to refer to e.g. an HTTP page.)

Alternative representations

A document may have multiple, alternative representations:

_:foo   repr:instance   <urn:x-storm:block:<type1>,<hash1>>
_:foo   repr:instance   <urn:x-storm:block:<type2>,<hash2>>
_:foo   repr:instance   <urn:x-storm:block:<type3>,<hash3>>

A Storm implementation can then serve either of these as the document.

Additional triples can be used to describe these representations further:

<urn:x-storm:block:<type1>,<hash1>>  mime:mimeType  "image/png"
<urn:x-storm:block:<type1>,<hash1>>  img:height "100"
<urn:x-storm:block:<type1>,<hash1>>  img:width  "200"

<urn:x-storm:block:<type2>,<hash2>>  mime:mimeType  "image/png"
<urn:x-storm:block:<type2>,<hash2>>  img:height "500"
<urn:x-storm:block:<type2>,<hash2>>  img:width  "1000"

<urn:x-storm:block:<type3>,<hash3>>  mime:mimeType  "image/svg+xml"

Given this, a Storm implementation which understands the img and mime properties could pick either the low or the high resolution version of the image, or the scalable SVG version, if supported by the client.

An HTTP gateway can use this kind of information to perform content negotiation, selecting one of the alternative versions depending on the client's Accept-Language and Accept headers.
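
To make the gateway idea concrete, here is a minimal sketch of media-type negotiation over a set of alternative representations. It is a simplified reading of HTTP Accept handling (quality values and wildcards only) and not a complete implementation. The function names, the mapping from block URI to media type, and the hashes "h1" and "h3" are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs.append((parts[0].lower(), q))
    return prefs


def quality(media_type, prefs):
    """Return the q value of the most specific range matching media_type."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == media_type:
            specificity = 2
        elif media_range == main + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose(representations, accept):
    """Pick the block URI whose media type the client prefers, or None."""
    prefs = parse_accept(accept)
    scored = [(quality(mt, prefs), uri) for uri, mt in representations.items()]
    if not scored:
        return None
    q, uri = max(scored, key=lambda pair: pair[0])
    return uri if q > 0 else None


# Hypothetical alternatives, as they might be read from mime:mimeType triples.
representations = {
    "urn:x-storm:block:image/png,h1": "image/png",
    "urn:x-storm:block:image/svg+xml,h3": "image/svg+xml",
}

print(choose(representations, "image/svg+xml, image/png;q=0.5"))
```

Language negotiation would follow the same pattern, with the language of each representation taken from whatever property a later PEG defines for it.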

Abstract concepts

While block URIs always identify an octet stream with a media type, a ref URI can be used to identify dogs, cars, houses, an RDF class or the theory of relativity: Anything.

Of course you can also use urn-5 for that, but sometimes it is useful to be able to get some authoritative information about a resource-- the ability for a human to put a URI into a browser and get documentation about what it identifies, and the ability for a machine to resolve a URI and get some machine-readable information about it. For example, the ref block for an RDF class could include a human-readable label for the class as well as its superclasses, and refer to some human-readable documentation.

(Fenfire could then, when the class is used in some graph, download its authoritative description and use the human-readable label from that description to show the class.)

In order to be able to put an abstract concept ref URI in a browser and have it resolve to some documentation about the concept, we have to associate it with a representation. For this, we do not use repr:instance, because a description of a concept is not an instance (that is, a version) of that concept. Instead, we use:

_:foo   repr:description   <urn:x-storm:block:<type>,<hash>>

In general, there should be only one repr:description associated with a resource, although an implementation should treat repr:description the same as repr:instance. If the description needs to be available in several variants, for example in different languages, it should have a ref URI of its own.

This is because on the Web, important resources should have their own URIs so that you can link to them and make statements about them-- you want to be able to make statements about both the theory of relativity and the Web page that describes this concept.

Vocabulary defined in this PEG

This PEG defines the following URIs:

http://purl.oclc.org/NET/storm/vocab/ref-uri/refDefines

A property. The subject of triples with this property is a resource that has as (one of) its representation(s) an RDF graph serialized in RDF/XML. The object of the triple is the resource identified by the ref URI that has as its <hash> part the hash of the RDF/XML serialization of the subject.

In practice, this simply means that the subject is a Storm block with media type application/rdf+xml, and the object is the ref URI with the same hash.

http://purl.oclc.org/NET/storm/vocab/representations/representation

A property. The subject is any resource, and the object is a representation of that resource; or more precisely, all representations of the object are also representations of the subject.

If included in the authoritative metadata about the subject, a URI resolver that understands this property shall consider the object of this property as one possible document that can be served as a representation of the subject.

In particular, when a ref URI is e.g. entered into a browser, a URI resolver shall look at the ref block for triples of the form:

_:foo  <...representation>  _:bar

where _:foo is the resource represented by the ref URI.

The objects in these triples (_:bar) are the possible representations of the resource (_:foo).

http://purl.oclc.org/NET/storm/vocab/representations/instance

A property. Both the subject and the object are some kind of "document," something which can be serialized to bits and bytes. The object is some kind of specialization of the subject.

For example, the subject might be "The Bible," and the object might be "The Bible, King James Version," which is more specific. Or, the subject may be "An image of Sandro Hawke's dog Taiko," and the object may be a PNG or JPEG version of that image.

This is a sub-property of ...representation. A URI resolver shall treat a triple with this property like a triple with property ...representation.

http://purl.oclc.org/NET/storm/vocab/representations/description

A property. The subject is any resource; the object is some kind of "document" which describes the subject.

For example, the subject may be an RDF class, and the object may be a Web page describing how this class is used.

This is a sub-property of ...representation. A URI resolver shall treat a triple with this property like a triple with property ...representation.

No other properties besides the three above shall be treated the same as ...representation, even if some graph states that they are a subproperty of ...representation. This is to make resolution of ref URIs easier.
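
The closed set of properties makes the lookup a simple filter, with no subproperty inference needed. The sketch below shows one way a resolver might collect candidate representations. Triples are plain tuples, and the function name, the ref URI, and the example.org property are hypothetical.

```python
REPR = "http://purl.oclc.org/NET/storm/vocab/representations/"

# Only these three properties count, whatever any graph claims.
RESOLVABLE = frozenset({
    REPR + "representation",
    REPR + "instance",
    REPR + "description",
})

SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def candidate_representations(triples, resource):
    """Return the objects that may be served as representations of resource."""
    return [o for s, p, o in triples if s == resource and p in RESOLVABLE]


ref = "urn:x-storm:ref:abc123"  # hypothetical ref URI
triples = [
    (ref, REPR + "instance", "urn:x-storm:block:text/html,h1"),
    (ref, REPR + "description", "urn:x-storm:block:text/plain,h2"),
    # A third-party property declared as a subproperty is deliberately ignored.
    ("http://example.org/myRepr", SUBPROPERTY_OF, REPR + "representation"),
    (ref, "http://example.org/myRepr", "urn:x-storm:block:text/html,h9"),
]

print(candidate_representations(triples, ref))
```

Because the set is fixed, the resolver never has to fetch or trust other graphs in order to decide what it may serve.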

What this PEG does not define

This PEG doesn't define any "standard" properties for use inside a ref graph, besides the four defined above. Other PEGs may define properties to specify e.g. the languages or media types of representations, and dictate resolver behavior in the presence of these properties, for example honoring the Accept-Language header in HTTP requests. However, this is left for future specifications.

- Benja