Storm is quite complex with its MIME headers, and is likely to become more complex if we choose to hash headers and bodies separately (raw_blocks--benja). If we are going to break backward compatibility once, as Tuomas suggests, we should take the opportunity to get rid of our past mistakes and make the future simpler.
Won't dropping headers make it harder to include metadata?
RESOLVED: MIME headers are a non-extensible form of metadata anyway; if we allow X- headers, we have problems with permanence. We can still put metadata into another block referring to this one; alternatively, many file formats allow metadata to be included in the file itself (e.g. PNG).
Content types are now included in the block id (different content type -> different block).
The benefits outweigh the problems by far.
How about metadata that would be included in an HTTP response, such as alternative representations of a resource (different languages etc.)? How about Creative Commons licenses? Wouldn't it be better to have an RDF "header" block containing this data?
What about the hash tree vulnerabilities mentioned in <http://zgp.org/pipermail/p2p-hackers/2002-November/000993.html> / <http://zgp.org/pipermail/p2p-hackers/2002-November/000998.html>?
RESOLVED: They've settled on a new convention: prepending a zero byte to tree leaves and a one byte to internal nodes (the concatenated hashes of their children) before hashing. Their software is being updated; there's a Java implementation. We'll be using that (and we'll fully specify it when writing the informal URN namespace registration).
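The leaf/branch prefixing convention can be illustrated with a minimal tree-hash sketch. It is not the reference implementation: the real bitprint tree uses Tiger, which Python's hashlib does not reliably provide, so SHA-256 stands in for it here. The 1024-byte leaf size and the rule that an unpaired node is promoted unchanged to the next level are assumptions taken from the THEX convention, not from this document.

```python
import hashlib
from typing import Callable, List

HashFn = Callable[[bytes], bytes]

def default_hash(data: bytes) -> bytes:
    # Stand-in for Tiger (assumption for illustration only).
    return hashlib.sha256(data).digest()

SEGMENT_SIZE = 1024  # leaf size assumed from THEX

def leaf_hash(segment: bytes, h: HashFn = default_hash) -> bytes:
    # Leaves get a 0x00 prefix before hashing...
    return h(b"\x00" + segment)

def node_hash(left: bytes, right: bytes, h: HashFn = default_hash) -> bytes:
    # ...internal nodes get a 0x01 prefix, so a leaf can never be
    # passed off as an internal node, or vice versa.
    return h(b"\x01" + left + right)

def leaf_hashes(data: bytes, h: HashFn = default_hash) -> List[bytes]:
    if not data:
        # An empty body still has exactly one (empty) leaf.
        return [leaf_hash(b"", h)]
    return [leaf_hash(data[i:i + SEGMENT_SIZE], h)
            for i in range(0, len(data), SEGMENT_SIZE)]

def tree_root(data: bytes, h: HashFn = default_hash) -> bytes:
    level = leaf_hashes(data, h)
    while len(level) > 1:
        # Combine neighbouring pairs into their parent nodes.
        nxt = [node_hash(level[i], level[i + 1], h)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An unpaired last node is promoted unchanged (THEX rule, assumed).
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Because the prefixes differ, an attacker cannot present the concatenated hashes of an internal node as if they were leaf data with the same hash, which is the ambiguity the linked p2p-hackers posts describe.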
Why Bitzi bitprint? What is it? Why not SHA-1?
RESOLVED: Bitprints are a combination of a SHA-1 hash with a Merkle hash tree based on the Tiger hash algorithm. Hash algorithms do get broken; when one of the two is broken, there is a transitional period before the other one is, during which you can e.g. sign blocks, ensuring you can still use them after the second algorithm is broken as well.
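A bitprint can be sketched as the two digests side by side. This assumes the layout we understand Bitzi to use (the SHA-1 of the whole body and the Tiger tree root, each Base32-encoded without padding and joined by a dot); the Tiger tree root is passed in precomputed, since Python's standard library has no Tiger implementation, and the helper names are hypothetical.

```python
import base64
import hashlib

def b32(digest: bytes) -> str:
    # Unpadded, upper-case RFC 3548 Base32.
    return base64.b32encode(digest).decode("ascii").rstrip("=")

def bitprint(data: bytes, tiger_tree_root: bytes) -> str:
    """Combine SHA-1 of the body with a precomputed Tiger tree root."""
    if len(tiger_tree_root) != 24:
        raise ValueError("Tiger digests are 24 bytes long")
    return b32(hashlib.sha1(data).digest()) + "." + b32(tiger_tree_root)
```

With a 20-byte SHA-1 (32 Base32 characters) and a 24-byte Tiger root (39 characters), every bitprint is 72 characters long, independent of the block size; this is the fixed cost the length question below is about.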
Having a hash tree allows you to download pieces of a block from different sources, verifying each piece individually. This can speed up downloads considerably.
Are bitprints too long for short blocks like ours? (That is, how long will the IDs be, and will that be a problem?)
RESOLVED: Here's an example URI, 102 characters long:
This is long, but IMO not 'too long.'
Why this syntax? Why not another?
RESOLVED: For similarity to RFC 2397 (The "data" URL scheme).
Storm blocks do not have headers any more; the hash in their URN is only of the body. Storm URNs have the following form:
<namespace> is an informal URN namespace to be registered, like urn:urn-5. <bitprint> is a Bitzi bitprint as defined by <http://bitzi.com/developer/bitprint>. <mediatype> is the token defined in [RFC2397]--
    mediatype := [ type "/" subtype ] *( ";" parameter )
    parameter := attribute "=" value
"where [...] 'type', 'subtype', 'attribute' and 'value' are the corresponding tokens from [RFC2045], represented using URL escaped encoding of [RFC2396] as necessary" [RFC2397]. (Escaping is necessary when a character isn't in the set of allowed URN characters.)
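Escaping the individual tokens can be sketched as follows. The safe set is the unreserved character set of RFC 2141 (letters, digits and `()+-.:@$_!*'`); as an assumption of this sketch, `;`, `=` and `,` are also escaped inside tokens, since they act as delimiters in the mediatype and data-URL-like syntax. `escape_token` and `format_mediatype` are hypothetical names.

```python
import string
from typing import Iterable, Tuple

# RFC 2141 unreserved characters, minus the delimiters ";", "=" and ","
# (assumption: those must be escaped when they occur inside a token).
_SAFE = set(string.ascii_letters + string.digits + "()+-.:@$_!*'")

def escape_token(token: str) -> str:
    """Percent-encode every UTF-8 byte that is not in the safe set."""
    out = []
    for byte in token.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in _SAFE else "%%%02X" % byte)
    return "".join(out)

def format_mediatype(type_: str, subtype: str,
                     params: Iterable[Tuple[str, str]] = ()) -> str:
    """Build type/subtype;attr=value with each token escaped separately."""
    s = escape_token(type_) + "/" + escape_token(subtype)
    for attr, value in params:
        s += ";" + escape_token(attr) + "=" + escape_token(value)
    return s
```

Escaping token by token, rather than the finished string, keeps the structural `/`, `;` and `=` delimiters intact while still protecting any such characters that occur inside a parameter value.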
"X-" types aren't allowed, as they work against the persistence of Storm blocks; application/octet-stream or similar must be used instead.
Unlike in [RFC2397], if no <mediatype> is given, application/octet-stream is assumed (not text/plain).
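The two rules above (no X- types, application/octet-stream as the default) can be sketched as a single normalization step. Treating a parameter list without a type/subtype as belonging to the default type is an assumption that mirrors RFC 2397's shorthand; `effective_mediatype` is a hypothetical helper.

```python
from typing import Optional

DEFAULT_MEDIATYPE = "application/octet-stream"

def effective_mediatype(mediatype: Optional[str]) -> str:
    """Apply the Storm default and reject X- types."""
    if not mediatype:
        return DEFAULT_MEDIATYPE
    main, sep, params = mediatype.partition(";")
    if not main:
        # Assumption: parameters without a type/subtype attach to the default,
        # as in RFC 2397's "data:;charset=..." shorthand.
        main = DEFAULT_MEDIATYPE
    if "/" not in main:
        raise ValueError("mediatype must have the form type/subtype")
    type_, subtype = main.split("/", 1)
    if type_.lower().startswith("x-") or subtype.lower().startswith("x-"):
        raise ValueError("X- types are not allowed in Storm URNs: " + main)
    return main + sep + params
```

An application holding data of a private type would then store it as application/octet-stream (or a registered type) rather than minting an X- type whose meaning may not persist.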
There is a public domain Java implementation of bitprints at <http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/bitcollider/jbitprint/>. According to Bitzi, bitprints may be registered as a URN namespace in the future; however, such URNs would not include a content type.