XML text markup

This is one of those ideas that come more or less "out of the blue."

We have been thinking about how to represent styled text as XML within RDF. XHTML seemed like a first possibility. Here is something that, in a sense, goes far beyond that.

We want to allow for "semantic markup": markup that says what the author means. WYGIWYM: what you get is what you mean.

However, we don't want only that. We also want presentational markup, because people will want to use it, and if there is no other way, they will do presentational markup through semantic markup: they'll "emphasize" whenever they want to italicize, just to get the effect they want. That is a really bad thing.

Semantic markup should be extensible; you should be able to make up new, meaningful 'styles' as you go.

Here's an example from Tim Bray (source):

<chunk xmlns="http://www.tbray.org/ns/22">The essays found 
on the web at <address>http://www.tbray.org</address> are 
<adj>boring</adj> <adj>pedantic</adj> ravings 
<adverb>clumsily</adverb> authored by a 
<adj>self-styled</adj> technologist.</chunk>

He says he made the namespace up when he typed the above. (It's for example purposes.)

Now, that got me thinking. I propose that the above could be written pretty much like this in our system:

<chunk xmlns="http://example.org/namespace/22#"
       xmlns:tag="http://example.org/namespace/tag#"
       tag:type="para">The essays found on the web 
at <address tag:type="span">http://www.tbray.org</address> 
are <adj tag:type="span">boring</adj> 
<adj tag:type="span">pedantic</adj> ravings 
<adverb tag:type="span">clumsily</adverb> authored by a 
<adj tag:type="span">self-styled</adj> 
technologist.</chunk>

This is, admittedly, clunkier, but it would be auto-generated by the computer. Hmm, no: it would probably be better to declare the type of each tag explicitly somewhere. So, OK, something like this:

<ff:text xmlns="..." xmlns:ff="..." xmlns:tags="...">
    <tags:paragraph-tag name="chunk"/>
    <tags:span-tag      name="address"/>
    <tags:span-tag      name="adj"/>
    <tags:span-tag      name="adverb"/>

    <chunk>The essays found on the web at 
    <address>http://www.tbray.org</address> are 
    <adj>boring</adj> <adj>pedantic</adj> ravings 
    <adverb>clumsily</adverb> authored by a 
    <adj>self-styled</adj> technologist.</chunk>
</ff:text>

(Assume the namespaces are the same as in the previous example.) Written like this, it would be feasible to write by hand. I like it.
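
To make the declaration idea concrete, here is a minimal Python sketch of how a reader might collect the declared tag types. It is only an illustration under assumptions: the namespace URIs are taken from the earlier example where possible, the ff namespace URI is made up, and the function name declared_kinds is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the example above elides them with "...".
DEFAULT_NS = "http://example.org/namespace/22#"
TAGS_NS = "http://example.org/namespace/tag#"
FF_NS = "http://example.org/namespace/ff#"  # made up for this sketch

DOC = f"""<ff:text xmlns="{DEFAULT_NS}" xmlns:ff="{FF_NS}" xmlns:tags="{TAGS_NS}">
    <tags:paragraph-tag name="chunk"/>
    <tags:span-tag name="address"/>
    <tags:span-tag name="adj"/>
    <tags:span-tag name="adverb"/>
    <chunk>The essays found on the web at
    <address>http://www.tbray.org</address> are
    <adj>boring</adj> <adj>pedantic</adj> ravings
    <adverb>clumsily</adverb> authored by a
    <adj>self-styled</adj> technologist.</chunk>
</ff:text>"""


def declared_kinds(xml_text):
    """Map each declared tag name to 'paragraph' or 'span'."""
    root = ET.fromstring(xml_text)
    kinds = {}
    # Declarations are direct children of the root element.
    for child in root:
        if child.tag == f"{{{TAGS_NS}}}paragraph-tag":
            kinds[child.get("name")] = "paragraph"
        elif child.tag == f"{{{TAGS_NS}}}span-tag":
            kinds[child.get("name")] = "span"
    return kinds


kinds = declared_kinds(DOC)
print(kinds)
```

Anything not declared this way (the chunk element itself, for instance) is simply skipped by the loop, so the text content stays untouched.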

Now, every tag declared through tags:paragraph-tag or tags:span-tag would have a URI formed in the usual RDF way, by appending the tag name to the namespace URI:

chunk   = http://example.org/namespace/22#chunk
address = http://example.org/namespace/22#address
adj     = http://example.org/namespace/22#adj
adverb  = http://example.org/namespace/22#adverb
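
The mapping above can be sketched in a few lines of Python. This is just an illustration, not part of the proposal: the helper names tag_uri and tag_uris are hypothetical, and the sketch relies on ElementTree's "{namespace}local" form for element names.

```python
import xml.etree.ElementTree as ET

CHUNK = """<chunk xmlns="http://example.org/namespace/22#">The essays found
on the web at <address>http://www.tbray.org</address> are
<adj>boring</adj> <adj>pedantic</adj> ravings
<adverb>clumsily</adverb> authored by a
<adj>self-styled</adj> technologist.</chunk>"""


def tag_uri(element):
    """Concatenate the namespace URI and the local name of an element."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace + local
    return tag  # element without a namespace: nothing to prepend


def tag_uris(xml_text):
    """Map each local tag name used in the text to its URI."""
    root = ET.fromstring(xml_text)
    uris = {}
    for element in root.iter():  # includes the root element itself
        local = element.tag.rsplit("}", 1)[-1]
        uris[local] = tag_uri(element)
    return uris


uris = tag_uris(CHUNK)
for name, uri in uris.items():
    print(f"{name:8}= {uri}")
```

Note that this only works out nicely because the namespace URI ends in "#"; with a namespace that does not end in a separator, plain concatenation would run the two parts together.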

You could then use this either with an XSLT or CSS style sheet, or with a style definition connected to the tag's URI through RDF. We would normally use the latter.
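
As a rough picture of what "connected to the URI through RDF" could mean, here is a small Python sketch that treats style definitions as plain (subject, predicate, object) triples. Everything about the style vocabulary is an assumption: the STYLE namespace, the property names fontStyle and fontFamily, and the function style_for are invented for illustration only.

```python
# Assumed URIs: the tag namespace follows the earlier example; the
# style namespace and its properties are made up for this sketch.
NS = "http://example.org/namespace/22#"
STYLE = "http://example.org/namespace/style#"

# Style definitions as (subject, predicate, object) triples.
TRIPLES = [
    (NS + "adj", STYLE + "fontStyle", "italic"),
    (NS + "adverb", STYLE + "fontStyle", "italic"),
    (NS + "address", STYLE + "fontFamily", "monospace"),
]


def style_for(tag_uri, triples):
    """Collect the style properties attached to one tag URI."""
    return {
        predicate[len(STYLE):]: value
        for subject, predicate, value in triples
        if subject == tag_uri and predicate.startswith(STYLE)
    }


adj_style = style_for(NS + "adj", TRIPLES)
chunk_style = style_for(NS + "chunk", TRIPLES)
print(adj_style, chunk_style)
```

A tag with no style triples, like chunk here, just gets an empty style, so a renderer could fall back to its defaults.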

For presentational markup, we'd use something straightforward, like:

<span foo:font-face="sans-serif" 
      foo:font-size="18">...</span>

Semantic markup with special tags, as shown above, should be used only at the user's explicit request. If we only need to associate some text with a URI, to connect it to something, we'd write:

<span bla:uri="urn:urn-5:...">...</span>

The point of all this is: If a user consciously uses the semantic markup technology and user interface, they create a new XML vocabulary. A user in the know could also trivially use an existing vocabulary, like XHTML or DocBook. The result will be XML markup as you expect good XML markup to be. A great win.

- Benja