PEG swamp_rdf_api--tjl:

Author: Tuomas J. Lukka
Last-Modified:2003-08-31
Revision: 1.14
Status: Current (Partially preliminarily implemented [since in its own package])

This document outlines the main issues with the Jena API currently in use and proposes a lightweight API of our own to replace it.

Issues

Problems with Jena

The most important problem with Jena appears to be that it does not support observation.

With Gzz, we were moving towards a functional style of programming in which the result of f(node) could easily be cached, since the node could be observed and the cached value discarded when the node changed.

Jena makes this impossible because it has no change listeners. Wrapping or extending Jena into something that has them would be a major task and would result in a more complicated API.
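The caching idea can be sketched in Java as follows. This is a minimal sketch under assumptions: `Obs`, `ObservableNode` and `CachedFunction` are hypothetical names used only here (Obs echoes the name used in the Observing section, but its `chg()` callback is assumed), and a node is reduced to a single changeable value. A reader passes an Obs when it reads the node; when the node changes, the Obs fires and the cached f(node) is discarded, so the next call recomputes it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical observer: chg() is called when something it read has changed. */
interface Obs {
    void chg();
}

/** Hypothetical observable node value: remembers its readers, notifies them on change. */
class ObservableNode {
    private Object value;
    private final List<Obs> observers = new ArrayList<>();

    ObservableNode(Object value) { this.value = value; }

    /** Read the value; if o is non-null, it will be told about the next change. */
    Object get(Obs o) {
        if (o != null) observers.add(o);
        return value;
    }

    void set(Object newValue) {
        value = newValue;
        // One-shot notification: readers re-register on their next get().
        List<Obs> toNotify = new ArrayList<>(observers);
        observers.clear();
        for (Obs o : toNotify) o.chg();
    }
}

/** Caches f(node) and drops the cached result when the node changes. */
class CachedFunction {
    private final Function<Object, Object> f;
    private final Map<ObservableNode, Object> cache = new HashMap<>();
    int evaluations = 0; // counts real calls to f, for demonstration only

    CachedFunction(Function<Object, Object> f) { this.f = f; }

    Object apply(ObservableNode node) {
        if (cache.containsKey(node)) return cache.get(node);
        // The observer invalidates exactly this cache entry.
        Obs invalidate = () -> cache.remove(node);
        Object result = f.apply(node.get(invalidate));
        evaluations++;
        cache.put(node, result);
        return result;
    }
}
```

Notification is one-shot here: each observer is dropped after firing and re-registers on its next read, which keeps stale observers from piling up.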

Another issue I (personally) have with Jena is that it tries to be too object-oriented: at first I thought (and liked the thought!) that Statements and nodes were independent of the model, but this was not the case.

Efficiency is also important: for Fenfire to work properly, ALL in-memory searches must be O(1). Jena makes no such guarantee, since its goal is to support different implementations of Graph. For us, supporting different implementations matters much less than the raw efficiency of the memory-based implementation. This is quite different from most RDF uses, since the usual scenario (at least so far) is that there is not much RDF to handle.

Design

All classes in this API shall be in org.fenfire.swamp.

The resource mapper

by mudyc: What's the meaning of Resource Mapper?
   tjl: The Resource Mapper is used in special cases, e.g.,
        when two different RDF spaces are diffed.
        Then both spaces can share a single resource object.
   mudyc: So it has nothing to do with Literals?
   tjl: No.

The global resource mapper (it has to be global, since resources are model-agnostic) is simple. Its name must be short because it is used so widely:

public class Nodes {
    public static Object get(String res);
    public static Object get(String res, int offs, int len);
    public static Object get(char[] res, int offs, int len);

    public static String toString(Object res);

    /** Append the string version of the resource to the given buffer.
     * In order to avoid creating too many String objects
     * when serializing a space.
     */
    public static void appendToString(Object res, StringBuffer buf);

    public static void write(Object res, OutputStream stream) throws IOException;
    public static void write(Object res, Writer stream) throws IOException;
}
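A possible implementation of Nodes, sketched in Java under assumptions: the private `Res` class and the hash-table interning are our own illustration, not a settled design, and only the Writer variant of write is shown. Each URI string maps to exactly one object, so resources can be compared with `==` and shared between spaces (for instance the two spaces being diffed).

```java
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

/** Sketch of the global resource mapper: one object per URI string. */
final class Nodes {
    /** Hypothetical resource representation; only Nodes creates these. */
    private static final class Res {
        final String uri;
        Res(String uri) { this.uri = uri; }
    }

    private static final Map<String, Res> table = new HashMap<>();

    private Nodes() {}

    /** Returns the unique object for this URI, creating it on first use. */
    public static synchronized Object get(String res) {
        return table.computeIfAbsent(res, Res::new);
    }

    public static Object get(String res, int offs, int len) {
        return get(res.substring(offs, offs + len));
    }

    public static Object get(char[] res, int offs, int len) {
        return get(new String(res, offs, len));
    }

    public static String toString(Object res) {
        return ((Res) res).uri;
    }

    /** Appends the stored URI directly; no new String is built. */
    public static void appendToString(Object res, StringBuffer buf) {
        buf.append(((Res) res).uri);
    }

    public static void write(Object res, Writer stream) throws IOException {
        stream.write(((Res) res).uri);
    }
}
```

For brevity the substring and char[] variants still build a temporary String before the lookup; a version tuned for loading large spaces would hash the characters directly.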

The appendToString method solves a problem we had in Gzz: when saving, too many Strings were created for object names. Similarly, overloading the get method for different parameter types allows resources to be created as efficiently as possible, without conversions.

We may want to make Nodes internally redirectable in the future to allow alternate implementations; the static interface will not change.
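One way to make Nodes redirectable without changing its static interface is to have the static methods forward to a replaceable implementation object. In this sketch `NodesImpl`, `HashNodesImpl` and `setImpl` are hypothetical names, and only two of the Nodes methods are shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pluggable back end behind the static Nodes facade. */
interface NodesImpl {
    Object get(String res);
    String toString(Object res);
}

/** Default back end: interns URIs in a hash map. */
class HashNodesImpl implements NodesImpl {
    private static final class Res {
        final String uri;
        Res(String uri) { this.uri = uri; }
    }

    private final Map<String, Res> table = new HashMap<>();

    public synchronized Object get(String res) { return table.computeIfAbsent(res, Res::new); }

    public String toString(Object res) { return ((Res) res).uri; }
}

/** The static interface stays the same; only the delegate changes. */
final class Nodes {
    private static NodesImpl impl = new HashNodesImpl();

    private Nodes() {}

    /** Swap the back end (should happen before any resources are created). */
    static void setImpl(NodesImpl newImpl) { impl = newImpl; }

    public static Object get(String res) { return impl.get(res); }

    public static String toString(Object res) { return impl.toString(res); }
}
```

The implementation would have to be chosen before any resources are created, since resource objects from different back ends are not interchangeable.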

The graph object

The ShortRDF class shows how messy the query functions can easily become. To avoid this, we'll drop the semantic naming (subject, predicate, object) from method names for now and name all methods according to a general scheme.

public interface ConstGraph {
    Object find1_11X(Object subject, Object predicate);
    Object find1_X11(Object predicate, Object object);
    ...
    Iterator findN_11X_Iter(Object subject, Object predicate);
    ...
}

public interface Graph extends ConstGraph {
    void set1_11X(Object subject, Object predicate, Object object);
    void set1_X11(Object subject, Object predicate, Object object);
    ...

    void rm_1AA(Object subject);
    void rm_11A(Object subject, Object predicate);
    void rm_A11(Object predicate, Object object);
    ...

    /** Add the given triple to the model.
     */
    void add(Object subject, Object predicate, Object object);
}
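A minimal in-memory sketch of a few of these methods, assuming triples are indexed as subject -> predicate -> set of objects. `SimpleGraph` is a hypothetical name, and the class does not implement the full interfaces (which are elided above). Each lookup is one or two hash accesses, in line with the O(1) requirement; a complete version would keep further indexes for the other patterns (X11, 1X1 and so on). Removal is shown as rm_1AA, using the A wildcard for the ignored parts.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory graph covering a few of the methods above. */
class SimpleGraph {
    // subject -> predicate -> objects; every lookup is an expected O(1) hash access.
    private final Map<Object, Map<Object, Set<Object>>> spo = new HashMap<>();

    public void add(Object s, Object p, Object o) {
        spo.computeIfAbsent(s, k -> new HashMap<>())
           .computeIfAbsent(p, k -> new HashSet<>())
           .add(o);
    }

    /** All objects o such that (s, p, o) is in the graph. */
    public Iterator findN_11X_Iter(Object s, Object p) {
        Map<Object, Set<Object>> byPred = spo.get(s);
        Set<Object> objs = byPred == null ? null : byPred.get(p);
        return objs == null ? Collections.emptyIterator() : objs.iterator();
    }

    /** All predicates that s has. */
    public Iterator findN_1XA_Iter(Object s) {
        Map<Object, Set<Object>> byPred = spo.get(s);
        return byPred == null ? Collections.emptyIterator() : byPred.keySet().iterator();
    }

    /** All nodes that are the subject of any triple. */
    public Iterator findN_XAA_Iter() {
        return spo.keySet().iterator();
    }

    /** Replace all (s, p, *) triples with the single triple (s, p, o). */
    public void set1_11X(Object s, Object p, Object o) {
        Set<Object> objs = new HashSet<>();
        objs.add(o);
        spo.computeIfAbsent(s, k -> new HashMap<>()).put(p, objs);
    }

    /** Remove every triple whose subject is s. */
    public void rm_1AA(Object s) {
        spo.remove(s);
    }
}
```

With the triples (a,b,c), (a,b,d) and (a,e,d), calling set1_11X(a, b, g) leaves (a,b,g) and (a,e,d); findN_1XA_Iter(a) still yields b and e, and findN_XAA_Iter() yields only a.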

Method names are built as follows. First comes the function type:

find1

Find the single triple matching the given parts and return its part marked X. If there is none, null is returned. If there is more than one, a NotUniqueError is thrown.

Only a single X may be used.

findN

Return an iterator over the X parts of all triples matching the given parts. Even if there are none, an (empty) iterator is returned. Only a single X may be used.

For instance,

findN_1XA(node)

returns all properties that the node has, and

findN_XAA()

finds all nodes that are the subject of any triple.

set1

Remove all triples matching the given parts and replace them with the given new one. For example, if the triples (a,b,c), (a,b,d) and (a,e,d) are in the model, then after

set1_11X(a, b, g)

the model will have the triples (a,b,g) and (a,e,d). Only a single X may be used (this restriction may be lifted in the future), and only 1 and X may be used.

rm
Remove the matching triples from the model. Any number of As may be used.

and, after an underscore, the parameter scheme:

1
Given
X
Requested / set
A
Ignored (matches anything)
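To make the naming scheme concrete, here is a hypothetical helper (`TriplePattern` is our name, not part of the API) that interprets a three-character pattern against a triple: positions marked 1 are compared with the arguments in order, while X and A positions match anything. It also checks the "only a single X" rule. Matching triples one by one like this is for illustration only; the graph itself would use hash indexes.

```java
/** Hypothetical helper spelling out the 1 / X / A parameter scheme.
 *  Pattern positions: 0 = subject, 1 = predicate, 2 = object. */
final class TriplePattern {
    private TriplePattern() {}

    /** Does triple {s, p, o} match the pattern (e.g. "11X") with the given arguments? */
    static boolean matches(String pattern, Object[] args, Object[] triple) {
        int next = 0; // index of the next unused argument
        for (int i = 0; i < 3; i++) {
            if (pattern.charAt(i) == '1') {
                // Given part: must equal the next argument.
                if (!triple[i].equals(args[next++])) return false;
            }
            // 'X' (requested) and 'A' (ignored) positions accept any value.
        }
        return true;
    }

    /** Position of the single X; rejects patterns with zero or several Xs. */
    static int requestedPosition(String pattern) {
        int pos = pattern.indexOf('X');
        if (pos < 0 || pattern.indexOf('X', pos + 1) >= 0)
            throw new IllegalArgumentException("exactly one X required: " + pattern);
        return pos;
    }
}
```

For example, find1_X11(p, o) corresponds to matching the pattern "X11" with the arguments {p, o} and returning the element at requestedPosition("X11"), that is, the subject.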

The uniqueness exception

For debugging (and possibly for clever code hacks), the following error carries enough information to tell what was not unique.

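public class NotUniqueError extends Error {
    public final Object subject;
    public final Object predicate;
    public final Object object;
    /** Parts that were not given (wildcards) are passed as null. */
    public NotUniqueError(Object subject, Object predicate, Object object) {
        this.subject = subject; this.predicate = predicate; this.object = object;
    }
}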

The wildcard (X and A) parts are set to null.

For example, if the user calls

graph.find1_11X(foo, bar);

and there are the triples (foo, bar, baz) and (foo, bar, zip) in the model, then

NotUniqueError(foo, bar, null)

is thrown.
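The following sketch shows find1_11X producing exactly this error. `TinyGraph` is a hypothetical linear-scan store used only to illustrate the semantics (it ignores the O(1) requirement), and NotUniqueError is repeated with an assumed constructor and message so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NotUniqueError extends Error {
    public final Object subject, predicate, object;

    NotUniqueError(Object subject, Object predicate, Object object) {
        super("not unique: (" + subject + ", " + predicate + ", " + object + ")");
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical store that only demonstrates find1 semantics. */
class TinyGraph {
    // [s, p, o] lists; List.equals compares elements, so duplicate triples collapse.
    private final Set<List<Object>> triples = new LinkedHashSet<>();

    void add(Object s, Object p, Object o) { triples.add(Arrays.asList(s, p, o)); }

    /** The unique o with (s, p, o); null if none; NotUniqueError if several. */
    Object find1_11X(Object s, Object p) {
        Object found = null;
        for (List<Object> t : triples) {
            if (t.get(0).equals(s) && t.get(1).equals(p)) {
                // Second match: report the given parts, with the X (object) part as null.
                if (found != null) throw new NotUniqueError(s, p, null);
                found = t.get(2);
            }
        }
        return found;
    }
}
```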

Observing

Observing is part of ConstGraph:

public ConstGraph getObservedConstGraph(Obs o);

/** This observed graph will not be used any more, and
 * if desired, may be recycled by the ObservableGraph.
 * This operation is allowed to be a no-op.
 */
public void close();

Object find1_11X(Object subject, Object predicate, Obs o);
Object find1_X11(Object predicate, Object object, Obs o);
...
Iterator findN_11X_Iter(Object subject, Object predicate, Obs o);
...

The find methods taking an Obs are included in ObservableGraph because this allows a cheap default implementation of ObservedGraph. In an autogenerated implementation, ObservedGraph would also be generated, for efficiency.
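A sketch of why the Obs-taking find methods make the default ObservedGraph cheap: each method of the observed view simply forwards to its Obs-taking counterpart, passing the stored Obs. The interfaces here are cut down to a single find method, and `Obs.chg()` is an assumed callback name.

```java
/** Hypothetical observer callback. */
interface Obs {
    void chg();
}

/** Cut-down observable graph: one Obs-taking find method. */
interface ObservableGraph {
    Object find1_11X(Object subject, Object predicate, Obs o);
}

/** Default observed view: no per-method bookkeeping, just forwarding. */
class ObservedGraph {
    private final ObservableGraph graph;
    private final Obs obs;

    ObservedGraph(ObservableGraph graph, Obs obs) {
        this.graph = graph;
        this.obs = obs;
    }

    Object find1_11X(Object subject, Object predicate) {
        // Every read registers the same Obs with the underlying graph.
        return graph.find1_11X(subject, predicate, obs);
    }

    /** Allowed to be a no-op by the close() contract. */
    void close() {}
}
```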

Literals

For literals, we shall use immutable literal objects.
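A minimal sketch of such an immutable literal, under assumptions: the class name `Literal`, the optional language tag and the value-based equals/hashCode are our illustration, not decisions made in this document. Because nothing can change after construction, a literal never needs to be observed and can be shared freely and used as a hash key.

```java
import java.util.Objects;

/** Hypothetical immutable literal: final fields, no setters, value equality. */
final class Literal {
    private final String value;
    private final String lang; // assumed optional language tag; null if absent

    Literal(String value, String lang) {
        this.value = Objects.requireNonNull(value);
        this.lang = lang;
    }

    String getValue() { return value; }

    String getLang() { return lang; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Literal)) return false;
        Literal l = (Literal) o;
        return value.equals(l.value) && Objects.equals(lang, l.lang);
    }

    @Override public int hashCode() { return Objects.hash(value, lang); }

    @Override public String toString() {
        return lang == null ? "\"" + value + "\"" : "\"" + value + "\"@" + lang;
    }
}
```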