XPointer

The XPointer Framework is a collection of schemes (note: schemes not schemas) that specify the method for addressing into the XML tree or help in this task. One end of its functionality overlaps with XPath, but the other allows one to address points and ranges inside elements, and this is often precious to us TEI-ers, especially in some stand-off markup systems or for working with ontologies (RDF may use it). It was initially defined as a companion to XLink that took care of URI fragment identifiers (the strings placed after the '#' in a URI).

Some of the schemes, e.g. the simplest element scheme, which uses simple tree-traversal syntax, are supported by any decent XML tool nowadays. It is worth pointing out that any tool that claims to support XInclude must support the general conventions of the XPointer Framework and the XPointer element scheme (per the XInclude spec). The "general conventions" of the Framework mean, practically, the possibility of addressing elements by their IDs and the possibility of using multiple schemes in a single pointer (see below).

Apart from element, there is also the xmlns scheme that does namespace binding, and finally the xpointer scheme that does an incredible lot of useful things in a very clever way, except it's not supported in full anywhere.

The three schemes mentioned above have been defined or, in the case of xpointer, drafted, by the W3C. The XPointer Framework, however, allows other parties to define their own schemes and get them registered in a special corner of the W3C called the XPointer Registry. And this is the point where we, as the TEI community, may want to focus some of our attention. Thanks to Syd Bauman, a number of schemes (described in the Linking and Alignment chapter of the Guidelines) have been registered with the W3C. One of them, xpath1, is actually shared with other Internet communities, and implemented in Firefox (see bug #182323). Another is smlxpath1 defined by the Service Modeling Language. Yet another is the string-range scheme, one of the TEI-defined schemes that are useful in stand-off architecture and that may address into the content of elements.

Structure of the XPointer Framework
The structure of the XPointer Framework is illustrated below:

Notice that string-range is mentioned twice in the diagram above. This is because of an unfortunate homonymy that has generated some confusion in the past but, hopefully, will not do so any longer.

The XPointer xpointer scheme uses XPath functions and adds to them several others, which, by entering the inter-character space, are able to cleverly address what no XPath has been able to address before. Among these functions is one called string-range.

Thus the difference between the W3C's string-range function and the TEI's string-range scheme is not merely a difference in the definition, but also a difference in the status. There is at least one important consequence of this, worth bearing in mind: the status and implementation of the TEI's schemes do not directly depend on the status and implementation of the xpointer scheme. Of course, one can hope that, from the perspective of software developers, the former may piggyback on the latter.
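To make the homonymy a little more tangible, here is a rough sketch of the two notions of "string range", reduced to plain Python strings. It rests on assumptions that should be checked against the specifications: that the TEI scheme takes an offset counted from 0 (the position before the first character) plus a length, and that the W3C function searches for a string and then takes a 1-based position and a length relative to each match. The function names are invented for this illustration, and real processors work on XML nodes rather than on bare strings.

```python
def tei_style_range(text, offset, length):
    """TEI-style addressing (assumed semantics): count characters
    from a 0-based offset and take `length` of them."""
    return text[offset:offset + length]


def w3c_style_ranges(text, needle, position=1, length=None):
    """W3C-style addressing (assumed semantics): find every occurrence of
    `needle`, then take `length` characters starting at the 1-based
    `position` inside each match."""
    if not needle:
        raise ValueError("the search string must not be empty")
    if length is None:
        # Assumption: by default the range runs to the end of the match.
        length = len(needle) - (position - 1)
    results = []
    start = text.find(needle)
    while start != -1:
        begin = start + position - 1
        results.append(text[begin:begin + length])
        start = text.find(needle, start + 1)
    return results


text = "Thomas Pynchon wrote V.; Thomas Pynchon wrote Vineland."
by_offset = tei_style_range(text, 7, 7)
by_search = w3c_style_ranges(text, "Thomas Pynchon", 8, 7)
```

The point of the contrast is only that one notion counts characters while the other searches for them; the sketch does not try to settle details such as overlapping matches.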

XPointer syntax
The term "XPointer syntax" is used here informally. Its practical relevance is that it allows for two things: addressing by ID and combining several scheme-based pointer parts in a single pointer. Still informally, the syntax may be sketched as follows:

Pointer       ::=   ID | SchemeBased
SchemeBased   ::=   PointerPart+
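The informal grammar above can be turned into a small splitter, shown here as a minimal sketch rather than a conforming parser. It assumes that a pointer without any parenthesis is a bare ID, that every pointer part has the shape scheme(data) with balanced parentheses, and that the circumflex escapes the character that follows it. The name parse_pointer and the return format are made up for this illustration.

```python
import re

# Simplified pattern for a scheme name, optionally with a prefix.
SCHEME_NAME = re.compile(r"[A-Za-z_][\w.\-]*(?::[A-Za-z_][\w.\-]*)?")


def parse_pointer(pointer):
    """Return ("id", name) for a bare ID, or
    ("parts", [(scheme, data), ...]) for a scheme-based pointer."""
    if "(" not in pointer:
        return ("id", pointer)
    parts = []
    i, n = 0, len(pointer)
    while i < n:
        if pointer[i].isspace():  # whitespace may separate pointer parts
            i += 1
            continue
        m = SCHEME_NAME.match(pointer, i)
        if m is None or m.end() >= n or pointer[m.end()] != "(":
            raise ValueError("expected scheme( at position %d" % i)
        start = i = m.end() + 1
        depth = 1
        while i < n and depth > 0:
            ch = pointer[i]
            if ch == "^":  # escaped character: skip it
                i += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses in pointer part")
        # The scheme data is kept raw, escapes included.
        parts.append((m.group(0), pointer[start:i - 1]))
    return ("parts", parts)
```

For instance, parse_pointer("p1") yields an ID, while parse_pointer("element(/1/2)element(p1)") yields two pointer parts in the order in which a processor would try them.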

The first alternative allows for things like  (which may also be used as , and the other for sequencing schemes, as in the examples below:



The XPointer processor must evaluate such sequences from left to right and stop as soon as it succeeds in making a match. This means that in the first case, if the scheme is recognized, the processor tries to match the first of all <p> elements that can be found in the document, and if it does not succeed, it moves to the second scheme and looks for the first element identified by the ID "p1". Notice that this setup makes it possible for us to use TEI-defined schemes with any processor, provided that we supply some kind of fallback as the last scheme. Does that buy us much? Hardly, but it's a step forward until the processor starts recognizing TEI schemes.

The second example sequence involves namespace binding: the xmlns scheme binds the prefix, returns no result, so the processor moves to the next scheme and tries to make a match against, this time, a namespace-qualified element name.
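The left-to-right evaluation described above, including the skipping of unrecognized schemes and the role of xmlns, can be modelled in a few lines. This is a toy model under stated assumptions, not a description of any real processor: it takes pointer parts already split into (scheme, data) pairs, treats only xml:id attributes as IDs, and uses a stand-in xpath1 handler limited to the small XPath subset that Python's ElementTree understands. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def by_id(root, name):
    """Assumption: only xml:id attributes count as IDs here."""
    for el in root.iter():
        if el.get(XML_ID) == name:
            return el
    return None


def element_scheme(root, data, bindings):
    """element(): an ID (p1), a child sequence (/1/2), or both (p1/2)."""
    head, *steps = data.split("/")
    node = by_id(root, head) if head else None
    if head and node is None:
        return None
    for step in steps:
        if not step.isdigit():
            return None
        index = int(step)
        if node is None:  # leading /1 selects the document element
            if index != 1:
                return None
            node = root
            continue
        children = list(node)
        if index < 1 or index > len(children):
            return None
        node = children[index - 1]
    return node


def xmlns_scheme(root, data, bindings):
    """xmlns(prefix=uri): record the binding and never match anything."""
    prefix, _, uri = data.partition("=")
    bindings[prefix.strip()] = uri.strip()
    return None


def toy_xpath1_scheme(root, data, bindings):
    """Stand-in for xpath1(): ElementTree's limited path subset only."""
    try:
        return root.find(data, bindings)
    except SyntaxError:  # e.g. unbound prefix or unsupported expression
        return None


HANDLERS = {"element": element_scheme, "xmlns": xmlns_scheme,
            "xpath1": toy_xpath1_scheme}


def evaluate(root, parts, handlers=HANDLERS):
    """Try pointer parts left to right; return the first match or None."""
    bindings = {}
    for scheme, data in parts:
        handler = handlers.get(scheme)
        if handler is None:  # unrecognized scheme: skip silently
            continue
        result = handler(root, data, bindings)
        if result is not None:
            return result
    return None


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<p xml:id="p1">one</p><p xml:id="p2">two</p>'
    '</body></text></TEI>')

# An unsupported scheme first, with element() as the fallback.
fallback_hit = evaluate(doc, [("string-range", "p2,0,3"), ("element", "p1")])
# xmlns binds the prefix, matches nothing, and the next part uses the binding.
ns_hit = evaluate(doc, [("xmlns", "tei=http://www.tei-c.org/ns/1.0"),
                        ("xpath1", ".//tei:p")])
```

In the first call the string-range part is simply skipped because no handler is registered for it, which is the fallback arrangement discussed above; in the second, the xmlns part contributes nothing but the prefix binding.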

Here I have to stop for a while

What follows is a mess for now.

Libxml2 is the only XML toolkit that has rudimentary support for XPointer's xpointer scheme. In fact, the support is rather bad... but at least it is there and can be fixed and extended if the need for it can be demonstrated. (The trick with diploma works may also work here -- there is a separate XPointer module that the student can concentrate on.)

In fact, the TEI doesn't need the xpointer scheme to be supported in xmllint (libxml2's parser) -- it just needs the general XPointer Framework to work (see the Addendum below for explanation).

I have reported or commented on several XPointer/XInclude-related bugs in libxml2, but without votes, testing and comments from others, fixing them may take a while (insert link to Daniel Veillard's mail here). I have even talked a colleague into providing a few patches, with some success (link to Jakub Wilk's report of the bug persisting).

This is a fragment of my e-mail, with some links in it, about the xpointer/xinclude stuff

Have to extract some bits from this fragment still.  I searched for freely-available free-standing XPointer-aware tools and found out that only libxml2 (with xmllint) comes reasonably close, but its XPointer support is incomplete and buggy. I reported some of that on TEI-L some time ago. Since then, libxml2 has seen two bugfix releases, but the crucial functionality is still missing.

We have a colleague, Jakub Wilk, who did some bug-hunting and submitted a few patches to libxml2 in his free time, but I guess both his free time and patience have run out now (which I find perfectly understandable).

In case you were interested in pursuing this further, let me give you some links as starters:

"internal error, xpointer.c:2409" when using string-range https://bugzilla.gnome.org/show_bug.cgi?id=562541

Xpointer range-to function loses the end-point children https://bugzilla.gnome.org/show_bug.cgi?id=306081

buggy range XPointer function https://bugzilla.gnome.org/show_bug.cgi?id=584219

buggy string-range XPointer function https://bugzilla.gnome.org/show_bug.cgi?id=583442

I tried to use the xpointer scheme's string-range function instead of the TEI-defined string-range scheme, but that was impossible for a while, until this bug got fixed:

unrecognized XPointer schemes are not skipped silently https://bugzilla.gnome.org/show_bug.cgi?id=563562

(so there is a light...)

But that would require a few complications in the markup, to provide a cascade of XPointer schemes, with the W3C scheme as fallback until the TEI-defined schemes are supported by some tool.