XPointer

From TEIWiki


For use cases where these techniques might be used, see Stand-off use cases.

The XPointer Framework is a collection of schemes (note: schemes, not schemas) that specify methods for addressing into the XML tree, or that help in this task. At one end, its functionality overlaps with XPath; at the other, it allows one to address points and ranges inside elements, which is often valuable to us TEI-ers, especially in some stand-off markup systems or when working with ontologies (RDF may use it). It was initially defined as a companion to XLink that took care of URI fragment identifiers (the strings placed after the '#' in a URI).

Some of the schemes, e.g. the simplest one, element(), which uses a simple tree-traversal syntax, are supported by any decent XML tool nowadays. It is worth pointing out that any tool that claims to support XInclude must support the general conventions of the XPointer Framework and the XPointer element() scheme (per the XInclude spec). In practice, the "general conventions" of the Framework mean the possibility of addressing elements by their IDs and the possibility of using multiple schemes in a single pointer (see below).
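To make the "simple tree-traversal syntax" concrete, here is a minimal Python sketch of how the data inside element(...) might be resolved. It is only an illustration under stated assumptions, not the code of any real XPointer processor: the function name resolve_element_scheme is made up for this page, IDs are assumed to be carried by xml:id attributes, and malformed input is not handled.

```python
import xml.etree.ElementTree as ET

# Assumption: IDs are carried by xml:id attributes (no DTD-declared IDs).
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_element_scheme(root, data):
    """Resolve the data of an element() pointer part.

    Accepts an ID ("p1"), a child sequence ("/1/1/2"), or an ID followed
    by a child sequence ("p1/2"). Returns the element, or None.
    """
    name, _, rest = data.partition("/")
    steps = [int(s) for s in rest.split("/")] if rest else []
    if name:
        # Start from the element that carries the given ID.
        node = next((e for e in root.iter() if e.get(XML_ID) == name), None)
    else:
        # A bare child sequence starts at the document root, so its first
        # step must be 1 (there is exactly one document element).
        if not steps or steps[0] != 1:
            return None
        node, steps = root, steps[1:]
    for step in steps:
        if node is None:
            return None
        children = list(node)  # child elements, in document order
        if not 1 <= step <= len(children):
            return None
        node = children[step - 1]  # steps are 1-based
    return node


doc = ET.fromstring(
    '<text><body><p xml:id="p1">one</p><p>two <hi>x</hi></p></body></text>'
)
print(resolve_element_scheme(doc, "/1/1/2/1").tag)  # the <hi> inside the second <p>
```

Each integer step selects the n-th child element of the node reached so far, which is why this scheme is cheap for tools to support.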

Apart from element(), there is also the xmlns() scheme, which does namespace binding, and finally the xpointer() scheme, which does a great many useful things in a very clever way, except that it is not supported in full anywhere.

The three schemes mentioned above have been defined or, in the case of xpointer(), drafted, by the W3C. The XPointer Framework, however, allows other parties to define their own schemes and get them registered in a special corner of the W3C called the XPointer Registry. And this is the point where we, as the TEI community, may want to focus some of our attention. Thanks to Syd Bauman, a number of schemes (described in the Linking and Alignment chapter of the Guidelines) have been registered with the W3C. One of them, xpath1(), is actually shared with other Internet communities, and implemented in Firefox (see bug #182323). Another is smlxpath1(), defined by the Service Modeling Language. Yet another is the string-range() scheme, one of the TEI-defined schemes that are useful in stand-off architectures and that can address into the content of elements.


Structure of the XPointer Framework

The structure of the XPointer Framework is illustrated below:

                     XPointer Framework
                      .               .
                     /                 \
                    /                   \
                   .                     .
               syntax              scheme repository
                                     .            .
                                    /              \
                                   /                \
                                  .                  .
                        W3C schemes               external-party (a.o. TEI) schemes
                      .   .       .                     .          .            .
                     /    |        \                   /           |             \
                    /     |         \                 /            |              \
                   .      .          .               .             .               .
              element() xmlns() xpointer()        range() string-range()  ...  xpath2()
                                .       .
                               /         \
                              /           \
                             .             .
                     XPath 1.0 functions    XPointer functions
                                           .   .       .
                                          /    |        \
                                         /     |         \
                                        .      .          .
                                 range-to()  end-point() string-range() ...

Notice that string-range() is mentioned twice in the diagram above. This is because of an unfortunate homonymy that has generated some confusion in the past but, hopefully, will not do so any longer.

The XPointer xpointer() scheme uses XPath functions and adds to them several others, which, by entering the inter-character space, are able to cleverly address what no XPath has been able to address before. Among these functions is one called string-range().

Thus the difference between the W3C's (location-set) string-range(location-set, string-to-match, offset?, length?) function and the TEI's (string) string-range(pointer, offset, length?) scheme is not merely a difference in definition, but also a difference in status. There is at least one important consequence of this that is worth bearing in mind: the status and implementation of the TEI's schemes do not directly depend on the status and implementation of the xpointer() scheme. Of course, one can hope that, from the perspective of software developers, the former may piggyback on the latter.
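As a rough illustration of what the TEI-style signature (string) string-range(pointer, offset, length?) amounts to, here is a small Python sketch. It is not taken from the Guidelines or from any tool. It makes several assumptions that should be checked against the Guidelines: the pointer has already been resolved to an element, the offset is counted from 0 over the element's concatenated text content, and an omitted length means "to the end of that text". The function name tei_string_range is hypothetical.

```python
import xml.etree.ElementTree as ET


def tei_string_range(element, offset, length=None):
    """Return a substring of the element's text content, or None.

    Assumptions (for illustration only): `offset` is zero-based and counted
    over the concatenated text of the element and its descendants; a missing
    `length` means "up to the end of that text".
    """
    text = "".join(element.itertext())
    if offset < 0 or offset > len(text):
        return None
    end = len(text) if length is None else offset + length
    if end < offset or end > len(text):
        return None
    return text[offset:end]


p = ET.fromstring("<p>Hello <hi>brave</hi> world</p>")
print(tei_string_range(p, 6, 5))  # the characters inside <hi>
```

Note that the result here is a plain string, which matches the "(string)" return type in the signature above, whereas the W3C function of the same name yields a location-set and takes a string to match.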

XPointer syntax

The term "XPointer syntax" is used here informally. Its practical relevance is that it allows for two things: addressing by ID and using multiple schemes in a single pointer. Still informally, the syntax may be sketched as follows:

Pointer  	  ::=   	ID | SchemeBased
SchemeBased	  ::=   	PointerPart+

The first alternative allows for addressing by ID, reminiscent of HTML: "resource.xml#ID" (which may also use the element() scheme, as in "resource.xml#element(ID)"), and the other for sequencing schemes, as in the examples below:

  1. "resource.xml#xpointer((//p)[1]) element(p1)"
  2. "resource.xml#xmlns(tei=http://www.tei-c.org/ns/1.0)xpointer(//tei:p[1])"

The XPointer processor must evaluate such sequences from left to right and stop as soon as it succeeds in making a match. This means that in the first case, if the xpointer() scheme is recognized, the processor tries to match the first of all <p> elements that can be found in the document; if it does not succeed (or does not recognize the scheme), it moves on to the second scheme and looks for the element identified by the ID "p1". Notice that this setup makes it possible for us to use TEI-defined schemes with any processor, as long as we supply some kind of fallback as the last scheme. Does that buy us much? Hardly, but it's a step forward until processors start recognizing TEI schemes.

The second example sequence involves namespace binding: the xmlns() scheme binds the prefix and returns no result, so the processor moves on to the next scheme and tries, this time, to match a namespace-qualified element name.
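The left-to-right evaluation described above can be sketched in Python as follows. This is a simplified illustration, not a conforming XPointer processor: all function names are made up for this page, the splitter ignores the Framework's circumflex escaping of parentheses, the element() handler only understands bare IDs (assumed to be xml:id attributes), and the "xpointer" handler is a toy stand-in that passes the expression to ElementTree's very limited path language rather than implementing the real xpointer() scheme.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def split_pointer_parts(pointer):
    """Split 'a(x) b(y)' into [('a', 'x'), ('b', 'y')], honouring nested parentheses."""
    parts, i, n = [], 0, len(pointer)
    while i < n:
        if pointer[i].isspace():
            i += 1
            continue
        open_at = pointer.index("(", i)  # raises ValueError if no "(" follows
        depth, j = 1, open_at + 1
        while j < n and depth:
            if pointer[j] == "(":
                depth += 1
            elif pointer[j] == ")":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unbalanced parentheses")
        parts.append((pointer[i:open_at], pointer[open_at + 1:j - 1]))
        i = j
    return parts


def find_by_id(root, data, namespaces=None):
    """element() handler, bare IDs only (assumption: IDs are xml:id)."""
    return next((e for e in root.iter() if e.get(XML_ID) == data), None)


def toy_path(root, data, namespaces):
    """Toy stand-in for xpointer(): ElementTree's path subset, first match only."""
    path = "." + data if data.startswith("/") else data
    try:
        matches = root.findall(path, namespaces)
    except (SyntaxError, KeyError):
        # Expression not understood by ElementTree: treat as "no match".
        return None
    return matches[0] if matches else None


HANDLERS = {"element": find_by_id, "xpointer": toy_path}


def evaluate(root, pointer, handlers=HANDLERS):
    """Return the first match, trying pointer parts from left to right."""
    if "(" not in pointer:  # first alternative of the grammar: a bare ID
        return find_by_id(root, pointer)
    namespaces = {}
    for scheme, data in split_pointer_parts(pointer):
        if scheme == "xmlns":
            # Binds a prefix for later parts; never yields a result itself.
            prefix, _, uri = data.partition("=")
            namespaces[prefix.strip()] = uri.strip()
            continue
        handler = handlers.get(scheme)
        if handler is None:
            continue  # unrecognized scheme: skip it and try the next part
        result = handler(root, data, namespaces)
        if result is not None:
            return result  # stop at the first part that matches
    return None


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<p xml:id="p1">first</p><p>second</p></body></text></TEI>'
)
hit = evaluate(doc, "xmlns(tei=http://www.tei-c.org/ns/1.0)xpointer(//tei:p[1])")
print(hit.text)
```

The `continue` on an unrecognized scheme is what makes a fallback cascade possible: a pointer may start with a scheme the processor does not know and end with one it does.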


The fact that the non-W3C schemes xpath1() and smlxpath1() are supported and in use (note: what are the tools that support them? -- need to find that out) makes it more likely that we can get developers to plug TEI-defined schemes into their tools. Of course, these schemes have to be coded first, but that was obvious from the beginning.

Pushing the xpointer() scheme ahead might result in our being able to reuse the routines handling that scheme, which would reduce the work on TEI-scheme-handling.

Link to Xpath12match.xslt


Here I have to stop for a while

What follows is a mess for now.

Lobbying section

(Refer to my unfinished essay on community efforts that will probably have to wait for after TEI-MM; this section is obviously also unfinished.)

libxml2

Libxml2 is, to my knowledge, the only XML toolkit that has rudimentary support for XPointer's xpointer() scheme. In fact, the support is rather bad... but at least it is there and can be fixed and extended if the need for it can be demonstrated. (The trick with diploma projects may also work here: there is a separate XPointer module that a student can concentrate on, possibly keeping an eye on what is common to the handling of the xpointer() scheme and of the TEI-defined schemes.)

In fact, the TEI doesn't need the xpointer() scheme to be supported in xmllint (libxml2's command-line tool) -- it just needs the general XPointer Framework to work (see the Addendum below for explanation).


This is a fragment of my e-mail, with some links in it, about the xpointer/xinclude stuff

Have to extract some bits from this fragment still.

I searched for freely-available free-standing XPointer-aware tools and found out that only libxml2 (with xmllint) comes reasonably close, but its XPointer support is incomplete and buggy. I reported some of that on TEI-L some time ago. Since then, libxml2 has seen two bugfix releases, but the crucial functionality is still missing.

We have a colleague, Jakub Wilk, who did some bug-hunting and submitted a few patches to libxml2 in his free time, but I guess both his free time and patience have run out now (which I find perfectly understandable).

In case you were interested in pursuing this further, let me give you some links as starters:

"internal error, xpointer.c:2409" when using string-range()
https://bugzilla.gnome.org/show_bug.cgi?id=562541
Xpointer range-to function loses the end-point children
https://bugzilla.gnome.org/show_bug.cgi?id=306081
buggy range() XPointer function
https://bugzilla.gnome.org/show_bug.cgi?id=584219
buggy string-range() XPointer function
https://bugzilla.gnome.org/show_bug.cgi?id=583442


I tried to use the string-range() function of the xpointer() scheme instead of the TEI-defined string-range() scheme, but that was impossible for a while, until this bug got fixed:

unrecognized XPointer schemes are not skipped silently
https://bugzilla.gnome.org/show_bug.cgi?id=563562

(so there is a light...)

But that would require a few complications in the markup, to provide a cascade of XPointer schemes, with the W3C scheme as a fallback until the TEI-defined schemes are supported by some tool.
