Difference between revisions of "Private URI Schemes"

From TEIWiki

Revision as of 04:39, 26 June 2012

This page contains a draft proposal for a framework whereby private URI schemes used in TEI attributes with datatypes of data.pointer can be documented and dereferenced.


The Problem

For some time now, we have been discussing the use of "magic tokens" in attributes such as @key. Magic tokens are problematic because they are meaningful only within the context of a specific project (@key "provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind"). We currently suggest that @key attributes be documented through the use of a <taxonomy> element in the TEI header. However, documentation in this way does not provide a machine-readable method of dereferencing a key.

On several occasions, Council has discussed discouraging the use of @key and friends in future, and encouraging the use of private URI schemes instead. There are many good arguments against the use of private URI schemes (see for instance URI Schemes at the W3C), but as long as they are restricted to a specific project and well documented in that project, the approach seems a reasonable alternative to magic tokens.

Except that without a solid dereferencing scheme, they're actually no different from magic tokens. There's not much difference between <name key="FRED"> and <name ref="myproj:FRED">.


A Possible Solution

The primary value in using a project-specific key-style attribute is that it's short and simple. In many projects, @key is used when a perfectly straightforward and reliable pointer could be provided, because that pointer would be too long to be manageable by encoders. For instance, the Colonial Despatches project uses keys like this:

<name key="mills">John Powell Mills</name>

when what is actually meant is something like this:

<name ref="../bios/bios.xml#mills">John Powell Mills</name>

Where the key value corresponds to a unique @xml:id within the project, and the project data is stored in an XML database, dereferencing the key to look up the <person> element to which it corresponds is simple. But if the XML data is removed from the context of the XML database and associated XQuery which enables the simple lookup, the relationship between the key and the target element becomes opaque, and any researcher working with the data will have to read the encoding description and reconstruct it.
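To make the lookup step concrete, here is a minimal Python sketch of finding the element that carries a given @xml:id once the relevant file is in hand. The sample document, the function name find_by_xml_id, and the choice of Python's standard ElementTree parser are illustrative assumptions, not part of the Colonial Despatches project or of this proposal.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def find_by_xml_id(root, target_id):
    """Return the first element whose xml:id equals target_id, or None."""
    for elem in root.iter():
        if elem.get(XML_ID) == target_id:
            return elem
    return None

# Hypothetical stand-in for the content of ../bios/bios.xml.
SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="mills"><persName>John Powell Mills</persName></person>
  <person xml:id="douglas"><persName>James Douglas</persName></person>
</listPerson>"""

root = ET.fromstring(SAMPLE)
person = find_by_xml_id(root, "mills")
```

The lookup itself is trivial; the difficulty described above is knowing that the key "mills" means "the element with that @xml:id in the biographies file" in the first place.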

The proposed solution is to document this relationship in a form that can be mechanically dereferenced as well as described in human-readable text. This would enable a processor to reconstruct the actual path of a link without human intervention. This can be done with a search-and-replace operation, encoded in a manner similar to XPath 2.0's replace function. Imagine this in the encodingDesc of a document:

<listPrivateUri>
  <privateUri prefix="bios" pattern="([a-z]+)" replacement="../bios/bios.xml#$1">
    <p>In the context of this project, private URIs with the prefix "bios" point to <person> elements in the </p>
  </privateUri>
</listPrivateUri>
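As a rough illustration of how a processor might apply such a declaration, the following Python sketch resolves a private URI like bios:mills. The function name dereference and its argument order are hypothetical. The pattern is written here with a capturing group, ([a-z]+), so that $1 in the replacement has something to refer to, and the sketch assumes the pattern must match everything after the prefix. Python's own regular-expression syntax does not use $1, so the code substitutes the XPath-style references by hand.

```python
import re

def dereference(uri, prefix, pattern, replacement):
    """Expand a private URI (e.g. 'bios:mills') into a real pointer.

    Returns None when the URI does not use the given prefix or when the
    part after the prefix does not match the pattern.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme != prefix:
        return None
    match = re.fullmatch(pattern, rest)
    if match is None:
        return None
    # Replace each XPath-style $N with the text of capture group N.
    return re.sub(r"\$(\d+)",
                  lambda ref: match.group(int(ref.group(1))),
                  replacement)

# Values taken from the declaration above.
resolved = dereference("bios:mills", "bios", "([a-z]+)",
                       "../bios/bios.xml#$1")
print(resolved)  # ../bios/bios.xml#mills
```

A processor reading the header would collect the prefix, pattern and replacement from each declaration and try them in turn. Values that match no declared prefix would remain opaque, exactly as magic tokens are today.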