Oddbyexample

Synopsis
This utility attempts to work out the minimal TEI customization needed to validate a collection of files. It is an XSLT (version 2) stylesheet which traverses a nominated directory tree looking for *.xml files which have <TEI> or <teiCorpus> root elements. It analyzes the collection of elements and attributes in the resulting corpus, and compares that to the whole of TEI P5. An ODD file is generated which:


 * loads the required modules
 * deletes any elements which are not used
 * deletes any attributes (including class attributes) which are not used by each element
 * for every attribute which has a TEI "data.enumerated" datatype, constructs a closed value list (valList) enumerating the values actually used.

From this you can construct a target schema.
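
To get a feel for what the analysis step collects, the rough shell sketch below lists the distinct element names that occur in a directory of XML files. It is only an approximation built on grep, not the stylesheet's own logic: it ignores attributes and namespaces and does not check for <TEI> or <teiCorpus> root elements. The function name list_elements is made up for illustration.

```shell
# Rough inventory of element names used under a directory (illustrative only;
# oddbyexample.xsl does this properly in XSLT and also tracks attributes).
list_elements() {
  # $1: directory to scan recursively for *.xml files
  find "$1" -type f -name '*.xml' -exec cat {} + |
    grep -o '<[A-Za-z_][A-Za-z0-9_.:-]*' |   # start tags only; skips </x>, <?x, <!--
    sed 's/^<//' |
    LC_ALL=C sort -u                         # one line per distinct name
}
```

Calling list_elements /wherever/you/have/yourfiles/ prints one element name per line, which can be compared by eye with the elements kept in the generated ODD.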

User commentary
Please sign all comments.

System requirements
Memory capacity is likely to be an issue for large corpora. The stylesheet will not be able to read a very large corpus unless you have a great deal of memory to assign to Java. In that situation, it is suggested that you construct a smaller corpus of representative sample documents and work with that. After generating a schema, you can validate your entire corpus, and each time you find an invalid document, add it to your smaller corpus and start again.
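
The sample-and-revalidate loop described above can be sketched in shell. This is a minimal sketch under assumptions: the function name grow_sample is hypothetical, it only looks at *.xml files directly inside the full corpus directory, and the validator is whatever command you supply, provided it exits with a non-zero status for an invalid document.

```shell
# Copy every document that fails validation into the sample corpus, so that
# the next Oddbyexample run over the sample takes it into account.
# $1: full corpus directory     $2: sample corpus directory
# $3...: validator command; it is called with one file path appended and
#        must exit non-zero for an invalid document.
grow_sample() {
  full=$1; sample=$2; shift 2
  added=0
  for f in "$full"/*.xml; do
    [ -e "$f" ] || continue            # no matches: the glob stays literal
    if ! "$@" "$f" >/dev/null 2>&1; then
      cp "$f" "$sample"/ && added=$((added + 1))
    fi
  done
  echo "$added"                        # number of documents added
}
```

For example, with the Jing validator and a RELAX NG schema generated from your ODD (the file name my.rng is a placeholder), grow_sample /full/corpus /sample/corpus jing my.rng prints how many documents were added; when it prints 0, the schema validates the whole corpus. If you run Saxon directly through Java, the JVM option -Xmx (for instance -Xmx4g) raises the amount of memory available to it.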

Source code and licensing
Oddbyexample was written by Sebastian Rahtz, and it is dual-licensed:

1. Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/

2. BSD 2-Clause License: http://www.opensource.org/licenses/BSD-2-Clause

Support for TEI
Oddbyexample is not yet able to:
 * derive simplified content models (beyond what Roma already does)
 * add new elements and derive content models for them
 * deal with non-TEI namespaces
 * generate attribute datatypes with complex regexps not already specified in TEI specifications
 * create new Schematron constraints etc

Language(s)
XSLT

Documentation
The script assumes you have the TEI package which has a file called "/usr/share/xml/tei/odd/p5subset.xml". If you don't have that, grab http://www.tei-c.org/release/xml/tei/odd/p5subset.xml, put the file somewhere, and add a "defaultSource" parameter to point at it, as shown in the examples below. (Alternatively, you can check out the TEI source and generate p5subset.xml yourself, by running "make p5subset.xml" in the P5 directory.)
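
As a hedged illustration of the fallback just described, the sketch below prints a defaultSource argument only when the packaged p5subset.xml is missing. The function name default_source_arg and the local path used in the comments are hypothetical, and it assumes that the stylesheet accepts a plain file path as the value of defaultSource.

```shell
# Print a defaultSource=... argument only when the packaged p5subset.xml is
# missing; otherwise print nothing and let the stylesheet use its default.
# $1: path to your own copy of p5subset.xml (hypothetical location)
# $2: packaged location (defaults to the path the script assumes)
default_source_arg() {
  packaged=${2:-/usr/share/xml/tei/odd/p5subset.xml}
  if [ ! -f "$packaged" ]; then
    echo "defaultSource=$1"
  fi
}
# Download step, left as a comment so that the sketch has no side effects:
#   curl -L -o "$HOME/tei/p5subset.xml" http://www.tei-c.org/release/xml/tei/odd/p5subset.xml
# The printed argument is then appended to the Saxon command line, e.g.
#   saxon ... oddbyexample.xsl corpus=... $(default_source_arg "$HOME/tei/p5subset.xml")
```

Parameters are passed to Saxon as name=value pairs after the stylesheet name, so defaultSource is given in the same way as corpus.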

Here's a simple command to run Oddbyexample:

saxon -it:main -o:my.odd oddbyexample.xsl corpus=/wherever/you/have/yourfiles/

This will produce a file called "my.odd" in the current directory, which in this example is also the directory containing oddbyexample.xsl. The parameter -it:main tells Saxon to start with the template called "main" in the oddbyexample.xsl file, and -o:my.odd provides the output filename for your ODD file.

Many more parameters are documented in the oddbyexample.xsl file itself, but here is a short list of the most useful ones:

corpusList=path/to/my/files/?select=A*.xml

This parameter is passed to the XPath collection() function. In Saxon, the text after "select=" is a file name pattern in which * acts as a wildcard, so in this instance it will process all files with names which begin with a capital A and have a .xml extension.

verbose=true

This is useful if you want to see which files are actually being processed. For instance, if you've passed a file name pattern in the corpusList parameter, you can turn on verbose mode to make sure it's actually processing the files you expected.

keepGlobals=true

By default, any global attributes that are never used in the input files will be removed from the resulting ODD file. If you want to keep those attributes, you can set this parameter.
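
The three parameters above can be combined on one command line. The dry-run sketch below only prints the resulting arguments, one per line, rather than running Saxon. The function name oddbyexample_cmd is hypothetical, and it assumes that corpusList can be given in place of corpus. The point it illustrates is quoting: the characters ? and * in the corpusList value are also shell wildcards, so the value should be quoted to reach Saxon unchanged.

```shell
# Print the Saxon command line for a given corpusList value, one argument
# per line, without running it (a dry run).
oddbyexample_cmd() {
  # $1: corpusList value, e.g. 'path/to/my/files/?select=A*.xml'
  printf '%s\n' saxon -it:main -o:my.odd oddbyexample.xsl \
    "corpusList=$1" verbose=true keepGlobals=true
}
```

Calling oddbyexample_cmd 'path/to/my/files/?select=A*.xml' shows the arguments exactly as Saxon would receive them. The single quotes stop the shell from trying to expand the pattern itself.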


Tech support
No formal technical support is provided for Oddbyexample. If you post a question to the TEI-L list, though, other users may respond with help.

User community
Many members of the TEI community use Oddbyexample, but there is no formal community forum or mailing list.

Current version number, date of release and previous versions
Current and previous versions are all available through the TEIC GitHub repository. To see the current version, go to the TEIC Stylesheets repository on GitHub.

How to download or buy
Grab oddbyexample.xsl from GitHub (https://raw.github.com/TEIC/Stylesheets/master/tools/oddbyexample.xsl)