Oddbyexample

Synopsis
This utility attempts to work out the minimal TEI customization needed to validate a collection of files. It is an XSLT 2.0 stylesheet that traverses a nominated directory tree looking for *.xml files whose root element is <TEI> or <teiCorpus>. It analyzes the elements and attributes used in the resulting corpus and compares them to the whole of TEI P5. It then generates an ODD file which:


 * loads the required modules
 * deletes any elements which are not used
 * deletes any attributes (including class attributes) which are not used by each element
 * for every attribute which has a TEI "data.enumerated" datatype, constructs a closed <valList> enumerating the values actually used.
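The analysis step described above (find the TEI documents, then record which elements, attributes and attribute values they use) can be sketched in Python. This is a hypothetical illustration, not code from oddbyexample.xsl, which does the same job in XSLT 2.0; scan_corpus and its return shape are invented names. As an assumption, it simply skips elements outside the TEI namespace, since the tool does not yet deal with other namespaces.

```python
import os
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
TEI_ROOTS = (f"{{{TEI_NS}}}TEI", f"{{{TEI_NS}}}teiCorpus")


def attr_name(key):
    """Turn ElementTree's '{xml-ns}id' form back into 'xml:id'."""
    prefix = f"{{{XML_NS}}}"
    return "xml:" + key[len(prefix):] if key.startswith(prefix) else key


def scan_corpus(root_dir):
    """Hypothetical sketch: inventory the TEI documents under root_dir.

    Returns (files, elements, attributes, values):
      files      - paths of *.xml files whose root is TEI or teiCorpus
      elements   - set of TEI element names used
      attributes - element name -> set of attribute names used on it
      values     - (element, attribute) -> set of values seen
    """
    files, elements = [], set()
    attributes, values = defaultdict(set), defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue  # not well-formed XML: skip it
            if root.tag not in TEI_ROOTS:
                continue  # not a TEI document
            files.append(path)
            for el in root.iter():
                if not el.tag.startswith(f"{{{TEI_NS}}}"):
                    continue  # ignore non-TEI namespaces
                ename = el.tag[len(TEI_NS) + 2:]
                elements.add(ename)
                for key, value in el.attrib.items():
                    aname = attr_name(key)
                    attributes[ename].add(aname)
                    values[(ename, aname)].add(value)
    return files, elements, dict(attributes), dict(values)
```

The value sets are what a closed <valList> would later be built from; recording them per element keeps the "unused attributes per element" check possible.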

From the generated ODD you can then build a target schema, for example with Roma.
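The ODD-writing step (load the required modules, delete unused elements and attributes, close enumerated value lists) might look roughly like the Python below. It is a hypothetical sketch, not the stylesheet's code: build_schema_spec and its inputs are invented names, the schemaSpec ident "myTEI" is a placeholder, and which attributes count as enumerated is passed in rather than looked up in P5. It uses standard ODD constructs (moduleRef; elementSpec and attDef with mode="delete" or mode="change"; valList type="closed"), though the exact markup oddbyexample.xsl emits may differ.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(name):
    return f"{{{TEI_NS}}}{name}"


def build_schema_spec(p5_elements, p5_attributes, used_elements,
                      used_attributes, used_values, enumerated):
    """Hypothetical sketch of a minimal <schemaSpec>.

    p5_elements     - element name -> module it belongs to in TEI P5
    p5_attributes   - element name -> attribute names P5 allows on it
    used_elements   - element names found in the corpus
    used_attributes - element name -> attribute names found on it
    used_values     - (element, attribute) -> values found
    enumerated      - (element, attribute) pairs with an enumerated datatype
    """
    spec = ET.Element(tei("schemaSpec"), ident="myTEI", start="TEI")
    # Load every module that contains at least one used element.
    modules = sorted({p5_elements[e] for e in used_elements if e in p5_elements})
    for module in modules:
        ET.SubElement(spec, tei("moduleRef"), key=module)
    for ename in sorted(p5_elements):
        module = p5_elements[ename]
        if module not in modules:
            continue  # module not loaded, so nothing to delete
        if ename not in used_elements:
            # Delete elements of loaded modules that the corpus never uses.
            ET.SubElement(spec, tei("elementSpec"), ident=ename,
                          module=module, mode="delete")
            continue
        att_list = None
        for aname in sorted(p5_attributes.get(ename, ())):
            used = aname in used_attributes.get(ename, set())
            if used and (ename, aname) not in enumerated:
                continue  # used, free-valued attribute: keep as is
            if att_list is None:
                el_spec = ET.SubElement(spec, tei("elementSpec"), ident=ename,
                                        module=module, mode="change")
                att_list = ET.SubElement(el_spec, tei("attList"))
            if not used:
                # Delete attributes never used on this element.
                ET.SubElement(att_list, tei("attDef"), ident=aname, mode="delete")
            else:
                # Replace the value list with a closed list of observed values.
                att_def = ET.SubElement(att_list, tei("attDef"),
                                        ident=aname, mode="change")
                val_list = ET.SubElement(att_def, tei("valList"),
                                         type="closed", mode="replace")
                for value in sorted(used_values.get((ename, aname), ())):
                    ET.SubElement(val_list, tei("valItem"), ident=value)
    return spec
```

Calling ET.tostring(spec, encoding="unicode") on the result gives the customization as text, ready to wrap in a TEI ODD document.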


System requirements
Memory capacity is likely to be an issue for large corpora: the stylesheet cannot read a very large corpus unless you can assign a great deal of memory to Java. In that case, construct a smaller corpus of representative sample documents and work with that. After generating a schema, validate your entire corpus against it; each time you find an invalid document, add it to the smaller corpus and start again.
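The sample-and-grow workflow can be sketched with a toy model: each document is reduced to the set of features (for example element names) it uses, the "schema" is the union of features in the sample, and validation is a subset test. This is an assumption for illustration only; in practice each pass means rerunning oddbyexample.xsl on the sample and validating the whole corpus with a real validator. grow_sample is a hypothetical name.

```python
def grow_sample(corpus, sample):
    """Grow `sample` until the schema derived from it validates every document.

    corpus - mapping of document name -> set of features it uses
    sample - initial list of representative document names

    Toy model (assumption): the schema is the union of the sample's
    features, and a document is valid if all its features are in it.
    """
    sample = list(sample)
    while True:
        # "Generate a schema" from the current sample.
        schema = set().union(*(corpus[name] for name in sample))
        # "Validate the entire corpus" against it.
        invalid = [name for name in sorted(corpus) if not corpus[name] <= schema]
        if not invalid:
            return sample, schema
        # Add the first invalid document to the sample and start again.
        sample.append(invalid[0])
```

The loop always terminates: an invalid document can never already be in the sample, so each pass adds a new document.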

Source code and licensing
open source

Support for TEI
Oddbyexample is not yet able to:
 * derive simplified content models (beyond what Roma already does)
 * add new elements and derive content models for them
 * deal with non-TEI namespaces
 * generate attribute datatypes with complex regexps not already specified in TEI specifications
 * create new Schematron constraints, etc.

Language(s)
XSLT

Documentation
The script assumes you have the TEI package installed, which provides the file "/usr/share/xml/tei/odd/p5subset.xml". If you don't have it, download http://www.tei-c.org/release/xml/tei/odd/p5subset.xml, save it somewhere, and set the "tei" parameter to point at it.
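p5subset.xml is the whole of TEI P5 as a single ODD document, with an elementSpec (carrying ident and module attributes) for each element. A hypothetical Python sketch of locating it (an explicit path playing the role of the "tei" parameter, else the package default) and of comparing a corpus against it follows; find_p5subset, element_modules and unused_elements are invented names, not part of the stylesheet.

```python
import os
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Default location provided by the TEI package, as described above.
DEFAULT_P5SUBSET = "/usr/share/xml/tei/odd/p5subset.xml"


def find_p5subset(tei=None):
    """Return the path to p5subset.xml.

    `tei` mirrors the stylesheet's "tei" parameter: if given it is used,
    otherwise the TEI package's default location is tried.
    """
    path = tei or DEFAULT_P5SUBSET
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; download p5subset.xml and pass its location")
    return path


def element_modules(source):
    """Map every elementSpec/@ident in a TEI ODD file to its @module."""
    root = ET.parse(source).getroot()
    return {spec.get("ident"): spec.get("module")
            for spec in root.iter(f"{{{TEI_NS}}}elementSpec")}


def unused_elements(used, modules):
    """P5 elements never seen in the corpus (to be deleted or left unloaded)."""
    return sorted(set(modules) - set(used))
```

Typical use would be modules = element_modules(find_p5subset()), followed by unused_elements(corpus_elements, modules).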

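Here is a sample command to run it:

  saxon -o my.odd oddbyexample.xsl oddbyexample.xsl corpus=/wherever/you/have/yourfiles/

The stylesheet name appears twice on purpose: in this argument order the first is the input document and the second is the stylesheet. The files to analyze are found through the "corpus" parameter.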

How to download or buy
Grab getfiles.xsl and oddbyexample.xsl from SourceForge (http://tei.svn.sourceforge.net/viewvc/tei/trunk/Stylesheets/tools/oddbyexample.xsl).