Oddbyexample

Synopsis
This utility attempts to work out the minimal TEI customization needed to validate a collection of files. It is an XSLT (version 2) stylesheet which traverses a nominated directory tree looking for *.xml files which have <TEI> or <teiCorpus> root elements. It analyzes the elements and attributes used in the resulting corpus and compares them with the whole of TEI P5. It then generates an ODD file which:


 * loads the required modules
 * deletes any elements which are not used
 * deletes any attributes (including class attributes) which are not used by each element
 * for every attribute which has a TEI "data.enumerated" datatype, constructs a closed value list (<valList>) enumerating the values actually used

From this you can construct a target schema.
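
To make the four kinds of change listed above concrete, here is a hand-written sketch of what such an ODD might contain. It is an illustration only, not actual output of the stylesheet: the choice of modules, the element and attribute names, and the values "chapter" and "section" are all assumptions, and the exact markup the tool emits may differ.

```xml
<!-- Hand-written illustration only, NOT actual oddbyexample output.
     Module list, element names, and attribute values are assumptions. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="oddbyexample" start="TEI teiCorpus">
  <!-- load the required modules -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>

  <!-- delete an element which the corpus never uses -->
  <elementSpec ident="said" module="core" mode="delete"/>

  <elementSpec ident="div" module="textstructure" mode="change">
    <attList>
      <!-- delete a class attribute which never appears on div -->
      <attDef ident="subtype" mode="delete"/>
      <!-- restrict a data.enumerated attribute to the values actually found -->
      <attDef ident="type" mode="change">
        <valList type="closed" mode="add">
          <valItem ident="chapter"/>
          <valItem ident="section"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

A <schemaSpec> like this normally sits inside the body of a TEI document, which an ODD processor such as Roma then turns into the target schema.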

User commentary
Please sign all comments.

System requirements
Memory capacity is an issue. The stylesheet will not be able to read a very large corpus unless you have a large amount of memory to assign to Java.

Source code and licensing
Open source.

Support for TEI
Limitations. The tool does not attempt any of the following:
 * deriving simplified content models (beyond what Roma already does)
 * adding new elements and deriving a content model
 * dealing with non-TEI namespaces
 * generating attribute datatypes with complex regexps
 * working out Schematron constraints etc

Language(s)
XSLT

Documentation
The script assumes that you have the TEI package installed, which provides the file "/usr/share/xml/tei/odd/p5subset.xml". If you do not have it, download http://www.tei-c.org/release/xml/tei/odd/p5subset.xml, save it somewhere, and pass a "tei" parameter pointing at your copy.

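Here's a sample command to run it:

    saxon -o my.odd oddbyexample.xsl oddbyexample.xsl corpus=/wherever/you/have/yourfiles/

The stylesheet is named twice on purpose: the first occurrence fills the source-document slot of the Saxon command line, and the second is the stylesheet itself. The files actually analyzed are those found under the directory given by the "corpus" parameter.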

How to download or buy
Grab getfiles.xsl and oddbyexample.xsl from Sourceforge (http://tei.svn.sourceforge.net/viewvc/tei/trunk/Stylesheets2/tools2/)