This utility attempts to work out the minimal TEI customization needed to validate a collection of files. It is an XSLT (version 2) stylesheet that traverses a nominated directory tree looking for *.xml files which have <TEI> or <teiCorpus> root elements. It analyzes the elements and attributes used in the resulting corpus and compares them to the whole of TEI P5. An ODD file is generated which:
- loads the required modules
- deletes any elements which are not used
- deletes any attributes (including class attributes) which are not used by each element
- for every attribute which has a TEI "data.enumerated" datatype, constructs a closed <valList> enumerating the values actually used.
From this you can construct a target schema.
Memory capacity is likely to be an issue for large corpora: the stylesheet will not be able to read a very large corpus unless you can assign a great deal of memory to Java. In that situation, construct a smaller corpus of representative sample documents and work with that. After generating a schema, validate your entire corpus against it; each time you find an invalid document, add it to the smaller corpus and start again.
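The sample-and-grow loop described above can be sketched in shell. This is only an illustration under stated assumptions: grow_sample is a hypothetical helper, not part of Oddbyexample, and the validator is whatever command you use to check one file against your generated schema (for example a wrapper around jing), as long as it exits non-zero for an invalid document. The sketch only looks at *.xml files directly inside the full corpus directory, not in subdirectories.

```shell
# Hypothetical helper: copy every document that fails validation
# from the full corpus into the smaller sample corpus.
# Usage: grow_sample VALIDATOR FULL_DIR SAMPLE_DIR
#   VALIDATOR  command run as "VALIDATOR file"; must exit non-zero if invalid
#   FULL_DIR   directory holding the complete corpus
#   SAMPLE_DIR directory holding the sample corpus fed to Oddbyexample
# Prints the number of documents that were added to the sample.
grow_sample() {
    validator=$1
    full_dir=$2
    sample_dir=$3
    added=0
    mkdir -p "$sample_dir"
    for f in "$full_dir"/*.xml; do
        [ -e "$f" ] || continue            # no matching files at all
        # $validator is left unquoted on purpose so that it may
        # contain arguments, e.g. "jing my.rng".
        if ! $validator "$f" >/dev/null 2>&1; then
            cp "$f" "$sample_dir"/
            added=$((added + 1))
        fi
    done
    echo "$added"
}
```

If the function prints 0, the schema already covers the whole corpus. Otherwise, rerun Oddbyexample on the enlarged sample, regenerate the schema, and repeat.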
Source code and licensing
Oddbyexample was written by Sebastian Rahtz, and it is dual-licensed:
1. Distributed under a Creative Commons Attribution-ShareAlike 3.0 Unported License http://creativecommons.org/licenses/by-sa/3.0/
Support for TEI
Oddbyexample is not yet able to:
- derive simplified content models (beyond what Roma already does)
- add new elements and derive content models for them
- deal with non-TEI namespaces
- generate attribute datatypes with complex regexps not already specified in TEI specifications
- create new Schematron constraints etc
The script assumes you have the TEI package installed, which provides a file called "/usr/share/xml/tei/odd/p5subset.xml". If you don't have that, download http://www.tei-c.org/release/xml/tei/odd/p5subset.xml, put the file somewhere, and add a "defaultSource" parameter pointing at it. (Alternatively, you can check out the TEI source and generate p5subset.xml yourself by running "make p5subset.xml" in the P5 directory.)
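As a rough sketch of the fallback just described, the helper below prints a "defaultSource=..." argument only when the packaged p5subset.xml is missing. The function name default_source_arg and the download location under $HOME/tei are assumptions made for illustration. Only the packaged path, the download URL and the parameter name come from the text above.

```shell
# Hypothetical helper: decide whether a defaultSource parameter is needed.
#   $1  path where the TEI package would install p5subset.xml
#   $2  path to your own downloaded copy (location is an assumption)
# Prints "defaultSource=<your copy>" if the packaged file is absent,
# and prints nothing if the packaged file exists.
default_source_arg() {
    if [ ! -f "$1" ]; then
        echo "defaultSource=$2"
    fi
}

# Example use, commented out because it needs network access and Saxon.
# The unquoted $(...) assumes the path contains no spaces.
# mkdir -p "$HOME/tei"
# curl -o "$HOME/tei/p5subset.xml" http://www.tei-c.org/release/xml/tei/odd/p5subset.xml
# saxon -it:main -o:my.odd oddbyexample.xsl corpus=/wherever/you/have/yourfiles/ \
#     $(default_source_arg /usr/share/xml/tei/odd/p5subset.xml "$HOME/tei/p5subset.xml")
```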
Here's a simple command to run Oddbyexample:
saxon -it:main -o:my.odd oddbyexample.xsl corpus=/wherever/you/have/yourfiles/
This will produce a file called "my.odd" in the same directory as oddbyexample.xsl. The parameter -it:main tells Saxon to start with the template called "main" in the oddbyexample.xsl file, and -o:[filename] provides the output filename for your ODD file.
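If you have no "saxon" wrapper script, or you need to give Java more memory for a large corpus, Saxon can also be started through java directly. The sketch below only prints such a command line so that you can inspect it before running it. The function name saxon_command and the jar location are assumptions that depend on your installation; -Xmx is the standard JVM option for the maximum heap size, and -xsl: is Saxon's option for naming the stylesheet.

```shell
# Hypothetical helper: print a java command line that runs Saxon
# on oddbyexample.xsl with a chosen maximum heap size.
#   $1  heap size passed to the JVM's -Xmx option (e.g. 4g)
#   $2  path to the Saxon jar (depends on your installation)
#   $3  corpus directory to analyze
saxon_command() {
    echo "java -Xmx$1 -jar $2 -it:main -o:my.odd -xsl:oddbyexample.xsl corpus=$3"
}

# Example: show the command for a 4 GB heap.
# saxon_command 4g /path/to/saxon-he.jar /wherever/you/have/yourfiles/
```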
Many more parameters are documented in the oddbyexample.xsl file itself, but here is a short list of the most useful ones:
This parameter is passed to the XPath collection() function, which accepts a regular expression after "select", so in this instance, it will process all files with names which begin with capital A and have a .xml extension.
This is useful if you want to see which files are actually being processed. For instance, if you've passed a regular expression in the corpusList parameter, you can turn on verbose mode to make sure it's actually processing the files you expected.
By default, any global attributes such as @xml:space that are never used in the input files will be removed from the resulting ODD file. If you want to keep those attributes, you can set this parameter.
No formal technical support is provided for Oddbyexample. If you post a question to the TEI-L list, though, other users may respond with help.
Many members of the TEI community use Oddbyexample, but there is no formal community forum or mailing list.
Current version number, date of release and previous versions
Current and previous versions are all available through the TEIC GitHub repository. To see the current version, go to https://github.com/TEIC/Stylesheets/blob/master/tools/oddbyexample.xsl.
How to download or buy
Grab oddbyexample.xsl from GitHub (https://raw.github.com/TEIC/Stylesheets/master/tools/oddbyexample.xsl)