Difference between revisions of "Oddbyexample"

Revision as of 00:19, 12 October 2012

Synopsis

This utility attempts to work out the minimal TEI customization needed to validate a collection of files. It is an XSLT (version 2) stylesheet which traverses a nominated directory tree looking for *.xml files which have <TEI> or <teiCorpus> root elements. It analyzes the collection of elements and attributes in the resulting corpus, and compares that to the whole of TEI P5. An ODD file is generated which:

loads the required modules
deletes any elements which are not used
deletes any attributes (including class attributes) which are not used by each element
for every attribute which has a TEI "data.enumerated" datatype, constructs a closed <valList> enumerating the values actually used.

From this you can construct a target schema.

Features

User commentary

Please sign all comments.

System requirements

Memory capacity is likely to be an issue for large corpora. The stylesheet will not be able to read a giant corpus unless you have a great deal of memory to assign to Java. In that situation, construct a smaller corpus of representative sample documents and work with that. After generating a schema, validate your entire corpus; each time you find an invalid document, add it to the smaller corpus and start again.
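The sample-corpus loop described above can be sketched as a small shell script. This is only an illustration under several assumptions: the directory names, `build_schema` and `grow_sample` are made up for the example; `odd_to_schema` and `validate` are placeholders you would define yourself for whatever ODD processor and validator you use; and the full corpus is assumed to sit in one flat directory.

```shell
#!/bin/sh
# Sketch of the sample-corpus loop. All names here are illustrative.
SAMPLE="${SAMPLE:-sample}"   # small corpus of representative documents
FULL="${FULL:-corpus}"       # the entire corpus (assumed flat, no subdirectories)
SCHEMA="${SCHEMA:-my.rng}"   # schema compiled from my.odd

build_schema() {
  # Generate the ODD from the sample corpus, then compile it to a schema.
  # odd_to_schema is a placeholder: define it for your own ODD processor.
  saxon -o my.odd oddbyexample.xsl oddbyexample.xsl corpus="$SAMPLE/" &&
    odd_to_schema my.odd "$SCHEMA"
}

grow_sample() {
  # Copy the first invalid document into the sample corpus.
  # Returns 1 if a document was added, 0 if the whole corpus validated.
  # validate is a placeholder: define it to call your validator.
  for f in "$FULL"/*.xml; do
    [ -e "$f" ] || continue
    if ! validate "$SCHEMA" "$f" >/dev/null 2>&1; then
      cp "$f" "$SAMPLE/"
      return 1
    fi
  done
  return 0
}

# Usage: repeat until the whole corpus validates.
#   build_schema; until grow_sample; do build_schema; done
```

Adding one document per pass keeps the sample as small as possible, at the cost of more passes; copying every invalid document in one pass would be the obvious variation.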

Source code and licensing

open source

Support for TEI

Limitations: the utility does not attempt any of the following:

deriving simplified content models (beyond what Roma already does)
adding new elements and deriving a content model
dealing with non-TEI namespaces
generating attribute datatypes with complex regexps
working out Schematron constraints etc

Language(s)

XSLT

Documentation

The script assumes you have the TEI package installed, which provides a file called "/usr/share/xml/tei/odd/p5subset.xml". If you don't have that, download http://www.tei-c.org/release/xml/tei/odd/p5subset.xml, save the file somewhere, and add a "tei" parameter to point at it.

Here's a sample command to run it:

saxon -o my.odd oddbyexample.xsl oddbyexample.xsl corpus=/wherever/you/have/yourfiles/
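If the TEI package is not installed, the download step and the extra "tei" parameter might be combined as below. This is a sketch, not a tested recipe: the function names and the `$HOME/tei` location are invented for the example, and it assumes the "tei" parameter accepts a plain file path and that `curl` is available. The stylesheet name is given twice, as in the sample command, where it appears to serve as its own source document.

```shell
#!/bin/sh
# Hypothetical wrapper; TEI_FILE, fetch_p5subset and run_oddbyexample
# are illustrative names, not part of oddbyexample itself.
TEI_FILE="${TEI_FILE:-$HOME/tei/p5subset.xml}"
TEI_URL="http://www.tei-c.org/release/xml/tei/odd/p5subset.xml"

fetch_p5subset() {
  # Download p5subset.xml only if no local copy exists yet.
  [ -f "$TEI_FILE" ] && return 0
  mkdir -p "$(dirname "$TEI_FILE")"
  curl -fsSL -o "$TEI_FILE" "$TEI_URL"
}

run_oddbyexample() {
  # $1 = corpus directory, $2 = output ODD file
  saxon -o "$2" oddbyexample.xsl oddbyexample.xsl corpus="$1" tei="$TEI_FILE"
}

# Usage:
#   fetch_p5subset && run_oddbyexample /wherever/you/have/yourfiles/ my.odd
```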

Tech support

User community

Sample implementations

Current version number and date of release

History of versions

How to download or buy

Grab getfiles.xsl and oddbyexample.xsl from Sourceforge (http://tei.svn.sourceforge.net/viewvc/tei/trunk/Stylesheets/tools/oddbyexample.xsl)

Additional notes

