Best Practices for TEI in Libraries



Introduction
These best practices are for libraries using the Text Encoding Initiative’s Guidelines for Text Encoding and Interchange (P5). They are intended for use in large, library-based digitization projects, but may be useful in other scenarios as well. Consult the full TEI Guidelines for guidance beyond what is described below.

There are many different library text digitization projects, for different purposes. With this in mind, these best practices are meant to be as inclusive as possible by specifying a series of encoding levels. These levels are meant to allow for a range of practice, from wholly automated text creation and encoding, to encoding that requires expert content knowledge, analysis, and editing. The encoding levels are not strictly cumulative: while higher levels tend to build upon lower levels by including more elements, higher levels are not supersets because some elements used at lower levels are not used at all at higher levels.

In brief, the encoding levels are:

These best practices specify a recommended archival storage format. Local system needs may require transformation of documents in this archival format to another XML format for use by local indexing or delivery software.

In these best practices, use of elements and attributes tends toward explicitness for ease of processing even though a human or possibly machine reader might be able to make inferences based on context.

Relationship to TEI Tite
These best practices are meant to complement the TEI Tite customization of the TEI Guidelines. Whereas TEI Tite is meant for vendors who need exact specifications for encoding without room for interpretation or local practice, these best practices document how a library or other large-scale encoding project might create TEI documents that conform as closely as possible both to common TEI practice and to library standards yet still leave room for local approaches.

If a library uses TEI Tite for outsourced encoding, it should find that converting files from the TEI Tite format to a format conforming to these best practices is not difficult. Tite files may be converted to Level 3 with some loss of granularity and to Level 4 with the addition of some markup, which still amounts to minimal human intervention. The reason Level 3 does not contain as many elements as TEI Tite is to allow for use of this level, whether for mass digitization of born-digital source documents or for upgrading Level 1 or Level 2 texts, with only minimal human intervention.

For a comparison of the TEI Tite schema to these Best Practices, see TEI Tite's Appendix A.

Standards and local practice
The goal of the TEI is interchange, not interoperability. While seamless interoperability of texts created by different organizations is an unobtainable goal, use of a common markup vocabulary and syntax greatly aids interchange. Nevertheless, keep in mind that others, even within your organization, may later use your texts for purposes other than those you intended when encoding them.

An encoding project should strive for internal consistency and for use of standards so that the data can be modified or enhanced in the future with ease. In cases where local practice deviates from standards, there should at least be internal consistency in the local practice.

Transcription
When reformatting to digital media using any level of encoding, the electronic text should begin with the transcription of the first word on the first leaf of the original work. It may be impractical or undesirable to transcribe and encode certain features of the text, such as publisher’s advertisements or indexes, but if at all possible, they should be included as links to page images. Any omissions of material found in the original work should be noted in the &lt;editorialDecl&gt; in the TEI header.

Hyphenation
Encoding end-of-line, end-of-column, and end-of-page hyphenation varies considerably in the TEI community. Some capture all hyphens found on the printed page, while others remove those in the middle of words not normally hyphenated for easier implementation of full-text retrieval. If preserving hyphens, some will capture all hyphens using the same character, while others will distinguish hyphens that must be present in any case (often called "hard hyphens") and those that are only present by virtue of being at the end of a line, column, or page (often called "soft hyphens").

This issue is complicated by the fact that Unicode prescribes use of a soft hyphen not for a visible hyphen that might have been absent but instead for a place where a hyphen might occur. Furthermore, it includes a "non-breaking hyphen", used in cases like "re-creation" (meaning "to create again", as opposed to recreation, meaning "relaxation"), in addition to a regular hyphen, which would normally count as a word boundary. In short, Unicode is oriented toward electronic text that may be processed with a computer in various ways, not toward capturing source documents.

Since OCR software relies on dictionaries to determine the probability not simply of characters but of whole words, it is often able to handle hyphenation in different ways.

At Levels 1 and 2, no attempt should be made to remove hyphens from the source document or disambiguate hard and soft hyphens. Encode all hyphens appearing in the source document using character U+002D.

At Level 3, either encode hyphens as in Levels 1 and 2 or disambiguate soft and hard hyphens. In the latter case, soft hyphens should be encoded using U+00AD and hard hyphens using U+2010.

At Levels 4 and 5, hyphens must be disambiguated using U+00AD for soft hyphens and U+2010 for hard hyphens.

In any case, the hyphenation practice must be recorded in the <tt>editorialDecl</tt> element in the TEI header.

Do not confuse the following characters with hyphens:


 * en dash (U+2013)
 * em dash (U+2014)
 * minus sign (U+2212)
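As a minimal sketch of the Level 3 to 5 convention above, the fragment below writes both characters as numeric character references so that they can be told apart in the markup. The first line borrows a line-end hyphen from the Level 1 sample later in this document; the second phrase is an invented illustration. Whether to store character references or the literal characters is a local decision.

```xml
<!-- Illustrative fragment only. -->
<!-- &#xAD; is U+00AD (soft hyphen): present only because the word broke at a line end. -->
<!-- &#x2010; is U+2010 (hard hyphen): part of the word wherever it falls on the line. -->
<p>... with the release of some of the govern&#xAD;ment documents ...</p>
<p>... a so&#x2010;called mail cover ...</p>
```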

Filenames
A filename scheme that is internally consistent should be established for the project.

If it is likely that the files will need to be used on more primitive devices (MS-DOS computers or unextended ISO 9660 CDs) it may be useful to limit names to 8 characters (limited to the 26 lower case letters of ASCII, digits, hyphens, and underscore), a dot, and an extension of 3 alphanumeric characters. Likewise, if you will access files using a version of Apple Filing Protocol (AFP) before 3.0, filenames longer than 31 bytes are likely to be corrupted, so you may wish to limit filenames to 31 single-byte (e.g., ASCII) characters.

Otherwise, consider the following best practices when determining the file name scheme for your project:


 * Each filename should contain an identifier that uniquely specifies a single digital object within the parent collection (e.g., a parent collection of text, images and other related materials)
 * Each filename should be fully specified. It should not just be a sequence number that is dependent on location within a directory structure for context
 * Filenames should not include spaces
 * Filenames should follow predictable case constructions (e.g., all lowercase, camelCase, etc.)
 * The first character of the filename should be an ASCII letter ('a' through 'z' or 'A' through 'Z') to comply with current restrictions on identifiers by many programming and XML-based metadata languages
 * The "base" filename may include only ASCII letters ('a' through 'z' and 'A' through 'Z'), ASCII digits ('0' through '9'), hyphens, underscores, and periods. Refrain from using other characters and limit period usage to only once (to separate base name from file extensions).
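One way to make the last two rules checkable is a pattern facet in an XML Schema. The sketch below is an assumption about how a project might do this, not part of these best practices; the type name <tt>projectFilename</tt> is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: first character an ASCII letter, then ASCII letters,
       digits, hyphens, or underscores, then exactly one period and an
       alphanumeric extension. Spaces fail because they are not listed. -->
  <xs:simpleType name="projectFilename">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Za-z][A-Za-z0-9_\-]*\.[A-Za-z0-9]+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this pattern a name such as <tt>lincoln_seward-001.xml</tt> would be accepted, while <tt>001.xml</tt> (leading digit) and <tt>lincoln seward.xml</tt> (space) would be rejected.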

URIs
A number of attributes take a URI (Uniform Resource Identifier) as their value. Note that in addition to the full form of reference defined by URI syntax, these attributes can take a relative reference (e.g., filename.ext) or a fragment identifier (e.g., #foo).
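The three forms can be sketched as follows. The element choices and content are illustrative only; the attribute values reuse file names and identifiers that appear elsewhere in this document.

```xml
<!-- Full URI -->
<ref target="http://www.tei-c.org/">TEI Consortium</ref>

<!-- Relative reference, resolved against the base URI of the TEI document -->
<pb n="113" facs="00000001.tif"/>

<!-- Fragment identifier, pointing at xml:id="tgn_7012924" in the same document -->
<placeName ref="#tgn_7012924">Indianapolis</placeName>
```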

Text divisions
An encoding project should use only numbered divisions (i.e., <tt>&lt;div1&gt;</tt>, <tt>&lt;div2&gt;</tt>, etc.) or unnumbered divisions (i.e., <tt>&lt;div&gt;</tt>) but not both. This applies both within a TEI document (i.e., within <tt>&lt;front&gt;</tt>, <tt>&lt;body&gt;</tt>, <tt>&lt;back&gt;</tt>, even if nested within <tt>&lt;group&gt;</tt> or <tt>&lt;floatingText&gt;</tt>) and across TEI documents in any given collection. Keep in mind that numbering of <tt>&lt;div&gt;</tt>s starts over (at <tt>&lt;div1&gt;</tt>) within <tt>&lt;floatingText&gt;</tt>, so any software that expects to process nested numbered divisions within a document will need to account for this.

The choice of numbered or unnumbered divisions must be documented with the <tt>tagUsage</tt> element in the header. See 4.6, Element Recommendations for the TEI Header, below.

Whether numbered or unnumbered divisions are used, the <tt>type</tt> attribute of the division element is not recommended at Level 1 (because only one encoded division in the text exists), is optional at Level 2 (because the division-level metadata need not classify these divisions), is recommended at Level 3 (for broad yet useful analysis of text divisions), and required at Levels 4 and 5 (for full analysis of the text structure).
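The two styles can be sketched side by side. The <tt>type</tt> values and headings below are hypothetical; a project would pick one style and use it throughout.

```xml
<!-- Unnumbered divisions -->
<body>
  <div type="chapter" n="1">
    <head>Chapter 1</head>
    <div type="section">
      <p>...</p>
    </div>
  </div>
</body>

<!-- The same structure with numbered divisions -->
<body>
  <div1 type="chapter" n="1">
    <head>Chapter 1</head>
    <div2 type="section">
      <p>...</p>
    </div2>
  </div1>
</body>
```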

Page breaks
Page breaks should be encoded using the <tt>pb</tt> element, with the value of the <tt>n</tt> attribute denoting the number of the page whose text follows this element. The <tt>pb</tt> element should always be contained within a text division for ease of retrieval with indexing software. For example, a page break that occurs between chapters 2 and 3 should be encoded soon after the <tt>&lt;div&gt;</tt> that opens chapter 3 (rather than before the <tt>&lt;/div&gt;</tt> that ends chapter 2).
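A minimal sketch of the chapter-boundary case described above, assuming unnumbered divisions and an invented page number:

```xml
<div type="chapter" n="2">
  <head>Chapter 2</head>
  <p>... last paragraph of chapter 2.</p>
  <!-- No pb here, even though the page turns at this point in the source. -->
</div>
<div type="chapter" n="3">
  <!-- The page break is recorded inside the division whose text it precedes. -->
  <pb n="45"/>
  <head>Chapter 3</head>
  <p>...</p>
</div>
```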

Linking between encoded text and images of source documents
There are three recommended mechanisms for linking between the encoded text and facsimile page images of source documents. Projects may either:

<ol style="list-style-type: upper-alpha;"> <li>Use the <tt>facs</tt> attribute on each <tt>pb</tt> element to point to the corresponding page image using a URI.</li> <li>Use the <tt>facsimile</tt> element to define a set of images that corresponds to the text in conjunction with the <tt>facs</tt> attribute on each <tt>pb</tt> element to point to the corresponding page image using a URI.</li> <li>Use the <tt>xml:id</tt> attribute on each <tt>pb</tt> element and a METS document to provide correspondence between <tt>pb</tt> elements and one or more facsimile page images (e.g., master, web derivatives, etc.).</li> </ol>
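Mechanisms A and B can be sketched as follows. The image file name is taken from the Level 1 sample later in this document; the <tt>xml:id</tt> value <tt>img113</tt> is hypothetical, and wrapping each <tt>graphic</tt> in a <tt>surface</tt> is one possible arrangement, not a requirement stated here.

```xml
<!-- A: facs points directly at the page image -->
<pb n="113" facs="00000001.tif"/>

<!-- B: facsimile declares the images once; facs points at the declaration -->
<facsimile>
  <surface xml:id="img113">
    <graphic url="00000001.tif"/>
  </surface>
</facsimile>
<!-- ... later, inside the text ... -->
<pb n="113" facs="#img113"/>
```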

For those projects relying on the Metadata Encoding and Transmission Standard (METS), the <tt>xml:id</tt> attribute is used as a conceptual identifier for content as opposed to an explicit pointer (i.e., <tt>facs</tt> attribute) to a specific representation of that content. These identifiers are then used to generate a METS document that bundles the various content types (e.g., master image files, derivative image files for Web delivery, PDFs, etc.), explicitly lists all versions of the content, and defines the relationships between the constituent parts. This is achieved through the use of the <tt>&lt;mets:fileSec&gt;</tt> and <tt>&lt;mets:structMap&gt;</tt> sections of the METS document (see sample METS document for a TEI project).
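As a rough sketch of mechanism C, and not a reproduction of the sample METS document mentioned above, the fragment below assumes a TEI file named <tt>someid.xml</tt> containing <tt>&lt;pb xml:id="page113" n="113"/&gt;</tt>. All identifiers, file names, and <tt>USE</tt> values are hypothetical.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <!-- The encoded text -->
    <mets:fileGrp USE="text">
      <mets:file ID="tei_doc" MIMETYPE="application/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="someid.xml"/>
      </mets:file>
    </mets:fileGrp>
    <!-- One representation of the page image; derivatives would get their own fileGrp -->
    <mets:fileGrp USE="master">
      <mets:file ID="img_master_0001" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="00000001.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="page" ORDERLABEL="113">
      <!-- Points into the TEI file at the element whose xml:id is page113 -->
      <mets:fptr>
        <mets:area FILEID="tei_doc" BETYPE="IDREF" BEGIN="page113"/>
      </mets:fptr>
      <!-- Points at the master image for the same page -->
      <mets:fptr FILEID="img_master_0001"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
```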

General Guidelines for Attribute Usage
Some general advice on the use of particular attributes follows. (All of these attributes are commonly used on various elements, but not every element requires or even allows these attributes.)

type
Constructing a list of acceptable attribute values for the <tt>type</tt> attribute for each element, on which everyone could agree, is impossible. Instead, it is recommended that projects describe the <tt>type</tt> attribute values used in their texts in the project ODD file and that this list be made available to people using the texts. It is worth noting that, at present, Roma, the web front-end editor for ODD files, does not have a mechanism for providing this documentation — it should be added to the ODD file directly. For a list of standard names and definitions of bibliographic features of printed books, see ABC for Book Collectors by John Carter (8th edition, New Castle, Del. and London: Oak Knoll Books and the British Library, 2004, available online at http://www.ilab.org/images/abcforbookcollectors.pdf).

n
This attribute is sometimes used to number elements for machine processing, but it often includes data represented in the source document, such as page numbers or footnote numbers. Example: <tt>&lt;pb n="456"/&gt;</tt>

key and ref
These attributes are both available on a variety of elements including <tt>&lt;name></tt>, <tt>&lt;persName></tt>, <tt>&lt;author></tt>, and <tt>&lt;title></tt>. They are used to reference external metadata about the content of the element. The <tt>key</tt> attribute may contain any string of Unicode characters, whereas the <tt>ref</tt> attribute must contain a URI (including a relative one, as discussed above). While <tt>key</tt> may supply any identifier, there is no mechanism internal to XML for checking that the value of this attribute is valid.

For example, <tt>&lt;name key="lccn-n78-95332"&gt;</tt> gives a project-specific key (in this case <tt>lccn-n78-95332</tt>) for this name in the Library of Congress Name Authority File. Values of <tt>key</tt> attributes may be partially explained in a non-machine-readable way through the use of a <tt>taxonomy</tt> element.
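A minimal sketch of such a taxonomy, placed in the header's <tt>classDecl</tt>. The <tt>xml:id</tt> value and the wording of the explanation are assumptions made for illustration.

```xml
<classDecl>
  <taxonomy xml:id="lcnaf">
    <!-- Prose explanation for human readers; nothing here is machine-checked. -->
    <bibl>Library of Congress Name Authority File. Values of key attributes
      consist of "lccn-" followed by the control number of the authority
      record.</bibl>
  </taxonomy>
</classDecl>
```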

Alternatively, use <tt>ref</tt> with a URI fragment identifier, corresponding to the value of <tt>xml:id</tt> given elsewhere. For example, in the transcription of the text, use an element such as <tt>&lt;placeName ref="#tgn_7012924"&gt;</tt>, which would be defined in a controlled vocabulary elsewhere:

<listPlace> <place xml:id="tgn_7012924"> <placeName> Indianapolis Indiana </placeName> </place> </listPlace>

Readily available software can then check when it encounters <tt>ref="#tgn_7012924"</tt> that <tt>xml:id="tgn_7012924"</tt> exists elsewhere in the document.

In general we recommend using <tt>ref</tt> when the metadata object being referenced is accessible via a URI (e.g., is on the web), and <tt>key</tt> when it is not. We recommend against using both attributes on the same instance of an element.

rend and rendition
The <tt>rend</tt> and <tt>rendition</tt> attributes may be used when it is desirable to record information about how the textual feature was displayed in the source document.

Never use these attributes on header elements: metadata is transcribed and possibly regularized, as in a catalog record, but its exact appearance is not meant to be captured.

If a project is normalizing the rendering of text objects (for example, such that all titles should be italicized, regardless of how they appeared in the source document), there is no need to use these attributes; instead, a stylesheet will determine that all titles are displayed in italics.

However, if a project is faithfully recording the rendering in the source document, one of these attributes should be used to indicate this rendering, either on all elements to be rendered differently from the surrounding text or on all elements whose rendering does not follow the default stylesheet.

For the value of the <tt>rend</tt> attribute, use only valid CSS properties and values. For example:


 * <tt>&lt;foreign rend="font-style: italic"&gt;</tt>
 * <tt>&lt;title rend="text-decoration: underline; font-size: x-large"&gt;</tt>

Alternatively, use the <tt>rendition</tt> attribute to give an internal scheme:

<tt>&lt;foreign rendition="#i"&gt;</tt>

documented with the <tt>rendition</tt> element in the header:

<tt>&lt;rendition xml:id="i" scheme="css"&gt;font-style: italic&lt;/rendition&gt;</tt>

Use of the <tt>rendition</tt> attribute and element offers an additional level of indirection, decreasing the total number of keystrokes and possibly reducing the chance of typos being introduced in the encoding.

xml:lang
This attribute indicates the natural language of the content of an element. It is generally not used for children of the <tt>text</tt> element at Level 1 or Level 2 but is common at Level 3 and above. See the data.language datatype in the TEI Guidelines.
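A short sketch of the Level 3 and above usage, with an invented sentence: the language of the document is declared on <tt>text</tt>, and only passages in another language carry their own <tt>xml:lang</tt>.

```xml
<text xml:lang="en">
  <body>
    <div type="chapter">
      <!-- Inherits "en" from the text element -->
      <p>The committee was formed
        <foreign xml:lang="la">ad hoc</foreign>
        and met only twice.</p>
    </div>
  </body>
</text>
```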

Structure of a TEI Document
The child elements of the <tt>teiHeader</tt> and <tt>text</tt> elements are described below.

Reference

 * Chapter 2, TEI Header, P5 Guidelines

The TEI Header
The TEI header is a metadata record for an encoded text. It includes bibliographic information related to the electronic document and, if appropriate, the bibliographic data for the original analog source document from which the electronic edition was created. The TEI header often includes a description of the encoding decisions or practices used to create the electronic document. While TEI Lite calls the header "the electronic title page", it actually more closely resembles a catalog record with additional data not routinely stored in MARC records.

As with any descriptive metadata, the metadata in the TEI header can serve multiple audiences. In the local context, a TEI header provides metadata about the TEI document, its source, and its provenance. The TEI header may be used for metadata exchange, to automatically create indexes (author lists, title lists) for a collection of TEI documents, and to aid in browsing heterogeneous TEI documents. TEI headers may also be used as a basis for other metadata records (such as MARC or Dublin Core), though generation of other formats may require human intervention because they often are more granular, or have different granularity, than TEI headers.

The TEI Header and MARC
While a TEI header is often perceived as similar to or at least related to a MARC record, a TEI header does not typically have a one-to-one correspondence with a MARC record. One TEI header may be described by multiple MARC analytic records, or one MARC record may be used to describe a collection of TEI documents with individual headers. Furthermore, while a MARC record captures metadata about a bibliographic entity in a library’s collection, a TEI header records information both about an encoded text and about the source document for that encoded text.

Each institution and even each project may have a different approach to the way electronic texts are created in TEI and then represented in a larger public catalog through MARC. At one institution, the same unit (e.g., a cataloging department) may be responsible for creating both TEI Headers and MARC records, while at other institutions the work may be distributed among different units. Within the library domain, metadata or cataloging experts are usually required for at least review and standardization of both the TEI header and the MARC record.

In order to allow automatic generation of TEI headers from MARC records and MARC records from TEI headers, some elements (like <tt>&lt;author&gt;</tt>) contain content not typical for TEI practice but necessary due to a lack of granularity in the MARC format.

The TEI Header and Other Metadata Schemas
Several other descriptive metadata schemas are prevalent within the library domain, including Dublin Core (DC), Dublin Core Qualified (DCQ), and the Metadata Object Description Schema (MODS). Each of these schemas contains elements that capture the same data as many of the elements in the TEI header. As with MARC, a variety of automated or manual workflows can be implemented to crosswalk metadata from one standard to another and provide for increased sharing of metadata about electronic texts in larger contexts. In particular, DC and MODS are common schemas used by the Open Archives Initiative (OAI) and may be particularly valuable for sharing metadata across institutions.

Unfortunately, there is currently no mechanism for specifying that the content of an element should be drawn from an outside metadata source or that this outside metadata source should supplement the content of the element. In the absence of such mechanisms, users of these best practices may use the <tt>idno</tt> element to supply identifiers for outside metadata records and may supply identifiers for certain authority records using the <tt>key</tt> or <tt>ref</tt> attributes, allowed on certain elements.

Determining Data Values for the TEI Header
Within the library domain, there are several authoritative publications on how to create bibliographic and descriptive metadata for objects. These are usually called “content standards”; two prominent examples are the Anglo-American Cataloging Rules Second Edition (AACR2) and the International Standard Bibliographic Description for Electronic Resources (ISBD(ER)). These standards are extensive and outline a set of rules that enforce consistency across a voluminous amount of metadata.

It is recommended that metadata about the source document included in the header be taken from the catalog record for the source document. However, there may be cases when this information is incomplete or insufficient. Furthermore, creation of other TEI header elements may require more context than is available simply from the encoded text. But the analog object may not be available, so the TEI header creator will need access to digitized images or other verifiable information to create accurate metadata.

The following sources of information are recommended in creating the TEI header:

<ol> <li> For an electronic document with a digitized title page and title page verso, the chief source of information is the information coded as the title page and title page verso. Use other sources of information from a physical source document if absolutely certain that it is the source.</li> <li>If there is no digitized title page but the header creator knows the physical source document from which it was derived, the header creator should refer to that source document for metadata creation. Note that a lack of a title page may be for one of many reasons: for example, the original document is a manuscript item, or the electronic edition is a portion of the original object (a poem or short story that was published in a collection or an article from a serial). In all cases, it is recommended that important bibliographic evidence, such as a digitized image of the title page and title page verso for a collection, be provided to the header creator, even if just a piece of the collection is used.</li>

<li>If no title page is present and there is no evidence from a source document, the header creator may assign a title and author, if appropriate, enclosing the information in square brackets (the standard English-language convention for editorial interjections).</li> </ol>

Element Recommendations for the TEI Header
Gray boxes in the source document column indicate that while the corresponding TEI element describes the TEI document, the value of this field is often derived from metadata about the source document, to be found in the MARC fields listed.


 * 100
 * 110
 * 111


 * 700
 * 710
 * 711

│  │   ├   <tt>&lt;title level="_" type="_"&gt;</tt> At least one <tt>title</tt> element is required for the title of the source document. Transcribe the title according to the national cataloging code.

The <tt>level</tt> attribute is used as in the main TEI Guidelines.

Use of the <tt>type</tt> attribute is required. It may have any of the following values as suitable in local practice:
 * <tt>main</tt>
 * <tt>sub</tt>
 * <tt>alt</tt>
 * <tt>short</tt>
 * <tt>desc</tt>
 * <tt>translated</tt>
 * <tt>marc245a</tt> (used for the title proper and alternative title according to the national cataloging code)
 * <tt>marc245b</tt> (used for the remainder of the title information -- parallel titles, titles subsequent to the first, and other title information -- according to the national cataloging code)
 * <tt>marc245c</tt> (used for the statement of responsibility according to the national cataloging code)
 * <tt>uniform</tt> (used for a uniform title according to the national cataloging code)
 * 130
 * 240
 * 245 $a,$b
 * 246

│  │   ├   <tt>&lt;respStmt&gt;</tt> Statement of responsibility on the source document, according to the national cataloging code. Record one responsibility or party per <tt>&lt;respStmt&gt;</tt>. Each <tt>&lt;respStmt&gt;</tt> must contain either: <ul> <li>one <tt>&lt;resp></tt> element followed by one or more <tt>&lt;name></tt> (or <tt>&lt;persName></tt> or <tt>&lt;orgName></tt>) elements</li> <li>one or more <tt>&lt;name></tt> (or <tt>&lt;persName></tt> or <tt>&lt;orgName></tt>) elements followed by one <tt>&lt;resp></tt> element</li> </ul> Whenever possible, establish or use the form of the name from a national name authority file.

If generating the <tt>&lt;sourceDesc&gt;</tt> from a MARC record, it will be difficult to split the content of the 245c field into <tt>resp</tt> and <tt>name</tt> elements, so it is recommended to use <tt>&lt;title type="marc245c"&gt;</tt> instead of this element. 245 $c 245 $c

│  │   ├   <tt>&lt;edition&gt;</tt> Edition statement (if present). 250

│  │   ├   <tt>&lt;imprint&gt;</tt> n/a n/a

│  │  │   ├   <tt>&lt;pubPlace&gt;</tt> Place of publication from the original source (if present). It is recommended but not required to remove ISBD punctuation for separating areas of the bibliographic description (such as a colon) when deriving from a MARC record. However, leave brackets that indicate supplied information or an abbreviation like "S.l." (for no place of publication). 260 $a

│  │  │   ├   <tt>&lt;publisher&gt;</tt> Name of publisher, distributor, etc. from the original source (if present). It is recommended but not required to remove ISBD punctuation for separating areas of the bibliographic description (such as a comma) when deriving from a MARC record. However, leave brackets that indicate supplied information or an abbreviation like "s.n." (for no publisher). 260 $b

│  │  │   └   <tt>&lt;date when="____"&gt;</tt> Date of publication, distribution, etc. from the original source (if present). The content of the element is the statement of this data according to the national cataloging code. Use the <tt>when</tt> attribute (see att.datable.w3c class) to aid machine processing. 260 $c

│  │   └   <tt>&lt;extent&gt;</tt> Use of this element to describe the extent of the source document is recommended. If the data is generated by hand, it should include a comprehensible statement of the size of the item, such as the number of pages or leaves. If generated from a catalog record, there should be two <tt>&lt;extent&gt;</tt> elements: one for the extent of the item (e.g., number of pages) and other physical details, and a second one for the dimension(s). Both should be recorded according to a national cataloging code. 300

│   ├   <tt>&lt;series&gt;</tt> Name of the series to which the source document belongs. If generating this data from a catalog record, it is likely that you will have only one child element: a <tt>title</tt>.
 * 4xx
 * 8xx

│   ├   <tt>&lt;note&gt;</tt> Optionally, use for notes about the source document, according to a national cataloging code. 5xx

│  └  <tt>&lt;idno&gt;</tt> Optionally use one or more <tt>idno</tt> elements to give identification numbers for the source document, text, or work, whether assigned by the holding library (such as a call number), the publisher of the original document (such as an ISBN), or a standard bibliography (such as an identifier from the Short Title Catalogue or Books in Maori). Use the following values for the <tt>type</tt> attribute if applicable, and create other values if appropriate:


 * <tt>LC_call_number</tt>
 * <tt>isbn-13</tt>
 * <tt>isbn-10</tt>
 * 015
 * 016
 * 020
 * 024
 * 025
 * 027
 * 028
 * 029
 * 035
 * 050-099

├  <tt>&lt;encodingDesc&gt;</tt> n/a n/a

│  ├  <tt>&lt;projectDesc&gt;&lt;p&gt;</tt> Enter a description of the purpose for which the electronic file was encoded. 500 n/a

│   ├   <tt>&lt;editorialDecl n="_"&gt;</tt> Use the <tt>n</tt> attribute to record the encoding level: <tt>1</tt> for Level 1, <tt>2</tt> for Level 2, etc.

Include one or more <tt>p</tt> elements as children with information on:


 * editorial decisions made during encoding
 * notes about omissions of material found in the original work
 * the format of the data in the header: Does the data in the <tt>&lt;sourceDesc&gt;</tt> follow AACR rules? How about in the <tt>&lt;fileDesc&gt;</tt>? Is ISBD punctuation included?
 * automated processes used to generate the markup or content
 * external files or databases (such as those containing authority data) referenced in the TEI document

Also include one of the following <tt>p</tt> elements as appropriate:

<tt>&lt;p&gt;All hyphens in source document encoded as U+002D.&lt;/p&gt;</tt>

<tt>&lt;p&gt;Soft hyphens encoded as U+00AD; hard hyphens as U+2010.&lt;/p&gt;</tt> n/a
 * 500 for content of p element
 * 856 $z, which includes boilerplate text depending on encoding level and how the TEI document is presented to the user (as page images, text, or both)

│   ├   <tt>&lt;tagsDecl&gt;</tt> n/a n/a

│   │    ├   <tt>&lt;rendition xml:id="_" scheme="css"&gt;</tt> Include one or more <tt>rendition</tt> elements for each unique value of a <tt>rendition</tt> attribute (not <tt>rend</tt> attribute) used in the body of the TEI document. n/a n/a

│   │    └   <tt>&lt;namespace name="http://www.tei-c.org/ns/1.0"&gt;&lt;tagUsage&gt;</tt> <tt>&lt;tagUsage&gt;</tt> must be one of the following: n/a n/a

│  └  <tt>&lt;classDecl&gt;&lt;taxonomy xml:id="____"&gt;&lt;bibl&gt;</tt> Use to document classification schemes used in the header or body of the TEI document. MARC fields (the same in the TEI document and source document columns): 050-099 for call number classification schemes; 6xx 2nd indicator or 6xx $2 when 2nd indicator = 7 for subject classification schemes.

├  <tt>&lt;profileDesc&gt;</tt> n/a n/a

│  ├   <tt>&lt;langUsage&gt;</tt> Optionally use this element and child <tt>language</tt> elements to list languages used in the text. This supplements the <tt>xml:lang="___"</tt> attribute on the <tt>text</tt> (which is outside the header) in cases where more than one language is used in the text. It is not expected that the <tt>langUsage</tt> element will contain any description of language usage. 008/35-37 n/a

│ │   └   <tt>&lt;language ident="___"&gt;</tt> Use one or more <tt>language</tt> elements to indicate language(s) used in the source document. The <tt>ident</tt> attribute is usually sufficient to indicate the language, so this element should normally have no content. In the unusual case where <tt>ident</tt> is insufficient, provide additional information on the language as content of the element.
 * 041
 * 546
 * 041
 * 546

│ └  <tt>&lt;textClass&gt;</tt> n/a n/a

│   ├   <tt>&lt;classCode scheme="___"&gt;</tt> True classification numbers as opposed to call numbers may be entered here. The value of the scheme attribute corresponds to a classification scheme defined previously in <tt>&lt;classDecl&gt;</tt>. 050-099 050-099

│  └  <tt>&lt;keywords scheme="____"&gt;</tt> Repeat this element as many times as there are keyword schemes. If the child <tt>term</tt> elements contain terms from a controlled vocabulary, indicate that controlled vocabulary through the scheme attribute. The value of the scheme attribute corresponds to a classification scheme defined previously in <tt>&lt;classDecl&gt;</tt>. 6xx 2nd indicator or 6xx $2 when 2nd indicator = 7 6xx 2nd indicator or 6xx $2 when 2nd indicator = 7

│  └  <tt>&lt;term&gt;</tt> Use for terms from controlled or uncontrolled vocabularies, as indicated in the parent <tt>keywords</tt> element. 6xx 6xx

└ <tt>&lt;revisionDesc&gt;</tt> n/a n/a

└ <tt>&lt;change when="YYYY-MM-DD" who="URI"&gt;</tt> Create a <tt>change</tt> element to record each significant change to the TEI document, in reverse chronological order (i.e., most recent first). A prose description of the change is recorded as the content of each <tt>change</tt> element. This prose may contain lists for organization, and phrase-level markup (like <tt>&lt;gi&gt;</tt>, <tt>&lt;ptr&gt;</tt>, or <tt>&lt;date&gt;</tt>), but not paragraphs.

The date of the change should be recorded using the <tt>when</tt> attribute (see the att.datable.w3c class).

The person who is responsible for making the change is indicated by the <tt>who</tt> attribute of <tt>&lt;change&gt;</tt>. Its value is a URI that points to a <tt>&lt;respStmt&gt;</tt> or <tt>&lt;person&gt;</tt> element that encodes information about the responsible party. Note that this reference is a URI reference and not an ID/IDREF reference, and thus is not checked by validation software. Small projects sometimes take advantage of this by putting information into the URI itself, and not having a <tt>respStmt</tt> or <tt>person</tt> element. For example, the document might simply give <tt>who="#Jane_Smith"</tt>, relying on human readers to understand this reference. n/a n/a

* Use only if TEI header metadata is based on the source document, not the encoded text.
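As an illustration of the last rows of the table, the fragment below sketches how <tt>langUsage</tt>, <tt>textClass</tt>, and <tt>revisionDesc</tt> might look together in a header. It is a sketch only: the scheme pointers (<tt>#lcc</tt>, <tt>#lcsh</tt>), the bracketed placeholders, the dates, the change descriptions, and the <tt>#Jane_Smith</tt> reference are assumptions made for the example, not values prescribed by these best practices.

```xml
<!-- Hypothetical header fragment; both elements are children of <teiHeader>. -->
<profileDesc>
  <langUsage>
    <!-- ident is normally sufficient, so the language elements are empty -->
    <language ident="en"/>
    <language ident="fr"/>
  </langUsage>
  <textClass>
    <!-- #lcc and #lcsh are assumed to be declared earlier in <classDecl> -->
    <classCode scheme="#lcc">[classification number]</classCode>
    <keywords scheme="#lcsh">
      <term>[subject heading]</term>
      <term>[subject heading]</term>
    </keywords>
  </textClass>
</profileDesc>
<revisionDesc>
  <!-- reverse chronological order: most recent change first -->
  <change when="2011-03-15" who="#Jane_Smith">Corrected <gi>pb</gi> numbering.</change>
  <change when="2011-02-01" who="#Jane_Smith">Created file from OCR output.</change>
</revisionDesc>
```

Because <tt>who</tt> is a URI reference rather than an ID/IDREF, a validator will not complain if nothing in the file carries <tt>xml:id="Jane_Smith"</tt>.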

Sample TEI Header
<teiHeader xml:lang="en"> <fileDesc> <titleStmt> Lincoln and Seward. <persName>Welles, Gideon, 1802-1878.</persName> </titleStmt> <publicationStmt> University of Michigan, Digital Library Initiatives These pages may be freely searched and displayed. Permission must be received for subsequent distribution in print or electronically. Please go to http://www.umdl.umich.edu/ for more information. </publicationStmt> <seriesStmt> <title level="s" type="main">Making of America</title> </seriesStmt>

Level 1 Example: Alger Hiss document
XML comments (such as <tt>&lt;!-- uncorrected OCR for first page image begins here --&gt; </tt>) in this and later examples are illustrative but are not meant to be included in encoded documents.

<TEI xml:id="someid" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader xml:lang="en"> </teiHeader> <text xml:lang="en"> <ab> <pb n="113" facs="00000001.tif"/>

POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE ORDERED. The nature and extent of surveillance of Hiss, his family and associates was not known at the time of trial by the defense. Even now, with the release of some of the govern‐ ment documents concerning FBI investigative techniques regarding Hiss, the full extent of surveillance -- wiretapping, mail open‐ ings, mail covers, physical surveillance, and other intrusive techniques -- is still not 'clear. Nevertheless, it is apparent that information gathered through the exploitation of unlawful wiretaps and other illegal surveillance was used at trial and consequently the conviction must be reversed. Alternatively, further discovery and a hearing is essential to a fair deter‐ mination regarding these issues. FBI surveillance of Hiss began in earnest in 1941 with the institution of a mail cover on his incoming correspondence at his home in connection with an FBI investigation of possible Hatch Act violations. CN Ex. 98A. Another mail cover was placed

-113 -

<pb n="114" facs="00000002.tif"/>

on the Hiss mail in 1945, and at the same time the FBI obtained toll call records from the Hiss residence Telephone for the years 1943 and 1944 as well. CN Ex. 99. In September, 1945, the FBI intercepted telegrams to Hiss as well. CN Ex. 100. In late November, 1945, FBI surveillance of the Hiss residence in Washington, D.C., escalated. For the third time, a mail cover was instituted beginning on November 28, 1945, which was continued at least until 1946. CN Ex. 101 at p. 70; CN Ex. 102. Continuous physical surveillance of Hiss was begun as well. CN Ex. 101 at p. 72. Although this twenty-four-hour surveillance was discontinued on December 14, 1945, physical surveillance was conducted frequently at various times until September, 1947. CN Ex. 102; CN Ex. 103. The most intrusive invasion of petitioner's rights 68/ Also before 1947, a letter from Priscilla Hiss addressed to her son, Timothy Hobson, was intercepted and its contents read. CN Ex. 100A at p. 167. In approximately March, 1947, a letter from a Michael Greenberg addressed to petitioner re‐ garding an application for employment with the United Nations was also intercepted, in a manner not revealed by the docu‐ ments. CN Ex. 100B

-114 -

<pb n="115" facs="00000003.tif"/>

occurred from December 13, 1945 until the Hisses moved from Washington, D.C. to New York City on September 13, 1947. A "technical surveillance," -- a wiretap -- was placed on the Hiss telephone at their residence on P Street-in Washington, D.C. The logs of this surveillance constitute twenty-nine volumes of FBI serials and are roughly 2,500 pages in length, in which an enormous amount of information concerning the Hisses' per‐ sonal lives, relationships with friends and associates, and habits is recorded. The wiretap was installed following FBI Director Hoover's application to the Attorney General for authorization, although no written authorization appears in the documents released to Hiss. The purpose of the application was to gather information regarding Hiss' alleged contacts with Soviet espionage agents and communists in government service, general allegations which had been made by Elizabeth Bentley and Chambers. As one would expect, the interception of every telephone h9/      Hoover's initial request was answered by a note reques‐ ting information on Hiss. CN Ex. 104. Additional information was furnished by letter dated November 30, 1945. CN Ex. 105.

-115 -

</ab> </text> </TEI>

Reference

 * Chapter 3, Elements Available in All TEI Documents
 * Chapter 4, Default Text Structure

Purpose
To create electronic text for full-text searching, linking to page images, and identifying simple structural hierarchy to improve navigation. (For example, you can create a table of contents from such encoding.)

Rationale
The text is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. However, the text could stand alone as electronic text (without page images) if the accuracy of its contents is suitable to its intended use and it is not necessary to display low-level typographic or structural information. Level 2 requires a set of elements more granular than those of Level 1, including bibliographic or structural information below the monographic or volume level. One of the motivations for using Level 2 is to avoid expensive analysis of textual elements and/or the expense of accurate text conversion, e.g., double-keying or detailed proofreading of automatic OCR.

For the most part, Level 2 texts are not intended to be displayed separately from their page images. Level 2 encoding of sections and headings provides greater navigational possibilities than Level 1 encoding, and enables searching to be restricted within particular textual divisions (for example, searching for two phrases within the same chapter).

Level 2 is most suitable for projects in which:


 * a large volume of material is to be made available online quickly
 * a digital image of each page is desired
 * the material is of interest to a large community of users who wish to read texts that allow keyword searching
 * rudimentary search and display capabilities based on the large structures of the text are desired
 * each text is checked to ensure that divisions and headers are properly identified
 * extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added at a later date

Workflow
Level 2 generally can be created and encoded by automated means. Pagination is identified as in Level 1, and metadata for the <tt>&lt;div&gt;</tt> structure is created, likely based on the page images. The <tt>&lt;div&gt;</tt> structure metadata might contain the page number on which the division begins and a transcription of that <tt>&lt;div&gt;</tt>’s heading. This metadata is inserted into the raw OCR at the appropriate points, forming a valid XML document. Level 2 texts do not require any special knowledge or manual intervention below the section level.
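A minimal sketch of what this insertion step might produce, assuming the division metadata supplies a starting page and a transcribed heading for each section. The <tt>type</tt> values, the image file names, and the bracketed placeholders are assumptions for illustration.

```xml
<!-- Hypothetical body fragment: division metadata wrapped around raw OCR. -->
<body>
  <div type="section">
    <pb n="1" facs="00000001.tif"/>
    <!-- heading transcribed in the division metadata, not taken from the OCR -->
    <head>[heading of section 1]</head>
    <ab>[uncorrected OCR for section 1, with further pb elements interspersed]</ab>
  </div>
  <div type="section">
    <pb n="27" facs="00000027.tif"/>
    <head>[heading of section 2]</head>
    <ab>[uncorrected OCR for section 2]</ab>
  </div>
</body>
```

Nothing below the section level is touched: the content of each <tt>ab</tt> remains whatever the OCR software produced.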

Element Recommendations for Level 2
Use all elements specified in Level 1 plus the following:

Level 2 Basic Structure
<TEI xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader xml:lang="en"> </teiHeader> <text xml:lang="en"> [title page information, table of contents, prefaces, etc.] [optional] <pb n="1" facs="[URI of page 1 image]"/> [heading of section 1] <ab>[entire contents of section 1 here, with interspersed &lt;pb&gt; elements pointing to page images; in this example there are 26 more pages to section 1]</ab> <pb n="27" facs="[URI of page 27 image]"/> [heading of section 2 subsection 1] <ab>[all the paragraphs of subsection one go here with page breaks inserted]</ab> [optional] </text> </TEI>

Level 2 Alger Hiss document
<TEI xml:id="someid" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader xml:lang="en"> </teiHeader> <text xml:lang="en"> <pb n="113" facs="00000001.tif"/> POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE ORDERED. <ab>

POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE ORDERED. The nature and extent of surveillance of Hiss, his family and associates was not known at the time of trial by the defense. Even now, with the release of some of the govern‐ ment documents concerning FBI investigative techniques regarding Hiss, the full extent of surveillance -- wiretapping, mail open‐ ings, mail covers, physical surveillance, and other intrusive techniques -- is still not 'clear. Nevertheless, it is apparent that information gathered through the exploitation of unlawful wiretaps and other illegal surveillance was used at trial and consequently the conviction must be reversed. Alternatively, further discovery and a hearing is essential to a fair deter‐ mination regarding these issues. FBI surveillance of Hiss began in earnest in 1941 with the institution of a mail cover on his incoming correspondence at his home in connection with an FBI investigation of possible Hatch Act violations. CN Ex. 98A. Another mail cover was placed

-113 -

<pb n="114" facs="00000002.tif"/>

on the Hiss mail in 1945, and at the same time the FBI obtained toll call records from the Hiss residence Telephone for the years 1943 and 1944 as well. CN Ex. 99. In September, 1945, the FBI intercepted telegrams to Hiss as well. CN Ex. 100. In late November, 1945, FBI surveillance of the Hiss residence in Washington, D.C., escalated. For the third time, a mail cover was instituted beginning on November 28, 1945, which was continued at least until 1946. CN Ex. 101 at p. 70; CN Ex. 102. Continuous physical surveillance of Hiss was begun as well. CN Ex. 101 at p. 72. Although this twenty-four-hour surveillance was discontinued on December 14, 1945, physical surveillance was conducted frequently at various times until September, 1947. CN Ex. 102; CN Ex. 103. The most intrusive invasion of petitioner's rights 68/ Also before 1947, a letter from Priscilla Hiss addressed to her son, Timothy Hobson, was intercepted and its contents read. CN Ex. 100A at p. 167. In approximately March, 1947, a letter from a Michael Greenberg addressed to petitioner re‐ garding an application for employment with the United Nations was also intercepted, in a manner not revealed by the docu‐ ments. CN Ex. 100B

-114 -

<pb n="115" facs="00000003.tif"/>

occurred from December 13, 1945 until the Hisses moved from Washington, D.C. to New York City on September 13, 1947. A "technical surveillance," -- a wiretap -- was placed on the Hiss telephone at their residence on P Street-in Washington, D.C. The logs of this surveillance constitute twenty-nine volumes of FBI serials and are roughly 2,500 pages in length, in which an enormous amount of information concerning the Hisses' per‐ sonal lives, relationships with friends and associates, and habits is recorded. The wiretap was installed following FBI Director Hoover's application to the Attorney General for authorization, although no written authorization appears in the documents released to Hiss. The purpose of the application was to gather information regarding Hiss' alleged contacts with Soviet espionage agents and communists in government service, general allegations which had been made by Elizabeth Bentley and Chambers. As one would expect, the interception of every telephone h9/      Hoover's initial request was answered by a note reques‐ ting information on Hiss. CN Ex. 104. Additional information was furnished by letter dated November 30, 1945. CN Ex. 105.

-115 -

</ab> </text> </TEI>

Reference

 * Chapter 4, Default Text Structure, P5 Guidelines
 * Chapter 6, Verse, P5 Guidelines
 * Chapter 14, Tables, Formulæ, and Graphics, P5 Guidelines
 * Chapter 16, Linking, Segmentation, and Alignment, P5 Guidelines (for handling notes)

Purpose
To create a stand-alone electronic text and identify hierarchy (logical structure) and typography without content analysis being of primary importance.

Rationale
Encoding at this level provides the foundation for upgrading to higher levels of encoding. Level 3 generally requires some human editing, but the features to be encoded are determined by the logical structure and appearance of the text and not by specialized content analysis.

Level 3 texts identify front and back matter, divisions within the text, and all paragraph breaks. Floating texts, or sub-texts like a poem or letter embedded in the greater text, are supported in this level. The finer granularity of encoding these features, as well as figures, notes, and all changes of typography, allows a range of options for display, delivery, and searching. For example, one has the option of identifying, and therefore specifying, the display characteristics of different typographic styles, and regularizing the display and placement of note text.

Level 3 texts can stand alone as text without page images, and therefore can be uploaded, downloaded, and delivered quickly, and require less storage space than digital collections with page images. However, the simple level of structural analysis and absence of specialized content analysis reflected in Level 3 encoding may make it desirable for some, depending on project priorities, to include page images in order to provide users with a fuller set of resources.

Level 3 is most suitable for projects with the following characteristics:

 * the material is of interest to a large community of users who wish to read texts that allow for keyword searching
 * some sophistication of display, delivery, and searching based on structure of the text is desired
 * each text will undergo quality control to ensure that encoding decisions have been made appropriately
 * the users of the texts may have limited storage or display capabilities
 * the creator of the texts has limited or no ability to provide content expertise to analyze, tag, or review texts
 * extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added at a later date

Workflow
Level 3 texts can be created by conversion from an electronic source such as an HTML file or word-processor document or from a print source, either through OCR or keyboarding. They can be generated trivially by converting from outsourced double-keyboarded texts conforming to TEI Tite, though some granularity of encoding will be lost in the translation.

Element Recommendations for Level 3
Use all elements specified in Levels 1 and 2 except <tt>ab</tt>, plus the following:

Forme Work
Running heads, catch words, page numbers, signatures, and other artifacts derived from printing should not be included in Level 3, with the exception of page numbers, which are recorded using <tt>&lt;pb&gt;</tt>. If upgrading a text from Level 1 or Level 2 that was generated using OCR, discard the forme work information.

Table of Contents

 * Chapter 4.5, Front Matter

You may choose to omit front matter content such as tables of contents or lists of illustrations, especially if you plan to generate them automatically. If you do plan to encode the table of contents (or lists of illustrations and similar content) manually, use a <tt>div</tt> element with an appropriate <tt>type</tt> attribute (e.g., <tt>&lt;div type="contents"&gt;</tt>). Within this division, use the <tt>list</tt> element to mark up the table of contents, list of illustrations, etc. Each list item should have a <tt>ptr</tt> or <tt>ref</tt> element with a <tt>target</tt> attribute referencing an <tt>xml:id</tt> attribute on the <tt>&lt;pb&gt;</tt> or <tt>&lt;div&gt;</tt> element of the referenced page or section. Use <tt>ref</tt> if you wish to transcribe page numbers in the table of contents; use <tt>ptr</tt> if you do not.

Level 3 Basic Structure: Prose
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383"> <teiHeader xml:lang="en"> </teiHeader> <text xml:lang="en"> [figure] <titlePage>[text]</titlePage> [text] [text] [book title] [text] [text] [text] [text] [text] [text] [text] </text> </TEI>

Level 3 Basic Structure: Verse
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383"> <teiHeader xml:lang="en"> </teiHeader> <text xml:lang="en"> <titlePage>[text]</titlePage> [text] [text] [book title] [section title] THE DAYS GONE BY. <lg> <l>O the days gone by! O the days gone by!</l> <l>The apples in the orchard, and the pathway through the rye;</l> <l>The chirrup of the robin, and the whistle of the quail</l> <l>As he piped across the meadows sweet as any nightingale;</l> <l>When the bloom was on the clover, and the blue was in the sky,</l> <l>And my happy heart brimmed over in the happy days gone by.</l> </lg> <lg>[lines of poetry]</lg> <lg>[lines of poetry]</lg> <lg>[lines of poetry]</lg> </text> </TEI>

Level 3 Table of Contents
CONTENTS I. A Boy and His Dog II. Romance III. The Costume IV. Desperation V. The Pageant of the Table Round
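One way the contents list above might be encoded under the Table of Contents guidance, shown as a hedged sketch. The <tt>xml:id</tt> values it points to (<tt>#ch01</tt> and so on) are invented for the example and are assumed to match <tt>xml:id</tt> attributes on the corresponding chapter <tt>div</tt> elements.

```xml
<div type="contents">
  <head>CONTENTS</head>
  <list>
    <!-- ptr is used because no page numbers are transcribed here -->
    <item>I. A Boy and His Dog <ptr target="#ch01"/></item>
    <item>II. Romance <ptr target="#ch02"/></item>
    <item>III. The Costume <ptr target="#ch03"/></item>
    <item>IV. Desperation <ptr target="#ch04"/></item>
    <item>V. The Pageant of the Table Round <ptr target="#ch05"/></item>
  </list>
</div>
```

If the printed page numbers were transcribed, each <tt>ptr</tt> would instead be a <tt>ref</tt> whose content is the page number and whose <tt>target</tt> points to the <tt>xml:id</tt> of the relevant <tt>pb</tt>.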

Level 3 Chapter with Letter
<pb xml:id="VAA2383_126" n="118"/> CHAPTER XIV MAURICE LEVY'S CONSTITUTION <hi rend="font-weight: bold">L</hi>O, SAM!" said Maurice cautiously. "What you doin'?" Penrod at that instant had a singular experience&#x2014;an intellectual shock like a flash of fire in the brain. Sitting in darkness, a great light flooded him with wild brilliance. He gasped! "What you doin'?" asked Maurice for the third time, Sam Williams not having decided upon a reply. <pb xml:id="VAA2383_127" n="119"/> It was Penrod who answered. "Drinkin' lickrish water," he said simply, and wiped his mouth with such delicious enjoyment that Sam's jaded thirst was instantly stimulated. He took the bottle eagerly from Penrod. "A-a-h!" exclaimed Penrod, smacking his lips. "That was a good un!" Penrod uttered some muffled words and then waved both arms&#x2014;either in response or as an expression of his condition of mind; it may have been a gesture of despair. How much intention there was in this act&#x2014;obviously so rash, considering the position he occupied&#x2014;it is impossible to say. Undeniably there must remain a suspicion of deliberate purpose. <pb xml:id="VAA2383_138" n="130"/> The damsel curtsied again and handed him the following communication, addressed to herself: <floatingText> "Dear madam Please excuse me from dancing the cotilo with you this afternoon as I have fell off the barn "Sincerly yours<lb/> "<hi rend="font-variant: small-caps">Penrod Schofield</hi>." </floatingText>

Level 3 Alger Hiss document
<TEI xml:id="someid" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader xml:lang="en"> </teiHeader> <text xml:lang="en"> <pb n="113" facs="00000001.tif"/> POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE ORDERED. The nature and extent of surveillance of Hiss, his family and associates was not known at the time of trial by the defense. Even now, with the release of some of the govern&#xAD; ment documents concerning FBI investigative techniques regarding Hiss, the full extent of surveillance -- wiretapping, mail open&#xAD; ings, mail covers, physical surveillance, and other intrusive techniques -- is still not 'clear. Nevertheless, it is apparent that information gathered through the exploitation of unlawful wiretaps and other illegal surveillance was used at trial and consequently the conviction must be reversed. Alternatively, further discovery and a hearing is essential to a fair deter&#xAD; mination regarding these issues.

FBI surveillance of Hiss began in earnest in 1941 with the institution of a mail cover on his incoming correspondence at his home in connection with an FBI investigation of possible Hatch Act violations. CN Ex. 98A. Another mail cover was placed

<pb n="114" facs="00000002.tif"/> on the Hiss mail in 1945, and at the same time the FBI obtained toll call records from the Hiss residence Telephone for the years 1943 and 1944 as well. CN Ex. 99. In September, 1945, the FBI intercepted telegrams to Hiss as well. CN Ex. 100.

In late November, 1945, FBI surveillance of the Hiss residence in Washington, D.C., escalated. For the third time, a mail cover was instituted beginning on November 28, 1945, which was continued at least until 1946. CN Ex. 101 at p. 70; CN Ex. 102. Continuous physical surveillance of Hiss was begun as well. CN Ex. 101 at p. 72. Although this twenty-four-hour surveillance was discontinued on December 14, 1945, physical surveillance was conducted frequently at various times until September, 1947.

<note place="bottom" anchored="true" n="68">Also before 1947, a letter from Priscilla Hiss addressed to her son, Timothy Hobson, was intercepted and its contents read. CN Ex. 100A at p. 167. In approximately March, 1947, a letter from a Michael Greenberg addressed to petitioner re&#xAD; garding an application for employment with the United Nations was also intercepted, in a manner not revealed by the docu&#xAD; ments. CN Ex. 100B</note>

CN Ex. 102; CN Ex. 103.

The most intrusive invasion of petitioner's rights

<pb n="115" facs="00000003.tif"/> occurred from December 13, 1945 until the Hisses moved from Washington, D.C. to New York City on September 13, 1947. A "technical surveillance," -- a wiretap -- was placed on the Hiss telephone at their residence on P Street-in Washington, D.C. The logs of this surveillance constitute twenty-nine volumes of FBI serials and are roughly 2,500 pages in length, in which an enormous amount of information concerning the Hisses' per&#xAD; sonal lives, relationships with friends and associates, and habits is recorded.

The wiretap was installed following FBI Director Hoover's application to the Attorney General for authorization,

<note place="bottom" anchored="true" n="69">Hoover's initial request was answered by a note reques&#xAD; ting information on Hiss. CN Ex. 104. Additional information was furnished by letter dated November 30, 1945. CN Ex. 105.</note>

although no written authorization appears in the documents released to Hiss. The purpose of the application was to gather information regarding Hiss' alleged contacts with Soviet espionage agents and communists in government service, general allegations which had been made by Elizabeth Bentley and Chambers.

As one would expect, the interception of every telephone </text> </TEI>

Reference

 * Chapter 3.3 Highlighting and Quotation, P5 Guidelines
 * Chapter 3.4 Simple Editorial Changes, P5 Guidelines
 * Chapter 3.5 Names, Numbers, Dates, Abbreviations, and Addresses, P5 Guidelines
 * Chapter 3.12.1 Core Tags for Verse, P5 Guidelines
 * Chapter 3.12.2 Core Tags for Drama, P5 Guidelines
 * Chapter 4, Default Text Structure, P5 Guidelines
 * Chapter 6, Verse, P5 Guidelines
 * Chapter 7.1.4 Performance Texts: castLists and castItems, P5 Guidelines
 * Chapter 7.2.1 Performance Texts: Major Structural Divisions, P5 Guidelines
 * Chapter 7.2.2 Performance Texts: Speeches and Speakers, P5 Guidelines
 * Chapter 7.2.3 Performance Texts: Stage Directions, P5 Guidelines
 * Chapter 13, Names, Dates, People, and Places, P5 Guidelines
 * Chapter 14, Tables, Formulæ, and Graphics, P5 Guidelines
 * Chapter 14.3 Specific Elements for Graphic Images, P5 Guidelines
 * Chapter 16, Linking, Segmentation, and Alignment, P5 Guidelines

Purpose
To create text that can stand alone as electronic text, identifies hierarchy and typography, specifies function of textual and structural elements, and describes the nature of the content and not merely its appearance. This level is not meant to encode or identify all structural, semantic, or bibliographic features of the text.

Rationale
Greater description of function and content allows for:

 * flexibility of display and delivery
 * sophisticated searching within specified textual and structural elements
 * combining the broadest range of uses and audiences

Texts encoded at Level 4 are able to stand alone without page images in order for them to be read by students, scholars, and general readers. This level of TEI encoding allows them to be displayed or printed in a variety of ways suitable for classroom or scholarly use.

Level 4 texts contain elements and attributes that describe content. Features of the text that may contribute to meaning, such as indentation of verse lines and typographic change, are preserved. These are textual features that are not encoded at lower levels and that allow the text to be used and understood fully independent of images. The ability to stand alone as text means that Level 4 texts are more nimble and robust for exercises such as format repurposing and textual analysis.

Finally, functionally accurate encoding in Level 4 texts allows them to be searched or displayed in sophisticated ways. For example, a searcher could limit a search in a dramatic text to stage directions or in a verse text to only first lines. In a political tract published by subscription, a search could be confined to names that appear in lists, thus limiting a search to names of people who subscribed to a particular volume. This ability to limit searches becomes more significant as textbases become larger, and thus is of great importance to the library community as it attempts to build into the initial design and implementation of textbases the features needed to enhance interoperability.

Level 4 is most suitable for projects with the following characteristics:

 * sophisticated search and retrieval capabilities are desired
 * the texts will be used for textual analysis
 * extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added by the scholarly community at a later date
 * the users of the texts may have limited storage or display capabilities

Workflow
Text is generated by keyboarding (likely outsourced double keyboarding from page images using TEI Tite) or possibly by correcting OCR text using software that identifies spelling mistakes and consults a log from the OCR software to find regions of uncertainty in the OCR text. If converting from TEI Tite, minimal additional markup must be added, as discussed in Appendix A of TEI Tite.

Element Recommendations for Level 4
Use all elements specified in Levels 1, 2, and 3 except <tt>ab</tt>, plus elements in the following table. Note that some of these elements are defined in Level 3 as well, but their use in Level 4 is more strict.

General Level 4 Recommendations and Examples

 * The use of &lt;group&gt; is required when you need to encode a body of distinct texts that are grouped together and are regarded as a unit. Most typical examples of such composite texts would be anthologies, collected works of an author, etc. Section 4.3.1 Grouped Texts states, “The presence of common front matter referring to the whole collection, possibly in addition to front matter relating to each individual text, is a good indication that a given text might usefully be encoded in this way.”
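A skeletal sketch of a grouped text, following the bracketed-placeholder convention of the basic structure examples in this document. The placeholders stand for fully encoded content; which texts carry their own front matter is an assumption made for the example.

```xml
<text xml:lang="en">
  <!-- front matter common to the whole collection -->
  <front>[front matter for the whole collection]</front>
  <group>
    <text>
      <!-- an individual text may also have its own front matter -->
      <front>[front matter for the first text]</front>
      <body>[first text]</body>
    </text>
    <text>
      <body>[second text]</body>
    </text>
  </group>
  <back>[back matter for the whole collection]</back>
</text>
```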


 * Typographically distinct text may be encoded using the following elements:
   * to represent speech, thought, quotation, etc.:
     * epigraph
     * quote
     * said
     * mentioned
     * soCalled
   * to represent foreign words or phrases, linguistically emphatic or stressed words or phrases, words regarded as technical terms, etc.:
     * emph
     * foreign (e.g. <tt>&lt;foreign xml:lang="fr"&gt;</tt>)
     * gloss
     * term
     * title


 * Any ambiguous typographically distinct text should be encoded as hi (e.g. <tt>&lt;hi rend="font-weight: bold"&gt;</tt>). This element may also be used if the more specific elements above are not used.
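The contrast between the specific elements and the <tt>hi</tt> fallback can be sketched as follows. The sentence is invented for illustration and does not come from any of the source texts quoted in this document; the <tt>rend</tt> value follows the CSS-like convention used in the other examples.

```xml
<!-- Invented sentence; each italicized span in the source gets the most specific element that fits. -->
<p>She called it a <soCalled>holiday</soCalled>, though the
  <foreign xml:lang="fr">pension</foreign> was <emph>far</emph> from restful.
  <!-- the reason for the italics is unclear here, so hi records only the appearance -->
  The sign over the door read <hi rend="font-style: italic">Beau Rivage</hi>.</p>
```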


 * Any of the following three methods may be used to encode errors or typos in original texts:
   * the <tt>sic</tt> element used alone is recommended to indicate errors without correcting them
   * the <tt>corr</tt> element used alone is recommended to provide corrections without indicating the initial error
   * the <tt>choice</tt> element allows both the apparent error and its editorial correction to be recorded, as in the following examples:

He has no Scruple about Fish; but won't touch a bit of Pork, it being <choice><sic>expresly</sic><corr>expressly</corr></choice> forbidden by their Law. Source: Thomas Bluett. ''Some Memoirs of the Life of Job, the Son of Solomon, the High Priest of Boonda in Africa; Who was a Slave About Two Years in Maryland; and Afterwards Being Brought to England, was Set Free, and Sent to His Native Land in the Year 1734.'' London: Printed for R. Ford, 1734.

or

4. The art of writing she obtained by her own industry and curiosity, and in so short a time that in the year 1765, when she was not more than twelve years of <choice><sic>age,she</sic><corr>age, she</corr></choice> was capable of writing letters to her friends <pb xml:id="p11" n="11"/> on various subjects. She also wrote to several persons in high stations. Source: Abigail Mott, 1766-1851. ''Biographical Sketches and Interesting Anecdotes of Persons of Colour. To Which is Added, a Selection of Pieces in Poetry.'' New-York: M. Day, 1826.
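For comparison, the other two methods can be sketched by reusing the misspelling from the Bluett passage above. The surrounding <tt>p</tt> elements and the ellipses are added only to make the fragments readable.

```xml
<!-- sic alone: the error is flagged but not corrected -->
<p>... it being <sic>expresly</sic> forbidden by their Law.</p>

<!-- corr alone: the correction is supplied and the original error is not recorded -->
<p>... it being <corr>expressly</corr> forbidden by their Law.</p>
```

Only the <tt>choice</tt> form preserves both readings, which matters if the text may later be displayed in either a diplomatic or a regularized view.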


 * Use <tt>&lt;argument&gt;</tt> to encode a prefatory list or prose description of topics, usually found at the beginning of a chapter. The content within the <tt>argument</tt> element can be presented as a list or as a paragraph:

<pb xml:id="albert14" n="14"/> CHAPTER I.<lb/>CHARLOTTE BROOKS. Causes of immorality among colored people - Charlotte Brooks - She is sold South - Sunday work. ... Source: Octavia V. Rogers Albert. The House of Bondage, or, Charlotte Brooks and Other Slaves, Original and Life Like, As They Appeared in Their Old Plantation and City Slave Life; Together with Pen-Pictures of the Peculiar Institution, with Sights and Insights into Their New Relations as Freedmen, Freemen, and Citizens. New York: Hunt & Eaton, 1890.
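The two presentations mentioned above might look like this for the Albert chapter opening. This is a sketch: the <tt>div type="chapter"</tt> wrapper and the placement of <tt>head</tt> are assumptions about markup not visible in the example as it appears here.

```xml
<!-- argument as a paragraph -->
<div type="chapter">
  <pb xml:id="albert14" n="14"/>
  <head>CHAPTER I.<lb/>CHARLOTTE BROOKS.</head>
  <argument>
    <p>Causes of immorality among colored people - Charlotte Brooks - She is sold South - Sunday work.</p>
  </argument>
  <p>[text of the chapter]</p>
</div>

<!-- the same argument as a list -->
<argument>
  <list>
    <item>Causes of immorality among colored people</item>
    <item>Charlotte Brooks</item>
    <item>She is sold South</item>
    <item>Sunday work</item>
  </list>
</argument>
```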

 * The <tt>trailer</tt> element is recommended to encode heading- or title-like content at the end of a division (e.g. chapter, book, etc.):

[book title] [chapter title] [text] <trailer>Here ends the Chapter 1.</trailer> [chapter title] [text] <trailer>Here ends the Chapter 2.</trailer> <trailer>FINIS.</trailer>


 * The elements <tt>add</tt>, <tt>del</tt>, <tt>unclear</tt>, and <tt>gap</tt> may be used to indicate text (a word or phrase, or part of one) that has been added or marked for deletion, and cases where transcription is difficult (<tt>&lt;unclear&gt;</tt>) or impossible (<tt>&lt;gap&gt;</tt>) because the material is illegible, invisible, or inaudible (e.g. while transcribing oral history interviews):

But it is well authenticated by the observation of every one, that <del rend="text-decoration: line-through" hand="#JHL">their manner</del> <add rend="vertical-align: super" hand="#JHL">this way—i.e. the above</add> of writing influences the style of compos. of those who practise it considerably, when they grow up to years of manhood; for their productions, <del hand="#JHL" rend="text-decoration: line-through">instead</del> far from being terse, argumentative, convincing, are without head or tail &amp; are generally an incongruous mass mixed up in the most disgusting manner, without divisions or heads &amp; in short without a subject (so to speak). Source: Class Composition of J. Horace Lacy, [January 1851] 1. Lacy, James Horace, 1834-1852

But I still hope for &amp; trust in God and I believe he will animate our brave defenders with a superhuman power and we will yet drive from our soil the hated invaders whose tread profanation, but this is an hour to try men's souls—Fort Donelson has been taken by the enemy. Frank was there and covered himself with honor but his bravery cost him a wound; he was wounded in the leg slightly—a flesh wound only, you must not be uneasy. Source: Kimberly Family Personal Correspondence, 1862-1864. Transcript of the manuscript, UNC-Chapel Hill, Southern Historical Collection.

Level 4 Front and Back Matter

 * The use of the <tt>titlePage</tt> element with appropriate child elements describing the major features of most title pages is required. The child elements are listed in Section 4.6 "Title Pages".


 * <tt>&lt;titlePage&gt;</tt> should include the verso if present, divided by <tt>&lt;pb n="verso"/&gt;</tt>.


 * Frontispieces should be encoded as a <tt>&lt;figure&gt;</tt>, within a separate division (numbered or unnumbered, depending on the general editorial decision for a specific encoding project) and <tt>&lt;p&gt;</tt>.


 * Tables of contents, errata, subscription lists, “other titles by the same author” should be included in a separate division (numbered or unnumbered, depending on the general editorial decision for a specific encoding project), as a <tt>&lt;list&gt;</tt> with <tt>&lt;item&gt;</tt>s.


 * It is recommended that all prefaces, tables of contents, afterwords, appendices, endnotes, and apparatus be encoded with phrase-level elements.


 * For publishers’ advertisements, indexes, glossaries, or other front or back matter not considered of primary importance to the text, there are three options:
    * Fully transcribe and encode. For an index, use <tt>type="index"</tt> on the <tt>&lt;div&gt;</tt>, with <tt>&lt;list&gt;</tt>s to mark up index entries. Use <tt>&lt;ref target="____"&gt;</tt> to mark up page numbers given in the index, with the value of <tt>target</tt> referring to the <tt>xml:id</tt> attribute of the <tt>&lt;pb&gt;</tt> of the referenced page.
    * Link to page images (may omit encoded transcription).
    * Omit entirely and note the omission in <tt>&lt;samplingDecl&gt;</tt>.
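
The title page and index recommendations above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed pattern: the bracketed content, the <tt>xml:id</tt> value <tt>page012</tt>, and the choice of <tt>&lt;docImprint&gt;</tt> for the verso text are all hypothetical.

```xml
<front>
  <titlePage>
    <docTitle>
      <titlePart type="main">[main title]</titlePart>
    </docTitle>
    <byline>by <docAuthor>[author]</docAuthor></byline>
    <docImprint>[place, publisher, date]</docImprint>
    <!-- the verso stays inside titlePage, set off by a page break -->
    <pb n="verso"/>
    <docImprint>[printer's or copyright statement on the verso]</docImprint>
  </titlePage>
</front>
<body>
  <div1 type="chapter">
    <!-- the page break carries the xml:id that the index entry points to -->
    <pb n="12" xml:id="page012"/>
    <p>[text]</p>
  </div1>
</body>
<back>
  <div1 type="index">
    <head>Index</head>
    <list>
      <!-- target refers to the xml:id of the pb for the cited page -->
      <item>[index entry], <ref target="#page012">12</ref></item>
    </list>
  </div1>
</back>
```

Pointing <tt>target</tt> at the <tt>&lt;pb&gt;</tt> rather than at a division keeps the link accurate to the printed page number given in the index.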

Level 4 Name Tagging

 * Chapter 13.1.1, Linking Names and Their Referents

Names should be encoded using <tt>persName</tt>, <tt>placeName</tt>, <tt>geogName</tt>, and <tt>orgName</tt> elements with the <tt>ref</tt> or <tt>key</tt> attribute providing a reference to a <tt>person</tt>, <tt>place</tt>, or <tt>org</tt> element in an external file or database for managing name normalization and compilation of additional information such as biographical or geospatial information. See the discussion of <tt>ref</tt> and <tt>key</tt> above for how to choose between them.

If using <tt>key</tt>, provide a unique internal identifier, such as in a local database.

If using <tt>ref</tt>, an external TEI file may contain an entry for each name, grouped accordingly under <tt>&lt;listPerson&gt;</tt>, <tt>&lt;listPlace&gt;</tt>, and <tt>&lt;listOrg&gt;</tt>, which is uniquely identified with an <tt>xml:id</tt> attribute. In such a case the value of the <tt>ref</tt> attribute in the main TEI document (the transcription of the source document) references the value of the <tt>xml:id</tt> attribute in the external file. (In the examples below, the external file is named <tt>context.xml</tt> for “contextual information” and is in the same directory as the source file, but it may be named anything and placed anywhere that can be referenced by a URI.)

When referencing external files or databases, it is strongly recommended to provide an explanation in the <tt>&lt;editorialDecl&gt;</tt> section of the TEI header. References to controlled vocabularies and national or local authority files may be signified by a prefix in the <tt>xml:id</tt> attribute (e.g., <tt>tgn_0000000</tt> for the Getty Thesaurus of Geographic Names). When referencing a controlled vocabulary be sure to specify this information in the <tt>&lt;classDecl&gt;</tt> section of the TEI header.
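
A hedged sketch of what such header documentation might look like, assuming the prefixes used in the examples of this section (<tt>tgn_</tt> and <tt>lcnaf_</tt>). The wording of the declaration and the <tt>xml:id</tt> values on <tt>&lt;taxonomy&gt;</tt> are hypothetical:

```xml
<encodingDesc>
  <editorialDecl>
    <p>Names of persons, places, and organizations are tagged with a ref
      attribute pointing to entries in context.xml. The xml:id of each
      entry carries a prefix naming its authority file: tgn_ for the
      Getty Thesaurus of Geographic Names, lcnaf_ for the Library of
      Congress Name Authority File.</p>
  </editorialDecl>
  <classDecl>
    <!-- one taxonomy per controlled vocabulary; xml:id values are hypothetical -->
    <taxonomy xml:id="tgn">
      <bibl>Getty Thesaurus of Geographic Names</bibl>
    </taxonomy>
    <taxonomy xml:id="lcnaf">
      <bibl>Library of Congress Name Authority File</bibl>
    </taxonomy>
  </classDecl>
</encodingDesc>
```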

 * Place-name tagging example in main TEI document (the transcription of the source document):

The first Jews arrived in <placeName ref="context.xml#tgn_7012924">Indianapolis</placeName> in the middle of the 19th century. Primarily immigrants from <placeName ref="context.xml#tgn_7000084">Germany</placeName> and other points in central Europe (though many had lived elsewhere in the <placeName ref="context.xml#tgn_7012149">United States</placeName> before they arrived in the city), they were drawn from throughout the Midwest by the growth of commerce and rail lines in <placeName ref="context.xml#tgn_7012924">Indianapolis</placeName>.

 * In the external file <tt>context.xml</tt>, for maintaining place name normalization and additional information:

<listPlace> <place xml:id="tgn_7012924"> <placeName>Indianapolis Indiana</placeName> </place> <place xml:id="tgn_7000084"> <placeName> <country xml:lang="de">Deutschland</country> </placeName> </place> <place xml:id="tgn_7012149"> <placeName>United States</placeName> </place> </listPlace>

 * Personal and organizational name tagging example in main TEI document (the transcription of the source document):

PRIZE LIBRARY GIFT-Indiana University President <persName ref="context.xml#lcnaf_82134365">Elvis J. Stahr</persName> (right), a former law dean and practicing attorney, reminisces with Professor of Law <persName ref="context.xml#lcnaf_00113347">W. Howard Mann</persName> as the two inspect some of the nearly 3,000 volumes of <orgName ref="context.xml#lcnaf_79006848">U.S. Supreme Court</orgName> records recently transferred to I.U. from the <orgName ref="context.xml#lcnaf_79109178">Indiana Supreme Court Library</orgName>. The collection, dating back to 1925, is one of the oldest and most complete sets in existence.

 * In the external file <tt>context.xml</tt>, for maintaining personal and organization name normalization and additional information:

<listPerson> <person xml:id="lcnaf_82134365"> <persName>Stahr Elvis J.</persName> </person> <person xml:id="lcnaf_00113347"> <persName>Mann W. Howard</persName> </person> </listPerson> <listOrg> <org xml:id="lcnaf_79006848"> <orgName>United States. Supreme Court</orgName> </org> <org xml:id="lcnaf_79109178"> <orgName>Indiana. Supreme Court</orgName> </org> </listOrg>


 * Alternatively, instead of using an external file for the authority data, use the <tt>key</tt> attribute to point to a unique key in a MySQL table that stores information like county name, FIPS county code, and latitude/longitude values:

When Harry Byrd "retired" to his orchards and Rosemont, his new house outside <placeName key="1498453">Berryville</placeName> in 1930, he was still an energetic young man with a long political career ahead of him.

Level 4 Figures
<tt>&lt;figure&gt;</tt> groups elements representing or containing graphic information such as an illustration or figure; in this context <tt>&lt;figure&gt;</tt> typically contains the following elements:


 * <tt>&lt;head&gt;</tt>, containing a literal transcription of a caption on a figurative image.


 * <tt>&lt;figDesc&gt;</tt>, containing a free text description of the image used, potentially, for searching the images themselves.


 * <tt>&lt;graphic&gt;</tt>, pointing to the URI of the image itself using a <tt>url</tt> attribute and containing other presentation instructions such as dimension at which the graphic should be displayed, etc.

An example of frontispiece encoding:

<figure> <head>Sojourner Truth.</head> <figDesc>Woodcut of Sojourner Truth.</figDesc> <graphic url="http://docsouth.unc.edu/neh/truth50/frontis.html" scale="0.5"/> </figure> [Etc ...]

Source: Narrative of Sojourner Truth, a Northern Slave, Emancipated from Bodily Servitude by the State of New York, in 1828.

Level 4 Embedded Texts
At Level 4, texts embedded within other texts must be marked as such.

In the case of a quotation from another text, use <tt>&lt;quote&gt;</tt>, and do not include quotation marks in the content of this element or just outside the opening and closing tags. If the rendering of this quotation needs to be recorded, use the <tt>rend</tt> attribute to describe how this quotation is set off from the rest of the text.

If the embedded text is more than a short quotation, use <tt>&lt;floatingText&gt;</tt> even if only an excerpt of the embedded texts is provided. If your project uses <tt>&lt;quote&gt;</tt> to identify quotations, surround instances of floating texts which are quotations with <tt>quote</tt> tags.
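
A minimal sketch of both cases, assuming a hypothetical CSS-style <tt>rend</tt> value, a hypothetical <tt>type</tt> value on the division, and placeholder content. None of these come from a source document:

```xml
<!-- short quotation: the printed quotation marks are left out, and rend
     records how the quotation is set off in the source -->
<p>[introductory text]
  <quote rend="margin-left: 1in">[quoted passage, without quotation marks]</quote>
</p>

<!-- longer embedded text that is itself a quotation:
     floatingText wrapped in quote -->
<quote>
  <floatingText>
    <body>
      <div1 type="poem">
        <lg>
          <l>[first line of the excerpt]</l>
          <l>[second line of the excerpt]</l>
        </lg>
      </div1>
    </body>
  </floatingText>
</quote>
```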

Personal letters are a common example of an embedded text. While a collection of letters would use a <tt>div</tt> element for each letter, if a letter is quoted as part of a larger text, use <tt>&lt;floatingText&gt;</tt> <tt>&lt;body&gt;</tt> <tt>&lt;div1 type="letter"&gt;</tt> with <tt>&lt;opener&gt;</tt>, <tt>&lt;dateline&gt;</tt>, <tt>&lt;salute&gt;</tt>, <tt>&lt;signed&gt;</tt>, <tt>&lt;closer&gt;</tt>, <tt>&lt;postscript&gt;</tt> included as appropriate. For example:

She opened and read as follows: <floatingText> <body> <div1 type="letter"> <opener> <dateline>AUGUSTA, March 4th, 18—</dateline> <salute><hi rend="font-style: italic">Mrs. A. Mitten:</hi></salute> </opener> <p>"Having recently understood that you have procured a private teacher, we have ventured to stop your advertisement, <hi rend="font-style: italic">though ordered to continue it until forbid,</hi> under the impression that you have probably forgotten to have it stopped. If, however, we have been misinformed, we will promptly resume the publication of it. You will find our account below; which as we are much in want of funds, you will oblige us by settling as soon as convenient. Hoping your teacher is all that you could desire in one,</p> <closer> <salute>"We remain, your ob't. serv'ts,</salute> <signed>"H—&amp; B—”</signed> </closer> </div1> </body> </floatingText>

Source: Augustus Baldwin Longstreet, 1790-1870. Master William Mitten: or, A Youth of Brilliant Talents, Who Was Ruined by Bad Luck. Macon, Ga.: Burke, Boykin, 1864.

Level 4 Drama

 * Within the front matter (<tt>&lt;front&gt;</tt>) of a performance text, each cast list should be encoded as a <tt>&lt;castList&gt;</tt>, with each item in the list encoded as a <tt>&lt;castItem&gt;</tt>. If desired, each <tt>&lt;castItem&gt;</tt> may be uniquely identified with the <tt>xml:id</tt> attribute.

For example:

<castList> <head>Dramatis Personae</head> <castItem xml:id="kllear">LEAR king of Britain</castItem> <castItem xml:id="klfrance">KING OF FRANCE</castItem> <castItem xml:id="klburgundy">DUKE OF BURGUNDY</castItem> <castItem xml:id="klcornwall">DUKE OF CORNWALL</castItem> <castItem xml:id="klalbany">DUKE OF ALBANY</castItem> <castItem xml:id="klkent">EARL OF KENT</castItem> <castItem xml:id="klgloucester">EARL OF GLOUCESTER</castItem> <castItem xml:id="kledgar">EDGAR son to Gloucester.</castItem> <castItem xml:id="kledmund">EDMUND bastard son to Gloucester.</castItem> [. . .] </castList>

Source: Shakespeare’s King Lear


 * Within the body of performative texts, speeches are encoded as <tt>&lt;sp&gt;</tt> and speakers identified by the <tt>speaker</tt> element, which is a child of <tt>&lt;sp&gt;</tt>.
 * Stage directions are encoded as <tt>&lt;stage&gt;</tt> and enclose content describing scenery, stage directions, etc.
 * When encoding the actual speech content itself, utilize elements and attributes that correspond to the type of dramatic speech presented (e.g. <tt>&lt;p&gt;</tt> for prose speech with <tt>&lt;lb&gt;</tt> to designate a new line in a particular edition of the text or <tt>&lt;lg&gt;</tt> and <tt>&lt;l&gt;</tt> to describe dramatic verse structures).
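 * If referencing the <tt>xml:id</tt> defined in the <tt>&lt;castList&gt;</tt> is desired, use the <tt>who</tt> attribute on <tt>&lt;sp&gt;</tt>. In P5 the value of <tt>who</tt> is a pointer, so the identifier is prefixed with <tt>#</tt> (e.g. <tt>who="#klkent"</tt>).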

Act 1 Scene 1 King Lear's palace. Enter KENT, GLOUCESTER, and EDMUND

<sp> <speaker>KENT</speaker> <p>I thought the king had more affected the Duke of<lb/> Albany than Cornwall.</p> </sp> <sp> <speaker>GLOUCESTER</speaker> <p>It did always seem so to us: but now, in the<lb/> division of the kingdom, it appears not which of<lb/> the dukes he values most; for equalities are so<lb/> weighed, that curiosity in neither can make choice<lb/> of either's moiety.</p> </sp> <sp> <speaker>KENT</speaker> <p>Is not this your son, my lord?</p> </sp> [. . .]

Source: Shakespeare’s King Lear
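
The recommendations above can be combined as in the following sketch, which reuses the identifiers from the cast list example. The division elements, their <tt>type</tt> and <tt>n</tt> values, and the split into two stage directions are assumptions rather than a record of how any particular edition was encoded:

```xml
<div1 type="act" n="1">
  <head>Act 1</head>
  <div2 type="scene" n="1">
    <head>Scene 1</head>
    <stage>King Lear's palace.</stage>
    <stage>Enter KENT, GLOUCESTER, and EDMUND</stage>
    <!-- who points to the xml:id of the matching castItem -->
    <sp who="#klkent">
      <speaker>KENT</speaker>
      <p>I thought the king had more affected the Duke of<lb/>
        Albany than Cornwall.</p>
    </sp>
    <sp who="#klgloucester">
      <speaker>GLOUCESTER</speaker>
      <p>It did always seem so to us: [. . .]</p>
    </sp>
  </div2>
</div1>
```

Because <tt>who</tt> carries the identity of the speaker, the content of <tt>&lt;speaker&gt;</tt> can remain a literal transcription of the printed speech prefix.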

Level 4 Oral History
Speakers in oral history interviews, i.e. interviewee(s) and interviewer(s), may be identified in the <tt>&lt;teiHeader&gt;</tt> in several ways:


 * In the <tt>&lt;profileDesc&gt;</tt>, in the <tt>&lt;particDesc&gt;</tt>, using the <tt>list</tt> element, with <tt>&lt;name&gt;</tt> inside of <tt>&lt;item&gt;</tt>s
 * As a list of author <tt>&lt;name&gt;</tt>s within <tt>&lt;fileDesc&gt;</tt> / <tt>&lt;titleStmt&gt;</tt>

In either method, use an <tt>xml:id</tt> on the <tt>name</tt> element to uniquely identify the individual participant:

<list> <head>Interview Participants</head> <item><name xml:id="spk1" key="wf" reg="Friday, William C." type="interviewee">WILLIAM C. FRIDAY</name>, interviewee</item> <item><name xml:id="spk2" key="wl" reg="Link, William" type="interviewer">WILLIAM LINK</name>, interviewer</item> </list>
 * The list of an interview’s participants can also be given within the body of the interview (see example below).
 * Questions and answers from interviewers and interviewees are encoded as <tt>&lt;sp&gt;</tt> with a <tt>who</tt> attribute whose value points to the <tt>xml:id</tt> of the speaker in the list of interview participants; the speaker's name as transcribed goes in a <tt>speaker</tt> element.

[. . . ]

<sp who="#spk2"> <speaker>WILLIAM LINK:</speaker> <p>Last time we were talking about Frank Porter Graham. And I have a couple of questions about Graham, and I wonder if you could clear them up for me. You have mentioned that you had worked with him as a student at North Carolina State, had you met him before?</p> </sp> <sp who="#spk1"> <speaker>WILLIAM C. FRIDAY:</speaker> <p>No. That budget hearing was the first that I knew of him, of course, but the first time that I ever encountered him. I was president of class at N.C. State, and that through me into this kind of public adventure. And so I went merrily on downtown and sat there in the budget hearing, along with the president of the student body, and some others.</p> </sp>

One possible way to synchronize audio and transcript has been introduced in Oral Histories of the American South, using <tt>&lt;milestone&gt;</tt> with a timestamp attribute: <milestone n="7248" unit="empty" type="stop" timestamp="00:08:54"/>

Level 4 Verse
Use <tt>&lt;lg&gt;</tt> and <tt>&lt;l&gt;</tt> as in Level 3. In addition, use the <tt>rend</tt> attribute to indicate lines that are indented.

For example, Fit the First: THE LANDING

<lg> <l n="1.1">"Just the place for a Snark!" the Bellman cried,</l> <l rend="margin-left: 0.5in" n="1.2">As he landed his crew with care;</l> <l n="1.3">Supporting each man on the top of the tide</l> <l rend="margin-left: 0.5in" n="1.4">By a finger entwined in his hair.</l> </lg>

<lg> <l n="2.1">"Just the place for a Snark! I have said it twice:</l> <l rend="margin-left: 0.5in" n="2.2">That alone should encourage the crew.</l> <l n="2.3">Just the place for a Snark! I have said it thrice:</l> <l rend="margin-left: 0.5in" n="2.4">What I tell you three times is true."</l> </lg>

[ETC....]

Source: Lewis Carroll’s The Hunting of the Snark

Level 4 Milestones
Instead of using the <tt>milestone</tt> element available in TEI, use <tt>&lt;ab type="typography"&gt;</tt>. The content of this element is the character(s) or device used to mark the milestone in the source document. For example:

<ab type="typography">*****</ab>

Level 4 Alger Hiss document
Note that the soft hyphen character is displayed as an entity reference because this character will not display in many web browsers.

<TEI xml:id="project_document_identifier" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader xml:lang="en"> </teiHeader> <text xml:lang="en"> <pb n="113" facs="./pageImages/AH4_0113.jpg" ed="typed"/> <pb n="118" ed="subsequent"/> POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S <lb/>CONVICTION SHOULD BE VACATED; ALTERNATIVELY, <lb/>DISCOVERY AND A HEARING SHOULD BE ORDERED. The nature and extent of surveillance of Hiss, his <lb/>family and associates was not known at the time of trial by <lb/>the defense. Even now, with the release of some of the govern&#xAD; <lb/>ment documents concerning FBI investigative techniques regarding <lb/>Hiss, the full extent of surveillance -- wiretapping, mail open&#xAD; <lb/>ings, mail covers, physical surveillance, and other intrusive <lb/>techniques -- is still not 'clear. Nevertheless, it is apparent <lb/>that information gathered through the exploitation of unlawful <lb/>wiretaps and other illegal surveillance was used at trial and <lb/>consequently the conviction must be reversed. Alternatively, <lb/>further discovery and a hearing is essential to a fair deter&#xAD; <lb/>mination regarding these issues. FBI surveillance of Hiss began in earnest in 1941 with <lb/>the institution of a mail cover on his incoming correspondence <lb/>at his home in connection with an FBI investigation of possible <lb/>Hatch Act violations. CN Ex. 98A. Another mail cover was placed <pb n="114" facs="./pageImages/AH_0114.jpg"/> <pb n="119" ed="subsequent"/> on the Hiss mail in 1945, and at the same time the FBI obtained <lb/>toll call records from the Hiss residence Telephone for the <lb/>years 1943 and 1944 as well. CN Ex. 99. In September, 1945, <lb/>the FBI intercepted telegrams to Hiss as well. CN Ex. 100. In late November, 1945, FBI surveillance of the Hiss <lb/>residence in Washington, D.C., escalated.
For the third time, <lb/>a mail cover was instituted beginning on November 28, 1945, <lb/>which was continued at least until 1946. CN Ex. 101 at p. 70; <lb/>CN Ex. 102. Continuous physical surveillance of Hiss was begun <lb/>as well. CN Ex. 101 at p. 72. Although this twenty-four-hour <lb/>surveillance was discontinued on December 14, 1945, physical <lb/>surveillance was conducted frequently at various times until <lb/>September, 1947.<note place="foot" anchored="true" n="68">Also before 1947, a letter from Priscilla Hiss addressed <lb/>to her son, Timothy Hobson, was intercepted and its contents <lb/>read. CN Ex. 100A at p. 167. In approximately March, 1947, <lb/>a letter from a Michael Greenberg addressed to petitioner re&#xAD; <lb/>garding an application for employment with the United Nations <lb/>was also intercepted, in a manner not revealed by the docu&#xAD; <lb/>ments. CN Ex. 100B</note> CN Ex. 102; CN Ex. 103. The most intrusive invasion of petitioner's rights <pb n="115" facs="./pageImages/AH_0115.jpg"/> <pb n="120" ed="subsequent"/> <lb/>occurred from December 13, 1945 until the Hisses moved from <lb/>Washington, D.C. to New York City on September 13, 1947. A <soCalled>technical surveillance</soCalled>, -- a wiretap -- was placed on the Hiss <lb/>telephone at their residence on P Street-in Washington, D.C. <lb/>The logs of this surveillance constitute twenty-nine volumes <lb/>of FBI serials and are roughly 2,500 pages in length, in which <lb/>an enormous amount of information concerning the Hisses' per&#xAD; <lb/>sonal lives, relationships with friends and associates, and <lb/>habits is recorded. The wiretap was installed following FBI Director Hoover's <lb/>application to the Attorney General for authorization, <note place="bottom" anchored="true" n="69">Hoover's initial request was answered by a note reques&#xAD; <lb/>ting information on Hiss. CN Ex. 104.
Additional information <lb/>was furnished by letter dated November 30, 1945. CN Ex. 105.</note> <lb/>although no written authorization appears in the documents released to <lb/>Hiss. The purpose of the application was to gather information <lb/>regarding Hiss' alleged contacts with Soviet espionage agents and <lb/>communists in government service, general allegations which had <lb/>been made by Elizabeth Bentley and Chambers. As one would expect, the interception of every telephone </text> </TEI>

LEVEL 5: Scholarly Encoding Projects
Level 5 texts are those that require substantial human intervention by encoders with subject knowledge. These texts might include encodings of semantic, linguistic, prosodic, or other features well beyond the basic structural elements discussed in Levels 1-4 above. They might also include elements for editorial, critical, or analytical additions; manuscript descriptions; translations; or other textual apparatus. It is impossible to make concrete recommendations for encoding at this level since the scholarly analysis required is usually specific to each project; instead, Level 5 offers the full set of P5 elements as needed.

Reference

 * Complete P5 Guidelines

Purpose
To create deeply analytical encoded texts that might be appropriate for specific research purposes, as part of a scholarly publishing project, or for any other encoding practices in library-based text encoding.

Rationale
A significant number of library-based projects engage in high-level analytical text encoding as part of their efforts in digitization, scholarly editing, academic support, or other research. Level 5 is intended to represent that work, which can take advantage of the full richness of the complete TEI Guidelines, while still acknowledging the impact of library-specific practices on encoded text that is created under the auspices of a library.

The specific influences of library practice on a Level-5 encoded text are expressed primarily in adherence to the General Recommendations and TEI Header sections above.

Element Recommendations and Examples
Because of the vast range of possibilities for Level-5 encoding, we have chosen to provide neither a list of recommended elements nor any specific examples for this Level.

Please refer to the TEI Header section above for examples of <tt>&lt;teiHeader&gt;</tt> element usage, and to the General Recommendations section and the Complete TEI P5 Guidelines for element recommendations and usage examples within the <tt>&lt;text&gt;</tt> element.

Acknowledgments
This document is the work of a group of individuals with a range of experience in TEI text encoding, who came together under the umbrellas of the TEI Special Interest Group on Libraries and the Digital Library Federation. We would like to thank and acknowledge all of those who have given their time and expertise to develop these best practices.

The individuals who have contributed to the writing of this document are:


 * Syd Bauman, Brown University
 * Michelle Dalmau, Indiana University
 * Matthew Gibson, University of Virginia
 * Kevin Hawkins, University of Michigan
 * Lisa McAulay, University of California, Los Angeles
 * Renee McBride, University of North Carolina, Chapel Hill
 * Melanie Schlosser, Ohio State University
 * Natasha Smith, University of North Carolina, Chapel Hill
 * Vitus Tang, Stanford University
 * Richard Wisneski, Case Western University
 * Glen Worthey, Stanford University

The individuals who have contributed to the planning of this document are:


 * Syd Bauman, Brown University
 * Michelle Dalmau, Indiana University
 * Matthew Gibson, University of Virginia
 * Kevin Hawkins, University of Michigan
 * Lisa McAulay, University of California, Los Angeles
 * Chris Powell, University of Michigan
 * Andrew Rouner, Washington University in St. Louis
 * Melanie Schlosser, Ohio State University
 * Natasha Smith, University of North Carolina, Chapel Hill
 * Perry Willett, California Digital Library
 * Richard Wisneski, Case Western University
 * Glen Worthey, Stanford University

The individuals who have contributed to the copyediting of this document are:
 * Susan Lorand, University of Michigan
 * Becky Welzenbach, University of Michigan

Lastly, we would like to thank the Digital Library Federation (DLF) for sponsoring two in-person meetings as part of the Spring 2008 Forum in Minneapolis, Minnesota, and the Spring 2009 Forum in Raleigh, North Carolina, in support of our revision work. The DLF also provided teleconferencing support for our regularly scheduled meetings.

Appendix A: History of This Document
This document was formerly known as "TEI Text Encoding in Libraries Guidelines for Best Encoding Practices".

The Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (referred to as the TEI Guidelines) were first published in 1994 and represent a tremendous achievement in electronic text standards by providing a highly sophisticated structure for encoding electronic text. Digital librarians have benefited greatly from the standardization provided by these guidelines, and the potential for interoperability and long-term preservation of digital collections facilitated by their wide adoption.

In 1998, the Digital Library Federation (DLF) sponsored the TEI and XML in Digital Libraries Workshop at the Library of Congress to discuss the use of the TEI Guidelines in libraries for electronic text, and to create a set of best practices for librarians implementing them. From this workshop, three working groups were formed, the members of which represented some of the largest and most mature digital library programs in the U.S.

Group 1 was charged to recommend some best practices for TEI header content and to review the relationship between the Text Encoding Initiative header and MARC. To this end, representatives of the University of Virginia Library and the University of Michigan Library gathered in Ann Arbor in early October 1998 to develop a recommended practice guide. This work was assisted by similar efforts that had taken place in the United Kingdom under the auspices of the Oxford Text Archive the previous year. The section on the header is based on a draft of those recommended practices. It was submitted to various constituencies for comment. In 2008 and 2009, it was heavily revised by Melanie Schlosser, Kevin Hawkins, and other members of the TEI SIG on Libraries.

Group 2 was charged with developing a set of recommendations for libraries using the TEI Guidelines in electronic text encoding. This group included the following representatives from six libraries:

 * LeeEllen Friedland, Library of Congress
 * Nancy Kushigian, University of California, Davis
 * Chris Powell, University of Michigan
 * David Seaman, University of Virginia
 * Natasha Smith, University of North Carolina, Chapel Hill
 * Perry Willett, Indiana University (chair)

At the ALA Midwinter Meeting (January 1999), the DLF task force revised a draft set of best practices, called TEI Text Encoding in Libraries: Guidelines for Best Practices (often referred to as <i>TEI in Libraries Guidelines</i>). The revised recommendations were circulated to the conference working group in May 1999 and presented at the joint annual meeting of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing in June 1999. Version 1.0 was circulated for comments in August 1999. These guidelines were endorsed by the DLF, and have been used by many digital libraries, including those of the task force members, as a model for their own local best practices. Libraries, museums, and end-users have benefited from a set of best practices for electronic text in a number of ways, including better interoperability between electronic text collections, better documented practices among digital libraries, and a starting point for discussion of best practices with commercial publishers regarding electronic text creation.

Written in 1998, this first iteration of TEI in Libraries Guidelines made no mention of XML, XSLT, or any of the other powerful tools that have now become common parlance and practice in creating digital documents and collections. Based on these important changes in markup technology, it came to the attention of the DLF and members of the original Task Force that the TEI in Libraries Guidelines required substantial revision. In 2002, the TEI Consortium published a new edition of the complete TEI Guidelines that conformed to XML specifications. In order to remain useful, the <i>TEI in Libraries Guidelines</i> had to be updated to reflect these developments.

Furthermore, librarians need more guidance than the original TEI in Libraries Guidelines provided. There are many library-specific encoding issues which need to be addressed and documented to ensure consistency. The intention of this document is to provide recommended paths of encoding for these issues.

In addition, these library guidelines have the potential to be much more useful if they can serve as a training document from which librarians can learn about text encoding and addressing particular encoding challenges. To fulfill this role, the guidelines require more examples and detailed explanations, giving documentation of the use of TEI in a library context. Librarians also need a set of standards and best practices for vendors and publishers who create electronic text for digital libraries, so that these collections adhere to the same archival standards as locally-created electronic text collections. With detailed guidelines that could serve as an encoding specification, librarians might encourage vendors to follow the principles in these standards, to facilitate the long-term preservation of commercially published electronic text collections, and more readily allow for cross-collection searching.

In order to facilitate the evolution of this document, another DLF-sponsored Task Force, some of whose representatives were on the original Task Force, met on October 24-25, 2003 at the Cosmos Club in Washington, D.C.:

 * Richard Gartner, Oxford University Library
 * Matthew Gibson, University of Virginia Library
 * Kirk Hastings, California Digital Library
 * Chris Powell, University of Michigan
 * Merrilee Proffitt, RLG
 * David Seaman, Digital Library Federation
 * Natasha Smith, University of North Carolina, Chapel Hill
 * Perry Willett, Indiana University (chair)

These representatives met to revise the original TEI in Libraries Guidelines in order that they:

 * reflect changes occurring within the text encoding world generally and within the TEI community specifically
 * further illuminate the different levels of encoding by offering clearer and more robust examples.

After producing Version 2.0 of the Guidelines, this group (with some changes in membership) met again at the Cosmos Club on February 13-14, 2006. Those in attendance were:

 * Syd Bauman, The TEI Consortium
 * Richard Gartner, Oxford University Library (by phone)
 * Matthew Gibson, University of Virginia (chair)
 * Chris Powell, University of Michigan
 * Merrilee Proffitt, RLG
 * David Seaman, Digital Library Federation
 * Natasha Smith, University of North Carolina, Chapel Hill
 * Perry Willett, University of Michigan

Appendix B: Formal Specification

(This section will be generated automatically once ODDs are created.)