Best Practices for TEI in Libraries


Introduction
These recommendations are for libraries using the Text Encoding Initiative’s Guidelines for Text Encoding and Interchange (P5). They are intended for use in large, library-based digitization projects, but may be useful in other scenarios as well.

There are many different library text digitization projects, for different purposes. With this in mind, the Task Force has attempted to make these recommendations as inclusive as possible by developing a series of encoding levels. These levels are meant to allow for a range of practice, from wholly automated text creation and encoding, to encoding that requires expert content knowledge, analysis, and editing.

Recommendations for Levels 1-4 are intended for projects wishing to create encoded electronic text with structural markup, but minimal semantic or content markup. Also, the encoding levels are cumulative: encoding requirements at each level incorporate the requirements of lower levels. Levels 1-4 allow the conversion and encoding of texts to be performed without the assistance of deep content knowledge and can be enriched with more markup at any time. Level 5, in contrast, requires scholarly analysis.

General Recommendations
An encoding project should strive for internal consistency and for use of standards so that the data can be modified or enhanced in the future with ease. In cases where local practice deviates from standards, there should at least be internal consistency in the local practice.

When reformatting to digital media using any level of encoding, the electronic text should begin with the transcription of the first word on the first leaf of the original work. It may be impractical or undesirable to transcribe and encode certain features of the text, such as publisher's advertisements or indexes, but if at all possible, they should be included as links to page images. Any omissions of material found in the original work should be noted in the &lt;editorialDecl&gt; in the TEI header.

A filename scheme should be established for the project. Filenames should ensure cross-platform compatibility: use only the characters A-Z, a-z, and 0-9 in filenames, and avoid file extensions longer than three characters.

An encoding project should use either numbered divisions (i.e., &lt;div1&gt;, &lt;div2&gt;, etc.) or unnumbered divisions (i.e., &lt;div&gt;), but not both. This applies both within a TEI document (i.e., within &lt;front&gt;, &lt;body&gt;, and &lt;back&gt;, even if nested within &lt;group&gt; or &lt;floatingText&gt;) and across TEI documents in any given collection. Keep in mind that the numbering of divisions starts over (at &lt;div1&gt;) within &lt;floatingText&gt;; any software that expects precise nesting of numbered divisions within a document will need to account for this.

Whether numbered or unnumbered divisions are used, the @type attribute of the division element is not recommended at level 1, is optional at level 2, is recommended at level 3, and is required at levels 4 and 5.

Page breaks should be encoded using the &lt;pb&gt; element, which should mark the top of a page (i.e., the text of page seven should immediately follow &lt;pb n="7"/&gt;), and should always be contained within a division for ease of retrieval with indexing software. For example, a page break that occurs between chapters 2 and 3 should be encoded near the top of the &lt;div&gt; that holds chapter 3 (rather than near the bottom of the &lt;div&gt; that holds chapter 2).
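The page-break rule above can be sketched as follows. The chapter headings, page numbers, and the choice of numbered divisions are hypothetical and serve only to show the pb element sitting inside the division whose text begins on the new page.

```xml
<body>
  <div1 type="chapter" n="2">
    <head>Chapter 2</head>
    <p>[text of chapter 2, ending at the foot of page 6]</p>
  </div1>
  <div1 type="chapter" n="3">
    <!-- The break falls between chapters, so it is recorded at the top of
         the division that holds the text of the new page. -->
    <pb n="7"/>
    <head>Chapter 3</head>
    <p>[the text of page seven follows the pb immediately]</p>
  </div1>
</body>
```

Placing the pb inside the following division means that a search restricted to chapter 3 also retrieves the page number on which that chapter begins.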

Structure of a TEI Document
A valid TEI XML document must contain the following elements:

 * a root TEI element, containing:
   * a teiHeader element
   * a text element

Within those two elements, there are additional requirements, which are discussed in these guidelines and in the complete TEI P5 Guidelines. The teiHeader element serves as a description of the document presented in the text element. The text element contains the encoded text document.
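As a minimal sketch of the structure just described, the skeleton below shows a root TEI element with its two required children. The bracketed placeholders are illustrative only; the three children of fileDesc shown here are the ones the P5 Guidelines require, and a real header will normally carry far more detail.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Metadata describing the electronic text and its source -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>[title of the electronic text]</title>
      </titleStmt>
      <publicationStmt>
        <p>[publication information for the electronic text]</p>
      </publicationStmt>
      <sourceDesc>
        <p>[description of the source document]</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- The encoded text itself -->
  <text>
    <body>
      <p>[encoded text]</p>
    </body>
  </text>
</TEI>
```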

Reference

 * Chapter 2, TEI Header, P5 Guidelines

The TEI Header
The TEI Header is a metadata record that describes an electronic text encoded according to the TEI specification. The purpose of the TEI Header is to declare the bibliographic information related to the electronic document and, if appropriate, the bibliographic data for the original analog source document from which the electronic edition was created. The TEI Header often includes a description of the encoding decisions or practices used to create the electronic document. Since the advent of the TEI twenty years ago, many people have described the TEI Header as a title page for the electronic edition, and many librarians have compared it to traditional library catalog records (MARC).

As with any descriptive metadata, the metadata in the TEI Header can serve multiple audiences. In the local context, a TEI Header can be used to automatically create indexes (author lists, title lists) for a collection of electronic texts using specific software. The TEI Header can also be used to create other metadata records, either manually or automatically, such as MARC, Dublin Core, or other standards. The metadata in the TEI Header is useful to internal staff, colleagues within and beyond the institution, as well as to a variety of end users. Among institutions, the TEI Header can be used to exchange metadata and build meta-collections or integrated search portals.


The TEI Header and MARC
While a TEI header is often perceived as similar to or at least related to a MARC record, a TEI header does not typically have a one-to-one correspondence with a MARC record. One TEI header may be described by multiple MARC analytic records, or one MARC record may be used to describe a collection of TEI documents with individual headers. Furthermore, while a MARC record records metadata about a bibliographic entity in a library's collection, a TEI header records information both about an encoded text and about the source document for that encoded text.

Each institution and even each project may have a different approach to the way electronic texts are created in TEI and then represented in a larger public catalog through MARC. At one institution, the same unit (e.g., a cataloging department) may be responsible for creating both TEI Headers and MARC records, while at other institutions the work may be distributed among different units. Within the library domain, metadata or cataloging experts are usually required for at least review and standardization of both the TEI Header and the MARC record.

The TEI Header and Other Metadata Schemas
Several other descriptive metadata schemas are prevalent within the library domain, including Dublin Core (DC), Dublin Core Qualified (DCQ), and the Metadata Object Description Schema (MODS). Each of these schemas contains elements that capture the same data as many of the elements in the TEI Header. As with MARC, a variety of automated or manual workflows can be implemented to crosswalk metadata from one standard to another and provide for increased sharing of metadata about electronic texts in larger contexts. In particular, DC and MODS are common schemas used by the Open Archives Initiative (OAI) and may be particularly valuable for sharing metadata across institutions.

Determining Data Values for the TEI Header
Within the library domain, there are several authoritative publications on how to create bibliographic and descriptive metadata for objects. These are usually called “content standards”; two prominent examples are the Anglo-American Cataloguing Rules, Second Edition (AACR2) and the International Standard Bibliographic Description for Electronic Resources (ISBD(ER)). These standards are extensive and outline a set of rules that enforce consistency across a voluminous amount of metadata.

Perhaps the primary purpose of these content standards is to give rules for what sources of information may be used in transcribing or generating metadata about a bibliographic entity. Within an electronic context, the analog object may not be available, so the TEI Header creator will need access to digitized images or other verifiable information to create accurate metadata.

The following sources of information are recommended in creating the TEI Header:

 1. For an electronic document with a digitized title page and title page verso:
    a. The chief source of information is the information coded as the title page.
    b. Use added information from an originating paper document only if absolutely certain it is the source.
 2. If there is no digitized title page but the header creator has satisfactory evidence of the source document, the header creator should refer to the source document for metadata creation. The lack of a title page may be for one of many reasons, among them: the original document is a manuscript item or the electronic edition is a portion of the original object (a poem or short story that was published in a collection or an article from a serial). In all cases, it is recommended that important bibliographic evidence, such as a digitized image of the title page and title page verso for a collection, be provided to the header creator, even if just a piece of the collection is used.
 3. If no title page is present and there is no evidence of a source document, the header creator:
    a. May assign a title and author, if appropriate.
    b. Should enclose the information in brackets, using the standard English language convention for editorial interjections.
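For the last case in the list above, a supplied title and author might be recorded as in the sketch below. The title and author values are invented for illustration; only the bracketing convention comes from the recommendation.

```xml
<titleStmt>
  <!-- Both values were assigned by the header creator, not transcribed
       from a title page, so they are enclosed in square brackets. -->
  <title>[Letter concerning a land survey]</title>
  <author>[Unknown surveyor]</author>
</titleStmt>
```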

Element Recommendations for the &lt;teiHeader&gt;


│  └  &lt;classDecl&gt;&lt;taxonomy xml:id="____"&gt;&lt;bibl&gt; Use to document classification schemes used in the header or body of the TEI document. For example:

├  &lt;profileDesc&gt;

└ &lt;textClass&gt; The elements below are contained within this element.

│   ├   &lt;classCode scheme="___"&gt; True classification numbers as opposed to call numbers can be entered here. The value of the scheme attribute corresponds to a classification scheme defined previously in classDecl. Example:

│  └  &lt;keywords scheme="____"&gt; Repeat this element as many times as there are keyword schemes. If the child &lt;term&gt; elements contain terms from a controlled vocabulary, indicate that controlled vocabulary through the scheme attribute. The value of the scheme attribute corresponds to a classification scheme defined previously in &lt;classDecl&gt;. Example:

│  └  &lt;term&gt; Use for terms from controlled or uncontrolled vocabularies, as indicated in the parent &lt;keywords&gt; element.
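The relationship between the classification elements described above can be sketched as follows. The xml:id values, the classification number, and the subject terms are illustrative assumptions, not values taken from a real record; the point is that each scheme attribute points back to a taxonomy declared in classDecl.

```xml
<!-- Fragment of a teiHeader; fileDesc is omitted for brevity. -->
<encodingDesc>
  <classDecl>
    <taxonomy xml:id="LCC">
      <bibl>Library of Congress Classification</bibl>
    </taxonomy>
    <taxonomy xml:id="LCSH">
      <bibl>Library of Congress Subject Headings</bibl>
    </taxonomy>
  </classDecl>
</encodingDesc>
<profileDesc>
  <textClass>
    <!-- A classification number (not a call number), tied to the LCC taxonomy -->
    <classCode scheme="#LCC">PS3539</classCode>
    <!-- One keywords element per vocabulary, tied to the LCSH taxonomy -->
    <keywords scheme="#LCSH">
      <term>Boys -- Fiction</term>
      <term>Middle West -- Fiction</term>
    </keywords>
  </textClass>
</profileDesc>
```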

└ &lt;revisionDesc&gt;

└ &lt;change when="YYYY-MM-DD" who="URI"&gt; Create a &lt;change&gt; element to record each significant change to the TEI document, in reverse chronological order (i.e., most recent first). A prose description of the change is recorded as the content of each &lt;change&gt; element. This prose may contain lists for organization, and phrase-level markup (like &lt;gi&gt;, &lt;ptr&gt;, or &lt;date&gt;), but not paragraphs.

The date of the change in ISO 8601 form (YYYY-MM-DD) should be recorded in the when= attribute.

The person who is responsible for making the change is indicated by the who= attribute of &lt;change&gt;. Its value is a URI that points to a &lt;respStmt&gt; or &lt;person&gt; element that encodes information about the responsible party. Note that this reference is a URI reference and not an ID/IDREF reference, and thus is not checked by validation software. Small projects sometimes take advantage of this by putting information into the URI itself, and not having a &lt;respStmt&gt; or &lt;person&gt; element. E.g., who="#Kevin_Hawkins".
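A revision history following these recommendations might look like the sketch below. The dates, the identifiers jdoe and asmith, and the descriptions are hypothetical; the who= values are assumed to point to respStmt or person elements defined elsewhere in the header.

```xml
<revisionDesc>
  <!-- Most recent change first -->
  <change when="2010-06-15" who="#jdoe">Corrected page-break numbering; see
    <ptr target="#p00000145"/>.</change>
  <change when="2010-03-02" who="#jdoe">Added <gi>head</gi> elements to all
    chapter divisions.</change>
  <change when="2009-11-20" who="#asmith">Converted OCR output to TEI P5.</change>
</revisionDesc>
```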

Sample TEI Header
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Lincoln and Seward.</title>
      <author><persName>Welles, Gideon, 1802-1878.</persName></author>
    </titleStmt>
    <publicationStmt>
      <publisher>University of Michigan, Digital Library Initiatives</publisher>
      <availability>
        <p>These pages may be freely searched and displayed. Permission must be received for subsequent distribution in print or electronically. Please go to http://www.umdl.umich.edu/ for more information.</p>
      </availability>
      <date>1996</date>
    </publicationStmt>
    <seriesStmt>
      <title level="s" type="main">Making of America</title>
    </seriesStmt>
    [sourceDesc and remaining header elements go here]
  </fileDesc>
</teiHeader>

Level 1 Example: Basic Structure

<TEI xml:id="someid" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>[Source and processing information goes here]</teiHeader>
  <text>
    <body>
      <ab>
        <pb xml:id="p00000001" n="1"/>
        [main body of the unmarked up plain text begins here]
        <pb xml:id="p00000002" n="2"/>
        [more plain text goes here with appropriate page breaks interspersed]
        ...
        <pb xml:id="p00000145" n="145"/>
        [more plain text]
        <pb xml:id="p00000146" n="146"/>
        [text ends here]
      </ab>
    </body>
  </text>
</TEI>


Reference

 * Chapter 3, Elements Available in All TEI Documents (see "paragraphs" and "milestone" elements)
 * Chapter 4, Default Text Structure

Purpose
To create electronic text for full-text searching, linking to page images, and identifying simple structural hierarchy to improve navigation. (For example, you can create a table of contents from such encoding.)

Rationale
The text is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. However, the text could stand alone as electronic text (without page images) if the accuracy of its contents is suitable to its intended use and it is not necessary to display low-level typographic or structural information. Level 2 requires a set of elements more granular than those of Level 1, including bibliographic or structural information below the monographic or volume level. One of the motivations for using Level 2 is avoiding expensive analysis of textual elements and/or expensive double-keying or detailed proofreading of automatic OCR.

Though texts at Level 2 can be created and encoded by automated means, based on the typographic elements in the electronic file (for example, bold centered text at the top of the page surrounded by whitespace indicates a new chapter heading, and thus a new division), such automation is unlikely to be fully reliable across a large body of material, especially if the materials date from before 1900. Level 2 encoding therefore requires some human intervention to identify each textual division and heading. Level 2 texts do not require any special knowledge or manual intervention below the section level.

For the most part, Level 2 texts are not intended to be displayed separately from their page images. Level 2 encoding of sections and headings provides greater navigational possibilities than Level 1 encoding, and enables searching to be restricted within particular textual divisions (for example, searching for two phrases within the same chapter).

Level 2 is most suitable for projects in which:


 * a large volume of material is to be made available online quickly
 * a digital image of each page is desired
 * the material is of interest to a large community of users who wish to read texts that allow keyword searching
 * rudimentary search and display capabilities based on the large structures of the text are desired
 * each text is checked to ensure that divisions and headers are properly identified
 * extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added at a later date

All elements specified in Level 1 plus the following:

Level 2 Example: Basic Structure
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.tei-c.org/ns/1.0 http://www.tei-c.org/release/xml/tei/custom/schema/xsd/teilite.xsd">
  <teiHeader type="text">[See above for an example of a TEI Header]</teiHeader>
  <text>
    <!-- optional -->
    <front>[titlepage information, table of contents, prefaces, etc.]</front>
    <body>
      <div1>
        <pb xml:id="p21198-zz0002mpqr" n="1"/>
        <head>A DISSERTATION UPON Religious Worship.</head>
        <ab>[a whole section is contained within this anonymous block tag; interspersed with <pb> elements pointing to page images]<pb xml:id="p21198-zz0002mpwb" n="2"/></ab>
      </div1>
      <div1>
        <pb xml:id="p21198-zz0002mq0c" n="27"/>
        <ab><figure xml:id="ill005"><graphic url="imag1.jpg"/></figure></ab>
      </div1>
      <div1>
        <head>CHAP. I. The Origin of the Customs and Ceremonies of the Jews. their federal Divisions; and the various Particulars wherein they differ.</head>
        <ab>[all the paragraphs of chapter one go here with page breaks inserted]</ab>
      </div1>
    </body>
    <!-- optional -->
    <back>[back matter]</back>
  </text>
</TEI>


Reference

 * Chapter 4, Default Text Structure, P5 Guidelines
 * Chapter 6, Verse, P5 Guidelines
 * Chapter 14, Tables, Formulæ, and Graphics, P5 Guidelines
 * Chapter 16, Linking, Segmentation, and Alignment, P5 Guidelines (for handling notes)

Purpose
To create text that can stand alone as electronic text and identifies hierarchy (logical structure) and typography without content analysis being of primary importance.

Rationale
Level 3 texts can be created by conversion from an electronic source such as HTML or word-processed documents, or from a print source with the automatic generation of full text by Optical Character Recognition software. Level 3 texts can also be created from scratch (e.g., transcription, born digital, etc.). Encoding at this level offers the advantage of the TEI header, interoperability with other TEI collections, and extensibility to higher levels of encoding. Level 3 generally requires some human editing, but the features to be encoded are determined by the logical structure and appearance of the text and not specialized content analysis.

Level 3 texts identify front and back matter, and all paragraph breaks. The finer granularity of encoding these features, as well as figures, notes, and all changes of typography, allows a range of options for display, delivery, and searching. For example, one has the option of identifying and, therefore, specifying the display characteristics of different typographic styles, and regularizing the display and placement of note text.

Level 3 texts can stand alone as text without page images and, therefore, can be uploaded, downloaded and delivered quickly, and require less storage space than digital collections with page images. However, the simple level of structural analysis and absence of specialized content analysis reflected in Level 3 encoding may make it desirable for some, depending on project priorities, to include page images in order to provide users with a fuller set of resources.

Level 3 is most suitable for projects with the following characteristics:

 * the material is of interest to a large community of users who wish to read texts that allow keyword searching
 * some sophistication of display, delivery, and searching based on structure of the text is desired
 * each text will be checked to ensure that encoding decisions have been made appropriately
 * the users of the texts may have limited storage or display capabilities
 * the creator of the texts has limited or no ability to provide content expertise to analyze, tag, or review texts
 * extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added at a later date

All elements specified in Levels 1 and 2, plus the following:

General Level 3 Recommendations
Front Matter

&lt;div type="contents"&gt;: Use lists to mark up the table of contents with the &lt;ptr&gt; tag used to reference the starting page number. The &lt;ptr&gt; tag can reference the &lt;pb&gt; identifier or an identifier (e.g., @xml:id) placed in the corresponding division of text.

Body

&lt;note&gt;: It may be desirable to move footnotes from their original location in the text. If left at the bottom of a page, a note may become included in another paragraph or section of the encoded text, and thus separated from its reference. There are options for placement of footnotes if they are moved:

 * Inline. The note is inserted at the point of reference. An n attribute records the value of the note reference if there is one.
 * End-of-Division. Notes are moved to the end of the corresponding division of the text (e.g., end of chapter).
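The two placement options above can be sketched as follows. The sentence, the note text, and the identifier ch1-n1 are hypothetical; the end-of-division variant assumes an anchor element marks the point of reference and the note points back to it with a target attribute.

```xml
<!-- Option 1: inline, at the point of reference -->
<p>The treaty was signed the following spring.<note n="1">The date is
  disputed.</note> Trade resumed soon after.</p>

<!-- Option 2: moved to the end of the division -->
<div1 type="chapter">
  <head>Chapter 1</head>
  <p>The treaty was signed the following spring.<anchor xml:id="ch1-n1" n="1"/>
    Trade resumed soon after.</p>
  <note target="#ch1-n1" n="1">The date is disputed.</note>
</div1>
```

Either way the note stays inside the division it belongs to, so it cannot be absorbed into an unrelated paragraph when page boundaries are ignored.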

Back Matter

&lt;div type="index"&gt;: Use lists (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#CONOIX) to mark up index entries, with the &lt;ref&gt; tag used to reference the corresponding page number. Add the target attribute (@target) to reference the &lt;pb&gt; identifier to generate links from the index into the text proper.
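An index encoded along these lines might look like the sketch below. The index terms and page identifiers are invented, and the example assumes that pb elements carrying these xml:id values exist elsewhere in the same document.

```xml
<back>
  <div1 type="index">
    <head>INDEX</head>
    <list>
      <!-- Each page number is a ref whose target is the xml:id of a pb -->
      <item>Advertising, <ref target="#p00000012">12</ref>,
        <ref target="#p00000087">87</ref></item>
      <item>Binding, <ref target="#p00000145">145</ref></item>
    </list>
  </div1>
</back>
```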

Basic Structure: Prose (See full example)
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383"> <teiHeader> [stuff] </teiHeader> [figure] <titlePage>[text]</titlePage> [text] [text] [book title] [text] [text] [text] [text] [text] [text] [text] </TEI>

Basic Structure: Verse (See full example)
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
  <teiHeader> [info] </teiHeader>
  <titlePage>[text]</titlePage>
  [text] [text] [book title] [section title]
  THE DAYS GONE BY.
  <lg>
    <l>O the days gone by! O the days gone by!</l>
    <l>The apples in the orchard, and the pathway through the rye;</l>
    <l>The chirrup of the robin, and the whistle of the quail</l>
    <l>As he piped across the meadows sweet as any nightingale;</l>
    <l>When the bloom was on the clover, and the blue was in the sky,</l>
    <l>And my happy heart brimmed over in the happy days gone by.</l>
  </lg>
  <lg>[lines of poetry]</lg>
  <lg>[lines of poetry]</lg>
  <lg>[lines of poetry]</lg>
</TEI>

Table of Contents
CONTENTS I. A Boy and His Dog 3</hi> <ptr target="#VAA2383_011"/> II. Romance 12</hi> <ptr target="#VAA2383_020"/> III. The Costume 21</hi> <ptr target="#VAA2383_029"/> IV. Desperation 30</hi> <ptr target="#VAA2383_038"/> V. The Pageant of the Table Round 38</hi> <ptr target="#VAA2383_046"/>

Chapter with Letter
<pb xml:id="VAA2383_126" n="118"/> CHAPTER XIV MAURICE LEVY'S CONSTITUTION L</hi>O, SAM!" said Maurice cautiously. "What you doin'?"          Penrod at that instant had a singular experiencean intellectual shock like a flash of fire in the                    brain. Sitting in darkness, a great light flooded him with wild brilliance. He gasped!            "What you doin'?" asked Maurice for the third time, Sam Williams not having decided upon a reply.            <pb xml:id="VAA2383_127" n="119"/>           It was Penrod who answered.            "Drinkin' lickrish water," he said simply, and wiped his mouth with such delicious enjoyment that                        Sam's jaded thirst was instantly stimulated. He took the bottle eagerly from Penrod.             "A-a-h!" exclaimed Penrod, smacking his lips. "That was a good un!"                     Penrod uttered some muffled words and then waved both armseither in response or as an expression of his condition of mind; it may have been a gesture of despair. How much intention there was in                       this actobviously so rash, considering the position he occupiedit is impossible to say. Undeniably there must remain a suspicion of deliberate purpose. <pb xml:id="VAA2383_138" n="130"/> The damsel curtsied again and handed him the following communication, addressed to herself: <floatingText> "Dear madam Please excuse me from dancing the cotilo with you                                   this afternoon as I have fell off the barn                                 "Sincerly yours<lb/> "P ENROD</hi> S CHOFIELD</hi>." </floatingText>

Reference

 * Chapter 4, Default Text Structure, P5 Guidelines
 * Chapter 6, Verse, P5 Guidelines
 * Chapter 7, Performance Texts, P5 Guidelines
 * Chapter 14, Tables, Formulæ, and Graphics, P5 Guidelines
 * Chapter 16, Linking, Segmentation, and Alignment, P5 Guidelines
 * Chapter 3.3 Highlighting and Quotation, P5 Guidelines (see hi element)
 * Chapter 3.3.2 Emphasis, Foreign Words, and Unusual Language, P5 Guidelines (see foreign and emph elements)

Purpose
To create text that can stand alone as electronic text, identifies hierarchy and typography, specifies function of textual and structural elements, and describes the nature of the content and not merely its appearance. This level is not meant to encode or identify all structural, semantic or bibliographic features of the text.

Rationale
Greater description of function and content allows for:

 * flexibility of display and delivery
 * sophisticated searching within specified textual and structural elements
 * combining the broadest range of uses and audiences

Texts encoded at Level 4 are able to stand alone as part of a library collection, and do not require page images in order for them to be read by students, scholars and general readers. This level of TEI encoding allows them to be displayed or printed in a variety of ways suitable for classroom or scholarly use.

Level 4 texts contain elements and attributes that describe content. Features of the text that may contribute to meaning, such as indentation of verse lines and typographic change, are preserved. These are textual features that are not encoded at lower levels and that allow the text to be used and understood fully independent of images.

The ability to stand alone as text means that Level 4 texts are more nimble and robust for exercises such as format repurposing and textual analysis.

Finally, functionally accurate encoding in Level 4 texts allows them to be searched or displayed in sophisticated ways. For example, a searcher could limit his or her search in a dramatic text to stage directions or in a verse text to only first lines. In a political tract published by subscription, a search could be confined to names that appear in lists, thus limiting a search to names of people who subscribed to a particular volume. This ability to limit searches becomes more significant as textbases become larger, and thus is of great importance to the library community as it attempts to build into the initial design and implementation of textbases the features needed to enhance interoperability.

Level 4 is most suitable for projects with the following characteristics:

 * sophisticated search and retrieval capabilities are desired
 * the texts will be used for textual analysis
 * extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added by the scholarly community at a later date
 * the users of the texts may have limited storage or display capabilities

General Level 4 Recommendations
 * Typographically distinct text should be encoded as appropriate, e.g., with &lt;term&gt;, &lt;q&gt;, &lt;gloss&gt;, &lt;mentioned&gt;, &lt;soCalled&gt;, &lt;foreign&gt;, &lt;title&gt;, or &lt;emph&gt;. Any ambiguous emphasized text should be encoded as &lt;hi&gt; (e.g., &lt;hi rend="bold"&gt;).
 * It is recommended that the &lt;sic&gt; element be used to indicate typographic errors, with corrections (if any) recorded in a &lt;corr&gt; element paired with the &lt;sic&gt; inside a &lt;choice&gt; element. (P5 does not provide a corr= attribute on &lt;sic&gt;.)
 * &lt;titlePage&gt; should include the verso if present, divided by &lt;pb n="verso"/&gt;. Tables of contents, errata, subscription lists, and lists of "other titles by the same author" should each be included in a separate numbered division, as a &lt;list&gt; with &lt;item&gt;s. Frontispieces should be encoded as a &lt;figure&gt;, within a separate numbered division and &lt;p&gt;.
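A short sketch of phrase-level encoding at this level follows. The sentence is invented for illustration. The correction is paired with sic inside a choice element, which is how P5 associates an apparent error with its correction.

```xml
<p>The clerk entered each <term>folio</term>
  <foreign xml:lang="lat">seriatim</foreign> and copied the title of
  <title>Lincoln and Seward</title> in a <hi rend="bold">heavier</hi> hand,
  although the printer set
  <choice><sic>recieved</sic><corr>received</corr></choice> on the same page.</p>
```

Here hi is used only for the bold word, whose function is ambiguous; the other typographically distinct phrases receive elements that say what they are.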


Level 4 Prose
 * Letters that occur within the text body provide some challenges. It is recommended that quoted letters that occur as part of a text (and not collections of letters themselves) be encoded within &lt;q&gt;&lt;floatingText&gt;&lt;body&gt;&lt;div1 type="letter"&gt;, with &lt;opener&gt;, &lt;dateline&gt;, &lt;salute&gt;, &lt;signed&gt;, and &lt;closer&gt; included as appropriate. (In P5, an embedded text is carried by &lt;floatingText&gt;; &lt;text&gt; cannot appear inside &lt;q&gt;.)
 * Quotations that do not occur inline, but are set off typographically in some way, should be encoded as &lt;q&gt;.
 * Notes are to be encoded as described in Level 3.
 * Use &lt;argument&gt;, &lt;opener&gt;, &lt;epigraph&gt;, &lt;closer&gt;, &lt;trailer&gt;, &lt;add&gt;, &lt;del&gt;, and &lt;unclear&gt; as appropriate.
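The recommendation for quoted letters can be sketched with the letter from the Level 3 chapter example. The wording, including the original misspellings, is taken from that sample; the assignment of phrases to salute, closer, and signed is an assumption. The sketch uses floatingText, the P5 element for a text embedded within another text, as the container inside q.

```xml
<q>
  <floatingText>
    <body>
      <div1 type="letter">
        <opener>
          <salute>Dear madam</salute>
        </opener>
        <p>Please excuse me from dancing the cotilo with you this afternoon
          as I have fell off the barn</p>
        <closer>
          <salute>Sincerly yours</salute>
          <signed>Penrod Schofield</signed>
        </closer>
      </div1>
    </body>
  </floatingText>
</q>
```

Note that division numbering restarts at div1 inside the floatingText, as described in the General Recommendations.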


Level 4 Drama
 * Cast lists should be encoded as &lt;list&gt;s, with &lt;item&gt;s.
 * Speeches are encoded as &lt;sp&gt;, with speakers identified within &lt;speaker&gt; elements; stage directions are encoded as &lt;stage&gt; and enclose block level content describing scenery, etc.
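A minimal sketch of these drama recommendations follows. The play, its characters, and the division types are invented for illustration.

```xml
<front>
  <div1 type="castlist">
    <head>PERSONS OF THE PLAY</head>
    <!-- Cast list as a plain list with items -->
    <list>
      <item>LADY ANNE</item>
      <item>JAMES, a footman</item>
    </list>
  </div1>
</front>
<body>
  <div1 type="act" n="1">
    <head>ACT I</head>
    <!-- Block-level stage direction describing the scene -->
    <stage>A drawing room. Evening.</stage>
    <sp>
      <speaker>LADY ANNE</speaker>
      <p>Has the carriage come?</p>
    </sp>
    <sp>
      <speaker>JAMES</speaker>
      <stage>at the window</stage>
      <p>Not yet, my lady.</p>
    </sp>
  </div1>
</body>
```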


Level 4 Oral History
 * Speakers in interviews can be identified in the &lt;teiHeader&gt; in several ways:
   * In the &lt;profileDesc&gt;, in the &lt;particDesc&gt;, in a &lt;list&gt; with a &lt;name&gt; inside each &lt;item&gt;.
   * As a list of author &lt;name&gt;s within &lt;fileDesc&gt;&lt;titleStmt&gt;.
 * In either method, use an xml:id= on the &lt;name&gt; element to uniquely identify the individual participant.
 * Questions and answers from interviewees and interviewers are encoded as &lt;sp&gt;, with the speaker's name given in a &lt;speaker&gt; element. A who= attribute, carried by &lt;sp&gt; in P5, points to the xml:id= of the speaker in the list of interview participants.
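The linkage between the participant list and the speeches can be sketched as follows. The names and identifiers are hypothetical. Two details are assumptions based on P5 rather than on the text above: the list is wrapped in a p because particDesc does not take a list directly, and who= is placed on sp, which is the element that carries it in P5.

```xml
<!-- In the teiHeader -->
<profileDesc>
  <particDesc>
    <p>
      <list>
        <item><name xml:id="int1">Jane Doe</name>, interviewer</item>
        <item><name xml:id="resp1">John Smith</name>, interviewee</item>
      </list>
    </p>
  </particDesc>
</profileDesc>

<!-- In the text: each who= value points to a name in the header -->
<sp who="#int1">
  <speaker>Doe</speaker>
  <p>Where did you grow up?</p>
</sp>
<sp who="#resp1">
  <speaker>Smith</speaker>
  <p>On a farm outside of town.</p>
</sp>
```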

Level 4 Verse
 * All verse, even poems without separate stanzas or verse paragraphs, should be contained within a line group element &lt;lg&gt;. This will assist with automated processing and retrieval.
 * It is common to see informal divisions within poems, noted by a string of asterisks or periods. These should be encoded as &lt;milestone/&gt;s with attribute values of unit="typography" and n="" indicating the character used and its occurrence, e.g., &lt;milestone unit="typography" n="******"/&gt;.
 * It is recommended that indentation of &lt;l&gt; be recorded, using the rend attribute.
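The verse recommendations can be sketched with lines from the Level 3 verse sample. The indentation and the row of asterisks are invented here purely to show the markup and do not reflect the printed poem; the indent value follows the rendition-ladder form described under General Guidelines for Attribute Usage.

```xml
<lg>
  <l>O the days gone by! O the days gone by!</l>
  <!-- One tab stop of indentation recorded on the line -->
  <l rend="indent(1)">The apples in the orchard, and the pathway through the rye;</l>
  <!-- An informal break printed as a row of six asterisks -->
  <milestone unit="typography" n="******"/>
  <l>The chirrup of the robin, and the whistle of the quail</l>
  <l rend="indent(1)">As he piped across the meadows sweet as any nightingale;</l>
</lg>
```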

Level 4 Front and Back Matter
 * It is recommended that all prefaces, tables of contents, afterwords, appendices, endnotes and apparatus be encoded. For publisher's advertisements, indexes, and glossaries or other front or back matter that isn't considered of primary importance to the text, there are three options:
   * Fully transcribe and encode
   * Link to page images (may include an unencoded transcription)
   * Omit, noted in &lt;editorialDecl&gt;

LEVEL 5: Scholarly Encoding Projects
Level 5 texts are those that require subject knowledge, and encode semantic, linguistic, prosodic, or other elements beyond a basic structural level.

General Guidelines for Attribute Usage
Some general advice on the use of particular attributes follows. <ul> <li>type=: Constructing a list of acceptable attribute values for type that could find wide agreement is impossible. Instead, it is recommended that projects describe the type= attribute values used in their texts in the project ODD file or other documentation and that this list be made available to people using the texts. See ABC for Book Collectors by John Carter (8th edition, New Castle, Del. and London: Oak Knoll Books and the British Library, 2004, available online at http://www.ilab.org/images/abcforbookcollectors.pdf ) for a list of standard names and definitions of bibliographic features of printed books. For those elements where type is not required, such as &lt;head&gt; and &lt;title&gt;, use the attribute values for subtitles and additional titles, but not main titles. Example: &lt;div1 type="volume"&gt;</li>

<li>n=: Sometimes an n= (number) attribute can be used by itself, for instance in the case of page breaks. Example: &lt;pb n="456"/&gt;</li> <li>xml:id=: If you need to uniquely identify an element so that it can be referenced from another location in one or more texts, use an xml:id= attribute. The value of this attribute must be unique within a document, must be composed of alphanumeric characters, dots, hyphens, and underscores, and must start with a letter. Example: &lt;note xml:id="n5" n="5"&gt;</li>

<li>target=: takes a pointer to another element. To point to an element in the same document, the value is a # followed by that element's xml:id= value. target= and xml:id= are therefore often used in conjunction with one another, as in the case of footnotes, where the &lt;anchor xml:id="n5" n="5"/&gt; marks a specific place in the text and is referred to by the &lt;note target="#n5" n="5"&gt;, which is located elsewhere and contains the text of the footnote itself.</li>

<li>rend=: Difficulty arises with rend= when more than one rendition feature needs to be recorded. With this in mind, it is recommended that projects employ the following adaptation of "rendition ladders", a concept developed at the Brown University Women Writers Project. This system allows sets of multiple renditional features to be included in one rend= value. Rendition ladders consist of categories of renditional features, with the values of each feature enclosed in parentheses. rend= should only be used to override a default value. For instance, if all text encoded as &lt;hi&gt; is defined as being rendered in italics, there is no reason to encode text as &lt;hi rend="font(italics)"&gt;. Combining renditional features would result in a tag with attributes such as &lt;l rend="font(italics)align(right)"&gt; <ul> <li>font: italics, bold, fsc (full and smallcaps), smallcap, underlined, gothic</li> <li>align: right, left, center, block</li> <li>indent: values in parentheses should indicate the number of tabstops to be indented, e.g., &lt;l rend="indent(1)"&gt;</li> </ul> </li> <li>lang=: Use ISO 639-2 three-character language codes. Note that this recommendation is slightly different from that of the TEI P5 Guidelines, which recommend the BCP 47 language codes.</li> </ul>
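The attribute advice in the list above can be seen working together in a single fragment. This is an illustrative sketch only: the identifiers, numbers, and text are invented, and the type= value would need to be documented in the project's ODD file or other documentation.

```xml
<!-- Hypothetical fragment; all ids, numbers, and text are invented -->
<div1 type="volume" n="1">
  <!-- n used by itself to record the page number -->
  <pb n="456"/>
  <!-- the anchor marks the point in the text that the footnote belongs to -->
  <p>A sentence with a footnote reference<anchor xml:id="n5" n="5"/> in
    running text.</p>
  <lg>
    <l>A line using the default rendition,</l>
    <!-- rendition ladder: two features combined in one rend value -->
    <l rend="font(italics)align(right)">a line in italics, set flush right,</l>
    <l rend="indent(2)">a line indented two tabstops.</l>
  </lg>
  <!-- the note points back to the anchor: "#" plus the anchor's xml:id -->
  <note target="#n5" n="5">The text of footnote 5.</note>
</div1>
```

Because rend= only overrides defaults, the first line carries no rend= value at all.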

Appendix A: History of this Document
The Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (referred to as    the TEI Guidelines) were first published in 1994 and represent a tremendous achievement in electronic text standards by providing a highly sophisticated structure for encoding electronic text. Digital librarians have benefited greatly from the standardization provided by these guidelines, and the potential for interoperability and long-term preservation of digital collections facilitated by their wide adoption.

In 1998, the Digital Library Federation (DLF) sponsored the TEI and XML in Digital Libraries Workshop at the Library of Congress to discuss the use of the TEI Guidelines in libraries for electronic text, and to create a set of best practices for librarians implementing them. From this workshop, three working groups were formed, the members of which represented some of the largest and most mature digital library programs in the U.S.

Group 1 was charged to recommend some best practices for TEI header content and to review the relationship between the Text Encoding Initiative header and MARC. To this end, representatives of the University of Virginia Library and the University of Michigan Library gathered in Ann Arbor in early October 1998 to develop a recommended practice guide. This work was assisted by similar efforts that had taken place in the United Kingdom under the auspices of the Oxford Text Archive the previous year. The section on the header is based on a draft of those recommended practices. It was submitted to various constituencies for comment. In 2008 and 2009, it was heavily revised by Melanie Schlosser, Kevin Hawkins, and other members of the TEI SIG on Libraries.

Group 2 was charged with developing a set of recommendations for libraries using the TEI Guidelines in electronic text encoding. This group included the following representatives from six libraries: <ul> <li>LeeEllen Friedland, Library of Congress</li> <li>Nancy Kushigian, University of California, Davis</li> <li>Christina Powell, University of Michigan</li> <li>David Seaman, University of Virginia</li> <li>Natalia Smith, University of North Carolina at Chapel Hill</li> <li>Perry Willett, Indiana University (chair)</li> </ul> At the ALA Midwinter Meeting (January 1999), the DLF task force revised a draft set of best practices, called TEI Text Encoding in Libraries: Guidelines for Best Practices (referred to as <i>TEI in Libraries Guidelines</i>). The revised recommendations were circulated to the conference working group in May 1999 and presented at the joint annual meeting of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing in June 1999. Version 1.0 was circulated for comments in August 1999. These guidelines were endorsed by the DLF, and have been used by many digital libraries, including those of the task force members, as a model for their own local best practices. Libraries, museums and end-users have benefitted from a set of best practices for electronic text in a number of ways, including better interoperability between electronic text collections, better documented practices among digital libraries, and a starting point for discussion of best practices with commercial publishers regarding electronic text creation.

Written in 1998, this first iteration of TEI in Libraries Guidelines made no mention of XML, XSLT, or any of the other powerful tools that have now become common parlance and practice in creating digital documents and collections. Based on these important changes in markup technology, it came to the attention of the DLF and members of the original Task Force that the TEI in Libraries Guidelines required substantial revision. In 2002, the TEI Consortium published a new edition of the complete TEI Guidelines that conformed to XML specifications. In order to remain useful, the <i>TEI in     Libraries Guidelines</i> had to be updated to reflect these developments.

Furthermore, librarians need more guidance than the original TEI in Libraries Guidelines provided. There are many library-specific encoding issues which need to be addressed and documented to    ensure consistency. The intention of this document is to provide recommended paths of encoding for these issues.

In addition, these library guidelines have the potential to be much more useful if they can serve as a    training document from which librarians can learn about text encoding and addressing particular encoding challenges. To fulfill this role, the guidelines require more examples and detailed explanations, giving documentation of the use of TEI in a library context. Librarians also need a set of standards and best practices for vendors and publishers who create electronic text for digital libraries, so that these collections adhere to the same archival standards as locally-created electronic text collections. With detailed guidelines that could serve as an encoding specification, librarians might encourage vendors to follow the principles in these standards, to facilitate the long-term preservation of    commercially published electronic text collections, and more readily allow for cross-collection searching.

In order to facilitate the evolution of this document, another DLF-sponsored Task Force, some of whose representatives were on the original Task Force, met on October 24-25, 2003 at the Cosmos Club in Washington, D.C.: <ul> <li>Richard Gartner, Oxford University Library</li> <li>Matthew Gibson, University of Virginia Library</li> <li>Kirk Hastings, California Digital Library</li> <li>Christina Powell, University of Michigan</li> <li>Merrilee Proffitt, RLG</li> <li>David Seaman, Digital Library Federation</li> <li>Natalia Smith, University of North Carolina at Chapel Hill</li> <li>Perry Willett, Indiana University (chair)</li> </ul> These representatives met to revise the original TEI in Libraries Guidelines in order that they: <ul> <li>reflect changes occurring within the text encoding world generally and within the TEI community specifically</li> <li>further illuminate the different levels of encoding by offering clearer and more robust examples.</li> </ul> After producing Version 2.0 of the Guidelines, this group (with some changes in membership) met again at the Cosmos Club on February 13-14, 2006. Those in attendance were: <ul> <li>Syd Bauman, The TEI Consortium</li> <li>Richard Gartner, Oxford University Library (by phone)</li> <li>Matthew Gibson, Virginia Foundation for the Humanities (chair)</li> <li>Merrilee Proffitt, RLG</li> <li>Chris Powell, The University of Michigan</li> <li>David Seaman, Digital Library Federation</li> <li>Natasha Smith, University of North Carolina at Chapel Hill</li> <li>Perry Willett, The University of Michigan</li> </ul>