Samples of TEI texts
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. Listing a text here does not imply a licence to redistribute it; anyone wishing to make more in-depth use of these materials should check with the text owners.
Explicitly Pedagogical Samples
- TEI By Example is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. CC licensed.
- ala2004 (EpiDoc XML) from the Aphrodisias in Late Antiquity publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to version 4 of the EpiDoc DTD (a TEI localization); the DTD is also included in the archive. These files are licensed under Creative Commons Attribution, so please feel free to do whatever you like with them! (Format: TEI P4)
- Archimedes Palimpsest, XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5
- The Auchinleck Manuscript, made available by the Oxford Text Archive (contact: firstname.lastname@example.org). This text originates from the Auchinleck Manuscript Project at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.
- Duke Databank/Heidelberg/APIS (EpiDoc XML): aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under a Creative Commons Attribution license (CC-BY) by the Integrating Digital Papyrology project. Format: TEI P5.
- EpiDoc Demo Website, a growing collection of sample EpiDoc XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)
- Folger Digital Texts: From the Folger Shakespeare Library, "Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure."
- A subset of Project Gutenberg is available as TEI: go to http://www.gutenberg.org/catalog/world/search and select "TEI Text Encoding Initiative (tei)" as the file type.
- IAph2007 (EpiDoc XML files) from the Inscriptions of Aphrodisias (2007) publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the EpiDoc DTD (version 5). These files are licensed under Creative Commons Attribution (UK), so please feel free to do exciting things with them. (Format: TEI P4)
- Inscriptions of Roman Tripolitania 2009 (EpiDoc XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.
- Files referenced in Timothy J. Finney, "Manuscript Markup," in The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial transcription of the Freer manuscript of Paul (Gregory-Aland I 016), a transform, a stylesheet and a web page produced from the transcription by the transform. (Format: TEI P5)
- The NZETC has a range of New Zealand and Pacific-Islands texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:
- Use of <revisionDesc> and <change> tags to implement workflow
- <name> tag used extensively for personal, ship, place, organisation and work names (keyed to an external authority)
- Use of xml:lang="en" and xml:lang="mi" for texts with English and Maori (plus small amounts of other languages)
- Page images, facsimile PDFs and typeset PDFs (some texts only, for example this letter)
- Document-by-document licensing; some documents are under a Creative Commons license (licensing info not currently stored in the TEI).
- The University of Oxford Text Archive (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.
- The Perseus Project makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.
- The Samyukta Agama Project at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.
- The Chinese Buddhist Bibliographies Project at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.
- The Chinese Buddhist Temple Gazetteers Project at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.
- The Migration Samples page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas MacGreevy Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project, which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.
- The BVH project (Virtual Humanistic Libraries) is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on Epistemon. These files are licensed under Creative Commons Attribution.
- TEI in DSpace example: http://dspace.nitle.org/handle/10090/11695 (P4?) (seemed broken as of May 2012)
- The SARIT project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa, (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [here]. Clicking [Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.
- La Queste del Saint Graal (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions (XML-TEI P5 with Menota extensions) and an ODD customization file, as well as the stylesheets used to produce the HTML editions and a printable PDF version, are freely available for download under a CC BY-NC-SA 3.0 license.
- "Tales" by Edgar Allan Poe at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.
- tei-examples -- Examples of TEI documents dealing with different use-cases.
- FreeDict is a repository of various TEI-encoded bilingual translating dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting the SVN repository directly may be the better option.
- Du Cange is a dictionary of medieval Latin (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an open source project. The TEI choices are documented (in French).
- Littré is a classical French dictionary, encoded in TEI P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, documented in French with the words of Littré himself.
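
When comparing samples such as the NZETC texts listed above, it can help to get a quick overview of which of the mentioned features a given file actually uses (xml:lang values, <name> elements, <change> entries in <revisionDesc>). The following Python sketch shows one possible way to do that with the standard library only. The embedded sample document, the type="person" attribute value, and the summarize function are invented for illustration and are not taken from any of the listed projects; the sketch also assumes a TEI P5 file in the standard TEI namespace, so it would not apply unchanged to the P4 samples.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace and the predefined XML namespace (for xml:lang).
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical minimal TEI P5 document, invented for this sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical sample</title></titleStmt>
      <publicationStmt><p>Illustration only</p></publicationStmt>
      <sourceDesc><p>Invented for this sketch</p></sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2012-05-01">Proofread</change>
      <change when="2012-04-01">Created</change>
    </revisionDesc>
  </teiHeader>
  <text xml:lang="en">
    <body>
      <p>A letter from <name type="person">Jane Doe</name>.</p>
      <p xml:lang="mi">Kia ora</p>
    </body>
  </text>
</TEI>"""


def summarize(tei_xml: str) -> dict:
    """Count xml:lang values and <name> types, and list <change> dates."""
    root = ET.fromstring(tei_xml)
    # Every element that carries an explicit xml:lang attribute.
    langs = Counter(
        el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG)
    )
    # <name> elements grouped by @type ("untyped" when the attribute is absent).
    names = Counter(
        el.get("type", "untyped") for el in root.iter(f"{{{TEI_NS}}}name")
    )
    # @when values of <change> elements, in document order.
    changes = [el.get("when") for el in root.iter(f"{{{TEI_NS}}}change")]
    return {"languages": dict(langs), "names": dict(names), "changes": changes}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To try this on a downloaded sample, read the file's contents and pass them to summarize in place of SAMPLE. Files that start with an XML encoding declaration should be read as bytes rather than as text before being handed to ET.fromstring.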