SIG:IndicTexts

The purpose of the TEI Special Interest Group “Indic Texts” is to allow scholars engaged in the study of Indic texts to develop and document best practices in applying the TEI’s Guidelines to these kinds of texts. To participate, please join the mailing list at http://lists.lists.tei-c.org/mailman/listinfo/indic-texts.

There are several respects in which the applicability of the TEI Guidelines to these texts is less than obvious. These relate to distinctive features of Indic textuality, including:


 * the use of syllabic scripts, and the non-coincidence of grapheme- (syllable-) and word-boundaries;
 * the application of phonotactic rules (sandhi), which further obscures the boundaries between words;
 * extensive compounding;
 * the use of distinctive media and writing supports (such as birch bark, palm leaves, and copper plates);
 * distinctive metrical patterns with different types of caesuras;
 * the prominence of the commentary as a genre, and the depth of intertextual relations this implies;
 * the frequent reuse of texts in other texts, which requires careful and deliberate application of the elements in the TEI "model.quoteLike" class (such as <quote> and <cit>).

The expected outcome of the SIG’s work is a practical guide that analyzes common cases in the markup of Indic texts and proposes ways in which the analytical tools provided by the TEI Guidelines might best be used in these cases, discussing benefits and drawbacks of the solutions possible. Ideally, this guide will become a part of the TEI Guidelines.

Manuscript Transcription
Canonically, an akṣara makes up a single "grapheme," and this is reflected in Unicode representations of Indic scripts: consonants and independent vowels are encoded first, and vowel markers (and dependent consonants like anusvāraḥ and visargaḥ) are then encoded as combining characters. Unless it is marked with a combining vowel character or a cancellation character (virāma), a consonant is understood to carry the inherent vowel a.
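The encoding order described above can be inspected directly. The following is a minimal sketch in Python, assuming Devanagari as the example script; the helper name describe is hypothetical and serves only as an illustration. It lists the code points of an akṣara and shows that vowel signs and the virāma fall in the Unicode "Mark" general categories (Mc, Mn), while the consonant is a letter (Lo).

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

# "ki": consonant KA followed by the dependent vowel sign I.
ki = "\u0915\u093F"
# "k" without inherent vowel: consonant KA followed by the virama.
k_virama = "\u0915\u094D"

for label, aksara in (("ki", ki), ("k", k_virama)):
    print(label)
    for code_point, name, category in describe(aksara):
        print(f"  {code_point}  {category}  {name}")
```

Note that the consonant always comes first in the encoded sequence, even where the vowel sign is written to its left, as with the sign for i in Devanagari.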

Cancelling dependent vowels
In manuscripts, dependent vowel markers can be cancelled, in which case the consonant is read with the inherent vowel a. Encoding this kind of change raises technical problems, whether one is using an Indic script or an alphabetic transliteration system (such as IAST or ISO 15919):
 * In Indic scripts, rendering problems are likely if the cancelled vowel marker is enclosed within <del> tags, since it is a combining character;
 * In transliteration, the deletion of one vowel must be accompanied by the addition of the inherent vowel, although there is no addition marked as such in the manuscript.
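Both problems can be made concrete with a small Python sketch. It assumes that the cancelled marker is wrapped in a TEI <del> element inside a <w> element; the namespace is omitted for brevity, and the helper names are hypothetical. The sketch illustrates the difficulty only and does not propose an encoding.

```python
import unicodedata
import xml.etree.ElementTree as ET

def starts_with_combining_mark(text):
    """True if the first character belongs to a Unicode Mark category (Mn, Mc, Me)."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")

def reading_after_deletion(elem):
    """Text that remains once the content of every <del> child is dropped."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "del":
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)

# Indic script: KA with the vowel sign I cancelled.
deva = ET.fromstring("<w>\u0915<del>\u093F</del></w>")
# Transliteration: "ki" with the "i" cancelled.
iast = ET.fromstring("<w>k<del>i</del></w>")

# Problem 1: the <del> content is a lone combining mark with no base character.
print(starts_with_combining_mark(deva.find("del").text))

# Problem 2: the Devanagari remainder is a bare KA, which is read "ka",
# but the transliterated remainder is a bare "k" with no inherent vowel.
print(reading_after_deletion(deva) == "\u0915")
print(reading_after_deletion(iast))
```

In the Indic-script case the remaining text is correct but the deleted content has no base character to attach to; in the transliterated case the deleted content is unproblematic but the remaining text lacks the vowel that a reader of the manuscript would supply.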