Difference between revisions of "SIG:IndicTexts"

Revision as of 20:58, 4 April 2018

The purpose of the TEI Special Interest Group “Indic Texts” is to allow scholars engaged in the study of Indic texts to develop and document best practices in applying the TEI’s Guidelines to these kinds of texts. To participate, please join the mailing list at http://lists.lists.tei-c.org/mailman/listinfo/indic-texts.

There are several respects in which the applicability of the TEI Guidelines to these texts is less than obvious. These relate to distinctive features of Indic textuality, including:

  • the use of syllabic scripts, and the non-coincidence of grapheme- (syllable-) and word-boundaries;
  • the application of phonotactic rules (sandhi) that further obscures the boundaries between words; extensive compounding;
  • the use of distinctive media and writing supports (such as birch bark, palm leaves, and copper plates);
  • distinctive metrical patterns with different types of caesuras;
  • the prominence of the commentary as a genre, and the depth of intertextual relations this implies;
  • the frequent reuse of texts in other texts, which requires careful and deliberate application of the elements in the TEI "model.quoteLike" class (such as <quote> and <cit>).

The expected outcome of the SIG’s work is a practical guide that analyzes common cases in the markup of Indic texts and proposes how the analytical tools provided by the TEI Guidelines might best be used in these cases, discussing the benefits and drawbacks of the possible solutions. Ideally, this guide will become a part of the TEI Guidelines.


Manuscript Transcription

Canonically, an akṣara makes up a single "grapheme," and this is reflected in Unicode representations of Indic scripts, where consonants and independent vowels are encoded first, and then vowel-markers (and dependent consonants like anusvāraḥ and visargaḥ) are encoded subsequently as combining characters. Unless marked with a combining vowel character, or a cancellation character, consonants are understood to have an inherent vowel a. The sequence of consonants within conjuncts is also canonically the same as their phonological sequence. Thus in the conjunct "rg", the "r" is represented before the "g" in transliteration, in Devanagari र्ग (0930 + 094D + 0917) and in Kannada ರ್ಗ (0CB0 + 0CCD + 0C97), although it is rendered on top of the "g" in Devanagari and to the right of the "g" in Kannada.
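
The logical (phonological) encoding order described above can be checked with a short script. This is a minimal sketch using only Python's standard `unicodedata` module; the helper name `describe` is hypothetical and chosen for illustration. It lists the code points of the two "rg" conjuncts in the order they are stored, which is consonant, virāma (the cancellation character), consonant in both scripts, regardless of how the conjunct is rendered.

```python
import unicodedata

# The "rg" conjunct as cited in the text, written with explicit escapes
# so that the stored order is visible independently of font rendering.
DEVANAGARI_RGA = "\u0930\u094D\u0917"
KANNADA_RGA = "\u0CB0\u0CCD\u0C97"


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order.

    `describe` is an illustrative helper, not part of any TEI tooling.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, conjunct in (("Devanagari", DEVANAGARI_RGA), ("Kannada", KANNADA_RGA)):
    print(label)
    for code_point, name in describe(conjunct):
        print(f"  {code_point}  {name}")
```

In both cases the letter RA comes first, then the SIGN VIRAMA, then the letter GA, matching the order of "r" and "g" in transliteration.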

Cancelling dependent vowels

In manuscripts, dependent vowel markers can be cancelled, and the consonant is then read with the inherent vowel a. Encoding this kind of change poses technical problems, whether one uses an Indic script or an alphabetic transliteration system (such as IAST or ISO 15919):

  • In Indic scripts, rendering problems are likely if the cancelled vowel marker is enclosed within the <del></del> tags, since it is a combining character;
  • In transliteration, the deletion of one vowel must be accompanied by the addition of the inherent vowel, although there is no addition marked as such in the manuscript.

The consensus seems to be to wrap the affected consonant, together with the modifications that apply to it, in the <subst> element, and to use the place="implicit" attribute on <add> for the inherent vowel, as follows:

<subst>ḷ<del type="cancelled">o</del><add place="implicit">a</add></subst>
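
One way to see what this encoding buys a downstream consumer is to derive both readings from it. The following is a minimal sketch, assuming the fragment is processed on its own without the TEI namespace that a full document would carry; the helper name `reading` is hypothetical. Dropping <add> yields the akṣara as first written, and dropping <del> yields it as corrected.

```python
import xml.etree.ElementTree as ET

# The fragment from the text. A real TEI file would declare the TEI
# namespace; it is omitted here to keep the sketch short (assumption).
FRAGMENT = (
    '<subst>ḷ<del type="cancelled">o</del>'
    '<add place="implicit">a</add></subst>'
)


def reading(elem, skip):
    """Concatenate the text of `elem`, leaving out children named `skip`.

    `reading` is an illustrative helper, not part of any TEI tooling.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(reading(child, skip))
        # Text following a child belongs to the parent in either reading.
        parts.append(child.tail or "")
    return "".join(parts)


subst = ET.fromstring(FRAGMENT)
before = reading(subst, skip="add")  # consonant + cancelled vowel "o"
after = reading(subst, skip="del")   # consonant + inherent vowel "a"
print(before, after)
```

Because the inherent vowel is recorded as an <add> with place="implicit", the corrected reading can be produced mechanically, while the attribute still signals that nothing was physically added in the manuscript.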