GeneticEditionDraf1Comments

Section 3.2: In what sense is chapter 10 of the TEI Guidelines not a "thorough manuscript description" whereas HNML is?

Because it is difficult to record many versions in one file using markup, the proposal recommends a document-centric approach, in which each physical document is encoded separately, even when the documents are merely drafts of the same text. As a result their representations contain a great deal of redundant information, which only increases the work editors and software must do to keep copies of text that are supposed to be linked, or identical, in step. It would be simpler and more efficient to represent each piece of text that recurs across the versions of a work exactly once, and to let every version refer to that single copy.
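To make the alternative concrete, here is a minimal Python sketch with invented fragment names and toy data; it is purely illustrative and not a claim about any existing system. Each piece of shared text is stored once, and each draft is just a sequence of references to those pieces.

<pre>
# Minimal sketch (illustrative only): store each piece of shared text once,
# and let every version reference it, instead of copying it per document.

fragments = {
    "f1": "The quick brown fox ",
    "f2": "jumps over ",
    "f2b": "leaps across ",      # revised wording in the second draft
    "f3": "the lazy dog.",
}

# Each draft is just an ordered list of fragment ids; shared text exists once.
versions = {
    "draft1": ["f1", "f2", "f3"],
    "draft2": ["f1", "f2b", "f3"],
}

def text_of(version_id):
    """Reassemble the running text of one version."""
    return "".join(fragments[fid] for fid in versions[version_id])

if __name__ == "__main__":
    for v in versions:
        print(v, "->", text_of(v))
    # Correcting a shared reading (say in f3) fixes every draft at once,
    # with no separately maintained copies that can drift apart.
</pre>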
  
Section 4.0: I confess to once proposing the use of groups of changes in the Wittgenstein manuscripts. Whenever I found a simple case that could be grouped (which was rarely) I put it in, but the editors always took it out. Why? Because you can't group changes much at the markup level. The hierarchy you introduce by doing that soon breaks down in practice. Genetic texts aren't hierarchical.

The section on 'grouping changes' assumes that manuscript texts can be broken down into a hierarchy of changes that may be conveniently grouped and nested arbitrarily. Similarly, section 4.1 imposes a strict hierarchy of document->writing surface->zone->line. Since Barnard's 1988 paper, which pointed out that markup fails to represent adequately even a trivial case of overlapping speeches and lines in Shakespeare, the problem of overlap has become the dominant issue in the digital encoding of historical texts. This representation seeks to reassert the OHCO thesis, which its own authors have since withdrawn, and it will fail to adequately represent these complex genetic texts, which are primarily non-hierarchical in structure. Is it really still possible, for texts that will exhibit anything from mild to extreme overlap, to propose a standard for the future that essentially ignores the overlap problem? The past twenty years of research on this topic cannot be so lightly set aside.
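A toy Python check makes the overlap point concrete. The element names are generic placeholders, not the draft's actual schema: the moment a deletion runs across a line boundary, the encoding stops being well-formed and a parser rejects it.

<pre>
# Sketch of the overlap problem: a deletion that starts in one line and ends
# in the next cannot be expressed as properly nested elements.
import xml.etree.ElementTree as ET

overlapping = """<zone>
  <line>first words <del>struck out at the end</line>
  <line>and continuing here</del> more text</line>
</zone>"""

try:
    ET.fromstring(overlapping)
except ET.ParseError as err:
    # The parser rejects it: <del> crosses the <line> boundary.
    print("not well-formed:", err)
</pre>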
  
Genetic relations "between different parts of the text, within a single document and across several documents" cannot be expressed at all efficiently in markup. Even if you could encode them somehow, say with links, what is the user interface going to be? You should specify that too, because if the information cannot be used, it may as well not be there.

The proposal also does not explain how XML documents arranged in this structure are to be 'collated', especially when the variants are distributed via two mechanisms: as markup within individual files and as links between documentary versions. Collation programs work by comparing what are essentially plain text files, containing at most light markup such as COCOA references or empty XML elements (as in the case of Juxta). The virtual absence of collation programs able to process arbitrary XML makes this part of the proposal very difficult to achieve, at best. It would be better if a purely digital representation of the text were the objective, since in that case an apparatus would not be needed.
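In practice, collating such files tends to reduce to something like the following Python sketch. The sample texts and the tag-stripping are invented for illustration, and this is not how Juxta or any particular tool is implemented: the markup is thrown away, the plain text is aligned, and the alignment says nothing about the discarded structure.

<pre>
# Sketch: strip the markup, then align the remaining plain text.
import re
from difflib import SequenceMatcher

doc_a = "<line>The quick <del>brown</del> fox</line>"
doc_b = "<line>The quick red fox</line>"

def strip_tags(xml_text):
    """Throw away the markup so the texts can be compared as plain strings."""
    return re.sub(r"<[^>]+>", "", xml_text)

a, b = strip_tags(doc_a), strip_tags(doc_b)
for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
    if op != "equal":
        print(op, repr(a[i1:i2]), "->", repr(b[j1:j2]))
# The alignment carries no information about where the discarded tags belong,
# which is exactly the information a genetic apparatus would need.
</pre>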
  
Section 4.1: So we have a strict hierarchy: document->writing surface->zone->line. What about changes in the margin that cross "zones", arrows connecting text in different "zones", underlining and crossing-out that spans "zones", and so on? Not only that, but zones "can be nested and grouped". Handwritten documents unfortunately do not have a hierarchical structure. Not at all. Elli Mylonas wrote in 1996: "We now know that the breaking of strict hierarchies is the rule, not the exception". And Alan Renear in 1997: "Whatever may be said for hierarchy as a tendency, it does not seem to be, even in its perspective-contingent form, an essential aspect of textual structure." If anything has been learned in the last twenty years of using markup to encode literary documents, it is that.

The mechanism for transposition as described also sounds infeasible. It is unclear what is meant by the proposed standoff mechanism, but if it allows chunks of transposed text to be moved around, it will fail whenever the chunks contain non-well-formed markup or the schema does not permit that markup at the destination. And if transpositions between physical versions are allowed - and these actually comprise the majority of cases - how can such a mechanism work, especially when transposed chunks may well overlap?
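The difficulty with moving such chunks can be seen in a small Python sketch; the document string, the offsets and the element names are all invented for the purpose. A block whose standoff boundaries cut across element boundaries is not a unit that can simply be lifted out and re-inserted elsewhere.

<pre>
# Sketch: a transposed block defined by standoff character offsets over the
# serialized XML; the offsets are chosen by hand for this toy string.
import xml.etree.ElementTree as ET

doc = "<surface><line>alpha <add>beta</add> gamma</line><line>delta</line></surface>"

# Suppose the transposed block runs from inside the first <line> into the second.
start, end = doc.index("beta"), doc.index("delta") + len("delta")
chunk = doc[start:end]
print("chunk:", chunk)

try:
    ET.fromstring(chunk)          # can this block be moved around as a unit?
except ET.ParseError as err:
    print("not a well-formed unit:", err)   # no: its tags are unbalanced
</pre>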
  
Section 4.4: How exactly are you going to collate texts in XML? If you produce an apparatus criticus of an XML text you will get unmatched tags in the footnotes. Then you will have to strip the tags out to process it, so why encode it in XML in the first place?

The main advantage claimed for HNML and LEG/GML (Genetic Markup Language) is that they are more succinct than a TEI encoding. If the proposed markup standard is incorporated into the TEI, however, this advantage will be lost: the proposed codes will simply become part of the more generic, and hence more verbose, TEI language. There seems to be very little in the proposals sketched here that cannot already be encoded with the TEI Guidelines as they currently stand. The authors should spell out clearly which elements and attributes they believe need to be added, how accurately these can represent the textual phenomena, and how efficiently they will work in software.
 
 
Section 4.4.3: How do we make markup comment on itself? There is no clean way, except by using hacks.
 
 
 
Section 4.4.4: "The re-alignment of the transposed blocks or segments can be supplied via a stand-off mechanism". I very much doubt that this is possible. Firstly, if blocks can be transposed at will, they are either well-formed or not well-formed. If not well-formed (as is usually the case) they cannot be defined and transposed in this way; if well-formed, the schema of the document will not permit arbitrary chunks of well-formed XML to be shifted around and placed wherever the transposition requires. There can also be transposition of markup itself, e.g. the moving of a line break. Standoff techniques are mostly used in corpus linguistics, on texts that never change. In a humanities text you must be able to edit the text, and that means recomputing the standoff information if you use byte offsets, and at least having to be careful if you use word elements with ids. The latter approach is no easier, because it obscures the text with a flood of mostly useless markup codes. Why are current techniques for representing transposition in markup so bad? Because it is not possible to do any better.
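The fragility of offset-based standoff is easy to demonstrate. In this Python sketch the annotation is hypothetical and the offsets are chosen by hand, but the effect is general: one small correction upstream and the unchanged offsets silently point at the wrong span.

<pre>
# Sketch of the maintenance problem with offset-based standoff markup.

text = "The quick brown fox jumps over the lazy dog."

# Standoff annotation: "brown fox" marked as transposed, by character offsets.
anno = {"type": "transposed", "start": 10, "end": 19}
print("before edit:", repr(text[anno["start"]:anno["end"]]))   # 'brown fox'

# An editor corrects a reading earlier in the line...
text = text.replace("The quick", "The extremely quick")

# ...and the annotation now covers the wrong characters.
print("after edit: ", repr(text[anno["start"]:anno["end"]]))
</pre>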
 
 
 
Section 4.5.1: This seems to rely on interlinking to connect elements in the hierarchy by another, non-hierarchical path. Firstly, there is blind faith here that encoding the features somehow in markup will solve the problem. But you have to '''prove''' that you can compute it. You can't compute an arbitrary network of links. Try to solve the Travelling Salesman Problem instead. It's probably easier. Secondly, how many humanists would willingly work with such a complex system of markup as you propose here?
 
