Difference between revisions of "GeneticEditionDraf1Comments"

Revision as of 22:43, 21 May 2009

Because it is difficult to record many versions in one file using markup, the proposal recommends a document-centric approach. In this method each physical document is encoded separately, even when the documents are merely drafts of the same text. As a result there is a great deal of redundant information in the representation, which only increases the work of editors and software in maintaining copies of text that are supposed to be linked or identical. It would be far simpler and more efficient to represent each piece of text that occurs in a work exactly once, as a single unique copy, rather than duplicating it across separate document encodings.
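
To illustrate the kind of redundancy at issue (a purely hypothetical sketch, not taken from the draft; the file names and the wording are invented for the example): two separately encoded drafts each carry a full copy of a sentence that never changed, whereas the sentence could be stored once and pointed to from each version.

 <!-- draft1.xml -->
 <line>The evening was calm and the sea quite still.</line>
 <!-- draft2.xml: the same wording, encoded again in full -->
 <line>The evening was calm and the sea quite still.</line>

 <!-- the alternative: one shared copy, referenced from each draft -->
 <seg xml:id="s1">The evening was calm and the sea quite still.</seg>
 <!-- in each draft: <ptr target="shared.xml#s1"/> -->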

The section on 'grouping changes' assumes that manuscript texts have a structure that can be broken down into a hierarchy of changes that can be conveniently grouped and nested arbitrarily. Similarly, in section 4.1 a strict hierarchy is imposed consisting of document -> writing surface -> zone -> line. Since Barnard's 1988 paper, which pointed out the inherent failure of markup to represent adequately even a seemingly trivial case of overlapping speeches and verse lines in Shakespeare, the problem of overlap has become the dominant issue in the digital encoding of historical texts. This representation seeks to reassert the OHCO thesis, which has been withdrawn by its own authors, and it will fail to represent these complex genetic texts adequately until it is recognised that they are primarily non-hierarchical. The excessive use of linking between XML elements to represent complex textual phenomena, as described in the proposal and its supporting documentation, can only result in spaghetti-like markup that will be difficult to edit, excessively complex, incomputable, and inadequate as a representation of the data.
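
A schematic example of the problem (my own sketch, not drawn from the draft; real manuscripts raise far harder cases): a single authorial deletion that runs across two topographic lines cannot be nested inside either <line> of the proposed hierarchy. In standard TEI it already has to be simulated with an empty spanning element and an anchor, that is, by pointers rather than by structure:

 <surface>
  <zone>
   <line>and then he wrote <delSpan spanTo="#d1"/>a phrase later</line>
   <line>struck out<anchor xml:id="d1"/> with a single stroke</line>
  </zone>
 </surface>

The deletion exists only as a pair of empty elements plus a link between them; multiply such links by every revision, substitution and transposition in a heavily reworked draft and the spaghetti effect described above follows.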

The proposal does not explain how XML documents arranged in this structure are to be 'collated', especially when the variants are distributed via two mechanisms: as markup within individual files and also as links between documentary versions. Collation programs work by comparing what are essentially plain-text files, containing at most light markup: COCOA-style references, or empty XML elements in the case of Juxta. The virtual absence of collation programs able to process arbitrary XML makes this proposal very difficult to achieve, at the least. It would be better if a purely digital representation of the text were the objective, since in that case an apparatus would not be needed.
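
For comparison (illustrative only, not taken from the proposal): the input that existing collation tools expect is essentially a plain character stream with, at most, light referencing, for example a COCOA-style reference or empty XML milestones:

 <P 27>
 It was a dark and stormy night; the rain fell in torrents.

 <pb n="27"/>It was a dark and stormy night;
 <lb/>the rain fell in torrents.

In both cases the markup can simply be skipped and the words compared directly. That is not possible when the variation itself is expressed partly as nested markup inside each file and partly as links between files.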

The mechanism for transposition as described also sounds infeasible. It is unclear what is meant by the proposed standoff mechanism. If, however, it allows chunks of transposed text to be moved around, it will fail whenever a chunk contains markup that is not well-formed on its own, or whenever the schema does not permit that markup at the destination. And if transpositions between physical versions are allowed - and these actually comprise the majority of cases - how can such a mechanism work, especially when transposed chunks may well overlap?
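
A hypothetical illustration of the well-formedness problem (my own example, not the draft's): suppose the span marked for transposition starts inside one <line> and ends inside the next. The extracted chunk then contains an unmatched end-tag and start-tag and is not a well-formed fragment, so it cannot simply be "moved" anywhere else in the tree:

 <line>take this <anchor xml:id="t1s"/>misplaced passage</line>
 <line>and move it<anchor xml:id="t1e"/> somewhere else</line>
 <!-- the chunk between t1s and t1e is:
      misplaced passage</line><line>and move it
      which is not well-formed on its own -->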

The main advantage claimed for HNML and LEG/GML (Genetic Markup Language) is that they are more succinct than a TEI encoding. If the proposed encoding is incorporated into TEI, however, this advantage will be lost: the proposed codes will simply become part of the more generic, and hence more verbose, TEI language. There seems to be very little in the proposals sketched here that cannot already be encoded with the TEI Guidelines as they currently stand. The authors should spell out clearly which elements and attributes, in their view, need to be added, and what functional advantage they expect the proposed modifications to bring.
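
To illustrate the verbosity point (a generic TEI example of my own; the corresponding HNML or GML shorthand is not reproduced here): even a simple substitution in standard TEI takes several nested elements and attributes, which is the price any compact notation pays once it is folded into the generic language:

 <subst>
  <del rend="overstrike">first thought</del>
  <add place="above">second thought</add>
 </subst>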

The public discussion of this draft is also a little underwhelming. Most of those who will be expected to use the encoding guidelines for genetic editions will have had no say in their development. In this Web 2.0 age an open, online discussion forum would at least be the norm; instead we have a small group of academics discussing the contents behind closed doors. End users may perhaps be excused for ignoring a result that has not been subject to true peer review.