Difference between revisions of "BPG Revision 3 Public Comment Feedback"
Revision as of 20:15, 30 April 2009
Public Comment Feedback
Posted below are public comments we have received as of April 30, 2009, 2:14 pm EST for review during our DLF-sponsored meeting on May 6th. All comments posted thus far, except for two, have been on the TEI-LIB list. The non-list feedback was sent directly to Kevin and Michelle and was anonymized.
Comments can also be followed by searching the TEI-LIB Archive, April 2009, Week 4 threads: https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A1=ind0904D&L=TEILIB-L&X=3D9C3B36AB0318DF79&Y
TEI Header
- Why do MARC245a / MARC245b / MARC245c include upper case when all the other type values I've ever seen use only lower case? Could we also have a brief mention in the guidelines of the difference between those three? The NZETC currently uses type="marc245" (changed recently from type="245" when we moved to P5), and not being a cataloguer, I have no idea whether that should be a, b, or c.
- With idno, there is no discussion of what to do with 'local standards'. For example, in New Zealand there's a very important numbering system called Books In Maori or BIM. We've been thinking of using this in the TEI header, but it would be completely opaque to those outside New Zealand and the field of Polynesian Languages and thus needs some explanation, but it's not clear where that should go.
- Maybe add a note that when using type="translated" xml:lang also needs to be used?
- The programmers [struggle with constructions like] <author><persName>Welles, Gideon, 1802-1878.</persName></author> where a name and a date are mixed in the same field. This seems a carry-over from printed Library of Congress catalog cards, but from a programming perspective names and dates should not be mixed.
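The concern about names and dates sharing one field can be illustrated with a small sketch. It assumes headings follow the catalog-card pattern in the comment above (name, comma, date range, optional full stop); the helper name `split_heading` and the pattern are hypothetical and are not part of the guidelines.

```python
import re

# Hypothetical pattern: a name, then a comma, then a trailing date range
# such as "1802-1878" or an open range such as "1850-".
HEADING = re.compile(r"^(?P<name>.+?),\s*(?P<dates>\d{4}\??-(?:\d{4}\??)?)\.?$")

def split_heading(heading):
    """Return (name, dates); dates is None when no trailing date range is found."""
    match = HEADING.match(heading.strip())
    if match is None:
        return heading.strip(), None
    return match.group("name"), match.group("dates")

name, dates = split_heading("Welles, Gideon, 1802-1878.")
print(name)   # Welles, Gideon
print(dates)  # 1802-1878
```

A consumer of the header has to guess at a pattern like this when the two values are mixed; encoding the date separately would make the guess unnecessary.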
Level 1, Fully Automated Conversion and Encoding
Level 2, Minimal Encoding
- Div1, div, and ab are all found in level 1, and hence covered by the introductory sentence to the table. Either have a table consisting solely of front, back, and head, or say something like: "Use all elements specified in Level 1 with the following additions and refinements." Later on, you don't carry previously mentioned elements forward.
Level 3, Simple Analysis
Level 4, Basic Content Analysis
- Perhaps add a column indicating the TEI Tite equivalents?
- In General Level 4 Recommendations and Examples the examples have (eg) their manner, which I suspect should be their manner, since the datatype of @hand is data.pointer.
- Eek! Of course, you are correct. But it's worse than that, because apparently no one has edited this example yet at all. The values of rend= don't match the recommendations made later in the same document (not a real problem, in my book, as rend= is such a variable), but worst of all, by far, the example is not well-formed (there is an "&" on the last line)!
Level 5, Scholarly Encoding Projects
- Add a section on allowable tags, but indicate that it is [not?] the whole set.
General recommendations/comments
- There are _lots_ of references to specific tags in the document. It would be great if these were converted to links to the page in the standard for the tag.
- There are lots of examples in the text. Would it be possible to submit these (or at least the short ones) for inclusion into the text of the standard?
- I think these are probably symptomatic of the fact that I feel the guidelines are too far from the main TEI standard, but that the separation is possibly more about presentation and less about content. By reinforcing the links between the guidelines and the standard we bring them closer together in practice.
- This is an interesting subject which has been discussed several times - how to manage the submission and inclusion of such a set of examples. I am not sure we know the answer yet. Would we have to be told which element it was primarily designed to illustrate, or does that not matter? As Stuart already knows (it is not publicly available yet), the next release of P5 will have collated collections of examples online, something like http://tei.oucs.ox.ac.uk/P5/Guidelines-web/en/html/examples-relation.html. If we had a DLF corpus of new examples, it would be relatively easy to add them to this display, with an indication whence they came. I'd be interested to work on that.
- Would contributing the examples mean that they got validated and checked against the schematron regularly as a matter of course?
- Definitely, yes. I mean, I have not thought the details through, but every component of the Guidelines must be validated as far as we possibly can. You'd be amazed (or probably not!) at how many simple errors get caught by the validation of examples and the other parts of the test process.
- Soft hyphens at the end of a line or page were the greatest sinners in terms of unnecessary variance across projects [harvesting texts]. The new guidelines say nothing about EOL phenomena, but this seems to me an area where you can make things simpler for diverse users by nailing down a single practice as a firm standard. Using \u2011 consistently would help.
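As a rough illustration of the single-practice idea in the last comment, the sketch below rewrites any hyphen-like character at the end of a line to one chosen code point. The set of characters treated as end-of-line hyphens and the function name `normalize_eol_hyphens` are assumptions; the default target, U+2011, is simply the commenter's suggestion and not a recommendation made by the guidelines.

```python
import re

# Assumed set of hyphen-like characters found at line ends:
# hyphen-minus, soft hyphen (U+00AD), hyphen (U+2010), non-breaking hyphen (U+2011).
EOL_PATTERN = re.compile(r"[\-\u00ad\u2010\u2011]$", re.MULTILINE)

def normalize_eol_hyphens(text, target="\u2011"):
    """Replace a hyphen-like character at the end of each line with `target`."""
    return EOL_PATTERN.sub(target, text)

sample = "exam-\nple and over\u00ad\nview"
print(normalize_eol_hyphens(sample) == "exam\u2011\nple and over\u2011\nview")  # True
```

Hyphens in the middle of a line are left alone, so the sketch only touches the end-of-line cases the comment is about.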
On Levels
- It is, I think, implicit in the document, but not spelled out with sufficient clarity that the levels build on each other, so that in principle any Level 1 document can be transformed into a Level 4 document without having to redo stuff at Level 1. It would be nice if that hierarchy could be expressed in some visual form so that you can see a Level 1 document morphing into Level 4. Now different versions of the same document appear at different places in your draft, but you have to do a lot of scrolling forward and backward to get the point. There may be other ways of organizing the text so that the foundational nature of Level 1 and the 'building on each other' of subsequent levels can be made clearer as a guiding principle of the whole enterprise.
Numbered Divs
- Although it is not really present in the Guidelines, my sense is that numbered divs should almost be treated as deprecated. The problems they create, especially in interchange and aggregation, are large. See http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html#DSDIV3 for a bit of a discussion.
- Under General Recommendations, there is a heading 'Text divisions', where it says: "Whether numbered or unnumbered divisions are used, the type attribute of the division element is not recommended at level 1, is optional at level 2, is recommended at level 3, and required at levels 4 and 5." Why? Is that a TEI recommendation, does it build on the earlier Guidelines or what? It is by no means obvious what the reason for this recommendation is.
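To show why the choice between numbered and unnumbered divisions matters for aggregation, here is a minimal sketch of how a harvester might flatten numbered divisions to plain div elements before mixing texts from several projects. It assumes P5 documents in the TEI namespace; the function name `unnumber_divs` is hypothetical, and this is one possible normalization, not a procedure described in the guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Qualified names of div1 through div7.
NUMBERED = {"{%s}div%d" % (TEI_NS, n) for n in range(1, 8)}

def unnumber_divs(root):
    """Rename div1..div7 to div in place and return how many were renamed.

    Nesting already records the depth, so no information is lost by renaming.
    """
    count = 0
    for element in root.iter():
        if element.tag in NUMBERED:
            element.tag = "{%s}div" % TEI_NS
            count += 1
    return count

sample = (
    '<body xmlns="http://www.tei-c.org/ns/1.0">'
    '<div1 type="chapter"><div2 type="section"><p>Text</p></div2></div1>'
    '</body>'
)
root = ET.fromstring(sample)
print(unnumber_divs(root))  # 2
```

The type attribute is untouched by the renaming, which is one reason its presence or absence at each level matters to downstream users.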
Linking between text and images
- Does this mean that you are explicitly not using the text | facsimile mechanism introduced in P5? Perhaps if in fact this is not recommended for library practice it should be indicated?
Structure of the TEI Document
- Should you perhaps link in this (whole) front section to the relevant sections of the Guidelines (and reference Tite, where relevant)? And generally be more generous with references to P5. Later you do a lot of that, but these sections here would also work.
ODDs / Schematron / Validation
Michelle excluded more technical discussions about developing the ODDs, schematron generation, etc. since that discussion doesn't necessarily impact the revisions. It is most helpful to Syd when he gets ready to generate the ODDs for levels 1-4.
- I have not followed this closely, so I don't know if someone is already working on a formal TEI customization of these best encoding practices? [in reference to schematron validation]
- The intent of at least part of the group (without support of others, but with no active objections, either, I don't think) is to convert this document into a driver and multiple ODD files[1], one for each of levels 1-4. When (or if) that happens, the normal stylesheet processing of the document would produce links, much as you see in the Guidelines.
- I created an ODD for level 1 over a year ago, just to prove the point that it could be done quite reasonably. Of course the definition of level 1 has changed since then, and I have not updated the ODD to match. I don't plan to until after the document has been stabilized. At that time (as I said above), I'm hoping to create formal specifications for all 4 levels.
- I have been debating about whether to use the tool chain as provided and create four separate ODD files, or to update the tool chain so it can process more than one <schemaSpec> per <TEI>, and use a single ODD file that defines all four levels.
- I haven't looked closely, so excuse me if this is stupid, but are these "levels" real levels? That is to say, is a level-4 conforming document also a level-1 one? Or are level-4 features actually illegal at level-1? If so, you'd only need one ODD, which would be nice. Alternatively you could maybe add a @level attribute somewhere and then use that to check allowed/disallowed features as schematron rules.
- I am not actually 100% sure whether a level-1 document (the most constrained) is actually a proper subset of level-4 or not, but it doesn't really matter. Although perhaps we could get away with one ODD in such a case, I'm not sure we'd want to. One point is to have a schema that constrains as tightly as possible. But also I don't personally really expect that many projects to use the customizations provided off-the-shelf. I expect it will be much more common that a digital library project uses one of these customizations as the starting place for its own.
- Kind of defeats the purpose of best practice guidelines, no?
- I completely disagree. I can't imagine using these guidelines as-is for any published texts, but there is lots of good stuff here that I'd like to use (and have checked automagically by schema/schematron/whatever). The fact that we have our own quirks doesn't undermine the value of community to us.
Rendition
- The Library Guidelines should be recommending the use of @rendition rather than @rend with Brown rendition ladders. While the Brown system proved very useful when we only had @rend, @rendition provides a standards-based mechanism for recording multiple rendition features. It has the further advantage of being self-documenting through the use of <rendition> elements in the header. Traffic on the TEI-L list provides good evidence that @rendition has been accepted and is being used by significant numbers in the TEI community. One goal of the new Library Guidelines should be to move to P5, and @rend with rendition ladders perpetuates a P4 (and earlier) need that is no longer necessary in P5.
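The self-documenting property mentioned above can be checked mechanically: each @rendition value is a list of pointers that should resolve to a rendition element declared in the header. The sketch below assumes an abbreviated TEI fragment (it is not a complete, valid document) and a hypothetical function name, `unresolved_renditions`.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def unresolved_renditions(root):
    """Return @rendition pointers that match no <rendition xml:id> in the document."""
    declared = {r.get(XML_ID) for r in root.iter(TEI + "rendition")}
    missing = set()
    for element in root.iter():
        # @rendition holds one or more whitespace-separated pointers.
        for pointer in element.get("rendition", "").split():
            if pointer.startswith("#") and pointer[1:] not in declared:
                missing.add(pointer)
    return sorted(missing)

# Abbreviated fragment for illustration only.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><encodingDesc><tagsDecl>
    <rendition xml:id="it" scheme="css">font-style: italic</rendition>
    <rendition xml:id="ctr" scheme="css">text-align: center</rendition>
  </tagsDecl></encodingDesc></teiHeader>
  <text><body>
    <p rendition="#ctr #it">Centred italics</p>
    <p rendition="#sc">Small caps, not declared</p>
  </body></text>
</TEI>"""
print(unresolved_renditions(ET.fromstring(sample)))  # ['#sc']
```

A free-text @rend value offers nothing comparable to check against, which is the practical difference the comment points to.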
Interoperability / Harvesting of Texts
- Our overriding impression was that each of these archives made perfectly sensible decisions about this or that within its own domain, and none of them paid any attention to how its texts might be mixed and matched with other texts. That was reasonable ten years ago. But now we live in a world where you can have multiple copies of all these archives on the hard drive of a single laptop, and people will want to mix and match. I didn't see in the new guidelines any statement of the kind: 'Whatever you do, keep very actively in mind the possibility that some folks from some other projects may want to use your texts for their purposes.' The rhetoric of this draft is still of the 'different strokes for different folks' kind rather than actively and constantly nagging people to do the same thing in the same way, where the cost of this to you is small and the benefit to others is great.