BPG Revision 3 Public Comment Feedback

From TEIWiki


Public Comment Feedback

Posted below are the public comments we had received as of May 6, 2009, 11:39 EST, for review during our DLF-sponsored meeting on May 6th. All but two of the comments posted thus far have been on the TEILIB-L list. The non-list feedback was either sent directly to Kevin and Michelle or was part of the BOF discussion, and has been anonymized.

Comments can also be followed by searching the TEILIB-L Archive, April 2009, Week 4 threads: https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A1=ind0904D&L=TEILIB-L&X=3D9C3B36AB0318DF79&Y

TEI Header

  • Why do MARC245a / MARC245b / MARC245c include upper case when all the other type values I've ever seen use only lower case? Could we also have a brief mention in the guidelines of the difference between those three? The NZETC currently uses type="marc245" (changed recently from type="245" when we moved to P5), and not being a cataloguer, I have no idea whether that should be a, b, or c.
  • With idno, there is no discussion of what to do with 'local standards'. For example, in New Zealand there's a very important numbering system called Books In Maori, or BIM. We've been thinking of using this in the TEI header, but it would be completely opaque to those outside New Zealand and the field of Polynesian languages and thus needs some explanation; it's not clear where that should go.
  • Maybe add a note that when using type="translated" xml:lang also needs to be used?
  • The programmers [struggle with constructions like] <author><persName>Welles, Gideon, 1802-1878.</persName></author> where a name and a date are mixed in the same field. This seems a carry-over from printed Library of Congress catalog cards, but from a programming perspective names and dates should not be mixed.
  • "Each <respStmt> must have only one resp child element and one name element, though they may occur in either order." This is rather harsh. TEI doesn't enforce this, and it can lead to massive redundancy if one <resp> is shared by lots of people. Are there good reasons to limit it this way? I don't see programs having a problem with resp+,name+
  • I am missing profileDesc/langUsage - I guess that if everything is en(g) then why bother, but if your scope is any language?
    • In profileDesc, the langUsage element is missing. It needs to be added.
  • The recommendations for imprint/pubPlace and imprint/publisher (within sourceDesc) reference "ISBN" punctuation, but this should be "ISBD" punctuation.
  • Provide guidance for how deep to go into the header structure (whether MARC-based or not) and provide rationale for why certain tags (i.e., biblStruct) are recommended
  • If another metadata standard is being used to capture the canonical metadata, recommend how much of that source metadata record to include in the TEI Header.
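
Two of the header points above, sharing one <resp> among several names and recording languages in profileDesc/langUsage, can be illustrated with a small fragment. This is only a sketch: the names, title, and responsibility wording are invented for illustration, and the fragment is not taken from the draft guidelines.

```xml
<!-- Illustrative fragment only; names and wording are hypothetical. -->
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>An example title</title>
      <!-- One resp shared by several names (resp+, name+), which the TEI schema permits -->
      <respStmt>
        <resp>Encoded by</resp>
        <name>First Encoder</name>
        <name>Second Encoder</name>
      </respStmt>
    </titleStmt>
    <publicationStmt><p>Publication details omitted.</p></publicationStmt>
    <sourceDesc><p>Source description omitted.</p></sourceDesc>
  </fileDesc>
  <profileDesc>
    <!-- Languages used in the text, declared once in the header -->
    <langUsage>
      <language ident="en">English</language>
      <language ident="mi">Maori</language>
    </langUsage>
  </profileDesc>
</teiHeader>
```

Grouping the names under a single <resp> avoids repeating the same responsibility statement once per person, which is the redundancy the commenter is concerned about.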

Level 1, Fully Automated Conversion and Encoding

Level 2, Minimal Encoding

  • Div1, div, and ab are all found in level 1, and hence covered by the introductory sentence to the table. Either have a table consisting solely of front, back, and head, or say something like: "Use all elements specified in Level 1 with the following additions and refinements." Later on, you don't carry previously mentioned elements forward.

Level 3, Simple Analysis

Level 4, Basic Content Analysis

  • Perhaps add a column indicating the TEI Tite equivalents?
  • In General Level 4 Recommendations and Examples, the examples have (e.g.) <del rend="overstrike" hand="JHL">their manner</del>, which I suspect should be <del rend="overstrike" hand="#JHL">their manner</del>, since the datatype of @hand is data.pointer.
    • Eek! Of course, you are correct. But it's worse than that, because apparently no one has edited this example yet at all. The values of rend= don't match the recommendations made later in the same document (not a real problem, in my book, as rend= is such a variable), but worst of all, by far, the example is not well-formed (there is an "&" on the last line)!
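
The pointer correction discussed above can be sketched as follows. The hand identifier JHL comes from the quoted example; the handNote text, its placement in profileDesc, and the sample sentence with an ampersand are assumptions made for illustration.

```xml
<!-- In the header: declare the hand once and give it an xml:id -->
<profileDesc xmlns="http://www.tei-c.org/ns/1.0">
  <handNotes>
    <handNote xml:id="JHL">Description of this hand (hypothetical text).</handNote>
  </handNotes>
</profileDesc>

<!-- In the text: @hand is a pointer, so it references the xml:id with a leading # -->
<del rend="overstrike" hand="#JHL">their manner</del>

<!-- A literal ampersand in content must be escaped for the file to be well-formed -->
<p>Smith &amp; Sons</p>
```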

Level 5, Scholarly Encoding Projects

  • Add a section on allowable tags, but indicate that it is [not?] the whole set.

General recommendations/comments

  • There are _lots_ of references to specific tags in the document. It would be great if these were converted to links to the page in the standard for the tag.
  • This might be a bit more work than you are looking for, but if TEI tags were linked with their definitions in the guidelines, it could be useful to the reader. The mapping from element name to url is in P5 straightforward.
  • There are lots of examples in the text. Would it be possible to submit these (or at least the short ones) for inclusion into the text of the standard?
    • I think these are probably symptomatic of the fact that I feel the guidelines are too far from main TEI standard, but that the separation is possibly more about presentation and less about content. By reinforcing the links between the guidelines and the standard we bring them closer together in practise.
    • This is an interesting subject which has been discussed several times - how to manage the submission and inclusion of such a set of examples. I am not sure we know the answer yet. Would we have to be told which element it was primarily designed to illustrate, or does that not matter? As Stuart already knows (it is not publicly available yet), the next release of P5 will have collated collections of examples online, something like http://tei.oucs.ox.ac.uk/P5/Guidelines-web/en/html/examples-relation.html. If we had a DLF corpus of new examples, it would be relatively easy to add them to this display, with an indication whence they came. I'd be interested to work on that.
    • Would contributing the examples mean that they got validated and checked against the schematron regularly as a matter of course?
      • Definitely, yes. I mean, I have not thought the details through, but every component of the Guidelines must be validated as far as we possibly can. You'd be amazed (or probably not!) at how many simple errors get caught by the validation of examples and the other parts of the test process.
  • Soft hyphens at the end of a line or page were the greatest sinners in terms of unnecessary variance across projects [harvesting texts]. The new guidelines say nothing about EOL phenomena, but this seems to me an area where you can make things simpler for diverse users by nailing down a single practice as a firm standard. Using \u2011 consistently would help.
  • Outsource workflow scenario (needs to be developed, but Tite is a related workflow); Develop and integrate workflows
  • The Guidelines should encourage the use of Unicode rather than named character entities.
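
As a small illustration of the last point, assuming a UTF-8 encoded file: the same word can be written with a literal Unicode character or with a numeric character reference, neither of which depends on a DTD, whereas a named entity is only usable if a DTD declares it. The sample word is arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Literal Unicode character: needs no declarations -->
  café
  <!-- Numeric character reference for the same character (U+00E9) -->
  caf&#xE9;
  <!-- A named entity such as &eacute; would fail to parse if used in content here,
       because no DTD in this file declares it -->
</p>
```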

On Filenames

  • There is some middle ground between DOS filenames and filenames using Unicode; for Unix, you can have four-letter extensions and mixed case, but better not whitespace or non-ASCII letters. I think this is in general a good guideline. Not sure how to formulate this though.

On Page Breaks

  • "Page breaks should be encoded using the pb element," maybe better: "using <pb/>."
  • I find the text a bit confusing, as "should always be contained within a text division" ok, so no ..<pb/> ... but I don't see that "contained within a text division" implies: "(rather than before the </div> that ends chapter 2)" (although I do agree that <pb/> is better)
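
One reading of the recommendation quoted above, sketched with invented chapter content and page numbers: a page break that falls between two chapters is placed inside the division that follows, not before the closing tag of the division that precedes.

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="chapter" n="2">
    <head>Chapter 2</head>
    <p>Last paragraph of chapter 2.</p>
    <!-- not here: <pb n="46"/> -->
  </div>
  <div type="chapter" n="3">
    <!-- page break placed at the start of the division it introduces -->
    <pb n="46"/>
    <head>Chapter 3</head>
    <p>First paragraph of chapter 3.</p>
  </div>
</body>
```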

On Levels

  • It is, I think, implicit in the document, but not spelled out with sufficient clarity that the levels build on each other, so that in principle any Level 1 document can be transformed into a Level 4 document without having to redo stuff at Level 1. It would be nice if that hierarchy could be expressed in some visual form so that you can see a Level 1 document morphing into Level 4. Now different versions of the same document appear at different places in your draft, but you have to do a lot of scrolling forward and backward to get the point. There may be other ways of organizing the text so that the foundational nature of Level 1 and the 'building on each other' of subsequent levels can be made clearer as a guiding principle of the whole enterprise.
  • "Recommendations for Levels 1-4 ." These levels somehow come out of the blue - maybe a sentence saying what they are?
  • "These recommendations are meant to complement the TEI Tite customization of the TEI Guidelines." a link & one sentence on Tite? Maybe you plan to extend the introduction eventually - purpose, scope of guidelines?

On xml:lang

  • "Generally not used at levels 1 or 2" I really wish I knew what these levels were :), but anyway, why not use it? If a text is in Slovene, I'd sure document this somewhere (on <TEI> or <text>). Maybe also useful to say that 3-letter ISO codes are recommended. "At levels 3 or 4, it should contain the appropriate language subtag;" Why "subtag", and not just tag? Tag is XML loaded, so, better, "code"?
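
For the Slovene case mentioned above, a minimal sketch of declaring the language once near the root. One assumption is worth flagging: xml:lang values follow BCP 47, which uses the two-letter ISO 639-1 code where one exists (sl for Slovene) and a three-letter code only otherwise, so a blanket recommendation of three-letter codes may need qualifying.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="sl">
  <teiHeader><!-- header omitted --></teiHeader>
  <text>
    <body>
      <!-- Inherits xml:lang="sl" from the root element -->
      <p>A paragraph in Slovene.</p>
      <!-- A passage in another language overrides the inherited value locally -->
      <p xml:lang="en">An English paragraph.</p>
    </body>
  </text>
</TEI>
```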

Numbered Divs

  • Although it is not really present in the Guidelines, my sense is that numbered divs should almost be treated as deprecated. The problems they create, especially in interchange and aggregation, are large. See http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html#DSDIV3 for a bit of a discussion.
  • Under General Recommendations, there is a heading 'Text divisions', where it says: "Whether numbered or unnumbered divisions are used, the type attribute of the division element is not recommended at level 1, is optional at level 2, is recommended at level 3, and required at levels 4 and 5." Why? Is that a TEI recommendation, does it build on the earlier Guidelines or what? It is by no means obvious what the reason for this recommendation is.

Linking between text and images

  • Does this mean that you are explicitly not using the text | facsimile mechanism introduced in P5? Perhaps if in fact this is not recommended for library practice it should be indicated?
  • Standard TEI way of linking to image would be via @facs to facsimile/@xml:id, rather than directly to image; as you don't mention it, does this mean you advise against using it? It should maybe be explicated.
  • My question relates to digital facsimiles. http://www.personal.umich.edu/~kshawkin/teiinlibraries/#Linking_between_encoded_text_and_images_of_source_documents Why do the guidelines recommend using METS rather than the much simpler tei:facsimile? Is it that tei:facsimile is missing something essential?
    • I suspect that this (and the absence of reference to <facsimile>) may be for historical reasons -- at one stage the word I heard was that these Guidelines were anxious not to include too many P5-specific features, so as to remain useful to P4 users. I am heartened to see that they now state explicitly that the Library Guidelines are intended to be a customization of P5, but unsurprised to see that some P4 idioms like this remain.
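
For readers unfamiliar with the P5 mechanism the commenters refer to, a minimal sketch follows. The file name, identifier, and page number are invented, and nothing here is drawn from the draft guidelines themselves.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><!-- header omitted --></teiHeader>
  <!-- One surface per page image, each with its own xml:id -->
  <facsimile>
    <surface xml:id="page-001">
      <graphic url="images/page-001.tif"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <div type="chapter">
        <!-- @facs points at the surface rather than directly at the image file -->
        <pb n="1" facs="#page-001"/>
        <p>Transcribed text of the page.</p>
      </div>
    </body>
  </text>
</TEI>
```

Pointing at the surface keeps image file names in one place, so images can be renamed or replaced without touching the transcription.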

Structure of the TEI Document

  • Should you perhaps link in this (whole) front section to the relevant sections of the Guidelines (and reference Tite, where relevant)? And generally be more generous with references to P5. Later you do a lot of that, but these sections here would also benefit.

ODDs / Schematron / Validation

Michelle excluded more technical discussions about developing the ODDs, schematron generation, etc. since that discussion doesn't necessarily impact the revisions. It is most helpful to Syd when he gets ready to generate the ODDs for levels 1-4.

  • I have not followed this closely, so I don't know if someone is already working on a formal TEI customization of these best encoding practices? [in reference to schematron validation]
    • The intent of at least part of the group (without support of others, but with no active objections, either, I don't think) is to convert this document into a driver and multiple ODD files, one for each of levels 1-4. When (or if) that happens, the normal stylesheet processing of the document would produce links, much as you see in the Guidelines.
    • I created an ODD for level 1 over a year ago, just to prove the point that it could be done quite reasonably. Of course the definition of level 1 has changed since then, and I have not updated the ODD to match. I don't plan to until after the document has been stabilized. At that time (as I said above), I'm hoping to create formal specifications for all 4 levels.
  • I have been debating about whether to use the tool chain as provided and create four separate ODD files, or to update the tool chain so it can process more than one <schemaSpec> per <TEI>, and use a single ODD file that defines all four levels.
  • I haven't looked closely, so excuse me if this is stupid, but are these "levels" real levels? That is to say, is a level-4 conforming document also a level-1 one? Or are level-4 features actually illegal at level-1? If so, you'd only need one ODD, which would be nice. Alternatively you could maybe add a @level attribute somewhere and then use that to check allowed/disallowed features as schematron rules.
    • I am not actually 100% sure whether a level-1 document (the most constrained) is actually a proper subset of level-4 or not, but it doesn't really matter. Although perhaps we could get away with one ODD in such a case, I'm not sure we'd want to. One point is to have a schema that constrains as tightly as possible. But also I don't personally really expect that many projects to use the customizations provided off-the-shelf. I expect it will be much more common that a digital library project uses one of these customizations as the starting place for its own.
    • Kind of defeats the purpose of best practice guidelines, no?
    • I completely disagree. I can't imagine using these guidelines as-is for any published texts, but there is lots of good stuff here that I'd like to use (and have checked automagically by schema/schematron/whatever). The fact that we have our own quirks doesn't undermine the value of community to us.
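
The @level idea raised above could look roughly like this in ISO Schematron. This is a sketch only: level is a hypothetical attribute that TEI does not define, and the choice of del as a feature excluded from level 1 is an assumption made for illustration, not a statement of what the draft specifies.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern id="level-1-restrictions">
    <!-- Fires only for documents that declare themselves level 1 -->
    <sch:rule context="tei:TEI[@level = '1']">
      <sch:assert test="not(.//tei:del)">
        A document declared as level 1 should not contain del elements.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A single schema with rules of this kind is the alternative to maintaining one ODD per level; as the replies note, separate ODDs constrain each level more tightly.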

Rendition

  • The Library Guidelines should be recommending the use of @rendition rather than @rend with Brown rendition ladders. While the Brown system proved very useful when we only had @rend, @rendition provides a standards-based mechanism for recording multiple rendition features. It has the further advantage of being self-documenting through the use of <rendition> elements in the header. Traffic on the TEI-L list provides good evidence that @rendition has been accepted and is being used by significant numbers in the TEI community. One goal of the new Library Guidelines should be to move to P5, and @rend with rendition ladders perpetuates a P4 (and earlier) need that is no longer necessary in P5.
  • I wonder about the use of "rendition ladders" instead of CSS? CSS could be used inline as the value of @rend (like @style in HTML), but NB in P5 one can now explicitly include a full CSS stylesheet in the teiHeader, containing complex styles (collections of CSS property assignments), and then reference these by name, using @rendition.
    • I have to agree; most importantly because most of us have no software to parse the "rendition ladders". The idea was a good one when it was devised, but I think CSS notation is so widespread now that it is silly not to use it. Especially since it can so easily be used for actual rendition.
    • As has already been commented on rendition ladders, it seems a bit odd to be recommending a syntax which is so little implemented when one that *every browser* supports (to some extent) could be equally well used. If the feeling is that doing the job to the max, using @rendition and the <rendition> element, is too complex, would it not be more useful to specify a commonly-implemented subset of CSS styles that most browsers can be relied on to not make too much of a hash of?
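
A sketch of the @rendition mechanism the commenters describe, with CSS declared once in the header and referenced by pointer. The identifiers, styles, and heading text are invented for illustration.

```xml
<!-- In the header, inside encodingDesc -->
<tagsDecl xmlns="http://www.tei-c.org/ns/1.0">
  <rendition xml:id="italic" scheme="css">font-style: italic;</rendition>
  <rendition xml:id="center" scheme="css">text-align: center;</rendition>
</tagsDecl>

<!-- In the text: several rendition features combine as space-separated pointers -->
<head rendition="#center #italic">Chapter 3</head>
```

Because each value is ordinary CSS, a stylesheet can pass it through to a browser directly, with no special parser for rendition ladders.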

Interoperability / Harvesting of Texts

  • Our overriding impression was that each of these archives made perfectly sensible decisions about this or that within its own domain, and none of them paid any attention to how its texts might be mixed and matched with other texts. That was reasonable ten years ago. But now we live in a world where you can have multiple copies of all these archives on the hard drive of a single laptop, and people will want to mix and match. I didn't see in the new guidelines any statement of the kind: 'Whatever you do, keep very actively in mind the possibility that some folks from some other projects may want to use your texts for their purposes.' The rhetoric of this draft is still of the 'different strokes for different folks' kind rather than actively and constantly nagging people to do the same thing in the same way, where the cost of this to you is small and the benefit to others is great.
    • I would just want to point out that concerns over interoperability are not merely a reflection of our own particular desires (as text analysis practitioners) for consistency. Interoperability is stated prominently as the first of four "design goals" for TEI in the P5 Guidelines ("provide a standard format for data interchange"). For that reason alone, insistence on some of Martin's concerns should, I think, be part of any best practices statement.