Difference between revisions of "Talk:Best Practices for TEI in Libraries"

From TEIWiki
(Kevin's responses to Becky's edits)
(replaced contents with latest list of things to be done before release)
Line 1: Line 1:
'''''Introduction'''''
+
The following are revisions to make to the BP before making an official "release". There is a separate list of [[Future changes to Best Practices for TEI in Libraries]].
   
 
1) Definition of level 5 encoding currently reads:
 
  
"The text is generated either through corrected OCR or keyboarding, but the tagging requires substantial human intervention by encoders with subject knowledge. "
+
= Dependent upon pending revisions to Tite =
  
I suggest instead:
+
== Add Tite as Level 3.5 ==
  
"The text is generated either through corrected OCR or keyboarding, '''and''' the tagging requires substantial human intervention by encoders with subject knowledge, "
+
This was [[Minutes_from_November_14%2C_2009#Harmonizing_TEI_Tite_with_the_Best_Practices:_Is_it_worth_pursuing.3F|strongly recommended by Daniel Pitti]] in Ann Arbor because he felt certain that administrators and funders would be confused about the difference between TEI Tite and the Best Practices ("don't the libraries already have a TEI customization?"); in fact, Kevin has known this same confusion to arise among TEI Council members.  While we have a section of the BP discussing its relationship to Tite, by having a Level 3.5, we can be more explicit about mapping between the two.
  
because corrected OCR, keyboarding, and expert tagging ALL require substantial human intervention (though the first two, of course, don't require subject knowledge, and perhaps that is the point of the original phrasing)
+
Naturally we will not be able to describe Tite the way we do other levels -- by simply saying "all the elements in the previous levels, plus the following".  Tite uses different element names of all sorts. There's no point in having Syd make an ODD for Tite since one already exists.  So what Kevin envisions here is a sort of "sidebar" about Tite, inserted between Levels 3 and 4 that discusses Tite in a bit more detail than we currently have in the beginning of the BP, with particular discussion of mapping between the two.
  
: Corrected. ([[User:Kshawkin|Kshawkin]])
+
Would someone be willing to write a first draft of all of this?  Two paragraphs are already written for you, and you can pull more information from Tite's discussion of an earlier version of the Best Practices.
  
2) "If a library uses TEI Tite to outsource its encoding, it should find conversion of TEI Tite files to be trivial: to Level 3 with some loss of granularity and to Level 4 with the addition of some markup, which amounts to minimal human intervention."
+
== Revise section on hyphenation ==
  
Should the colon after "trivial" be there?
+
Revise the section on hyphenation per outcome of the discussion on TEI-L and perhaps also on how this is handled in the ongoing Tite revisions.
  
: Now substantially reworded for clarity.  You're not the first to be confused by this passage. ([[User:Kshawkin|Kshawkin]])
+
= Other issues to resolve before releasing =
  
'''''2.9 General Guidelines for Attribute Usage'''''
+
== Test ODDs and schemas derived from them ==
  
1) Since this isn't a comprehensive list of attributes (I don't think), why bother including the "xml:id" and "target" attributes if specific details about how libraries should use these is not actually included in this document? Is the documentation for these elements considered important to these guidelines, but too extensive to replicate? How does this differ from the specific best practices given for other attributes listed here, like "n" or "rend"?
+
Test Syd's ODDs and schemas derived from them: http://bauman.zapto.org/~syd/temp/BestPractices/ .  Just go to that URL, download the .rng files, and create a new XML document based on the schema. See if it allows you to insert all the elements you expect to be able to insert. Syd has been asked to make the following changes:
  
: There used to be specific guidance on these elements, which was stripped down.  Good point. I removed these sections. ([[User:Kshawkin|Kshawkin]])
+
* in header ODD, allow only a structured <publicationStmt>
 +
* lib1.rng: <oXygen/> says "Errors encountered: Probably no start pattern found".
 +
* The only allowed child of front, body, or back *at any level* should be a div.
 +
* note should not be allowed in Level 1 or Level 2
 +
* ab should be the only child allowed of any div (in both Level 1 and Level 2).  This element seems to be missing from the schema.
 +
* floatingText is missing in Level 3 or Level 4 schemas.
  
2) Under "key and ref":
+
== Use of any P5 attributes ==
  
"For example,
+
Determine whether to change the prose of the BP to say that you can use any attribute you find in P5 for elements within <text> (as opposed to in <teiHeader>, where Kevin believes we've settled on using just the attributes given in the BP section on the header).
  
<author><persName type="marc100" key="lccn-n78-95332">Shakespeare, William, 1564-1616</persName></author>
+
== Direction of pointing between note references and notes themselves ==
  
gives a project-interal key (lccn-n78-95332) for this name in the Library of Congress Name Authority File. Values of key attributes may be partially explained in a non-machine-readable way through use of a taxonomy element: "
+
Decide whether to change back to having &lt;ref&gt; point to &lt;note&gt; instead of &lt;note&gt; point to &lt;ref&gt;, as Syd recommended. See this ticket:
  
should "project-interal" be "project-internal?"  Or "project-integral?" Or something else?
+
https://sourceforge.net/tracker/?func=detail&aid=2796148&group_id=106328&atid=644062
  
: Was supposed to be "project-internal".  I've now reworded to say "project-specific". ([[User:Kshawkin|Kshawkin]])
+
and this change to the Guidelines:
  
3) Under "rend and rendition":  
+
http://tei.svn.sourceforge.net/viewvc/tei/trunk/P5/Source/Guidelines/en/CO-CoreElements.xml?r1=6937&r2=6936&pathrev=6937
  
"The rend and rendition attributes may be used when it is desirable to record information about how the content object was displayed in the source document. "
+
or, for the full story, see Kevin's email from Nov. 6 and previous quoted messages.
  
Is it meant to read "content object," or just "content," or even just "object?"  Having both sounds strange to me, but perhaps it's TEI terminology with which I'm not familiar.
+
== meeting element ==
  
: This is jargon which I picked up from Allen Renear.  I've changed to "textual feature". ([[User:Kshawkin|Kshawkin]])
+
Decide whether to include &lt;meeting&gt; in sourceDesc/biblStruct/monogr/ and/or in titleStmt.  (Per a change on 2010-01-15 in SourceForge, meeting is now allowed in titleStmt.)  As Kevin discussed in an email sent on Oct. 12, the name of a meeting is usually included in a MARC record, but it's not distinguished from an author or editor in the same way TEI divides up the world.  The essential question is: if you digitize a volume of conference proceedings, is the name of the meeting, as opposed to the title of the volume, really important enough to warrant inclusion in the TEI header?  If so, we need to wrestle with the questions Kevin brought up on Oct. 12.
  
'''''4.2 The TEI Header'''''
+
== appInfo and application ==
  
1) Currently reads:
+
Decide whether to include &lt;appInfo&gt; and &lt;application&gt; in our header recommendations.  In email discussions, Syd saw them as useful, but Lisa didn't think we need them.
"The TEI header is a metadata record that describes an electronic text encoded according to the TEI specification."
 
 
 
Since there are multiple levels of encoding (does this translate to multiple "specifications?"), should this read either
 
 
 
a) "...encoded according to '''a''' TEI specification"
 
or
 
b) "...encoded according to the TEI specification'''s'''" ?
 
 
 
: Reworded this paragraph a bit. ([[User:Kshawkin|Kshawkin]])
 
 
 
'''''4.4 The TEI Header and Other Metadata Schemas'''''
 
1) Currently reads:
 
 
 
"Unfortunately, there is currently no mechanism for specifying that the content of an element should be drawn from an outside metadata source or that it should supplement the content of the element"
 
 
 
To me, the "it" was confusing/ambiguous--I suggest instead:
 
 
 
"Unfortunately, there is currently no mechanism for specifying that the content of an element should be drawn from an outside metadata source or that '''outside metadata''' should supplement the content of the element"
 
 
 
This feels a little more redundant/wordy, perhaps, but it is clearer.
 
 
 
: Reworded ([[User:Kshawkin|Kshawkin]])
 
 
 
'''''4.5 Determining Data Values for the TEI Header'''''
 
 
 
1) Currently reads:
 
 
 
"'''If there is no digitized title page but the header creator has satisfactory evidence of the source document, the header creator should refer to the source document for metadata creation.''' The lack of a title page may be for one of many reasons: for example, the original document is a manuscript item, or the electronic edition is a portion of the original object (a poem or short story that was published in a collection or an article from a serial). In all cases, it is recommended that important bibliographic evidence, such as a digitized image of the title page and title page verso for a collection, be provided to the header creator, even if just a piece of the collection is used."
 
 
 
Does "source document" refer to an analog (physical) source document? Or digitized pages, just lacking a title page?  Or OCR or keyboarded text? Or any or all of these things?  What counts as "evidence" of a source document?
 
 
 
: Reworded. ([[User:Kshawkin|Kshawkin]])
 
 
 
Follow up question: If the electronic text already exists, wouldn't title page information be captured in the <code><text></code> element, and so metadata for the header could be gathered from here even without a facsimile of the title page?
 
 
 
: Yes, this is what was meant for the first item in the list: the case of a digitized title page and title page verso, where you use encoded text as the source.
 
 
 
'''''4.6 Element Recommendations for the TEI Header'''''
 
 
 
1) Under the instructions for the <code>title</code> element that falls within <code><sourceDesc></code>, it currently reads:
 
 
 
"At least one title element is required for the title of the source document. Give the title according to the national cataloging code. Use a type attribute with a value of marc245c to give the statement of responsibility from a MARC record. "
 
 
 
The information in the second sentence (about marc245c) is immediately reiterated, along with other information, in a list of the possible <code>type</code> attributes that can be used for this element.  So, stating it here seems unnecessary and also confusing--without having seen yet that we can also use marc245a and marc245b for the other elements of the title, I don't know why we've skipped right to statements of responsibility in a title element (but I'm not a cataloger)
 
 
 
: Reworded. ([[User:Kshawkin|Kshawkin]])
 
 
 
2) Within <code><profileDesc></code>, is <code><keywords scheme=></code> only used if its <code><term></code> children come from a specific controlled vocabulary?  Can there be <code><term></code>s without a parent <code><keywords scheme=></code>?
 
 
 
: No, the term element may not occur without a containing keywords element. ([[User:Kshawkin|Kshawkin]])
 
 
 
'''''5.1.3 Rationale for Level 1 Encoding'''''
 
 
 
Under the characteristics of projects best-suited for Level 1 encoding, may I suggest the following addition:
 
 
 
-the source documents are printed or nearly all printed
 
 
 
Or something along those lines, since OCR is pretty useless on manuscript materials, and even on complicated typefaces like blackletter/Fraktur.
 
 
 
: OCR is getting better at handling these things if done properly, so I hesitate to say this. ([[User:Kshawkin|Kshawkin]])
 
 
 
The other characteristics listed do describe the project workflow, not the physical documents, so perhaps this type of specific characteristic is not appropriate after all.  It seems worth noting somewhere in some fashion that certain collections, such as manuscript materials and some types of print, will by their very nature almost certainly require a higher level of encoding.  But where or how?  Or is this even necessary?  Perhaps it is already implied, assumed, or generally understood.
 
 
 
: I think this is understood, but I'll think about it a bit more.  ([[User:Kshawkin|Kshawkin]])
 

Revision as of 20:43, 14 March 2010

The following are revisions to make to the BP before making an official "release". There is a separate list of Future changes to Best Practices for TEI in Libraries.

Dependent upon pending revisions to Tite

Add Tite as Level 3.5

This was strongly recommended by Daniel Pitti in Ann Arbor because he felt certain that administrators and funders would be confused about the difference between TEI Tite and the Best Practices ("don't the libraries already have a TEI customization?"); in fact, Kevin has known this same confusion to arise among TEI Council members. While we have a section of the BP discussing its relationship to Tite, by having a Level 3.5, we can be more explicit about mapping between the two.

Naturally we will not be able to describe Tite the way we do other levels -- by simply saying "all the elements in the previous levels, plus the following". Tite uses different element names of all sorts. There's no point in having Syd make an ODD for Tite since one already exists. So what Kevin envisions here is a sort of "sidebar" about Tite, inserted between Levels 3 and 4 that discusses Tite in a bit more detail than we currently have in the beginning of the BP, with particular discussion of mapping between the two.

Would someone be willing to write a first draft of all of this? Two paragraphs are already written for you, and you can pull more information from Tite's discussion of an earlier version of the Best Practices.

Revise section on hyphenation

Revise the section on hyphenation per outcome of the discussion on TEI-L and perhaps also on how this is handled in the ongoing Tite revisions.

Other issues to resolve before releasing

Test ODDs and schemas derived from them

Test Syd's ODDs and schemas derived from them: http://bauman.zapto.org/~syd/temp/BestPractices/ . Just go to that URL, download the .rng files, and create a new XML document based on the schema. See if it allows you to insert all the elements you expect to be able to insert. Syd has been asked to make the following changes:

  • in header ODD, allow only a structured <publicationStmt>
  • lib1.rng: <oXygen/> says "Errors encountered: Probably no start pattern found".
  • The only allowed child of front, body, or back *at any level* should be a div.
  • note should not be allowed in Level 1 or Level 2
  • ab should be the only child allowed of any div (in both Level 1 and Level 2). This element seems to be missing from the schema.
  • floatingText is missing in Level 3 or Level 4 schemas.
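
For anyone testing by hand, three of the requested changes above (div as the only child of front, body, or back; ab as the only child of div; no note in Level 1 or Level 2) can also be spot-checked on a sample file without the schema. The following is only a rough sketch using Python's standard library, not a substitute for validating against the .rng files. The function name and the sample document are made up for illustration, and the ab rule is applied literally as worded in the list.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check_level_1_2(root):
    """Return violations of the three Level 1/2 rules, looking only inside <text>."""
    problems = []
    for text in root.iter(f"{{{TEI}}}text"):
        for el in text.iter():
            name = local(el.tag)
            children = [local(child.tag) for child in el]
            if name in ("front", "body", "back"):
                problems += [f"{name} has non-div child <{c}>" for c in children if c != "div"]
            elif name == "div":
                problems += [f"div has non-ab child <{c}>" for c in children if c != "ab"]
            elif name == "note":
                problems.append("note is not allowed in Level 1 or Level 2")
    return problems


# Hypothetical test fixture; the header is omitted for brevity.
SAMPLE = f"""<TEI xmlns="{TEI}">
  <text>
    <body>
      <div><ab>Page text.</ab></div>
      <p>Stray paragraph.</p>
    </body>
  </text>
</TEI>"""

problems = check_level_1_2(ET.fromstring(SAMPLE))
print(problems)
```

An empty list means the sample passes these three checks only; it says nothing about the rest of the schema.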

Use of any P5 attributes

Determine whether to change the prose of the BP to say that you can use any attribute you find in P5 for elements within <text> (as opposed to in <teiHeader>, where Kevin believes we've settled on using just the attributes given in the BP section on the header).

Direction of pointing between note references and notes themselves

Decide whether to change back to having <ref> point to <note> instead of <note> point to <ref>, as Syd recommended. See this ticket:

https://sourceforge.net/tracker/?func=detail&aid=2796148&group_id=106328&atid=644062

and this change to the Guidelines:

http://tei.svn.sourceforge.net/viewvc/tei/trunk/P5/Source/Guidelines/en/CO-CoreElements.xml?r1=6937&r2=6936&pathrev=6937

or, for the full story, see Kevin's email from Nov. 6 and previous quoted messages.
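
To make the two options concrete, here is a small sketch with one made-up fragment for each direction, plus a check that every target resolves to an xml:id on the element it is supposed to point at. The fragments, identifiers, and function name are illustrative assumptions, not text from the BP, and the check works the same way whichever direction we settle on.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id


def dangling_targets(root, pointer, pointee):
    """List targets on <pointer> elements that match no xml:id on a <pointee>."""
    ids = {el.get(XML_ID) for el in root.iter(f"{{{TEI}}}{pointee}")}
    ids.discard(None)
    missing = []
    for el in root.iter(f"{{{TEI}}}{pointer}"):
        for target in el.get("target", "").split():
            if target.startswith("#") and target[1:] not in ids:
                missing.append(target)
    return missing


# Option A: the reference in the running text points at the note.
REF_TO_NOTE = f"""<body xmlns="{TEI}">
  <div><p>Some claim.<ref target="#n1">1</ref></p>
  <note xml:id="n1">Supporting note.</note></div>
</body>"""

# Option B: the note points back at the reference.
NOTE_TO_REF = f"""<body xmlns="{TEI}">
  <div><p>Some claim.<ref xml:id="r1">1</ref></p>
  <note target="#r1">Supporting note.</note></div>
</body>"""

ref_to_note = dangling_targets(ET.fromstring(REF_TO_NOTE), "ref", "note")
note_to_ref = dangling_targets(ET.fromstring(NOTE_TO_REF), "note", "ref")
broken = dangling_targets(ET.fromstring(REF_TO_NOTE.replace("#n1", "#n9")), "ref", "note")
print(ref_to_note, note_to_ref, broken)
```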

meeting element

Decide whether to include <meeting> in sourceDesc/biblStruct/monogr/ and/or in titleStmt. (Per a change on 2010-01-15 in SourceForge, meeting is now allowed in titleStmt.) As Kevin discussed in an email sent on Oct. 12, the name of a meeting is usually included in a MARC record, but it's not distinguished from an author or editor in the same way TEI divides up the world. The essential question is: if you digitize a volume of conference proceedings, is the name of the meeting, as opposed to the title of the volume, really important enough to warrant inclusion in the TEI header? If so, we need to wrestle with the questions Kevin brought up on Oct. 12.

appInfo and application

Decide whether to include <appInfo> and <application> in our header recommendations. In email discussions, Syd saw them as useful, but Lisa didn't think we need them.
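
If we do recommend these elements, a simple consistency check may be worth having. The sketch below assumes, from my reading of P5 (please verify against the Guidelines), that application needs both an ident and a version attribute. The application names in the sample are invented.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"


def incomplete_applications(root):
    """Return (ident, missing attribute names) for each <application> lacking ident or version."""
    bad = []
    for app in root.iter(f"{{{TEI}}}application"):
        missing = [attr for attr in ("ident", "version") if not app.get(attr)]
        if missing:
            bad.append((app.get("ident"), missing))
    return bad


# Hypothetical fragment; both application entries are invented examples.
SAMPLE = f"""<encodingDesc xmlns="{TEI}">
  <appInfo>
    <application ident="ExampleOCR" version="1.0">
      <label>Example OCR tool</label>
    </application>
    <application ident="ExampleEditor">
      <label>Example editor</label>
    </application>
  </appInfo>
</encodingDesc>"""

incomplete = incomplete_applications(ET.fromstring(SAMPLE))
print(incomplete)
```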