Difference between revisions of "Talk:Best Practices for TEI in Libraries"

Revision as of 20:33, 18 June 2009

Introduction

1) Definition of level 5 encoding currently reads:

"The text is generated either through corrected OCR or keyboarding, but the tagging requires substantial human intervention by encoders with subject knowledge. "

I suggest instead:

"The text is generated either through corrected OCR or keyboarding, and the tagging requires substantial human intervention by encoders with subject knowledge."

because corrected OCR, keyboarding, and expert tagging ALL require substantial human intervention (though the first two, of course, don't require subject knowledge, and perhaps that is the point of the original phrasing).

2) "If a library uses TEI Tite to outsource its encoding, it should find conversion of TEI Tite files to be trivial: to Level 3 with some loss of granularity and to Level 4 with the addition of some markup, which amounts to minimal human intervention."

Should the colon after "trivial" be there?

2.9 General Guidelines for Attribute Usage

1) Since this isn't a comprehensive list of attributes (I don't think), why bother including the "xml:id" and "target" attributes if specific details about how libraries should use them are not actually included in this document? Is the documentation for these attributes considered important to these guidelines, but too extensive to replicate? How does this differ from the specific best practices given for other attributes listed here, like "n" or "rend"?

2) Under "key and ref":

"For example,

<author><persName type="marc100" key="lccn-n78-95332">Shakespeare, William, 1564-1616</persName></author>

gives a project-interal key (lccn-n78-95332) for this name in the Library of Congress Name Authority File. Values of key attributes may be partially explained in a non-machine-readable way through use of a taxonomy element: "

Should "project-interal" be "project-internal"? Or "project-integral"? Or something else?
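
For what it's worth, here is a minimal sketch of how a program might read that example, using only Python's standard library. The function name `read_name_key` is my own, not anything from the guidelines. It shows that the key value is just an opaque string to software, so the prose around it has to say clearly whose identifier it is.

```python
import xml.etree.ElementTree as ET

# The fragment quoted from section 2.9, unchanged.
FRAGMENT = (
    '<author><persName type="marc100" key="lccn-n78-95332">'
    'Shakespeare, William, 1564-1616</persName></author>'
)


def read_name_key(xml_fragment):
    """Return (type, key, display text) for the first persName child, or None."""
    root = ET.fromstring(xml_fragment)
    pers = root.find("persName")
    if pers is None:
        return None
    return pers.get("type"), pers.get("key"), (pers.text or "").strip()


if __name__ == "__main__":
    print(read_name_key(FRAGMENT))
```

Nothing in the markup itself says whether the key is "internal" to the project or borrowed from an external authority file, which is why the wording of that sentence matters.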

3) Under "rend and rendition":

"The rend and rendition attributes may be used when it is desirable to record information about how the content object was displayed in the source document. "

Is it meant to read "content object," or just "content," or even just "object"? Having both sounds strange to me, but perhaps it's TEI terminology with which I'm not familiar.

4.2 The TEI Header

1) Currently reads: "The TEI header is a metadata record that describes an electronic text encoded according to the TEI specification."

Since there are multiple levels of encoding (does this translate to multiple "specifications?"), should this read either

a) "...encoded according to a TEI specification" or b) "...encoded according to the TEI specifications" ?

4.4 The TEI Header and Other Metadata Schemas

1) Currently reads:

"Unfortunately, there is currently no mechanism for specifying that the content of an element should be drawn from an outside metadata source or that it should supplement the content of the element"

To me, the "it" was confusing/ambiguous--I suggest instead:

"Unfortunately, there is currently no mechanism for specifying that the content of an element should be drawn from an outside metadata source or that outside metadata should supplement the content of the element"

This feels a little more redundant/wordy, perhaps, but it is clearer.

4.5 Determining Data Values for the TEI Header

1) Currently reads:

"If there is no digitized title page but the header creator has satisfactory evidence of the source document, the header creator should refer to the source document for metadata creation. The lack of a title page may be for one of many reasons: for example, the original document is a manuscript item, or the electronic edition is a portion of the original object (a poem or short story that was published in a collection or an article from a serial). In all cases, it is recommended that important bibliographic evidence, such as a digitized image of the title page and title page verso for a collection, be provided to the header creator, even if just a piece of the collection is used."

Does "source document" refer to an analog (physical) source document? Or to digitized pages that just lack a title page? Or to OCR or keyboarded text? Or any or all of these things? What counts as "evidence" of a source document?

Follow up question: If the electronic text already exists, wouldn't title page information be captured in the <text> element, and so metadata for the header could be gathered from here even without a facsimile of the title page?
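
To make the follow-up question concrete, here is a rough sketch of what I have in mind, assuming the transcription encodes the title page with the standard `titlePage`, `docTitle`, and `titlePart` elements inside `<front>`. The sample document and its title are invented for illustration (the empty `teiHeader` is only a placeholder), and the function name `title_parts_from_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample; the title and structure are invented for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <text>
    <front>
      <titlePage>
        <docTitle>
          <titlePart type="main">An Example Title</titlePart>
        </docTitle>
      </titlePage>
    </front>
    <body><p>Body text.</p></body>
  </text>
</TEI>"""


def title_parts_from_text(tei_xml):
    """Collect titlePart strings from an encoded title page, if there is one."""
    root = ET.fromstring(tei_xml)
    path = "tei:text/tei:front/tei:titlePage/tei:docTitle/tei:titlePart"
    parts = root.findall(path, TEI_NS)
    # Join nested text and collapse whitespace introduced by line breaks.
    return [" ".join("".join(p.itertext()).split()) for p in parts]


if __name__ == "__main__":
    print(title_parts_from_text(SAMPLE))
```

If the list comes back empty (a manuscript item, or an excerpt from a collection), the header creator would still need the outside evidence the paragraph describes.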

4.6 Element Recommendations for the TEI Header

1) Under the instructions for the title element that falls within <sourceDesc>, it currently reads:

"At least one title element is required for the title of the source document. Give the title according to the national cataloging code. Use a type attribute with a value of marc245c to give the statement of responsibility from a MARC record. "

The information in the third sentence (about marc245c) is immediately reiterated, along with other information, in a list of the possible type attribute values that can be used for this element. So, stating it here seems unnecessary and also confusing: without having seen yet that we can also use marc245a and marc245b for the other parts of the title, I don't know why we've skipped right to statements of responsibility in a title element (but I'm not a cataloger).
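
A small sketch of why seeing the three values together reads more clearly to me. It assumes the titles sit in a `bibl` inside `<sourceDesc>` (the guidelines may place them elsewhere), and the sample title strings are invented. In MARC field 245, subfield a is the title proper, b the remainder of the title, and c the statement of responsibility. The function name `titles_by_type` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element placement and title strings are assumptions.
SAMPLE = """<sourceDesc>
  <bibl>
    <title type="marc245a">Example title proper</title>
    <title type="marc245b">an example subtitle</title>
    <title type="marc245c">by An Example Author</title>
  </bibl>
</sourceDesc>"""


def titles_by_type(source_desc_xml):
    """Group the text of each title element under its type attribute value."""
    root = ET.fromstring(source_desc_xml)
    grouped = {}
    for title in root.iter("title"):
        key = title.get("type", "untyped")
        grouped.setdefault(key, []).append((title.text or "").strip())
    return grouped


if __name__ == "__main__":
    for type_value, texts in titles_by_type(SAMPLE).items():
        print(type_value, texts)
```

Read in this order, marc245c is plainly the last of three parts, which is the context the sentence in the guidelines is missing.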

@@ Line 1: / Line 1: @@
-==Introduction ==
+'''''Introduction'''''
 ) Definition of level 5 encoding currently reads:
@@ Line 17: / Line 17: @@
-== 2.9 General Guidelines for Attribute Usage  ==
+'''''2.9 General Guidelines for Attribute Usage'''''
 ) Since this isn't a comprehensive list of attributes (I don't think), why bother including the "xml:id" and "target" attributes if specific details about how libraries should use these is not actually included in this document?  Is the documentation for these elements considered important to these guidelines, but too extensive to replicate?  How does this differ from the specific best practices given for other attributes listed here, like "n" or "rend"?
@@ Line 38: / Line 38: @@
-== 4.2 The TEI Header ==
+'''''4.2 The TEI Header'''''
 ) Currently reads:
@@ Line 50: / Line 50: @@
-== 4.4 The TEI Header and Other Metadata Schemas ==
+'''''4.4 The TEI Header and Other Metadata Schemas
+'''''
 ) Currently reads:
@@ Line 63: / Line 63: @@
-== 4.5 Determining Data Values for the TEI Header ==
+'''''4.5 Determining Data Values for the TEI Header'''''
 ) Currently reads:
@@ Line 73: / Line 73: @@
 Follow up question: If the electronic text already exists, wouldn't title page information be captured in the <code><text></code> element, and so metadata for the header could be gathered from here even without a facsimile of the title page?
-== 4.6 Element Recommendations for the TEI Header ==
+'''''4.6 Element Recommendations for the TEI Header'''''
 ) Under the instructions for the <code>title</code> element that falls within <code><sourceDesc></code>, it currently reads:
