Talk:Best Practices for TEI in Libraries
Possible expanded filename recommendations
Standardized file naming for a particular encoding project is key for reliable online storage and delivery of these files. Consider the following best practices when determining the file name scheme for your project:
- Each filename should contain an identifier that uniquely specifies a single digital object within the parent collection (e.g., a parent collection of text, images and other related materials)
- Each filename should be fully specified. It should not just be a sequence number that is dependent on location within a directory structure for context
- Filenames should not include spaces
- Filenames should follow a predictable case convention (e.g., all lowercase, camelCase, etc.)
- The first character of the filename should be an ASCII letter ('a' through 'z' or 'A' through 'Z') to comply with current restrictions on identifiers by many programming and metadata languages such as METS
- The "base" filename may include only ASCII letters ('a' through 'z' and 'A' through 'Z'), ASCII digits ('0' through '9'), hyphens, underscores, and periods. Refrain from using other characters, and use a period only once (to separate the base name from the file extension).
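The mechanical rules in the list above (no spaces, leading ASCII letter, restricted character set, a single period) could be checked with something like the following Python sketch. The function name `filename_problems` and the sample filenames are hypothetical, chosen only for illustration. Uniqueness within the collection and the project's case convention are project-specific and are not checked here.

```python
import re

def filename_problems(name):
    """Return a list of rule violations for one filename (empty if none)."""
    problems = []
    # Rule: filenames should not include spaces.
    if " " in name:
        problems.append("contains a space")
    # Rule: the first character should be an ASCII letter.
    if not re.match(r"[A-Za-z]", name):
        problems.append("does not start with an ASCII letter")
    # Rule: only ASCII letters, digits, hyphens, underscores, and periods.
    # (A space is reported separately above, so it is excluded here.)
    if re.search(r"[^A-Za-z0-9._ -]", name):
        problems.append("contains a character outside the allowed set")
    # Rule: a period is used only once, to separate base name from extension.
    if name.count(".") != 1:
        problems.append("does not contain exactly one period")
    return problems

# Hypothetical filenames, for illustration only.
for candidate in ["tei_ab1234.xml", "0001.xml", "my file.xml", "vol.1.xml"]:
    print(candidate, filename_problems(candidate))
```

A check like this fits naturally into an ingest or pre-commit step, so that nonconforming names are caught before files reach storage.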
For those saving files to CD-ROM for storage or file transfer, file naming should follow ISO 9660 conventions: 8-character filenames, 3-character extensions, using A-Z, a-z, 0-9, underscores and hyphens.
@type and @key on persName and orgName
In the best practices document, the author element is described as follows:
One or more author elements (one name per element) are used to encode the name for the personal author or corporate body responsible for the creation of the source document, even if this creator is not the main entry in the catalog record. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the form of the name from a national name authority file.
Since the forms of names used in name authority files have a rigid form that doesn't look like a name in the TEI sense (strings like "Welles, Gideon, 1802-1878" offend the sensibilities of XML folks), during the Raleigh meeting we decided that name authority records given in the header should have a type attribute, similar to that used on <title>. So on fileDesc/titleStmt/author and fileDesc/sourceDesc/biblStruct/monogr/author the following values of @type would be allowed:
marc100 marc110
We would not allow marc111 or marc130 because these MARC fields, while used for main entries in cataloging, are not authors in the TEI sense. As explained in the description of the author element, this element should be used for personal authors or corporate bodies, not necessarily main entries.
We also decided to recommend use of @key, as in the "Level 4 Name Tagging" section, to reference authority file records. In "Level 4 Name Tagging", it says, "the key attribute points to the unique key in the database table or, as with the ref attribute, the key attribute can point to the xml:id value in the external file".
However, @key does not take an IDREF data type (as it was called in the old days), and once I tried to create examples, I realized we don't want to be in the business of adding a <taxonomy> for each authority record referenced elsewhere in the header. So I think what we want is this:
<author><persName type="marc100" key="lccn-n78-95332">Shakespeare, William, 1564-1616</persName></author>
<author><orgName type="marc110" key="lccn-n50-63455">National Organization for Women</orgName></author>
<author>(unknown)</author>
plus this elsewhere in the header:
<taxonomy xml:id="lccn"><bibl>Library of Congress Control Number</bibl></taxonomy>
Sound right?
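If the convention is adopted, it could be checked mechanically. The Python sketch below rests on assumptions that are mine rather than part of the proposal: that the text before the first hyphen in @key names the authority scheme, that this prefix should match the xml:id of a <taxonomy> somewhere in the header, and that the header can be reduced to a bare <teiHeader> wrapper with no TEI namespace for the sake of a short example. The function name `unresolved_keys` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified header fragment: no TEI namespace and a flattened structure,
# both assumptions made to keep the example short.
HEADER = """
<teiHeader>
  <author><persName type="marc100" key="lccn-n78-95332">Shakespeare, William, 1564-1616</persName></author>
  <author><orgName type="marc110" key="lccn-n50-63455">National Organization for Women</orgName></author>
  <author>(unknown)</author>
  <taxonomy xml:id="lccn"><bibl>Library of Congress Control Number</bibl></taxonomy>
</teiHeader>
"""

def unresolved_keys(header_xml):
    """Return @key values whose prefix matches no <taxonomy> xml:id."""
    root = ET.fromstring(header_xml)
    taxonomy_ids = {t.get(XML_ID) for t in root.iter("taxonomy")}
    unresolved = []
    for element in root.iter():
        key = element.get("key")
        if key is None:
            continue
        # Assumed convention: "<scheme>-<record number>", split on first hyphen.
        prefix, _, record = key.partition("-")
        if prefix not in taxonomy_ids or not record:
            unresolved.append(key)
    return unresolved

print(unresolved_keys(HEADER))
```

Under this reading, only one <taxonomy> per authority scheme is needed, no matter how many individual records are referenced.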