Minutes from November 8, 2008

SIG on Libraries Meeting Minutes, November 8, 2008. Room G17, Centre for Computing in the Humanities, King's College London, 26-29 Drury Lane, London WC2B 5RL

Present:
 * Syd Bauman (Brown University)
 * Marcus Bingenheimer (Dharma Drum Buddhist College)
 * Christiane Fritze (Deutsches Textarchiv)
 * Kevin Hawkins (University of Michigan)
 * Stephen Yearl (formerly of Yale University)
 * Morfudd Jones (National Library of Wales)
 * Daniel Pitti (University of Virginia)
 * Laurent Romary (Max Planck Digital Library)
 * Natasha Smith (University of North Carolina)
 * Andrew Rouner (Washington University in St. Louis)
 * Paul Schaffner (University of Michigan)
 * John Walsh (Indiana University)
 * Eddie Woodward (Florida State University)

The meeting was convened at 9:35 a.m.

Those present introduced themselves and their interest in the group.

Natasha reported that Michelle Dalmau (Indiana University), co-convenor of the SIG, emailed to say she is ill and unable to attend.

Daniel said that as treasurer of the TEI, he has noticed that the majority of TEI member institutions that actually pay are large American research libraries. He said he has been concerned that the TEI is not addressing their interests and is afraid that the developing economic crisis will lead to 5-15% budget cuts at libraries, which inevitably cut organizational memberships before other expenditures. He said the TEI needs to demonstrate its value to them so they will continue to support the TEI.

Daniel noted that so far none of the Mellon grant money for the TEI Tite study has been spent.

Kevin summarized the work of the SIG since the Minneapolis meeting in April (a joint meeting of the SIG and of the DLF-sponsored Task Force, which no longer exists) on the revisions to *TEI in Libraries: Guidelines for Best Practices*. He said the group also discussed TEI Tite, but it was unclear to what extent the SIG has ownership over this project.

Daniel asked whether any functional objectives have been specified for the best practices. He said that having these will help as a "selling point" for the document.

Syd suggested moving the "introduction" section, which is actually a detailed history of the document, to an appendix.

(Christiane joined the group.)

Kevin asked everyone to add their comments to the best practices guide in the wiki.

Daniel asked what the relationship is between the best practices document and TEI Tite. It was suggested that we might think of both of these in terms of stepwise conversion of documents from lower to higher levels of encoding.

Laurent said that the TEI Tite schema has constraints reflecting its content model, so we need some sort of "transitional compatibility" with the best practices guidelines.

Syd noted that, as he has previously suggested, he would like to split the best practices document into a prose introduction and ODDs for each level of markup and its schema.

Daniel suggested that we think of TEI Tite as Level 0 in the best practices guidelines. Syd said it might more properly be Level 1.5.

Daniel encouraged us to keep the best practices document brief so as not to overwhelm readers. Syd said he hoped that the prose introduction would help readers choose an appropriate level.

Daniel said he thought that some of the Mellon grant could be apportioned to support the work of our SIG.

Paul said that TEI Tite was derived from existing DTDs used by the California Digital Library, the University of Michigan, and the University of Virginia for encoding by vendors. None of these were thought of as transitional formats. The Michigan DTD, for example, was intended to accommodate Level-4 encoding.

Andrew said he was skeptical that a new user of the best practices could choose an appropriate encoding level by reading only the prose introduction to the levels. He suggested we rethink our assumption about stepwise conversion since content is unlikely to be upgraded to a higher encoding level in the future.

Daniel asked whether explicit rationales for each level of encoding are given. Natasha said they are.

Laurent said we need to have encoding levels so that delivery and search software can expect predictable encoding.

Laurent asked whether the TEI Tite schema is stable or will evolve in the future. Paul responded that it might still evolve: he had just realized he was supposed to evaluate TEI Tite after the SIG meeting last year in College Park but had neglected to do so.

Daniel agreed with Andrew that libraries are likely to choose an encoding level for a set of content but never upgrade the markup in the future, though someone else might reuse those texts later. Daniel said that we should instead focus on how to foster mass conversion of content.

Marcus urged the group to examine the TEI Tite tag set closely to make sure it's usable by everyone: for example, without the [element name missing from record] element, it's not at all usable for his sort of East Asian texts.

Laurent said he feels the SIG should make a statement on TEI Tite. Daniel agreed, suggesting that we submit a proposal to the TEI Council.

Andrew said he was skeptical that we can formulate a single usable vendor spec (like TEI Tite) that would satisfy all of our various needs. Daniel, however, responded that the risk of attempting to create one is not high and noted that we are building on the experience of the early adopters (California, Michigan, and Virginia).

Andrew said we may need to allow some flexibility in the specification. Syd responded that there actually should be no flexibility because vendors want unambiguous rules to follow.

Daniel noted that the adoption of TEI Tite does not mean that institutions will not be able to negotiate with vendors on their own for more specialized encoding.

There was a discussion about whether TEI Tite is designed for outsourced encoding of large or small collections.

Kevin and Syd noted three possible topics for further discussion at the current meeting: TEI Tite, the best practices document, and FRBR (which Torsten Schaßan told Kevin that he would very much like our group to discuss).

[items missing from record]

Daniel said he was in favor of the SIG taking active responsibility for the best practices document but also of having official involvement in TEI Tite. He reiterated that since Mellon gave the money to the TEI Consortium and since our SIG is the logical group to study TEI Tite, the Board could commission us to study it.

Marcus asked whether the best practices document and TEI Tite would be integrated. Daniel said they should instead complement each other.

Kevin noted that the HTML version of TEI Tite was not publicly available as far as he knew. Syd found it at http://www.tei-c.org/release/doc/tei-p5-exemplars/html/tei_tite.doc.html. Kevin said he would email Chris Ruotolo to ask her to add a link to it.

There was a discussion of forming a clearinghouse or maintenance agency for TEI Tite.

There was a discussion of the role of SIGs in the TEI Consortium. Laurent said that the TEI Council agreed in Galway that SIGs should start having specific responsibilities. Natasha replied that we still need to bring issues to the Council for decisions to be reached.

[coffee break]

(John joined the group.)

Syd suggested developing a plan of action with a timeline.

John asked whether we know that existing projects that use vendors are willing to switch to using TEI Tite.

Laurent said ____.

John said using TEI Tite would save them money.

Laurent said ____. He said he would like the TEI Consortium to maintain TEI Tite through the SIG on Libraries.

Paul noted that TEI Tite was modeled on P4.

Marcus said it has many elements but is missing some, like [element name missing from record].

Laurent said we need to agree on both an architecture for TEI Tite and a strategy for reviewing it.

Laurent suggested allowing for various subsets of TEI Tite to be used for different projects [like the Chicago pizza model]. Syd reiterated that we need just one spec for all uses, and ___ noted that a project is free to discard extra tagging created by the vendor but not needed for the project.

Andrew noted that the TEI Tite spec refers to documentation from Virginia which perhaps should be incorporated into the TEI Tite spec.

Syd said it was envisioned that libraries using TEI Tite would not have to deal directly with a vendor at all; instead, there would be a broker between libraries and vendors so that these two parties would never need to talk directly. In such a case, there's no point in having any customizations. He reiterated that anyone could pay a vendor to do extra tagging beyond that included in TEI Tite.

Andrew gave an example where detailed encoding of names is important to an encoding project. Natasha said this is a case where additional markup, either done by a vendor or done in house, would be required.

Andrew cautioned that if TEI Tite lacks an element that projects need identified, the spec might not catch on and therefore might not lead to any increase in membership in the TEI.

Laurent said we also need to _____.

There was a discussion of numbered versus unnumbered divs in TEI Tite. It was decided that transforming from one to the other is trivial, so we shouldn't worry about whether the spec has one or the other.
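As a rough illustration of why the group considered this conversion trivial, here is a minimal Python sketch of one direction only (numbered to unnumbered). It assumes the numbered divisions are the standard TEI elements div1 through div7; the function name `unnumber_divs` and the sample document are hypothetical and are not part of the TEI Tite spec.

```python
import re
import xml.etree.ElementTree as ET

# Matches div1..div7, with or without a leading "{namespace}" prefix
# (ElementTree stores namespaced tags as "{uri}local").
NUMBERED_DIV = re.compile(r"^(\{[^}]*\})?div[1-7]$")


def unnumber_divs(xml_text: str) -> str:
    """Rename every div1..div7 element to a plain div, keeping nesting intact."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # includes the root element itself
        match = NUMBERED_DIV.match(element.tag)
        if match:
            # Preserve any namespace prefix and drop only the level number.
            element.tag = (match.group(1) or "") + "div"
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample without a namespace, for illustration only.
sample = "<body><div1><head>Part</head><div2><p>Text</p></div2></div1></body>"
print(unnumber_divs(sample))
```

Going the other way (unnumbered to numbered) would mean naming each div after its nesting depth, which is similarly mechanical as long as no text nests more than seven levels deep.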

Syd noted that anyone could customize TEI Tite using another ODD.

Laurent and Syd said that while the desired goal is to have a brokered relationship, we have to accept that there may be "leakage" where people use TEI Tite on their own for their own purposes, possibly even modifying it.

Laurent said we will need to be sure to review the spec periodically.

Marcus asked whether we envision libraries receiving a ready-to-use text from the vendor or needing to add tagging for their specific needs. He said that his projects do a lot of their own analysis that couldn't be outsourced.

Syd conceded that some projects, including the Brown University Women Writers Project, where Level-4 encoding would be considered "light", would need to add encoding. Natasha responded that the WWP is not the rule for library-based encoding projects.

Laurent encouraged the group not to block the possibility of having a brokered relationship with vendors. He said projects with special needs might still get cost savings if their initial encoding is done in TEI Tite (and additional encoding is added in house). Syd noted that there is a similar goal [to what exactly?] behind the best practices document.

Laurent suggested there be a clear relationship between the two. Syd suggested TEI Tite be made an encoding level in the best practices document.

Kevin asked everyone to review the TEI Tite spec for changes that would be needed for their projects to adopt TEI Tite for use with vendors. Andrew added that it would be good to identify use cases. Syd agreed that contextualizing is important.

Syd suggested setting a date for a conference call to discuss our reviews of TEI Tite or simply using the wiki.

Laurent asked whether we should have a call for input beyond those gathered today. Syd suggested having the current group look at TEI Tite now but later put out a wider call for input.

Natasha noted that we already have a call scheduled for Dec. 2. Kevin said this date was chosen at the end of the last conference call but probably should be announced more widely.

Laurent said we need one point of entry for information on the SIG's activities.

Natasha said this call was scheduled to continue the work on the best practices guidelines, so we shouldn't take over this meeting for discussing TEI Tite. She suggested instead announcing TEI Tite on TEI-L (since its existence isn't widely known) and asking that feedback on it be sent to Kevin and Michelle. They could compile responses for discussion during our conference call.

Syd found a page in the wiki ( http://www.tei-c.org/wiki/index.php/Review_of_Tite_Scheme ) that said that Paul, Lisa, and Cassandra would review TEI Tite. He said he would replace this with a note that everyone would review it and send comments to Kevin and Michelle.

Kevin asked everyone to review TEI Tite and send comments on the schema and its use scenarios to him by January 15. He said he would compile them and send a message to TEILIB-L to schedule a conference call to discuss these comments soon thereafter. He said participation in the call would not be required of those who submitted comments, but those participating should have read the TEI Tite specification.

Syd said he was unable to edit the wiki page due to the bad wireless connection.

Kevin asked whether we should review the functional requirements for the best practices document. Others said no.

Kevin asked for other topics of discussion.

Marcus noted that someone else had trouble figuring out how much it costs to join the TEI Consortium, and he had to find this information buried in the TEI website. He said we're making it difficult to join. Syd said the TEI Board wants to make the membership form a web form. Laurent said this would be discussed at tomorrow's Board meeting.

Laurent asked where the best practices document will live once the revision is finalized. Everyone agreed it should live on the TEI website since the DLF is no longer involved in it.

Laurent asked whether the best practices document would be expressed as an ODD. Syd said he's working on this. Kevin said we would continue to edit it in the wiki for now for convenience but that Syd would transform it into an ODD once the revision is finalized.

John suggested instead making it a TEI document with XIncludes.

Various people suggested discussing the relationship between TEI Tite and the best practices document.

Laurent suggested we wait till both TEI Tite and the best practices document are more stable before trying to integrate them.

[items missing from record]

Natasha said she doesn't understand what relationship they could have to each other since they have different goals.

[discussion omitted]

Kevin said that TEI Tite is designed to handle the majority of the needs for outsourced encoding, whereas the best practices document helps us clarify our thinking about what markup is essential and to set minimum standards for markup.

John asked whether there might be other "Tites", allowing multiple encoding levels for vendor-created encoded texts. Kevin suggested revisiting this after revising both documents, as Laurent suggested. Kevin said that we might then find an appropriate level for TEI Tite documents in the best practices guidelines and might consider whether to make other "Tites" for other encoding levels.

The meeting adjourned at _:__.