Minutes from November 3, 2007

TEI-C Annual Meeting, Saturday, November 3 (Day 3): SIG TEI in Libraries

Libraries SIG will convene Saturday, November 3 from 10:00 am - 12:30 pm
 * Greetings and round-table introductions
 * Discussion
 * Impact of mass digitization on text encoding
 * Quality control scripts/tools (beyond Schematron)
 * Mapping the TEI Header; pointing to other metadata schemes
 * Outsourced vs. in-house encoding (strategies, vendor recommendations, etc.)
 * The TEI Tite: A standard for off-site text encoding
 * Serials encoding: General issues and issues with granular metadata
 * Migrating to P5
 * Issues to address? Need for working groups? Next steps for the group

Notes taken by: Lisa McAulay, UCLA
Updated by: Michelle Dalmau, IU (convener)

In Attendance
 * Michelle Dalmau - manages electronic text projects in DLP, Indiana University, SIG convener
 * Natasha Smith - been with TF in Libraries since its inception
 * Daniel Pitti - interested in TEI and Libraries, although not really involved in projects that pertain exactly right now
 * Katherine Walter, U Nebraska-Lincoln, co-directs research center for the humanities, National Digital Newspaper Program for Nebraska, Nebraska public documents, does scholarly and library projects
 * Morfudd Jones, National Library of Wales, dictionary work for Cambrian (Welsh journal), Welsh journals online
 * Chris Ruotolo, UVA Library, we have big stores of legacy content in different flavors of TEI, migrating to an IR
 * Syd Bauman, Women Writers Project (not library-associated project)
 * Cassandra Stokes, Washington University (St. Louis), in the Library, 3 text-based projects, new member
 * Paul Schaffner, UMich, basically P3
 * Cronan O Doibhlin, University College, Cork, Ireland, Special Collections -- CELT project (10 years old)
 * Elizabeth McAulay, UCLA
 * Pat Yott, Brown University
 * Grace Wiersma, MARC Database Quality Technician, MIT Libraries
 * John Unsworth, Dean of GSLIS, UIUC

Action Items


 * Test/Map the Tite Scheme: Paul Schaffner (Mich), Elizabeth McAulay (UCLA) and Cassandra Stokes (Wash U)
 * Investigate RLG terms of agreement with Apex re: EAD encoding: Daniel Pitti will contact Merrilee Proffitt for details
 * Flag issues of non-compliance with TEI Text Encoding in Libraries Guidelines for Best Encoding Practices: Syd Bauman (Brown)
 * Discuss possible funding opportunities with Mellon: John Unsworth (UIUC)
 * Update DLF; Serve as bridge between DLF TEI Task Force and Libraries SIG: Michelle Dalmau (IU)

Summary of Discussion


 * Evaluate current outsourcing practices of TEI-C libraries in light of the Tite Scheme to determine if the TEI-C can secure agreements with reputable vendors for lower-cost conversion and encoding of aggregated content.


 * Evaluate how the library community understands the Tite scheme and how it would impact their current outsourcing practices and workflow; determine if additional documentation is necessary; develop documentation. Currently, some from the SIG (Paul Schaffner, Cassandra Stokes and Elizabeth McAulay) volunteered to informally evaluate and put Tite to the test by mapping their own schemas to Tite or encoding from scratch.


 * Revise/update the TEI Text Encoding in Libraries Guidelines for Best Practices in light of Tite. Editorial work is also needed in an attempt to make the guidelines conformant and perhaps to address additional levels of encoding.  It is feasible for the SIG to take on this work especially since Syd will flag the problem areas.  Translations of the Guidelines were also recommended.


 * Develop a strategy for promoting Tite to the library communities in the US and abroad. This was seen as critical.


 * Generate XSLT for conformant P4 and P5 from TEI Tite, which includes thorough documentation and testing (for a complete bundle with Tite). The general consensus is that developing the style sheets won't be as time consuming as testing and proper documentation.  Translating the documentation was also recommended.


 * Build bridge between DLF TEI Task Force and TEI-C Libraries SIG

Discussion note: Though the minutes reflect a transcript format, the minutes are not a literal or even complete transcription of the event.

DP: The status of the TEI Tite
SB: Are we going to put resources into it? Are we just going to throw it out there?
DP: Let's look at the politics of it. I believe DLF was the one behind pushing for this.
SB: John Unsworth was in search of services, and he said to the Board that we need to offer more member services. He proposed a specification for initial capture by a vendor. We might make some progress here. I think the plan was to have the TEI-C serve as an aggregator of small projects. Small liberal arts colleges in particular could take advantage of it.
DP: There's also an intellectual argument behind it. TEI Lite was a little too rich for library purposes. Large-scale projects in libraries tended to go to smaller tagsets, and their own flavors limited interoperability. If we can get consensus among libraries to use one single constrained TEI flavor, then we can start doing implementations together.
SB: How it got to be where it is: the DLF and the TEI are interested in this project. The DLF has money and the TEI doesn't. The DLF provided the support for meetings.
NS: Overview of the history of DLF and the TEI in Libraries Workgroup. David Seaman provided money.
DP: Now that David Seaman has left, is there still support from the DLF for TEI in Libraries?
SB: Perry Trolard (GA that John Unsworth found) did a specification. I was hoping the SIG would look at this specification and be critical. It's written as an ODD. The Council saw that it was an ODD and said we have to bring it into P5. They tried to make it conformant with TEI P5. It can never be conformant, though, because it's supposed to save keystrokes, and TEI P5 requires namespaces, which doesn't save keystrokes.
DP: Somebody needs to step up and do some communication. One of the things that needs to be done is to organize the library community to look at this and evaluate it.
SB: I'm not really interested in details, but I'm interested in the "show-stoppers".
DP: When you look at it, is it worth conforming to the spec to get the benefits?
MD: Has it been tried?
DP: !ACTION! Paul can take three instances and test them.
SB: Some discussion of whether there's a problem -- what tagging differences would make you sad. It would be good to have 2 XSLTs that would transform files back from vendor into the Has anyone talked to a vendor?
DP: At least to Apex, yes.
NS: John U. was talking about sending out a survey to find out who would like to use it.
MD: DLF is interested in this and would like to fund a person to do some.
SB: SIGs cannot ask for external money without talking to the Board of Directors. I can't imagine saying no.
SB: The Tite project doesn't need much work. I want to know if the spec
DP: The three key institutions that need to look at it are the 3 institutions whose guidelines were used as the template.
CR: I don't think UVA will ever use it.
DP: I think it's important to see if CDL likes it. (and the two others -- including )
DP: You have to look at this as a library communication problem.
PS: I am going to look into the spec right away.
SB: I notice no one jumping up to do the XSLT. If it's bundled as a package that would be great.
DP: What would it cost to write the XSLTs?
SB: Chris R. and Pat Y. have just volunteered to do it.
DP: Let's get some funding to do this and they're going to have a deliverable.
SB: It's a scheme with instructions. Documentation for the vendor (not as long as UMich and CDL instructions).
DP: Before you do the XSLT, we should do a crosswalk. This could be really informative for writing an XSLT. And you find out right away whether you have a showstopper.
'''!ACTION! -- Cassandra at Washington U will look at it, as well as Lisa (UCLA) and Paul (UMich)'''
MD: How would it work to have the TEI-C (we're going to add a new host, and each host has to contribute a lot of work)? How would they work as an aggregator?
DISCUSSION: How is it going to be done -- how do you aggregate? Who coordinates? It's hard to manage vendors even within your own institution. RLG negotiated a blanket rate for the EAD (with Apex). Then all institutions got a price break if they were using that.
MD: Summary of TEI Tite Spec Discussion
DP: Mapping from the TEI Spec; taking it out to libraries, ALA presentation, raise awareness, exposing it; the IFLA agenda, do we make paper promotion; develop a strategy for promoting it; LITA, PARS, ALCTS (and we'll brainstorm more). We'll work together via email to create a pitch.
MD: I'll serve as the bridge between the SIG and the DLF. DLF would like to migrate the "TEI in Libraries" guidelines out of the DLF website since they aren't in the business of maintaining standards. The guidelines need to be revised in light of TEI conformance, P5 and Tite.
SB: The TEI-C isn't behind this document because it is not conformant with TEI. Should the levels be kept the way they are?
DP: This should get vetted. From a TEI point of view, they don't have to sanction something.
PY: Can we place it on the Library of Congress website the way the METS or MODS standard is?
MD: If we start to amass documentation, we'll need a reliable, visible space.
SB: The TEI-C should not try to monitor something that's on the wiki of the SIG. However, the more you publicize something the more TEI-C will want to look at it.
MD: We can look into moving the "TEI in Libraries" document into Library of Congress space. Hopefully, we can start to make some documentation.
SB: Is the SIG wiki the permanent home of documentation?
MD: No, that's a good revision space.
JU: We have two audiences: scholars and librarians. In other conversations, there have been comments about giving an introduction to newer people. "Are you here for the first time?"
DISCUSSION: Can we offer documentation on the TEI site? Yes. The problem is more that the "TEI in Libraries" promotes naughty usage! John said he'd help.
JU: I think there's room for more educational materials and the group is not opposed to that.
MD: Let's bundle that into the proposal.
SB: It might be nice if Tite had different levels.
JU: No, that's losing sight of the purpose of the Tite project.
NS: These levels were also tied to specific projects. At the last meeting, we had trouble finding somebody using Level 3, because it was generated completely out of LC practice.
DISCUSSION: What level the TEI Tite spec is at.
JU: Let's try to request Board approval tomorrow.
SB: I don't know where the guidelines should live. And we don't have to decide that now.
NS: What's wrong with putting them on the TEI site?
SB: They violate the TEI guidelines and that's not good with the Board.
 * we'll have the three volunteers take a look at TEI Tite and give feedback
 * start talking to DLF / TEI Board about funding opportunities for the survey

DISCUSSION: How are the guidelines not kosher? Syd will review the document and flag instances of non-conformance.

NS: There are lots of places to put it: Activities, Communities, Guidelines.
MD: We should start by finding out what other documentation is being generated.
SB: Let's pass it off to Chris and Julia, to find a good place for them.
JU: In the education SIG downstairs they were talking about using YouTube.
MD: !ACTION! -- you will talk to the Board about the TEI Libraries SIG putting forth a request for funding to DLF for the TEI Tite Spec (survey, marketing) and also the "TEI in Libraries"
MD: We talked about scope (how many pages?)
SB: What does aggregating mean? How much work does it take?
JU: I think we'd have to review what people are sending us. The economy of scale is related to sending them like things. We need to do some grouping.
SB: How would it work?
JU: So how does it work with the EAD spec?
DP: The idea is that you get a sufficient number of the members to commit to submitting a number of pages.
MD: This is the purpose of the survey. Is there any documentation about this?
DP: I can ask Merrilee Proffitt how this was done. Essentially, RLG negotiated a contract with Apex.

DISCUSSION: Types of questions to ask in the survey (surveying outsourcing practices: quantity, types of materials, levels of encoding, etc.). End result: if members of the consortium submit x million pages to vendors that adhere to Tite, then members can collectively benefit from the pre-negotiated discount. Need to establish the benefit of Tite and then market heavily.