Minutes from October 29, 2015
Participants:
- Kevin Hawkins (University of North Texas)
- Stefanie Gehrke (Biblissima)
- Paul Schaffner (University of Michigan)
- Elli Mylonas (Brown University)
- Syd Bauman (Northeastern University)
- Stefan Majewski (Austrian National Library)
- Kiyonori Nagasaki (University of Tokyo)
- Antonio Rojas Castro (Universitat Pompeu Fabra)
- Nick Homenda (Indiana University Bloomington)
- Michelle Dalmau (Indiana University Bloomington)
- Julia Flanders (Northeastern University)
- Patrik Granholm (National Library of Sweden)
Mailing list: see subscription information on the SIG webpage.
Best Practices for TEI in Libraries
- set of ODD files in GitHub
- introduces concept of levels of encoding (levels 1 to 5)
- ODDs for levels 1-4 provide schemas for validation per level
- last revised in 2011, so outdated; in particular, need to account for the addition of <xenodata> (and of @style)
Kevin urged people to discuss the proposed changes in the issues at https://github.com/kshawkin/Best-Practices-for-TEI-in-Libraries/issues?
Repository Writing Permission: Syd, Michelle, Kevin
- to add: Stefanie, Paul
- Kevin: will create a team concept in GitHub
Comments on the Best Practices that arose while reviewing some GitHub issues:
- why focus so much on mass digitisation?
- there are no <elementSpec> in the ODDs
- occasional pointers from the Best Practices to related sections of the Guidelines
- level 1 [and level 2] is “not TEI conformant”
- no word-level segmentation or OCR data (coordinates and certainty values) yet (the ÖNB, for example, has rich OCR and plans to put all of its data into TEI files)
- proposal: add that to Best Practices
- Kevin to ask a contact person from eMOP (at Texas A&M) whether they already have code to convert output from Tesseract (or from Google Books, which is very similar, or from ABBYY FineReader) to TEI. (Actually, it turns out that hOCR-to-TEI conversion was developed during the DH2015 hackathon: see hOCR2TEI.)
- no section of Best Practices about TEI and RDF / Linked Data
- to be added
- Possibly contact Dawn Childress (now at UCLA) to see if she has experience going between TEI and RDF (for bibliographic data).
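As an illustration of the OCR item above (word-level segmentation with coordinates and certainty values), here is a minimal Python sketch of how hOCR word spans might be mapped to TEI. It is not the hOCR2TEI code mentioned above, and nothing here was decided at the meeting. The mapping itself is an assumption: each `ocrx_word` becomes a `<w>` pointing via @facs to a `<zone>` that carries the bounding box, and the `x_wconf` value is rescaled to a 0-1 probability in @cert. The sample hOCR, the function names, and the `z1`, `z2` zone identifiers are hypothetical, and the output is a fragment without a `<teiHeader>`.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical hOCR input, shaped like well-formed XHTML output from an OCR engine.
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="ocr_page" title="bbox 0 0 800 1200">
<span class="ocr_line" title="bbox 10 20 300 60">
<span class="ocrx_word" title="bbox 10 20 120 60; x_wconf 96">Best</span>
<span class="ocrx_word" title="bbox 130 20 300 60; x_wconf 71">Practices</span>
</span></div></body></html>"""


def parse_title(title):
    """Extract (bbox tuple or None, confidence int or None) from an hOCR title."""
    bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
    conf = re.search(r"x_wconf (\d+)", title)
    coords = tuple(int(n) for n in bbox.groups()) if bbox else None
    return coords, (int(conf.group(1)) if conf else None)


def tei(name):
    """Return a TEI element name in ElementTree's {namespace}name notation."""
    return f"{{{TEI_NS}}}{name}"


def hocr_to_tei(hocr):
    """Build a TEI fragment (no teiHeader) from the words of an hOCR string."""
    source = ET.fromstring(hocr)
    root = ET.Element(tei("TEI"))
    surface = ET.SubElement(ET.SubElement(root, tei("facsimile")), tei("surface"))
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    ab = ET.SubElement(body, tei("ab"))

    words = [el for el in source.iter() if el.get("class") == "ocrx_word"]
    for i, el in enumerate(words, start=1):
        coords, conf = parse_title(el.get("title", ""))
        w = ET.SubElement(ab, tei("w"))
        w.text = el.text
        w.tail = " "
        if coords is not None:
            # One zone per word; the word points back to it with @facs.
            zone_id = f"z{i}"
            ulx, uly, lrx, lry = (str(n) for n in coords)
            ET.SubElement(surface, tei("zone"), {
                f"{{{XML_NS}}}id": zone_id,
                "ulx": ulx, "uly": uly, "lrx": lrx, "lry": lry,
            })
            w.set("facs", f"#{zone_id}")
        if conf is not None:
            # Assumption: OCR confidence (0-100) expressed as a probability in @cert.
            w.set("cert", f"{conf / 100:.2f}")
    return root


if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    print(ET.tostring(hocr_to_tei(SAMPLE_HOCR), encoding="unicode"))
```

Keeping coordinates in `<facsimile>` rather than on the words themselves is only one option. Whether the Best Practices should recommend it, or a different encoding, is exactly the kind of question the proposal above would need to settle.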
Relation between Best Practices for TEI in Libraries and TEI Simple?
- TEI Simple is two things:
- a schema that is mostly fixed and publicly available: https://github.com/TEIC/TEI-Simple .
- an associated processing model that is still under development and has not been publicized
- we could benefit from the processing model?
- Syd points out that there is a difference between processing and transforming data to TEI
Other issues
- TEI and METS/MODS:
- Syd recently learned that syncRO Soft would be happy to add frameworks for METS and MODS to oXygen if someone else would be willing to maintain the frameworks. Syd said this is a pity now that we have access to <xenodata>.
- Elli can contact a colleague of hers to investigate. We have to find out if there are use cases.
- Support for TEI in Hydra?
- If we find out about anyone using Hydra to deliver TEI XML, we should create a wiki page.
- Michelle will forward a message to TEILIB-L about a project in Denmark that she recently learned about.
How to proceed with revision to Best Practices
- Syd will try updating the @version on the Best Practices ODDs to the current version of P5 and see what breaks.
- we will have to (re-)read TEI Simple documentation, presentations, and articles and think about how it relates to the Best Practices. Do we really want to launch a revision now that TEI Simple exists? Maybe TEI Simple can be slightly revised and then we can get rid of the Best Practices?
- Kevin will draft a message to TEILIB-L announcing the launch of the project. He’ll send it for comments to those who attended the meeting in Lyon before actually sending it to TEILIB-L. In brief:
- Creation of a workgroup of the SIG for a new version (4.0) of the Best Practices for TEI in Libraries.
- Will hold conference calls every second month for an hour at 9 a.m. Eastern Time. We’ll use Skype or Google Hangouts.
- We’ll use this email list to communicate, so we hope that you will be willing to join us and, if not, won’t mind the extra traffic.
- We’ll post agendas in advance and do work in between calls.