TEI for Linguists - minutes - 13nov10

SIG “TEI for Linguists” -- meeting 1, minutes

Zadar, 13 Nov 2010, 9 a.m. - 10.30 a.m.

Presentation of the new SIG ‘TEI for linguists’

 * Linguists have been involved in the TEI from the very beginning (cf. the ACL and the ALLC, two of its original sponsoring associations); Don Walker, one of the co-founders, was a computational linguist.
 * A new generation of linguists is increasingly interested in using the TEI in their work. However, use of the Guidelines is not widespread enough: many linguists (both computational and more traditional) neither use the TEI nor know much about it:
 * feature structures are widely used, but their TEI-conformant representation is not;
 * the TEI is also rarely used for the transcription of speech.
 * The SIG should consider appropriate PR methods in order to spread the word about the TEI to people who do not usually attend DH and TEI-related conferences. Attentively following the general TEI list is essential, in order to catch problems of a linguistic nature and invite people to join the SIG (or at least make sure that they are aware of its existence). People from outside the TEI community who would choose not to subscribe to TEI-L may well decide to subscribe to a low-volume, focused LingSIG list.

Recruiting members

 * Attendees are invited to join the newly created SIG mailing list to discuss relevant topics and follow SIG developments.
 * Introduction of participants: ca. 27 people

Action plan

 * Proposals for a short name: Linguistics, LingSIG, LinguistSIG
 * Note: at the meeting, we agreed that the SIG already functions under the alias "Linguistics SIG"; since the meeting, the short form "LingSIG" has been getting established (on a par with "MS-SIG", etc.).
 * Following the experience of the MS-SIG, a survey of language resources (LR) is proposed:
 * a page on the wiki will contain a bibliography of articles on relevant topics, to which anyone can submit a title (we should aim at a reasonably comprehensive bibliography page);
 * an email will be sent to TEI-L to invite participation;
 * Categorization of the field of linguistics: divide the field into categories (endangered languages, typology, transcription, dictionaries, but also ‘GIS in linguistics’) and fill each of them with the resources we know.
 * Milestones should be set and priorities listed with approximate, realistic deadlines (by the next meeting, next year, the next Guidelines release, etc.). What are we going to achieve by the next meeting?
 * Preparing documents:
 * Actual chapter vs. virtual chapter: a virtual chapter is not part of the table of contents of the Guidelines but is created alongside it; it refers to other existing chapters and recommends how existing resources can be reused and customized for linguistic needs. The idea is to compile such material, make it available, see how people react, and then turn it into a real chapter.
 * Instead of a virtual chapter, we could also prepare Virtual Guidelines, i.e. a version of the Guidelines from which material not relevant to linguistics is filtered out, with some additions reflecting the linguistic point of view.
 * Idea: we should have a thriving virtual chapter by the next meeting, at which point we can decide whether there is a real need for an actual chapter or an ODD.
 * We need arguments for why linguists should use the TEI. Reusability is an important issue for funders, and we need to increase the usability of LR. For example, lexicographers don't need the TEI in their everyday work, since they may work with various databases, but they do need it to preserve their work. The well-documented semantics, the documentation style (ODD) and the public accessibility of the documentation are further strengths that should be capitalized on. (As Damir Ćavar put it: we want to make people want to use the TEI for their linguistic/LR projects; that may turn out to be quite a task.)
 * We also need to be aware of the limitations of the TEI and know where to stop and hand over some tasks to other communities (thanks to Øyvind Eide for this observation).

Interfacing other SIGs

 * Tools – check how compatible (potential) linguistic tools are with the TEI (e.g. the eXist database was mentioned as particularly TEI-friendly); more tools are needed. Co-operate with the Tools SIG on identifying and possibly adapting tools for use within the scope of the LingSIG.
 * Music – how the Music SIG handles the annotation of binary files, compared with speech corpora.
 * Ontologies – grammatical and lexical ontologies, participant lists in spoken discourse: some methods may be similar; cf. Sebastian Rahtz’s methods for converting TEI XML to RDF/XML.
 * The Overlap SIG has done its job and should not be expected to be active, but individual members of that SIG may be willing to co-operate.

External demands on the TEI

 * To simplify ("follow the way of the CES", the Corpus Encoding Standard) or to be more detailed with respect to some issues? Probably both.
 * General recommendations or project specific specification?
 * We don't want to have to follow links through the Guidelines; we may want something more user-friendly.

We use markup to annotate resources; annotation standards are also being developed by ISO TC 37/SC 4, and we want to co-operate with the researchers involved there. ISO work focuses on very specific levels of annotation, and the community is not exactly the same as ours: there are overlaps, but not many. ISO standards inspire strong confidence in the computational linguistics community, e.g. for morpho-syntactic annotation. There is a signed agreement between ISO and the TEI, so the two can exchange data freely.

More topics will be discussed on the mailing list, which everyone is invited to join.

The conveners are grateful to Beata Wójtowicz for taking the minutes of the meeting and to Damir Ćavar for making it possible for one of us to participate in the meeting remotely, via Skype.