SIG:CMC/CLARIN-D schema draft for representing CMC in TEI (2015)

About this schema
ODD file: [ADD RESOURCE]

This ODD describes an encoding schema for genres of computer-mediated communication (CMC) and social media. It is meant as a contribution to the work and discussions of the special interest group “Computer-mediated communication” (CMC-SIG) of the Text Encoding Initiative (TEI). The schema has been developed in the context of the CLARIN-D curation project “ChatCorpus2CLARIN”.

Authors: Michael Beißwenger, Eric Ehrhardt, Axel Herold, Harald Lüngen and Angelika Storrer.

The schema is based on version P5 (2.9.0) of the TEI Guidelines for Electronic Text Encoding and Interchange (henceforth: ‘TEI-P5’). It uses customizations to adapt the models defined in TEI-P5 to the structural and linguistic peculiarities of CMC genres. The schema takes into account previous schema drafts developed by members of the SIG (the ‘DeRiK schema’ described in Beißwenger et al. 2012 and the ‘CoMeRe schema’ described in Chanier et al. 2014), as well as feedback on and discussion of these drafts at the TEI conferences in 2011 and 2013 and at workshops held in the context of the DFG scientific network Empirikom.

Characteristics of the schema (TEI customizations)
The schema uses four types of customizations:
 * 1) The content models of three elements from TEI-P5 have been modified (<s>, <p>, ) to include the model class model.floatP.cmc (see below).
 * 2) The three new elements, , and  have been introduced.
 * 3) Two attribute classes have been modified to introduce the CMC-specific new attribute auto, and to allow the existing attribute who to appear on all elements: att.ascribed, att.global
 * 4) Two classes have been introduced to combine the new, CMC-specific elements (model.divPart.cmc), or to combine existing TEI-P5 elements for less restricted usage in CMC documents (model.floatP.cmc):,  , ,.
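
As a rough illustration of customization types 3 and 4, the following ODD fragment is a minimal sketch, not the schema's actual source. It assumes that the new attribute auto is added to att.ascribed and that its value is a truth value; neither detail is stated above. The schemaSpec identifier tei_cmc_sketch and the descriptions are hypothetical, and the datatype is written with the rng:ref convention used by TEI-P5 releases before 3.0.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:rng="http://relaxng.org/ns/structure/1.0"
            ident="tei_cmc_sketch" start="TEI">
  <!-- Modules the customization builds on -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>

  <!-- Type 3: modify an existing attribute class to add the new attribute.
       ASSUMPTION: auto is placed in att.ascribed and holds a truth value. -->
  <classSpec ident="att.ascribed" type="atts" module="tei" mode="change">
    <attList>
      <attDef ident="auto" mode="add" usage="opt">
        <desc>(hypothetical description) marks content that was generated
          automatically by the system rather than typed by a user.</desc>
        <datatype>
          <rng:ref name="data.truthValue"/>
        </datatype>
      </attDef>
    </attList>
  </classSpec>

  <!-- Type 4: declare a new model class. Elements join it through a
       memberOf declaration inside the classes part of their own elementSpec. -->
  <classSpec ident="model.floatP.cmc" type="model" mode="add">
    <desc>(hypothetical description) groups existing TEI-P5 elements that
      may be used with fewer restrictions in CMC documents.</desc>
  </classSpec>
</schemaSpec>
```

Changing a content model (type 1) would additionally require an elementSpec in mode="change" that restates the element's full content with a reference to the new class, which is omitted here for brevity.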

In addition to these customizations, we have defined best practices for using the TEI-P5 models of ,, , , <div>, , and others for annotating CMC phenomena.

Status of the schema
This schema should be regarded as a draft and as a basis for further discussion. A rationale for the models included in the schema will be given as part of the panel “TEI across corpora, languages and genres: Towards a standard for the representation of social media and computer-mediated communication” at the TEI Conference and Members Meeting 2015 in Lyon. We look forward to feedback and further suggestions at the conference, via the SIG space in the TEI wiki, and/or via the SIG’s mailing list tei-cmc@googlegroups.com.

References
Description of the DeRiK schema: Michael Beißwenger, Maria Ermakova, Alexander Geyken, Lothar Lemnitzer, and Angelika Storrer (2012). « A TEI Schema for the Representation of Computer-mediated Communication ». Journal of the Text Encoding Initiative, Issue 3. DOI: 10.4000/jtei.476

Description of the CoMeRe project: Chanier, T., Poudat, C., Sagot, B., Antoniadis, G., Wigham, C. R., Hriba, L., Longhi, J. & Seddah, D. (2014). « The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres ». Special issue on « Building and Annotating Corpora of Computer-Mediated Discourse: Issues and Challenges at the Interface of Corpus and Computational Linguistics ». JLCL (Journal of Language Technology and Computational Linguistics), pp. 1-31.