SIG:CMC/CMC-core schema for representing CMC in TEI (2019)
This page is part of the wiki space of the TEI-SIG “Computer-mediated communication”.
ODD file, RNG schema and encoding examples
ZIP archive (ODD and RNG files):
- cmc-core.zip (September 24, 2019)
HTML versions of the ODD for online browsing:
- ODD, short (new & modified models only): cmc-core_short.html
- ODD, complete: cmc-core.html
Manually annotated sample XML files:
- mocoda1.tei (WhatsApp chat excerpt, with TEI markup for e.g. emojis and spoken posts, one of them with vocal events in the background)
- mocoda2.tei (WhatsApp chat excerpt, with TEI markup for e.g. emojis and posts with graphics)
- tweets.tei (sequence of tweets, with TEI markup for e.g. footers, graphics, retweets, hashtags, mentions, and Twitter IDs)
- Talk_Astronomical_object-Wikipedia.tei (sequence of discussion posts from a Wikipedia talk page, with TEI markup for e.g. indentation, signatures, and timestamps)
- 2213001_Internet-Relay-Chat_Netzwerk3_Kanal1_2005-07-14.tei (IRC chat log file, with TEI markup for e.g. log file structure, tokenisation, lemmatisation, POS tags, named entities, and anonymisation)
- 2ndlife.tei (multimodal interaction in Second Life, with TEI markup for e.g. spoken utterances, written posts, vocal events, kinesic events, and pauses)
Extension of CMC-core for the annotation of Wikipedia talk pages:
A cmc-wikitalk schema was produced by ODD chaining with CMC-core and was used to encode the Wikipedia talk page example.
About this schema
This ODD describes an encoding schema for genres of computer-mediated communication (CMC) and social media. It is meant to define the basic setup needed to encode CMC corpora that is not yet part of the TEI.
Authors: Michael Beißwenger, Harald Lüngen, Laura Herzberg, and Ciara R. Wigham.
The schema is based on version P5 (3.3.0) of the TEI Guidelines for Electronic Text Encoding and Interchange (henceforth 'TEI-P5') and uses customizations to adapt the models defined in TEI-P5 to the structural and linguistic peculiarities of CMC genres. It takes into account previous schema drafts developed by members of the SIG (the 'DeRiK schema' described in Beißwenger et al. 2012, the 'CoMeRe schema' described in Chanier et al. 2014, and the 'CLARIN-D schema' described in Lüngen et al. 2016), as well as discussions of a core schema at the SIG meeting at the TEI conference in Vienna on 30 September 2016 and at a meeting of the SIG core group in Essen on 21 June 2017.
Status of the schema
This schema and the examples are to accompany a TEI feature request to be submitted to the TEI Council by the end of 2019. A rationale for the models included in the schema is given in the ODD and in the articles listed under References.
Characteristics of the schema: TEI customizations
The ODD introduces four types of customizations:
- A new module named 'cmc' is introduced. It is referenced by the new model class model.divPart.cmc, by the new attribute class att.global.cmc, and by the new element <post>.
- The new element <post> is introduced.
- The new attribute class att.global.cmc is introduced. It defines the new global attribute @creation. The existing attribute class att.global has been modified to additionally be a member of att.global.cmc, so that @creation is available on every element.
- The class model.divPart.cmc is defined. model.divPart.cmc is a member of model.divPart and serves as the container class for the new, CMC-specific element <post>.
In addition to these customizations, we have prepared encoding examples for the genres chat, wiki talk, Second Life, and Twitter, based on existing corpora.
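The following minimal sketch, written in Python with the standard-library ElementTree module, illustrates how the customizations fit together. Two <post> elements sit directly inside a <div>, which the schema permits because model.divPart.cmc is a member of model.divPart, and each post carries @creation, which att.global.cmc makes available globally. The helper name build_chat_div, the @type value "chat", the speaker references in @who and the @creation value "human" are illustrative assumptions, not taken from the ODD; consult the ODD for the defined attribute values.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements in the default namespace (no "ns0:" prefixes).
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_chat_div(posts):
    """Hypothetical helper: build a <div> with one <post> per (id, who, text).

    <post> is placed directly inside <div>, as allowed by the membership of
    model.divPart.cmc in model.divPart.
    """
    div = ET.Element(tei("div"), {"type": "chat"})  # assumption: @type value
    for post_id, who, text in posts:
        post = ET.SubElement(div, tei("post"), {
            f"{{{XML_NS}}}id": post_id,
            "who": who,           # assumption: speaker reference via @who
            "creation": "human",  # assumption: illustrative @creation value
        })
        p = ET.SubElement(post, tei("p"))
        p.text = text
    return div


div = build_chat_div([
    ("p1", "#A", "Hi, anyone here?"),
    ("p2", "#B", "Yes :)"),
])
print(ET.tostring(div, encoding="unicode"))
```

The sketch only builds and prints the tree; real documents should be validated against the RNG schema in cmc-core.zip.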
References
- Michael Beißwenger, Harald Lüngen (2019, accepted): CMC-core: a schema for the representation of CMC corpora in TEI (preprint). To appear in a special issue of the journal Corpus.
- Michael Beißwenger, Laura Herzberg, Harald Lüngen, Ciara R. Wigham (2019): CMC-core: A basic schema for encoding CMC corpora in TEI. Conference poster, 7th Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora2019), University Cergy-Pontoise, Paris, France, September 9, 2019. (Conference poster as PDF; abstract in the conference proceedings.)