Mission statement

The internet and social media have given rise to a broad range of new communicative genres that are subsumed under the term computer-mediated communication (CMC) – genres such as chats, forums, text messaging (SMS, WhatsApp), interaction on wiki talk pages and in blog comments, via Twitter, on social network sites, and in multimodal 3D environments. A TEI standard for representing these genres and their structural and linguistic peculiarities is a desideratum in both the digital humanities and computer science. Such a standard would foster interoperability between language resources and facilitate the analysis and automatic exploitation of such resources in several respects:

It would allow scholars to build interoperable CMC corpora for different languages and thus enhance the empirical basis for CMC research across languages and cultures.
It would allow scholars to build CMC resources that are interoperable with text and speech corpora already represented in TEI, and thus pave the way for corpus-based research on language use across different types of corpora (i.e. comparative analysis of language use in CMC, in edited text and in spoken language).
By including models for the description of non-verbal as well as verbal acts, it would allow scholars to describe and analyse CMC across different modalities.

The TEI special interest group (SIG) "computer-mediated communication" develops and discusses suggestions for adapting the TEI Guidelines for the representation of genres of computer-mediated communication. The group's work focuses on (but is not limited to) tasks such as:

modeling user contributions (posts) to written CMC interactions (which share features both with written and spoken language) as well as the interplay of written posts, spoken utterances and non-verbal acts in multimodal CMC environments;
modeling CMC document structures ("CMC macrostructures" – e.g., forum threads, wiki talk pages, chat logfiles, Twitter timelines);
annotating linguistic features within user posts ("CMC microstructures" – elements such as emoticons, addressing terms, hashtags; quotes from prior posts; etc.);
representing linked data and media objects connected with/embedded in CMC discourse;
defining metadata schemas for the description of CMC resources;
developing perspectives for the representation of discourse in multimodal CMC environments, in which the participants in one interaction space combine written, spoken and non-verbal modes.

CMC module and chapter in the TEI Guidelines ("CMC-TEI")

As of 2024, a dedicated TEI CMC module and an accompanying chapter are part of the TEI Guidelines (see Chapter 9 of the Guidelines). Both were co-written by members of the SIG and members of the TEI Technical Council. The module is intended as a new and stable standard and should be used in new projects. It supersedes all previous schemas and customisations.

CMC-TEI Encoding examples

See the GitHub repository at https://github.com/TEI-CMC-SIG

Stylesheet cmccore2cmctei.xsl

An XSLT stylesheet for converting TEI documents that use the cmc-core customisation into TEI documents that use the newer TEI CMC module and follow its good practices.

See the GitHub repository at https://github.com/TEI-CMC-SIG

Documentation of recent SIG activities

[2025] Virtual SIG Meeting

Restart of the SIG on 19th September 2025 during a slot at the TEI Conference 2025: Meeting Memo

[2024] Inclusion of CMC module and chapter in the TEI Guidelines

Based on the 2019 feature request by the SIG, a complete module and chapter on CMC encoding in TEI were written in collaboration with a subgroup of the TEI Council and added to the Guidelines.

[2019] Feature request

A feature request based on cmc-core was submitted by the SIG to the TEI GitHub repository.

[2019] Creation of cmc-core

Finalization of the cmc-core schema and of a scientific paper giving the rationale for the basic models included and for the architecture for representing CMC corpora in TEI (Beißwenger/Lüngen 2019). Presentation and discussion of cmc-core at the 7th Conference on CMC and Social Media Corpora in Paris (September 2019). Preparation of a feature request based on cmc-core (to be submitted by Christmas 2019).

Documentation of past schema drafts from the SIG 2012-2019

There are several TEI schemas that have been developed and discussed in the context of the work of the SIG. When designing new drafts, the models specified in previous drafts as well as the discussions within the group have been taken into consideration. The drafts are listed below in reverse chronological order:

"CMC-core schema" (2019): TEI Schema and ODD from the TEI CMC SIG

Context: The ODD describes the core of the previous schemas, i.e. the basic setup needed to encode a CMC corpus that was not yet part of the TEI at the time. It was developed by four core members of the TEI CMC SIG.

Authors: Michael Beißwenger, Laura Herzberg, Harald Lüngen, Ciara Wigham.

Main characteristics compared to previous customisations (CLARIN-D, CoMeRe, DeRiK): further reduction in line with the "reduce to the max" principle:

Removal of the <prod> element; we came to the conclusion that its functions are well expressed by the available TEI elements <kinesic> and <incident>.
Removal of elements and redefinitions that were needed only for wiki talk corpora; these should not be part of the core and were moved into a sub-schema.
Renaming of the former attribute @auto to @creation and redefinition of its value set.
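
The attribute renaming in the last item can be pictured as a small migration step over existing corpus files. The following is only a minimal sketch in Python, not the SIG's own tooling: the function name, the sample document and, in particular, the value mapping are hypothetical and chosen for illustration; the actual value set of @creation is defined in the cmc-core ODD.

```python
import xml.etree.ElementTree as ET


def rename_auto_to_creation(root, value_map):
    """Rename @auto to @creation on every element below (and including) root.

    Values are translated through value_map; a value that is missing from
    value_map is copied unchanged, so that nothing is dropped silently.
    Returns the number of elements that were changed.
    """
    changed = 0
    for elem in root.iter():
        if "auto" in elem.attrib:
            old_value = elem.attrib.pop("auto")
            elem.set("creation", value_map.get(old_value, old_value))
            changed += 1
    return changed


# HYPOTHETICAL mapping, for illustration only. It is not taken from the
# cmc-core ODD; look up the real value set of @creation there.
EXAMPLE_MAP = {"true": "system", "false": "human"}

# Invented sample data: three posts, two of which carry the old attribute.
doc = ET.fromstring(
    '<div>'
    '<post auto="true">A joins the chat</post>'
    '<post auto="false">hi</post>'
    '<post>hello</post>'
    '</div>'
)

n = rename_auto_to_creation(doc, EXAMPLE_MAP)
print(n, [post.attrib for post in doc])
```

Passing the mapping in as a parameter keeps the renaming logic separate from the value set, which is the part that has to be checked against the schema documentation.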

ODD / documentation of the schema: see detail page: SIG:CMC/CMC-core schema for representing CMC in TEI (2019)

Articles describing CMC-core and its use:

Michael Beißwenger, Harald Lüngen (2019, submitted): CMC-core: a schema for the representation of CMC corpora in TEI. Journal article, preview version on request.

Michael Beißwenger, Laura Herzberg, Harald Lüngen, Ciara R. Wigham: CMC-core: A basic schema for encoding CMC corpora in TEI. Conference poster, 7th Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora2019). University Cergy-Pontoise, Paris, France, September 9, 2019. (conference poster as pdf | abstract in conference proceedings)

"CLARIN-D schema" (2015): TEI Schema and ODD from the CLARIN-D curation project ChatCorpus2CLARIN

Project context: The schema has been developed and tested with data from several CMC genres (chats, tweets, WhatsApp, Wikipedia talk pages, ...) as part of the work of the German CLARIN-D curation project ChatCorpus2CLARIN.

Authors: Michael Beißwenger, Eric Ehrhardt, Axel Herold, Harald Lüngen, Angelika Storrer.

Main characteristics compared to previous schema drafts (CoMeRe, DeRiK):

Reduction of new elements by re-modeling some CMC-specific concepts from the previous schemas with "standard" TEI. Guiding principle: "reduce to the max", i.e. new models are introduced and existing models modified only for concepts that are needed in any case; for everything else, best practices are defined for the use of existing models in TEI P5.
Definition of an interface to part-of-speech annotations (using <w> and <phr>)

ODD / documentation of the schema: see detail page: SIG:CMC/CLARIN-D schema draft for representing CMC in TEI (2015)

Presentations of the CLARIN-D schema: The schema was discussed in two panels at the following conferences:

TEI across corpora, languages and genres: Towards a standard for the representation of social media and computer-mediated communication. Panel at the Annual Conference and Members Meeting of the Text Encoding Initiative 2015: "Connect, Animate, Innovate", Université Lumière, Lyon 2 (F), 29 October 2015 (organized by Michael Beißwenger & Thierry Chanier).
Towards an encoding standard for social media and CMC: Experiences from German and French corpus projects using TEI. Panel at the International Research Days: Social Media and CMC Corpora for the eHumanities, Université Rennes 2, Rennes (F), 23-24 October 2015 (organized by Michael Beißwenger & Thierry Chanier).

"CoMeRe schema" (2014): TEI schema and ODD from the CoMeRe network

Project context: The schema has been developed in the context of the French network CoMeRe (Communication médiée par les réseaux) and used for annotation of several corpora of French CMC (SMS, tweets, chat, weblogs, multimodal CMC, ...).

Authors: Thierry Chanier, Céline Poudat, Benoit Sagot, Georges Antoniadis, Ciara R. Wigham, Linda Hriba, Julien Longhi, Djamé Seddah.

Main characteristics compared to the previous schema draft (DeRiK):

Introduction of an element <prod> for the representation of non-verbal acts
(Re-)definition of <post>, <prod> and <u> as models that may be combined within one interaction (implementing one main result of the 2013 SIG meeting in Rome); this makes the schema suitable for multimodal CMC.
Inclusion of a metadata schema for CMC.

ODD / documentation of the schema: see detail pages:

Article describing the CoMeRe schema and its use:

Thierry Chanier, Céline Poudat, Benoit Sagot, Georges Antoniadis, Ciara Wigham, Linda Hriba, Julien Longhi, Djamé Seddah (2014): The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres. In: Beißwenger, Michael; Oostdijk, Nelleke; Storrer, Angelika; van den Heuvel, Henk (Eds., 2014): Building and Annotating Corpora of Computer-Mediated Communication: Issues and Challenges at the Interface of Corpus and Computational Linguistics. Special Issue, Journal of Language Technology and Computational Linguistics (JLCL 2/2014).

"DeRiK schema" (2012): TEI schema and ODD for CMC from the DeRiK project

Project context: The schema has been developed and tested with data from several CMC genres as part of the preliminary work for the project "Building a reference corpus of German CMC" (DeRiK). It is also one of the results of the DFG scientific network Empirikom. After its publication in jTEI (Beißwenger et al. 2012), the schema was further tested with the Wikipedia talk pages corpus in DeReKo (Margaretha & Lüngen 2014).

Authors: Michael Beißwenger, Maria Ermakova, Alexander Geyken, Lothar Lemnitzer, Angelika Storrer.

Main characteristics:

Introduction of an element model <post> for written user contributions to CMC interactions which combines features of text divisions and spoken utterances.
Adaptation of the existing element model <div> for the representation of CMC threads and logfiles.
Models for CMC-specific phenomena below the <post> level.
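
The first two characteristics above can be illustrated with a minimal sketch in Python that builds a thread as a <div> containing one <post> per user contribution. This is an illustration under stated assumptions rather than a rendering of the DeRiK ODD: the attribute names (@type, @who, @when, @xml:id), the use of <p> inside <post>, the participant identifiers and the post texts are all assumed for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements without a prefix (TEI as the default namespace).
ET.register_namespace("", TEI_NS)


def build_thread(posts):
    """Return a TEI <div type="thread"> with one <post> per input tuple.

    Each tuple is (post_id, who, when, text). The attribute names used here
    are assumptions made for this sketch.
    """
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "thread"})
    for post_id, who, when, text in posts:
        post = ET.SubElement(
            div,
            f"{{{TEI_NS}}}post",
            {f"{{{XML_NS}}}id": post_id, "who": who, "when": when},
        )
        # The post body is wrapped in a paragraph, as in a text division.
        paragraph = ET.SubElement(post, f"{{{TEI_NS}}}p")
        paragraph.text = text
    return div


# Invented sample data: two posts by two hypothetical participants.
thread = build_thread([
    ("p1", "#A01", "2012-03-01T10:15:00", "Has anyone tried the new schema?"),
    ("p2", "#A02", "2012-03-01T10:17:30", "Yes, it worked for our chat logs."),
])

print(ET.tostring(thread, encoding="unicode"))
```

The sketch reflects the hybrid nature of <post> described above: like a spoken utterance it is attributed to a participant and a point in time, and like a text division it contains paragraph-level content.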

ODD / documentation of the schema: see detail page: SIG:CMC/DeRiK schema draft for representing CMC in TEI (2012)

Articles describing the DeRiK schema and its use:

Beißwenger, Michael; Ermakova, Maria; Geyken, Alexander; Lemnitzer, Lothar; Storrer, Angelika (2012): A TEI Schema for the Representation of Computer-mediated Communication. In: Journal of the Text Encoding Initiative (jTEI), Issue 3 | November 2012 (DOI: 10.4000/jtei.476).
Eliza Margaretha, Harald Lüngen (2014): Building Linguistic Corpora from Wikipedia Articles and Discussions text. In: Beißwenger, Michael; Oostdijk, Nelleke; Storrer, Angelika; van den Heuvel, Henk (Eds., 2014): Building and Annotating Corpora of Computer-Mediated Communication: Issues and Challenges at the Interface of Corpus and Computational Linguistics. Special Issue, Journal of Language Technology and Computational Linguistics (JLCL 2/2014).

Presentations of the DeRiK schema: The schema has been presented and discussed at the following conferences:

Representing genres of computer-mediated communication in TEI. Panel at the 2011 TEI Annual Conference and Members Meeting "Philology in the Digital Age", Universität Würzburg (D) (organized by Michael Beißwenger & Lothar Lemnitzer).
A TEI Schema for the Annotation of CMC Genres. Talk held by Michael Beißwenger, Maria Ermakova, Alexander Geyken, Lothar Lemnitzer & Angelika Storrer at the international workshop "Building Corpora of Computer-Mediated Communication: Issues, Challenges, and Perspectives", TU Dortmund University (D), 13-15 February 2013.
Computer-Mediated Communication in TEI: What Lies Ahead. Panel at the 2013 TEI Annual Conference and Members Meeting "The Linked TEI: Text Encoding in the Web", Università La Sapienza, Rome (IT) (organized by Michael Beißwenger & Lothar Lemnitzer).

Documentation of past SIG activities

[2018] Two virtual meetings via Skype

Discussion of minor modeling and encoding issues on the basis of a first draft version of the 'reduce to the max' schema.

[2017] SIG meeting as a satellite event of the DHCMC2017 workshop in Essen/Germany (June 21, 2017)

A core group with colleagues from Clermont, Bolzano, Mannheim, Gießen and Essen met at the University of Duisburg-Essen for a three-hour meeting to plan the steps towards a 'reduce to the max' version of the schema drafts discussed within the SIG, which can serve as the basis of a feature request.

[2016] Talk and SIG meeting at the TEI Conference and Members' Meeting 2016 in Vienna/Austria (September 30, 2016)

In the talk "Converting and Representing Social Media Corpora into TEI: Schema and Best Practices from CLARIN-D" we reported on the recent CMC-TEI schema version developed for remodeling an existing German chat corpus. (Slides (pdf))

The talk was followed by a SIG meeting in which we made plans for the next steps towards a feature request for CMC2TEI. A summary of the meeting will be made available on this page (soon): SIG meeting in Vienna, 30 September 2016.

[2015] Connect, Animate, Innovate: TEI Conference and Members' Meeting 2015 in Lyon/FR (October 28-31, 2015)

Special topic panel "TEI across corpora, languages and genres: Towards a standard for the representation of social media and computer-mediated communication" (organized by Michael Beißwenger & Thierry Chanier). View panel description (pdf)

[2015] 3rd Conference on CMC and Social Media Corpora for the Humanities [cmccorpora2015] in Rennes/FR (October 23-24, 2015)

Special topic panel "Towards an encoding standard for social media and CMC: Experiences from German and French corpus projects using TEI" has been accepted for the conference (organized by Michael Beißwenger & Thierry Chanier). View panel description (pdf)

[2014] 3rd SIG meeting at the 4th DARIAH-EU VCC meeting in Rome/IT (September 17-18, 2014)

During the DARIAH meeting, members of the SIG held a community meeting on "TEI CMC: Models and tools for structuring & annotating corpora of social media / computer-mediated communication". Details on the goals and contents of the meeting can be found on the page: SIG:CMC/Technical Meeting on CMC at DARIAH VCC 2014.

[2014] SIG meeting at the 2nd Conference on CMC and Social Media Corpora for the Humanities [cmccorpora2014] in Dortmund/DE (February 20, 2014)

The 2nd SIG meeting was held as part of the 7th workshop of the scientific network Empirikom "Social Media Corpora for the eHumanities: Standards, Challenges, and Perspectives" in Dortmund.

A report about the meeting including slides of all presentations can be found on the page: 2nd CMC-SIG meeting in Dortmund, 20 February 2014.

[2013] SIG meeting at TEI-MM in Rome/IT (October 03, 2013)

The 1st SIG meeting was held as part of the TEI Conference and Members Meeting in October 2013 in Rome.

A report about the meeting can be found on the page: 1st CMC-SIG meeting in Rome, 3 October 2013.

[2013] Panel on CMC in TEI at the TEI-MM in Rome/IT (October 04, 2013)

In addition to the 1st SIG meeting, Michael Beißwenger & Lothar Lemnitzer organized a special-topic panel on "Computer-mediated communication in TEI: What lies ahead", held at the TEI-MM in Rome with contributions from several members of the SIG. The three presentations in the panel reported on experiences with modeling CMC in XML and outlined phenomena and issues related to the representation of CMC in TEI from the perspective of corpus projects from France, Germany, Italy and the Netherlands. The overall goal of the panel was to stimulate further discussion within the TEI community about what a standard for the representation of CMC in TEI should look like and what might be a practical and reasonable way to go about creating such a standard.

Documentation of the panel:

Members of the SIG up to 2017

Michael Beißwenger – University of Duisburg-Essen (DE)
Thierry Chanier – Université Blaise Pascal, Clermont-Ferrand (FR)
Isabella Chiari – Università "La Sapienza", Rome (IT)
Tomaž Erjavec – Jožef Stefan Institute (SI)
Maria Ermakova – Berlin-Brandenburg Academy of Sciences and Humanities (DE)
Darja Fišer – University of Ljubljana (SI)
Marcel Fladrich – University of Hamburg (DE)
Maarten van Gompel – Radboud University Nijmegen (NL)
Natalia Grabar – CNRS, University of Lille 3 (FR)
Holger Grumt Suárez – Justus Liebig Universität Gießen (DE)
Iris Hendrickx – Radboud University Nijmegen (NL)
Axel Herold – Berlin-Brandenburg Academy of Sciences and Humanities (DE)
Laura Herzberg – University of Mannheim (DE)
Henk van den Heuvel – Radboud University Nijmegen (NL)
Natali Karlova-Bourbonus – Justus Liebig Universität Gießen (DE)
Nikola Ljubešić – University of Zagreb (HR)
Lydia-Mai Ho-Dac – University of Toulouse (FR)
Kun Jin – Université Blaise Pascal, Clermont-Ferrand (FR)
Lothar Lemnitzer – Berlin-Brandenburg Academy of Sciences and Humanities (DE)
Harald Lüngen – Institut für deutsche Sprache, Mannheim (DE)
Eliza Margaretha – Institut für deutsche Sprache, Mannheim (DE)
Céline Poudat – University of Nice Sophia Antipolis (FR)
Roman Schneider – Institut für deutsche Sprache, Mannheim (DE)
Angelika Storrer – University of Mannheim (DE)
Ludovic Tanguy – University of Toulouse (FR)

Mailing list & further information

For exchange between the TEI-MMs, the SIG uses the talk pages in the TEI wiki and a mailing list.

Mailing list: tei-cmc@lists.uni-due.de (You can register for the list at https://lists.uni-due.de/mailman/listinfo/tei-cmc)
SIG page on the TEI website

SIG:Computer-Mediated Communication
