SIG:CMC/CoMeRe schema draft for representing CMC in TEI (2014)

Status of this draft
This page describes a draft of a basic schema for representing genres of computer-mediated communication (CMC) in TEI. The draft has been created by members of the TEI-SIG "Computer-Mediated Communication".

The SIG encourages everybody to discuss this draft and give their feedback/comments using the "discussion" function on top of this page. The comments/discussions will be carefully taken into consideration in the further development of the schema.

The history of the draft is documented on the main wiki page of the SIG. This page should be read in parallel with SIG:CMC/Draft: A metadata schema for CMC.

Authors of this draft: Thierry Chanier, N.N., N.N.

Interaction
Participants are in the same interaction space (IS) when they can interact (even if they do not necessarily do so, cf. lurkers). They interact through input devices (microphone, keyboard, mouse, gloves, etc.), which let them use the modality tools, and output devices, which mainly produce visual or oral signals. (These devices, however, will not be described in this article.) Hence, when participants can neither hear nor see the other participants’ actions, they are not in the same IS. Of course, participants may not be present during the whole time frame of the IS: they can enter late or leave early.

In an IS, actions occur between participants. Let us call the trace of an action within an environment and one particular modality an “act”. Acts are generated by participants, and sometimes by the system. Some of them may be considered as directly communicative (verbal ones in synchronous text or oral modalities). Others may not be directly communicative but may represent the cause of communicative reaction / interaction (e.g. when participants write collaboratively in an online word processor and comment on their work). Participants see and hear what others are doing. These actions may represent the rationale for participants to be there and to interact (produce something collectively). Hence the distinction between acts, directly communicative or not, is irrelevant.

An important distinction may be made between an IS where only one modality (tool) is used by participants and an IS where several are used. In the next section, we begin by presenting some examples of monomodal environments where actions occur en bloc (Beisswenger et al., 2012). A more complicated case arises when an IS uses several modalities; an example of this is also given in this article.

Within other multimodal environments, verbal (speech, text chat) and nonverbal acts occur simultaneously. The main purpose of transcriptions is then to describe inter-relations amongst acts and within acts: the participant’s utterance may be re-planned while s/he talks, depending on other specific acts occurring at the same time (see Wigham & Chanier, 2013). Indeed, written communication can be simultaneously combined with other modalities. For example, there are situations where a participant does not plan an utterance as a one-shot process before it is sent as an en bloc message to a server, which in turn displays it to the other participants as a non-modifiable piece of language (e.g. as a text chat turn). An utterance can also be planned, then modified in the throes of the interaction while taking into account what is happening in other modalities of communication (e.g. in an audio chat turn).

Examples of macrostructures
Following Beisswenger et al. (2012), we refer to macrostructure when considering the general information attached to an interaction (addressees, copy-to, title, label, readers, attached files, etc.) as well as issues dealing with the way sets of interactions are arranged per modality or interaction space. The microstructure of the text (next section) refers to the type of elements found in the actual contents of the interaction (<post>, <u> or <prod>), for example interaction words, emoticons, hash code, etc.

Before assembling general proposals concerning and and their attributes, let us consider some examples of CMC interactions.

Multimodal example
From: cmr-copeas-tei-v1. Context: Lyceum audio-graphic conference environment, 3 learners (English L2) working in a word processor: one writing, the others helping.
 * (1.2): collaborative word processor
 * (1.3): audio, clarification
 * (1.4): textchat, correction (with error)
 * (1.5): textchat, request clarification

(1) (1.2) (modify),paragraph (ad,For example:to have comparaison between                 web sites, to know more criterias for a good site) (1.3)euh + to                 euh + to use the good euh + good euh vocabulary + I think euh + we euh + we wrote euh for me I have euh +++ I've euh much progress in euh in use the good vocabulary to euh + to evaluate euh a website (1.4) according to the differents criteria (1.5) ?

Textchat
From cmr-getalp_org-rhone-alpes-tei-v1. Textchat turns correspond here, respectively, to :
 * the first 3: messages
 * 4: change alias
 * 5: change rights

(2)  Apres je vé faire ma physique c aussi les equation bilan  Aujourd'hui c la journée equation  lol  Changement de pseudo: Tsu -> Tsu[H  #Rhone-alpes:changement de mode(s) '+o Mega-Link' par Hera!services@olympe.epiknet.org

SMS
From cmr-smslareunion-tei-v1. Two SMS messages sent by the same person to the same addressee. Note that here the addressee is not explicitly encoded: the phone numbers of addressees have not been collected and hence are not known, and there is no attribute in TEI to encode addressees (the same problem arises with the speech chapter of TEI).

(3)  é@??$?Le + triste c ke tu na aucune phraz agréabl et ke tu va encor me dir ke c moi ki Merde par mon attitu2! Moi je deman2 pa mieu ke klke mot agréabl échangé […]  ...2 te comporter comme ca avec moi. Je ve bien admettr mes erreur kan j'agi vraimen mal comm hier mé fo pa exagérer. Si t pa d'accor c ton droi. Si tentain.le rest c à dirreposer dé question sur 1 sujet déjà expliké c pa 1 raison valabl pr         ke tu te monte contr moi.pr moi ossi ca suffi.

Discussion forum
From cmr-simuligne-tei-v1. The author of the message is a native speaker of French who is replying to a post made by a learner of French. Each person mentioned has been identified in the message structure (author, list of readers -here shortened-) and in its contents (addressee, signature of the author, attached file). This information may lead to other types of research on discourse and group interactions. For example, who takes the position of a leader, or an animator in a group? Can subgroups of communication be traced within a group, thanks to an analysis of clusters, cliques?

(4)  les sons du Suffolk    Read [other readers]  Puisqu'on parlait de ce qui est par la fenêtre, j'ai mis mon micro tout près de la fenêtre ce soir... Le coucou s'était déjà couché, malheureusement, mais les autres chantent très fort! Les 'Bullocks' ont tous le même père: un grand taureau Charolais, qui est le père de tous les boeufs de la région! bonne nuit à tous<name ref="#cmr-Simu-Al5" type="person"> Marja

Wikipedia discussion
to be added

Blog
From cmr-infral-tei-v1. One message and its comment.

(5) <post xml:id="cmr-blog-a2" synch="#T2" who="#P2" type="blog-message"> Présentation de ma personne étapeE1 ; Bon soir à tous!<lb/> Maintenant, je vais commencer avec les présentations....... <lb/> Je pense que vous avez vu que je m'appelle <name ref="#P2">Kerstin. J'ai 22 ans. Mon nom est un nom suédois qui est très fréquent en Allemagne. Comme vous savez peut-être, on a commencé nos études en master cette semaine. <lb/> Ma famille - mes parents et mes deux soeurs -habite à Osnabrueck. C'est une ville qui est pas loin de Brême. Après avoir passé mon bac à Osnabrueck, j'ai commencé mes études de francais et de sport à Brême. La raison pour laquelle j'ai choisi ces deux matières est que j'aime faire du sport (jouer au tennis, nager) et que j'adore la culture francaise. J'adore la langue francaise et le pays me plaît beaucoup (le paysage francais....). <lb/> Les deux étés passés, j'ai fait un stage de plus que deux mois en Suisse francophone et en France (près de Lyon) pour améliorer mes connaissances de la langue francaise et la pratique du francais à l'oral. <lb/> En ce qui concerne mes études de francais, ce qui me plaît surtout, c'est, d'explorer la culture francaise d'une manière différente (les textes littéraires, les séquences vidéos......). <lb/> J'attends vos présentations et je vous souhaite encore un bon soir........ <lb/> A bientôt, <name ref="#P2">Kerstin <post xml:id="cmr-blog-a3" synch="#T3" who="#P3" type="blog-comment" ref="#cmr-blog-a2"> Hallo Kirstin! J'ai lu que tu as fait des stages e... Hallo<name ref="#P2">Kirstin ! J'ai lu que tu as fait des stages en Suisse francophone ! Où exactement car j'habite près de la frontière suisse (à 1h de Lausanne !)! Je pense qu'on aura l'occasion d'en reparler ! Bis Bald

Email
From cmr-simuligne-tei-v1. One email sent to one person and read by this person.

(6) <post xml:id="cmr-Simu-Aq-At-Outbox-0080" when="2001-05-12T01:15:00" who="#cmr-Simu-At" type="email-message"> ta photo  <person corresp="#cmr-Simu-Al6"> <event type="SendTo"> SendTo <person corresp="#cmr-Simu-Al6"> <event type="Read" when="2001-05-12T01:15:00"> Read </listPerson> Coucou<name ref="#cmr-Simu-Al6" type="person"> Mia, Tu peux aller te voir dans Publications : maintenant, tu y existe en totalité ! A bientôt,<name ref="#cmr-Simu-At" type="person"> Anna

The <post> element
In our schema, the <post> element is the basic structural element of a CMC document corresponding to textual "en bloc" interactions. We consider it a macrostructural element, but it is the pivot between the higher-level macrostructural components thread and logfile and the microstructure of the content which it encloses. The structure of <post> is based on that of the existing <div> element.

The <div> and <post> elements have the following similarities:
 * <div> and <post> are high-level elements, belonging to the same class (model.divLike);
 * <div> and <post> contain the major divisions of text;
 * <div> and <post> have similar internal content.

It is important to note that, like <div>, <post> does not belong to the class of pLike elements. One <post> may consist of one or more paragraphs, similar to a <div>. While a division may represent, for example, a chapter of a book, <post> represents one user contribution to some computer-mediated communication event (forum, blog, web discussion, or chat). Such a contribution can contain multiple paragraphs, just like <div>. In the chat, all postings consist of exactly one paragraph and the portion of text exhibits no special markup, but on the Wikipedia talk page given in figure 2, some of the postings contain divisions and markup that the authors inserted into the content of their postings in order to structure their content. Therefore, <post> cannot be a model.pLike element.

The <div> and <post> elements have the following differences:
 * <div> is a self-nesting element, while <post> is not;
 * <post>s can only appear inside of a division which encloses one complete CMC document (such as an entire forum thread, an entire blog with user comments, or a chat logfile).

In other words, <post> is a child element of <div> and shares its content model except that it does not contain divisions and does not embed itself. Normally, <post> consists of one or more paragraphs. In some cases a posting contains a head, typically with a title.
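
As a minimal sketch of the containment just described, a thread division holding one message and one comment might look as follows. This fragment is not taken from a CoMeRe corpus: all identifiers, participant references and text are hypothetical. The attribute names (xml:id, who, type, ref) are those visible in example (5); the use of a head and paragraphs follows the prose above.

```xml
<!-- Hypothetical fragment: ids, participant references and text are invented. -->
<!-- <post> is the draft element discussed here, not part of current TEI. -->
<div type="thread">
  <!-- One user contribution: an optional head followed by one or more paragraphs. -->
  <post xml:id="ex-blog-a1" who="#ex-P1" type="blog-message">
    <head>Title of the message</head>
    <p>First paragraph of the contribution.</p>
    <p>Second paragraph of the contribution.</p>
  </post>
  <!-- A comment is a sibling post pointing back with ref; posts do not nest. -->
  <post xml:id="ex-blog-a2" who="#ex-P2" type="blog-comment" ref="#ex-blog-a1">
    <p>Text of the comment.</p>
  </post>
</div>
```

The comment is linked to its message through the ref attribute rather than through nesting, which is one way to respect the rule that the element does not embed itself.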

Attributes for <post>
Here is a summary of the attributes and other information which may be attached to different types of CMC environments. Note that information relative to StatusRead and Receiver have been encoded in TEI in the of the (close to the - for email, forum, blog- and (for blog)). Attributes relative to Wikipedia forums have not yet been used (see the corresponding section).

Types of divisions for the interaction space
As already seen, an interaction space may be described at two very different levels:
 * 1) the meta level (see SIG:CMC/Draft: A metadata schema for CMC);
 * 2) the interactions themselves (i.e. the set of acts). These acts, in all examples given here, are included within a division which corresponds to a session or to a division within a division.

We may distinguish several types of divisions:

 * div type="thread", e.g. forum, blog with different tools
   * child element: <post> with different types
 * div type="logfile", e.g. textchat, SMS, with different tools
   * child element: <post> with different types, for example within a textchat
 * div type="oral-discourse" for audiochat
   * child element: <u>, see the TEI chapter on speech
 * div type="multi-modalities"
   * child element: <u>
   * child element: <post>
   * child element: <prod> (for iconic acts such as vote, raise_hand, brief_absence_act, etc., for all collective tools such as word processor, semantic map, whiteboard, etc., and for nonverbal communication). As an example of a classification of non-verbal acts, see the accompanying figure, which represents non-verbal acts in Second Life as encoded by (Wigham & Chanier, 2013).
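
The division types listed above can be summarised in a skeleton like the following. This is only a sketch: element content is elided, the wrapping body element is added to keep the fragment well-formed, and the choice of children for the multimodal division is an assumption based on example (7), where the three kinds of acts are interleaved. The type values on the child elements are those that appear in the examples on this page.

```xml
<!-- Skeleton only: content is elided and nothing here comes from a real corpus. -->
<body>
  <!-- forum, blog -->
  <div type="thread">
    <post type="blog-message">...</post>
  </div>
  <!-- text chat, SMS -->
  <div type="logfile">
    <post type="chat-message">...</post>
  </div>
  <!-- audio chat: utterances as in the TEI chapter on speech -->
  <div type="oral-discourse">
    <u>...</u>
  </div>
  <!-- several modalities in one interaction space (assumed mix of children) -->
  <div type="multi-modalities">
    <u>...</u>
    <post type="chat-message">...</post>
    <prod type="vote">...</prod>
  </div>
</body>
```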

<prod> element
As explained, the <prod> element refers to acts which are non-verbal and are part of the interaction process, at the same level as the <post> and <u> elements. Following example (1) given above, here is another example (7) of interactions between one tutor and learners (from: cmr-copeas-tei-v1; context: Lyceum audio-graphic conference environment).
 * (7.1) audio act : yes/no question by the tutor
 * (7.2) positive answer given through a non-verbal modality by a learner (iconic system, modality here named "vote" with content "agree")
 * (7.3) audio act : yes/no question by the tutor
 * (7.4) textchat act : complementary info given by a learner
 * (7.5) to (7.9) : yes / no answers of 5 participants through the iconic system

(7)
(7.1) <u xml:id="cmr-copeas-R2_lobby-a_1297" xml:lang="eng" start="#cmr-copeas-tl_r-w107" end="#cmr-copeas-tl_r-w109" who="#AR4">euh no + euh I don't know the + the style ++ in french it's a band + named + {les enfoirés} ++ you know euh + {enfoirés} |+++</u>
(7.2) <prod xml:id="cmr-copeas-R2_lobby-a_1298" synch="#cmr-copeas-tl_r-w108" who="#AR7" type="vote">agree</prod>
(7.3) <u xml:id="cmr-copeas-R2_lobby-a_1299" xml:lang="eng" start="#cmr-copeas-tl_r-w109" end="#cmr-copeas-tl_r-w110" who="#TutR">anybody else know |</u>
(7.4) <post xml:id="cmr-copeas-R2_lobby-a_1301" xml:lang="unk" synch="#cmr-copeas-tl_r-w110" who="#AR6" type="chat-message">french's singers</post>
(7.5) <prod xml:id="cmr-copeas-R2_lobby-a_1302" synch="#cmr-copeas-tl_r-w111" who="#AR3" type="vote">agree</prod>
(7.6) <prod xml:id="cmr-copeas-R2_lobby-a_1303" synch="#cmr-copeas-tl_r-w111" who="#AR2" type="vote">agree</prod>
(7.7) <prod xml:id="cmr-copeas-R2_lobby-a_1304" synch="#cmr-copeas-tl_r-w112" who="#AR6" type="vote">agree</prod>
(7.8) <prod xml:id="cmr-copeas-R2_lobby-a_1305" synch="#cmr-copeas-tl_r-w113" who="#TutR" type="vote">disagree</prod>
(7.9) <prod xml:id="cmr-copeas-R2_lobby-a_1306" synch="#cmr-copeas-tl_r-w114" who="#AR1" type="vote">disagree</prod>

The content of the <prod> element is fairly simple in (7), whereas it is much more complicated in the <prod> element of example (1). In (1) it corresponds to an act of typing within a collaborative word processor. It is up to the researchers who transcribe actions within online collaborative tools (from video screen captures) to decide which kind of coding scheme they want. We should not impose anything for this content. The only mandatory information should be restricted to attributes.

This element does not exist in the current TEI version. Of course, the element name may be debatable (here the name "prod" corresponds to the fact that the corresponding non-verbal act is a production made by a participant), but not its function.

We have considered some TEI elements related to non-verbal features before introducing <prod>.

Elements that cannot be used as an act of type prod

 * <activity>: “contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything.” It is only a brief description, has no @who attribute, and is too low-level.
 * <kinesic>: “marks any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.” It was designed to be integrated inside <u>, but may be used at the same level. However, the name is wrong: kinesics is a specific non-verbal notion related to gaze, posture and gesture, not a general one.
 * <incident>: “marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.” The name is unacceptable: it runs against the interaction and communicative framework.

It is the whole philosophy / theoretical standpoint of the TEI chapter on speech that cannot be applied to non-verbal description placed at the same level as text and speech. These TEI elements, related to an utterance, are not really considered as being part of the interaction at the same level as the utterance itself. Their naming is also unacceptable and cannot refer to the concepts we have mentioned here.
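
To make the contrast concrete, here is a hedged sketch of the two standpoints. The first division shows how a non-verbal act could be attached to an utterance with the existing <kinesic> element of the TEI speech chapter; the second shows the same kind of act as an independent <prod>, following example (7). The participant references, the wording of the description and the text are hypothetical, and the first encoding is only an assumption about how one might apply the speech chapter, not something taken from a corpus.

```xml
<!-- Hypothetical fragment: participants, text and descriptions are invented. -->
<body>
  <!-- Speech-chapter standpoint: the non-verbal act is subordinate to the utterance. -->
  <div type="oral-discourse">
    <u who="#ex-Tut">anybody else know
      <kinesic who="#ex-L1"><desc>learner signals agreement</desc></kinesic>
    </u>
  </div>
  <!-- Draft standpoint: the non-verbal act is an act in its own right, -->
  <!-- a sibling of the utterance with its own participant and type. -->
  <div type="multi-modalities">
    <u who="#ex-Tut">anybody else know</u>
    <prod who="#ex-L1" type="vote">agree</prod>
  </div>
</body>
```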

Microstructure
text text