SIG:CMC/CoMeRe schema draft for representing CMC in TEI (2014)
Status of this draft
This page describes a draft of a basic schema for representing genres of computer-mediated communication (CMC) in TEI. The draft has been created by members of the TEI SIG "Computer-Mediated Communication".
The SIG encourages everybody to discuss this draft and to give feedback using the "discussion" function at the top of this page. Comments will be carefully taken into consideration in the further development of the schema.
The history of the draft is documented on the main wiki page of the SIG. This page should be read in parallel with SIG:CMC/Draft: A metadata schema for CMC.
Authors of this draft: Thierry Chanier, N.N., N.N.
Interaction types
Interaction
Participants are in the same interaction space (IS) when they can interact (but do not necessarily do so; cf. lurkers). They interact through input devices (microphone, keyboard, mouse, gloves, etc.), which let them use the modality tools, and output devices, which mainly produce visual or oral signals (these, however, will not be described in this article). Hence, when participants can neither hear nor see the other participants’ actions, they are not in the same IS. Of course, participants may not be present during the whole time frame of the IS: they can enter late or leave early.
In an IS, actions occur between participants. Let us call the trace of an action within an environment and one particular modality an “act”. Acts are generated by participants, and sometimes by the system. Some of them may be considered as directly communicative (verbal ones in synchronous text or oral modalities). Others may not be directly communicative but may cause a communicative reaction or interaction (e.g. when participants write collaboratively in an online word processor and comment on their work). Participants see and hear what others are doing. These actions may represent the rationale for participants to be there and to interact (to produce something collectively). Hence the distinction between directly communicative acts and other acts is irrelevant here.
An important distinction may be made between an IS where only one modality (tool) is used by participants and an IS where several are used. In the next section, we start by presenting some examples of single-modality environments where actions occur en bloc (Beißwenger et al., 2012)<ref name="Beißwenger"/>. A more complicated case appears when an IS uses several modalities; an example is also given in this article.
Within other multimodal environments, verbal acts (speech, text chat) and nonverbal acts occur simultaneously. The main purpose of transcriptions is then to describe the interrelations among acts and within acts: a participant’s utterance may be re-planned while s/he talks, depending on other specific acts occurring at the same time (see Wigham & Chanier, 2013)<ref name="WighamRecall"/><ref name="WighamCall"/><ref name="ChanierAlsic"/><ref name="Ciekanski"/>. Indeed, written communication can be combined simultaneously with other modalities. In such situations, a participant does not always plan an utterance as a one-shot process before it is sent as an en bloc message to a server, which in turn displays it to the other participants as a non-modifiable piece of language (e.g. as a text chat turn). An utterance can also be planned and then modified in the throes of the interaction, taking into account what is happening in other modalities of communication (e.g. in an audio chat turn).
| CMC environment | Mode & modality | TEI element | Main macrostructure issues with TEI | Corpora presently under processing into TEI |
|---|---|---|---|---|
| SMS | Text | <post> | No notion of addressee (in <head>?) | Y |
| Textchat | Text | <post> | @alias, @type, @subtype | Y |
| Email | Text | <post> | Addresses, readers, copy, attached file, etc. (in <head>) | Y |
| Discussion forum | Text | <post> | Threads (opening, answering): @ref | Y |
| Wikipedia discussion forum | Text | <post> | Reply difficult to identify (indent): @ref | Y |
| Blog | Text + image | <post> | Message & comment: @ref | Y |
| Audio conferencing system (e.g. Skype) | Text | <post> | | N |
| Complex CMC environment | | | | |
| LMS (Learning Management System): WebCT, Moodle | Text | <post> | One TEIcorpus file | Y |
| Audio-graphic conference system (e.g. Lyceum, Centra) | Text + audio + nonverbal | <post> <u> <prod> | Every element at the same level, i.e. mixing of these elements within a <div> | Y |
| Video-graphic or 3D environment (e.g. Second Life) | Text + audio + nonverbal | <post> <u> <prod> | Idem + set of video files attached to the TEI file | Y |
Examples of macrostructures
Following Beißwenger et al. (2012)<ref name="Beißwenger"/>, we speak of macrostructure when considering the general information attached to an interaction (addressees, copy-to, title, label, readers, attached files, etc.) as well as the way sets of interactions are arranged per modality or interaction space. The microstructure of the text (next section) refers to the types of elements found in the actual content of the interaction (<post>, <u> or <prod>), for example interaction words, emoticons, hash codes, etc.
Before assembling general proposals concerning <post> and <prod> and their attributes, let us consider some examples of CMC interactions.
Multimodal example
From cmr-copeas-tei-v1<ref name="comere"/>. Context: Lyceum audio-graphic conference environment, 3 learners (English L2) working in a word processor: one writing, the others helping.
- (1.2): collaborative word processor
- (1.3): audio, clarification
- (1.4): textchat, correction (with error)
- (1.5): textchat, request clarification
(1)
(1.2) <prod xml:id="cmr-copeas-T8_s101_ecriture_multimodale-a_14481" xml:lang="unk" start="#cmr-copeas-tl_t-w1979" end="#cmr-copeas-tl_t-w1987" who="#AT3" type="text_doc"> (modify),paragraph (ad,For example:to have comparaison between web sites, to know more criterias for a good site)</prod>
(1.3) <u xml:id="cmr-copeas-T8_s101_ecriture_multimodale-a_14482" xml:lang="eng" start="#cmr-copeas-tl_t-w1988" end="#cmr-copeas-tl_t-w1993" who="#AT1">euh + to euh + to use the good euh + good euh vocabulary + I think euh + we euh + we wrote euh for me I have euh +++ I've euh much progress in euh in use the good vocabulary to euh + to evaluate euh a website</u>
(1.4) <post xml:id="cmr-copeas-T8_s101_ecriture_multimodale-a_14483" xml:lang="unk" synch="#cmr-copeas-tl_t-w1990" who="#AT6" type="chat-message">
  <p>according to the differents criteria</p>
</post>
(1.5) <post xml:id="cmr-copeas-T8_s101_ecriture_multimodale-a_14484" xml:lang="unk" synch="#cmr-copeas-tl_t-w1991" who="#AT6" type="chat-message">
  <p>?</p>
</post>
Textchat
From cmr-getalp_org-rhone-alpes-tei-v1<ref name="comere"/>. The textchat turns below correspond, respectively, to:
- the first 3: messages
- 4: change alias
- 5: change rights
(2)
<post xml:id="cmr-get-c065-a21693" when-iso="2004-03-18T14:09" who="#cmr-get-c065-p39174" alias="cortex_taff" type="chat-message">
  <p>Apres je vé faire ma physique c aussi les equation bilan</p>
</post>
<post xml:id="cmr-get-c065-a21694" when-iso="2004-03-18T14:09" who="#cmr-get-c065-p39174" alias="cortex_taff" type="chat-message">
  <p>Aujourd'hui c la journée equation</p>
</post>
<post xml:id="cmr-get-c065-a21697" when-iso="2004-03-18T14:11" who="#cmr-get-c065-p36208" alias="roulie" type="chat-message">
  <p>lol</p>
</post>
<post xml:id="cmr-get-c065-a21699" when-iso="2004-03-18T14:13" who="#cmr-get-c065-p120845" alias="Tsu" type="chat-event" subtype="changementpseudo">
  <p>Changement de pseudo: Tsu -> Tsu[H</p>
  <add><code>alias_change(Tsu,Tsu[H)</code></add>
</post>
<post xml:id="cmr-get-c065-a21705" when-iso="2004-03-18T14:18" who="#unknow" type="chat-event" subtype="changementmode">
  <p>#Rhone-alpes:changement de mode(s) '+o Mega-Link' par Hera!services@olympe.epiknet.org</p>
</post>
SMS
From cmr-smslareunion-tei-v1<ref name="comere"/>. Two SMS messages sent by the same person to the same addressee. Note that the addressee is not explicitly encoded here: the phone numbers of addressees have not been collected, hence are not known, and there is no attribute in TEI to encode addressees (the same problem arises with the TEI chapter on speech).
(3)
<post xml:id="cmr-slr-c001-a00011" when-iso="2008-04-14T10:17:11" who="#cmr-slr-c001-p010" type="sms">
  <p>é@??$?Le + triste c ke tu na aucune phraz agréabl et ke tu va encor me dir ke c moi ki Merde par mon attitu2! Moi je deman2 pa mieu ke klke mot agréabl échangé</p>
</post>
[…]
<post xml:id="cmr-slr-c001-a00304" when-iso="2008-04-15T20:23:59" who="#cmr-slr-c001-p010" type="sms">
  <p>...2 te comporter comme ca avec moi. Je ve bien admettr mes erreur kan j'agi vraimen mal comm hier mé fo pa exagérer. Si t pa d'accor c ton droi. Si tentain.le rest c à dirreposer dé question sur 1 sujet déjà expliké c pa 1 raison valabl pr ke tu te monte contr moi.pr moi ossi ca suffi.</p>
</post>
Discussion forum
From cmr-simuligne-tei-v1<ref name="comere"/>. The author of the message is a native speaker of French who is replying to a post made by a learner of French. Each person mentioned has been identified in the message structure (author, list of readers, shortened here) and in its contents (addressee, signature of the author, attached file). This information may lead to other types of research on discourse and group interactions. For example, who takes the position of a leader, or an animator, in a group? Can subgroups of communication be traced within a group, thanks to an analysis of clusters or cliques?
(4)
<post xml:id="cmr-Simu-Aquitania-Principal_27.04-13.05.01-59" when="2001-05-02T07:58:00" who="#cmr-Simu-Al5" type="forum-message">
  <head>
    <title>les sons du Suffolk</title>
    <listPerson>
      <person corresp="#cmr-Simu-An3">
        <event type="Read" when="2001-05-02T07:58:00">
          <label>Read</label>
        </event>
      </person>
      [other readers]
    </listPerson>
  </head>
  <p>Puisqu'on parlait de ce qui est par la fenêtre, j'ai mis mon micro tout près de la fenêtre ce soir... Le coucou s'était déjà couché, malheureusement, mais les autres chantent très fort! Les 'Bullocks' ont tous le même père: un grand taureau Charolais, qui est le père de tous les boeufs de la région! bonne nuit à tous<name ref="#cmr-Simu-Al5" type="person"><forename>Marja</forename></name> </p>
  <trailer>
    <ref type="attached_file" target="#Simu_Aqui_forum_attach_988833475" >suffolk.qcp</ref>
  </trailer>
</post>
Wikipedia discussion
to be added
Blog
From cmr-infral-tei-v1<ref name="comere"/>. One message and its comment.
(5)
<post xml:id="cmr-blog-a2" synch="#T2" who="#P2" type="blog-message">
  <head>
    <title>Présentation de ma personne</title>
    <label>étapeE1 ; </label>
  </head>
  <p>Bon soir à tous!<lb/> Maintenant, je vais commencer avec les présentations....... <lb/> Je pense que vous avez vu que je m'appelle <name ref="#P2">Kerstin</name> . J'ai 22 ans. Mon nom est un nom suédois qui est très fréquent en Allemagne. Comme vous savez peut-être, on a commencé nos études en master cette semaine. <lb/> Ma famille - mes parents et mes deux soeurs -habite à Osnabrueck. C'est une ville qui est pas loin de Brême. Après avoir passé mon bac à Osnabrueck, j'ai commencé mes études de francais et de sport à Brême. La raison pour laquelle j'ai choisi ces deux matières est que j'aime faire du sport (jouer au tennis, nager) et que j'adore la culture francaise. J'adore la langue francaise et le pays me plaît beaucoup (le paysage francais....). <lb/> Les deux étés passés, j'ai fait un stage de plus que deux mois en Suisse francophone et en France (près de Lyon) pour améliorer mes connaissances de la langue francaise et la pratique du francais à l'oral. <lb/> En ce qui concerne mes études de francais, ce qui me plaît surtout, c'est, d'explorer la culture francaise d'une manière différente (les textes littéraires, les séquences vidéos......). <lb/> J'attends vos présentations et je vous souhaite encore un bon soir........ <lb/> A bientôt, <name ref="#P2">Kerstin</name></p>
</post>
<post xml:id="cmr-blog-a3" synch="#T3" who="#P3" type="blog-comment" ref="#cmr-blog-a2">
  <head>
    <title>Hallo Kirstin! J'ai lu que tu as fait des stages e...</title>
  </head>
  <p>Hallo<name ref="#P2">Kirstin</name> ! J'ai lu que tu as fait des stages en Suisse francophone ! Où exactement car j'habite près de la frontière suisse (à 1h de Lausanne !)! Je pense qu'on aura l'occasion d'en reparler ! Bis Bald </p>
</post>
Email
From cmr-simuligne-tei-v1<ref name="comere"/>. One email sent to one person and read by this person.
(6)
<post xml:id="cmr-Simu-Aq-At-Outbox-0080" when="2001-05-12T01:15:00" who="#cmr-Simu-At" type="email-message">
  <head>
    <title>ta photo</title>
    <listPerson>
      <person corresp="#cmr-Simu-Al6">
        <event type="SendTo"> <label>SendTo</label></event>
      </person>
      <person corresp="#cmr-Simu-Al6">
        <event type="Read" when="2001-05-12T01:15:00"> <label>Read</label></event>
      </person>
    </listPerson>
  </head>
  <p>Coucou<name ref="#cmr-Simu-Al6" type="person"><forename>Mia</forename></name>, Tu peux aller te voir dans Publications : maintenant, tu y existe en totalité ! A bientôt,<name ref="#cmr-Simu-At" type="person"><forename>Anna</forename></name> </p>
</post>
Macrostructure: the <post> element
The element
In our schema, the element <post> is the basic structural element of a CMC document corresponding to textual "en bloc" interactions. We consider it a macrostructural element, but it is the pivot between the higher-level macrostructural components (thread and logfile) and the microstructure of the content which it encloses. The structure of <post> is based on that of the existing <div> element.
The <div> and <post> elements have the following similarities:
- <div> and <post> are high-level elements, belonging to the same class (model.divLike);
- <div> and <post> contain the major divisions of text;
- <div> and <post> have similar internal content.
It is important to note that <post>, like <div>, does not belong to the class of pLike elements. One <post> may consist of one or more paragraphs, similar to a <div>. While a division may represent, for example, a chapter of a book, a <post> represents one user contribution to a computer-mediated communication event (forum, blog, web discussion, or chat). Such a contribution can contain multiple paragraphs, just like a <div>. In the chat, all posts consist of exactly one paragraph and the text exhibits no special markup, but on the Wikipedia talk page given in figure 2, some of the posts contain divisions and markup that the authors inserted in order to structure their content. Therefore, <post> cannot be a model.pLike element.
The <div> and <post> elements have the following differences:
- <div> is a self-nesting element, while <post> is not;
- <post>s can only appear inside a division which encloses one complete CMC document (such as an entire forum thread, an entire blog with user comments, or a chat logfile).
In other words, <post> is a child element of <div>; it does not contain divisions and does not embed itself. Normally, a <post> consists of one or more paragraphs. In some cases a post contains a head, typically with a title.
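The constraints above can be pictured with a minimal sketch, assuming the draft's <post> element and the @ref attribute discussed in this page. All identifiers, dates and text are invented for illustration and do not come from any corpus:

```xml
<!-- Hypothetical sketch: ids, dates and text are invented for illustration -->
<div type="thread">
  <!-- A post may carry a head and several paragraphs, like a div -->
  <post xml:id="ex-a1" who="#ex-p1" when="2014-01-10T09:00:00" type="forum-message">
    <head>
      <title>First message</title>
    </head>
    <p>First paragraph of the contribution.</p>
    <p>Second paragraph of the same contribution.</p>
  </post>
  <!-- A reply is a sibling of the message it answers, never nested inside it;
       the link is carried by @ref, as in example (5) -->
  <post xml:id="ex-a2" who="#ex-p2" when="2014-01-10T09:12:00" type="forum-message" ref="#ex-a1">
    <p>Reply to the first message.</p>
  </post>
</div>
```

Keeping replies flat and linking them through @ref is one way to respect the rule that <post> does not embed itself while still recording the thread structure.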
Attributes for <post>
Here is a summary of the attributes and other information which may be attached to <post> in different types of CMC environments. Note that information relating to StatusRead and Receiver has been encoded in TEI in the <head> of the <post>, next to the <title> (for email, forum, blog) and the <label> (for blog). Attributes relating to Wikipedia forums have not yet been used (see the corresponding section).
| Attribute | SMS | Textchat | Blog | Email | Discussion forum | Wiki forum |
|---|---|---|---|---|---|---|
| @xml:lang | Y | Y | Y | Y | Y | Y |
| @xml:id | Y | Y | Y | Y | Y | Y |
| @who | Y | Y | Y | Y | Y | Y |
| @when / when-iso / synch | Y | Y | Y | Y | Y | Y |
| @type | Y | Y | Y | Y | Y | Y |
| @subtype | N | Option | N | N | N | N |
| @ref | N | N | Y (comment) | Y (respond) | Y (respond) | Y (respond) |
| Not in TEI | | | | | | |
| @alias | | Option | | | | |
| Receiver | | | | SendTo, Cc, Bcc | | |
| StatusRead | | | Option | Option | Option | |
| @revisedWhen | | | | | | Y |
| @revisedBy | | | | | | Y |
| @identLevel | N | N | N | N | N | Y |
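As an illustration of how the Receiver and StatusRead information summarized above might sit in the <head> of a <post>, here is a hedged sketch modelled on example (6). The event types "SendTo" and "Read" are attested in that example; the value "Cc" is an assumption derived from the table, and all identifiers, dates and text are hypothetical:

```xml
<!-- Hypothetical sketch: ids, dates and text are invented for illustration -->
<post xml:id="ex-mail-1" xml:lang="eng" who="#ex-p1" when="2014-01-10T09:00:00" type="email-message">
  <head>
    <title>Example subject</title>
    <listPerson>
      <!-- Receiver information: main addressee -->
      <person corresp="#ex-p2">
        <event type="SendTo"><label>SendTo</label></event>
      </person>
      <!-- Receiver information: copy (the type value "Cc" is assumed, not attested) -->
      <person corresp="#ex-p3">
        <event type="Cc"><label>Cc</label></event>
      </person>
      <!-- StatusRead information: who read the message, and when -->
      <person corresp="#ex-p2">
        <event type="Read" when="2014-01-10T09:30:00"><label>Read</label></event>
      </person>
    </listPerson>
  </head>
  <p>Body of the message.</p>
</post>
```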
Macrostructure & multimodality: u and prod elements
Types of divisions for the interaction space
As already seen, an interaction space may be described at two very different levels:
- 1) the meta level (see SIG:CMC/Draft: A metadata schema for CMC);
- 2) the interactions themselves, i.e. the set of acts. In all examples given here, these acts are included within a division which corresponds to a session, or within a division nested inside such a division.
We may distinguish several types of divisions:
- div type="thread", e.g. forum, blog, with different tools
  - child element: <post> with different types
- div type="logfile", e.g. textchat, SMS, with different tools
  - child element: <post> with different types, for example within a textchat
- div type="oral-discourse" for audiochat
  - child element: <u> (see the TEI chapter on speech)
- div type="multi-modalities"
  - child element: <post>
  - child element: <u>
  - child element: <prod> (for iconic acts such as vote, raise_hand, brief_absence_act, etc.; for all collective tools such as word processor, semantic map, whiteboard, etc.; and for nonverbal communication). As an example of a classification of nonverbal acts, see the accompanying figure, which represents nonverbal acts in Second Life as encoded by Wigham & Chanier (2013)<ref name="WighamRecall"/>.
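The last division type can be sketched as follows, assuming the draft's <post> and <prod> elements alongside the existing <u>. All identifiers and text are hypothetical; the pointers in @start, @end and @synch are assumed to refer to time points declared elsewhere in the file, as the #cmr-copeas-tl_… pointers do in examples (1) and (7):

```xml
<!-- Hypothetical sketch: ids and text are invented for illustration -->
<div type="multi-modalities">
  <!-- Audio act, spanning two assumed time points -->
  <u xml:id="ex-m1" xml:lang="eng" who="#ex-tutor" start="#ex-t1" end="#ex-t3">does everybody agree |</u>
  <!-- Nonverbal act (iconic vote), at the same level as the utterance -->
  <prod xml:id="ex-m2" who="#ex-l1" synch="#ex-t2" type="vote">agree</prod>
  <!-- Text chat act, again a sibling of the other two -->
  <post xml:id="ex-m3" xml:lang="eng" who="#ex-l2" synch="#ex-t3" type="chat-message">
    <p>yes</p>
  </post>
</div>
```

Placing the three element types as siblings within one division is what lets a transcription order them on a shared time axis, regardless of modality.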
<prod> element
As explained, the <prod> element refers to acts which are nonverbal and are part of the interaction process, at the same level as the <post> and <u> elements. Following example (1) above, here is another example (7) of interactions between one tutor and learners (from cmr-copeas-tei-v1<ref name="comere"/>; context: Lyceum audio-graphic conference environment).
- (7.1) audio act: yes/no question by the tutor
- (7.2) positive answer given through a nonverbal modality by a learner (iconic system, modality here named "vote" with content "agree")
- (7.3) audio act: yes/no question by the tutor
- (7.4) textchat act: complementary information given by a learner
- (7.5) to (7.9): yes/no answers of 5 participants through the iconic system
(7)
(7.1) <u xml:id="cmr-copeas-R2_lobby-a_1297" xml:lang="eng" start="#cmr-copeas-tl_r-w107" end="#cmr-copeas-tl_r-w109" who="#AR4"> euh no + euh I don't know the + the style ++ in french it's a band + named + {les enfoirés} ++ you know euh + {enfoirés} |+++</u>
(7.2) <prod xml:id="cmr-copeas-R2_lobby-a_1298" synch="#cmr-copeas-tl_r-w108" who="#AR7" type="vote">agree</prod>
(7.3) <u xml:id="cmr-copeas-R2_lobby-a_1299" xml:lang="eng" start="#cmr-copeas-tl_r-w109" end="#cmr-copeas-tl_r-w110" who="#TutR">anybody else know |</u>
(7.4) <post xml:id="cmr-copeas-R2_lobby-a_1301" xml:lang="unk" synch="#cmr-copeas-tl_r-w110" who="#AR6" type="chat-message">
  <p>french's singers</p>
</post>
(7.5) <prod xml:id="cmr-copeas-R2_lobby-a_1302" synch="#cmr-copeas-tl_r-w111" who="#AR3" type="vote">agree</prod>
(7.6) <prod xml:id="cmr-copeas-R2_lobby-a_1303" synch="#cmr-copeas-tl_r-w111" who="#AR2" type="vote">agree</prod>
(7.7) <prod xml:id="cmr-copeas-R2_lobby-a_1304" synch="#cmr-copeas-tl_r-w112" who="#AR6" type="vote">agree</prod>
(7.8) <prod xml:id="cmr-copeas-R2_lobby-a_1305" synch="#cmr-copeas-tl_r-w113" who="#TutR" type="vote">disagree</prod>
(7.9) <prod xml:id="cmr-copeas-R2_lobby-a_1306" synch="#cmr-copeas-tl_r-w114" who="#AR1" type="vote">disagree</prod>
The content of the <prod> element is fairly simple in (7), whereas it is much more complicated in the <prod> element of example (1), where it corresponds to an act of typing within a collaborative word processor. It is up to the researchers who transcribe actions within online collaborative tools (from video screen captures) to decide which kind of coding scheme they want. We should not impose anything on this content. The only mandatory information should be restricted to attributes.
This element does not exist in the current TEI version. Of course, the element name may be debatable (here the name "prod" corresponds to the fact that the corresponding non verbal act is a production made by a participant), but not its function.
We considered some TEI elements related to nonverbal features before introducing <prod>.
Elements that cannot be used as an act of type prod
- <activity>: “contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything.” It is only a brief description, has no @who attribute, and is too low-level.
- <kinesic>: “marks any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.” It has been designed to be integrated inside <u>, but may be used at the same level. However, the name is wrong: kinesics is a specific nonverbal notion related to gaze, posture and gesture, not a general one.
- <incident>: “marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.” The name is unacceptable: it runs against the interaction and communicative framework.
It is the whole philosophy or theoretical standpoint of the TEI chapter on speech that cannot be applied to nonverbal descriptions placed at the same level as text and speech. These TEI elements are attached to an utterance and are not really considered part of the interaction at the same level as the utterance itself. Their naming is also unacceptable and cannot refer to the concepts we have mentioned here.
Attributes of <u> and <prod>
Microstructure
text <ref name="Beißwenger"/>text
References
<references>
<ref name="Beißwenger">Beißwenger, M., Ermakova, M., Geyken, A., Lemnitzer, L., & Storrer, A. (2012). "A TEI Schema for the Representation of Computer-mediated Communication". Journal of the Text Encoding Initiative, 3. DOI: 10.4000/jtei.476</ref>
<ref name="ChanierAlsic">Chanier, T., & Vetter, A. (2006). "Multimodalité et expression en langue étrangère dans une plate-forme audio-synchrone". Apprentissage des langues et Système d'Information et de Communication (Alsic), vol. 9, pp. 61-101. DOI: 10.4000/alsic.270</ref>
<ref name="Ciekanski">Ciekanski, M., & Chanier, T. (2008). "Developing online multimodal verbal communication to enhance the writing process in an audio-graphic conferencing environment". ReCALL, vol. 20 (2), Cambridge University Press, pp. 162-182. DOI: 10.1017/S0958344008000426</ref>
<ref name="WighamRecall">Wigham, C.R., & Chanier, T. (2013a). "A study of verbal and nonverbal communication in Second Life - the ARCHI21 experience". ReCALL, 25(1), Cambridge Journals. DOI: 10.1017/S0958344012000250</ref>
<ref name="WighamCall">Wigham, C.R., & Chanier, T. (to appear in 2013b). "Interactions between text chat and audio modalities for L2 communication and feedback in the synthetic world Second Life". Computer Assisted Language Learning (CALL). DOI: 10.1080/09588221.2013.851702</ref>
<ref name="Derik">DeRiK (2013). Description of the DeRiK (Deutsches Referenzkorpus zur internetbasierten Kommunikation) project: a databank of German CMC corpora encoded in TEI.</ref>
<ref name="comere">CoMeRe (2014). Website documentation of the CoMeRe (Communication Médiée par les Réseaux) project: a databank of French CMC corpora encoded in TEI.</ref>
</references>