SIG:Overlap

From TEIWiki
Revision as of 15:36, 29 August 2007


Introduction

The goal of the TEI Overlapping Markup SIG is to bring together users of the TEI who are acutely interested in issues of multiple hierarchies, and in particular in handling them in XML. It will do this by:

  1. running a mailing list about overlapping hierarchies and solutions for encoding them
  2. assessing the TEI and suggesting improvements and alterations to the TEI Council

The SIG is convened by Dot Porter (dporter@uky.edu). If you have developed an approach to overlapping markup, would like to comment on the existing approaches below, or would like to add a citation to the bibliographies, please log into the Wiki and add your contributions. The SIG runs a mailing list on this topic. To join, visit


Approaches to Handling Overlapping XML Markup

Multiple Hierarchies

The TEI P4 Guidelines provide a chapter that discusses some ways to deal with markup that is not hierarchical (Chapter 31, "Multiple Hierarchies", http://www.tei-c.org/P4X/NH.html). Specific problems mentioned in that chapter include many that should be familiar to even the most basic user of TEI markup:

  • in narrative, a speech by a character may begin in the middle of a paragraph and continue for several more paragraphs
  • in a verse text, the encoder may need to tag both the formal structure of the verse (its stanzas and lines) and its syntactic structures (which sometimes nest within the metrical structure and sometimes cross metrical boundaries)
  • in any kind of text, the encoder may wish to record the physical structure of volume, page, column, and line, as well as the formal or logical structure of chapters and paragraphs or acts and scenes, etc.
  • in verse drama, the structure of acts, scenes, and speeches often conflicts with the metrical structure
  • in any kind of text, an embedded text (e.g. a play within a play, or a song) may be interrupted by other matter; the encoder may wish to establish explicitly the logical unity of the embedded material (e.g. to identify the song as a single song, and to mark its internal formal structure)
  • in a dictionary, different types of information (e.g. orthography, syllabification, and hyphenation) may be combined within a single notation; the encoder may wish both to preserve the presentation of the material in the source text and to disentangle the logically distinct pieces of information in the interests of more convenient processing of the lexical information
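To make the verse case concrete, here is a minimal Python sketch (using only the standard library) showing that a sentence crossing a line boundary is not well-formed XML, and how one workaround discussed in the TEI chapter, fragmentation, splits the sentence at the boundary and links the pieces. The `next`/`prev` pointer attributes follow TEI usage in spirit, but the exact encoding and the `rejoin` helper are illustrative assumptions, not the Guidelines' prescribed markup.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A sentence (<s>) that crosses a verse-line (<l>) boundary: the tags overlap,
# so this string is not well-formed XML.
overlapping = ("<lg><l>Of Man's first disobedience, <s>and the fruit</l>"
               "<l>Of that forbidden tree</s></l></lg>")

# Fragmentation: split the sentence at the line boundary and link the pieces
# with (illustrative) next/prev pointers.
fragmented = ('<lg><l>Of Man\'s first disobedience, <s xml:id="s1a" next="#s1b">and the fruit</s></l>'
              '<l><s xml:id="s1b" prev="#s1a">Of that forbidden tree</s></l></lg>')


def is_well_formed(xml: str) -> bool:
    """Return True if the string parses as XML."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


def rejoin(root: ET.Element, start_id: str) -> str:
    """Hypothetical helper: follow next pointers from one fragment and
    reassemble the logical sentence."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    parts, current = [], by_id.get(start_id)
    while current is not None:
        parts.append("".join(current.itertext()))
        nxt = current.get("next")
        current = by_id.get(nxt.lstrip("#")) if nxt else None
    return " ".join(parts)


print(is_well_formed(overlapping))  # False
print(is_well_formed(fragmented))   # True
print(rejoin(ET.fromstring(fragmented), "s1a"))  # and the fruit Of that forbidden tree
```

Fragmentation keeps a single hierarchy authoritative (here the verse lines) and recovers the other (the sentence) by processing, which is why it appears alongside milestones and other approaches in the chapter.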

Below are some approaches for using multiple hierarchies in XML, both for encoding them and for processing them.


Kentucky GODDAG

Bibliography:

ABSTRACT: This document provides semantics of the Extended XPath language (EXPath) for Concurrent Markup Hierarchies (CMH).


ABSTRACT: XPath is a language for addressing parts of an XML document. It is used in many XML query languages and it can be used by itself for querying XML documents. While XPath is, in general, efficient for querying individual XML documents, it lacks the features for querying over collections of documents or joining parts of the same document.

As the amount of complex document-centric XML data is continually increasing, querying such documents has drawn surprisingly little attention. We propose an XPath axes extension to deal with querying collections of document-centric XML documents sharing the same content (called concurrent XML). The algorithms we propose to evaluate the extended axes work in linear time combined complexity (number of documents and total size of documents).
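The abstract does not spell out the extended axes, so the following is only a minimal Python sketch of the underlying idea, assuming each concurrent hierarchy is a separate XML string over identical character content. It maps every element to character offsets and defines a hypothetical `overlapping` axis between hierarchies; it is not the EXPath semantics or the linear-time algorithm from the paper.

```python
import xml.etree.ElementTree as ET


def element_spans(xml: str):
    """Map every element of one document to (tag, start, end) character offsets
    into the shared text content (end is exclusive)."""
    spans = []

    def walk(el, offset):
        start = offset
        offset += len(el.text or "")
        for child in el:
            offset = walk(child, offset)
            offset += len(child.tail or "")
        spans.append((el.tag, start, offset))
        return offset

    walk(ET.fromstring(xml), 0)
    # Document order: earlier start first, enclosing element before enclosed one.
    return sorted(spans, key=lambda s: (s[1], -s[2]))


def same_content(*docs: str) -> bool:
    """Concurrent documents must share exactly the same character content."""
    return len({"".join(ET.fromstring(d).itertext()) for d in docs}) == 1


def overlapping(node, others):
    """Hypothetical 'overlapping' axis: elements of another hierarchy that share
    text with node but neither contain it nor are contained by it."""
    _, s, e = node
    return [o for o in others
            if o[1] < e and s < o[2]               # share some characters
            and not (o[1] <= s and e <= o[2])      # o does not contain node
            and not (s <= o[1] and o[2] <= e)]     # node does not contain o


verse = "<r><l>one two </l><l>three four</l></r>"
syntax = "<r>one <s>two three</s> four</r>"
s_elem = [sp for sp in element_spans(syntax) if sp[0] == "s"][0]
print(same_content(verse, syntax))                # True
print(overlapping(s_elem, element_spans(verse)))  # [('l', 0, 8), ('l', 8, 18)]
```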


ABSTRACT: The problem of concurrent markup hierarchies in XML encodings of documents has attracted the attention of a number of humanities researchers in recent years. The key problem with using concurrent hierarchies to encode documents is that markup in one hierarchy is not necessarily well-formed with respect to the markup in another hierarchy. Previously proposed solutions to this problem rely on the XML expertise of the editors and their ability to maintain correct DTDs for complex markup languages. In this paper, we approach the problem of maintenance of concurrent XML markup from the Computer Science perspective. We propose a framework that allows the editors to concentrate on the semantic aspects of the encoding, while leaving the burden of maintaining XML documents to the software. The paper describes the formal notion of the concurrent markup languages and the algorithms for automatic maintenance of XML documents with concurrent markup.

HORSE

Bibliography:

INTRODUCTION: "Overlap" describes cases where some markup structures do not nest neatly into others, such as when a quotation starts in the middle of one paragraph and ends in the middle of the next. OSIS [Duru03], a standard XML schema for Biblical and related materials, has to deal with extreme amounts of overlap. The simplest is book/chapter/verse and book/story/paragraph hierarchies that pervasively diverge; but many types of overlap are more complicated than this.

The basic options for dealing with overlap in the context of SGML [ISO 8879] or XML [Bray98] are described in the TEI Guidelines [TEI]. I summarize these with their strengths and weaknesses. Previous proposals for expressing overlap, or at least kinds of overlap, don't work well enough for the severe and frequent cases found in OSIS. Thus, I present a variation on TEI milestone markup that has several advantages, though it is not a panacea. This is now the normative way of encoding non-hierarchical structures in OSIS documents.

Citations:

  • [Bray98] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation 10-February-1998.
  • [Duru03] Patrick Durusau and Steven J. DeRose. "OSIS: A Users' Guide to the Open Scripture Information Standard." Bible Technologies Group, 2003.
  • [ISO 8879] International Organization for Standardization. 1986. ISO 8879: 1986(E). Information Processing: Text and Office Information Systems: Standard Generalized Markup Language.
  • [TEI] Michael Sperberg-McQueen and Lou Burnard (eds). Technical Topics: Multiple Hierarchies. Chapter 31 in the TEI Guidelines for Electronic Text Encoding and Interchange. http://xml.coverpages.org/teichap31.html


  • S. Bauman, "TEI HORSEing around: Handling overlap using the Trojan Horse method" Presentation at Extreme Markup 2005 (Link to be added following the conference)

ABSTRACT: The Text Encoding Initiative’s typed segment-boundary delimiter method is only one of several proposed mechanisms for handling overlap in TEI documents. HORSE (aka CLIX) defines a method by which an XML element is used normally when possible and as an improved version of the typed segment-boundary delimiter method when an overlap problem is encountered. A significant portion of the rules necessary for validation of HORSE markup can be expressed using Schematron. This, combined with an utter hack that can "HORSEify" the declaration of elements in a TEI Relax NG grammar, can provide a potentially significant step forward in handling overlap in TEI documents.
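The "use an element normally when possible, switch to boundary markers on overlap" rule can be sketched in Python as follows. This is a minimal sketch under a simplified reading of HORSE/CLIX: a range that crosses no other range is serialized as an ordinary element, while a crossing range becomes a pair of empty start/end markers linked by `sID`/`eID` attributes (attribute names as used in CLIX, namespacing omitted). The `Range` type and `horseify` function are hypothetical, and empty ranges are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    rid: str    # identifier used for sID/eID
    name: str   # element name
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive (must be > start)


def crosses(a: Range, b: Range) -> bool:
    """True if a and b overlap without either one containing the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def horseify(text: str, ranges: list) -> str:
    """Emit non-crossing ranges as ordinary elements and crossing ranges as
    empty start/end markers linked by sID/eID attributes."""
    crossing = {r.rid for r in ranges for o in ranges if r is not o and crosses(r, o)}
    events = []  # (offset, 0 = close / 1 = open, nesting key, tie-break, markup)
    for i, r in enumerate(ranges):
        if r.rid in crossing:
            events.append((r.start, 1, 0, i, f'<{r.name} sID="{r.rid}"/>'))
            events.append((r.end, 0, 0, -i, f'<{r.name} eID="{r.rid}"/>'))
        else:
            # Open outer (longer) elements first; close inner (later-starting) ones first.
            events.append((r.start, 1, -r.end, i, f"<{r.name}>"))
            events.append((r.end, 0, -r.start, -i, f"</{r.name}>"))
    events.sort(key=lambda ev: ev[:4])  # closes before opens at the same offset
    out, pos = [], 0
    for offset, _, _, _, markup in events:
        out.append(text[pos:offset])
        out.append(markup)
        pos = offset
    out.append(text[pos:])
    return "".join(out)


text = "Hello world again"
ranges = [Range("l1", "l", 0, 11), Range("q1", "q", 6, 17), Range("w1", "w", 0, 5)]
print(horseify(text, ranges))
# <w><l sID="l1"/>Hello</w> <q sID="q1"/>world<l eID="l1"/> again<q eID="q1"/>
```

Because the markers are empty elements, the output stays well-formed XML even though the line and quotation overlap; validating which markers pair up is the part the abstract assigns to Schematron.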

Just-In-Time-Trees

Segment Trees

Bibliography:

  • J. W. Jaromczyk, et al. "A web interface to image-based concurrent markup using image maps." Proceedings, 6th ACM International Workshop on Web Information and Data Management (WIDM 2004), November 12-13, 2004, Washington, DC.
  • J. W. Jaromczyk, et al. "On Visualization of Complex Image-Based Markup," Proceedings, International Conference on Computer Vision and Geometry. Warsaw, Poland, September 2004.
  • J. W. Jaromczyk and N. Moore, "Geometric data structures for multihierarchical XML tagging of manuscripts," Proceedings of the 20th European Workshop on Computational Geometry, Seville, Spain, March 2004.


Treespaces

From Peter van Hardenberg, University of Victoria (pvh@uvic.ca)

In brief, a treespace document is a document which contains multiple XML documents within it. Each tree, when viewed alone, is a valid XML document, and the syntax includes some mechanism for assigning tags to a particular tree. Tags from different trees may nest (or overlap) however is convenient.

The notion of the treespace is akin to that of the namespace. Just as a single DTD cannot encompass the full range of desired documents (particularly documents containing fragments from multiple sources), a single nesting tree cannot encode all documents. In essence, treespaces are a continuation of the OHCO hypotheses of Allen Renear. He proposes (with OHCO-3) that a document can be decomposed into multiple hierarchies, each one describing a "view" of a document. Unfortunately, this does not provide a conceptual mechanism for dealing with documents that may have overlapping trees combined from various structures.

To resolve this problem, trees must be considered to apply to a span of a document. This, in essence, creates a hybrid spanning/nesting model. Each tree is susceptible to the standard queries of any XML document and can have a DTD or Schema applied to it individually.

More work is necessary to determine useful extensions for determining relationships between trees.

Similarly, a suitably elegant syntax has not yet been developed.

Implementation of a "treespace" document structure is relatively easy, aside from the above caveats. A typical XML parser maintains a stack of "open" tags in order to check well-formedness. Extending an existing parser to test the well-formedness of a treespace document requires maintaining a separate tag context (stack) for each tree. No support for relating trees has been considered at this time; each tree stands alone in the current model.
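Since no treespace syntax has been settled, the following Python sketch assumes the document has already been tokenized into `(kind, tree, name)` triples; that token format and the function name are hypothetical. It shows the one-stack-per-tree well-formedness check described above.

```python
from collections import defaultdict


def treespace_well_formed(tokens) -> bool:
    """Check well-formedness of a tokenized treespace document.

    tokens: iterable of (kind, tree, name) with kind "start" or "end".
    Each tree gets its own stack of open tags, so tags from different
    trees may interleave freely while each tree must still nest properly.
    """
    stacks = defaultdict(list)  # tree name -> stack of open element names
    for kind, tree, name in tokens:
        stack = stacks[tree]
        if kind == "start":
            stack.append(name)
        elif kind == "end":
            if not stack or stack[-1] != name:
                return False  # end tag does not match the innermost open tag of this tree
            stack.pop()
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    # Every tree must have closed all of its elements.
    return all(not stack for stack in stacks.values())


# A sentence (syntax tree) crossing a line boundary (verse tree).
doc = [("start", "verse", "l"), ("start", "syntax", "s"),
       ("end", "verse", "l"), ("start", "verse", "l"),
       ("end", "syntax", "s"), ("end", "verse", "l")]
print(treespace_well_formed(doc))  # True
# The same tags checked against one shared stack, as an ordinary XML parser would.
print(treespace_well_formed([(k, "single", n) for k, _, n in doc]))  # False
```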

Other Approaches to Concurrent Hierarchies

ABSTRACT: The implementation of concurrent markup by Durusau and O'Donnell (Extreme Markup 2001) relies upon related but separate principles. First, markup, commonly described in tree notation, is actually metadata about PCDATA. Second, the membership of any "atom of PCDATA" in a given hierarchy can be recorded as metadata for that PCDATA. These two principles have allowed the authoring and querying of overlapping hierarchies using standard XML software.
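As a toy illustration of the second principle, the Python sketch below stores, for each atom of PCDATA, the labels of its enclosing elements per hierarchy, and reassembles any hierarchy's elements from that metadata. The `Atom` class, the `elements_in` function, and the sample labels are hypothetical; this does not reproduce the authors' actual XML encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """One "atom of PCDATA" plus its membership metadata."""
    text: str
    # hierarchy name -> labels of the enclosing elements, outermost first
    membership: dict = field(default_factory=dict)


def elements_in(atoms, hierarchy):
    """Reassemble the text of each element in one hierarchy from the atoms that
    record membership in it (assumes labels are unique within a hierarchy)."""
    groups = {}
    for atom in atoms:
        for label in atom.membership.get(hierarchy, ()):
            groups.setdefault(label, []).append(atom.text)
    return {label: "".join(parts) for label, parts in groups.items()}


# Verse lines and clauses overlap: clause c2 starts in line l1 and ends in l2.
atoms = [
    Atom("Of Man's first disobedience, ", {"verse": ("lg1", "l1"), "syntax": ("c1",)}),
    Atom("and the fruit ", {"verse": ("lg1", "l1"), "syntax": ("c2",)}),
    Atom("Of that forbidden tree", {"verse": ("lg1", "l2"), "syntax": ("c2",)}),
]
print(elements_in(atoms, "syntax"))
print(elements_in(atoms, "verse")["l1"])
```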

This presentation moves beyond the use of text snippets to illustrate overlapping hierarchies and applies the authors' technique to one of the classics of Western literature, John Milton's Paradise Lost. This research has resulted in the first release of overlapping texts for experimentation on overlapping hierarchies and in a firmer theoretical foundation for current and future research on this topic.


ABSTRACT: XML has a tree-structured data model, which is used to uniformly represent structured as well as semi-structured data, and also enable concise query specification in XQuery, via the use of its XPath (twig) patterns. This in turn can leverage the recently developed technology of structural join algorithms to evaluate the query efficiently. In this paper, we identify a fundamental tension in XML data modeling: (i) data represented as deep trees (which can make effective use of twig patterns) are often un-normalized, leading to update anomalies, while (ii) normalized data tends to be shallow, resulting in heavy use of expensive value-based joins in queries.

Our solution to this data modeling problem is a novel multi-colored trees (MCT) logical data model, which is an evolutionary extension of the XML data model, and permits trees with multi-colored nodes to signify their participation in multiple hierarchies. This adds significant semantic structure to individual data nodes. We extend XQuery expressions to navigate between structurally related nodes, taking color into account, and also to create new colored trees as restructurings of an MCT database. While MCT serves as a significant evolutionary extension to XML as a logical data model, one of the key roles of XML is for information exchange. To enable exchange of MCT information, we develop algorithms for optimally serializing an MCT database as XML. We discuss alternative physical representations for MCT databases, using relations and native XML databases, and describe an implementation on top of the Timber native XML database. Experimental evaluation, using our prototype implementation, shows that not only are MCT queries/updates more succinct and easier to express than equivalent shallow tree XML queries, but they can also be significantly more efficient to evaluate than equivalent deep and shallow tree XML queries/updates.
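The core of the multi-colored idea, one node taking part in several trees, each identified by a color, can be sketched in a few lines of Python. This toy model assumes each node simply stores one parent per color; the `Node` class and the `attach`, `children`, and `ancestors` helpers are hypothetical and do not reproduce the paper's data model, XQuery extensions, or Timber implementation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    # color -> parent node in the tree of that color; a node may belong to several colored trees
    parents: dict = field(default_factory=dict)


def attach(child: Node, parent: Node, color: str) -> None:
    """Make parent the child's parent in the tree of the given color."""
    child.parents[color] = parent


def children(node: Node, nodes, color: str):
    """Names of node's children in one colored tree (a linear scan, fine for a sketch)."""
    return [n.name for n in nodes if n.parents.get(color) is node]


def ancestors(node: Node, color: str):
    """Names of node's ancestors in one colored tree, nearest first."""
    names, current = [], node.parents.get(color)
    while current is not None:
        names.append(current.name)
        current = current.parents.get(color)
    return names


vol, page1, page2 = Node("vol1"), Node("page1"), Node("page2")
stanza, line1, line2 = Node("stanza1"), Node("line1"), Node("line2")
attach(page1, vol, "physical")
attach(page2, vol, "physical")
attach(line1, stanza, "verse")
attach(line2, stanza, "verse")
attach(line1, page1, "physical")
attach(line2, page2, "physical")
nodes = [vol, page1, page2, stanza, line1, line2]

print(children(stanza, nodes, "verse"))  # ['line1', 'line2']
print(ancestors(line2, "physical"))      # ['page2', 'vol1']
```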


ABSTRACT: An approach to the unification of XML (Extensible Markup Language) documents with identical textual content and concurrent markup in the framework of XML-based multi-layer annotation is introduced. A Prolog program allows the possible relationships between element instances on two annotation layers that share PCDATA to be explored and also the computing of a target node hierarchy for a well-formed, merged XML document. Special attention is paid to identity conflicts between element instances, for which a default solution that takes into account metarelations that hold between element types on the different annotation layers is provided. In addition, rules can be specified by a user to prescribe how identity conflicts should be solved for certain element types.
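The described program is written in Prolog; as a rough stand-in, the Python sketch below classifies the relationship between element instances on two layers that share PCDATA, given their character spans, and flags the pairs that need attention before a merge: overlaps, which block a straightforward well-formed merge, and identity conflicts, which need a rule deciding which element goes outside. All names and sample spans are hypothetical, and no conflict-resolution rules are implemented.

```python
def relation(a, b):
    """Classify how two spans (start, end) over the same PCDATA relate; end is exclusive."""
    (s1, e1), (s2, e2) = a, b
    if (s1, e1) == (s2, e2):
        return "identity"      # same extent: needs a rule for which element goes outside
    if s1 <= s2 and e2 <= e1:
        return "includes"
    if s2 <= s1 and e1 <= e2:
        return "included_in"
    if s1 < e2 and s2 < e1:
        return "overlap"       # blocks a straightforward well-formed merge
    return "disjoint"


def relation_table(layer_a, layer_b):
    """Relation between every pair of element instances on two layers (name -> span)."""
    return {(na, nb): relation(sa, sb)
            for na, sa in layer_a.items() for nb, sb in layer_b.items()}


def merge_problems(layer_a, layer_b):
    """Pairs that need attention before the layers can be merged into one XML tree."""
    return {pair: rel for pair, rel in relation_table(layer_a, layer_b).items()
            if rel in ("identity", "overlap")}


paragraphs = {"div": (0, 40), "p1": (0, 25), "p2": (25, 40)}
sentences = {"body": (0, 40), "s1": (0, 20), "s2": (20, 40)}
print(merge_problems(paragraphs, sentences))
# {('div', 'body'): 'identity', ('p1', 's2'): 'overlap'}
```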

Other Approaches to Overlapping XML Markup (not concurrent hierarchies)

Non-XML Approaches

Layered Markup Annotation Language

LMNL, pronounced liminal: "an experimental approach to digital text encoding that supports, in SGML/XML terms, overlapping elements (ranges in LMNL) and structured attributes (annotations in LMNL)."
Project website includes a tutorial and much other informative material: http://www.lmnl.net/index.html
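To show what "ranges" means in practice, here is a Python sketch that parses a tiny subset of LMNL's "sawtooth" syntax, assuming start tags of the form `[name}` and end tags of the form `{name]`, into plain text plus a list of possibly overlapping ranges. Annotations and the rest of the syntax are not handled; consult the project site for the full notation. The function name is hypothetical.

```python
import re

# Start tag [name} and end tag {name] (basic sawtooth form only).
TAG = re.compile(r"\[(\w+)\}|\{(\w+)\]")


def parse_lmnl(source: str):
    """Parse a tiny subset of LMNL into (text, ranges).

    Each range is (name, start, end) with character offsets into text, end exclusive.
    An end tag with no matching start tag raises KeyError.
    """
    text, ranges, open_at = [], [], {}
    offset = pos = 0
    for m in TAG.finditer(source):
        chunk = source[pos:m.start()]
        text.append(chunk)
        offset += len(chunk)
        pos = m.end()
        if m.group(1) is not None:
            open_at[m.group(1)] = offset       # remember where this range starts
        else:
            name = m.group(2)
            ranges.append((name, open_at.pop(name), offset))
    text.append(source[pos:])
    return "".join(text), ranges


print(parse_lmnl("[s}ab [q}cd{s] ef{q]"))
# ('ab cd ef', [('s', 0, 5), ('q', 3, 8)])
```

The ranges `s` and `q` overlap without either containing the other, which in XML terms would be a well-formedness error but in LMNL is ordinary markup.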

TexMECS

C. Huitfeldt and C. M. Sperberg-McQueen, "TexMECS: An experimental markup meta-language for complex documents", 25 January 2001, rev. 17 February 2001 (http://helmer.aksis.uib.no/claus/mlcd/papers/texmecs.html)


CoNLL-2005 Shared Task format

Shared Task Chairs: Xavier Carreras and Lluís Màrquez http://www.lsi.upc.edu/~srlconll/examples.html


Other Non-XML Approaches

Bibliography:

  • M. Hilbert et al. "Making CONCUR work," Presentation at Extreme Markup 2005 (Link to be added following the conference)

ABSTRACT:

The SGML feature CONCUR allowed for a document to be simultaneously marked up in multiple conflicting hierarchical tagsets but validated and interpreted in one tagset at a time. Alas, CONCUR was rarely implemented, and XML does not address the problem of conflicting hierarchies at all. The MuLaX document syntax is a non-XML syntax that enables multiply-encoded hierarchies by distinguishing different “layers” in the hierarchy by adding a layer ID as a prefix to the element names. The IDs tie all the elements in a single hierarchy together in an “annotation layer”. Extraction of a single annotation layer results in a well-formed XML document, and each annotation layer may be associated with an XML schema. The MuLaX processing model developed works on the nodes of one annotation layer at a time. Furthermore, an alternative processing model is proposed which uses a multi-rooted trees approach. CONCUR lives!
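Extracting one annotation layer can be sketched in Python with a regular expression. The exact MuLaX syntax is not given here, so this sketch assumes a hypothetical form in which the layer ID and element name are joined by a colon (`<v:l>`, `</v:l>`); attributes are not handled. Keeping one layer's tags and dropping all others yields well-formed XML as long as each layer nests properly on its own.

```python
import re

# Hypothetical MuLaX-like tags: <layer:name> and </layer:name> (no attributes).
TAG = re.compile(r"<(/?)(\w+):(\w+)>")


def extract_layer(source: str, layer: str) -> str:
    """Keep the tags of one annotation layer (without the prefix); drop all others."""
    def keep_or_drop(m):
        slash, tag_layer, name = m.groups()
        return f"<{slash}{name}>" if tag_layer == layer else ""
    return TAG.sub(keep_or_drop, source)


# Layer v (verse) and layer s (syntax) overlap across the line boundary.
doc = ("<s:text><v:lg>"
       "<v:l>Of that <s:s>forbidden tree </v:l>"
       "<v:l>whose mortal taste</s:s></v:l>"
       "</v:lg></s:text>")
print(extract_layer(doc, "v"))  # <lg><l>Of that forbidden tree </l><l>whose mortal taste</l></lg>
print(extract_layer(doc, "s"))  # <text>Of that <s>forbidden tree whose mortal taste</s></text>
```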

General Bibliography

  • S. J. DeRose et al. (1990), 'What is Text, Really?', Journal of Computing in Higher Education, 1.2: 3-26.

Full-text available through ACM Portal (subscription only): http://portal.acm.org/citation.cfm?doid=264842.264843

Abstract: The way in which text is represented on a computer affects the kinds of uses to which it can be put by its creator and by subsequent users. The electronic document model currently in use is impoverished and restrictive. The authors argue that text is best represented as an ordered hierarchy of content objects (OHCO), because that is what text really is. This model conforms with emerging standards such as SGML and contains within it advantages for the writer, publisher, and researcher. The authors then describe how the hierarchical model can allow future use and reuse of the document as a database, hypertext, or network.


Abstract: We examine the claim that 'text is an ordered hierarchy of content objects'; this thesis was affirmed by the authors, and others, in the late 1980s and has been associated with certain approaches to text processing and the encoding of literary texts. First we discuss the nature of this claim and its connection with the history of text processing and text encoding standardization projects such as SGML and the Text Encoding Initiative. We then describe how the experience of the text encoding community, as represented and codified in the TEI Guidelines, has raised difficulties for this thesis. Next we consider two progressively weaker versions of this thesis formulated in response to these difficulties. Ultimately we find that no version appears to be free from counterexample.

Although none of these formulations proves to be theoretically sound, they are nonetheless methodologically illuminating as each generalizes actual encoding practices, making explicit certain assumptions that, even though they have been fundamental to the working methodologies of most text encoding projects, have never been explicitly articulated, let alone explained or defended. The counterexamples to the different versions of the OHCO thesis also arise in actual encoding projects -- so although our focus is theoretical it is grounded in the methodology and problems of contemporary encoding practices. The problems discussed here have implications not only for text encoding and our understanding of the nature of textual communication, but raise very fundamental issues in the logic and methodology of the humanities.


  • D. Barnard et al. (1988) 'SGML-Based Markup for Literary Texts: Two Problems and Some Solutions', Computers and the Humanities 22: 265-276.


  • David Barnard, Lou Burnard, Jean-Pierre Gaspart, Lynne A. Price, Michael Sperberg-McQueen, and Giovanni Battista Varile. "Hierarchical Encoding of Text: Technical Problems and SGML Solutions." In The Text Encoding Initiative: Background and Contents. Guest Editors: Nancy Ide and Jean Veronis. Computers and the Humanities 29/3 (1995), pages 211-231. (http://xml.coverpages.org/bib-ab.html#barnardHierarchicalCHUM)