Difference between revisions of "LegacyFacsimileMarkup"
Dot.porter (talk | contribs) (→3: Techniques for Referencing Images) |
Revision as of 16:49, 1 June 2006
This page includes some examples of different approaches for facsimile (image-based) markup.
Draft Recommendations for TEI Digital Facsimiles
http://users.ox.ac.uk/~lou/wip/digfax.html
This document, authored by Richard Gartner and Lou Burnard and last updated in 2001, outlines ways to use TEI to represent three types of digital objects:
- complete digital transcriptions of the content of manuscript or print originals (possibly including illustrations as well as text)
- collections of digitized page images (digital facsimiles) intended for use as surrogates for complete manuscript or print originals
- digital objects (digital editions) combining both page facsimiles and transcriptions, possibly also including layers of editorial annotation
The authors attempt to address these areas of concern:
- the need to distinguish images of a manuscript or print source from images located within it;
- the need to support multiple formats of a single image;
- the need to associate metadata at different levels (e.g. collection level, item level);
- the need to associate transcription and facsimile in a standard way;
- the need to define practices which can be used equally well in both SGML and XML environments;
- the desire to avoid special purpose rules which assume nonstandard or ad hoc processing rules.
Case 1: Transcription
Updated to cite chapters in P5:
The content of a transcription should be marked up as a single <TEI> element using the standard TEI elements <text>, <body>, <div>, etc. from the TEI core tag sets.
Follow the Guidelines chapters 18 Transcription of Primary Sources and 13 Manuscript Description.
If the source contains "significant illustrative material", use <figure> (and children) from Guidelines chapter 22 Tables, Formulae, and Graphics to insert illustrations in their proper location in the transcript.
<lb n="12"/>feond mid his geferum eallum. Feollon þa ufon
<lb n="13"/>of heofnum þurhlonge swa þreo niht and da
<lb n="14"/>gas, þa englas of heofnum on helle, and heo ealle
<lb n="15"/>forsceop drihten to deoflum.
<figure>
  <head>Fall of the Angels</head>
  <figDesc>Above, God, cross-nimbed, beardless, and holding a closed book, accompanied by three angels, turns and gestures toward three angels on the left, one of whom (Lucifer?) holds a palm frond ...</figDesc>
</figure>
<pb n="17"/>
Case 2: Digital Facsimile
A digital facsimile should be marked up as a separate <TEI> element. TEI structural tags may be used, but need not be, especially if differences between textual structure and physical structure would cause overlap.
Each distinct image making up the facsimile should be encoded as a <figure> element, arranged in the normal reading sequence of the facsimile. The appropriate milestone element (<pb/> or <cb/>) should be used at the appropriate place in the facsimile.
<pb n="16"/>
<figure n="16">
  <figDesc>[Page 16: 15 lines of text followed by image of "Fall of Angels" (Ohlgren 16.11)]</figDesc>
</figure>
<pb n="17"/>
<figure n="17">
  <figDesc>[Page 17: 15 lines of text followed by image of "Fall of Angels" (Ohlgren 16.12)]</figDesc>
</figure>
<figure> elements may be self-nested to show that one image logically contains others (for example, where two images fit together to form one larger image).
<figure n="16">
  <figDesc>
    <figure>
      <figDesc>[Page 16: 15 lines of text (detail of upper part)]</figDesc>
    </figure>
    <figure>
      <figDesc>[Page 16: Image of "Fall of Angels" (Ohlgren 16.11) (detail of lower part)]</figDesc>
    </figure>
  </figDesc>
</figure>
3: Techniques for Referencing Images
The document recommends declaring the file containing the image as an external entity and then referencing that entity using the "entity" attribute on <figure>. This approach depends on the use of a DTD (no longer practical given the TEI's move towards RELAX NG schemas) and relies on an attribute that is no longer included in P5.
In TEI P5 there are also two new elements, <graphic> and <binaryObject>, which may be used to represent image references or images respectively. These permit the encoding of figures containing multiple images, for example. The <graphic> element has a "url" attribute which associates a <figure> with its corresponding image file. The recommendations in Gartner and Burnard need to be revised substantially in this respect.
<pb n="16"/>
<figure n="16">
  <figDesc>[Page 16: 15 lines of text followed by image of "Fall of Angels" (Ohlgren 16.11)]</figDesc>
  <graphic url="http://image.ox.ac.uk/images/bodleian/msjunius11/16.jpg"/>
</figure>
<pb n="17"/>
<figure n="17">
  <figDesc>[Page 17: 15 lines of text followed by image of "Fall of Angels" (Ohlgren 16.12)]</figDesc>
  <graphic url="http://image.ox.ac.uk/images/bodleian/msjunius11/17.jpg"/>
</figure>
This approach is simple, though it is not practical if one has multiple images of the same page in various resolutions.
<figure>
  <head>Figure One: The View from the Bridge</head>
  <figDesc>A Whistleresque view showing four or five sailing boats in the foreground, and a series of buoys strung out between them.</figDesc>
  <graphic url="http://www.somewhere.eu/fig1.png" scale="0.5"/>
</figure>
Other options mentioned in the recommendation: XLink or XInclude.
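As a rough illustration of how the <figure>/<graphic> pattern above could be consumed, the following Python sketch reads each figure's n value and collects the URLs of its <graphic> children. It is only a sketch under stated assumptions: the wrapping <div>, the shortened figDesc text, and the function name page_images are made up for this example, and the markup is parsed without a namespace (as in the excerpts on this page), whereas a complete P5 document would normally declare the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Sample markup; the outer <div> is a hypothetical wrapper so the
# fragment parses as a single document.
TEI_SAMPLE = """
<div>
  <pb n="16"/>
  <figure n="16">
    <figDesc>[Page 16]</figDesc>
    <graphic url="http://image.ox.ac.uk/images/bodleian/msjunius11/16.jpg"/>
  </figure>
  <pb n="17"/>
  <figure n="17">
    <figDesc>[Page 17]</figDesc>
    <graphic url="http://image.ox.ac.uk/images/bodleian/msjunius11/17.jpg"/>
  </figure>
</div>
"""


def page_images(xml_text):
    """Map each figure's n value to the list of its <graphic> URLs."""
    root = ET.fromstring(xml_text)
    pages = {}
    for figure in root.iter("figure"):
        # Only direct <graphic> children are collected for each figure.
        pages[figure.get("n")] = [g.get("url") for g in figure.findall("graphic")]
    return pages


pages = page_images(TEI_SAMPLE)
for number, urls in pages.items():
    print(number, urls)
```

Because each page maps to a list, a figure holding several <graphic> elements would simply yield several URLs, but nothing in the list says which resolution each URL represents.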
4: Aligning transcription and facsimile
P5 methods for aligning TEI documents (see Chapter 14 Linking, Segmentation, and Alignment).
Method used by the Edition Production Technology (EPT): Link text and images
The EPT (now EPPT: http://beowulf.engl.uky.edu/~eft/EPPT-Demo.html) combines an Image Catalog with a special attribute for coordinates to enable linking between text and image. For more details, see here: http://beowulf.engl.uky.edu/~kiernan/eBoethius/tech.htm#tech.
Image Catalog
The Image Catalog is a simple XML file that consists of lists of image files organized by category. There is one category for every set of image files for a given source, for example:
- daylight (images taken under regular lighting)
- ultraviolet (images taken under ultraviolet lighting)
- microfilm (images scanned from microfilm)
Categories are assigned unique IDs.
Within each category, the user associates each file with an ID that is unique within the category. The same IDs repeat across categories, so files that share an ID represent the same object (for example, the same folio) captured in different ways.
<category name="daylight" id="d">
  <file name="aih0244-JPG.jpg" id="5v" />
  <file name="aih0245-JPG.jpg" id="6r" />
  <file name="aih0242-JPG.jpg" id="6v" />
</category>
<category name="ultraviolet" id="u">
  <file name="uv-5v.jpg" id="5v" />
  <file name="uv-6r.jpg" id="6r" />
  <file name="uv-6v.jpg" id="6v" />
</category>
<category name="microfilm" id="m">
  <file name="mf-5v.jpg" id="5v" />
  <file name="mf-6r.jpg" id="6r" />
  <file name="mf-6v.jpg" id="6v" />
</category>
The file IDs from the Image Catalog are then associated with an attribute within the text markup itself. The attribute used is configured by the user; for the Electronic Boethius, we used the attribute n in the element <fol> (folio). The content for each folio is thus associated with the correct image files (from every category):
<fol n="5v">
points to
<daylight><file id="5v"/>
and
<ultraviolet><file id="5v"/>
and
<microfilm><file id="5v"/>
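The lookup described above can be sketched in a few lines of Python. This is not the EPT's own code, only an illustration of the idea: given a folio ID taken from the text markup, collect the matching file from every category. The <catalog> root element and the function name images_for_folio are assumptions made for this example; the category and file markup follows the excerpt above.

```python
import xml.etree.ElementTree as ET

# Image Catalog excerpt; the <catalog> root is a hypothetical wrapper.
CATALOG = """
<catalog>
  <category name="daylight" id="d">
    <file name="aih0244-JPG.jpg" id="5v"/>
    <file name="aih0245-JPG.jpg" id="6r"/>
  </category>
  <category name="ultraviolet" id="u">
    <file name="uv-5v.jpg" id="5v"/>
    <file name="uv-6r.jpg" id="6r"/>
  </category>
  <category name="microfilm" id="m">
    <file name="mf-5v.jpg" id="5v"/>
    <file name="mf-6r.jpg" id="6r"/>
  </category>
</catalog>
"""


def images_for_folio(catalog_xml, folio_id):
    """Return {category name: file name} for every category that holds folio_id."""
    root = ET.fromstring(catalog_xml)
    found = {}
    for category in root.iter("category"):
        for file_el in category.findall("file"):
            if file_el.get("id") == folio_id:
                found[category.get("name")] = file_el.get("name")
    return found


# The value "5v" would come from <fol n="5v"> in the text markup.
folio_images = images_for_folio(CATALOG, "5v")
print(folio_images)
```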
Attribute for Coordinates
To store image coordinates that correspond to a given piece of XML content, the EPT lets the user designate an attribute for the purpose (the Electronic Boethius uses coords). Wherever that attribute is defined on an element, the software automatically enters the coordinates selected by the editor. The ID of the relevant Image Catalog category is prefixed to the coordinates.
<damage coords="d:1058,738,1093,798">
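To make the format of the coords value concrete, here is a small Python sketch that splits it into the category ID and the numeric values. It assumes, based only on the single example above, that the value is a category ID, a colon, and four comma-separated integers; the names Region and parse_coords are invented for this illustration, and the sketch does not interpret what each of the four numbers means.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    """A parsed coords value: Image Catalog category ID plus the raw numbers."""
    category_id: str
    values: Tuple[int, ...]


def parse_coords(value):
    """Split a value such as 'd:1058,738,1093,798' into its two parts."""
    category_id, separator, numbers = value.partition(":")
    if not separator:
        raise ValueError("expected a category ID followed by ':'")
    values = tuple(int(n) for n in numbers.split(","))
    if len(values) != 4:
        # Assumption: four numbers, as in the example above.
        raise ValueError("expected four comma-separated integers")
    return Region(category_id, values)


region = parse_coords("d:1058,738,1093,798")
print(region.category_id, region.values)
```

The category ID ("d" for daylight in the catalog excerpt) tells a processor which set of images the coordinates were measured on.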
Method used by the UVic Image Markup Tool: Annotate images
The UVic Image Markup Tool (http://www.tapor.uvic.ca/~mholmes/image_markup/) uses TEI, modified to allow for the inclusion of SVG, to create a system for annotating images.
Simplified example from the IMT website:
<TEI>
  <teiHeader>
    <!-- ...header content... -->
  </teiHeader>
  <text>
    <body>
      <!-- ...TEI content... -->
      <div type="imtAnnotatedImage">
        <svg>
          <!-- ...svg content... -->
          <image>
            <!-- Image file is linked in here -->
          </image>
          <rect>
            <!-- Annotation area on the image defined here -->
          </rect>
        </svg>
        <div type="imtAnnotationLayer">
          <!-- Annotation content goes here -->
        </div>
      </div>
      <!-- ...TEI content... -->
    </body>
  </text>
</TEI>
Using this system, a user can annotate many sections of a single image. Unlike the EPT system described above, in which the coordinates are stored directly in the element, here the image coordinates are stored in a separate section. Unique IDs link each area (svg:id in <rect>) to its annotation content (n in <div type="imtAnnotationLayer">).
<div type="imtAnnotatedImage" xml:id="NB-B-17328">
  <svg xmlns="http://www.w3.org/2000/svg">
    <image xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="NB-B-17328.jpg" width="5060" height="6025"/>
    <title>NB-B-17328.jpg</title>
    <desc>5060 x 6025</desc>
    <rect svg:id="NB-B-17328_0" x="4287" y="1412" width="580" height="426" style="3" color="#ff0000"/>
    <rect svg:id="NB-B-17328_1" x="4067" y="2019" width="720" height="747" style="3" color="#ff0000"/>
  </svg>
  <div type="imtAnnotationLayer" n="NB-B-17328_0">
    <head>la chandelle allume la découverte.</head>
    <p>Il a trouvé la cache.</p>
  </div>
  <div type="imtAnnotationLayer" n="NB-B-17328_1">
    <head>le couvercle de la malle</head>
    <p>Il est pris sans vert</p>
  </div>
</div>
(TEI examples from the UVic file example: http://www.tapor.uvic.ca/~mholmes/image_markup/Amant.xml)
For more details on the TEI/SVG used by the IMT, see here: http://www.tapor.uvic.ca/~mholmes/image_markup/xml.php
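The ID-based join between areas and annotations can be sketched in Python as follows. This is an illustration only, not how the Image Markup Tool itself works. To keep the sample parseable without extra namespace declarations, it makes one loud simplification: the rectangles carry a plain id attribute instead of the svg:id shown above. The function name annotated_areas is likewise invented for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Simplified from the excerpt above. Assumption: plain "id" instead of "svg:id".
IMT_SAMPLE = """
<div type="imtAnnotatedImage">
  <svg xmlns="http://www.w3.org/2000/svg">
    <rect id="NB-B-17328_0" x="4287" y="1412" width="580" height="426"/>
    <rect id="NB-B-17328_1" x="4067" y="2019" width="720" height="747"/>
  </svg>
  <div type="imtAnnotationLayer" n="NB-B-17328_0">
    <head>la chandelle allume la découverte.</head>
  </div>
  <div type="imtAnnotationLayer" n="NB-B-17328_1">
    <head>le couvercle de la malle</head>
  </div>
</div>
"""


def annotated_areas(xml_text):
    """Map each annotation heading to the (x, y, width, height) of its area."""
    root = ET.fromstring(xml_text)
    # First index every rectangle by its ID.
    rects = {}
    for rect in root.iter(SVG_NS + "rect"):
        rects[rect.get("id")] = tuple(
            int(rect.get(key)) for key in ("x", "y", "width", "height")
        )
    # Then follow n on each annotation layer back to the rectangle.
    areas = {}
    for layer in root.iter("div"):
        if layer.get("type") == "imtAnnotationLayer":
            areas[layer.findtext("head")] = rects[layer.get("n")]
    return areas


areas = annotated_areas(IMT_SAMPLE)
for heading, box in areas.items():
    print(heading, box)
```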
Using METS to link text and image
What is METS?
- Metadata Encoding and Transmission Standard
- “a standard for encoding … metadata regarding objects within a digital library”
- Centered at the Library of Congress
- Developed out of the metadata specs from the LOC’s “Making of America 2” project (MOA2)
- Developed by librarians and archivists, for librarians and archivists
- Accepted standard with a broad user base (http://sunsite.berkeley.edu/mets/registry/)
- Unrelated to TEI
“While a library may record descriptive metadata regarding a book in its collection, the book will not dissolve into a series of unconnected pages if the library fails to record structural metadata regarding the book's organization, nor will scholars be unable to evaluate the book's worth if the library fails to note that the book was produced using a Ryobi offset press.” (METS: An Overview and Tutorial)
Why METS?
- System to organize disparate parts and relate files (and pieces of files):
- Areas on images correspond with sections of text
- Areas on images relate to one another
- Areas of text relate to one another
- METS provides sections for defining the logical and/or physical structure of the digital object
- File Section (fileSec)
- Structural Map (structMap)
File Section
“lists all files containing content which comprise the electronic versions of the digital object.”
- Similar to the EPT Image Catalog
Within the FileSec
- FileSec consists of Groups of File Locators
- File Group: contains related files
<fileGrp USE="facsimile">
- File: assigns the individual files unique identifiers
<file ID="C-Clbiv-69v" MIMETYPE="image/tiff">
- File Locator: points to the location of the file, using XLink syntax
<FLocat LOCTYPE="URN" xlink:href="images/C-Clbiv-69v.tif"/>
Example (Images)
<fileGrp USE="facsimile">
  <file ID="C-Clbiv-69v" MIMETYPE="image/tiff">
    <FLocat LOCTYPE="URN" xlink:href="images/C-Clbiv-69v.tif"/>
  </file>
  <file ID="C-Clbiv-70r" MIMETYPE="image/tiff">
    <FLocat LOCTYPE="URN" xlink:href="images/C-Clbiv-70r.tif"/>
  </file>
  …
</fileGrp>
Example (Transcript)
<fileGrp USE="transcript">
  <file ID="id-pref-genesis" MIMETYPE="text/xml">
    <FLocat LOCTYPE="URN" xlink:href="transcription/pref-genesis.xml"/>
  </file>
  <file ID="id-genesis" MIMETYPE="text/xml">
    <FLocat LOCTYPE="URN" xlink:href="transcription/genesis.xml"/>
  </file>
  <file ID="id-exodus" MIMETYPE="text/xml">
    <FLocat LOCTYPE="URN" xlink:href="transcription/exodus.xml"/>
  </file>
  …
</fileGrp>
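The role of the file section as an index can be illustrated with a short Python sketch that builds a lookup table from file ID to location. It is a sketch under assumptions: the excerpts are wrapped in a <fileSec> element that declares only the xlink prefix, and no METS namespace is used (matching the excerpts above, although a complete METS file would normally declare one). The function name index_files is invented for this example.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Abbreviated from the two examples above; the <fileSec> wrapper and its
# namespace declaration are added so the fragment parses on its own.
FILESEC = """
<fileSec xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileGrp USE="facsimile">
    <file ID="C-Clbiv-69v" MIMETYPE="image/tiff">
      <FLocat LOCTYPE="URN" xlink:href="images/C-Clbiv-69v.tif"/>
    </file>
  </fileGrp>
  <fileGrp USE="transcript">
    <file ID="id-genesis" MIMETYPE="text/xml">
      <FLocat LOCTYPE="URN" xlink:href="transcription/genesis.xml"/>
    </file>
  </fileGrp>
</fileSec>
"""


def index_files(filesec_xml):
    """Map each file ID to its group's USE, its MIME type, and its location."""
    root = ET.fromstring(filesec_xml)
    index = {}
    for group in root.iter("fileGrp"):
        for file_el in group.findall("file"):
            index[file_el.get("ID")] = {
                "use": group.get("USE"),
                "mimetype": file_el.get("MIMETYPE"),
                "href": file_el.find("FLocat").get(XLINK_HREF),
            }
    return index


file_index = index_files(FILESEC)
print(file_index["id-genesis"]["href"])
```

A table like this is what lets a FILEID elsewhere in the METS document be turned back into an actual file.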
Structural Map
“outlines a hierarchical structure for the digital library object, and links the elements of that structure to content files … that pertain to each element.”
- Organization may be logical or physical
- A single METS file may contain multiple structural maps
Within the StructMap
- StructMap consists of organizational divisions that may nest
<div LABEL="Genesis" ORDER="2">
  <div LABEL="Folio 1v" ORDER="2">
    ...
  </div>
</div>
- The divisions contain pointers to the files, and areas of files, indexed in the fileSec
<div LABEL="Genesis" ORDER="2">
  <div LABEL="Folio 1v" ORDER="2">
    <fptr FILEID="C-Clbiv-1v"/>
    <fptr>
      <area FILEID="id-genesis" BEGIN="1v.32" END="1v.38"/>
    </fptr>
  </div>
</div>
Example (Basic)
The first file pointer references the corresponding image file, while the second one references the corresponding line range (BEGIN and END) in the XML file (these are the values of, for example, @xml:id on <lb/>).
<div LABEL="Folio 69v" ORDER="139">
  <fptr FILEID="C-Clbiv-69v"/>
  <fptr>
    <area FILEID="id-genesis" BEGIN="69v.1" END="69v.12"/>
  </fptr>
</div>
<div LABEL="Folio 70r" ORDER="140">
  <fptr FILEID="C-Clbiv-70r"/>
  <fptr>
    <area FILEID="id-genesis" BEGIN="70r.1" END="70r.38"/>
  </fptr>
</div>
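To show how a processor might read the basic example, the Python sketch below walks the divisions and pairs each folio's image file ID with its text range. It assumes a hypothetical <structMap> wrapper with no namespace, and it assumes (as in the example) that a bare <fptr> points at the image while an <fptr> containing <area> points into the transcript. The function name folio_links is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The basic example above, inside a hypothetical <structMap> wrapper.
STRUCTMAP = """
<structMap>
  <div LABEL="Folio 69v" ORDER="139">
    <fptr FILEID="C-Clbiv-69v"/>
    <fptr>
      <area FILEID="id-genesis" BEGIN="69v.1" END="69v.12"/>
    </fptr>
  </div>
  <div LABEL="Folio 70r" ORDER="140">
    <fptr FILEID="C-Clbiv-70r"/>
    <fptr>
      <area FILEID="id-genesis" BEGIN="70r.1" END="70r.38"/>
    </fptr>
  </div>
</structMap>
"""


def folio_links(structmap_xml):
    """Return (label, image file ID, (text file ID, begin, end)) per division."""
    root = ET.fromstring(structmap_xml)
    links = []
    for div in root.iter("div"):
        image_id = None
        text_range = None
        for fptr in div.findall("fptr"):
            area = fptr.find("area")
            if area is None:
                # A bare file pointer: the whole image file.
                image_id = fptr.get("FILEID")
            else:
                # A pointer to a range inside the transcript file.
                text_range = (area.get("FILEID"), area.get("BEGIN"), area.get("END"))
        links.append((div.get("LABEL"), image_id, text_range))
    return links


links = folio_links(STRUCTMAP)
for link in links:
    print(link)
```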
Example (Advanced)
- <area>: reference coordinates within an image file.
- especially useful for extensively illustrated manuscripts
- create links between text and image.
- @COORDS in <area> for image files
- @BEGIN and @END in <area> for XML/TEI files
- Simplifies encoding:
- no need to define @COORDS to clutter up the TEI;
- no need to find a place to store a linking section in the TEI file
- no need to use the SVG namespace
- separate the objects (image and XML files) from the indexing of their relationships with one another
<div LABEL="Folio 69v" ORDER="139">
  <div LABEL="Annotation 1">
    <fptr>
      <area FILEID="C-Clbiv-69v" COORDS="40,12,975,121"/>
    </fptr>
    <fptr>
      <area FILEID="id-genesis" BEGIN="69v.annot.1"/>
    </fptr>
  </div>
  <div LABEL="Illustration 1">
    <fptr>
      <area FILEID="C-Clbiv-69v" COORDS="96,87,979,572"/>
    </fptr>
    <fptr>
      <area FILEID="id-genesis" BEGIN="69v.illus.1"/>
    </fptr>
  </div>
  <div LABEL="Text">
    <fptr>
      <area FILEID="C-Clbiv-69v" COORDS="71,531,978,859"/>
    </fptr>
    <fptr>
      <area FILEID="id-genesis" BEGIN="69v.1" END="69v.9"/>
    </fptr>
  </div>
</div>
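The advanced pattern, in which each division pairs an image region with a piece of the transcript, might be read as in the following Python sketch. It is illustrative only and covers two of the divisions from the example. It assumes that an <area> carrying COORDS refers to the image and that one without COORDS refers to the text; the function name region_links is invented here.

```python
import xml.etree.ElementTree as ET

# Two divisions taken from the advanced example above.
ADVANCED = """
<div LABEL="Folio 69v" ORDER="139">
  <div LABEL="Annotation 1">
    <fptr><area FILEID="C-Clbiv-69v" COORDS="40,12,975,121"/></fptr>
    <fptr><area FILEID="id-genesis" BEGIN="69v.annot.1"/></fptr>
  </div>
  <div LABEL="Text">
    <fptr><area FILEID="C-Clbiv-69v" COORDS="71,531,978,859"/></fptr>
    <fptr><area FILEID="id-genesis" BEGIN="69v.1" END="69v.9"/></fptr>
  </div>
</div>
"""


def region_links(xml_text):
    """Map each sub-division label to its image region and its text range."""
    root = ET.fromstring(xml_text)
    links = {}
    for div in root.findall("div"):
        entry = {}
        for area in div.iter("area"):
            coords = area.get("COORDS")
            if coords is not None:
                # Image side: file ID plus the coordinates as integers.
                numbers = tuple(int(n) for n in coords.split(","))
                entry["image"] = (area.get("FILEID"), numbers)
            else:
                # Text side: END is None when only BEGIN is given.
                entry["text"] = (area.get("FILEID"), area.get("BEGIN"), area.get("END"))
        links[div.get("LABEL")] = entry
    return links


regions = region_links(ADVANCED)
for label, entry in regions.items():
    print(label, entry)
```

Since all of the linking lives in the METS file, neither the image nor the TEI transcript needs to change when a new region is added.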
Conclusions
- Draft Recommendations for TEI Digital Facsimiles
- Not robust
- No specific image/text linking
- Need to be updated for TEI P5
- Edition Production Technology (EPT)
- Designed for textual annotation, but can annotate illustrations as well
- Markup schema is not defined
- Complex
- Dependent on editing software
- Limited image file support
- No visualization outside of the editing environment
- UVic Image Markup Tool
- Simple
- Annotates via description – no “text encoding” (i.e., cannot include in an electronic transcription)
- Based on accepted standards
- Markup schema is defined (extended TEI + SVG)
- Visualization through SVG-enabled browser
- METS
- Very complex
- Objects separate from indexes
- Flexible – can link many textual and image sections (Venetus A – link main text and several layers of annotations)
- No visualization (yet) for the more advanced image-text links