User talk:Martin de la Iglesia/A Guide to Images in TEI

From TEIWiki

General

Suggested use of this page

The idea is to start by filling in the different chapters in Outline below. In the beginning, I think it makes sense to do this in a Discussion page, in order to avoid switching back and forth between the Article and the Discussion page all the time. After some time we can move the text from the Discussion page to my namespace, then after some more consolidation to the main wiki. Finally, we could move it to an entirely different website if we found the TEI Wiki unsuitable. However, feel free to suggest other procedures, or maybe an entirely different environment for working on this resource altogether. --Martin de la Iglesia 06:47, 2 December 2011 (EST)

Title

The title of this page (currently: "A Guide to Images in TEI") is just a working title. Feel free to suggest alternative titles. --Martin de la Iglesia 06:47, 2 December 2011 (EST)

Related resources

We should think about how this resource relates to TEI to SVG (and maybe other relevant pages/resources), and whether and how this information should be integrated here. --Martin de la Iglesia 06:47, 2 December 2011 (EST)

Is there a need for a TEI Cheatsheet for images? I imagine such a cheatsheet as a more concise and practical version of this Guide. --Martin de la Iglesia 07:59, 5 August 2013 (EDT)

Outline

Aims and Scope

We should have a text here which explains what this resource is for and how it differs from the TEI Guidelines. In the meantime, I paste my announcement e-mail to the SIG Text & Graphics mailing list from December 1, 2011:

[...]
I learned in Würzburg that some other SIGs are planning to create specialized tutorials that cover aspects relevant to the respective SIG's scope which are not already covered by general introductory TEI resources like TEI By Example. This sounds like a good idea for our SIG, too. I'm aware that, basically, the solutions are all in the TEI Guidelines somewhere, but as we all know, it can be quite tough to make encoding decisions by only consulting the Guidelines. This is especially true for the encoding of graphics-related information, which, as far as I have seen, is not well covered by TEI By Example.
In addition to the examples given by John, I'd like this new FAQ/Tutorial to tackle basic questions such as
- how to encode the position of images on a page
- how to link images to text
- how to align images with portions of text or vice versa
- how to encode descriptions of images or other image metadata
etc.
So what I'm envisioning is a sort of beginner's guide to handling images in TEI, either as a FAQ (i.e. self-contained chapters) or a step-by-step Tutorial (i.e. "what to do when you encounter an image in your text"). As John already mentioned, integration into TEI By Example might also be a possibility (although I haven't discussed this idea with the TEI By Example people yet).
[...]

--Martin de la Iglesia 04:11, 5 December 2011 (EST)

Basic Integration of Images/Linking to Images

John (Walsh) suggested covering the <figure> tag. Additionally, it might be useful to have a general discussion on image types (binary vs. SVG vs. characters, i.e. trying to find matching Unicode characters for special symbols instead of integrating them as images) and the different encoding practices they imply. --Martin de la Iglesia 04:18, 5 December 2011 (EST)

Examples

Here's an example of different encodings for the same image. The image I'd like to represent is the wedge-shaped marking that indicates that the word "war" is to be inserted into the line below.

Fontane-30633-detail.jpg

  • binary: the marking is extracted from the scan and provided as a separate file which is pointed to in the TEI code and displayed at the appropriate position in the transcribed text.
Fontane-30633-detail2.png
<figure><graphic url="myfilename.png"/></figure>

pro: representing an image through a digital image probably yields the strongest resemblance to the original image. con: extracting the image from the scan is quite a lot of work; the TEI code provides hardly any semantics or searchability (except for the filename, or additional information in <figure>); the image can be difficult to align and position, especially if reference points other than the upper left corner of the image are to be used (see the corresponding sections). Anchor points generated by e.g. the Text Image Link Editor in TextGrid might provide a workaround for this problem.

  • SVG: an SVG drawing is designed that resembles the original image. Simplified measurements, obtained from measuring the image on the original page or its facsimile, are provided in the TEI code, from which the SVG is created by a transformation stylesheet.
<svg xmlns="http://www.w3.org/2000/svg" width="0.9cm" height="1em">
   <line x1="0" y1="0" x2="0.3cm" y2="1em" style="stroke:black; stroke-width:2px;" />
   <line x1="0.3cm" y1="1em" x2="0.9cm" y2="0" style="stroke:black; stroke-width:2px;" />                   
</svg>
<figure rend="marking_V(left:0.3cm;right:0.6cm)"/>

This TEI code indicates that it's a "V"-shaped marking, with a left side of 3 mm width and a right side of 6 mm width. (The height is fixed at 1 em.) From these measurements, the SVG code above is created during transformation.

--Martin de la Iglesia 06:27, 5 January 2012 (EST)
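
The transformation step described above could be sketched as follows, in Python rather than in a stylesheet. This is only an illustration: the function name v_marking_svg and the regular expression are assumptions, the @rend syntax is taken from the example, and only centimetre values are handled.

```python
import re

# Hypothetical pattern for the @rend value used in the example above,
# e.g. "marking_V(left:0.3cm;right:0.6cm)". Only cm values are handled.
REND_PATTERN = re.compile(
    r"marking_V\(left:(?P<left>\d+(?:\.\d+)?)cm;right:(?P<right>\d+(?:\.\d+)?)cm\)"
)


def v_marking_svg(rend: str) -> str:
    """Build an SVG "V" from the left and right widths given in @rend."""
    match = REND_PATTERN.fullmatch(rend.strip())
    if match is None:
        raise ValueError(f"unsupported rend value: {rend!r}")
    left = float(match.group("left"))
    right = float(match.group("right"))
    total = round(left + right, 4)  # overall width of the drawing in cm
    stroke = "stroke:black; stroke-width:2px;"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{total:g}cm" height="1em">\n'
        f'   <line x1="0" y1="0" x2="{left:g}cm" y2="1em" style="{stroke}" />\n'
        f'   <line x1="{left:g}cm" y1="1em" x2="{total:g}cm" y2="0" style="{stroke}" />\n'
        f"</svg>"
    )


print(v_marking_svg("marking_V(left:0.3cm;right:0.6cm)"))
```

The height stays fixed at 1 em, as in the example; a real stylesheet would probably need to support other units and marking shapes as well.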

Examples of different methods for other images
  • direct extraction from facsimile: this only works when writing and image are not as closely intertwined as in the example above. If coordinates are provided, the image can be extracted from the facsimile (the location of which is provided in <surface @facs>) on transformation and placed correctly in between the transcription of the surrounding writing.
<zone ulx="123" uly="234" lrx="345" lry="456">
  <figure>...</figure>
</zone>

These coordinates represent pixels in the facsimile file. --Martin de la Iglesia 08:11, 5 August 2013 (EDT)
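
As a rough sketch of the extraction step, the pixel coordinates of the <zone> can be turned into a crop rectangle. The function name crop_box is made up for illustration, and the fragment is assumed to carry no namespace; the actual cropping would be left to whatever image tool the project uses.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def crop_box(zone_xml: str) -> Tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) pixel bounds of a <zone>."""
    zone = ET.fromstring(zone_xml)
    left, upper = int(zone.get("ulx")), int(zone.get("uly"))
    right, lower = int(zone.get("lrx")), int(zone.get("lry"))
    if right <= left or lower <= upper:
        raise ValueError("lower right corner must lie below and right of upper left")
    return left, upper, right, lower


# Un-namespaced copy of the example zone (an assumption for this sketch).
zone = '<zone ulx="123" uly="234" lrx="345" lry="456"><figure/></zone>'
box = crop_box(zone)
width, height = box[2] - box[0], box[3] - box[1]
print(box, width, height)
```

The tuple order (left, upper, right, lower) is the one that imaging libraries such as Pillow expect for cropping, so the result could be passed on more or less directly.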

  • simple line drawings: coordinates in <zone> provide information on the position, length and angle of lines such as horizontal rules that divide sections of text.
<zone ulx="1.2" uly="6.7" lrx="5.6" lry="6.9">
  <figure type="line_drawing"/>
</zone>

These coordinates describe a line of about 4.4 centimetres in length with a slight slope, which can be drawn e.g. in SVG on transformation. con: the difference between the @lry and @uly values is ambiguous, as it can be interpreted as either the thickness of the line or its slope. The respective missing value should be provided as well, e.g. in <figure @rend> or <zone @rotate>. --Martin de la Iglesia 08:30, 5 August 2013 (EDT)
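
To make the "slope" reading concrete, here is a minimal sketch that treats the two corners of the <zone> as the end points of the line and derives its length and angle. The function name line_geometry is an assumption, and so is the convention that y values grow downwards from the top of the page.

```python
import math
import xml.etree.ElementTree as ET


def line_geometry(zone_xml: str):
    """Return (length, angle in degrees), reading the zone as a diagonal."""
    zone = ET.fromstring(zone_xml)
    dx = float(zone.get("lrx")) - float(zone.get("ulx"))
    dy = float(zone.get("lry")) - float(zone.get("uly"))
    # With y growing downwards, a positive angle means the line falls
    # towards the right.
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


zone = '<zone ulx="1.2" uly="6.7" lrx="5.6" lry="6.9"><figure type="line_drawing"/></zone>'
length, angle = line_geometry(zone)
print(f"{length:.2f} cm, {angle:.1f} degrees")  # 4.40 cm, 2.6 degrees
```

Under the other reading, the same 0.2 cm difference would be the thickness of a perfectly horizontal line, which is why an explicit value is needed to tell the two cases apart.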

Linking to Facsimile Page Images

This is another topic suggested by John ("using <facsimile> and @facs to link to facsimile page images"). --Martin de la Iglesia 04:23, 5 December 2011 (EST)

You might draw upon this section of the Best Practices for TEI in Libraries. (Kshawkin 08:26, 2 December 2011 (EST))

Thanks, Kevin. I wasn't aware of that section. There's one thing I don't understand about the third approach: in the sample METS document, which is the ID that is used in @xml:id in the TEI code? --Martin de la Iglesia 05:35, 7 December 2011 (EST)
Hmm, I agree that the recommendation isn't at all clear, and I actually don't know METS very well. I've made a note to revisit this in a future revision to the Best Practices. In the meantime, if you know METS well, you might look at the sample METS document linked to from the Best Practices. If that doesn't clarify, I'd contact my coeditor Michelle Dalmau: I think she will know. --Kshawkin 10:04, 12 December 2011 (EST)
Thanks again. Michelle explained it to me and now I see the usefulness of this method, at least in some projects. --Martin de la Iglesia 09:47, 21 December 2011 (EST)

Positioning an Image

What I'd like to see here is a discussion of the <surface>/<zone> coordinate positioning system vs. <ptr>-like positioning solutions, and maybe of other systems as well (e.g. lines and margin for y and x axis positioning). --Martin de la Iglesia 04:28, 5 December 2011 (EST)

Examples

  • metric margins in CSS style for x and y axis starting points, counting from leaf borders (height and width are determined by the figure itself):
    <figure style="margin-left:1.2cm; margin-top:3.4cm">...</figure>
  • CSS style margins in percent:
    <figure style="margin-left:12%; margin-top:34%">...</figure>
  • CSS style margins in lines for y axis (not valid CSS, but line alignment can be important in some cases):
    <figure style="margin-left:1.2cm" rend="margin-top:7lines">...</figure>
    (i.e. the image starts in line 8)
  • added height and width:
<figure style="margin-left:1.2cm; margin-top:3.4cm">
   <figDesc>
      <dimensions>
         <height quantity="123" unit="mm"/>
         <width quantity="75" unit="mm"/>
      </dimensions>
   </figDesc>
</figure>

--Martin de la Iglesia 05:21, 4 January 2012 (EST)
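
For processing, such @style (or @rend) values have to be split into properties, numbers and units. A minimal sketch, assuming only simple "property:number+unit" declarations; the function name parse_style is made up, and the non-CSS unit "lines" from the third example is accepted like any other unit.

```python
import re


def parse_style(style: str) -> dict:
    """Split a CSS-like value into {property: (number, unit)}."""
    result = {}
    for declaration in style.split(";"):
        if not declaration.strip():
            continue  # tolerate a trailing semicolon
        name, _, value = declaration.partition(":")
        match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([a-z%]*)", value.strip())
        if match is None:
            raise ValueError(f"cannot parse declaration: {declaration!r}")
        result[name.strip()] = (float(match.group(1)), match.group(2))
    return result


print(parse_style("margin-left:1.2cm; margin-top:3.4cm"))
print(parse_style("margin-left:12%; margin-top:34%"))
print(parse_style("margin-top:7lines"))
```

Whitespace around names and values is stripped, so "margin-left:1.2cm; margin-top:3.4cm" and "margin-left: 1.2cm;margin-top: 3.4cm" give the same result.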

On the use of CSS in @rend, see this discussion. (Kshawkin 18:24, 6 January 2012 (EST))
Thanks, Kevin, I wasn't aware of that. See my comment there. --Martin de la Iglesia 04:18, 10 January 2012 (EST)
Alright, so we should add that there might be a problem with whitespace in CSS style values, and that @html:style can be used as an alternative to @rend. --Martin de la Iglesia 04:03, 16 January 2012 (EST)
Substituted @rend with @style where applicable. --Martin de la Iglesia 05:19, 2 August 2013 (EDT)
  • coordinates in <zone> (or <surface> where applicable); coordinate values follow the system employed in the TEI document (here: centimeters):
<zone ulx="1.2" uly="0.5" lrx="3.5" lry="7.1">
  <figure>...</figure>
</zone>

--Martin de la Iglesia 07:39, 5 August 2013 (EDT)

Image-Text Linking

This section should cover the encoding of different kinds of image-text relationships (e.g. an image illustrating a text paragraph, or a text paragraph describing an image), other than merely topographical ones which are covered in the following section, Image-Text Alignment. --Martin de la Iglesia 04:40, 5 December 2011 (EST)

  • using <zone> elements to associate text with images: like the example in the next section, this example refers to the image pictured in [1] in the TEI Guidelines:
<zone>67. We now give some examples, from the works of the great masters, of some of the most frequently used bowings.</zone>
<zone>
  <figure>...</figure>
  <zone>Ex. 8. Andante con moto.</zone>
  <zone>SCHUBERT : Symphony in B minor.</zone>
  <zone>pp</zone>
  <zone>fp</zone>
</zone>

In this case, the second <zone> element, which contains both the <figure> element and other <zone> elements with text, suggests that the text and the image belong together. --Martin de la Iglesia 08:58, 5 August 2013 (EDT)
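
A processor could pick up this grouping by looking for <zone> elements that have a <figure> as a direct child and collecting the text of their child zones. The sketch below is only one possible reading: the <surface> wrapper, the empty <figure/> and the function name text_grouped_with_figures are assumptions, and the fragment carries no namespace.

```python
import xml.etree.ElementTree as ET

# The example above, wrapped in a <surface> so that it is well-formed.
SAMPLE = """<surface>
  <zone>67. We now give some examples, from the works of the great masters, of some of the most frequently used bowings.</zone>
  <zone>
    <figure/>
    <zone>Ex. 8. Andante con moto.</zone>
    <zone>SCHUBERT : Symphony in B minor.</zone>
    <zone>pp</zone>
    <zone>fp</zone>
  </zone>
</surface>"""


def text_grouped_with_figures(root):
    """For each zone holding a figure, list the text of its child zones."""
    groups = []
    for zone in root.iter("zone"):
        if zone.find("figure") is not None:  # direct children only
            groups.append([(child.text or "").strip() for child in zone.findall("zone")])
    return groups


groups = text_grouped_with_figures(ET.fromstring(SAMPLE))
print(groups)
```

The first zone (paragraph 67) is not reported, because it has no <figure> child; whether it nevertheless "belongs" to the image is exactly the kind of relationship that nesting alone cannot express.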

Image-Text Alignment

How to position text relative to an image, or an image relative to text (including special cases such as an image pointing at a character, or an image pointing at two separate characters) should be covered here. --Martin de la Iglesia 04:43, 5 December 2011 (EST)

  • Here's an example of the latter case: in the following part of a scan from a 19th-century German manuscript, a line connects two separate paragraphs. I'd be happy to represent this in the transcript as a straight SVG line that goes roughly from the word "Nissen" to "wird", but how should this be encoded (and XSL-transformed)?

Fontane-30621-detail.jpg

--Martin de la Iglesia 07:33, 21 December 2011 (EST)

  • line numbers for images - this is an issue in editions in general and not specific to TEI, but still: if there's an image spanning the whole width of a page, and which is preceded and followed by writing, how does it fit in an editorial line numbering system? Should the image have its own line number, or should the image be ignored when numbering the regular text lines? --Martin de la Iglesia 07:53, 5 August 2013 (EDT)
  • This example of positioning transcribed writing within an image using coordinate attributes in <zone> refers to the image pictured in [2] in the TEI Guidelines:
<zone ulx="18" uly="50" lrx="489" lry="118">
  <figure>...</figure>
  <zone ulx="29" uly="51">Ex. 8. Andante con moto.</zone>
  <zone ulx="298" uly="50">SCHUBERT : Symphony in B minor.</zone>
  <zone ulx="66" uly="106">pp</zone>
  <zone ulx="341" uly="106">fp</zone>
</zone>

(coordinate values in pixels) --Martin de la Iglesia 08:47, 5 August 2013 (EDT)
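
When rendering, the inner coordinates usually have to be converted into offsets relative to the image. A minimal sketch, assuming that all values share the same pixel grid and that the upper left corner of the outer <zone> is the origin of the image; the function name relative_positions and the empty <figure/> are made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<zone ulx="18" uly="50" lrx="489" lry="118">
  <figure/>
  <zone ulx="29" uly="51">Ex. 8. Andante con moto.</zone>
  <zone ulx="298" uly="50">SCHUBERT : Symphony in B minor.</zone>
  <zone ulx="66" uly="106">pp</zone>
  <zone ulx="341" uly="106">fp</zone>
</zone>"""


def relative_positions(outer_xml: str):
    """Return (text, x offset, y offset) for each child zone, in pixels."""
    outer = ET.fromstring(outer_xml)
    origin_x, origin_y = int(outer.get("ulx")), int(outer.get("uly"))
    return [
        (
            (child.text or "").strip(),
            int(child.get("ulx")) - origin_x,
            int(child.get("uly")) - origin_y,
        )
        for child in outer.findall("zone")
    ]


for text, x, y in relative_positions(SAMPLE):
    print(x, y, text)
```

The offsets could then be used e.g. as absolute positions in CSS or as x and y attributes of SVG text elements placed over the image.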

Image Metadata

I'm thinking primarily of content description texts, but we could embed this in a general discussion of how (or whether) to encode any "invisible" aspects of images. --Martin de la Iglesia 04:34, 5 December 2011 (EST)

  • classification in @type:
<figure type="floor_plan">...</figure>

Are there any suitable vocabularies that could be used for this kind of indexing? --Martin de la Iglesia 07:47, 5 August 2013 (EDT)
