Difference between revisions of "Whitespace"

From TEIWiki

Revision as of 05:44, 27 July 2012

Managing XML’s Whitespace in TEI Documents

TEI has robust features for specifying space, gaps, line breaks, and related aspects of the space between text. But TEI is an XML vocabulary, and XML itself, together with the programs that read and process XML files, has its own rules for what it calls whitespace: the space, tab, carriage return, and line feed characters. Often the standards, constraints, and conventions imposed by XML cause no problem for TEI encodings. But the interactions between XML's features and TEI's can cause subtle problems, and sometimes significant damage, when a TEI document is processed.

This page offers an introduction to those interactions.

Where XML Considers Whitespace to be Significant

In XML documents, some whitespace is significant and some is not. For example, extra whitespace inside the angle brackets of a tag is not significant. For any program processing these as pieces of XML,

   <title type="main">

and

   <title         type =   "main"   >

are the same. There is no significance to the extra space. By XML rules, no application that processes the data in this XML file (as XML and not just as text) is allowed to treat these two representations differently. A person or computer editing this file is free to use either one, based merely on readability and aesthetics. The fact that there is whitespace between title and type is significant, but how much there is, or of what kind (space characters, tabs, carriage returns, line feeds), is not. The space between type and = is not significant either.
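The point can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which is only one possible choice of parser; the helper names same_start_tag and is_well_formed are made up for this illustration, and an empty end tag is added so that each sample is a complete element.

```python
import xml.etree.ElementTree as ET

# The two start tags from the text, each closed so it forms a whole element.
compact = ET.fromstring('<title type="main"></title>')
spaced = ET.fromstring('<title \t\n    type =   "main"   ></title>')


def same_start_tag(a, b):
    """Compare the element name and attributes, the data a start tag carries."""
    return a.tag == b.tag and a.attrib == b.attrib


def is_well_formed(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The amount and kind of whitespace inside the tag makes no difference.
print(same_start_tag(compact, spaced))

# Removing the whitespace between the name and the attribute altogether
# is a different matter: the parser rejects the result.
print(is_well_formed('<titletype="main"></titletype>'))
```

The parser hands the application the same element name and the same attribute for both spellings, so the application has no way to tell them apart.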

Whitespace can be significant, however, in the content of an element. For example,

   <name>MaryAnn</name>

and

   <name>Mary Ann</name> 

are different because of the space between Mary and Ann, and any program reading this element in an XML file is obliged to maintain the distinction.
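A minimal sketch of the same contrast, again assuming Python's standard xml.etree.ElementTree as the parser (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

joined = ET.fromstring("<name>MaryAnn</name>")
spaced = ET.fromstring("<name>Mary Ann</name>")

# The parser reports the character data exactly as written,
# including the space inside the second name.
print(repr(joined.text))
print(repr(spaced.text))
print(joined.text == spaced.text)
```

Here the space is part of the element's content, so it reaches the application unchanged.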

But things can get complicated. Consider this:

   <persName>    
       <forename>Mary</forename>
       <forename>Ann</forename>
       <surname>Henry</surname>
   </persName>

Should the carriage returns and line feeds matter? Should it matter whether the indentation before <surname> is a tab or four space characters? Should it matter that there is extra space after <persName>?
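One way to see why these questions arise is to look at what a parser actually reports for such an element. The sketch below assumes Python's standard xml.etree.ElementTree; the sample uses a well-formed version of the example with a tab before the surname element, and the names doc, chunks, and normalized are illustrative. It shows only what this one parser hands over. Whether an application should then treat the whitespace as meaningful is a separate decision.

```python
import xml.etree.ElementTree as ET

# The persName example, with trailing spaces after the start tag,
# four-space indentation before each forename, and a tab before surname.
doc = (
    "<persName>    \n"
    "    <forename>Mary</forename>\n"
    "    <forename>Ann</forename>\n"
    "\t<surname>Henry</surname>\n"
    "</persName>"
)

root = ET.fromstring(doc)

# Every run of character data, in document order. The whitespace between
# the child elements is reported alongside the names themselves.
chunks = list(root.itertext())
for chunk in chunks:
    print(repr(chunk))

# A hypothetical application-level choice: collapse all whitespace runs
# to single spaces. The parser does not do this on its own.
normalized = " ".join("".join(chunks).split())
print(normalized)
```

The spaces, line feeds, and tab between the tags all arrive as character data, and the tab is distinguishable from the spaces, so any decision to ignore or collapse them has to be made by the software that processes the document.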