CSV2TEI.xsl

From TEIWiki

Latest revision as of 12:56, 2 October 2007

Summary

This stylesheet takes a comma-separated-values file and converts it to a basic TEI table structure.

Add any comments to the 'discussion' tab.

Required Input

With Saxon 8 or above, run it as:

java -jar saxon8.jar -it main CSV2TEI.xsl input-uri=filename.csv

(The "-it main" option tells the processor to start by invoking the template named "main", and "input-uri" is a stylesheet parameter, set on the command line, that this template reads.)

So input can be any file (in our example, "filename.csv") resembling:

This,is,a,test,only,a,test
This, is a, variation, on a, test
"This","is","another","variation." 

Expected Output

The output is a bare TEI table (table, row and cell elements, and nothing else) that can then be copied and pasted into a TEI document. For example, with the above input, the output should be:


<table>
   <row>
      <cell>This</cell>
      <cell>is</cell>
      <cell>a</cell>
      <cell>test</cell>
      <cell>only</cell>
      <cell>a</cell>
      <cell>test</cell>
   </row>
   <row>
      <cell>This</cell>
      <cell>is a</cell>
      <cell>variation</cell>
      <cell>on a</cell>
      <cell>test</cell>
   </row>
   <row>
      <cell>This</cell>
      <cell>is</cell>
      <cell>another</cell>
      <cell>variation.</cell>
   </row>
</table>


Known Restrictions or Problems

Note:

  • It assumes cells are comma-separated and each row is on its own line.
  • If a cell is surrounded by double quotes, the quotes are removed.
  • It applies normalize-space() to each cell's contents, so leading and trailing spaces are removed (and internal runs of whitespace are collapsed to a single space).
  • It assumes the input is in UTF-8, but this can be changed with the encoding parameter. The output is always UTF-8.
  • It is an XSLT 2.0 stylesheet and requires an XSLT 2.0 processor such as Saxon 8.
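
The first restriction could be relaxed with an extra parameter. What follows is only a minimal sketch of a variant, not the stylesheet documented on this page: the "separator" parameter is hypothetical, it is assumed to be a single character with no special meaning in a regular expression (such as a semicolon), and quoted fields are not handled at all. It uses tokenize() rather than xsl:analyze-string, so empty fields are kept as empty cells.

```xml
<?xml version="1.0"?>
<!-- Sketch only: a hypothetical variant of CSV2TEI.xsl for unquoted fields. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0" exclude-result-prefixes="#all">
<xsl:param name="input-uri" as="xs:string"/>
<xsl:param name="encoding" select="'UTF-8'" as="xs:string"/>
<!-- Hypothetical parameter: one literal character that is not a regex metacharacter -->
<xsl:param name="separator" select="';'" as="xs:string"/>
<xsl:output indent="yes" encoding="UTF-8"/>

<xsl:template name="main">
  <table>
    <!-- One row per non-blank line; \r?\n also accepts Windows line ends -->
    <xsl:for-each
        select="tokenize(unparsed-text($input-uri, $encoding), '\r?\n')[normalize-space(.)]">
      <row>
        <!-- tokenize() keeps empty fields, so "a;;b" yields three cells -->
        <xsl:for-each select="tokenize(., $separator)">
          <cell><xsl:value-of select="normalize-space(.)"/></cell>
        </xsl:for-each>
      </row>
    </xsl:for-each>
  </table>
</xsl:template>

</xsl:stylesheet>
```

Assuming the sketch is saved under a name of your choosing, it should be invoked in the same way as CSV2TEI.xsl, with separator=... added to the command line when the default is not wanted.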


Stylesheet


<?xml version="1.0"?>
<!-- 
Usage: 
java -jar saxon8.jar -it main CSV2TEI.xsl input-uri=filename.csv

Note:
1) It assumes cells are comma-separated and each row is on its own line.
2) If a cell is surrounded by double quotes, the quotes are removed.
3) It applies normalize-space() to each cell's contents, so leading and
trailing spaces are removed.
4) It assumes the input is in UTF-8, but this can be changed with the
encoding parameter. The output is always UTF-8.
5) It is an XSLT 2.0 stylesheet and requires an XSLT 2.0 processor such as Saxon 8.

 -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0" exclude-result-prefixes="#all">
<xsl:param name="input-uri" as="xs:string"/>
<xsl:param name="encoding" select="'UTF-8'" as="xs:string"/>
<xsl:output indent="yes" encoding="UTF-8" />

<xsl:template name="main">
  <xsl:variable name="in" 
              select="unparsed-text($input-uri, $encoding)"/>
  <table>
  <xsl:analyze-string select="$in" regex="\n">
    <xsl:non-matching-substring>
      <row>
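      <!-- A cell is either a double-quoted string (group 2), optionally
           padded with whitespace, or a run of non-comma characters
           (group 3). The commas fall into the non-matching substrings and
           are discarded, so the last cell of a row needs no trailing comma. -->
      <xsl:analyze-string select="." 
              regex='\s*("([^"]*)")\s*|([^,]+)'>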
        <xsl:matching-substring>
          <cell>
             <xsl:value-of select="normalize-space(regex-group(2))"/>
             <xsl:value-of select="normalize-space(regex-group(3))"/>
          </cell>
        </xsl:matching-substring>
      </xsl:analyze-string>
      </row>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
  </table>
</xsl:template>

</xsl:stylesheet>
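
As an alternative to cutting and pasting the result, the table could be generated inside a surrounding element by importing the stylesheet and calling its "main" template. This is only a sketch under stated assumptions: CSV2TEI.xsl is assumed to be in the same directory, and the template name "wrapped" and the div and head content are hypothetical placeholders.

```xml
<?xml version="1.0"?>
<!-- Sketch only: a hypothetical wrapper around CSV2TEI.xsl. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
<!-- Brings in the "main" template and the input-uri and encoding parameters -->
<xsl:import href="CSV2TEI.xsl"/>

<xsl:template name="wrapped">
  <div>
    <!-- Placeholder heading; replace with your own -->
    <head>Imported from CSV</head>
    <!-- Emits the table built by the imported stylesheet -->
    <xsl:call-template name="main"/>
  </div>
</xsl:template>

</xsl:stylesheet>
```

If this were saved as, say, wrapper.xsl, it would be run with "-it wrapped" in place of "-it main". The input-uri parameter is still given on the command line, because it is a stylesheet parameter of the imported module.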