CSV2TEI.xsl
Latest revision as of 12:56, 2 October 2007
Summary
This stylesheet takes a comma-separated-values file and converts it to a basic TEI table structure.
Add any comments to the 'discussion' tab.
Required Input
With Saxon 8 or above, invoke it as:
java -jar saxon8.jar -it main CSV2TEI.xsl input-uri=filename.csv
(The "-it main" option tells the processor to start at the named template "main", and "input-uri" is a stylesheet parameter giving the location of the CSV file.)
The input can be any file (in our example, "filename.csv") resembling:
This,is,a,test,only,a,test
This, is a, variation, on a, test
"This","is","another","variation."
Expected Output
The output is a bare TEI table (table, row and cell elements and nothing else), which can then be cut and pasted into a TEI document. For example, with the above input, the output should be:
<table>
  <row>
    <cell>This</cell>
    <cell>is</cell>
    <cell>a</cell>
    <cell>test</cell>
    <cell>only</cell>
    <cell>a</cell>
    <cell>test</cell>
  </row>
  <row>
    <cell>This</cell>
    <cell>is a</cell>
    <cell>variation</cell>
    <cell>on a</cell>
    <cell>test</cell>
  </row>
  <row>
    <cell>This</cell>
    <cell>is</cell>
    <cell>another</cell>
    <cell>variation.</cell>
  </row>
</table>
Known Restrictions or Problems
Note:
- It assumes cells are comma separated and rows are on individual lines.
- If a cell is surrounded in double-quotes, it will remove them.
- It applies normalize-space() to the individual cell contents, so leading and trailing spaces are removed.
- It assumes the input is in UTF-8, but this can be changed with the encoding parameter; the output is always UTF-8.
- It is an XSLT 2.0 stylesheet and requires an XSLT 2.0 processor such as Saxon 8.
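For readers who want to check the rules above without an XSLT processor, here is a minimal Python sketch of the same cell-splitting behaviour. It is not the stylesheet and not a port of it: the names CELL, normalize_space, parse_rows and to_tei are hypothetical, the regular expression is an assumption chosen to follow the listed rules, and skipping empty lines is also an assumption.

```python
import re
from xml.sax.saxutils import escape

# Assumed pattern: a double-quoted cell (group 1) or a run of
# non-comma characters (group 2). Not taken from the stylesheet.
CELL = re.compile(r'"([^"]*?)"|([^,]+)')

XML_SPACE = " \t\r\n"


def normalize_space(s):
    """Mimic XPath normalize-space(): trim, then collapse whitespace runs."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(XML_SPACE)))


def parse_rows(text):
    """Split CSV text into rows of cleaned cell strings."""
    rows = []
    for line in re.split(r"\r?\n", text):
        if line == "":
            continue  # assumption: empty lines produce no row
        cells = []
        for m in CELL.finditer(line):
            raw = m.group(1) if m.group(1) is not None else m.group(2)
            cells.append(normalize_space(raw))
        rows.append(cells)
    return rows


def to_tei(rows):
    """Serialise rows as a bare table/row/cell structure."""
    out = ["<table>"]
    for cells in rows:
        out.append("  <row>")
        for cell in cells:
            out.append("    <cell>%s</cell>" % escape(cell))
        out.append("  </row>")
    out.append("</table>")
    return "\n".join(out)


SAMPLE = (
    "This,is,a,test,only,a,test\n"
    "This, is a, variation, on a, test\n"
    '"This","is","another","variation."\n'
)

if __name__ == "__main__":
    print(to_tei(parse_rows(SAMPLE)))
```

Run on the sample input from this page, the sketch yields three rows of seven, five and four cells. Note that normalize-space() also collapses runs of whitespace inside a cell to a single space, in addition to trimming the ends.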
Stylesheet
<?xml version="1.0"?>
<!--
  Usage: saxon -it main stylesheet.xsl input-uri=filename.csv
  Note:
  1) It assumes cells are comma separated and rows are on individual lines.
  2) If a cell is surrounded in double-quotes, it will remove them.
  3) It normalize-space()s the individual cell contents so leading/trailing
     spaces are removed.
  4) It assumes the input is in UTF-8, but this can be changed with the
     encoding parameter; output is converted to UTF-8.
  5) It is an XSLT2 stylesheet and requires an XSLT2 processor such as Saxon 8.
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0" exclude-result-prefixes="#all">

  <xsl:param name="input-uri" as="xs:string"/>
  <xsl:param name="encoding" select="'UTF-8'" as="xs:string"/>
  <xsl:output indent="yes" encoding="UTF-8"/>

  <xsl:template name="main">
    <xsl:variable name="in" select="unparsed-text($input-uri, $encoding)"/>
    <table>
      <!-- Split the input into lines; each non-matching substring is one row. -->
      <xsl:analyze-string select="$in" regex="\r?\n">
        <xsl:non-matching-substring>
          <row>
            <!-- Group 2 captures a quoted cell, group 3 an unquoted cell.
                 The separating commas fall outside the match, so the last
                 cell on a line needs no trailing comma. -->
            <xsl:analyze-string select="." regex='("([^"]*?)")|([^,]+)'>
              <xsl:matching-substring>
                <cell>
                  <xsl:value-of select="normalize-space(regex-group(2))"/>
                  <xsl:value-of select="normalize-space(regex-group(3))"/>
                </cell>
              </xsl:matching-substring>
            </xsl:analyze-string>
          </row>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </table>
  </xsl:template>
</xsl:stylesheet>