CSV2TEI.xsl
Summary
This stylesheet takes a comma-separated-values file and converts it to a basic TEI table structure.
Add any comments to the 'discussion' tab.
Required Input
With Saxon 8 or above, run it as:
saxon -it main stylesheet.xsl input-uri=filename.csv
The input can be any file resembling:
This,is,a,test,only,a,test
This, is a, variation, on a, test
"This","is","another","variation."
Expected Output
The output is a TEI table containing only table, row and cell elements, which can then be cut and pasted into a TEI document. For example, with the above input, the output should be:
<table>
   <row>
      <cell>This</cell>
      <cell>is</cell>
      <cell>a</cell>
      <cell>test</cell>
      <cell>only</cell>
      <cell>a</cell>
      <cell>test</cell>
   </row>
   <row>
      <cell>This</cell>
      <cell>is a</cell>
      <cell>variation</cell>
      <cell>on a</cell>
      <cell>test</cell>
   </row>
   <row>
      <cell>This</cell>
      <cell>is</cell>
      <cell>another</cell>
      <cell>variation.</cell>
   </row>
</table>
Known Restrictions or Problems
Note:
- It assumes cells are comma-separated and each row is on its own line.
- If a cell is surrounded by double quotes, the quotes are removed.
- It applies normalize-space() to each cell's contents, so leading and trailing spaces are removed.
- It assumes the input is UTF-8, but this can be changed with the encoding parameter; the output is always UTF-8.
- It is an XSLT 2.0 stylesheet and requires an XSLT 2.0 processor such as Saxon 8.
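As an illustration of the encoding note above: the stylesheet declares an encoding parameter, so a file in another encoding can presumably be read by passing that parameter in the same name=value form as input-uri. The file name and the ISO-8859-1 value below are only examples, not taken from the original page.

```shell
# Sketch only: read a Latin-1 CSV file; the output is still written as UTF-8.
# "filename.csv" and "ISO-8859-1" are placeholder values.
saxon -it main stylesheet.xsl input-uri=filename.csv encoding=ISO-8859-1
```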
Stylesheet
<?xml version="1.0"?>
<!--
  Usage: saxon -it main stylesheet.xsl input-uri=filename.csv

  Note:
  1) It assumes cells are comma-separated and each row is on its own line.
  2) If a cell is surrounded by double quotes, the quotes are removed.
  3) It applies normalize-space() to each cell's contents, so leading and
     trailing spaces are removed.
  4) It assumes the input is UTF-8, but this can be changed with the
     encoding parameter; the output is always UTF-8.
  5) It is an XSLT 2.0 stylesheet and requires an XSLT 2.0 processor
     such as Saxon 8.
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0"
                exclude-result-prefixes="#all">

  <!-- URI of the CSV file to read, and the encoding it is stored in -->
  <xsl:param name="input-uri" as="xs:string"/>
  <xsl:param name="encoding" select="'UTF-8'" as="xs:string"/>

  <xsl:output indent="yes" encoding="UTF-8"/>

  <xsl:template name="main">
    <xsl:variable name="in" select="unparsed-text($input-uri, $encoding)"/>
    <table>
      <!-- Split the text on newlines; each non-empty line becomes a row -->
      <xsl:analyze-string select="$in" regex="\n">
        <xsl:non-matching-substring>
          <row>
            <!-- A cell is either a double-quoted string (group 1, quotes
                 dropped) or a run of non-comma characters (group 2).
                 The separating commas are left unmatched and discarded. -->
            <xsl:analyze-string select="." regex='\s*"([^"]*)"\s*|([^,]+)'>
              <xsl:matching-substring>
                <cell>
                  <!-- Only one of the two groups is non-empty per match -->
                  <xsl:value-of select="normalize-space(regex-group(1))"/>
                  <xsl:value-of select="normalize-space(regex-group(2))"/>
                </cell>
              </xsl:matching-substring>
            </xsl:analyze-string>
          </row>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </table>
  </xsl:template>

</xsl:stylesheet>