CSV2TEI.xsl
Summary
This stylesheet takes a comma-separated-values file and converts it to a basic TEI table structure.
Add any comments to the 'discussion' tab.
Required Input
With Saxon 8 or above, run it as:
saxon -it main stylesheet.xsl input-uri=filename.csv
So input can be any file resembling:
This,is,a,test,only,a,test!
This, is a, variation, on a, test!
"This","is","another","variation."
Expected Output
The output is a bare TEI table (table, row and cell elements, nothing else) that can be cut and pasted into a TEI document. For example, with the above input, the output should be:
<table>
  <row>
    <cell>This</cell>
    <cell>is</cell>
    <cell>a</cell>
    <cell>test</cell>
    <cell>only</cell>
    <cell>a</cell>
  </row>
  <row>
    <cell>This</cell>
    <cell>is a</cell>
    <cell>variation</cell>
    <cell>on a</cell>
  </row>
  <row>
    <cell>This</cell>
    <cell>is</cell>
    <cell>another</cell>
    <cell>variation.</cell>
  </row>
</table>
Known Restrictions or Problems
Note:
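- It assumes cells are comma-separated and rows are on individual lines.
- If a cell is surrounded by double quotes, they are removed.
- It applies normalize-space() to the individual cell contents, so leading/trailing
spaces are removed and internal runs of whitespace are collapsed.
- It assumes the input is in UTF-8, but this can be changed with the encoding
parameter; the output is always UTF-8.
- It is an XSLT 2.0 stylesheet and requires an XSLT 2.0 processor such as Saxon 8.
- An unquoted cell is only recognised when it is followed by a comma, so the last
unquoted cell on a line (test! in the example above) is dropped.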
Stylesheet
<?xml version="1.0"?>
<!--
Usage:
saxon -it main stylesheet.xsl input-uri=filename.csv
Note:
1) It assumes cells are comma separated and rows are on individual lines.
2) If a cell is surrounded by double quotes, they are removed.
3) It applies normalize-space() to the individual cell contents, so
leading/trailing spaces are removed.
4) It assumes the input is in UTF-8, but this can be changed with the
encoding parameter; the output is always UTF-8.
5) It is an XSLT 2.0 stylesheet and requires an XSLT 2.0 processor such as Saxon 8.
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0" exclude-result-prefixes="#all">
  <!-- URI of the CSV file to read, and the encoding to read it with -->
  <xsl:param name="input-uri" as="xs:string"/>
  <xsl:param name="encoding" select="'UTF-8'" as="xs:string"/>
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:template name="main">
    <xsl:variable name="in"
                  select="unparsed-text($input-uri, $encoding)"/>
    <table>
      <!-- Split the text on newlines; each non-matching substring is one row -->
      <xsl:analyze-string select="$in" regex="\n">
        <xsl:non-matching-substring>
          <row>
            <!-- Group 2: contents of a double-quoted field.
                 Group 3: an unquoted field followed by a comma. -->
            <xsl:analyze-string select="."
                                regex='("([^"]*?)")|([^,]+?),'>
              <xsl:matching-substring>
                <cell>
                  <xsl:value-of select="normalize-space(regex-group(2))"/>
                  <xsl:value-of select="normalize-space(regex-group(3))"/>
                </cell>
              </xsl:matching-substring>
            </xsl:analyze-string>
          </row>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </table>
  </xsl:template>
</xsl:stylesheet>
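In the stylesheet above, an unquoted field is only matched when a comma follows it, so the last unquoted field of a row (test! in the sample input) never reaches the output. The variant below is a minimal sketch of one possible workaround, not part of the original stylesheet: it appends a comma to each line before splitting it, on the assumption that trailing empty cells do not need to be preserved. Everything else is unchanged.

```xml
<?xml version="1.0"?>
<!-- Sketch only: variant of CSV2TEI.xsl that keeps the last unquoted
     field of each row. The appended comma is an assumption of this
     sketch and is not in the original stylesheet. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0" exclude-result-prefixes="#all">
  <xsl:param name="input-uri" as="xs:string"/>
  <xsl:param name="encoding" select="'UTF-8'" as="xs:string"/>
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:template name="main">
    <xsl:variable name="in"
                  select="unparsed-text($input-uri, $encoding)"/>
    <table>
      <xsl:analyze-string select="$in" regex="\n">
        <xsl:non-matching-substring>
          <row>
            <!-- concat(., ',') adds a trailing comma so the unchanged
                 pattern can also match the final unquoted field -->
            <xsl:analyze-string select="concat(., ',')"
                                regex='("([^"]*?)")|([^,]+?),'>
              <xsl:matching-substring>
                <cell>
                  <xsl:value-of select="normalize-space(regex-group(2))"/>
                  <xsl:value-of select="normalize-space(regex-group(3))"/>
                </cell>
              </xsl:matching-substring>
            </xsl:analyze-string>
          </row>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

It is run the same way as the original. With the sample input, the first two rows should then end with an additional test! cell. One caveat: a final quoted field preceded by a space (for example a, "b") would now be matched by the unquoted branch and keep its quotes, so the variant is worth checking against real data before use.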