Difference between revisions of "SampleXQueryPage"
Piotr Banski (talk | contribs) |
Piotr Banski (talk | contribs) (using Milestone-chunk.xquery as example) |
Latest revision as of 00:51, 26 September 2007
The exact textual content of the original version of the Milestone-chunk.xquery document by David Sewell is as follows:
== Authorship ==
{|
| ''Author''
| David Sewell, University of Virginia, [mailto:dsewell@virginia.edu dsewell@virginia.edu]
|-
| ''Last revised''
| 2007-05-02
|-
| ''Previous version''
| none
|}

== Summary ==
This is an XQuery 1.0 function that returns all of the content between two milestone elements, such as '''pb''', while preserving the hierarchical structure of the containing elements. For example, given content like this in a TEI document:
<pre><nowiki>
<TEI>
  <text>
    <body>
      <div1 type="chapter" n="1"> <!-- lots of div2s -->
        <div2 xml:id="doc100">
          <p>An example<pb n="3"/>of a <i>very</i> short page<pb n="4"/>here.</p>
        </div2> <!-- followed by lots of other stuff -->
      </div1>
    </body>
  </text>
</TEI>
</nowiki></pre>

the function would produce the following XML fragment as output when asked to return content between '''pb/@n=3''' and '''pb/@n=4''' with the '''text''' element as the ancestor:

<pre><nowiki>
<text>
  <body>
    <div1 type="chapter" n="1">
      <div2 xml:id="doc100">
        <p><pb n="3"/>of a <i>very</i> short page</p>
      </div2>
    </div1>
  </body>
</text>
</nowiki></pre>

In other words, the full hierarchical structure of the ancestor elements, including their attributes, is preserved, but only the nodal content between the milestones is included.

== Required Input ==

The function signature is
<pre><nowiki>
local:milestone-chunk(
  $ms1 as element(),
  $ms2 as element(),
  $node as node()*
)
</nowiki></pre>

'''$node''' is an element known to be a common ancestor of the two milestones. For example, it could be '''TEI/text''', or '''TEI/text/body''', or even '''TEI/text/body/div1[3]''' if the milestone parameters are both descendants of that div1.

'''$ms1''' is the first milestone element; '''$ms2''' is the second milestone element.
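
The requirement that '''$node''' be a common ancestor of both milestones, with '''$ms1''' coming first, can be checked before the function is called. The following is only a minimal sketch: the helper name '''local:valid-milestones''' and the inline sample element are assumptions made for illustration and are not part of the original document.

<pre><nowiki>
xquery version "1.0";

(: Hypothetical helper, not part of the original function.
   Returns true when both milestones are descendants of $ancestor
   and $ms1 precedes $ms2 in document order. :)
declare function local:valid-milestones(
  $ms1 as element(),
  $ms2 as element(),
  $ancestor as element()
) as xs:boolean
{
  (some $a in $ms1/ancestor::* satisfies $a is $ancestor)
  and (some $a in $ms2/ancestor::* satisfies $a is $ancestor)
  and $ms1 << $ms2
};

(: Inline stand-in for the sample document, so the query runs on its own. :)
let $input :=
  <text>
    <body>
      <p>An example<pb n="3"/>of a <i>very</i> short page<pb n="4"/>here.</p>
    </body>
  </text>
let $pb3 := $input//pb[@n = "3"]
let $pb4 := $input//pb[@n = "4"]
return (
  local:valid-milestones($pb3, $pb4, $input),  (: milestones in document order :)
  local:valid-milestones($pb4, $pb3, $input)   (: milestones reversed :)
)
</nowiki></pre>

Evaluated as written, the query should return '''true''' followed by '''false'''. A caller could use such a check to skip or report bad input instead of passing it to '''local:milestone-chunk'''.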

The output in "Summary" above might, for example, have been produced by the call

<pre><nowiki>
let $input := doc("mydoc.xml")/tei:TEI/tei:text
return local:milestone-chunk($input//pb[@n="3"], $input//pb[@n="4"], $input)
</nowiki></pre>

If '''$input''' had been
 doc("mydoc.xml")/tei:TEI/tei:text/tei:body
then the output would have started at the '''body''' element, and so on.

'''$ms1''' and '''$ms2''' do not need to be adjacent milestones. You can, for example, return content between '''pb/@n=4''' and '''pb/@n=7'''. Nor do the milestones need to be of the same type; you can return content between the '''pb''' for page 4 and an arbitrary anchor or pointer element later in the document.

== Expected Output ==

As indicated in the example in "Summary" above, the output should be a single XML element reflecting the structure of the input ancestor element and its descendants, but otherwise containing only the nodal content between the two milestone elements in the original input.

== Known Restrictions or Problems ==
The output will contain a copy of the first milestone element, but not of the second (closing) one. This is a consequence of the need to use pseudo-milestones for the second milestone in some cases; see the next paragraph.

When using this function to generate content between TEI '''pb''' elements, an obvious problem is that the final page will not have a closing '''pb''' milestone. In this case, use a pseudo-milestone such as the last node in the input element. For example, using the sample document in "Summary" above, the content of page 4 ("here.") would be output via the call
<pre><nowiki>
let $input := doc("mydoc.xml")/tei:TEI/tei:text
return local:milestone-chunk($input//pb[@n="4"], ($input//node())[last()], $input)
</nowiki></pre>

Note that '''$ms2''' is declared as '''element()''', so a pseudo-milestone that is a text or comment node will raise a type error unless the declared type is relaxed to '''node()''' or an element is chosen instead.

If you use this function to iterate over all the '''pb''' elements in a document, you will need to use a strategy like this when '''$ms1''' has no following '''pb''' element.
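
Iterating over the page breaks might be sketched as follows. This is an illustration under stated assumptions, not the author's method: the '''page''' wrapper element and the inline sample element are hypothetical, the function body is copied from the "Code" section so that the query runs on its own, and the last '''pb''' is deliberately skipped because it has no following '''pb''' and needs the pseudo-milestone strategy described above.

<pre><nowiki>
xquery version "1.0";

(: Copied from the "Code" section so that this sketch is self-contained. :)
declare function local:milestone-chunk(
  $ms1 as element(),
  $ms2 as element(),
  $node as node()*
) as node()*
{
  typeswitch ($node)
    case element() return
      if ($node is $ms1) then $node
      else if ( some $n in $node/descendant::* satisfies ($n is $ms1 or $n is $ms2) )
      then
        element { name($node) }
                { for $i in ( $node/node() | $node/@* )
                  return local:milestone-chunk($ms1, $ms2, $i) }
      else if ( $node >> $ms1 and $node << $ms2 ) then $node
      else ()
    case attribute() return $node
    default return
      if ( $node >> $ms1 and $node << $ms2 ) then $node
      else ()
};

(: Inline stand-in for doc("mydoc.xml"); no namespace is assumed. :)
let $input :=
  <text>
    <body>
      <div1 type="chapter" n="1">
        <div2 xml:id="doc100">
          <p>An example<pb n="3"/>of a <i>very</i> short page<pb n="4"/>here.</p>
        </div2>
      </div1>
    </body>
  </text>
let $pbs := $input//pb
for $pb at $i in $pbs
(: only page breaks that have a following pb are paired here :)
where $i lt count($pbs)
return
  (: "page" is a hypothetical wrapper used to keep the chunks apart :)
  <page n="{ $pb/@n }">{ local:milestone-chunk($pb, $pbs[$i + 1], $input) }</page>
</nowiki></pre>

With this two-milestone sample only one '''page''' element is produced, for page 3; page 4 is left to a separate call that supplies a pseudo-milestone.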
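
The function does not handle invalid input gracefully and will probably raise run-time errors. Make sure that the parameters passed to it are actual milestone elements with a common ancestor.

== Code ==
<pre><nowiki>
declare function local:milestone-chunk(
  $ms1 as element(),   (: first milestone :)
  $ms2 as element(),   (: second milestone :)
  $node as node()*     (: common ancestor on the first call, then each node visited :)
) as node()*
{
  typeswitch ($node)
    case element() return
      (: the first milestone itself is copied to the output :)
      if ($node is $ms1) then $node
      (: an element containing either milestone is rebuilt;
         its attributes and children are filtered recursively :)
      else if ( some $n in $node/descendant::* satisfies ($n is $ms1 or $n is $ms2) )
      then
        element { name($node) }
                { for $i in ( $node/node() | $node/@* )
                  return local:milestone-chunk($ms1, $ms2, $i) }
      (: an element lying wholly between the milestones is copied as is :)
      else if ( $node >> $ms1 and $node << $ms2 ) then $node
      else ()
    (: attributes are only reached through an element that is being rebuilt,
       so they are always kept :)
    case attribute() return $node
    default return
      (: text, comment and processing-instruction nodes are kept
         only when they lie between the milestones :)
      if ( $node >> $ms1 and $node << $ms2 ) then $node
      else ()
};
</nowiki></pre>

[[Category:XQuery]]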