PipedStylesheets.bash

From TEIWiki
Revision as of 08:40, 26 May 2006 by Syd (talk | contribs)

Idea

This is a shell script (written in bash) that runs each stylesheet it finds in the current directory (normally the directory that also holds the script) on the specified input file, in sorted filename order, feeding the output of each stylesheet to the next. It depends on the xsltproc command, which is part of the [libxslt] library. Mac OS X users can easily obtain xsltproc via [fink].

Quick Usage Summary

Copy this script and all the stylesheets you want to execute into a single directory. Make sure that the stylesheets all end in ".xslt" and are named such that they are sorted in the order you wish to have them executed. Then run the script, specifying the input file as the only argument.

Detailed Instructions

  1. Open a command-line session on a system that has bash, e.g. Mac OS X or Debian GNU/Linux.
  2. Create a new directory (if needed), and cd into it. E.g. mkdir ~/Desktop/p42p5/; cd ~/Desktop/p42p5/.
  3. If your bash executable is not /bin/bash, then change both occurrences of the string "/bin/bash" in the script to a working path to the bash executable. You can probably find such a path with the command which bash.
  4. Copy all of the desired stylesheets into the new (and now current) directory. You may want to get all or some of the stylesheets at P4toP5, and you may want Remove-Default-Attributes.xsl. Remember, though, that it works only on the elements and attributes of TEI Lite (P4 XML) and does not understand namespaces, so it must run before Dot-two.xslt.
  5. Rename all of the stylesheets you want executed to ensure that they end in ".xslt".
  6. Rename all of the stylesheets you want executed so that they are alphabetically in the order you want them executed. E.g.:
    • 01_change2change.xslt
    • 02_id2xmlid.xslt
    • 03_default_attrs.xslt
    • 04_dot2.xslt
    • 05_dateStructless.xslt
  7. Run the script, giving the path to an input P4 document as the only argument.
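    • By default the output filename is derived from the input file's name (run the script with an unrecognized option, e.g. -h, to see the details). Note that for the recognized extensions the script as written strips the directory part with basename, so the output lands in the current directory; only for an unrecognized extension is it written next to the input file.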
    • You can specify an output file if you don't like the default name or want it to go to a different directory.
    • You can specify a -d switch to have the temporary file left for debugging purposes.
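
Steps 5 and 6 can be sketched as follows. This is only an illustration under stated assumptions: it works in a scratch directory created with mktemp, and the stylesheet names are empty stand-in files rather than the real P4toP5 stylesheets. It shows that a glob such as ./*.xslt expands in sorted order, which is the order the driver script will use.

```shell
# Work in a scratch directory so nothing real is touched (assumption:
# in practice you would be in the directory created in step 2).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for downloaded stylesheets; the names are illustrative only.
touch change2change.xsl id2xmlid.xsl dot2.xsl

# Steps 5 and 6: give each file a numeric prefix and the ".xslt" extension.
n=1
for name in change2change id2xmlid dot2; do
    mv "$name.xsl" "$(printf '%02d' "$n")_$name.xslt"
    n=$((n + 1))
done

# The driver script loops over ./*.xslt; the shell expands that glob in
# sorted order, so this listing is also the order of execution.
order=$(printf '%s\n' ./*.xslt)
echo "$order"
```

Zero-padded prefixes keep the order stable once there are ten or more stylesheets.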

Known Bugs, Problems, Limitations, etc.

  • Does not check to see if the output directory is writable.
  • If any one stylesheet fails, the others are still attempted.
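
One possible way to address both limitations is sketched below. It is not part of the script: check_out_dir and run_steps are hypothetical helpers, and the demo uses tr and sed as stand-ins for xsltproc invocations so that it can be tried without any stylesheets. Instead of one long pipeline, each step writes to an intermediate file, which makes it possible to stop at the first failure.

```shell
# Hypothetical helper: succeed only if the directory that would hold
# the output file exists and is writable.
check_out_dir() {
    out_dir=$(dirname "$1")
    [ -d "$out_dir" ] && [ -w "$out_dir" ]
}

# Hypothetical helper: run each step on the output of the previous one,
# using intermediate files, and stop at the first step that fails.
# Usage: run_steps INPUT OUTPUT STEP...
# Each STEP is a command line that reads stdin and writes stdout; for real
# stylesheets it would be something like "xsltproc ./01_change2change.xslt -".
run_steps() {
    steps_in=$1
    steps_out=$2
    shift 2
    cur=$(mktemp) || return 1
    nxt=$(mktemp) || return 1
    cp "$steps_in" "$cur" || return 1
    for step in "$@"; do
        if ! sh -c "$step" < "$cur" > "$nxt"; then
            echo "step failed, stopping: $step" >&2
            rm -f "$cur" "$nxt"
            return 1
        fi
        mv "$nxt" "$cur"
    done
    rm -f "$nxt"
    mv "$cur" "$steps_out"
}

# Demo with stand-in commands, so xsltproc is not needed to try it.
demo_dir=$(mktemp -d)
printf 'abc\n' > "$demo_dir/in.txt"
if check_out_dir "$demo_dir/out.txt"; then
    run_steps "$demo_dir/in.txt" "$demo_dir/out.txt" "tr a-z A-Z" "sed s/B/x/"
fi
cat "$demo_dir/out.txt"
```

The trade-off is extra disk traffic for the intermediate files; in exchange, no output file is written when any step fails.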

Program code itself

#! /bin/bash

# pipedStylesheets.bash
#
# Driver script to automate a large portion of the process to convert
# a TEI P4 (XML) instance into a TEI P5 instance.
#
# This script has been tested on only a Mac OS X and a Debian
# GNU/Linux system. It is not intended to run on any other, but it
# might.
#
# Copyleft 2006 by Syd Bauman and the Text Encoding Initiative
# Consortium

# set up various constants for use later
TEMP=/tmp
TMPFILE=./`basename $0 .bash`_$$_temp.bash
DEBUG=false

# establish an error exit procedure
function error {
    echo "---------"
    D=`date "+%FT%T"`
    echo "fatal error: $* at $D."
    exit 1
}

# subroutine to derive an output filename from the input
# Note: this routine has the side-effect of setting $OUT
function outPath {
    case "${1##*.}" in
        xml   ) OUT="$(basename "$1" .xml).tei" ;;
        teip4 ) OUT="$(basename "$1" .teip4).teip5" ;;
        p4    ) OUT="$(basename "$1" .p4).p5" ;;
        tei   ) OUT="$(basename "$1" .tei).teip5" ;;
        *     ) OUT="${1}.xml" ;;
    esac
}

# ensure that we have a working xsltproc
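# (single quotes, not backticks: backticks inside double quotes would
# try to run xsltproc instead of printing its name)
echo "Checking for executable 'xsltproc':"
which xsltproc || error "I could not find an 'xsltproc' command to run, so I'm giving up"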

# process options
# Currently, there is only 1 option: -d for debug
while getopts ":d" opt; do
    case $opt in
	d ) DEBUG=true ;;
	* ) echo "usage: $0 [-d] path/to/input [path/to/output]"
	    echo "(Any names can be used if both input & output"
	    echo "are specified; if input ends in a recognized"
	    echo "extension, the output is the same file with a"
	    echo "modified extension:"
	    echo " IN	  OUT"
	    echo ".xml   .tei"
	    echo ".teip4 .teip5"
	    echo ".p4    .p5"
	    echo ".tei   .teip5"
	    echo "If input extension is not recognized, output"
	    echo "has extension '.xml' appended)"
	    exit 1
    esac
done
shift $(( $OPTIND -1 ))

# ensure that we have the right number of arguments (1 or 2),
# and create an output filepath
case "$#" in 
    0 ) error "Input file not specified" ;;
    1 ) outPath "$1" ;;
    2 ) OUT=$2 ;;
    # $1 and $2 are the input and output; anything after them is extraneous
    * ) shift 2; error "Extraneous operand(s) ($*)" ;;
esac

# Assemble a large pipeline command that takes $1 as the
# input file, executes each and every stylesheet (defined
# as a file ending in '.xslt') in the current directory in
# the order found, and puts the output in $OUT. Let the user
# know which stylesheets we found as we go along.
COMMAND="cat $1"
echo; echo "The following stylesheets will be performed (in the order listed):"
for stylesheet in ./*.xslt ; do
    echo "  $stylesheet"
    COMMAND="$COMMAND | xsltproc $stylesheet -"
done
COMMAND="$COMMAND > $OUT"

# I have no idea how to execute this lovely command from directly
# within this script. So generate another script with this as its
# content, and execute it.
FILE="#! /bin/bash"
echo "$FILE" > $TMPFILE
echo "$COMMAND" >> $TMPFILE
echo
echo "ready, set, go!"
echo "(Output following this line is from execution of stylesheets)"
time source $TMPFILE

# if we're in debug mode, keep the temp file; otherwise
# nuke it
if [ "$DEBUG" != "true" ] ; then
    rm $TMPFILE
fi
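
The comment in the script says its author did not know how to run the assembled command directly. One common approach, sketched here as an alternative and not as what the script does, is the shell builtin eval, which re-parses a string as a command line so that the pipe characters inside it take effect. The demo builds a pipeline string the same way the script does, with printf, tr and sed standing in for cat and xsltproc.

```shell
# Build the pipeline as a string, as the script does, but with stand-in
# commands instead of "cat $1" and "xsltproc $stylesheet -".
COMMAND="printf 'tei\n'"
for step in "tr a-z A-Z" "sed s/E/3/"; do
    COMMAND="$COMMAND | $step"
done

# eval re-parses the string, so the "|" characters become real pipes.
result=$(eval "$COMMAND")
echo "$result"
```

Like the temporary-script approach, eval re-parses filenames too, so paths containing spaces or shell metacharacters would still need quoting when the string is built.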

Syd 07:29, 26 May 2006 (BST)