Ways to Publish TEI on the Web
There is a whole range of options for publishing TEI on the internet. The choices groups make largely reflect their available skills and funding, and the goals and scope of the effort.
The root URL of your published TEI needs to be consistent, both to enable Google searching and to allow users to re-find your content. The URL can also be an important part of the branding of your effort.
- Some efforts choose to be in sub-domains of academic institutions ( http://epidoc.cch.kcl.ac.uk/inscriptions/index.html , http://www.ota.ox.ac.uk/ , etc.). These sub-domains are typically free and brand the published TEI with the institution's identity.
- Others choose to have their own domains ( http://papyri.info/idp_static/current/ , http://www.freedict.org/ ). These are most commonly cross-institutional collaborations, and the domains typically cost a small amount of money to register and maintain.
- Some hosted services (e.g., Google Sites) have their own fixed domain that users get little or no say in.
Many systems have difficulty with file names containing non-ASCII characters or spaces (this is very common with poorly written scripts on UNIX platforms). Others have difficulty with file names that differ only by case (DOS-compatible file systems are unable to make this distinction). It is recommended that you avoid such file names unless absolutely necessary, or unless you have already tested that they're permitted on your chosen system.
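The file name advice above can be checked mechanically before publishing. The following is a minimal sketch, not a definitive tool: the function name `risky_filenames` and the sample file names are invented for illustration, and it only looks at the three problems mentioned above (non-ASCII characters, spaces, and names that differ only by case).

```python
from collections import defaultdict


def risky_filenames(names):
    """Return a dict mapping each kind of problem to the offending file names."""
    problems = {"non_ascii": [], "spaces": [], "case_collisions": []}
    by_lowercase = defaultdict(set)
    for name in names:
        if not name.isascii():
            problems["non_ascii"].append(name)
        if any(ch.isspace() for ch in name):
            problems["spaces"].append(name)
        # Names that fold to the same lowercase form collide on
        # case-insensitive file systems.
        by_lowercase[name.lower()].add(name)
    for group in by_lowercase.values():
        if len(group) > 1:
            problems["case_collisions"].extend(sorted(group))
    return problems


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sample = ["letter01.xml", "Letter01.xml", "my notes.xml", "café.xml"]
    for kind, names in risky_filenames(sample).items():
        print(kind, names)
```

In practice the list of names would come from walking the directory you intend to publish, for example with `os.walk`.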
Single file vs Multiple files
Some groups keep all their TEI data in a single file; others spread it across multiple files. In the latter case the files are combined either in a database or in XML (using a technique such as XPointer). Advantages of the single file approach include:
- Conceptually easy
- Easy editing
- Header information doesn't need to be duplicated across multiple files
- Constraints (schema / Schematron validity) easier to check
Advantages of the multiple file approach include:
- File sizes kept within the range of conventional editors
- Finer-grained operations (updating, indexing, version control, etc.) possible
- Better for multi-person teams, as different people can work on different documents
- Allows explicit division of TEI into logical units for processing and distribution.
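As a rough illustration of combining multiple files in XML, the sketch below uses XInclude, the inclusion mechanism with which XPointer expressions are normally used, via Python's standard `xml.etree.ElementInclude` module. Several things here are assumptions made for the example: the file names, the abbreviated (not schema-valid) TEI content, and the choice to hold the "files" in an in-memory dictionary so the sketch is self-contained. Note that `ElementInclude` handles whole-document inclusion only and does not evaluate `xpointer` attributes.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical file contents, kept in memory instead of on disk.
# The TEI here is abbreviated and would not validate against a TEI schema.
FILES = {
    "corpus.xml": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0" '
        'xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<teiHeader><fileDesc><titleStmt><title>Sample corpus</title>"
        "</titleStmt></fileDesc></teiHeader>"
        "<text><body>"
        '<xi:include href="letter1.xml"/>'
        '<xi:include href="letter2.xml"/>'
        "</body></text></TEI>"
    ),
    "letter1.xml": (
        '<div xmlns="http://www.tei-c.org/ns/1.0" xml:id="letter1">'
        "<p>First letter.</p></div>"
    ),
    "letter2.xml": (
        '<div xmlns="http://www.tei-c.org/ns/1.0" xml:id="letter2">'
        "<p>Second letter.</p></div>"
    ),
}


def load_part(href, parse, encoding=None):
    """Loader for ElementInclude: look the href up in FILES, not on disk."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(FILES[href])


def assemble(root_name):
    """Parse the root document and replace each xi:include with its target."""
    root = ET.fromstring(FILES[root_name])
    ElementInclude.include(root, loader=load_part)
    return root


if __name__ == "__main__":
    combined = assemble("corpus.xml")
    for div in combined.iter(f"{{{TEI_NS}}}div"):
        print(div.get("{http://www.w3.org/XML/1998/namespace}id"))
```

With this arrangement the header lives once in the root file, while each letter can be edited and versioned separately.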
Ensuring Google can index your content
If the findability of your content on popular search engines is important to you, there are a few points to bear in mind.
- Use short, sane URLs that are unique before the options and search terms (for a counter-example see http://www.nzdl.org/niupepa whose content is not findable by Google). This means that if you're using a database to present your TEI, the core texts should be reachable by browsing simple URLs rather than by entering search terms, and should be findable at URLs that are unique prior to any '?' in the URL.
- Use quality meta-data in the generated HTML
- If you have a page about "Nouns in Klingon" then the string "Nouns in Klingon" should occur in the HTML <title> tag, a dc.title <meta> tag and an <h1>, <h2> or <h3> tag on the page. Preferably it should be mentioned in the text too. It should appear in the language you want people to be able to search in (English, Klingon, etc.).
- Try not to move the content around. Doing so will confuse both Google and people who have bookmarked it. See also http://www.w3.org/TR/cooluris/
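Two of the points above lend themselves to a short sketch: repeating the page title in the places search engines look, and checking that pages are distinguishable before the '?' in their URLs. The function names, the `example.org` URLs and the page text below are invented for illustration; this is one possible approach, not a prescribed one.

```python
from html import escape
from urllib.parse import urlsplit


def page_html(title, body_html, lang="en"):
    """Build a minimal HTML page with the title in <title>, dc.title and <h1>."""
    safe = escape(title, quote=True)
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang, quote=True)}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{safe}</title>\n"
        f'<meta name="dc.title" content="{safe}">\n'
        "</head>\n"
        "<body>\n"
        f"<h1>{safe}</h1>\n"
        f"{body_html}\n"
        "</body>\n"
        "</html>\n"
    )


def urls_sharing_a_path(urls):
    """Return groups of URLs that are identical before the '?'."""
    seen = {}
    for url in urls:
        parts = urlsplit(url)
        key = (parts.scheme, parts.netloc, parts.path)
        seen.setdefault(key, []).append(url)
    return [group for group in seen.values() if len(group) > 1]


if __name__ == "__main__":
    print(page_html("Nouns in Klingon",
                    "<p>This page describes Nouns in Klingon.</p>"))
    # Hypothetical URLs: the two "view" URLs differ only after the '?'.
    print(urls_sharing_a_path([
        "http://example.org/texts/letter1.html",
        "http://example.org/view?doc=1",
        "http://example.org/view?doc=2",
    ]))
```

Any group returned by `urls_sharing_a_path` marks pages that would be better served from distinct paths.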
Here are two separate tables, one describing the form of the content presented on the web, the other describing how that might be hosted:
Presentation of content
|Method||Pros||Cons||Skills needed||Intra-site searching possible||Dynamic KWIC etc., possible||Examples|
|Publish as single plain TEI/XML file||Simplest possible solution||Meaningless to a standard web browser / user||None||No||No|
|Publish as single plain TEI/XHTML file with a CSS file for rendering||Just need to import a CSS file into XHTML and write a rule for text to be displayed||Cannot reorder elements; meaningless to many search engines||TEI, CSS||No||No|
|Single TEI file + XSL Processing Instruction (PI)||Very simple two-file solution||Some obscure XSLT features not supported by all browsers, can be slow for large files.||TEI, XSLT||No||No|
|Single web page from (single TEI file + single XSL file); either pre-generated or generated dynamically||A web page like any other.||Because the TEI is hidden, third-party re-purposing or validation is impossible||TEI, XSLT||No||No|
|Single web page from (single TEI file + single XSL file); either pre-generated or generated dynamically||A web page like any other, but with a link exposing the underlying TEI file|| ||TEI, XSLT||No||No|
|Multiple web pages from (multiple TEI files + XInclude + XSL file(s))||Web pages like any other, but with links exposing the underlying TEI files; multiple files enable large, complex structures and corpora|| ||TEI, XSLT, XPointer||No||No||Documents of Ireland|
|Multiple TEI files in eXist (native XML) database + XSL for conversion of fragments to HTML||Native XML database means no shredding XML into tables, and rapid fulltext query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross platform||Requires own server (real or virtual)||TEI, XSLT||Yes||Yes|
|Multiple TEI files in eXist (native XML) database + typeswitch XQuery for conversion of fragments to HTML||Native XML database means no shredding XML into tables, and rapid fulltext query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross platform||Requires own server (real or virtual) or hosted eXist solution||TEI, XQuery||Yes||Yes||U.S. State Department|
|Multiple TEI files in relational (PostgreSQL, MySQL, etc.) database + XSL conversion of fragments to HTML||Makes inter-linking with other information sources easy||Requires own server (real or virtual)||TEI, XSLT, SQL||Yes||No (in theory yes, but hard)||NZETC|
|Put TEI files as records in RDBMS that has an XML column datatype||Can use XQuery for HTML rendering||XQuery must be embedded inside SQL.||TEI, XQuery||Yes||No (in theory yes, but hard)|
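To make the "single web page from a single TEI file, pre-generated, with a link exposing the underlying TEI" option more concrete, here is a minimal sketch. It deliberately uses Python's standard `xml.etree.ElementTree` as a stand-in for XSLT, since the standard library has no XSLT processor; a real project would more likely use an XSLT stylesheet. The element mapping in `TAG_MAP`, the function names and the `letter1.xml` file name are assumptions made for the example, and the mapping covers only a handful of TEI elements.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"

# Illustrative mapping from a few TEI elements to HTML elements.
# This is an assumption for the sketch, not a complete or standard mapping.
TAG_MAP = {"body": "div", "div": "div", "head": "h2", "p": "p", "hi": "em"}


def render(elem):
    """Recursively convert a TEI element and its content to an HTML string."""
    local = elem.tag.replace(TEI, "")
    html_tag = TAG_MAP.get(local, "span")  # unknown elements become <span>
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child) + escape(child.tail or "")
    return f"<{html_tag}>{inner}</{html_tag}>"


def tei_to_html(tei_xml, source_href):
    """Build a static HTML page from TEI, linking back to the TEI source."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(f".//{TEI}titleStmt/{TEI}title", default="Untitled")
    body = root.find(f".//{TEI}text/{TEI}body")
    content = render(body) if body is not None else ""
    safe_title = escape(title)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{safe_title}</title>\n"
        "</head>\n<body>\n"
        f"<h1>{safe_title}</h1>\n"
        f"{content}\n"
        f'<p><a href="{escape(source_href, quote=True)}">TEI source</a></p>\n'
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Abbreviated, hypothetical TEI document (not schema-valid).
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
        "<titleStmt><title>A sample letter</title></titleStmt></fileDesc>"
        "</teiHeader><text><body><div><head>Intro</head>"
        "<p>Some <hi>emphasised</hi> text.</p></div></body></text></TEI>"
    )
    print(tei_to_html(sample, "letter1.xml"))
```

Running the conversion once at publication time and serving the resulting HTML alongside the TEI file keeps the hosting requirements to plain static files.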
Hosting of Content
|Method||Pros||Cons||Examples|
|Local server||Full control. Can dual-use a server (or desktop)||Need hardware. Need reliable internet connection. Need technical skills. Backups important||NZETC|
|Shared server||Pooled skills and resources||Conflicting requirements. Need reliable internet connection. Backups important|
|Virtual server||Full control. Security handled by third party.||Need technical skills. Costs money. Backups important|
|Hosting on Google Sites||Free, unlimited HTML pages (size and number)||No dynamic content (no eXist or other databases possible). Backups important|
Many service-oriented TEI publishers have (at least) a pair of servers: a primary (or live) server, and a development server that is used for testing new developments and can be pressed into service to replace the primary should it develop a fault. This gives these publishers fault tolerance. Such systems are significantly easier to construct with the flexibility of virtual servers.
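The primary/standby arrangement described above could be monitored with something as simple as the following sketch. It is only an illustration of the idea: the server URLs are hypothetical, the health check is a plain HTTP request, and actually redirecting traffic (for example by updating DNS or a proxy configuration) is outside its scope.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_up(url, timeout=5):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError, ValueError):
        # Covers HTTP errors, connection failures, timeouts and bad URLs.
        return False


def choose_server(servers, probe=is_up):
    """Return the first server, in priority order, that passes the probe."""
    for url in servers:
        if probe(url):
            return url
    return None


if __name__ == "__main__":
    # Hypothetical addresses: the live server first, then the development
    # server that doubles as a standby.
    servers = ["https://live.example.org/", "https://dev.example.org/"]
    print(choose_server(servers))
```

Passing the probe as a parameter keeps the selection logic testable without network access.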