Ways to Publish TEI on the Web

From TEIWiki
Revision as of 23:25, 23 April 2010 by Piotr Banski (talk | contribs) (+ dummy category, just to hook the article up in the tree)

Methods for publishing TEI content to the web

There is a whole range of options for publishing TEI on the internet. The choices a group makes largely reflect its available skills and funding, and the goals and scope of the effort.

Domains

The root URL of your published TEI needs to stay consistent, both to enable Google searching and to allow users to re-find your content. The URL can also be an important part of the branding of your effort.

Filenames

Many systems have difficulty with filenames containing non-ASCII characters or spaces (this is very common with poorly written scripted systems on UNIX platforms). Others have difficulty with multiple filenames which differ only by case (DOS-compatible filesystems are unable to make this distinction). It is recommended that you avoid such filenames unless absolutely necessary, or unless you have already tested that they're permitted on your chosen system.
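The checks described above can be automated before files are uploaded. The following is a minimal Python sketch, not part of any TEI tool; the function name risky_filenames and the sample filenames are assumptions made for illustration.

```python
from collections import defaultdict


def risky_filenames(names):
    """Map each problematic filename to the list of reasons it is risky."""
    problems = defaultdict(list)
    by_lowercase = defaultdict(list)
    for name in names:
        if not name.isascii():
            problems[name].append("non-ASCII character")
        if any(ch.isspace() for ch in name):
            problems[name].append("whitespace")
        # Group names by their lowercase form to find case-only clashes.
        by_lowercase[name.lower()].append(name)
    for group in by_lowercase.values():
        if len(set(group)) > 1:
            for name in group:
                problems[name].append("differs from another name only by case")
    return dict(problems)


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    sample = ["moby_dick.xml", "Moby_Dick.xml", "tale of two.xml", "ok.xml"]
    for name, reasons in risky_filenames(sample).items():
        print(name, "->", ", ".join(reasons))
```

A check like this only reports candidates; whether a given name is actually a problem still depends on the target system.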

Single file vs Multiple files

Groups may choose to keep all their TEI data in a single file or to spread it across multiple files. In the latter case the files are combined at processing time, either in a database or in XML (using a technique such as XPointer). Advantages of the single file approach include:

  • Conceptually easy
  • Easy editing
  • Header information doesn't need to be duplicated across multiple files
  • Constraints (schema / schematron validity) easier to check

Advantages of the multiple file approach include:

  • File sizes kept within the range of conventional editors
  • Finer-grained operations (updating, indexing, version control, etc.) possible
  • Better for multi-person teams, as different people can work on different documents
  • Allows explicit division of TEI into logical units for processing and distribution
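One way to combine multiple files in XML is XInclude, which Python's standard library supports through xml.etree.ElementInclude. The sketch below is illustrative only: the file names and contents are hypothetical and held in memory so that it runs anywhere, and the TEI header is abbreviated and would not validate against a TEI schema.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical chapter files, kept in a dict instead of on disk.
FILES = {
    "chapter1.xml": f'<div xmlns="{TEI_NS}" n="1"><p>First chapter.</p></div>',
    "chapter2.xml": f'<div xmlns="{TEI_NS}" n="2"><p>Second chapter.</p></div>',
}

# Master file: one shared header, with the body pulled in via xi:include.
MASTER = f"""<TEI xmlns="{TEI_NS}" xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader><fileDesc><titleStmt><title>Example</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <xi:include href="chapter1.xml"/>
    <xi:include href="chapter2.xml"/>
  </body></text>
</TEI>"""


def load_from_memory(href, parse, encoding=None):
    """Loader callback for ElementInclude: resolve href against FILES."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def combine(master_xml):
    """Parse the master document and expand every xi:include in place."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=load_from_memory)
    return root


if __name__ == "__main__":
    combined = combine(MASTER)
    print(ET.tostring(combined, encoding="unicode"))
```

With real files, the loader argument can be omitted so that ElementInclude reads each href from disk.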

Ensuring Google can index your content

If the findability of your content on popular search engines is important to you, there are a couple of points to bear in mind.

  • Use short, sane URLs that are unique before any options or search terms (for a counter-example see http://www.nzdl.org/niupepa which is not findable by Google). This means that if you're using a database to present your TEI, the core texts should be reachable by browsing simple URLs rather than by entering search terms.
  • Use quality metadata in the generated HTML.
  • If you have a page about "Nouns in Klingon" then the string "Nouns in Klingon" should occur in the HTML <title> tag, in a dc.title <meta> tag and in an <h1>, <h2> or <h3> tag on the page. Preferably it should be mentioned in the text too. It should appear in the language you want people to be able to search in (English, Klingon, etc.).
  • Try not to move the content around. Moving it will confuse both Google and the people who have bookmarked it.
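The points above about URLs and titles can be sketched in Python as follows. This is a minimal illustration rather than a recommended generator: the function names slugify and render_page are assumptions, and a real site would normally produce this markup from its XSLT or templating layer.

```python
import re
import unicodedata
from html import escape


def slugify(title):
    """Turn a page title into a short, lowercase, ASCII-only URL segment."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def render_page(title, body_html, lang="en"):
    """Repeat the page title in <title>, a dc.title <meta> tag and an <h1>."""
    safe = escape(title, quote=True)
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{safe}</title>\n"
        f'<meta name="dc.title" content="{safe}">\n'
        "</head>\n"
        "<body>\n"
        f"<h1>{safe}</h1>\n"
        f"{body_html}\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(slugify("Nouns in Klingon"))  # nouns-in-klingon
    print(render_page("Nouns in Klingon", "<p>Klingon nouns take suffixes.</p>"))
```

Note that this slugify discards characters with no ASCII decomposition, so titles in non-Latin scripts would need a different scheme.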


Here are two separate tables: one describes the form in which the content is presented on the web, the other describes how it might be hosted.

Table of ways to serve TEI on the web
| Method | Pros | Cons | Skills needed | Searching possible | Dynamic KWIC etc. possible | Examples |
| --- | --- | --- | --- | --- | --- | --- |
| Publish as single plain TEI/XML file | Simplest possible solution | Meaningless to a standard web browser / user | None | No | No | |
| Single TEI file + XSL Processing Instruction (PI) | Very simple two-file solution | Some obscure XSLT features not supported by all browsers | XML, TEI, XSLT | No | No | |
| Single web page from (single TEI file + single XSL file) | A web page like any other | Because TEI is hidden, third party repurposing or validation is impossible | XML, TEI, XSLT | No | No | |
| Single web page from (single TEI file + single XSL file) + TEI file | A web page like any other, but a link to web | | XML, TEI, XSLT | No | No | |
| Multiple web pages from (multiple TEI files + XInclude + XSL file(s)) + TEI files | A web page like any other, but a link to web. Multiple files enable large, complex structures and corpora | | XML, TEI, XSLT, XPointer | No | No | |
| Multiple TEI files in eXist (native XML) database + XSL and/or XQuery for conversion of fragments to HTML | Native XML database means no shredding XML into tables, and rapid fulltext query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross platform | Requires own server (real or virtual) | XML, TEI, XSLT, XQuery | Yes | Yes | U.S. State Department |
| Multiple TEI files in post-relational (SQL) database + XSL conversion of fragments to HTML | Makes inter-linking with other information sources easy | Requires own server (real or virtual) | XML, TEI, XSLT, SQL | Yes | No (in theory yes, but hard) | NZETC |
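For the "Single TEI file + XSL Processing Instruction" approach, the TEI file needs an xml-stylesheet processing instruction pointing at the stylesheet. The Python sketch below shows one way to add it to an existing file's text; the function name add_stylesheet_pi and the stylesheet name tei2html.xsl are hypothetical.

```python
import re


def add_stylesheet_pi(tei_xml, xsl_href):
    """Insert an xml-stylesheet PI after the XML declaration, if there is one.

    Assumes xsl_href contains no double quotes or other characters that
    would need escaping.
    """
    pi = f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
    # The XML declaration, when present, must stay first in the file.
    match = re.match(r"<\?xml\s[^>]*\?>\s*", tei_xml)
    if match:
        return tei_xml[: match.end()] + pi + tei_xml[match.end():]
    return pi + tei_xml


if __name__ == "__main__":
    doc = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'
    )
    print(add_stylesheet_pi(doc, "tei2html.xsl"))
```

A browser that supports XSLT then applies the stylesheet when it loads the TEI file, so only the two files need to be uploaded.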

Remember that for polished websites at least some CSS (Cascading Style Sheets) skills are going to be required. For pages with at least some dynamism (drop-down menus, expanding sections, etc.), a little JavaScript skill goes a long way.

How to host your TEI website
| Method | Pros | Cons | Examples |
| --- | --- | --- | --- |
| Local server | Full control. Can dual-use a server (or desktop) | Need hardware. Need reliable internet connection. Need technical skills. Backups important | NZETC |
| Shared server | Pooled skills and resources | Conflicting requirements. Need reliable internet connection. Backups important | |
| Virtual server | Full control. Security handled by third party | Need technical skills. Costs $. Backups important | |
| Hosting on Google Sites | Free, unlimited HTML pages (size and number) | No dynamic content (no eXist or other databases possible). Backups important | |

Many service-oriented TEI publishers have (at least) a pair of servers: a primary (or live) server, and a development server that is used for testing new developments and can also be pressed into service to replace the primary server should it develop a fault. This gives these publishers fault tolerance. Such systems are significantly easier to construct with the flexibility of virtual servers.
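The live/development pairing can be illustrated with a small Python sketch that picks the first server passing a health check. Everything here is an assumption for illustration: the host names are placeholders, and real failover is usually handled by DNS, a load balancer or the hosting platform rather than by a script like this.

```python
import urllib.request


def http_ok(url, timeout=5):
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # Covers connection errors, timeouts, HTTP errors and malformed URLs.
        return False


def pick_server(servers, is_healthy=http_ok):
    """Return the first server, in priority order, that passes the check."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy server available")


if __name__ == "__main__":
    # Placeholder host names: live server first, development server as fallback.
    candidates = ["https://live.example.org/", "https://dev.example.org/"]
    try:
        print("serving from", pick_server(candidates))
    except RuntimeError as error:
        print(error)
```

Passing the health check in as a function keeps the selection logic testable without any network access.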