Ways to Publish TEI on the Web

From TEIWiki

There is a whole range of options for publishing TEI on the internet. The choices a group makes largely reflect its available skills and funding and the goals and scope of the effort.

Domains

The root URL of your published TEI needs to be consistent, both to enable Google searching and to allow users to re-find your content. The URL can also be an important part of the branding of your effort.

File names

Many systems have difficulty with file names containing non-ASCII characters or spaces (this is very common with poorly written scripted systems on UNIX platforms). Others have difficulty with file names that differ only by case (DOS-compatible file systems are unable to make this distinction). It is recommended that you avoid such names unless absolutely necessary or unless you have already tested that they're permitted on your chosen system.
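The checks described above can be automated before files are uploaded. The following is a minimal sketch, not a definitive tool: the function name `risky_file_names` and the three problem categories are assumptions made for illustration.

```python
from collections import defaultdict


def risky_file_names(names):
    """Return a dict mapping each kind of problem to the offending names."""
    problems = {"non_ascii": [], "spaces": [], "case_clash": []}
    by_lower = defaultdict(list)
    for name in names:
        # Non-ASCII characters trip up some scripted systems.
        if not name.isascii():
            problems["non_ascii"].append(name)
        # Any whitespace (spaces, tabs) is reported under "spaces".
        if any(ch.isspace() for ch in name):
            problems["spaces"].append(name)
        by_lower[name.lower()].append(name)
    # Names that differ only by case collide on case-insensitive file systems.
    for group in by_lower.values():
        distinct = sorted(set(group))
        if len(distinct) > 1:
            problems["case_clash"].append(distinct)
    return problems
```

In practice the list of names would typically come from something like `os.listdir()` on the directory you intend to publish.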

Single file vs Multiple files

Groups may choose either to keep all their TEI data in a single file or to spread it across multiple files. In the latter case the files are combined, either in a database or in XML (using a technique such as XInclude or XPointer). Advantages of the single file approach include:

  • Conceptually easy
  • Easy editing
  • Header information doesn't need to be duplicated across multiple files
  • Constraints (schema / Schematron validity) easier to check

Advantages of the multiple file approach include:

  • File sizes kept within the range of conventional editors
  • Finer-grained operations (updating, indexing, version control, etc.) possible
  • Better for multi-person teams, as different people can work on different documents
  • Allows explicit division of TEI into logical units for processing and distribution.
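As an illustration of combining multiple files in XML, here is a minimal sketch that expands XInclude references using Python's standard library. The function name `combine_tei` and the layout (a master file whose `href` values are relative to its own directory) are assumptions. Note that `xml.etree.ElementInclude` only handles whole-file inclusion and does not evaluate `xpointer` attributes.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.etree import ElementInclude


def combine_tei(master_path):
    """Parse a master file and expand its xi:include elements in place."""
    master = Path(master_path)
    root = ET.parse(master).getroot()

    def load_relative(href, parse, encoding=None):
        # Resolve each href against the master file's directory.
        target = master.parent / href
        if parse == "xml":
            return ET.parse(target).getroot()
        # parse="text": return the file's contents as a string.
        return target.read_text(encoding=encoding or "utf-8")

    ElementInclude.include(root, loader=load_relative)
    return root
```

The returned element can then be validated or transformed as if it had been a single file all along, which recovers some of the single-file advantages at processing time.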

Ensuring Google can index your content

If the findability of your content on popular search engines is important to you, there are a couple of points to bear in mind.

  • Use short, sane URLs that are unique before the options and search terms (for a counter-example see http://www.nzdl.org/niupepa whose content is not findable by Google). This means that if you're using a database to present your TEI, the core texts should be reachable by browsing simple URLs rather than by entering search terms, and should be findable at URLs that are unique prior to any '?' in the URL.
  • Use quality metadata in the generated HTML.
  • If you have a page about "Nouns in Klingon" then the string "Nouns in Klingon" should occur in the HTML <title> tag, a dc.title <meta> tag and in a <h1>, <h2> or <h3> tag on the page. Preferably it should be mentioned in the text too. It should appear in the language you want people to be able to search in (English, Klingon, etc.).
  • Try not to move the content around. This will confuse both Google and ordinary people who've bookmarked it. See also http://www.w3.org/TR/cooluris/
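Two of the points above lend themselves to a quick automated check. The sketch below is illustrative only: the names `TitleAudit`, `missing_title_slots` and `urls_distinguished_only_by_query` are made up for this example, and the `example.org` URLs in the usage note are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TitleAudit(HTMLParser):
    """Collect the text of <title>, <h1>-<h3>, and any dc.title <meta>."""

    def __init__(self):
        super().__init__()
        self.found = {"title": "", "heading": "", "dc.title": ""}
        self._open = None  # slot currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._open = "title"
        elif tag in ("h1", "h2", "h3"):
            self._open = "heading"
        elif tag == "meta" and (attrs.get("name") or "").lower() == "dc.title":
            self.found["dc.title"] += attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag in ("title", "h1", "h2", "h3"):
            self._open = None

    def handle_data(self, data):
        if self._open:
            self.found[self._open] += data


def missing_title_slots(html, phrase):
    """Return the slots (title, heading, dc.title) that lack the phrase."""
    audit = TitleAudit()
    audit.feed(html)
    audit.close()
    return [slot for slot, text in audit.found.items() if phrase not in text]


def urls_distinguished_only_by_query(urls):
    """Group URLs by the part before '?'; return groups with several URLs."""
    groups = {}
    for url in urls:
        parts = urlsplit(url)
        key = f"{parts.scheme}://{parts.netloc}{parts.path}"
        groups.setdefault(key, set()).add(url)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}
```

For example, `missing_title_slots(page_html, "Nouns in Klingon")` returns an empty list when all three places carry the phrase, and `urls_distinguished_only_by_query` flags pairs such as `http://example.org/read?doc=1` and `http://example.org/read?doc=2`, which differ only after the '?'.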

Here are two separate tables, one describing the form of the content presented on the web, the other describing how that might be hosted:

Presentation of content

Table of ways to serve TEI on the web
| Method | Pros | Cons | Skills needed | Intra-site searching possible | Dynamic KWIC etc. possible | Examples |
|---|---|---|---|---|---|---|
| Publish as single plain TEI/XML file | Simplest possible solution | Meaningless to a standard web browser / user | None | No | No | |
| Publish as single plain TEI/XHTML file with a CSS file for rendering | Just need to import a CSS file into XHTML and write a rule for text to be displayed | Cannot reorder tags. Meaningless to many search engines | TEI, CSS | No | No | |
| Single TEI file + XSL Processing Instruction (PI) | Very simple two-file solution | Some obscure XSLT features not supported by all browsers; can be slow for large files | TEI, XSLT | No | No | |
| Single web page from (single TEI file + single XSL file); either pre-generated or generated dynamically | A web page like any other | Because TEI is hidden, third-party re-purposing or validation is impossible | TEI, XSLT | No | No | |
| Single web page from (single TEI file + single XSL file); either pre-generated or generated dynamically | A web page like any other, but with a link exposing the underlying TEI file | | TEI, XSLT | No | No | |
| Multiple web pages from (multiple TEI files + XInclude + XSL file(s)) | A web page like any other, but with a link exposing the underlying TEI files. Multiple files enable large, complex structures and corpora | | TEI, XSLT, XPointer | No | No | Documents of Ireland |
| Multiple TEI files in eXist (native XML) database + XSL for conversion of fragments to HTML | Native XML database means no shredding XML into tables, and rapid full-text query across some or all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross-platform | Requires own server (real or virtual) | TEI, XSLT | Yes | Yes | |
| Multiple TEI files in eXist (native XML) database + typeswitch XQuery for conversion of fragments to HTML | Native XML database means no shredding XML into tables, and rapid full-text query across some or all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross-platform | Requires own server (real or virtual) or hosted eXist solution | TEI, XQuery | Yes | Yes | U.S. State Department |
| Multiple TEI files in relational (PostgreSQL, MySQL, etc.) database + XSL conversion of fragments to HTML | Makes inter-linking with other information sources easy | Requires own server (real or virtual) | TEI, XSLT, SQL | Yes | No (in theory yes, but hard) | NZETC |
| Put TEI files as records in an RDBMS that has an XML column datatype | Can use XQuery for HTML rendering | XQuery must be embedded inside SQL | TEI, XQuery | Yes | No (in theory yes, but hard) | |
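To make the "pre-generated web page with a link exposing the underlying TEI file" option concrete, here is a minimal sketch. It deliberately swaps XSLT for a plain ElementTree walk, because Python's standard library has no XSLT processor; a real project would more likely use an XSLT stylesheet as the table describes. The element mapping `TAG_MAP` and the function names are assumptions and cover only a handful of TEI elements.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"
# Hypothetical, deliberately tiny mapping from TEI element names to HTML tags.
TAG_MAP = {"div": "div", "head": "h2", "p": "p", "hi": "em"}


def to_html(elem):
    """Recursively render one TEI element and its descendants as HTML."""
    name = elem.tag.replace(TEI, "")
    tag = TAG_MAP.get(name, "span")  # unmapped elements fall back to <span>
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child) + escape(child.tail or "")
    return f"<{tag}>{inner}</{tag}>"


def render_page(tei_xml, tei_url):
    """Build a full HTML page from a TEI string, linking back to the source."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(f".//{TEI}titleStmt/{TEI}title", default="Untitled")
    body = root.find(f".//{TEI}body")
    if body is None:
        raise ValueError("TEI document has no <body>")
    content = "".join(to_html(child) for child in body)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1>{content}"
        f'<p><a href="{escape(tei_url, quote=True)}">TEI source</a></p>'
        "</body></html>"
    )
```

Running `render_page` once per file and saving the result gives the pre-generated variant; calling it from a web handler on each request gives the dynamic one.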

Remember that for polished websites at least some CSS (Cascading Style Sheets) skills are going to be required. For pages with at least some dynamism (drop-down menus, expanding sections, etc.), a little JavaScript skill goes a long way.

Hosting of Content

How to host your TEI website
| Method | Pros | Cons | Examples |
|---|---|---|---|
| Local server | Full control. Can dual-use a server (or desktop) | Need hardware. Need reliable internet connection. Need technical skills. Backups important | NZETC |
| Shared server | Pooled skills and resources | Conflicting requirements. Need reliable internet connection. Backups important | |
| Virtual server | Full control. Security handled by third party | Need technical skills. Costs $. Backups important | |
| Hosting on Google Sites | Free, unlimited HTML pages (size and number) | No dynamic content (no eXist or other databases possible). Backups important | |

Many service-oriented TEI publishers have (at least) a pair of servers: a primary (or live) server, and a development server that is used for testing new developments and can also be pressed into service to replace the primary should it develop a fault. This gives these publishers fault tolerance. Such systems are significantly easier to construct with the flexibility of virtual servers.
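The fallback logic behind such a pair can be sketched in a few lines. This is only an illustration of the idea: in practice failover is usually handled by DNS, a load balancer or the hosting platform rather than by application code, and the names `pick_server` and `http_ok` are hypothetical.

```python
from urllib.error import URLError
from urllib.request import urlopen


def http_ok(url, timeout=5):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False


def pick_server(servers, is_healthy=http_ok):
    """Return the first server, in priority order, that passes the check."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy server available")
```

Listing the live server first and the development server second means the development machine is chosen only when the live one fails its health check.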
