Difference between revisions of "Ways to Publish TEI on the Web"

From TEIWiki
m (+ dummy category, just to hook the article up in the tree)
 
(16 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== Methods to publishing TEI content to the web ==
 
 
 
There are whole range of options for publishing TEI onto the internet. The choices made by groups largely reflect the available skills and funding and the goals and scope of the effort.  
 
There are whole range of options for publishing TEI onto the internet. The choices made by groups largely reflect the available skills and funding and the goals and scope of the effort.  
  
 
===Domains===
 
===Domains===
  
The root url of your published TEI needs to be consistent, but to enable google searching and to allow uses to re-find your content.  The URL can also be an important part of the branding of your effort.  
+
The root URL of your published TEI needs to be consistent, but to enable Google searching and to allow uses to re-find your content.  The URL can also be an important part of the branding of your effort.  
* Some efforts choose to be in subdomains of academic institutions ( http://epidoc.cch.kcl.ac.uk/inscriptions/index.html , http://www.ota.ox.ac.uk/ , etc) these subdomains are typically free and brand the published TEI with the institutions identity.  
+
* Some efforts choose to be in sub-domains of academic institutions ( http://epidoc.cch.kcl.ac.uk/inscriptions/index.html , http://www.ota.ox.ac.uk/ , etc.) these sub-domains are typically free and brand the published TEI with the institutions identity.  
 
* Others choose to have their own domains (http://papyri.info/idp_static/current/ http://www.freedict.org/) these are most commonly cross-institutional collaborations, these domains typically cost a small amount of money to register and maintain.  
 
* Others choose to have their own domains (http://papyri.info/idp_static/current/ http://www.freedict.org/) these are most commonly cross-institutional collaborations, these domains typically cost a small amount of money to register and maintain.  
* Some hosted services (i.e. google sites) have their own fixed domain that users get little or no say in.  
+
* Some hosted services (e.g., Google sites) have their own fixed domain that users get little or no say in.  
  
===Filenames===
+
===File names===
Many systems have difficulty with filenames containing non-ASCII characters or spaces (this is very common with poorly written scripted system on UNIX platforms). Others have difficulties with multiple filesnames which differ only by case (DOS-compatible filesystems are unable to make this distinction). It is recommended that you avoid these unless absolutely necessary or you have already tested that thery're permitted on your choose system.
+
Many systems have difficulty with file names containing non-ASCII characters or spaces (this is very common with poorly written scripted system on UNIX platforms). Others have difficulties with multiple file names which differ only by case (DOS-compatible file systems are unable to make this distinction). It is recommended that you avoid these unless absolutely necessary or you have already tested that they're permitted on your choose system.
  
 
=== Single file vs Multiple files===
 
=== Single file vs Multiple files===
Some groups may choose to have all their TEI data in a single file or spread it across multiple files. The multiple files are combined, either in a database or in XML (using a technique such as xpointer). Advantages of the single file approach include:
+
Some groups may choose to have all their TEI data in a single file or spread it across multiple files. The multiple files are combined, either in a database or in XML (using a technique such as XPointer). Advantages of the single file approach include:
 
* Conceptually easy
 
* Conceptually easy
 
* Easy editing  
 
* Easy editing  
 
* Header information doesn't need to be duplicated across multiple files
 
* Header information doesn't need to be duplicated across multiple files
* Constraints (schema / schematron validity) easier to check  
+
* Constraints (schema / Schematron validity) easier to check  
  
 
Advantages of the multiple file approach include:
 
Advantages of the multiple file approach include:
 
* Files sizes kept within the range of conventional editors
 
* Files sizes kept within the range of conventional editors
* Finer-grained operations (updating, indexing, version control, etc) possible,
+
* Finer-grained operations (updating, indexing, version control, etc.) possible,
 
* Better for for multi-person teams as different people can work on different documents
 
* Better for for multi-person teams as different people can work on different documents
 
* Allows explicit division of TEI into logical units for processing and distribution.
 
* Allows explicit division of TEI into logical units for processing and distribution.
Line 28: Line 26:
 
=== Ensuring Google can index your content ===
 
=== Ensuring Google can index your content ===
 
If the findability of your content on popular search engines is important to you, there are a couple of points to bear in mind.  
 
If the findability of your content on popular search engines is important to you, there are a couple of points to bear in mind.  
* Use short, sane URLs, unique before the options and search terms (for a counter-example see http://www.nzdl.org/niupepa which is not findable by google). This means that if you're using a database to present your TEI, the core texts should be reachable by browsing simple URLs rather than entering search terms.
+
* Use short, sane URLs, unique before the options and search terms (for a counter-example see http://www.nzdl.org/niupepa whose content is not findable by Google). This means that if you're using a database to present your TEI, the core texts should be reachable by browsing simple URLs rather than entering search terms, and should be findable at urls that are unique prior to any '?' in the URL.
* Use quality metadata in the generated the HTML
+
* Use quality meta-data in the generated HTML
* If you have a page about "Nouns in Klingon" then the string "Nouns in Klingon" should occur in the HTML <title> tag, a dc.title <meta> tag and in a <nowiki><h1></nowiki>, <nowiki><h2></nowiki> or <nowiki><h3></nowiki> tag on the page. Preferably it should be mentioned in the text too.  It should appear in language you're wanting people to be able to search in (English, Klingon, etc)
+
* If you have a page about "Nouns in Klingon" then the string "Nouns in Klingon" should occur in the HTML <title> tag, a dc.title <meta> tag and in a <nowiki><h1></nowiki>, <nowiki><h2></nowiki> or <nowiki><h3></nowiki> tag on the page. Preferably it should be mentioned in the text too.  It should appear in language you're wanting people to be able to search in (English, Klingon, etc.)
* You should try not to move the content around. This will confuse both google and normal people who've bookmarked it.
+
* You should try not to move the content around. This will confuse both google and normal people who've bookmarked it. See also [http://www.w3.org/TR/cooluris/ http://www.w3.org/TR/cooluris/]
  
 +
Here are two separate tables, one describing the form of the content presented on the web, the other describing how that might be hosted:
  
Here are two separate tables, one describes the form of the content presented on the web, the other describes how that might be hosted
+
= Presentation of content =
  
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
Line 43: Line 42:
 
! Cons
 
! Cons
 
! Skills needed
 
! Skills needed
! Searching possible
+
! Intra-site searching possible
! Dynamic KWIC etc, possible
+
! Dynamic KWIC etc., possible
 
! Examples
 
! Examples
 
|-
 
|-
Line 51: Line 50:
 
| Meaningless to a standard web browser / user
 
| Meaningless to a standard web browser / user
 
| None
 
| None
 +
| No
 +
| No
 +
|
 +
|-
 +
| Publish as single plain TEI/XHTML file with a CSS file for rendering
 +
| Just need to import a CSS file into XHTML and write a rule for text to be displayed
 +
| Can not reorder tags/Meaningless to many search engines
 +
| TEI, CSS
 
| No
 
| No
 
| No
 
| No
Line 57: Line 64:
 
| Single TEI file + XSL Processing Instruction (PI)
 
| Single TEI file + XSL Processing Instruction (PI)
 
| Very simple two-file solution
 
| Very simple two-file solution
| Some obscure XSLT features not supported by all browsers
+
| Some obscure XSLT features not supported by all browsers, can be slow for large files.
| XML, TEI, [[XSLT]]
+
| TEI, [[XSLT]]
 
| No
 
| No
 
| No
 
| No
 
|
 
|
 
|-
 
|-
| Single web page from (single TEI file + single XSL file)
+
| Single web page from (single TEI file + single XSL file); either pre-generated or dynamically
 
| A web page like any other.
 
| A web page like any other.
| Because TEI is hidden, third party repurposing or validation is impossible
+
| Because TEI is hidden, third party re-purposing or validation is impossible
| XML, TEI, XSLT
+
| TEI, XSLT
 
| No
 
| No
 
| No
 
| No
 
|
 
|
 
|-
 
|-
| Single web page from (single TEI file + single XSL file) + TEI file
+
| Single web page from (single TEI file + single XSL file); either pre-generated or dynamically
| A web page like any other, but a link to web
+
| A web page like any other, but a link exposing the underlying TEI file
 
|  
 
|  
| XML, TEI, XSLT
+
| TEI, XSLT
 
| No
 
| No
 
| No
 
| No
 
|
 
|
 
|-
 
|-
| Multiple web page from (multiple TEI files + xinclude + XSL file(s) ) + TEI files
+
| Multiple web page from (multiple TEI files + xinclude + XSL file(s) )
| A web page like any other, but a link to web
+
| A web page like any other, but a link exposing the underlying TEI files
 
| Multiple files enable large, complex structures and corpora
 
| Multiple files enable large, complex structures and corpora
| XML, TEI, XSLT, [[XPointer]]
+
| TEI, XSLT, [[XPointer]]
 
| No
 
| No
 
| No
 
| No
|
+
| [http://doi.ucc.ie/ Documents of Ireland]
 
|-
 
|-
| Multiple TEI files in eXist (native XML) database + XSL and/or XQuery for conversion of fragments to HTML
+
| Multiple TEI files in eXist (native XML) database + XSL for conversion of fragments to HTML
 
| Native XML database means no shredding XML into tables, and rapid fulltext query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross platform
 
| Native XML database means no shredding XML into tables, and rapid fulltext query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross platform
 
| Requires own server (real or virtual)
 
| Requires own server (real or virtual)
| XML, TEI, XSLT, [[XQuery]]
+
| TEI, XSLT
 +
| Yes
 +
| Yes
 +
|
 +
|-
 +
| Multiple TEI files in eXist (native XML) database + typeswitch XQuery for conversion of fragments to HTML
 +
| Native XML database means no shredding XML into tables, and rapid fulltext query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross platform
 +
| Requires own server (real or virtual) or hosted eXist solution
 +
TEI, [[XQuery]]
 
| Yes
 
| Yes
 
| Yes
 
| Yes
 
| [http://history.state.gov/ U.S. State Department]
 
| [http://history.state.gov/ U.S. State Department]
 
|-
 
|-
| Multiple TEI files in post-relational (SQL) database + XSL conversion of fragments to HTML
+
| Multiple TEI files in post-relational (postgres, mysql, etc) database + XSL conversion of fragments to HTML
 
| Makes inter-linking with other information sources easy  
 
| Makes inter-linking with other information sources easy  
 
| Requires own server (real or virtual)
 
| Requires own server (real or virtual)
| XML, TEI, XSLT, SQL
+
| TEI, XSLT, SQL
 
| Yes
 
| Yes
 
| No (in theory yes, but hard)
 
| No (in theory yes, but hard)
 
| [http://www.nzetc.org/ NZETC]
 
| [http://www.nzetc.org/ NZETC]
 +
|-
 +
| Put TEI files as records in RDBMS that has an XML column datatype
 +
| Can use XQuery for HTML rendering
 +
| XQuery must be embedded inside SQL.
 +
|  TEI, XQuery
 +
| Yes
 +
| No (in theory yes, but hard)
 +
|
 
|}
 
|}
  
Remember that for polished websites at least some CSS (Cascading StyleSheets) skills are going to required. For pages with at least some dynamism (drop down menus, expanding sections, etc), a little javascript skill goes a long way.
+
Remember that for polished websites at least some CSS (Cascading StyleSheets) skills are going to required. For pages with at least some dynamism (drop down menus, expanding sections, etc.), a little JavaScript skill goes a long way.
 +
 
 +
= Hosting of Content =
  
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
Line 129: Line 154:
 
|
 
|
 
|-
 
|-
| Hosting on google sites
+
| Hosting on Google sites
 
| Free, unlimited HTML pages (size and number)
 
| Free, unlimited HTML pages (size and number)
 
| No dynamic content (no [[eXist]] or other databases possible). Backups important
 
| No dynamic content (no [[eXist]] or other databases possible). Backups important
Line 135: Line 160:
 
|}
 
|}
  
Many service-orient TEI publishers have (at least) a pair of servers, with a development server that is used for testing new developments and can also be pressed into service to replace the primary (or live) server should it develop a fault. This gives these publishers fault-tolerance. Such systems are significantly easier to construct with the flexibility virtual servers.
+
Many service-oriented TEI publishers have (at least) a pair of servers with a development server that is used for testing new developments and can also be pressed into service to replace the primary (or live) server should it develop a fault. This gives these publishers fault-tolerance. Such systems are significantly easier to construct with the flexibility virtual servers.
  
 
[[Category:?]]
 
[[Category:?]]

Latest revision as of 21:52, 24 April 2010

There is a whole range of options for publishing TEI on the internet. The choices groups make largely reflect the available skills and funding, and the goals and scope of the effort.

Domains

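The root URL of your published TEI needs to be consistent, both to enable Google searching and to allow users to re-find your content. The URL can also be an important part of the branding of your effort.

  • Some efforts choose sub-domains of academic institutions (http://epidoc.cch.kcl.ac.uk/inscriptions/index.html, http://www.ota.ox.ac.uk/, etc.). These sub-domains are typically free and brand the published TEI with the institution's identity.
  • Others choose to have their own domains (http://papyri.info/idp_static/current/, http://www.freedict.org/). These are most commonly cross-institutional collaborations, and the domains typically cost a small amount of money to register and maintain.
  • Some hosted services (e.g., Google sites) have their own fixed domain that users get little or no say in.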

File names

Many systems have difficulty with file names containing non-ASCII characters or spaces (this is very common with poorly written scripted systems on UNIX platforms). Others have difficulty with multiple file names that differ only by case (DOS-compatible file systems are unable to make this distinction). It is recommended that you avoid such names unless absolutely necessary, or unless you have already tested that they are permitted on your chosen system.
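The checks described above can be sketched as a small script run before publishing. This is only an illustration: the function name `risky_file_names` and the sample file names are assumptions, not part of any TEI tool.

```python
from collections import defaultdict


def risky_file_names(names):
    """Group file names by the kind of portability problem they may cause."""
    problems = {"non_ascii": [], "spaces": [], "case_collisions": []}
    by_folded_name = defaultdict(list)
    for name in names:
        if not name.isascii():
            problems["non_ascii"].append(name)
        if any(ch.isspace() for ch in name):
            problems["spaces"].append(name)
        # Names that fold to the same string collide on case-insensitive file systems.
        by_folded_name[name.casefold()].append(name)
    for group in by_folded_name.values():
        if len(group) > 1:
            problems["case_collisions"].append(sorted(group))
    return problems


# Hypothetical file names, for illustration only.
report = risky_file_names(["letter01.xml", "Letter01.xml", "my poem.xml", "Māori.xml"])
for kind, offenders in report.items():
    print(kind, offenders)
```

An empty list for every key suggests the names are safe to move between most systems, though testing on the actual target system remains the real check.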

Single file vs Multiple files

Groups may keep all their TEI data in a single file or spread it across multiple files. In the latter case the files are combined, either in a database or in XML (using a technique such as XPointer). Advantages of the single-file approach include:

  • Conceptually easy
  • Easy editing
  • Header information doesn't need to be duplicated across multiple files
  • Constraints (schema / Schematron validity) easier to check

Advantages of the multiple file approach include:

  • File sizes kept within the range of conventional editors
  • Finer-grained operations (updating, indexing, version control, etc.) possible
  • Better for multi-person teams, as different people can work on different documents
  • Allows explicit division of TEI into logical units for processing and distribution
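As a rough sketch of how multiple files can be combined in XML, the following uses XInclude as implemented in Python's standard library (`xml.etree.ElementInclude`, which handles basic `xi:include` elements but not XPointer expressions). The file names, the skeletal header, and the in-memory `FILES` store are assumptions made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical files, held in memory instead of on disk.
# The header is a bare skeleton, not a valid teiHeader.
FILES = {
    "corpus.xml": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0" '
        'xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<teiHeader><fileDesc><titleStmt><title>Letters</title>"
        "</titleStmt></fileDesc></teiHeader>"
        "<text><body>"
        '<xi:include href="letter01.xml"/>'
        '<xi:include href="letter02.xml"/>'
        "</body></text></TEI>"
    ),
    "letter01.xml": '<div xmlns="http://www.tei-c.org/ns/1.0" n="1"><p>First letter.</p></div>',
    "letter02.xml": '<div xmlns="http://www.tei-c.org/ns/1.0" n="2"><p>Second letter.</p></div>',
}


def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude: resolve href against FILES, not the file system."""
    if parse != "xml":
        return FILES[href]
    return ET.fromstring(FILES[href])


root = ET.fromstring(FILES["corpus.xml"])
# Replace each xi:include element in place with the element it points to.
ElementInclude.include(root, loader=load_from_memory)

divs = root.findall(f".//{{{TEI_NS}}}div")
print([d.get("n") for d in divs])
```

The header lives once in the master file while each letter stays small enough to edit and version separately, which is the trade-off the two lists above describe.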

Ensuring Google can index your content

If the findability of your content on popular search engines is important to you, there are a couple of points to bear in mind.

  • Use short, sane URLs that are unique before any options and search terms (for a counter-example see http://www.nzdl.org/niupepa, whose content is not findable by Google). This means that if you're using a database to present your TEI, the core texts should be reachable by browsing simple URLs rather than by entering search terms, and should be findable at URLs that are unique prior to any '?' in the URL.
  • Use quality metadata in the generated HTML.
  • If you have a page about "Nouns in Klingon" then the string "Nouns in Klingon" should occur in the HTML <title> tag, a dc.title <meta> tag and in a <h1>, <h2> or <h3> tag on the page. Preferably it should be mentioned in the text too. It should appear in the language you want people to be able to search in (English, Klingon, etc.).
  • You should try not to move the content around. This will confuse both Google and ordinary people who have bookmarked it. See also http://www.w3.org/TR/cooluris/
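The title advice above can be illustrated with a minimal page template. This is a sketch only: `render_page` is a hypothetical helper, and a real site would normally produce the same elements from its XSLT or XQuery templates.

```python
from html import escape


def render_page(title, body_html, lang="en"):
    """Build a minimal HTML page repeating the title in <title>, dc.title and <h1>."""
    safe_title = escape(title)  # also escapes quotes, so it is safe in attributes
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{safe_title}</title>\n"
        f'<meta name="dc.title" content="{safe_title}">\n'
        "</head>\n"
        "<body>\n"
        f"<h1>{safe_title}</h1>\n"
        f"{body_html}\n"
        "</body>\n"
        "</html>\n"
    )


page = render_page("Nouns in Klingon", "<p>This page describes nouns in Klingon.</p>")
print(page)
```

Deriving all three occurrences from one value keeps them from drifting apart when a page is retitled.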

Here are two separate tables, one describing the form of the content presented on the web, the other describing how that might be hosted:

Presentation of content

{| class="wikitable" border="1"
|+ Table of ways to serve TEI on the web
! Method
! Pros
! Cons
! Skills needed
! Intra-site searching possible
! Dynamic KWIC etc. possible
! Examples
|-
| Publish as single plain TEI/XML file
| Simplest possible solution
| Meaningless to a standard web browser / user
| None
| No
| No
|
|-
| Publish as single plain TEI/XHTML file with a CSS file for rendering
| Just need to import a CSS file into XHTML and write a rule for text to be displayed
| Cannot reorder tags; meaningless to many search engines
| TEI, CSS
| No
| No
|
|-
| Single TEI file + XSL Processing Instruction (PI)
| Very simple two-file solution
| Some obscure XSLT features not supported by all browsers; can be slow for large files
| TEI, XSLT
| No
| No
|
|-
| Single web page from (single TEI file + single XSL file); either pre-generated or generated dynamically
| A web page like any other
| Because TEI is hidden, third-party re-purposing or validation is impossible
| TEI, XSLT
| No
| No
|
|-
| Single web page from (single TEI file + single XSL file) + TEI file; either pre-generated or generated dynamically
| A web page like any other, but with a link exposing the underlying TEI file
|
| TEI, XSLT
| No
| No
|
|-
| Multiple web pages from (multiple TEI files + XInclude + XSL file(s)) + TEI files
| A web page like any other, but with a link exposing the underlying TEI files
| Multiple files enable large, complex structures and corpora
| TEI, XSLT, XPointer
| No
| No
| [http://doi.ucc.ie/ Documents of Ireland]
|-
| Multiple TEI files in eXist (native XML) database + XSL for conversion of fragments to HTML
| Native XML database means no shredding XML into tables, and rapid full-text query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross-platform
| Requires own server (real or virtual)
| TEI, XSLT
| Yes
| Yes
|
|-
| Multiple TEI files in eXist (native XML) database + typeswitch XQuery for conversion of fragments to HTML
| Native XML database means no shredding XML into tables, and rapid full-text query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross-platform
| Requires own server (real or virtual) or hosted eXist solution
| TEI, XQuery
| Yes
| Yes
| [http://history.state.gov/ U.S. State Department]
|-
| Multiple TEI files in post-relational (PostgreSQL, MySQL, etc.) database + XSL conversion of fragments to HTML
| Makes inter-linking with other information sources easy
| Requires own server (real or virtual)
| TEI, XSLT, SQL
| Yes
| No (in theory yes, but hard)
| [http://www.nzetc.org/ NZETC]
|-
| Put TEI files as records in RDBMS that has an XML column datatype
| Can use XQuery for HTML rendering
| XQuery must be embedded inside SQL
| TEI, XQuery
| Yes
| No (in theory yes, but hard)
|
|}
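For the "Single TEI file + XSL Processing Instruction (PI)" method in the table, the only change to the TEI file is one `xml-stylesheet` line before the root element. The sketch below adds that line with plain string handling; `add_stylesheet_pi` and the stylesheet name `tei-to-html.xsl` are assumptions for illustration.

```python
import re


def add_stylesheet_pi(tei_xml, xsl_href):
    """Insert an xml-stylesheet processing instruction before the root element."""
    pi = f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
    # An XML declaration, if present, must stay first; the PI goes right after it.
    declaration = re.match(r"<\?xml\s[^>]*\?>\s*", tei_xml)
    if declaration:
        end = declaration.end()
        return tei_xml[:end] + pi + tei_xml[end:]
    return pi + tei_xml


# Hypothetical, empty TEI document used only to show the result.
tei = '<?xml version="1.0" encoding="UTF-8"?>\n<TEI xmlns="http://www.tei-c.org/ns/1.0"/>\n'
published = add_stylesheet_pi(tei, "tei-to-html.xsl")
print(published)
```

A browser that supports XSLT fetches the named stylesheet and renders the result, while the TEI source itself stays available at the same URL.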

Remember that for polished websites at least some CSS (Cascading Style Sheets) skills are going to be required. For pages with at least some dynamism (drop-down menus, expanding sections, etc.), a little JavaScript skill goes a long way.

Hosting of Content

{| class="wikitable" border="1"
|+ How to host your TEI website
! Method
! Pros
! Cons
! Examples
|-
| Local server
| Full control. Can dual-use a server (or desktop)
| Need hardware. Need reliable internet connection. Need technical skills. Backups important
| NZETC
|-
| Shared server
| Pooled skills and resources
| Conflicting requirements. Need reliable internet connection. Backups important
|
|-
| Virtual server
| Full control. Security handled by third party
| Need technical skills. Costs $. Backups important
|
|-
| Hosting on Google sites
| Free, unlimited HTML pages (size and number)
| No dynamic content (no eXist or other databases possible). Backups important
|
|}

Many service-oriented TEI publishers have (at least) a pair of servers: a primary (or live) server and a development server that is used for testing new developments and can be pressed into service to replace the primary server should it develop a fault. This gives these publishers fault tolerance. Such systems are significantly easier to construct with the flexibility of virtual servers.