Ways to Publish TEI on the Web

Latest revision as of 20:52, 24 April 2010

There is a whole range of options for publishing TEI on the internet. The choices groups make largely reflect the available skills and funding, and the goals and scope of the effort.

Domains

The root URL of your published TEI needs to be consistent, both to enable Google searching and to allow users to re-find your content. The URL can also be an important part of the branding of your effort.

Some efforts choose to be in sub-domains of academic institutions (http://epidoc.cch.kcl.ac.uk/inscriptions/index.html , http://www.ota.ox.ac.uk/ , etc.). These sub-domains are typically free and brand the published TEI with the institution's identity.
Others choose to have their own domains (http://papyri.info/idp_static/current/ , http://www.freedict.org/ ). These are most commonly cross-institutional collaborations, and the domains typically cost a small amount of money to register and maintain.
Some hosted services (e.g., Google Sites) have their own fixed domain that users get little or no say in.

File names

Many systems have difficulty with file names containing non-ASCII characters or spaces (this is very common with poorly written scripts on UNIX platforms). Others have difficulty with file names that differ only by case (DOS-compatible file systems are unable to make this distinction). It is recommended that you avoid such names unless absolutely necessary, or unless you have already tested that they're permitted on your chosen system.
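
The checks described above can be automated before publishing. The following is a minimal sketch, not a definitive tool: the function name `audit_file_names` and the sample names in the usage note are invented for illustration, and the sketch only reports problems rather than renaming anything.

```python
from collections import defaultdict


def audit_file_names(names):
    """Group risky file names by the kind of problem they may cause."""
    problems = {"non_ascii": [], "spaces": [], "case_collisions": []}
    by_folded = defaultdict(list)

    for name in names:
        # Non-ASCII characters trip up some scripts and servers.
        if not name.isascii():
            problems["non_ascii"].append(name)
        # Any whitespace (space, tab, ...) is a common source of breakage.
        if any(ch.isspace() for ch in name):
            problems["spaces"].append(name)
        # Names that are equal after case folding collide on
        # case-insensitive file systems.
        by_folded[name.casefold()].append(name)

    for group in by_folded.values():
        if len(group) > 1:
            problems["case_collisions"].append(sorted(group))
    return problems
```

For a real collection, the list of names could come from `os.listdir()` on the directory holding the TEI files; for example, `audit_file_names(["letter01.xml", "Letter01.xml"])` reports the pair as a case collision.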

Single file vs Multiple files

Some groups choose to have all their TEI data in a single file; others spread it across multiple files, which are then combined either in a database or in XML (using a technique such as XInclude with XPointer). Advantages of the single file approach include:

Conceptually easy
Easy editing
Header information doesn't need to be duplicated across multiple files
Constraints (schema / Schematron validity) easier to check

Advantages of the multiple file approach include:

File sizes kept within the range of conventional editors
Finer-grained operations (updating, indexing, version control, etc.) possible
Better for multi-person teams, as different people can work on different documents
Allows explicit division of TEI into logical units for processing and distribution.
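
As an illustration of combining multiple files in XML, here is a minimal sketch using XInclude processing from Python's standard library. It makes several assumptions: the file names `chapter1.xml` and `chapter2.xml` and their contents are invented, the parts are held in memory so that the example is self-contained, and whole files are included rather than XPointer fragments.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical per-chapter files, kept in a dict instead of on disk.
PARTS = {
    "chapter1.xml": f'<div xmlns="{TEI_NS}"><head>One</head><p>First chapter.</p></div>',
    "chapter2.xml": f'<div xmlns="{TEI_NS}"><head>Two</head><p>Second chapter.</p></div>',
}

# A master document that pulls the chapters in with xi:include.
MASTER = f"""<TEI xmlns="{TEI_NS}" xmlns:xi="http://www.w3.org/2001/XInclude">
  <text><body>
    <xi:include href="chapter1.xml"/>
    <xi:include href="chapter2.xml"/>
  </body></text>
</TEI>"""


def load_part(href, parse, encoding=None):
    """Loader for ElementInclude: return the parsed element for href."""
    if parse != "xml":
        raise ValueError("this sketch only handles XML inclusion")
    return ET.fromstring(PARTS[href])


def combine(master_xml):
    """Expand every xi:include in the master document, in place."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=load_part)
    return root
```

With files on disk, the custom loader can be dropped and `ElementInclude.include(root)` reads each `href` as a file path instead.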

Ensuring Google can index your content

If the findability of your content on popular search engines is important to you, there are a couple of points to bear in mind.

Use short, sane URLs that are unique before the options and search terms (for a counter-example see http://www.nzdl.org/niupepa , whose content is not findable by Google). This means that if you're using a database to present your TEI, the core texts should be reachable by browsing simple URLs rather than by entering search terms, and should be findable at URLs that are unique prior to any '?' in the URL.
Use quality metadata in the generated HTML.
If you have a page about "Nouns in Klingon" then the string "Nouns in Klingon" should occur in the HTML <title> tag, in a dc.title <meta> tag and in an <h1>, <h2> or <h3> tag on the page. Preferably it should be mentioned in the text too. It should appear in the language you want people to be able to search in (English, Klingon, etc.).
Try not to move the content around. Moving it will confuse both Google and ordinary people who have bookmarked it. See also http://www.w3.org/TR/cooluris/
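
The points above can be checked mechanically on generated pages. The sketch below is one possible approach, not an established tool: the names `TitleAudit`, `missing_title_slots` and `distinct_before_query` are invented for illustration. It reports where a page title is missing and whether a set of URLs stays unique once everything from the '?' onwards is dropped.

```python
from html.parser import HTMLParser


class TitleAudit(HTMLParser):
    """Collect the <title>, the dc.title <meta> and the h1-h3 headings."""

    def __init__(self):
        super().__init__()
        self._open = None      # tag whose text is currently being collected
        self.title = ""
        self.dc_title = ""
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._open = tag
        elif tag in ("h1", "h2", "h3"):
            self._open = tag
            self.headings.append("")
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "dc.title":
                self.dc_title = attributes.get("content") or ""

    def handle_data(self, data):
        if self._open == "title":
            self.title += data
        elif self._open in ("h1", "h2", "h3"):
            self.headings[-1] += data

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None


def missing_title_slots(page_html, phrase):
    """Return the places where the phrase should appear but does not."""
    audit = TitleAudit()
    audit.feed(page_html)
    audit.close()
    missing = []
    if phrase not in audit.title:
        missing.append("title")
    if phrase not in audit.dc_title:
        missing.append("dc.title")
    if not any(phrase in heading for heading in audit.headings):
        missing.append("heading")
    return missing


def distinct_before_query(urls):
    """True if every URL is still unique after dropping '?' and what follows."""
    stems = [url.split("?", 1)[0] for url in urls]
    return len(set(stems)) == len(stems)
```

An empty list from `missing_title_slots(page, "Nouns in Klingon")` means the phrase was found in all three places.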

Here are two separate tables, one describing the form of the content presented on the web, the other describing how that might be hosted:

Presentation of content

Table of ways to serve TEI on the web
Method	Pros	Cons	Skills needed	Intra-site searching possible	Dynamic KWIC etc., possible	Examples
Publish as single plain TEI/XML file	Simplest possible solution	Meaningless to a standard web browser / user	None	No	No
Publish as single plain TEI/XHTML file with a CSS file for rendering	Just need to import a CSS file into XHTML and write a rule for text to be displayed	Cannot reorder tags; meaningless to many search engines	TEI, CSS	No	No
Single TEI file + XSL Processing Instruction (PI)	Very simple two-file solution	Some obscure XSLT features not supported by all browsers; can be slow for large files	TEI, XSLT	No	No
Single web page from (single TEI file + single XSL file); either pre-generated or generated dynamically	A web page like any other	Because the TEI is hidden, third-party re-purposing or validation is impossible	TEI, XSLT	No	No
Single web page from (single TEI file + single XSL file), with a link to the TEI file; either pre-generated or generated dynamically	A web page like any other, but with a link exposing the underlying TEI file		TEI, XSLT	No	No
Multiple web pages from (multiple TEI files + XInclude + XSL file(s)), with links to the TEI files	Web pages like any other, but with links exposing the underlying TEI files; multiple files enable large, complex structures and corpora		TEI, XSLT, XPointer	No	No	Documents of Ireland (http://doi.ucc.ie/)
Multiple TEI files in eXist (native XML) database + XSL for conversion of fragments to HTML	Native XML database means no shredding XML into tables, and rapid full-text query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross-platform	Requires own server (real or virtual)	TEI, XSLT	Yes	Yes
Multiple TEI files in eXist (native XML) database + typeswitch XQuery for conversion of fragments to HTML	Native XML database means no shredding XML into tables, and rapid full-text query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross-platform	Requires own server (real or virtual) or hosted eXist solution	TEI, XQuery	Yes	Yes	U.S. State Department (http://history.state.gov/)
Multiple TEI files in relational (PostgreSQL, MySQL, etc.) database + XSL conversion of fragments to HTML	Makes inter-linking with other information sources easy	Requires own server (real or virtual)	TEI, XSLT, SQL	Yes	No (in theory yes, but hard)	NZETC (http://www.nzetc.org/)
Put TEI files as records in RDBMS that has an XML column datatype	Can use XQuery for HTML rendering	XQuery must be embedded inside SQL.	TEI, XQuery	Yes	No (in theory yes, but hard)
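
To make the pre-generated option with a link to the TEI file more concrete, here is a minimal sketch. Note that it swaps in Python's standard-library ElementTree for the XSLT the table assumes, and that it handles only `<head>` and `<p>` elements. The function name `tei_to_html`, the sample document and the file name `nouns.xml` are all invented for illustration.

```python
import html
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# A tiny, made-up TEI document used only to exercise the sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Nouns in Klingon</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body><div>
    <head>Overview</head>
    <p>A short sample paragraph.</p>
  </div></body></text>
</TEI>"""


def tei_to_html(tei_xml, tei_href):
    """Render a very small subset of TEI as a static HTML page."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(f".//{TEI}titleStmt/{TEI}title", default="Untitled")
    parts = [f"<h1>{html.escape(title)}</h1>"]

    body = root.find(f".//{TEI}body")
    for element in (body.iter() if body is not None else []):
        text = html.escape("".join(element.itertext()).strip())
        if element.tag == TEI + "head":
            parts.append(f"<h2>{text}</h2>")
        elif element.tag == TEI + "p":
            parts.append(f"<p>{text}</p>")

    # Expose the underlying TEI so others can re-purpose or validate it.
    link = html.escape(tei_href, quote=True)
    parts.append(f'<p><a href="{link}">TEI source</a></p>')

    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        + "\n".join(parts)
        + "\n</body></html>"
    )
```

Calling `tei_to_html(SAMPLE, "nouns.xml")` returns a page whose `<title>` and `<h1>` both carry the TEI title, which also fits the indexing advice earlier on this page.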

Remember that for polished websites at least some CSS (Cascading Style Sheets) skills are going to be required. For pages with at least some dynamism (drop-down menus, expanding sections, etc.), a little JavaScript skill goes a long way.

Hosting of Content

How to host your TEI website
Method	Pros	Cons	Examples
Local server	Full control. Can dual-use a server (or desktop)	Need hardware. Need reliable internet connection. Need technical skills. Backups important	NZETC (http://www.nzetc.org/)
Shared server	Pooled skills and resources	Conflicting requirements. Need reliable internet connection. Backups important
Virtual server	Full control. Security handled by third party	Need technical skills. Costs money. Backups important
Hosting on Google Sites	Free, unlimited HTML pages (size and number)	No dynamic content (no eXist or other databases possible). Backups important

Many service-oriented TEI publishers have (at least) a pair of servers: a primary (or live) server, and a development server that is used for testing new developments and can also be pressed into service to replace the primary server should it develop a fault. This gives these publishers fault tolerance. Such systems are significantly easier to construct with the flexibility of virtual servers.
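
The live/development pairing can be sketched as a simple check that falls back to the second server when the first does not respond. This is only an illustration of the idea, not a description of how any particular publisher does it: the function names and the example.org URLs in the usage note are invented, and a real setup would more likely switch over at the DNS or load-balancer level.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_up(url, timeout=5):
    """Return True if the server answers the request without an error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError, ValueError):
        return False


def pick_server(servers, probe=is_up):
    """Return the first server, in order of preference, that passes the probe."""
    for url in servers:
        if probe(url):
            return url
    return None
```

For example, `pick_server(["https://live.example.org/", "https://dev.example.org/"])` returns the live server while it responds and the development server otherwise; the `probe` argument allows the check to be replaced in tests.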

