Difference between revisions of "Talk:TEI Web Publishing"

Revision as of 08:58, 9 March 2010

How easy should it be?

If we're really serious about making publishing easy, surely the easiest is to insert the following at the top of the TEI?

<?xml-stylesheet href="path/to/stylesheet.xsl" type="text/xsl"?>

(said Stuart at 04:06, 6 March 2010)

But that would work only if we want to publish a single TEI document rather than a collection of them. (xi:include will not always help: sometimes the relationship between documents should be expressed as reference rather than inclusion.)

So maybe let's turn your question into a possible basic condition: how to publish a set of TEI documents without having to modify them manually with e.g. stylesheet PIs? Piotr 05:05, 6 March 2010 (EST)
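One possible answer to that condition, as a minimal sketch only: add the PI by script to published copies, leaving the source files untouched. The function names, the directory layout and the UTF-8 assumption below are hypothetical, not something agreed on this page.

```python
import re
from pathlib import Path


def add_stylesheet_pi(xml_text: str, href: str) -> str:
    """Return xml_text with an xml-stylesheet PI added, unless one is already there."""
    if "<?xml-stylesheet" in xml_text:
        return xml_text
    pi = f'<?xml-stylesheet href="{href}" type="text/xsl"?>'
    # An XML declaration, if present, must stay the very first thing in the file.
    decl = re.match(r"<\?xml\s.*?\?>", xml_text, flags=re.DOTALL)
    if decl:
        end = decl.end()
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text


def publish_copies(src_dir: str, out_dir: str, href: str) -> None:
    """Write PI-carrying copies of every *.xml file in src_dir into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.xml")):
        text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        (out / path.name).write_text(add_stylesheet_pi(text, href), encoding="utf-8")
```

This only removes the manual editing step; it does not address the point about references between documents.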

The first sentence of the page talks about 'a single TEI document' Stuartyeates 15:00, 6 March 2010 (EST)

Maybe it shouldn't be steps but rather issues. Still, the 2nd sentence talks about a set of documents, so we can make the assumption that the 1st one talks about a single-membered set, as a particular case. (Or we can make a different assumption :-)) Piotr 15:59, 6 March 2010 (EST)

I think the issue here is that we're conflating documents and files, both of which are further conflated in the minds of many users with works. Stuartyeates 21:41, 6 March 2010 (EST)

That would make a fine introduction. Indeed, xinclude has nothing to do with documents (or has as much to do with them as inter-paragraph spacing, perhaps). Let me rephrase (still as an exercise, to see if we can get anything interesting out of it):

Given a document encoded in the TEI, what are the available publishing strategies?

One is to add the PI (to the (main) file containing the document) and count on a client-side engine in a browser (if this is what you meant, Stuart). The XSL list suggests it may be a tough task, especially where cross-platform, cross-browser portability is intended. (expertise involved: XSLT, browser XSLT engine quirks)
Another is to prepare a transformation scenario, possibly including a pipe, and run a processor on it. (expertise involved: XSLT + Saxon/Xalan/xsltproc -- these are at least well-documented)
The next scenario would involve a native TEI db (which? let's say eXist for starters; expertise: +XQuery; and aren't we looking here at a slightly different range of applications?)
Next is Thomas Crombez' Google Docs suggestion, which may involve an easy step (->HTML) using default stylesheets.
Ready-made TEI publishing solutions (other than Sebastian's stylesheets) -- if there were any such full-fledged systems, would we bother doing this?
Basic command of (X)HTML/CSS is a given. Somewhere on the way, one should possibly mention Cocoon, XProc... am I making any sense? If I am, it begins to look like a book... :-/ But something could be carved out of this as one of the possible examples to follow. There are lots of sources on XSLT, and Thomas has a tutorial for people who want the Google way; as Martin noted, there is little documentation on the database scenario, and this is where this might go.
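To make the second strategy (a transformation scenario run through a processor) a little more concrete, here is a rough sketch that prepares one xsltproc call per TEI file. The stylesheet name, directory names and function names are placeholders of mine; Saxon or Xalan would need different command lines.

```python
import subprocess
from pathlib import Path


def build_commands(tei_files, stylesheet, out_dir):
    """Build one 'xsltproc -o OUTPUT STYLESHEET INPUT' command per TEI file."""
    commands = []
    for tei in tei_files:
        # Each input file gets an HTML file of the same base name in out_dir.
        target = Path(out_dir) / (Path(tei).stem + ".html")
        commands.append(["xsltproc", "-o", str(target), str(stylesheet), str(tei)])
    return commands


def run_all(commands):
    """Run the prepared commands; assumes xsltproc is installed and on the PATH."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to inspect what would be run before anything is written.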

At this point, my interests are in the use of a native XML database for the purpose of manipulating dictionaries and corpora, so I would be glad to help develop documentation on that on the understanding that there are going to be areas where I will be learning and experimenting, so not really contributing from an expert's position. Piotr 17:30, 7 March 2010 (EST)

Stuart, that's some beginning! :-) Minor points: "XSLT 1.0" can just be XSLT (1.0 is xsltproc's demand; I'm not sure about Xalan anymore), hmm, I'll just go ahead and edit it in two passes, so that the "1.0" step is easy to revert if you had something specific in mind. Piotr 07:58, 8 March 2010 (EST)

Methods of publishing TEI content to the web

There is a whole range of options for publishing TEI on the internet. The choices made by groups largely reflect the available skills and funding and the goals and scope of the project.

Domains

The root URL of your published TEI needs to be consistent, both to enable Google searching and to allow users to re-find your content. The URL can also be an important part of the branding of your project.

Some projects choose to be in subdomains of academic institutions (http://epidoc.cch.kcl.ac.uk/inscriptions/index.html , http://www.ota.ox.ac.uk/ , etc.). These subdomains are typically free and brand the project with the institution's identity.
Others choose to have their own domains (http://papyri.info/idp_static/current/ , http://www.freedict.org/). These are most commonly cross-institutional collaborations, and the domains typically cost a small amount of money to register and maintain.
Some hosted services (e.g. Google Sites) have their own fixed domain that users get little or no say in.

Filenames

Many systems have difficulty with filenames containing non-ASCII characters or spaces (this is very common with poorly written scripted systems on UNIX platforms). Others have difficulties with multiple filenames which differ only by case (DOS-compatible filesystems are unable to make this distinction). It is recommended that you avoid these unless absolutely necessary or you have already tested that they're permitted on your chosen system.
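The three risks named above can be checked mechanically before publishing. The following is a minimal sketch; the function name and the wording of the reasons are illustrative assumptions.

```python
from collections import defaultdict


def risky_filenames(names):
    """Map each problematic filename to the list of reasons it may cause trouble."""
    problems = {}
    by_lower = defaultdict(list)
    for name in names:
        reasons = []
        if not name.isascii():
            reasons.append("non-ASCII character")
        if " " in name:
            reasons.append("space")
        if reasons:
            problems[name] = reasons
        # Group names that become identical once case is ignored.
        by_lower[name.lower()].append(name)
    for group in by_lower.values():
        if len(group) > 1:
            for name in group:
                problems.setdefault(name, []).append("differs from another name only by case")
    return problems
```

Running it over a directory listing (for example the names returned by `os.listdir`) gives a list of files to rename or to test on the target system.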

Single file vs Multiple files

Projects may choose to have all their TEI data in a single file or spread it across multiple files. Multiple files are combined either in a database or in XML (using a technique such as XInclude with XPointer). Advantages of the single file approach include:

Conceptually easy
Easy editing
Header information doesn't need to be duplicated across multiple files
Constraints (schema / schematron validity) easier to check

Advantages of the multiple file approach include:

File sizes kept within the range of conventional editors
Finer-grained operations (updating, indexing, version control, etc.) possible
Better for multi-person teams, as different people can work on different documents
Allows explicit division of TEI into logical units for processing and distribution.
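As an illustration of combining multiple files at the XML level, the sketch below resolves XInclude elements in a master TEI file using Python's standard library. The file names, the sample content and the in-memory loader are made up for the example; a real project would load the parts from disk, and this does not exercise XPointer.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical master file that pulls in two chapters kept in separate files.
MASTER = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <text><body>
    <xi:include href="chapter1.xml"/>
    <xi:include href="chapter2.xml"/>
  </body></text>
</TEI>"""

# Stand-in for files on disk, so the example is self-contained.
PARTS = {
    "chapter1.xml": '<div xmlns="http://www.tei-c.org/ns/1.0" n="1"><p>First chapter.</p></div>',
    "chapter2.xml": '<div xmlns="http://www.tei-c.org/ns/1.0" n="2"><p>Second chapter.</p></div>',
}


def memory_loader(href, parse, encoding=None):
    """Loader for ElementInclude: return parsed XML, or raw text for parse='text'."""
    if parse == "xml":
        return ET.fromstring(PARTS[href])
    return PARTS[href]


def combine(master_xml):
    """Parse the master document and replace each xi:include with its target."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=memory_loader)
    return root
```

After `combine(MASTER)` the tree holds both `div` elements inline, so a single stylesheet or validation pass can treat the result as one document.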

Here are two separate tables: one describes the form of the content presented on the web, the other describes how that might be hosted.

Table of ways to serve TEI on the web
Method	Pros	Cons	Skills needed	Searching possible	Dynamic KWIC etc. possible	Examples
Publish as single plain TEI/XML file	Simplest possible solution	Meaningless to a standard web browser / user	None	No	No
Single TEI file + XSL Processing Instruction (PI)	Very simple two-file solution	Some obscure XSLT features not supported by all browsers	XML, TEI, XSLT	No	No
Single web page from (single TEI file + single XSL file)	A web page like any other.	Because the TEI is hidden, third-party repurposing or validation is impossible	XML, TEI, XSLT	No	No
Single web page from (single TEI file + single XSL file) + TEI file	A web page like any other, but with a link to the TEI source		XML, TEI, XSLT	No	No
Multiple web pages from (multiple TEI files + XInclude + XSL file(s)) + TEI files	A web page like any other, but with a link to the TEI source	Multiple files enable large, complex structures and corpora	XML, TEI, XSLT, XPointer	No	No
Multiple TEI files in eXist (native XML) database + XSL and/or XQuery for conversion of fragments to HTML	Native XML database means no shredding XML into tables, and rapid full-text query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross-platform	Requires own server (real or virtual)	XML, TEI, XSLT, XQuery	Yes	Yes	U.S. State Department
Multiple TEI files in post-relational (SQL) database + XSL conversion of fragments to HTML	Makes inter-linking with other information sources easy	Requires own server (real or virtual)	XML, TEI, XSLT, SQL	Yes	No (in theory yes, but hard)	NZETC
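
For readers unfamiliar with the "Dynamic KWIC" column: KWIC is a keyword-in-context display, where each hit is shown with a few words on either side. The sketch below only illustrates the idea on a made-up TEI fragment in Python; in the database rows of the table this would be done server-side (for example in XQuery), and the function name and sample text are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Made-up TEI fragment; a real file would also have a teiHeader,
# whose text this simple version would include in the word list.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    "<p>The quick brown fox jumps over the lazy dog.</p>"
    "<p>A fox is not a dog.</p>"
    "</body></text></TEI>"
)


def kwic(tei_xml, keyword, width=2):
    """Return (left context, hit, right context) for every match of keyword."""
    root = ET.fromstring(tei_xml)
    # Flatten all text nodes into one word list, ignoring markup and punctuation.
    words = re.findall(r"\w+", " ".join(root.itertext()))
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits
```

Because it re-reads and re-tokenises the document on every query, this approach suits small static collections; the database options exist precisely because they index the text once.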

Remember that for polished websites at least some CSS (Cascading Style Sheets) skills are going to be required. For pages with at least some dynamism (drop-down menus, expanding sections, etc.), a little JavaScript skill goes a long way.

How to host your TEI website
Method	Pros	Cons	Examples
Local server	Full control. Can dual-use a server (or desktop)	Need hardware. Need reliable internet connection. Need technical skills. Backups important	NZETC
Shared server	Pooled skills and resources	Conflicting requirements. Need reliable internet connection. Backups important	NZETC
Virtual server	Full control. Security handled by third party.	Need technical skills. Costs money. Backups important
Hosting on Google Sites	Free, unlimited HTML pages (size and number)	No dynamic content (no eXist or other databases possible). Backups important

@@ Line 27: / Line 27: @@
 Stuart, that's ''some'' beginning! :-) Minor points: "XSLT 1.0" can just be XSLT (1.0 is xsltproc's demand; I'm not sure about Xalan anymore), hmm, I'll just go ahead and edit it in two passes, so that the "1.0" step is easy to revert if you had something specific in mind. [[User:Piotr Banski|Piotr]] 07:58, 8 March 2010 (EST)
-== Ranked list of methods to publishing TEI content to the web ==
+== Methods to publishing TEI content to the web ==
+There are whole range of options for publishing TEI onto the internet. The choices made by groups largely reflect the available skills and funding and the goals and scope of the project.
+===Domains===
+The root url of your published TEI needs to be consistent, but to enable google searching and to allow uses to re-find your content.  The URL can also be an important part of the branding of your project.
+* Some projects choose to be in subdomains of academic institutions ( http://epidoc.cch.kcl.ac.uk/inscriptions/index.html , http://www.ota.ox.ac.uk/ , etc) these subdomains are typically free and brand the project with the institutions identity.
+* Others choose to have their own domains (http://papyri.info/idp_static/current/ http://www.freedict.org/) these are most commonly cross-institutional collaborations, these domains typically cost a small amount of money to register and maintain.
+* Some hosted services (i.e. google sites) have their own fixed domain that users get little or no say in.
+===Filenames===
+Many systems have difficulty with filenames containing non-ASCII characters or spaces (this is very common with poorly written scripted system on UNIX platforms). Others have difficulties with multiple filesnames which differ only by case (DOS-compatible filesystems are unable to make this distinction). It is recommended that you avoid these unless absolutely necessary or you have already tested that thery're permitted on your choose system.
+=== Single file vs Multiple files===
+Projects may choose to have all their TEI data in a single file or spread it across multiple files. The multiple files are combined, either in a database or in XML (using a technique such as xpointer). Advantages of the single file approach include:
+* Conceptually easy
+* Easy editing
+* Header information doesn't need to be duplicated across multiple files
+* Constraints (schema / schematron validity) easier to check
+Advantages of the multiple file approach include:
+* Files sizes kept within the range of conventional editors
+* Finer-grained operations (updating, indexing, version control, etc) possible,
+* Better for for multi-person teams as different people can work on different documents
+* Allows explicit division of TEI into logical units for processing and distribution.
 Here are two separate tables, one describes the form of the content presented on the web, the other describes how that might be hosted
@@ Line 129: / Line 154: @@
 |
 |}
-===Domains===
-The root url of your published TEI needs to be consistent, but to enable google searching and to allow uses to re-find your content.  The URL can also be an important part of the branding of your project.
-* Some projects choose to be in subdomains of academic institutions ( http://epidoc.cch.kcl.ac.uk/inscriptions/index.html , http://www.ota.ox.ac.uk/ , etc) these subdomains are typically free and brand the project with the institutions identity.
-* Others choose to have their own domains (http://papyri.info/idp_static/current/ http://www.freedict.org/) these are most commonly cross-institutional collaborations, these domains typically cost money to maintain.
-* Some hosted services (i.e. google sites) have their own fixed domain that users get little or no say in.
-===Filenames===
-Many systems have difficulty with filenames containing non-ASCII characters or spaces (this is very common with poorly written scripted system on UNIX platforms). Others have difficulties with multiple filesnames which differ only by case (DOS-compatible filesystems are unable to make this distinction). It is recommended that you avoid these unless absolutely necessary or you have already tested that thery're permitted on your choose system.
-=== Single file vs Multiple files===
-Projects may choose to have all their TEI data in a single file or spread it across multiple files. The multiple files are combined, either in a database or in XML (using a technique such as xpointer). Advantages of the single file approach include:
-* Conceptually easy
-* Easy editing
-* Header information doesn't need to be duplicated across multiple files
-* Constraints (schema / schematron validity) easier to check
-Advantages of the multiple file approach include:
-* Files sizes kept within the range of conventional editors
-* Finer-grained operations (updating, indexing, version control, etc) possible,
-* Better for for multi-person teams as different people can work on different documents
-* Allows explicit division of TEI into logical units for processing and distribution.
