Difference between revisions of "Talk:TEI Web Publishing"
Revision as of 08:58, 9 March 2010 (Stuartyeates)
How easy should it be?
If we're really serious about making publishing easy, surely the easiest is to insert the following at the top of the TEI?
<?xml-stylesheet href="path/to/stylesheet.xsl" type="text/xsl"?>
(said Stuart at 04:06, 6 March 2010)
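Inserting the PI does not have to mean editing each file by hand, which is the concern Piotr raises in his reply. Below is a minimal Python sketch, assuming UTF-8 TEI files and a placeholder stylesheet path. The function name `add_stylesheet_pi` is hypothetical. It places the PI after any XML declaration and leaves text that already carries an `xml-stylesheet` PI unchanged.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
# at the very start of the text (optionally preceded by a byte-order mark).
XML_DECL = re.compile(r'^\ufeff?<\?xml\s[^?]*\?>\s*')


def add_stylesheet_pi(xml_text: str, href: str = "path/to/stylesheet.xsl") -> str:
    """Return xml_text with an xml-stylesheet PI placed right after the XML
    declaration, or at the top if there is none. Idempotent: text that
    already contains an xml-stylesheet PI is returned unchanged."""
    if "<?xml-stylesheet" in xml_text:
        return xml_text
    pi = f'<?xml-stylesheet href="{href}" type="text/xsl"?>\n'
    match = XML_DECL.match(xml_text)
    if match:
        # Keep the declaration first, as XML requires, then add the PI.
        decl = match.group(0).rstrip()
        return decl + "\n" + pi + xml_text[match.end():]
    return pi + xml_text
```

To apply this to a collection, read each file, call the function, and write the result to a separate publishing copy rather than back to the edited source, so the source TEI itself stays untouched.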
- But that would work only if we want to publish a single TEI document rather than a collection of them. (xi:include will not always help; sometimes the relationship between documents should be expressed as reference rather than inclusion.)
- So maybe let's turn your question into a possible basic condition: how to publish a set of TEI documents without having to modify them manually with e.g. stylesheet PIs? Piotr 05:05, 6 March 2010 (EST)
- The first sentence of the page talks about 'a single TEI document' Stuartyeates 15:00, 6 March 2010 (EST)
- Maybe it shouldn't be steps but rather issues. Still, the 2nd sentence talks about a set of documents, so we can make the assumption that the 1st one talks about a single-membered set, as a particular case. (Or we can make a different assumption :-)) Piotr 15:59, 6 March 2010 (EST)
- I think the issue here is that we're conflating documents and files, both of which are further conflated in the minds of many users with works. Stuartyeates 21:41, 6 March 2010 (EST)
- That would make a fine introduction. Indeed, xinclude has nothing to do with documents (or has as much to do with them as inter-paragraph spacing, perhaps). Let me rephrase (still as an exercise, to see if we can get anything interesting out of it):
- Given a document encoded in the TEI, what are the available publishing strategies?
- One is to add the PI (to the (main) file containing the document) and count on a client-side engine in a browser (if this is what you meant, Stuart). The XSL list suggests it may be a tough task, especially where cross-platform, cross-browser portability is intended. (expertise involved: XSLT, browser XSLT engine quirks)
- Another is to prepare a transformation scenario, possibly including a pipe, and run a processor on it. (expertise involved: XSLT + Saxon/Xalan/xsltproc -- these are at least well-documented)
- The next scenario would involve a native TEI db (which? let's say eXist for starters; expertise: +XQuery; and aren't we looking here at a slightly different range of applications?)
- Next is Thomas Crombez' Google Docs suggestion, which may involve an easy step (->HTML) using default stylesheets.
- Ready-made TEI publishing solutions (other than Sebastian's stylesheets) -- if there were any such full-fledged systems, would we bother doing this?
- Basic command of (X)HTML/CSS is a given. Somewhere on the way, one should possibly mention Cocoon, XProc... am I making any sense? If I am, it begins to look like a book... :-/ But something could be carved out of this as one of the possible examples to follow. There are lots of sources on XSLT, and Thomas has a tutorial for people who want the Google way; as Martin noted, there is little documentation on the database scenario, and this is where this might go.
- At this point, my interests are in the use of a native XML database for the purpose of manipulating dictionaries and corpora, so I would be glad to help develop documentation on that on the understanding that there are going to be areas where I will be learning and experimenting, so not really contributing from an expert's position. Piotr 17:30, 7 March 2010 (EST)
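The "transformation scenario" described above is normally an XSLT stylesheet run by Saxon, Xalan or xsltproc. Python's standard library has no XSLT processor, so the sketch below substitutes a hand-written ElementTree walk. It only illustrates the shape of a batch pipeline: parse each TEI file, map elements to HTML, and write one page per file. The mapping `TEI_TO_HTML` and the function names are hypothetical and cover only a handful of TEI elements. A real project would use an XSLT stylesheet, for example Sebastian's stylesheets mentioned above.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical, deliberately tiny mapping from TEI element names to HTML tags;
# a real stylesheet covers far more of the TEI.
TEI_TO_HTML = {"div": "section", "head": "h2", "p": "p", "hi": "em",
               "list": "ul", "item": "li"}


def render(elem: ET.Element) -> str:
    """Recursively render a TEI element as HTML. Unmapped elements are
    unwrapped: their content is kept and their tag is dropped."""
    name = elem.tag.replace(TEI_NS, "")
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child) + escape(child.tail or "")
    tag = TEI_TO_HTML.get(name)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def tei_to_html(tei_source: str | bytes) -> str:
    """Turn one TEI document into a standalone HTML page."""
    root = ET.fromstring(tei_source)
    title = root.find(f".//{TEI_NS}titleStmt/{TEI_NS}title")
    body = root.find(f".//{TEI_NS}text/{TEI_NS}body")
    title_text = escape("".join(title.itertext())) if title is not None else "Untitled"
    body_html = render(body) if body is not None else ""
    return (f"<!DOCTYPE html>\n<html><head><title>{title_text}</title></head>"
            f"<body><h1>{title_text}</h1>{body_html}</body></html>")


def publish(src_dir: Path, out_dir: Path) -> list[Path]:
    """Batch step: write one HTML page per *.xml file found in src_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.xml")):
        page = out_dir / (src.stem + ".html")
        page.write_text(tei_to_html(src.read_bytes()), encoding="utf-8")
        written.append(page)
    return written
```

Swapping `tei_to_html` for a call to a real XSLT processor leaves the rest of the pipeline the same, which is the point of separating the per-document transformation from the batch step.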
Stuart, that's some beginning! :-) Minor points: "XSLT 1.0" can just be XSLT (1.0 is xsltproc's demand; I'm not sure about Xalan anymore), hmm, I'll just go ahead and edit it in two passes, so that the "1.0" step is easy to revert if you had something specific in mind. Piotr 07:58, 8 March 2010 (EST)
Methods for publishing TEI content to the web
There is a whole range of options for publishing TEI on the internet. The choices groups make largely reflect the available skills and funding, and the goals and scope of the project.
Domains
The root URL of your published TEI needs to be consistent, both to enable Google searching and to allow users to re-find your content. The URL can also be an important part of the branding of your project.
- Some projects choose to be in subdomains of academic institutions (http://epidoc.cch.kcl.ac.uk/inscriptions/index.html, http://www.ota.ox.ac.uk/, etc.); these subdomains are typically free and brand the project with the institution's identity.
- Others choose to have their own domains (http://papyri.info/idp_static/current/, http://www.freedict.org/); these are most commonly cross-institutional collaborations. Such domains typically cost a small amount of money to register and maintain.
- Some hosted services (e.g. Google Sites) have their own fixed domain, over which users have little or no say.
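One way to keep the root URL consistent is to generate every public link from a single configured root instead of hard-coding URLs throughout pages and stylesheets. Below is a minimal Python sketch. `SITE_ROOT` (with its example.org value) and `document_url` are hypothetical placeholders. The sketch also guards against a common pitfall: `urllib.parse.urljoin` replaces the last path segment of a root that lacks a trailing slash.

```python
from urllib.parse import quote, urljoin

# Hypothetical project root: the one place where the public URL is defined.
SITE_ROOT = "http://example.org/tei/"


def document_url(doc_id: str, root: str = SITE_ROOT) -> str:
    """Build the public URL of a published document from its identifier."""
    if not root.endswith("/"):
        # Without a trailing slash, urljoin replaces the last path segment
        # ("tei") instead of appending to it.
        root += "/"
    return urljoin(root, quote(doc_id) + ".html")
```

If the project later moves to its own domain, only `SITE_ROOT` changes, although old links will still need redirects.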
Filenames
Many systems have difficulty with filenames containing non-ASCII characters or spaces (this is very common with poorly written scripts on UNIX platforms). Others have difficulty with multiple filenames that differ only by case (DOS-compatible filesystems cannot make this distinction). It is recommended that you avoid such filenames unless absolutely necessary, or unless you have already tested that they are permitted on your chosen system.
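The three problems above can be checked mechanically before publishing. Below is a minimal Python sketch. The function name is hypothetical, and the rules are assumptions: any whitespace counts as a problem, and `str.casefold` approximates a case-insensitive filesystem.

```python
from __future__ import annotations

from collections import defaultdict


def filename_problems(names: list[str]) -> list[str]:
    """Report names that are non-ASCII, contain whitespace, or collide with
    another name when case is ignored (as on DOS-compatible filesystems)."""
    problems = []
    for name in names:
        if not name.isascii():
            problems.append(f"non-ASCII: {name}")
        if any(ch.isspace() for ch in name):
            problems.append(f"whitespace: {name}")
    # Group names by their case-folded form to find case-only clashes.
    by_folded = defaultdict(list)
    for name in names:
        by_folded[name.casefold()].append(name)
    for group in by_folded.values():
        if len(group) > 1:
            problems.append("case-only clash: " + ", ".join(sorted(group)))
    return problems
```

Running it over a directory listing (for example, the names from `os.listdir`) gives a list of files to rename before they reach the web server.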
Single file vs Multiple files
Projects may choose to keep all their TEI data in a single file or to spread it across multiple files. Multiple files are then combined, either in a database or in XML (using a technique such as XPointer). Advantages of the single-file approach include:
- Conceptually easy
- Easy editing
- Header information doesn't need to be duplicated across multiple files
- Constraints (schema / schematron validity) easier to check
Advantages of the multiple file approach include:
- File sizes kept within the range that conventional editors can handle
- Finer-grained operations (updating, indexing, version control, etc.) possible
- Better for multi-person teams, as different people can work on different documents
- Allows explicit division of TEI into logical units for processing and distribution.
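As an illustration of combining multiple files in XML, Python's standard `xml.etree.ElementInclude` module can expand `xi:include` elements. It implements only basic whole-file inclusion, not XPointer fragments, so fragment inclusion needs a fuller XInclude processor. Below is a minimal sketch, assuming a `teiCorpus` wrapper that includes two TEI files. The in-memory `FILES` dictionary and the loader are hypothetical stand-ins for files on disk, used so the example runs without any setup.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; a real project would read them from disk
# (ElementInclude's default loader does exactly that).
FILES = {
    "corpus.xml": """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0"
        xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="doc1.xml"/>
      <xi:include href="doc2.xml"/>
    </teiCorpus>""",
    "doc1.xml": '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="doc1"/>',
    "doc2.xml": '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="doc2"/>',
}


def loader(href, parse, encoding=None):
    """Resolve an xi:include href against FILES instead of the filesystem.
    For parse="xml" ElementInclude expects an Element back."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


def combine(main: str) -> ET.Element:
    """Parse the main file and replace each xi:include with its target."""
    root = ET.fromstring(FILES[main])
    ElementInclude.include(root, loader=loader)  # modifies root in place
    return root
```

Each TEI file stays separately editable and version-controlled, while the combined tree can be validated or transformed as one document.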
Here are two separate tables: one describes the form of the content presented on the web, the other describes how that content might be hosted.
Method | Pros | Cons | Skills needed | Searching possible | Dynamic KWIC etc. possible | Examples |
---|---|---|---|---|---|---|
Publish as single plain TEI/XML file | Simplest possible solution | Meaningless to a standard web browser / user | None | No | No | |
Single TEI file + XSL Processing Instruction (PI) | Very simple two-file solution | Some obscure XSLT features not supported by all browsers | XML, TEI, XSLT | No | No | |
Single web page from (single TEI file + single XSL file) | A web page like any other. | Because TEI is hidden, third-party repurposing or validation is impossible | XML, TEI, XSLT | No | No | |
Single web page from (single TEI file + single XSL file) + TEI file | A web page like any other, but a link to web | | XML, TEI, XSLT | No | No | |
Multiple web pages from (multiple TEI files + xinclude + XSL file(s)) + TEI files | A web page like any other, but a link to web | Multiple files enable large, complex structures and corpora | XML, TEI, XSLT, XPointer | No | No | |
Multiple TEI files in eXist (native XML) database + XSL and/or XQuery for conversion of fragments to HTML | Native XML database means no shredding XML into tables, and rapid fulltext query across some/all TEI files; built-in full-featured web server with URL rewriting for cool URLs; cross platform | Requires own server (real or virtual) | XML, TEI, XSLT, XQuery | Yes | Yes | U.S. State Department |
Multiple TEI files in post-relational (SQL) database + XSL conversion of fragments to HTML | Makes inter-linking with other information sources easy | Requires own server (real or virtual) | XML, TEI, XSLT, SQL | Yes | No (in theory yes, but hard) | NZETC |
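The "Dynamic KWIC" column refers to keyword-in-context displays generated on demand. In the eXist row this would typically be done in XQuery. Below is a language-neutral illustration of the KWIC step itself as a minimal Python sketch, assuming the running text sits in TEI `<p>` elements inside `<body>`. The function name and the `width` parameter are hypothetical, and a real service would query an index rather than reparse files for every search.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def kwic(tei_text: str, keyword: str, width: int = 20) -> list[str]:
    """Keyword-in-context lines for each case-insensitive, whole-word
    match of keyword in the <p> elements of the TEI <body>."""
    root = ET.fromstring(tei_text)
    body = root.find(f".//{TEI}body")
    if body is None:
        return []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for p in body.iter(f"{TEI}p"):
        # Flatten nested markup (hi, name, ...) and normalise whitespace.
        text = " ".join("".join(p.itertext()).split())
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so the keywords line up.
            lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines
```

Right-aligning the left context is what makes the matches line up in a column, which is the usual KWIC presentation.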
Remember that for polished websites at least some CSS (Cascading Style Sheets) skills will be required. For pages with some dynamism (drop-down menus, expanding sections, etc.), a little JavaScript skill goes a long way.
Method | Pros | Cons | Examples |
---|---|---|---|
Local server | Full control. Can dual-use a server (or desktop) | Need hardware. Need reliable internet connection. Need technical skills. Backups important | NZETC |
Shared server | Pooled skills and resources | Conflicting requirements. Need reliable internet connection. Backups important | NZETC |
Virtual server | Full control. Security handled by third party. | Need technical skills. Costs money. Backups important | |
Hosting on Google Sites | Free, unlimited HTML pages (size and number) | No dynamic content (no eXist or other databases possible). Backups important | |