Ad-hoc committee on encoding of bibliographic citations

= Charge =

At its meeting on 2010-02-07, the TEI Council discussed bug 2714682 and reached consensus that biblScope should not be allowed as a child of imprint since the prose definition of the imprint element does not include such content as might be included in one or more biblScope elements. Laurent Romary, Martin Holmes, and Kevin Hawkins were charged with writing a proposal to make it clear how to use biblScope for various types of bibliographic citations. The proposal should include corrected, annotated (with XML comments) examples for the Guidelines of encoding various types of citations in biblStruct (and maybe also bibl and biblFull). It was suggested that the ad-hoc committee also look at how citations are handled in other encoding schemes.

= Background =

The TEI Guidelines offer three elements for bibliographic descriptions:


 * bibl -- no enforced content structure
 * biblStruct -- structured citation follows a TEI content model
 * biblFull -- structured citation with a content model similar to fileDesc

These elements may be used for a number of different purposes in a TEI document. They may occur within the teiHeader to describe the source of a TEI document (as part of the metadata) or within the text to describe citations appearing as content. As with other elements used in the body, any of these three elements could be used either:


 * to represent citations in a source document (such as a pre-existing print document)
 * to encode citations as part of a publishing process (such as a manuscript of a book to be published), whether destined for print publication, online publication, or both.

Some people distinguish these as two different purposes for markup (which Wendell Piez called "retrospective markup" versus "encoding done for the sake of fitting data to a particular application"). In discussing the elements for bibliographic purposes, one's opinion might be affected by whether one assumes the markup will be used for retrospective digitization or for a publishing workflow.

== Encoding citations as represented in a source document (such as a pre-existing print document) ==

When encoding a citation as it appears in a source document, most users prefer to leave the transcribed text in the order in which it is read on the page; therefore, any elements used for bibliographic descriptions need to be flexible in their internal structure. bibl will clearly work for this purpose, and biblFull will not (unless the source document contains only ISBD-compatible citations without certain errors). biblStruct has a complicated content model whose allowed order of elements seems to map well onto common citation formats (at least those used in European languages). However, when including page numbers, it is not clear whether biblScope should be a child or a sibling of imprint.
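The ambiguity can be illustrated with the journal citation quoted later in this document (Gardos et al.). The two fragments below are sketches, not recommendations; they differ only in where the page range sits. The unit attribute follows current P5 releases (older releases used type for the same purpose).

```xml
<!-- Placement 1: biblScope as a child of imprint -->
<monogr>
  <title level="j">J Clin Psychopharmacol</title>
  <imprint>
    <date when="1988-08">1988 Aug</date>
    <biblScope unit="page">31S-37S</biblScope>
  </imprint>
</monogr>

<!-- Placement 2: biblScope as a sibling of imprint -->
<monogr>
  <title level="j">J Clin Psychopharmacol</title>
  <imprint>
    <date when="1988-08">1988 Aug</date>
  </imprint>
  <biblScope unit="page">31S-37S</biblScope>
</monogr>
```

The Council consensus recorded in the Charge corresponds to the second placement.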

Examples in the Guidelines sometimes include delimiting punctuation in the elements, which is sensible when encoding citations as represented in a source document. Users often wish to represent the original document as faithfully as possible, including inconsistent punctuation, so they don't want to insert any punctuation automatically. Leaving in punctuation is also simpler when converting from print or electronic source files.
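As a sketch of this approach, the journal citation quoted later in this document could be transcribed with bibl as follows. The punctuation of the source is kept as literal text between the elements, in reading order; the choice of which spans to tag, and the unit values, are assumptions made for illustration.

```xml
<!-- Source punctuation is left in place; nothing is generated by a stylesheet -->
<bibl><author>Gardos, George</author>; <author>Cole, Jonathan O.</author>;
  <author>Haskell, David</author>; <author>Marby, David</author>;
  <author>Paine, Susan Schniebolk</author>; <author>Moore, Patricia</author>.
  <title level="a">The natural history of tardive dyskinesia</title>.
  <title level="j">J Clin Psychopharmacol</title>.
  <date when="1988-08">1988 Aug</date>;<biblScope unit="volume">8</biblScope>(<biblScope unit="issue">4 Suppl</biblScope>):<biblScope unit="page">31S-37S</biblScope>.
  Table 3, Occurrence in the United States; p. <biblScope unit="page">32S</biblScope>.
</bibl>
```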

== Encoding citations as part of a publishing process (such as a manuscript of a book to be published) ==

When encoding a citation as part of a publishing process, the usual desire is to enforce uniformity of citation style through use of markup. If a particular citation format (MLA, APA, GOST 7.1, etc.) is to be used, the possible combinations of citation components in that format are unlikely to match perfectly the content model of any of the three elements for bibliographic descriptions. biblFull likely requires too much detail for most purposes, bibl allows more flexibility of encoding structure than the citation format requires, and biblStruct is already too constrained for certain use cases (such as a work without an author, editor, or other statement of responsibility, or a monograph that is part of a larger monograph series). Constraining the content model of bibl would break TEI Conformance (?), so a user might instead keep the TEI content model for the element but use a Schematron schema or other outside tool to check that the citation is encoded as desired.
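A minimal sketch of such an outside check, written in ISO Schematron. The three rules are a hypothetical house style invented for illustration (they are not taken from MLA, APA, or any other format named above), and the context path assumes that citations are collected in a listBibl.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Hypothetical house rules applied on top of the unmodified TEI content model -->
    <sch:rule context="tei:listBibl/tei:bibl">
      <sch:assert test="tei:title">A bibl in a listBibl must contain a title.</sch:assert>
      <sch:assert test="tei:date">A bibl in a listBibl must contain a date.</sch:assert>
      <!-- If there is an author, the first author must precede the first title -->
      <sch:assert test="not(tei:author) or tei:author[1] &lt;&lt; tei:title[1]">
        Authors must be given before the title.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Because the TEI schema itself is left untouched, a document that fails these assertions is still a valid TEI document; it merely departs from the local style.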

Delimiting punctuation should not be included in citations encoded as part of a publishing process; it should instead be inserted by a stylesheet, which both guarantees consistent punctuation and makes it easier to adapt to a different citation style in the future.
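The following XSLT 1.0 stylesheet is a sketch of this idea under stated assumptions: it handles only authors and titles of a journal article encoded in biblStruct, it joins authors with "; ", and it closes each component with a period unless the text already ends in ".", "?" or "!". The template name with-period and the choice of separators are illustrative, not part of any TEI stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text"/>

  <!-- Suppress all text that is not emitted explicitly below -->
  <xsl:template match="text()"/>

  <xsl:template match="tei:biblStruct">
    <!-- Authors joined with "; " (the separator lives here, not in the data) -->
    <xsl:variable name="authors">
      <xsl:for-each select="tei:analytic/tei:author">
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:if test="position() != last()">
          <xsl:text>; </xsl:text>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <xsl:call-template name="with-period">
      <xsl:with-param name="s" select="string($authors)"/>
    </xsl:call-template>
    <xsl:call-template name="with-period">
      <xsl:with-param name="s" select="normalize-space(tei:analytic/tei:title)"/>
    </xsl:call-template>
    <xsl:call-template name="with-period">
      <xsl:with-param name="s" select="normalize-space(tei:monogr/tei:title)"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Emit a non-empty string, add a period only if no closing punctuation is present -->
  <xsl:template name="with-period">
    <xsl:param name="s"/>
    <xsl:variable name="last" select="substring($s, string-length($s))"/>
    <xsl:if test="$s != ''">
      <xsl:value-of select="$s"/>
      <xsl:if test="not(contains('.?!', $last))">
        <xsl:text>.</xsl:text>
      </xsl:if>
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Switching to another citation style would then mean changing the separators in this stylesheet rather than re-editing every citation.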

= Proposed revisions to P5 to present in Dublin =

Before the Council meeting in Dublin in April 2010, Martin, Laurent, and Kevin will copy and paste §3.11 and §2.7 from P5 into a Microsoft Word document and track changes on revisions. Expected revisions include:

a. Clearer guidance on choosing among bibl, biblStruct, and biblFull. In short, we'll say that biblStruct is easier to process with a machine but will not be able to account for every type of citation that you might want to encode (and will be especially difficult to render precisely using stylesheets without resorting to @rend on elements), whereas bibl gives you maximum flexibility in structure but is not especially machine-processable. Kevin will address biblFull's relationship to ISBD.

b. Addressing use of delimiting (separating) punctuation within bibl, biblStruct, and biblFull -- advantages and disadvantages of leaving it in the data. In short, you should leave it in if using bibl but not if using biblStruct. We should specifically address the issue of trailing punctuation in titles; periods are not conventionally viewed as part of titles, but other punctuation may be (as in "Language Change: Progress or decay?"). Processors typically have to check for trailing punctuation anyway and supply a period as part of the rendering process if there is none. It would help if we could provide consistent recommendations for biblStruct.

c. Discussing in more detail how these elements relate to library catalogue data. (§2.7 is lacking in detail.)

d. Including better examples in §3.11 that address common needs, like citing an article within a journal issue.

e. Proposing adding ref as a child of analytic, monogr, and series. There is currently no straightforward way of specifying the URI of an online article, journal issue, journal or series publication; many workarounds are in use, such as using idno, biblScope or note, but these are ad-hoc. ref is the natural way to specify a URI, and ought to be available at every level of biblStruct, as it is in bibl. This does not break backward compatibility.
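A sketch of what proposal (e) would permit, using the journal citation quoted later in this document. The addresses are placeholders on example.org, and the position of ref within analytic and monogr is an assumption, since the proposal does not fix it.

```xml
<biblStruct>
  <analytic>
    <!-- Remaining authors omitted for brevity -->
    <author>Gardos, George</author>
    <title level="a">The natural history of tardive dyskinesia</title>
    <!-- Proposed: URI of the online article (placeholder address) -->
    <ref target="http://www.example.org/article">http://www.example.org/article</ref>
  </analytic>
  <monogr>
    <title level="j">J Clin Psychopharmacol</title>
    <!-- Proposed: URI of the journal itself (placeholder address) -->
    <ref target="http://www.example.org/journal">http://www.example.org/journal</ref>
    <imprint>
      <date when="1988-08">1988 Aug</date>
    </imprint>
  </monogr>
</biblStruct>
```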

Council will decide in Dublin whether to accept these revisions. If so, someone will edit the P5 source.

= Proposed future revisions to the Guidelines (which will break backwards compatibility) =

Examine the content model for biblStruct and citation standards like Z39.29 derived from ISBD. Consider introducing revisions that would break backwards compatibility, such as the following:

a. Clarify §3.11.2.1 and the definitions of the elements monogr and series to say that a citation for an article in a journal should be encoded using analytic and series, not analytic and monogr. Change content model of biblStruct so that either monogr or series is required (not only monogr, as is the case currently).

b. Have analytic, monogr, and series contain only information about the titles of these works, with other metadata in elements that are children of biblStruct. For example, a citation to a portion of a journal article that would be formatted in print according to Z39.29-2005 as:


 * Gardos, George; Cole, Jonathan O.; Haskell, David; Marby, David; Paine, Susan Schniebolk; Moore, Patricia. The natural history of tardive dyskinesia. J Clin Psychopharmacol. 1988 Aug;8(4 Suppl):31S-37S. Table 3, Occurrence in the United States; p. 32S.

might be encoded like this:

 Gardos, George Cole, Jonathan O.  Haskell, David Marby, David Paine, Susan Schniebolk Moore, Patricia The natural history of tardive dyskinesia J Clin Psychopharmacol 1988 Aug 8 4 Suppl 31S-37S Table 3 Occurrence in the United States 32S </biblStruct>

c. Prescribe use of BibTeXML or another XML format for citations. Such elements would use the other scheme's namespace within a TEI document, much as MathML or SVG can be used in P5.
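For comparison with proposal (b), the following sketch encodes the same Gardos citation under the existing content model, where contributors sit inside analytic and the date and scope inside monogr. Placing biblScope as a sibling of imprint and recording the cited portion in a note are assumptions made for illustration; the unit attribute follows current P5 releases.

```xml
<biblStruct>
  <analytic>
    <author>Gardos, George</author>
    <author>Cole, Jonathan O.</author>
    <author>Haskell, David</author>
    <author>Marby, David</author>
    <author>Paine, Susan Schniebolk</author>
    <author>Moore, Patricia</author>
    <title level="a">The natural history of tardive dyskinesia</title>
  </analytic>
  <monogr>
    <title level="j">J Clin Psychopharmacol</title>
    <imprint>
      <date when="1988-08">1988 Aug</date>
    </imprint>
    <biblScope unit="volume">8</biblScope>
    <biblScope unit="issue">4 Suppl</biblScope>
    <biblScope unit="page">31S-37S</biblScope>
  </monogr>
  <!-- The cited portion of the article is recorded in a note as a workaround -->
  <note>Table 3, Occurrence in the United States; p. 32S</note>
</biblStruct>
```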