Ad-hoc committee on encoding of bibliographic citations
At its meeting on 2010-02-07, the TEI Council discussed bug 2714682 and reached consensus that biblScope should not be allowed as a child of imprint since the prose definition of the imprint element does not include such content as might be included in one or more biblScope elements. Laurent Romary, Martin Holmes, and Kevin Hawkins were charged with writing a proposal to make it clear how to use biblScope for various types of bibliographic citations. The proposal should include corrected, annotated (with XML comments) examples for the Guidelines of encoding various types of citations in biblStruct (and maybe also bibl and biblFull). It was suggested that the ad-hoc committee also look at how citations are handled in other encoding schemes.
The TEI Guidelines offer three elements for bibliographic descriptions:
- bibl -- no enforced content structure
- biblStruct -- structured citation that follows a TEI content model
- biblFull -- structured citation with a content model similar to fileDesc
These elements may be used for a number of different purposes in a TEI document. They may occur within the <teiHeader> to describe the source of a TEI document (as part of the metadata) or within the <text> to describe citations appearing as content. As with other elements used in the body, any of these three elements could be used either:
- to represent citations in a source document (such as a pre-existing print document)
- to encode citations as part of a publishing process (such as a manuscript of a book to be published), whether destined for print publication, online publication, or both.
Some people distinguish these as two different purposes for markup (which Wendell Piez called "retrospective markup" versus "encoding done for the sake of fitting data to a particular application"). In discussing the elements for bibliographic purposes, one's opinion might be affected by whether one assumes the markup will be used for retrospective digitization or for a publishing workflow.
Encoding citations as represented in a source document (such as a pre-existing print document)
When encoding a citation as it appears in a source document, most users prefer to leave the transcribed text in the order in which it is read on the page when tagging; therefore, any elements used for bibliographic descriptions need to be flexible in their internal structure. bibl will clearly work for this purpose, and biblFull will not (unless the source document contains only ISBD-compatible citations without certain errors). biblStruct has a complicated content model whose allowed order of elements seems to map well onto common citation formats (at least those used in European languages). However, when including page numbers, it's not clear whether biblScope should be a child or a sibling of imprint.
Examples in the Guidelines sometimes include delimiting punctuation in the elements, which is sensible when encoding citations as represented in a source document. Users often wish to represent the original document as faithfully as possible, including inconsistent punctuation, so they don't want to insert any punctuation automatically. Leaving in punctuation is also simpler when converting from print or electronic source files.
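As a rough illustration of this retrospective approach, the journal-article citation quoted later on this page could be transcribed in bibl with the printed order and punctuation left untouched. This is only a sketch: the choice of child elements, the level values on title, the when value on date, and the unit values on biblScope are assumptions made for the example, not a recommended encoding.

```xml
<!-- Sketch only: text order and punctuation are kept exactly as printed. -->
<!-- The attribute values (level, when, unit) are illustrative assumptions. -->
<bibl xmlns="http://www.tei-c.org/ns/1.0">
  <author>Gardos, George</author>; <author>Cole, Jonathan O.</author>;
  <author>Haskell, David</author>; <author>Marby, David</author>;
  <author>Paine, Susan Schniebolk</author>; <author>Moore, Patricia</author>.
  <title level="a">The natural history of tardive dyskinesia</title>.
  <title level="j">J Clin Psychopharmacol</title>.
  <date when="1988-08">1988 Aug</date>;<biblScope unit="volume">8</biblScope>(<biblScope unit="issue">4 Suppl</biblScope>):<biblScope unit="page">31S-37S</biblScope>.
</bibl>
```

Because the punctuation sits between the elements as ordinary text, stripping the tags yields the citation as it was printed.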
Encoding citations as part of a publishing process (such as a manuscript of a book to be published)
When encoding a citation as part of a publishing process, the usual desire is to enforce uniformity of citation style through use of markup. If a particular citation format (MLA, APA, GOST 7.1, etc.) is to be used, it is likely that the possible combinations of citation components in that format do not match perfectly with the content model of any of the three elements for bibliographic descriptions. biblFull likely requires too much detail for most purposes, bibl would allow for more flexibility of encoding structure than is required for the citation format being used, and biblStruct is already too constrained for certain use cases (like a work without an author, editor, or other responsibility, or like a monograph that is part of a larger monograph series). Constraining the content model of bibl would break TEI Conformance (?), so a user might instead maintain the TEI content model for the element but use a Schematron schema or other outside tool to check that the citation is encoded as desired.
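A minimal sketch of such an outside check, written in ISO Schematron, is given here. The house rules it enforces (every bibl in a listBibl needs a title and a date, and must contain no bare text) are hypothetical and would need to be replaced by the requirements of the citation format actually in use.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Hypothetical house rules for bibliography entries; the TEI content model itself is left unchanged. -->
    <sch:rule context="tei:listBibl/tei:bibl">
      <sch:assert test="tei:title">A bibliography entry must contain a title.</sch:assert>
      <sch:assert test="tei:date">A bibliography entry must contain a date.</sch:assert>
      <!-- Flag untagged text (such as hand-typed punctuation) directly inside bibl. -->
      <sch:report test="text()[normalize-space(.) != '']">Untagged text found in a bibliography entry; punctuation is expected to come from the stylesheet.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The rule deliberately does not require an author, since works without an author, editor, or other responsibility are one of the cases mentioned above.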
It is advised not to include delimiting punctuation in citations encoded as part of a publishing process but rather to insert it through a stylesheet, which not only guarantees consistent use of punctuation but also makes it easier to adapt to a different citation style in the future.
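The idea can be sketched with a small XSLT 1.0 stylesheet that renders punctuation-free bibl elements as plain text. The output style (authors joined by semicolons, then title, then date, each closed by a full stop) is an invented house style used only for illustration, and the stylesheet assumes each bibl holds author, title, and date children with no punctuation of their own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text"/>

  <!-- Hypothetical house style: "Author; Author. Title. Date." -->
  <xsl:template match="tei:bibl">
    <xsl:for-each select="tei:author">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:choose>
        <!-- Separator between authors, full stop after the last one. -->
        <xsl:when test="position() != last()"><xsl:text>; </xsl:text></xsl:when>
        <xsl:otherwise><xsl:text>. </xsl:text></xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
    <xsl:value-of select="normalize-space(tei:title[1])"/>
    <xsl:text>. </xsl:text>
    <xsl:value-of select="normalize-space(tei:date[1])"/>
    <xsl:text>.&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress all text that is not inside a bibl. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Switching to another citation style would then mean editing only the literal separators in this template rather than every encoded citation. A production stylesheet would need more care, for example to avoid a doubled full stop after a name that already ends in an initial.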
This distinction might not be entirely appropriate for our purposes. Here are some other distinctions that may be illuminating:
- Chapter 9 of the Guidelines, which explains how to use entry versus entryFree
- the two citation styles used in the NLM Journal Publishing Tag Set
See other markup languages for citations:
- Citation Style Language (CSL)
- CITO (Citation Ontology): http://imageweb.zoo.ox.ac.uk/pub/2008/publications/Shotton_ISMB_BioOntology_CiTO_final_postprint.pdf and http://imageweb.zoo.ox.ac.uk/pub/2009/citobase/cito-20091124-1.4/cito-content/owldoc/
Examples of citations in use
- Kent Hooper volunteered on TEI-L to help, pointing us to his bibliography at http://barlach-biblio.org/ and offering his TEI encoding behind it.
- Charles Muller offered his pre-P5 bibliography at http://www.acmuller.net/yogacara/bibliography/bibnotes.html for examples.
- bibl's as used in the real world (See also: Samples of TEI texts):
Proposed revisions to P5 presented in Dublin
After much discussion, it was decided that Lou would make revisions in SourceForge to parts that he recalled Council agreeing to, leaving Martin, Laurent, and Kevin to present a slimmed down proposal, with corrections noted during discussion in Dublin, for consideration at a future Council meeting.
Activity since the Dublin meeting
Lou made changes to the Guidelines based on the non-controversial parts of the Dublin proposal in time for TEI P5 release 1.7.0:
The committee members agreed to break their proposal into individual SourceForge tickets to simplify discussion:
- 2987832: date within biblStruct -- Proposal was withdrawn and replaced with proposal to add <date> as a child of <analytic>, which was implemented in February 2011.
- 2987241: <monogr> and <analytic> should be allowed in <bibl> -- In the course of the comments, the proposal was revised to request that <bibl> be allowed within <bibl>. This was implemented in April 2011.
- 2976715: resp -> model.respLike in biblStruct. and other things. -- There were difficulties in implementing this, so it was decided not to implement it.
- 2714682: biblScope should be in Imprint or not? (biblStruct) -- Parts of this were resolved, and the remainder became 3497079: clarify and rationalize encoding of pagination in bibliograp, which in turn became two tickets:
- 3555190: Improve guidance and restrict usage of biblScope -- implemented
- 3555191: New element <citedRange> for bibliography -- implemented
Some more revisions based on the Dublin proposal were also made in the course of implementing 3585939: <title level="s"> in a <monogr>: see http://tei.svn.sourceforge.net/viewvc/tei/trunk/P5/Source/Guidelines/en/CO-CoreElements.xml?r1=11278&r2=11277&pathrev=11278 .
The committee members returned to their proposal made in Dublin to see if any revisions were still needed to the relevant chapters of the Guidelines. Tickets have been created for the remaining work:
- 3602256: title of section 3.11.1 (#COBITY) -- implemented
- 3602308: transcription in national cataloguing codes wrt fileDesc -- implemented
- 3602410: clarification of biblStruct vs bibl -- rejected
- 3602416: expansion of markup in a bibl example -- implemented
A related topic arose on TEI-L: 3602428: biblScope@unit & citedRange@unit: consistency & sugg. values -- implemented
Proposed future revisions to the Guidelines (which will break backwards compatibility)
Examine the content model for biblStruct and citation standards like Z39.29 that are derived from ISBD. Consider introducing revisions that would break backwards compatibility, such as the following:
a. Clarify §184.108.40.206 and the definitions of the elements monogr and series to say that a citation for an article in a journal should be encoded using analytic and series, not analytic and monogr. Change content model of biblStruct so that either monogr or series is required (not only monogr, as is the case currently).
b. Have analytic, monogr, series contain only information about the titles of these works, with other metadata put in elements that are children of biblStruct. For example, a citation to a portion of a journal article that would be formatted in print according to Z39.29-2005 as:
- Gardos, George; Cole, Jonathan O.; Haskell, David; Marby, David; Paine, Susan Schniebolk; Moore, Patricia. The natural history of tardive dyskinesia. J Clin Psychopharmacol. 1988 Aug;8(4 Suppl):31S-37S. Table 3, Occurrence in the United States; p. 32S.
might be encoded like this:
<biblStruct>
  <author>Gardos, George</author>
  <author>Cole, Jonathan O.</author>
  <author>Haskell, David</author>
  <author>Marby, David</author>
  <author>Paine, Susan Schniebolk</author>
  <author>Moore, Patricia</author>
  <analytic>
    <title>The natural history of tardive dyskinesia</title>
  </analytic>
  <series>
    <title>J Clin Psychopharmacol</title>
  </series>
  <imprint>
    <date>1988 Aug</date>
    <biblScope type="vol">8</biblScope>
    <biblScope type="issue">4</biblScope>
    <biblScope type="issue_subdivision">Suppl</biblScope>
    <biblScope type="pp">31S-37S</biblScope>
    <biblScope type="subdivision">Table 3</biblScope>
    <biblScope type="subdivision_title">Occurrence in the United States</biblScope>
    <biblScope type="subdivision_pagination">32S</biblScope>
  </imprint>
</biblStruct>
c. Get rid of biblStruct and possibly also biblFull altogether, since they are bound to cause problems for complex citations. Perhaps even prescribe use of BibTeXML or another XML format (or even BibJSON) for citations instead of TEI's elements. Such elements would use the other scheme's namespace within a TEI document, much as MathML or SVG can be used in P5, and TEI would leave the other markup language to deal with bibliographies and focus on the things it does better.