Paris 2011-11 minutes
November 7 2011
November 8 2011
Present: Laurent Romary (LR), Brett Barney (BB), Lou Burnard (LB), Elena Pierazzo (EP), Kevin Hawkins (KH), Piotr Bański (PB), James Cummings (JC), Martin Holmes (MH)
Guest: Brian L. Pytlik Zillig (BZ)
Remote:
Stuart Yeates (SY)
Sebastian Rahtz (SR)
Gabriel Bodard (GB)
MORNING
GENETIC TRANSCRIPTION:
The transition between the discussion of <facsimile> and the explanation of <sourceDoc>, and when you would use the latter, needs to be expanded (EP, MH and LB). The content models of <surface>, <zone> etc. are identical whether they appear within <facsimile> or <sourceDoc>. This could lead to confusion. EP: The idea is to discourage <facsimile>. MH: But the genetic workgroup believes that <facsimile> and <sourceDoc> are different; if we remove the former, those who need that will surely complain. JC: <facsimile> and <sourceDoc> are different, and have different use cases. LB: We should provide schematron rules to say that if you use e.g. <line>, it should have ancestor::sourceDoc. LR: three points: We have ended up with a double mechanism, when a single mechanism would be simpler; some people like <facsimile> and want to keep it; some people would pursue the goal of coherence and want to remove one of them. MH: can we make <facsimile> an actual alias for <sourceDoc> through a technical mechanism? LR: Yes, through <equiv> in the description of <facsimile>, in the ODD. We could provide two ODDs, one for <sourceDoc> and one for <facsimile>, and the latter could be defined as a subset of the former. This is not yet possible in ODD. MH, summarizing: For now, we're stuck with using both elements, explaining the difference between them, and explaining the long-term strategy.
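LB's suggested constraint (a <line> should have a <sourceDoc> ancestor) can be illustrated outside Schematron as well. The following Python sketch is not the Council's rule: the helper name `lines_outside_source_doc` and the sample document are invented here, and the only assumption carried over from TEI is its namespace URI.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: one <line> under <sourceDoc>, one (wrongly) under <facsimile>.
SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <sourceDoc>
    <surface><zone><line>first draft line</line></zone></surface>
  </sourceDoc>
  <facsimile>
    <surface><zone><line>misplaced line</line></zone></surface>
  </facsimile>
</TEI>
"""

def lines_outside_source_doc(root):
    """Return the text of every <line> that has no <sourceDoc> ancestor."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    offenders = []
    for line in root.iter(f"{{{TEI_NS}}}line"):
        node = line
        inside = False
        # Walk upwards until the root, looking for <sourceDoc>.
        while node in parent_of:
            node = parent_of[node]
            if node.tag == f"{{{TEI_NS}}}sourceDoc":
                inside = True
                break
        if not inside:
            offenders.append(line.text)
    return offenders

print(lines_outside_source_doc(ET.fromstring(SAMPLE)))
```

A real implementation would more likely be a Schematron assertion shipped with the schema, as LB proposed; the script only shows what such a rule would flag.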
EP: There are some major problems with the chapter as it stands:
- The opening with two facing pages, with a patch crossing the two pages, or where the author has considered the opening as a single surface (e.g. Proust's working texts). EP proposed to the Goethe working group that a single <surface> be used in this case; this means that <surface> is being used as a psychological construct. The working group responded positively to this, and Council thinks it's uncontroversial. EP has examples, and will provide one from Proust, because the chapter should have one. BB: the chapter currently reads as though a <surface> is a physical object, or exists prior to interpretation, but we need to modify that description so that it allows for, or even specifies, the fact that a <surface> is a psychological construct of the encoder, presumably based on the assumed perception of the original author who chose to use it that way. JC: Why shouldn't this situation be handled by nesting <surface>? EP: In this case, you could represent the page <surface>s side-by-side, rather than nesting them. There is no need for nesting.
- A mini-<surface> within an existing <surface> is represented as a <patch>. If the patch is written on both sides, then it's @flippable, but how do we distinguish that from two separate patches? JC, LR: We need nested surfaces to handle this. LB: <patch> is just a special kind of <surface>. Should we collapse it into the definition of <surface>? LB suggests we kill <patch>, and we introduce something called <surfaceGrp>, of which the prototypical case is a "leaf"; another example is a pile of post-it notes on top of each other. A <surfaceGrp> is not a matter of interpretation; it's a physical object which includes multiple <surface>s. EP objects to the idea of physical objects, because all determination of <surface> is by the encoder; we can't really express physical reality. MH, JC: A <surfaceGrp> could not have a single coordinate space because it is not a single two-dimensional space; a <surfaceGrp> can have a location within the coordinate space of its parent <surface>, and its child <surface>s have their own coordinate spaces. LB: Take the single canonical example of a leaf: the size of a leaf on one side is the same as its size on the other. Therefore we should allow a default coordinate system on <surfaceGrp>. The group consensus now is that <surfaceGrp> should replace <patch>; it covers both the use case of <patch> and other cases such as four-sided monuments or leaves; <surfaceGrp> is a child of <surface>, and <surface> is its child; <surfaceGrp> needs to be able to express its location and size within the coordinate space of its parent <surface>, assuming it has a parent <surface>. <surfaceGrp> may be a child of <facsimile> or <sourceDoc>.
- The leaf concept has been largely covered with <surfaceGrp>, but there's the issue of transparency; a writer may interact with something visible from the other side of a leaf, by e.g. doodling around it. MH: this might be handled through an attribute on <zone>, such as "bleedThrough".
- EP: BB raised the issue of @alt on the list; exactly what did he mean? BB: I'll have to go back and look at that again.
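The consensus on <surfaceGrp> can be pictured with a small sketch. Everything concrete in it is assumed for illustration only: the attribute names ulx, uly, lrx and lry (borrowed from the coordinate attributes already used on <surface> and <zone>), the coordinate values, the omission of namespaces, and the helper `check_surface_grp`. The minutes themselves fix only the parent/child relationships and the need to locate the group within its parent.

```python
import xml.etree.ElementTree as ET

# Invented example: a two-sided patch (recto/verso) attached to a page.
SAMPLE = """
<sourceDoc>
  <surface ulx="0" uly="0" lrx="200" lry="300">
    <surfaceGrp ulx="50" uly="40" lrx="120" lry="90">
      <surface n="recto" ulx="0" uly="0" lrx="70" lry="50"/>
      <surface n="verso" ulx="0" uly="0" lrx="70" lry="50"/>
    </surfaceGrp>
  </surface>
</sourceDoc>
"""

# Parents named in the minutes: <surface>, <facsimile>, <sourceDoc>.
ALLOWED_PARENTS = {"surface", "facsimile", "sourceDoc"}

def check_surface_grp(root):
    """Return a list of problems; an empty list means this sketch's rules hold."""
    problems = []
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for grp in root.iter("surfaceGrp"):
        parent = parent_of.get(grp)
        if parent is None or parent.tag not in ALLOWED_PARENTS:
            problems.append("surfaceGrp has an unexpected parent")
        if not grp.findall("surface"):
            problems.append("surfaceGrp contains no surface")
        if parent is not None and parent.tag == "surface":
            # Rough containment check: compares the lower-right corners only.
            px, py = float(parent.get("lrx")), float(parent.get("lry"))
            if float(grp.get("lrx")) > px or float(grp.get("lry")) > py:
                problems.append("surfaceGrp extends beyond its parent surface")
    return problems

print(check_surface_grp(ET.fromstring(SAMPLE)))
```

Each child <surface> carries its own coordinate space starting at 0,0, which mirrors the point made by MH and JC that the group as a whole is not a single two-dimensional space.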
SUMMARY: LB will rework the draft, and do it as fast as possible; he'll then ask all Council members to read the whole thing again. We should allow three working days, including if possible a weekend, to do the appropriate level of proofing.
AMBER TICKETS:
Three amber FRs were assigned to LB and set to Pending because they relate to the genetic proposal already in process:
- 3095641: New elements to document stages in the writing process.
- 3095640: New model.pPart.transcriptional elements for genetic markup.
- 3095637: New <document> <patch> <line> elements for genetic view
Two tickets were postponed till the afternoon EEBO session.
- 3293316: Move witStart et al. to model.milestoneLike. This topic is being addressed by a working group right now, so the ticket is assigned to EP with a note to that effect.
- 3115238: altIdentifier in msPart. Assigned to EP, who will nudge Torsten to provide the required examples.
- 3118435: classes in interleave mode and cardinality membership. This is an ODD3 request; MH asks if the suggested functionality in ODD would be implementable in XSD, or whether it would reduce us to only RelaxNG as a fully-working output schema language. There are actually two tickets here; the cardinality issue has been moved to another ticket, while this ticket is now addressing only interleave. We will post a comment asking gaiffe to create a new ticket only about interleave, and close this ticket.
The following tickets were dealt with:
- 3290834: memberOf with cardinality restrictions. [Related to above.] We will make this green and accepted. It is assigned to SR.
- 3258912: clarification of <colloc>. Marked as green, accepted and pending, and assigned to LR.
- 3156049: Managing egXML content (validation and presentation). JC, PB: There are two separate issues. First, does the prose need to be tidied up with respect to the last two comments? We agree that it does, and this should be done. Second, how should the @teix:show attribute be handled? LB commented on the ticket to say that @rend could be used both for showing and for specialized rendering, e.g. to highlight part of some code for teaching purposes. Therefore JC and PB propose using the tei:@rend attribute inside <teix:egXML> to specify rendering requirements etc. So: a) correct the prose according to the last two comments on the ticket, and b) add tei:@rend to all elements within the teix namespace. The implication of tei:@rend in this context is that a processor will act on the attribute as a rendering instruction, rather than show it as part of the example. Set to green and assigned to JC.
- 3147225: New element <spGrp>. EP, LB: The proposal is that there are sub-div-level structural groups of speeches etc., as in e.g. shared arias or musical numbers, or play-within-a-play situations, and that a new <spGrp> element should be created to handle this. The content model would require a minimum of two <sp>s, along with anything else that can appear between <sp>s. But is this too specific? Should we instead introduce a new <floatingDiv> element that would handle other cases too? However, this is a simple case with a simple proposed solution which everyone understands, so we accept it. Marked as accepted and green, and assigned to LB.
- 3188679: change content model of <ident>. MH, LR: LB's use-case is actually a use-case for <idno>, not for <ident>. For instance, he uses URLs and filenames, and both of these should be <idno>. However, <idno> also does not permit internal structure, and there are other use-cases where that would be a good idea (for instance ISBNs and ISSNs). At the same time, <ident> does have a specific purpose, which is tagging formal identifiers in e.g. programming languages, and these do _not_ typically have internal structure. Finally, we are sympathetic to Sebastian's objections related to processing. Therefore we propose that <idno> should be made recursive, allowing internal structure, examples of recursive <idno> should be supplied, and LB should be encouraged to use <idno> instead of <ident> for his purposes. EP strongly objected that the subdivisions of <idno> are not necessarily <idno>s. LB says that the difference between <ident> and <idno> definitely needs some clarification. Assigned to MH to clarify the guidelines on the difference between <idno> and <ident>, and close the ticket, and to LR to raise a new ticket for the nesting of <idno> so the council can address it at length.
- 3305016: make <graphic> available within <table> and ... BB, KH: The current content model of <formula> allows text or graphics. It turns out that <table> has similar requirements: it may (need to) be represented by a graphic element. So the proposal is to extend the model of <table> to include <graphic>. The prose should be revised not only for <table> but also for <formula> to explain that this is allowed. LB pointed out that you could use @facs, but MH replied that you should be able to choose between doing things in the same way as is done with <formula> if you wish, or use @facs throughout if that is your encoding practice. MH also pointed out that there is a use-case for more than one graphic (tables printed over several pages), so it should be one or more graphics. Set to pending, accepted and green. Assigned to KH (with help from LB for the content model change) to implement.
- 3106834 and 3106829 <floatingText> issues (discussed in the afternoon, but added here for clarity). KH summarized that we appear to have a pointless prohibition against using <floatingText> for incomplete texts. The practical problem is that you can't use <quote> for a quote of a lengthy part of a text, because its content model is too restrictive, but the description/definition of <floatingText> specifies "complete". The Council had discussed and agreed on the second ticket in Chicago, but implementation was held up by another ticket, which has now been resolved, so we can go ahead with minor adjustments to the proposed wording to remove the example of a musical number, which we now recommend handling through <spGrp> (see above). Council created a new, improved formulation of the change to the text. Set to agreed, green and pending, and assigned to KH for implementation, although he will assign it to BB when SourceForge settings have been changed to permit this.
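The constraint proposed for <spGrp> under ticket 3147225 (at least two <sp>s, with anything that may appear between speeches) is simple enough to state as a check. This is a sketch only: the helper name `sp_grp_is_valid` and the sample speeches are invented, namespaces are omitted, and the real constraint would live in the element's content model in the ODD rather than in a script.

```python
import xml.etree.ElementTree as ET

def sp_grp_is_valid(sp_grp):
    """True if the <spGrp> element has at least two direct <sp> children."""
    return len(sp_grp.findall("sp")) >= 2

# Invented duet: two speakers sharing one number, with a stage direction between them.
duet = ET.fromstring("""
<spGrp>
  <sp><speaker>A</speaker><l>First voice</l></sp>
  <stage>They turn to each other.</stage>
  <sp><speaker>B</speaker><l>Second voice</l></sp>
</spGrp>
""")

# A group holding a single speech does not meet the proposed minimum.
solo = ET.fromstring("<spGrp><sp><l>Alone</l></sp></spGrp>")

print(sp_grp_is_valid(duet), sp_grp_is_valid(solo))
```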
ROUGH UNREDACTED VERSION OF DAY 2 MINUTES
November 8 2011
AFTERNOON
Brian L. Pytlik Zillig (BZ) from EEBO joined the meeting all day, and in the afternoon Council worked with him on a number of issues regarding the possible convergence of TCP and TEI. The EEBO corpus will contain several billion words, and will be freely available in the future, so it's in our interests to make sure that interoperability between TCP and TEI is maximized.
There are a number of areas in which moving from TCP to TEI P5 is complicated. Martin Mueller (MM) identified three groups of issues, and we discussed the first group in detail:
1. Accept the EEBO model for <figure> wholly or in part.
The Council first looked at the following items from MM's spreadsheet in GoogleDocs:
figure/l       100  wrap text in floatingText  allow figure/l
figure/lg       50  wrap text in floatingText  allow figure/lg
figure/quote   150  wrap text in floatingText  allow figure/quote
figure/signed   50  ab or ab type="signed"     allow figure/signed
figure/sp        8  wrap text in floatingText  allow <sp> in figure
figure/stage     2  wrap text in floatingText  allow <stage> in figure
figure/table    13  wrap text in floatingText  allow <table> in figure
2. Add <opener> and <closer> to the P5 <postscript> model (our ticket 3232942)
3. Permit <l> as a direct child of <head>
EEBO has lots of examples of <l> inside <head>, but P5 does not permit this. Some members of Council felt we should allow <l> in <head>; others thought we should allow <lg>, and require that <lg> wrap the <l>s; still others thought that no line-group could possibly constitute a <head>. Council agrees that both <lg> and <l> should be allowed inside <head>, because elsewhere both <lg> and <l> are allowed alongside each other. How to implement this is not yet clear. LB has raised ticket 3434992 for this, and it is not yet assigned to anyone.
4. Permit <stage> as a direct child of <lg>
The justification for this is convincing, and SR suggests adding model.stageLike to the content model of <lg>, which the Council approved. LB has raised ticket 3434996 for this.
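The "wrap text in floatingText" workaround listed in the spreadsheet items under point 1 amounts to a small tree rewrite, sketched here. The function name `wrap_figure_text`, the set of tags treated as text-bearing, and the sample figure are assumptions made for illustration; namespaces are omitted, <signed> is left out because the spreadsheet suggests <ab> for it, and the sketch does not check whether the result is valid against any TEI schema.

```python
import xml.etree.ElementTree as ET

# Children of <figure> that the spreadsheet proposes wrapping (assumed set).
TEXT_TAGS = {"l", "lg", "quote", "sp", "stage", "table"}

def wrap_figure_text(figure):
    """Move text-bearing children of <figure> into floatingText/body."""
    movable = [child for child in figure if child.tag in TEXT_TAGS]
    if not movable:
        return figure
    # The wrapper takes the position of the first moved child.
    index = list(figure).index(movable[0])
    floating = ET.Element("floatingText")
    body = ET.SubElement(floating, "body")
    for child in movable:
        figure.remove(child)
        body.append(child)
    figure.insert(index, floating)
    return figure

fig = ET.fromstring("<figure><head>Emblem</head><l>Line one</l><l>Line two</l></figure>")
print(ET.tostring(wrap_figure_text(fig), encoding="unicode"))
```

The alternative discussed by Council, allowing these elements directly inside <figure>, would make this rewrite unnecessary.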
elements inside <cell> can be deleted making the content of
the immediate content of <cell>
3. The EEBO <element> can be expressed as <floatingText type="letter">
4. The rare <above> and <below> can be expressed as <hi rend="above"> or perhaps some different ways
5. <postscript> elements as last children of <closer> can be turned into right siblings of <closer>
There was no time to consider the third type of issue, exemplified in the following items:
1. Cases in which <list> appears as child of <label> in EEBO texts should be remodeled as two-column tables or expressed in some other
2. The few instances in which <cell> has a complex content model along the lines of <item> should be rethought.
3. In addition to these problems, the encoding of signatures in early modern texts may require additional discussion.
Council would also like to request access to the current state of EEBO in its "lossless XML" form, as well as the facsimiles, so that we can generate a list of issues that we believe EEBO might wish to address.
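The <postscript> rewrite mentioned above (turning a <postscript> that is the last child of <closer> into its right sibling) is mechanical, and a minimal sketch follows. The function name `hoist_postscripts` and the sample letter are invented, namespaces are omitted, and only trailing <postscript> children of <closer> are handled, as the item describes.

```python
import xml.etree.ElementTree as ET

def hoist_postscripts(root):
    """Move trailing <postscript> children of each <closer> to follow that <closer>."""
    for parent in list(root.iter()):
        for closer in parent.findall("closer"):
            trailing = []
            # Collect postscripts from the end of <closer>, stopping at any other element.
            while len(closer) and closer[-1].tag == "postscript":
                ps = closer[-1]
                closer.remove(ps)
                trailing.insert(0, ps)
            # Re-insert them, in their original order, right after <closer>.
            position = list(parent).index(closer) + 1
            for offset, ps in enumerate(trailing):
                parent.insert(position + offset, ps)
    return root

letter = ET.fromstring(
    "<div><p>Text</p><closer><signed>J. S.</signed>"
    "<postscript><p>PS</p></postscript></closer></div>"
)
print(ET.tostring(hoist_postscripts(letter), encoding="unicode"))
```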