DH2014Hackathon-Projects

= DH2014 Hackathon Project Discussion Page =

Let's Hack
A suggestion for how to proceed based on the range of proposals and the range of participants. The group is likely to be able to sustain 2 or at most 3 projects in the time frame allotted. Some suggestions; please modify, develop or otherwise shape to your liking!

ODD visualizer
A group might try to put together a plan and start to code up an ODD visualizer. ODD is a literate programming language for XML schemas, written by the TEI and used to define the TEI schema. An ODD visualizer might be a component of an ODD customization tool such as the successor of Roma, and could tell users which modules they have included in their TEI schema and what modifications they have made.

The group should start by discussing what to visualize, perhaps including some TEI users who have experience in customization.

Resources on ODD: Getting Started with P5 ODDs; Guidelines Chapter 23: Using the TEI; and a look at Roma, the tool currently in use for generating TEI customizations.

Basic framework for rendering a document and creating simple visualizations
This project could be quite extensive, so the group should begin by selecting components that they can manage.

The proposed outcome is to be able to display (or at least mine) a document for salient features, and to produce output generic enough that it can be passed to mapping software, to other visualization software, or perhaps to the NY Times PourOver JS library. One possible way to do this is to decide on a JSON data structure, as JSON is a popular input format for visualization libraries.
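As a starting point for that discussion, here is a minimal sketch of what such a JSON structure could look like. The field names (`title`, `features`, `type`, `text`, `when`) and the choice of `placeName` and `date` as the "salient features" are assumptions made for illustration, not something the group has agreed on. The sample document is trimmed and is not a valid TEI file.

```python
import json
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Trimmed, hypothetical sample; a real TEI header needs more elements.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample letter</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Written in <placeName>Vienna</placeName> on
       <date when="1621-08-17">17 August 1621</date>.</p>
  </body></text>
</TEI>"""

# Assumption: these two elements are the features worth mining.
SALIENT = ("placeName", "date")


def extract_features(tei_xml):
    """Return a JSON-ready dict of salient features found in the text."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=NS)
    features = []
    text = root.find("tei:text", NS)
    for el in (text.iter() if text is not None else []):
        local_name = el.tag.split("}", 1)[-1]  # strip the namespace part
        if local_name in SALIENT:
            feature = {"type": local_name, "text": "".join(el.itertext()).strip()}
            if "when" in el.attrib:
                feature["when"] = el.attrib["when"]
            features.append(feature)
    return {"title": title, "features": features}


if __name__ == "__main__":
    print(json.dumps(extract_features(SAMPLE), indent=2))
```

A flat list of typed features like this is easy to filter by `type` before handing it to a mapping or timeline library, which is the main reason for keeping the structure shallow.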

Develop the MS description framework
This was one of the proposed projects that got a lot of comments. It could be developed along the lines of the comments, and generalized a bit.

Discussing, working out and documenting best practices for an authoring workflow
Projects that last for many years, have many people working on them, and comprise large amounts of material need to ensure that work is done efficiently, that output is accurate, and that files move reliably through transformations and validations. Good workflows can be enhanced by both tools and best practices. The outcome of this project could be a document with explanations and examples of authoring/encoding/proofreading/version control workflows with a focus on software tools.

Participants

 * Robert C Kahlert, Department of Protestant Theology, University of Vienna
 * Raffaele Viglianti, University of Maryland
 * Patrik Granholm, Uppsala University Library: Greek Manuscripts in Sweden. A Digitization and Cataloguing Project
 * Frederike Neuber, DiXiT - Ph.D. Fellow on „Digital Palaeography and scholarly editions“ at Graz University / Austrian Centre for Digital Humanities
 * Felix Lange, Academy of Sciences and Literature | Mainz, Project IBR (http://www.spatialhumanities.de)
 * Emmanuelle Morlock, CNRS, HISoMA Laboratory (Histoire et Sources des mondes antiques / History and Origins of the Antique World)
 * Elli Bleeker, PhD student in Digital Humanities at Antwerp University; research fellow at DiXiT
 * Magdalena Turska, University of Oxford
 * Nick Laiacona, Performant Software Solutions LLC www.performantsoftware.com
 * Elizabeth Maddock Dillon, Professor of English and Co-Director of NULab for Texts, Maps, and Networks, Northeastern University, Boston, MA USA
 * Thomas Kollatz, Steinheim-Institute for German-Jewish History: epidat
 * Elena Spadini, Huygens Ing


 * Sebastian Rahtz, University of Oxford
 * James Cummings, University of Oxford
 * Alex Czmiel, Berlin-Brandenburg Academy of Sciences and Humanities
 * Hugh Cayless, Duke University
 * Elli Mylonas, Brown University

Procedure
We will be using the wiki as a place for comments and discussion. The projects each participant proposed are all listed below. Ideally, the hackathon will be focussed around one or two projects that are useful to everyone. The participants are all experienced with TEI in various capacities, but they are not all skilled programmers. The hackathon would be a success if the outcome of the one-day event was a good start on one or two useful pieces of TEI-related software. In order to achieve this, we should collectively decide on projects that are interesting, useful, generalizable, and do-able! We should also try to capitalize on everyone's skill set, and try to plan the event in such a way that we can all contribute to something.

You are all welcome to introduce yourselves, by adding a description of a few sentences to the list of participants above. And please, edit, correct and comment on the projects. We are also inviting the TEI Council and Board and other interested parties to look in on the discussions.

Mormon City Planning:
When early Mormonism ventured from Kirtland, Ohio, into Missouri, its prophet Joseph Smith Jr provided a revelation for the plat (city layout) of the new Zion (http://urbanplanning.library.cornell.edu/DOCS/smith.htm; http://zomarah.files.wordpress.com/2010/10/firstplatofzion.png). This layout was subsequently applied in the settlement of Far West, Missouri, for which two plats exist: one on sheepskin (https://www.lds.org/bc/content/shared/content/images/gospel-library/manual/32502/15-02 b.gif) in private possession, and one on paper at BYU (there is no online copy of this, but I have obtained a digital copy from BYU with their permission). However, these plats were used not only to sketch out the city but also to assign lots to settlers, and thus show secondary markup in pencil to allocate houses, redraw lot boundaries, etc. (http://zomarah.files.wordpress.com/2010/10/platoffarwestbig.png)

Comments:
 * A clarification: What would the project strive to accomplish? Is this a visualization? georeferencing based on markup?
 * Thomas Kollatz 11:57, 1 July 2014 (CEST) https://www.lds.org/bc/content/shared/content/images/gospel-library/manual/32502/15-02 error 404 - not found; maybe visualization: text-information encoded in TEI related to city layout ("plat") in SVG ?

ODD Customization visualizer
A web-based tool to visualize any ODD customization against TEI-all. This could be useful when working on a customization (with Roma, or manually) to quickly and visually check that the ODD is still TEI-conformant and see how it diverges from the standard. I started working on a basic D3 visualization a couple of years ago, but haven't really touched the code since: https://github.com/raffazizzi/ODDViz I only restructured the repository a bit before sending this proposal.
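Before anything can be drawn, the customization has to be reduced to data. The following is a minimal sketch of that first step, not code from the ODDViz repository: it lists the modules an ODD references and groups its `elementSpec` declarations by `mode`. The sample `schemaSpec` and the shape of the returned dictionary are assumptions for illustration, and the comparison against TEI-all itself is not attempted here.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical customization, trimmed to the schemaSpec only.
SAMPLE_ODD = """<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myTEI" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core" except="hi"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="p" module="core" mode="change"/>
  <elementSpec ident="said" module="core" mode="delete"/>
</schemaSpec>"""


def summarize_odd(odd_xml):
    """Collect referenced modules, excluded elements and changed elements."""
    root = ET.fromstring(odd_xml)
    summary = {"modules": [], "excluded": {}, "changes": {}}
    for ref in root.iter(TEI + "moduleRef"):
        key = ref.get("key")
        if key is None:  # e.g. a moduleRef pointing at an external schema
            continue
        summary["modules"].append(key)
        if ref.get("except"):
            # @except holds a whitespace-separated list of element names
            summary["excluded"][key] = ref.get("except").split()
    for spec in root.iter(TEI + "elementSpec"):
        mode = spec.get("mode", "add")  # "add" is the default for @mode
        summary["changes"].setdefault(mode, []).append(spec.get("ident"))
    return summary


if __name__ == "__main__":
    print(summarize_odd(SAMPLE_ODD))
```

A summary like this could be serialized to JSON and fed to a D3 layout, with deleted and changed elements styled differently from untouched ones.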

Comments

ACE-based Web editor
MITH, at the University of Maryland, has been working on an ACE-based web editor able to validate tei-all files in the browser and provide ODD-based contextual help (e.g. suggesting valid options when entering a new element). The grant that funded this work is now over, but there's plenty more to do. This is the GitHub repository: https://github.com/umd-mith/angles

Comments:

MS Description Display Framework
The aim is to create a simple web interface providing basic functionality such as browsing, searching and displaying TEI files containing manuscript descriptions. The interface could be built using an Ubuntu server with nginx and eXist-db, following the setup guide and scripts provided by Grant Macken. I have already done some preliminary work on an eXist-db web interface and posted the code in our GitHub repository. This could be used as a starting point for further development. A possible goal for the hackathon could be to create an advanced search form in XQuery which would display snippets of the descriptions in the search results, using the transform function with our XSL stylesheet, with links to the full description. All of the code, with detailed documentation, would be made public on GitHub for others to reuse and modify for their own projects. I think this would be beneficial to the TEI community at large, and especially to other cataloguing projects.

Comments:


 * I believe this would be a really useful project, especially if we build it not only with this particular purpose of MS descriptions in mind, but as something more general, so that other users could swap only the source files and stylesheets and have a basic working website. Magdalena Turska 11:00, 23 June 2014 (CEST)
 * me too – more general features almost every project will need after swapping source files and stylesheets: navigation to next page/object; or navigation to next entry/object in chronological order; full-text search everywhere or within particular divs (type="edition"), etc. Thomas Kollatz 12:09, 1 July 2014 (CEST)
 * I am definitely open to creating a more general display framework in eXist. Perhaps we could make use of Joe Wicentowski’s Punch eXist tutorial from DHOxSS 2011. Patrik Granholm 11:16, 2 July 2014 (CEST)

Medieval Text Edition
The project isn't advanced enough to suggest a concrete task, but the topic of interest is a digital scholarly edition of a medieval text (with corresponding images) encoded in XML/TEI. The edition will be enriched with palaeographical and codicological information (in both and ), and the results can be visualized in a way that hopefully goes beyond the sometimes not very enlightening listing of data.

Comments: Even now my project is not advanced enough to be used for the hackathon. I will join the discussions on the other projects. --Frederike Neuber 22:05, 24 June 2014 (CEST)

Semantic Connections for Epigraphic Documents
I am currently working with EpiDoc documents in the context of the Project IBR. These documents, epigraphical editions from the catalogue "German Inscriptions Online" (inschriften.net), are (1) to be transformed into RDF triples for semantic connection and for a fine-grained quantificational analysis in a triple store, and (2) to be transformed via XSLT into HTML documents for further annotation in the semantic annotator "Pundit" (thepund.it). I could surely contribute a programming task based on these points, but would also be happy to participate in another task, preferably based on "adding a TEI mode to a web editor".

Comments:

--Frederike Neuber 23:02, 24 June 2014 (CEST) Who is responsible for this project and can provide further information? I would be very interested in joining this project even if I can't offer material and even if I do not understand one step: how can you transform an epidoc document into RDF triples? Do you mean you generate RDF from the annotated entities (extracting them?)? And how do you do it, are there tools or software to support it?

Thomas Kollatz 12:12, 1 July 2014 (CEST) The OxGarage magic box may already contain a tool converting TEI2RDF. If so, we should try it out and evaluate it; if not, let's develop it …

Zotero
Some TEI encoding projects may need to use a "master bibliography" to group together all the bibliographic references that are used in a given text or collection of texts. Using the "masterfile" option in Oxygen gives the user a convenient way of pointing to a reference without having to encode a ref each time it is used in a text or in a specific bibliographic section.

The bibliography can be inserted in an element with XInclude. But the question is how to use Zotero to create this file, update it and synchronize it with the bibliographic masterfile. Though incomplete, the workflow I use works like this:
 * the bibliographic entries are entered with Zotero, in a group library
 * the Zotero database is then exported as XML using the "bibliontology_rdf" format (the TEI export format being too restrictive in its formatting choices: e.g. it doesn't include "short titles", which are a requisite in epigraphy)
 * the bibliontology RDF is converted to TEI through XSLT in Oxygen (modifying an existing XSLT: https://github.com/paregorios/Zotero-RDF-to-TEI-XML)
 * If the user wants to add a new reference to the master bibliography or correct an entry, he or she has to go to Zotero and follow the whole export-transform process. But the user might not have the rights to do so. And of course, there are some potential conflicting issues.

How could this workflow be improved? By using a version control system like git or Subversion? Or by adding/modifying an entry in XML/TEI and then updating the Zotero group library via the API?

Comments: --Elli Bleeker 15:29, 1 July 2014 (CEST) It seems that working with Subversion would improve the workflow a lot and at least eliminate the risk of conflicting issues. Of course users should have the right to work in the (separate) XML file of the bibliographic masterfile and add/change references. Once completed, the XML file can be exported to Zotero. In short, this would mean a reverse of the current workflow.

Migrating value lists from CSS frameworks to ODD
The author mode of the Oxygen editor can be customized with custom CSS functions. Apart from the functions providing a more user-friendly and tag-free visualization of the TEI content, one of the functions allows the user to edit attribute values or simple element values using combo boxes or check boxes. The "form controls" can display values collected from an XML schema. But these values can also be defined just in the Oxygen CSS. This may be used in a workflow to test some choices before integrating them in a more consistent and persistent way in the ODD (and then exporting them to the schema).

This could be useful if you have users who may be OK with changing the values in the CSS code but would be reluctant to get involved with the ODD editing / schema generation process. cf. http://www.oxygenxml.com/doc/ug-editor/concepts/combo-box-editor.html#combo-box-editor

Comments:

--Raffaele.viglianti 19:20, 16 June 2014 (CEST) This is interesting, would like to work on it. I would suggest making it less tied to Oxygen and thinking in terms of CSS to ODD, for example to limit attribute values (e.g. hi[rend=italic])

Elena Spadini 00:13, 24 June 2014 (CEST) I'm not very experienced with the author mode in oXygen and its customizations, but I would like to explore it by working on this project.
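To make the CSS-to-ODD idea from the comments concrete, here is a minimal sketch that scans a stylesheet for attribute selectors such as hi[rend=italic] and emits an `elementSpec` with a closed `valList` for each element. It is only an illustration: the regular expression handles simple `element[attribute=value]` selectors and nothing else, the output omits the TEI namespace and the `module` attribute, and the function name `css_to_odd` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches simple selectors like hi[rend=italic] or hi[rend="bold"].
SELECTOR = re.compile(
    r'([A-Za-z_][\w.-]*)\[([A-Za-z_][\w.:-]*)=["\']?([^"\'\]]+)["\']?\]'
)


def css_to_odd(css_text):
    """Return one serialized elementSpec per element found in the CSS."""
    # element name -> attribute name -> ordered list of distinct values
    values = defaultdict(lambda: defaultdict(list))
    for element, attribute, value in SELECTOR.findall(css_text):
        if value not in values[element][attribute]:
            values[element][attribute].append(value)

    specs = []
    for element, attributes in values.items():
        spec = ET.Element("elementSpec", ident=element, mode="change")
        att_list = ET.SubElement(spec, "attList")
        for attribute, allowed in attributes.items():
            att_def = ET.SubElement(att_list, "attDef", ident=attribute, mode="change")
            # A closed list restricts the attribute to exactly these values.
            val_list = ET.SubElement(att_def, "valList", type="closed", mode="replace")
            for value in allowed:
                ET.SubElement(val_list, "valItem", ident=value)
        specs.append(ET.tostring(spec, encoding="unicode"))
    return specs


if __name__ == "__main__":
    css = 'hi[rend=italic] { font-style: italic; }\nhi[rend="bold"] { font-weight: bold; }'
    for spec in css_to_odd(css):
        print(spec)
```

Because the input is plain CSS, this approach is not tied to Oxygen; any stylesheet that already distinguishes attribute values could seed a first draft of the value lists.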

Visualization of intertextual TEI content
[UPDATED on June 24th]

The focus is on the possibilities offered by TEI XML for encoding intertextuality. The case study concerns the personal library of an author and a digital edition of his work. The objective is to encode the material in such a way that the intertextual relations between the literary work and its external sources can be visualized.

These (sometimes subtle) intertextual relationships provide insight into the nature of writing, all the more since the genesis of the literary work itself is already encoded in the edition (i.e. adds, dels, etc.). The envisioned result enables a user to study the writing process in detail as well as on a larger scale.

Material:
 * detailed XML TEI transcription of the literary work concerned
 * high quality digital facsimiles of the author's personal extant library
 * rough transcriptions of the library books (based on OCR)

Comments:

--Elli Bleeker 15:46, 1 July 2014 (CEST) The idea is to log the complete "path" of a citation: from a phrase underlined in a library book to the incorporation of that phrase in the author's work. Currently, the intertextual references are encoded with a [ref] tag in the XML TEI transcription of the literary work. The [ref]s refer to another XML file containing the transcriptions of the personal library. This file consists of [div]s that contain anything from a complete library book to a small section (paragraph, phrase) of a library book.
 * Clarification: is the project about the visualization? the relationships? More detail would be great.

The visualization of the intertextual relations comes in a later stage when transforming the documents. I wonder whether it is possible to change or improve this encoding.

Rendering Complex Markup
Rendering complex markup in an innovative and playful way: reconstructing and visualizing an author’s geographical position from the sending dates of his letters (taking uncertainty into account)

Comments:

Magdalena Turska 12:36, 12 June 2014 (CEST)

Starting with a list of letters that have a place and date of sending, the idea would be to present a map overview of the author's journeys. See the simple Google Map at https://mapsengine.google.com/map/edit?mid=z3q4AefiR7Us.kf1fVBHlu8qE

First ideas re visualisation:
 * places color-coded with color getting darker to represent 'later' places
 * similarly shaded lines along the routes between places
 * animation with slider to show circle moving along the routes, size of the circle getting bigger with the uncertainty

Input data: each letter has an assigned place name and one or more time intervals (notBefore to notAfter)
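The input data and the visualisation ideas above can be sketched as a small data model. This is only an illustration under several assumptions: each letter carries a single interval (the description allows more than one), the place names and dates are made up, and the output keys (`shade`, `uncertainty_days`) are hypothetical names for values a map layer could use for colour and circle size.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Letter:
    place: str
    not_before: date  # earliest possible sending date
    not_after: date   # latest possible sending date

    @property
    def uncertainty_days(self):
        """Width of the dating interval; 0 means an exact date."""
        return (self.not_after - self.not_before).days


def itinerary(letters):
    """Order letters chronologically and attach display hints.

    'shade' runs from 0.0 (earliest stop) to 1.0 (latest stop), so a
    renderer can darken the colour of later places; 'uncertainty_days'
    can drive the size of the circle.
    """
    ordered = sorted(letters, key=lambda l: (l.not_before, l.not_after))
    last_index = max(len(ordered) - 1, 1)  # avoid division by zero
    return [
        {
            "place": letter.place,
            "earliest": letter.not_before.isoformat(),
            "latest": letter.not_after.isoformat(),
            "uncertainty_days": letter.uncertainty_days,
            "shade": round(index / last_index, 2),
        }
        for index, letter in enumerate(ordered)
    ]


if __name__ == "__main__":
    # Made-up sample data, not taken from any real correspondence.
    sample = [
        Letter("Place B", date(1530, 5, 1), date(1530, 5, 31)),
        Letter("Place A", date(1530, 1, 10), date(1530, 1, 10)),
        Letter("Place C", date(1530, 9, 1), date(1530, 9, 8)),
    ]
    for stop in itinerary(sample):
        print(stop)
```

Sorting on the earliest possible date is one choice among several; sorting on the interval midpoint would give a different order when intervals overlap, which is exactly the kind of uncertainty the visualisation would need to expose.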

--Frederike Neuber 22:25, 24 June 2014 (CEST) Would it be a problem to work on different material? @TEI-experts I would like to visualize the provenance of medieval manuscripts on a map, so even if I do not work with letters, the task is partly the same (extracting date and time from the  and visualizing it on a map). Then our scopes separate: Magdalena wants to reconstruct the journey, I want to visualize the chronology of different mss on one map.

Thomas Kollatz 15:56, 1 July 2014 (CEST) I agree with Frederike Neuber: the task is the same in all projects where a spatio-temporal visualization makes sense: extract date and time from the header, and visualize the chronology of date/time and place on a map and/or timeline. The relevant path: /TEI/teiHeader/fileDesc/sourceDesc/msDesc/history/origin
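The extraction step described in the comments could look roughly like the following sketch, which reads the origin information at the path given above. It assumes that the dating sits in an `origDate` element with `when` or `notBefore`/`notAfter` attributes and the place in `origPlace`; real manuscript descriptions may encode this differently. The sample is trimmed and is not a valid TEI document.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# /TEI/teiHeader/fileDesc/sourceDesc/msDesc/history/origin, relative to the root
ORIGIN_PATH = (
    "tei:teiHeader/tei:fileDesc/tei:sourceDesc/"
    "tei:msDesc/tei:history/tei:origin"
)

# Hypothetical, trimmed sample record.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc><history>
    <origin>
      <origDate notBefore="1200" notAfter="1250">first half of the 13th century</origDate>
      <origPlace>Constantinople</origPlace>
    </origin>
  </history></msDesc></sourceDesc></fileDesc></teiHeader>
</TEI>"""


def origin_of(tei_xml):
    """Return place and dating attributes of the origin, or None if absent."""
    root = ET.fromstring(tei_xml)
    origin = root.find(ORIGIN_PATH, NS)
    if origin is None:
        return None
    orig_date = origin.find("tei:origDate", NS)
    orig_place = origin.find("tei:origPlace", NS)

    def date_attr(name):
        return orig_date.get(name) if orig_date is not None else None

    return {
        "place": "".join(orig_place.itertext()).strip() if orig_place is not None else None,
        "when": date_attr("when"),
        "notBefore": date_attr("notBefore"),
        "notAfter": date_attr("notAfter"),
    }


if __name__ == "__main__":
    print(origin_of(SAMPLE))
```

Run over a folder of records, the resulting dictionaries give one point per manuscript or letter, which is the common input both the journey and the provenance visualizations would need.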

Adding a TEI mode to a web editor
Title says it all!

Comments:

--Raffaele.viglianti 19:20, 16 June 2014 (CEST) How does the Angles proposal above sound?

Juxta Script support
Adding support for Bengali text to Juxta Commons

Comments: Thomas Kollatz 15:57, 1 July 2014 (CEST) If Bengali then Hebrew (Arabic … right-to-left), please
 * This is fairly narrow.

Visualization
The archive of early Caribbean texts and images is a fairly large project which will be heavily encoded (and a portion of texts will be encoded by July). I'd like to explore ways of using the TEI to visualize relations between and among texts and elements of texts.

Comments:

--Elli Bleeker 16:19, 1 July 2014 (CEST) What is the status of this project? I am also interested in finding ways to visualize relations between texts, perhaps there are similarities?

Jewish Sepulchral Headstones DB
Given that all headstones can be clearly located, that about 20,000 (of a total of 26,000) inscriptions are dated, and that to a large extent they can be distinguished by gender and also by language usage (Hebrew, German, German in Hebrew letters, …), the epidat corpus lends itself to mining for specific data facets and to visualizing the results "in an innovative and playful way".
 * 1) A challenging research question could be the search for and visualization of differences in word usage and specific idiomatic expressions
 * 2) between different locations,
 * 3) within different periods (based on one or several locations),
 * 4) between inscriptions for men and inscriptions for women,
 * 5) between Hebrew and non-Hebrew inscriptions …
 * 6) Even more challenging is the search for inscriptions with very similar text coverage. These are quite difficult to discover by a simple full-text search. The inscriptions are usually rather short, and the filter to be developed has to account for the differentiating elements (usually names and dates) in order to determine whether two texts are identical or not.
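For the last point, one possible filter is to mask the differentiating elements before comparing texts. The sketch below does this with Python's standard `difflib`; it is an illustration only, not a method used by epidat. It assumes the names and dates of each record are already known (in practice they would come from the markup), the record layout and the 0.9 threshold are arbitrary choices, and apart from the name "Schmuel ben Jehuda" and its date the sample texts are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mask(text, variable_parts):
    """Replace names and dates by '#' and normalize whitespace."""
    # Longest parts first, so a short name inside a longer one is not hit early.
    for part in sorted(variable_parts, key=len, reverse=True):
        text = text.replace(part, "#")
    return " ".join(text.split())


def similar_pairs(records, threshold=0.9):
    """Return (id_a, id_b, ratio) for every pair at or above the threshold."""
    masked = {rid: mask(r["text"], r["variable"]) for rid, r in records.items()}
    pairs = []
    for a, b in combinations(sorted(masked), 2):
        ratio = SequenceMatcher(None, masked[a], masked[b]).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


if __name__ == "__main__":
    # Invented inscription texts for illustration.
    records = {
        "r1": {"text": "Here rests Schmuel ben Jehuda, died 1621-08-17",
               "variable": ["Schmuel ben Jehuda", "1621-08-17"]},
        "r2": {"text": "Here rests Mosche ben Aharon, died 1702-03-02",
               "variable": ["Mosche ben Aharon", "1702-03-02"]},
        "r3": {"text": "A woman of valour, crown of her husband",
               "variable": []},
    }
    print(similar_pairs(records))
```

Comparing every pair is quadratic in the number of records, so for a corpus of 26,000 inscriptions the candidates would have to be narrowed first, for example by location or period.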

All necessary information to answer these questions is contained in the metadata and data of each single record.
 * example – spatial and temporal metadata (date, country/region code, geo-coordinates, Getty Thesaurus of Geographic Names or OSM ID): 1621-08-17  Germany Hamburg  Hamburg-Altona, Königstraße  Jüdischer Friedhof 53.549373 9.950545
 * example – gender-specific metadata (given according to ISO 5218:2004): <persName>Schmuel ben Jehuda</persName> <event when='1621-08-17' type="dateofdeath"> </listPerson> </particDesc>
 * example – language usage metadata: <langUsage> <language ident='he' usage='100'>Hebrew </langUsage>

For the TEI Hackathon I have (just) set up a website with very general information on how to harvest epidat records: http://www.steinheim-institut.de/cgi-bin/epidat?info=howtoharvest (to be continued …). Beyond that, it would not be difficult to provide a zip file with the data for the workshop.

Comments: Thomas Kollatz 16:05, 1 July 2014 (CEST) The TEI spam filter seems to dislike the person-attribute s_x (middle-letter e) … rather prudish, isn't it

Discussion

What commonalities do you see? What seems interesting? Anything to add?

--Frederike Neuber 23:21, 24 June 2014 (CEST) Two (even if very different) projects, "Jewish Sepulchral Headstones DB" and "Rendering complex mark up", are dealing with visualization of spatial and temporal metadata.

Three (again very different) projects, "Visualization of intertextual TEI content", "Semantic Connections for Epigraphic Documents" and "Jewish Sepulchral Headstones DB", are dealing with semantic connection and modelling of content (even if not all of them say it explicitly in the description).

--Elli Bleeker 16:25, 1 July 2014 (CEST) Add to that the visualization of textual relations in the early Caribbean texts archive.

Core Builder
Another possible tool to work on: https://github.com/raffazizzi/coreBuilder It provides a simple web interface to create stand-off markup. TEI files can be opened in multiple ACE editors, and the user can click on elements with xml:ids to create references. The current version creates elements containing s with pointers to the selected elements. It should be easy enough to make the elements user-configurable so that the Core Builder can be used to put together <linkGrp>s or s. The tool is written in CoffeeScript with the Backbone framework. --Raffaele.viglianti 19:07, 16 June 2014 (CEST)

--Elli Bleeker 16:23, 1 July 2014 (CEST) From the sound of it, I think I would like to work on this project. Some more detail would be great.

Here's a demo of the current version. It's got a few bugs but should show the basic functionality. Pick a couple of sources, then click on any element with xml:id to create a selection. Click add to save it to the "core". https://dl.dropboxusercontent.com/u/2443674/coreBuilder/index.html --Raffaele.viglianti 02:31, 3 July 2014 (CEST)