DH2014Hackathon-Projects

= DH2014 Hackathon Project Discussion Page =

Participants

 * Robert C Kahlert, Department of Protestant Theology, University of Vienna
 * Raffaele Viglianti, University of Maryland
 * Patrik Granholm, Uppsala University Library: Greek Manuscripts in Sweden. A Digitization and Cataloguing Project (http://www.manuscripta.se)
 * Frederike Neuber, DiXiT - Ph.D. Fellow on „Digital Palaeography and scholarly editions“ at Graz University / Austrian Centre for Digital Humanities
 * Felix Lange, Academy of Sciences and Literature | Mainz, Project IBR (http://www.spatialhumanities.de)
 * Emmanuelle Morlock, CNRS, HISoMA Laboratory (Histoire et Sources des mondes antiques / History and Origins of the Antique World)
 * Elli Bleeker, PhD student in Digital Humanities at Antwerp University; research fellow at DiXiT
 * Magdalena Turska, University of Oxford
 * Nick Laiacona, Performant Software Solutions LLC www.performantsoftware.com
 * Elizabeth Maddock Dillon, Professor of English and Co-Director of NULab for Texts, Maps, and Networks, Northeastern University, Boston, MA USA
 * Thomas Kollatz, Steinheim-Institute for German-Jewish History: epidat
 * Elena Spadini, Huygens Ing


 * Sebastian Rahtz, University of Oxford
 * James Cummings, University of Oxford
 * Alex Czmiel,
 * Hugh Cayless, Duke University
 * Elli Mylonas, Brown University

Procedure
We will be using the wiki as a place for comments and discussion. The projects each participant proposed are all listed below. Ideally, the hackathon will be focussed around one or two projects that are useful to everyone. The participants are all experienced with TEI in various capacities, but they are not all skilled programmers. The hackathon would be a success if the outcome of the one-day event was a good start on one or two useful pieces of TEI-related software. In order to achieve this, we should collectively decide on projects that are interesting, useful, generalizable, and do-able! We should also try to capitalize on everyone's skill set, and plan the event in such a way that we can all contribute to something.

You are all welcome to introduce yourselves, by adding a description of a few sentences to the list of participants above. And please, edit, correct and comment on the projects. We are also inviting the TEI Council and Board and other interested parties to look in on the discussions.

Mormon City Planning:
When early Mormonism ventured from Kirtland, Ohio, into Missouri, its prophet Joseph Smith Jr. provided a revelation for the plat (city layout) of the new Zion (http://urbanplanning.library.cornell.edu/DOCS/smith.htm; http://zomarah.files.wordpress.com/2010/10/firstplatofzion.png). This layout was subsequently applied in the settlement of Far West, Missouri, for which two plats exist: one on sheepskin, in private possession (https://www.lds.org/bc/content/shared/content/images/gospel-library/manual/32502/15-02 b.gif), and one on paper at BYU (there is no online copy of this, but I have obtained a digital copy from BYU with their permission). These plats were used not only to sketch out the city but also to assign lots to settlers, and thus show secondary markup in pencil to allocate houses, redraw lot boundaries, etc. (http://zomarah.files.wordpress.com/2010/10/platoffarwestbig.png)

Comments:
 * A clarification: What would the project strive to accomplish? Is this a visualization? georeferencing based on markup?

ODD Customization visualizer
A web-based tool to visualize any ODD customization against TEI-all. This could be useful when working on a customization (with Roma, or manually) to quickly and visually check that the ODD is still TEI-conformant and see how it diverges from the standard. I started working on a basic D3 visualization a couple of years ago, but haven't really touched the code since: https://github.com/raffazizzi/ODDViz (I only restructured the repository a bit before sending this proposal.)
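
As a rough illustration of the comparison such a visualizer would need, the Python sketch below lists the modules an ODD pulls in and the elements it deletes or changes. The sample ODD fragment and the function name `summarize_odd` are made up for illustration; this is not code from the ODDViz repository, and it ignores other ODD features such as `include` attributes and class or attribute changes.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical ODD fragment, for illustration only.
SAMPLE_ODD = """<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myTEI" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core" except="hi q"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="p" mode="change"/>
  <elementSpec ident="said" mode="delete"/>
</schemaSpec>"""


def summarize_odd(odd_xml):
    """Collect module keys plus deleted and changed element names."""
    root = ET.fromstring(odd_xml)
    summary = {"modules": [], "deleted": [], "changed": []}
    for ref in root.iter(f"{{{TEI_NS}}}moduleRef"):
        key = ref.get("key")
        if key is None:
            continue  # moduleRef may also point to an external schema via url
        summary["modules"].append(key)
        # Elements listed in @except are dropped from that module.
        summary["deleted"].extend(ref.get("except", "").split())
    for spec in root.iter(f"{{{TEI_NS}}}elementSpec"):
        mode = spec.get("mode")
        if mode == "delete":
            summary["deleted"].append(spec.get("ident"))
        elif mode == "change":
            summary["changed"].append(spec.get("ident"))
    return summary


if __name__ == "__main__":
    print(summarize_odd(SAMPLE_ODD))
```

A summary of this kind could then be handed to a D3 layout, with deleted and changed elements highlighted against the full TEI-all element list.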

Comments

ACE-based Web editor
MITH, at the University of Maryland, has been working on an ACE-based web editor able to validate tei-all files in the browser and provide ODD-based contextual help (e.g. suggesting valid options when entering a new element). The grant that funded this work is now over, but there's plenty more to do. This is the GitHub repository: https://github.com/umd-mith/angles

Comments:

MS Description Display Framework
The aim is to create a simple web interface providing basic functionality such as browsing, searching and displaying TEI files containing manuscript descriptions. The interface could be built using an Ubuntu server with nginx (http://nginx.org) and eXist-db (http://exist-db.org), following the setup guide and scripts provided by Grant Macken (https://github.com/grantmacken/nginx-eXist-ubuntu). I have already done some preliminary work on an eXist-db web interface and posted the code in our GitHub repository (https://github.com/manuscripta). This could be used as a starting point for further development. A possible goal for the hackathon could be to create an advanced search form in XQuery which would display snippets of the descriptions in the search results, using the transform function with our XSL stylesheet, with links to the full description. All of the code, with detailed documentation, would be made public on GitHub for others to reuse and modify for their own projects. I think this would be beneficial to the TEI community at large, and especially to other cataloguing projects.

Comments:

I believe this would be a really useful project, especially if we build it not only with this particular purpose of MS descriptions in mind, but as something more general, so that other users could swap in their own source files and stylesheets and have a basic working website. Magdalena Turska 11:00, 23 June 2014 (CEST)

Medieval Text Edition
The project isn't advanced enough to suggest a concrete task, but the topic of interest is a digital scholarly edition of a medieval text (with corresponding images) encoded in XML/TEI. The edition will be enriched with palaeographical and codicological information (in both and ) and the results can be visualized in a way which hopefully goes beyond the sometimes not very enlightening listing of data.

Comments: Even now my project is not advanced enough to be used for the hackathon. I will join the discussions on the other projects. --Frederike Neuber 22:05, 24 June 2014 (CEST)

Semantic Connections for Epigraphic Documents
I am currently working with EpiDoc documents in the context of the Project IBR. These documents, epigraphical editions from the catalogue "German Inscriptions Online" (inschriften.net), are (1) to be transformed into RDF triples for semantic connection and for a fine-grained quantitative analysis in a triple store, and (2) to be transformed with XSLT into HTML documents for further annotation in the semantic annotator "Pundit" (thepund.it). I could surely contribute a programming task based on these points, but would also be happy to participate in another task, preferably one based on "adding a TEI mode to a web editor".

Comments:

--Frederike Neuber 23:02, 24 June 2014 (CEST) Who is responsible for this project and can provide further information? I would be very interested in joining this project even if I can't offer material and even if I do not understand one step: how can you transform an epidoc document into RDF triples? Do you mean you generate RDF from the annotated entities (extracting them?)? And how do you do it, are there tools or software to support it?
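
To make the question above concrete, here is one possible reading, sketched in Python: walk the document, pick out tagged entities that carry a `ref` pointer, and emit one statement per entity. This is only an assumption about what such a transformation could look like, not the IBR project's actual pipeline. The sample document, the `example.org` URIs and the `mentions` predicate are all invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MENTIONS = "http://example.org/vocab/mentions"  # hypothetical predicate

# Hypothetical, heavily simplified EpiDoc-like document.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="insc001">
  <text><body><div type="edition"><ab>
    <persName ref="http://example.org/person/1">Johannes</persName>
    <placeName ref="http://example.org/place/7">Mainz</placeName>
  </ab></div></body></text>
</TEI>"""


def literal(text):
    """Serialize a plain string as an N-Triples literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def tei_to_ntriples(tei_xml, base="http://example.org/inscription/"):
    """Emit N-Triples lines for every persName/placeName with a @ref."""
    root = ET.fromstring(tei_xml)
    subject = base + root.get(XML_ID)
    triples = []
    for tag in ("persName", "placeName"):
        for el in root.iter(f"{{{TEI_NS}}}{tag}"):
            ref = el.get("ref")
            if not ref:
                continue  # untagged or unlinked names yield no statement
            label = " ".join("".join(el.itertext()).split())
            triples.append(f"<{subject}> <{MENTIONS}> <{ref}> .")
            triples.append(f"<{ref}> <{RDFS_LABEL}> {literal(label)} .")
    return triples


if __name__ == "__main__":
    print("\n".join(tei_to_ntriples(SAMPLE)))
```

The resulting lines could be loaded into any triple store that accepts N-Triples. A real project would more likely use XSLT or an RDF library and an established vocabulary.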

Zotero
Some TEI encoding projects may need to use a "master bibliography" to group together all the bibliographic references that are used in a given text or collection of texts. Using the "masterfile" option in Oxygen gives the user a convenient way of pointing to a reference without having to encode a ref each time it is used in a text or in a specific bibliographic section.

The bibliography can be inserted in a element with xinclude. But the question is how to use Zotero to create this file, update it and synchronize it with the bibliographic masterfile. Though incomplete, the workflow I use works like this:
 * the bibliographic entries are entered in Zotero, in a group library
 * the Zotero database is then exported to XML using the "bibliontology_rdf" format (the TEI export format being too restrictive in its formatting choices: e.g. it doesn't include "short titles", which are a requisite in epigraphy)
 * the bibliontology RDF is converted to TEI through XSLT in Oxygen (modifying an existing XSLT: https://github.com/paregorios/Zotero-RDF-to-TEI-XML)
 * If the user wants to add a new reference to the master bibliography or correct an entry, he or she has to go to Zotero and repeat the whole export and transform workflow. But the user might not have the rights to do so. And of course, there is potential for conflicts.

How can this workflow be improved? By using a version control system like git or subversion? Or by adding/modifying an entry in XML/TEI and then updating the Zotero group library via the API?
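
Whichever route is chosen, the synchronization step needs some way of telling which entries differ between the hand-edited TEI masterfile and a fresh Zotero export. The Python sketch below is one hedged way to do that, assuming every entry carries an `xml:id`. The two sample bibliographies and the function names are invented for illustration, and nothing here calls the Zotero API.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical masterfile and fresh export, for illustration only.
MASTER = """<listBibl xmlns="http://www.tei-c.org/ns/1.0">
  <bibl xml:id="smith1990"><author>Smith</author> <title>Inscriptions</title> <date>1990</date></bibl>
  <bibl xml:id="jones2001"><author>Jones</author> <title>Epigraphy</title> <date>2001</date></bibl>
</listBibl>"""

EXPORT = """<listBibl xmlns="http://www.tei-c.org/ns/1.0">
  <bibl xml:id="smith1990"><author>Smith</author> <title>Inscriptions</title> <date>1991</date></bibl>
  <bibl xml:id="doe2010"><author>Doe</author> <title>Stones</title> <date>2010</date></bibl>
</listBibl>"""


def index_entries(tei_xml):
    """Map each xml:id to the whitespace-normalized text of its entry."""
    root = ET.fromstring(tei_xml)
    entries = {}
    for tag in ("bibl", "biblStruct"):
        for el in root.iter(f"{{{TEI_NS}}}{tag}"):
            key = el.get(XML_ID)
            if key:
                entries[key] = " ".join("".join(el.itertext()).split())
    return entries


def diff_bibliographies(old_xml, new_xml):
    """Report ids that were added, removed, or whose text changed.

    Only text content is compared; attribute-only edits go unnoticed.
    """
    old, new = index_entries(old_xml), index_entries(new_xml)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    print(diff_bibliographies(MASTER, EXPORT))
```

The "changed" list is where conflicts would have to be resolved by hand, or pushed back to the group library if the API route is taken.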

Comments:

Migrating value lists from CSS frameworks to ODD
The author mode of the Oxygen editor can be customized with custom CSS functions. Apart from the functions providing a more user-friendly, tag-free visualization of the TEI content, one of the functions allows the user to edit attribute values or simple element values using combo boxes or check boxes. These "form controls" can display values collected from an XML schema, but the values can also be defined only in the Oxygen CSS. This may be used in a workflow to test some choices before integrating them in a more consistent and persistent way in the ODD (and then exporting them to the schema).

This could be useful if you have users who are happy to change the values in the CSS code but would be reluctant to get involved with the ODD editing / schema generation process. cf. http://www.oxygenxml.com/doc/ug-editor/concepts/combo-box-editor.html#combo-box-editor

Comments:

--Raffaele.viglianti 19:20, 16 June 2014 (CEST) This is interesting, would like to work on it. I would suggest to make it less tied to Oxygen and think in terms of CSS to ODD, for example to limit attribute values (e.g. hi[rend=italic])

Elena Spadini 00:13, 24 June 2014 (CEST) I'm not very experienced with the author mode in oXygen and its customizations, but I would like to explore it by working on this project.
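
The editor-independent variant suggested above (plain CSS to ODD, e.g. `hi[rend=italic]`) could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: it only understands simple `element[attribute=value]` selectors, it does not read Oxygen's form-control functions, and the generated `elementSpec` fragments are left without a namespace, whereas in a real ODD they would sit inside a `schemaSpec` in the TEI namespace. The function name `css_to_odd` and the sample CSS are invented.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Matches simple selectors such as hi[rend=italic] or hi[rend="bold"].
SELECTOR = re.compile(
    r'([A-Za-z][\w.-]*)\[\s*([\w:.-]+)\s*=\s*["\']?([^"\'\]\s]+)["\']?\s*\]'
)

SAMPLE_CSS = """
hi[rend=italic] { font-style: italic; }
hi[rend="bold"] { font-weight: bold; }
hi[rend=italic]:before { content: ""; }
"""


def css_to_odd(css):
    """Turn attribute selectors into elementSpec fragments with closed value lists."""
    values = defaultdict(list)
    for element, attribute, value in SELECTOR.findall(css):
        if value not in values[(element, attribute)]:
            values[(element, attribute)].append(value)

    specs = []
    for (element, attribute), idents in values.items():
        spec = ET.Element("elementSpec", ident=element, mode="change")
        att_list = ET.SubElement(spec, "attList")
        att_def = ET.SubElement(att_list, "attDef", ident=attribute, mode="change")
        val_list = ET.SubElement(att_def, "valList", type="closed", mode="replace")
        for ident in idents:
            ET.SubElement(val_list, "valItem", ident=ident)
        specs.append(ET.tostring(spec, encoding="unicode"))
    return specs


if __name__ == "__main__":
    for fragment in css_to_odd(SAMPLE_CSS):
        print(fragment)
```

Grouping by element and attribute keeps one `attDef` per attribute, so several CSS rules for the same attribute end up in a single closed value list.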

Visualization of intertextual TEI content
[UPDATED on June 24th]

The focus is on the possibilities offered by TEI XML for encoding intertextuality. The case study concerns the personal library of an author and a digital edition of his work. The objective is to encode the material in such a way that the intertextual relations between the literary work and its external sources are visualized.

These -sometimes subtle- intertextual relationships provide insight into the nature of writing, all the more since the genesis of the literary work itself is already encoded in the edition (i.e. adds, dels, etc.). The envisioned result enables a user to study the writing process in detail as well as on a larger scale.

Material:
 * detailed XML TEI transcription of the literary work concerned
 * high quality digital facsimiles of the author's personal extant library
 * rough transcriptions of the library books (based on OCR)

Comments:


 * Clarification: is the project about the visualization? the relationships? More detail would be great.

Rendering Complex Markup
Rendering complex markup in an innovative and playful way: reconstructing & visualizing author’s geographical position by the dates of sending of his letters (taking uncertainty into account)

Comments:

Magdalena Turska 12:36, 12 June 2014 (CEST)

Starting with a list of letters that have a place and date of sending, the idea would be to present a map overview of the author's journeys. See the simple Google Map at https://mapsengine.google.com/map/edit?mid=z3q4AefiR7Us.kf1fVBHlu8qE

First ideas re visualisation:
 * places color-coded with color getting darker to represent 'later' places
 * similarly shaded lines along the routes between places
 * animation with slider to show circle moving along the routes, size of the circle getting bigger with the uncertainty

Input data: each letter has an assigned place name and one or more time intervals (notBefore to notAfter)
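
The ideas above can be prototyped independently of any mapping library. The Python sketch below orders letters by the midpoint of their dating interval and derives a shade (0.0 = earliest, 1.0 = latest) and an uncertainty in days for each stop. The letter data, the class and function names, and the choice of the interval midpoint as the ordering key are all assumptions made for illustration. For simplicity each letter has a single interval here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Letter:
    place: str
    not_before: date
    not_after: date

    @property
    def uncertainty_days(self):
        """Width of the dating interval in days (0 = exactly dated)."""
        return (self.not_after - self.not_before).days

    @property
    def midpoint(self):
        """Middle of the interval, used as a stand-in for the sending date."""
        return self.not_before + (self.not_after - self.not_before) / 2


# Hypothetical letters, for illustration only.
LETTERS = [
    Letter("Vienna", date(1530, 3, 1), date(1530, 3, 1)),
    Letter("Rome", date(1530, 1, 1), date(1530, 1, 11)),
    Letter("Venice", date(1530, 2, 1), date(1530, 2, 21)),
]


def itinerary(letters):
    """Return chronologically ordered stops with shade and uncertainty."""
    ordered = sorted(letters, key=lambda letter: letter.midpoint)
    last = len(ordered) - 1
    stops = []
    for i, letter in enumerate(ordered):
        stops.append({
            "place": letter.place,
            "date": letter.midpoint,
            "shade": i / last if last else 0.0,  # darker = later
            "uncertainty_days": letter.uncertainty_days,  # drives circle size
        })
    return stops


if __name__ == "__main__":
    for stop in itinerary(LETTERS):
        print(stop)
```

The `shade` value could drive the colour of both markers and route segments, and `uncertainty_days` the radius of the animated circle.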

--Frederike Neuber 22:25, 24 June 2014 (CEST) Would it be a problem to work on different material? @TEI-experts I would like to visualize the provenance of medieval manuscripts on a map, so even if I do not work with letters, the task is partly the same (extracting date and time from the  and visualizing it on a map). Then our scopes separate: Magdalena wants to reconstruct the journey, I want to visualize the chronology of different mss on one map.

Adding a TEI mode to a web editor
Title says it all!

Comments:

--Raffaele.viglianti 19:20, 16 June 2014 (CEST) How does the Angles proposal above sound?

Juxta Script support
Adding support for Bengali text to Juxta Commons

Comments:
 * This is fairly narrow.

Visualization
The archive of early Caribbean texts and images is a fairly large project which will be heavily encoded (and a portion of texts will be encoded by July). I'd like to explore ways of using the TEI to visualize relations between and among texts and elements of texts.

Comments:

Jewish Sepulchral Headstones DB
Given that all headstones can be clearly located, that about 20,000 (of a total of 26,000) inscriptions are dated, and that they can to a large extent be distinguished by gender and by language usage (Hebrew, German, German in Hebrew letters, …), the epidat corpus lends itself to mining for specific data facets and to visualizing the results "in an innovative and playful way".

1. A challenging research question could be the search for and visualization of differences in word usage and specific idioms:
 * between different locations,
 * within different periods (based on one or several locations),
 * between inscriptions for men and inscriptions for women.

2. Even more challenging is the search for inscriptions with very similar text. These are quite difficult to discover with a simple full-text search. The inscriptions are usually rather short, and the filter to be developed has to take into account the differentiating elements (usually names and dates) in order to determine whether two texts are identical or not.
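
For the second question, one simple approach is to mask the variable parts of each inscription before comparing the remainder. The Python sketch below does this with the standard library's `difflib`. It assumes the names and dates of each record are already known (for instance because they are tagged in the TEI), and it uses invented English stand-in texts rather than real epidat inscriptions. The second name is made up.

```python
from difflib import SequenceMatcher


def mask(text, variable_parts, placeholder="_"):
    """Replace names, dates and other record-specific strings with a placeholder."""
    # Longest first, so that a short part never breaks up a longer one.
    for part in sorted(variable_parts, key=len, reverse=True):
        text = text.replace(part, placeholder)
    return " ".join(text.split())


def similarity(a, b):
    """Word-level similarity between two texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Invented stand-in texts, for illustration only.
TEXT_A = "Here lies Schmuel ben Jehuda who died on 17 August 1621"
PARTS_A = ["Schmuel ben Jehuda", "17 August 1621"]
TEXT_B = "Here lies Rivka bat Mosche who died on 3 May 1702"
PARTS_B = ["Rivka bat Mosche", "3 May 1702"]

if __name__ == "__main__":
    print("raw:   ", similarity(TEXT_A, TEXT_B))
    print("masked:", similarity(mask(TEXT_A, PARTS_A), mask(TEXT_B, PARTS_B)))
```

Comparing every pair of 26,000 inscriptions this way would be slow, so a real filter would probably group candidates first, for example by location or period.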

All the information needed to answer these questions is contained in the metadata and data of each single record.

Example: spatial and temporal metadata (date, country/region code, geo-coordinates, Getty Thesaurus of Geographic Names or OSM ID)

1621-08-17 Germany Hamburg Hamburg-Altona, Königstraße Jüdischer Friedhof 53.549373 9.950545

Example: gender-specific metadata (given according to ISO 5218:2004)

<persName>Schmuel ben Jehuda</persName> <event when='1621-08-17' type="dateofdeath"> </listPerson> </particDesc>

Example: language usage metadata

<langUsage> <language ident='he' usage='100'>Hebrew </langUsage>

For the TEI Hackathon I have (just) set up a web page with very general information on how to harvest epidat records: http://www.steinheim-institut.de/cgi-bin/epidat?info=howtoharvest (to be continued …) Beyond that, it would not be difficult to provide a zip file with the data for the workshop.

Comments:

Discussion
What commonalities do you see? What seems interesting? Anything to add?

--Frederike Neuber 23:21, 24 June 2014 (CEST) Two (even if very different) projects, "Jewish Sepulchral Headstones DB" and "Rendering complex mark up", are dealing with visualization of spatial and temporal metadata.

Three (again very different) projects, "Visualization of intertextual TEI content", "Semantic Connections for Epigraphic Documents" and "Jewish Sepulchral Headstones DB", are dealing with semantic connection and modelling of content (even if not all of them say it explicitly in the description).

--Raffaele.viglianti 19:07, 16 June 2014 (CEST)

Core Builder
Another possible tool to work on is Core Builder (https://github.com/raffazizzi/coreBuilder). It provides a simple web interface to create stand-off markup. TEI files can be opened in multiple ACE editors, and the user can click on elements with xml:ids to create references. The current version creates elements containing s with pointers to the selected elements. It should be easy enough to make the elements user-configurable so that the Core Builder can be used to put together <linkGrp>s or s. The tool is written in CoffeeScript with a Backbone framework.
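
To illustrate the kind of stand-off output described here, the Python sketch below builds a `<linkGrp>` from groups of selected elements, each identified by file name and `xml:id`. It is not taken from the Core Builder code base (which is CoffeeScript). The function name, the file names and the `type` value are assumptions, and the output is left without the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET


def build_link_grp(selections, link_type="correspondence"):
    """Build a linkGrp with one link per group of selected elements.

    `selections` is a list of groups; each group is a list of
    (document, xml_id) pairs picked by the user.
    """
    link_grp = ET.Element("linkGrp", type=link_type)
    for group in selections:
        # A link's @target holds space-separated pointers.
        target = " ".join(f"{doc}#{xml_id}" for doc, xml_id in group)
        ET.SubElement(link_grp, "link", target=target)
    return link_grp


# Hypothetical selections across two open files.
SELECTIONS = [
    [("witnessA.xml", "l1"), ("witnessB.xml", "l1")],
    [("witnessA.xml", "l2"), ("witnessB.xml", "l3")],
]

if __name__ == "__main__":
    print(ET.tostring(build_link_grp(SELECTIONS), encoding="unicode"))
```

Making the wrapper and child element names parameters of such a function is one way the output could become user-configurable.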