From TEIWiki




TEITOK (the Tokenized TEI Environment) is a web-based platform for viewing, creating, and editing corpora that combine rich textual mark-up with linguistic annotation, preferably in TEI format. It is developed at the Centro de Linguística da Universidade de Lisboa.


  • Build a corpus where each text consists of a TEI/XML file
  • Annotate and edit each text in the corpus
    • Use a variety of scripts to provide automatic annotations
    • Edit manually using a simple graphical interface
  • Search the corpus using CQP
    • Search results are shown as XML fragments rather than raw CQP output
    • Statistical data about the results can be rendered as graphs
    • Edit XML documents directly from the CQP results
  • Visualize each TEI/XML file individually
    • Various visualization options depending on the content of the file
  • Plot the XML documents on a world map (OpenStreetMap)
    • Show search results directly on the map
  • Align the XML transcription with a facsimile image
    • Visualize each manuscript line above its transcription
    • Get facsimile images of words from a CQP search
    • Transcribe directly from the facsimile image to TEI/XML
  • Work with dependency relations in TEI/XML
    • Searchable using a modified version of CQP (TT-CQP)
    • Create word sketches from the corpus
  • Align the XML transcription with an audio file
    • Get audio fragments directly from a CQP search
    • Visualize the audio as a waveform
    • Transcribe directly from the audio file to TEI/XML
  • Use stand-off annotation alongside the TEI/XML files
    • Visualization and editing of stand-off annotations, inspired by Brat
    • Search the stand-off annotations directly in CQP
  • Work with interlinear glossed texts

User commentary

System requirements

Server-based software that runs on most Linux servers.

Source code and licensing

The source code is available from GitLab and can be used free of charge.

Support for TEI

In principle, TEITOK works with generic XML files and can therefore handle most flavours of TEI/XML. Its more advanced features, however, assume that the XML follows TEI P5.


  • Interface written in PHP and JavaScript
  • Scripts written in Perl and C++
  • Multilingual interface with customizable internationalization
  • Documentation in English


[1] [2]

Publications: [3]

Tech support

For tech support, TEITOK has both a Google Groups mailing list and a Facebook page.

User community

Sample implementations

Examples of projects using TEITOK can be found on the project website: http://www.teitok.org/. Here are some highlighted projects:

  • Postscriptum - a corpus of handwritten letters in Portuguese and Spanish
  • ODE - a historical corpus of mediaeval Spanish
  • MADISON - a dialectal corpus of Portuguese
  • CroLTeC - a learner corpus of Croatian

Current version number and date of release

TEITOK is updated frequently; the current version is 2.3 (August 2018).

History of versions

A complete version changelog can be found on GitLab.

How to download or buy

TEITOK is currently a private project on GitLab. Anyone interested in using TEITOK should create a GitLab account and send the account details to the author, Maarten Janssen, who can then add them as a user of the project.

Additional notes
