TXM

Synopsis
TXM is a cross-platform text corpora analysis environment and graphical client based on CQP and R, supporting Windows, Linux, Mac OS X (in alpha) and J2EE. The Textométrie scientific project web site is http://textometrie.ens-lyon.fr/?lang=en.

Features

 * Works on any collection of documents of various formats: TXT, XML, TEI P5 (BFM project way), Transcriber, TMX, PPS (Factiva), etc.
 * Applies various NLP tools to texts on the fly before analysis (e.g. TreeTagger for lemmatization and POS tagging)
 * Indexes words and their properties as well as the hierarchical structure of texts
 * Lets users build various subcorpora and partitions (for contrastive analysis between text structures or groups of words)
 * Provides qualitative analysis tools: various indexes and concordances of patterns based on word-level and structure-level queries, navigation in rich text editions, display of the layout of pattern occurrences
 * Provides quantitative analysis tools: factorial correspondence analysis, contrastive word specificities, hierarchical classification, co-occurrents of patterns
 * Exports any result in CSV, XML or SVG format
 * Runs as a standalone Windows, Mac OS X or Linux application
 * Scriptable (in Groovy/Java) to automate repetitive tasks or to extend the platform
 * Includes a text editor to edit the sources and scripts
 * Also runs as a portal web application to access and analyze corpora online through a web browser (with access control management)
 * Open source: based on the best open source components for text analysis: CQP, R and Java & XSL libraries
 * Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components is used by all the applications
 * Efficient Eclipse or Netbeans powered development framework
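
To illustrate what a concordance of a word looks like, here is a minimal keyword-in-context sketch in Python. It is not TXM code and does not use CQP: the `tokenize` and `concordance` functions, the sample sentence and the `width` parameter are all hypothetical, chosen only to show the idea of listing each occurrence with its left and right context.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a simplistic stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())

def concordance(tokens, keyword, width=3):
    """Return a (left context, keyword, right context) tuple for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Hypothetical sample text, not taken from any TXM corpus.
sample = "The king rode north. The queen stayed, and the king wrote to the queen."
tokens = tokenize(sample)
for left, kw, right in concordance(tokens, "king", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} | {kw} | {right}")
```

A real query engine such as CQP matches patterns over word properties and structures rather than a single literal word; the sketch only covers the simplest case.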

User commentary
Please sign all comments.

System requirements
The standalone version runs on:
 * Windows (tested on XP, Vista and Seven) - 32bit or 64bit
 * Mac OS X (tested on 10.5 and 10.6)
 * Linux (tested on Ubuntu and Debian) - 32bit or 64bit

The portal server should run on any JVM/J2EE-capable platform, but so far it has only been tested on Ubuntu Linux in Tomcat and Glassfish containers.

Source code and licensing
Open Source under GPL V3 licence.

Support for TEI
Supports TEI and TEI Lite "out of the box" at the XML level: words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.
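
The XML-level import described above can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the TXM import module (which is written in Groovy and XSL): the function name `tokenize_with_structure`, the regular-expression tokenizer and the namespace-free sample document are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def tokenize_with_structure(xml_string):
    """Return a list of (word, path) pairs, where path names the enclosing elements."""
    root = ET.fromstring(xml_string)
    results = []

    def walk(element, path):
        current = path + [element.tag]
        # Character data directly inside this element, before its first child.
        for word in re.findall(r"\w+", element.text or ""):
            results.append((word, "/".join(current)))
        for child in element:
            walk(child, current)
            # Character data that follows a child belongs to the parent element.
            for word in re.findall(r"\w+", child.tail or ""):
                results.append((word, "/".join(current)))

    walk(root, [])
    return results

# Hypothetical, namespace-free sample; real TEI documents declare the TEI namespace.
sample = "<text><p>Hello <hi>brave</hi> world</p><p>Bye</p></text>"
for word, path in tokenize_with_structure(sample):
    print(word, path)
```

Each word keeps a record of the elements that contain it, which is the kind of information needed to query words and text structures together.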

Supports the TEI P5 encoding semantics used by the Base de Français Médiéval (BFM) project (http://bfm.ens-lyon.fr) at the TEI level: words - #PCDATA, , ..., edition -, ..., structures - , ..., notes, etc. (see the "BFM encoding manual", in French: http://bfm.ens-lyon.fr/article.php3?id_article=158).

The "TEI P5 BFM" TXM import module is written entirely as a set of Groovy and XSL scripts, so that users can adapt it to any specific TEI encoding practice.

TXM Import Modules also provide various import parameters to tune each import process to specific data sources.

[Note: The Presses Universitaires de Caen (PUC) center has successfully tested the TXM import process on their own TEI text editions (July 2011).]

Language(s)
TXM is written in:
 * C for CQP search engine (independent open source project http://cwb.sourceforge.net)
 * C and R for statistical packages (independent open source project http://www.r-project.org)
 * Java for the Toolbox and the Applications (driven by an independent open source project http://jcp.org/en/home/index)
 * Using the Eclipse RCP framework for the standalone version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)
 * Using the GWT framework for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)
 * Groovy for the import modules and command scripts (independent open source project http://groovy.codehaus.org)

The user interface is currently available in:
 * standalone version:
    * English (EN)
    * French (FR)
 * portal version:
    * English (EN)
    * French (FR)

The documentation is currently available in:
 * standalone version:
    * English (EN)
    * French (FR)
 * portal version:
    * French (FR)

Documentation

 * Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&lang=en
 * Wiki of TXM users community (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)
 * Wiki of TXM developers (in English) on Sourceforge : http://sourceforge.net/apps/mediawiki/textometrie
 * All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/textometrie/files/documentation

Tech support
Tech support is mainly provided through two mailing lists (see below).

Users can also use 3 different trackers:
 * Bug Reports - to describe the bugs that you encounter in the software: https://sourceforge.net/tracker/?group_id=247041&atid=1190738
 * Feature requests - to describe the features, changes in interface or any other improvements you want to see in the software: https://sourceforge.net/tracker/?group_id=247041&atid=1190851
 * Request for help - to describe a very difficult technical problem that you encounter in using the software: https://sourceforge.net/tracker/?group_id=247041&atid=1190852

User community
Currently, the TXM user community is mostly active through two mailing lists and a wiki:
 * The international mailing list: textometrie-open AT lists.sourceforge.net (very low activity for the moment)
    * See the archive at http://sourceforge.net/mailarchive/forum.php?forum_name=textometrie-open
 * The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)
    * See the archives at https://listes.cru.fr/sympa/arc/txm-users
 * Wiki of TXM users community (in French) at https://listes.cru.fr/wiki/txm-users

TXM is also taught every year at the CNRS summer school called « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.

The JADT conference (http://jadt.org) is the main place where the TXM user community meets.

Sample implementations
The standalone version of TXM ships with several sample corpora that can be analyzed directly from within TXM after installation.

The portal version of TXM has a demo running online at http://txm.risc.cnrs.fr/test/?locale=en (work in progress).

An earlier experimental web application based on TXM, applied to one TEI-encoded text, can be found at http://txm.risc.cnrs.fr/txm/texte/quete.

Current version number and date of release

 * standalone: Current version is 0.5 released March 2011
 * portal: Current version is 0.3 beta 2 released July 2011

History of versions
See the Roadmap section on the developer's wiki at http://sourceforge.net/apps/mediawiki/textometrie.

How to download or buy

 * standalone: Go to http://sourceforge.net/projects/textometrie and click on the green Download button to download the setup for your architecture. [Note for Mac users: Running TXM on Mac is still experimental; please read the Mac setup FAQ entry at https://listes.cru.fr/wiki/txm-users/public/faq#comment_installer_txm_05_sur_mac_os_x (in French).]
 * portal: No WAR release yet. Please follow the instructions on the developer's wiki at http://sourceforge.net/apps/mediawiki/textometrie/index.php?title=Build_the_toolbox_or_the_application#TXM-WEB_:_GWT_web_application to install and run from sources.

Additional notes
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&lang=en.