<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.tei-c.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Sheiden</id>
	<title>TEIWiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.tei-c.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Sheiden"/>
	<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Special:Contributions/Sheiden"/>
	<updated>2026-04-21T13:04:28Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.32.0</generator>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16445</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16445"/>
		<updated>2018-12-06T16:15:42Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Synopsis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source, XML- and TEI-compatible framework for textual corpus analysis, with a graphical client, based on the CQP search engine and the R statistical software. It is available as [http://textometrie.ens-lyon.fr/spip.php?rubrique61 desktop software] for Microsoft Windows, Linux and Mac OS X, and as J2EE [https://sourceforge.net/projects/txm/files/software/TXM%20portal/ web portal software]&amp;lt;ref&amp;gt;See a [https://groupes.renater.fr/wiki/txm-users/public/references_portails list of public portals]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
=== Provides qualitative analysis tools ===&lt;br /&gt;
&lt;br /&gt;
* KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
* word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
* word pattern '''progression graphics'''&lt;br /&gt;
* Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (mainly verb forms)&lt;br /&gt;
** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where part-of-speech annotation is present)&lt;br /&gt;
** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
* rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
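&lt;br /&gt;
The last CQL example above can be read as a sequence of token constraints separated by a bounded gap. The following Python sketch only illustrates that reading on a toy annotated corpus; it is not how CQP evaluates queries. The token list, the property values and the helper names (token_matches, find_with_gap) are assumptions made for this illustration, and whole-value regular expression matching is assumed.&lt;br /&gt;
&lt;br /&gt;
```python
import re

# Toy annotated corpus: one dict per token (all values are illustrative).
TOKENS = [
    {"word": "The", "lemma": "the", "pos": "DET"},
    {"word": "groups", "lemma": "group", "pos": "NOUN"},
    {"word": "were", "lemma": "be", "pos": "VERB"},
    {"word": "quietly", "lemma": "quietly", "pos": "ADV"},
    {"word": "aiming", "lemma": "aim", "pos": "VERB"},
    {"word": "high", "lemma": "high", "pos": "ADV"},
]


def token_matches(token, constraints):
    """True if every listed property fully matches its regular expression."""
    return all(re.fullmatch(rx, token[prop]) for prop, rx in constraints.items())


def find_with_gap(tokens, first, last, max_gap):
    """Return (start, end) index pairs where a token matching `first` is
    followed by a token matching `last` with at most `max_gap` tokens between."""
    hits = []
    for i, tok in enumerate(tokens):
        if not token_matches(tok, first):
            continue
        # Candidate end positions run from i + 1 (gap 0) to i + 1 + max_gap.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if token_matches(tokens[j], last):
                hits.append((i, j))
    return hits


# Hypothetical equivalent of: group lemma, up to 3 tokens, verb ending in "ing".
hits = find_with_gap(
    TOKENS,
    {"lemma": "group"},
    {"pos": "VERB", "word": ".*ing"},
    max_gap=3,
)
for start, end in hits:
    print(" ".join(t["word"] for t in TOKENS[start:end + 1]))
```
&lt;br /&gt;
Unlike a real CQP query, the sketch returns every start/end pair within the gap limit and ignores structural properties.&lt;br /&gt;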
&lt;br /&gt;
=== Provides quantitative analysis tools, based on [http://www.r-project.org R packages] ===&lt;br /&gt;
* '''factorial correspondence analysis'''&lt;br /&gt;
* '''cluster analysis'''&lt;br /&gt;
* '''specific''' word patterns analysis&lt;br /&gt;
* '''collocations''' analysis&lt;br /&gt;
&lt;br /&gt;
=== Helps to build various corpus configurations ===&lt;br /&gt;
(for contrastive analysis between text structures or word selections)&lt;br /&gt;
* '''sub-corpora'''&lt;br /&gt;
* '''partitions'''&lt;br /&gt;
&lt;br /&gt;
=== Large spectrum of input formats ===&lt;br /&gt;
* several text formats (from raw to rich):&lt;br /&gt;
** '''Unicode TXT'''&lt;br /&gt;
** '''ODT'''&lt;br /&gt;
** '''XML'''&lt;br /&gt;
** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
* speech transcription: XML-'''TRS''' (from the Transcriber software, with time synchronization)&lt;br /&gt;
* aligned corpora: XML-'''TMX''' (texts related by translation or versioning)&lt;br /&gt;
* news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
=== And more ===&lt;br /&gt;
* Applies various NLP tools to texts on the fly before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities to automate repetitive or lengthy tasks or to extend the platform (in '''Groovy''', a dynamic language for the Java platform)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that gives online access to corpora and their analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
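&lt;br /&gt;
The idea of an import at XML level can be sketched in Python as follows. This is only an illustration under stated assumptions, not TXM's import code (which is written in Groovy and XSL): the tree is a made-up example, words are naively taken to be runs of word characters, and each element is recorded as a span of token positions. The function name import_tree is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import re
import xml.etree.ElementTree as ET

# Build a tiny TEI-like tree programmatically (element names are only examples).
text = ET.Element("text")
div = ET.SubElement(text, "div", {"type": "chapter"})
p1 = ET.SubElement(div, "p")
p1.text = "Words are tokenized "
hi = ET.SubElement(p1, "hi")
hi.text = "inside"
hi.tail = " any element."
p2 = ET.SubElement(div, "p")
p2.text = "Structure is kept."


def import_tree(root):
    """Return (tokens, structures): tokens is a flat list of words, structures
    is a list of (element name, first token index, end token index) spans."""
    tokens, structures = [], []

    def walk(elem):
        start = len(tokens)
        # Naive tokenization (an assumption): runs of word characters.
        tokens.extend(re.findall(r"\w+", elem.text or ""))
        for child in elem:
            walk(child)
            # Text following a child still belongs to the current element.
            tokens.extend(re.findall(r"\w+", child.tail or ""))
        structures.append((elem.tag, start, len(tokens)))

    walk(root)
    return tokens, structures


tokens, structures = import_tree(text)
print(tokens)
for name, start, end in structures:
    print(name, start, end)
```
&lt;br /&gt;
Every element becomes a structure regardless of its TEI meaning, which is the difference from the TEI-level import.&lt;br /&gt;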
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: the best results have been reported with the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for the statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) is published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your issue has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback in one of the following ways:&lt;br /&gt;
#* on the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Würzburg 2014]&lt;br /&gt;
* In summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013]&lt;br /&gt;
&lt;br /&gt;
The biennial JADT international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (Jan 2007 - Dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* The platform currently continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract Jun-Aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract Sept 2009 - Jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract Jan-Mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract Jan 2012 - Dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of Ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project May-Jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract Jun-Jul 2012: Tiger Search module, import &amp;amp; syntactic concordances&lt;br /&gt;
&lt;br /&gt;
== Notes ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16444</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16444"/>
		<updated>2018-12-06T16:00:04Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Synopsis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source, XML- and TEI-compatible framework for textual corpus analysis, with a graphical client, based on the CQP search engine and the R statistical software. It is available as [http://textometrie.ens-lyon.fr/spip.php?rubrique61 desktop software] for Microsoft Windows, Linux and Mac OS X, and as J2EE web portal software&amp;lt;ref&amp;gt;See a [https://groupes.renater.fr/wiki/txm-users/public/references_portails list of public portals]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
=== Provides qualitative analysis tools ===&lt;br /&gt;
&lt;br /&gt;
* KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
* word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
* word pattern '''progression graphics'''&lt;br /&gt;
* Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (mainly verb forms)&lt;br /&gt;
** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where part-of-speech annotation is present)&lt;br /&gt;
** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
* rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
&lt;br /&gt;
=== Provides quantitative analysis tools, based on [http://www.r-project.org R packages] ===&lt;br /&gt;
* '''factorial correspondence analysis'''&lt;br /&gt;
* '''cluster analysis'''&lt;br /&gt;
* '''specific''' word patterns analysis&lt;br /&gt;
* '''collocations''' analysis&lt;br /&gt;
&lt;br /&gt;
=== Helps to build various corpus configurations ===&lt;br /&gt;
(for contrastive analysis between text structures or word selections)&lt;br /&gt;
* '''sub-corpora'''&lt;br /&gt;
* '''partitions'''&lt;br /&gt;
&lt;br /&gt;
=== Large spectrum of input formats ===&lt;br /&gt;
* several text formats (from raw to rich):&lt;br /&gt;
** '''Unicode TXT'''&lt;br /&gt;
** '''ODT'''&lt;br /&gt;
** '''XML'''&lt;br /&gt;
** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
* speech transcription: XML-'''TRS''' (from the Transcriber software, with time synchronization)&lt;br /&gt;
* aligned corpora: XML-'''TMX''' (texts related by translation or versioning)&lt;br /&gt;
* news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
=== And more ===&lt;br /&gt;
* Applies various NLP tools to texts on the fly before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities to automate repetitive or lengthy tasks or to extend the platform (in '''Groovy''', a dynamic language for the Java platform)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that gives online access to corpora and their analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: the best results have been reported with the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for the statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) is published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your issue has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback in one of the following ways:&lt;br /&gt;
#* on the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The biennial JADT international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;br /&gt;
&lt;br /&gt;
== Notes ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16443</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16443"/>
		<updated>2018-12-06T15:59:16Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Additional notes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source XML &amp;amp; TEI compatible textual corpus analysis framework and graphical client based on the CQP search engine and the R statistical software. It is available as a [http://textometrie.ens-lyon.fr/spip.php?rubrique61 desktop software] for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal software&amp;lt;ref&amp;gt;See a [https://groupes.renater.fr/wiki/txm-users/public/references_portails list of public portals]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
=== Provides qualitative analysis tools ===&lt;br /&gt;
&lt;br /&gt;
* KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
* word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
* word pattern '''progression graphics'''&lt;br /&gt;
* Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
* rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
&lt;br /&gt;
=== Provides quantitative analysis tools, based on [http://www.r-project.org R packages] ===&lt;br /&gt;
* '''factorial correspondence analysis'''&lt;br /&gt;
* '''cluster analysis'''&lt;br /&gt;
* '''specific''' word patterns analysis&lt;br /&gt;
* '''collocations''' analysis&lt;br /&gt;
&lt;br /&gt;
=== Helps to build various corpus configurations ===&lt;br /&gt;
(for contrastive analysis between text structures or word selections)&lt;br /&gt;
* '''sub-corpora'''&lt;br /&gt;
* '''partitions'''&lt;br /&gt;
&lt;br /&gt;
=== Large spectrum of input formats ===&lt;br /&gt;
* several text formats (from raw to rich):&lt;br /&gt;
** '''Unicode TXT'''&lt;br /&gt;
** '''ODT'''&lt;br /&gt;
** '''XML'''&lt;br /&gt;
** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
* speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
* aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
* news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
=== And more ===&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' to give online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (classical Latin, old Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) is published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* Mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The biennial JADT international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;br /&gt;
&lt;br /&gt;
== Notes ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16442</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16442"/>
		<updated>2018-12-06T15:58:51Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Additional notes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source XML &amp;amp; TEI compatible textual corpus analysis framework and graphical client based on the CQP search engine and the R statistical software. It is available as a [http://textometrie.ens-lyon.fr/spip.php?rubrique61 desktop software] for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal software&amp;lt;ref&amp;gt;See a [https://groupes.renater.fr/wiki/txm-users/public/references_portails list of public portals]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
=== Provides qualitative analysis tools ===&lt;br /&gt;
&lt;br /&gt;
* KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
* word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
* word pattern '''progression graphics'''&lt;br /&gt;
* Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
* rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
&lt;br /&gt;
=== Provides quantitative analysis tools, based on [http://www.r-project.org R packages] ===&lt;br /&gt;
* '''factorial correspondence analysis'''&lt;br /&gt;
* '''cluster analysis'''&lt;br /&gt;
* '''specific''' word patterns analysis&lt;br /&gt;
* '''collocations''' analysis&lt;br /&gt;
&lt;br /&gt;
=== Helps to build various corpus configurations ===&lt;br /&gt;
(for contrastive analysis between text structures or word selections)&lt;br /&gt;
* '''sub-corpora'''&lt;br /&gt;
* '''partitions'''&lt;br /&gt;
&lt;br /&gt;
=== Large spectrum of input formats ===&lt;br /&gt;
* several text formats (from raw to rich):&lt;br /&gt;
** '''Unicode TXT'''&lt;br /&gt;
** '''ODT'''&lt;br /&gt;
** '''XML'''&lt;br /&gt;
** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
* speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
* aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
* news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
=== And more ===&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' to give online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (classical Latin, old Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The JADT biennial international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient latin and greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012: Tiger Search module, import &amp;amp; syntactic concordances&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16441</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16441"/>
		<updated>2018-12-06T15:58:00Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Synopsis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source XML &amp;amp; TEI compatible textual corpus analysis framework and graphical client based on the CQP search engine and the R statistical software. It is available as a [http://textometrie.ens-lyon.fr/spip.php?rubrique61 desktop software] for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal software&amp;lt;ref&amp;gt;See a [https://groupes.renater.fr/wiki/txm-users/public/references_portails list of public portals]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
=== Provides qualitative analysis tools ===&lt;br /&gt;
&lt;br /&gt;
* KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
* word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
* word pattern '''progression graphics'''&lt;br /&gt;
* Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
* rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
&lt;br /&gt;
=== Provides quantitative analysis tools, based on [http://www.r-project.org R packages] ===&lt;br /&gt;
* '''factorial correspondence analysis'''&lt;br /&gt;
* '''cluster analysis'''&lt;br /&gt;
* '''specific''' word patterns analysis&lt;br /&gt;
* '''collocations''' analysis&lt;br /&gt;
&lt;br /&gt;
=== Helps to build various corpus configurations ===&lt;br /&gt;
(for contrastive analysis between text structures or word selections)&lt;br /&gt;
* '''sub-corpora'''&lt;br /&gt;
* '''partitions'''&lt;br /&gt;
&lt;br /&gt;
=== Large spectrum of input formats ===&lt;br /&gt;
* several text formats (from raw to rich):&lt;br /&gt;
** '''Unicode TXT'''&lt;br /&gt;
** '''ODT'''&lt;br /&gt;
** '''XML'''&lt;br /&gt;
** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
* speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
* aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
* news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
=== And more ===&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities (in the '''Groovy'''/Java dynamic language) for automating repetitive or lengthy tasks or for extending the platform&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and their analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The JADT biennial international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient latin and greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012: Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16440</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16440"/>
		<updated>2018-12-06T15:54:13Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Features */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source XML &amp;amp; TEI compatible textual corpus analysis framework and graphical client based on the CQP search engine and the R statistical software. It is available as a desktop software for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal software.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
=== Provides qualitative analysis tools ===&lt;br /&gt;
&lt;br /&gt;
* KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
* word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
* word pattern '''progression graphics'''&lt;br /&gt;
* Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
* rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
&lt;br /&gt;
=== Provides quantitative analysis tools, based on [http://www.r-project.org R packages] ===&lt;br /&gt;
* '''factorial correspondence analysis'''&lt;br /&gt;
* '''cluster analysis'''&lt;br /&gt;
* '''specific''' word patterns analysis&lt;br /&gt;
* '''collocations''' analysis&lt;br /&gt;
&lt;br /&gt;
=== Helps to build various corpus configurations ===&lt;br /&gt;
(for contrastive analysis between text structures or word selections)&lt;br /&gt;
* '''sub-corpora'''&lt;br /&gt;
* '''partitions'''&lt;br /&gt;
&lt;br /&gt;
=== Large spectrum of input formats ===&lt;br /&gt;
* several text formats (from raw to rich):&lt;br /&gt;
** '''Unicode TXT'''&lt;br /&gt;
** '''ODT'''&lt;br /&gt;
** '''XML'''&lt;br /&gt;
** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
* speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
* aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
* news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
=== And more ===&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities (in the '''Groovy'''/Java dynamic language) for automating repetitive or lengthy tasks or for extending the platform&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and their analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Old Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Würzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The JADT biennial international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of Ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16439</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16439"/>
		<updated>2018-12-06T15:50:06Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Synopsis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source XML &amp;amp; TEI compatible textual corpus analysis framework and graphical client based on the CQP search engine and the R statistical software. It is available as desktop software for Microsoft Windows, Linux and Mac OS X, and as J2EE web portal software.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where part-of-speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities to automate repetitive or lengthy tasks or to extend the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Old Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Würzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The JADT biennial international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of Ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16438</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16438"/>
		<updated>2018-12-06T15:47:48Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Synopsis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free and open-source Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux and Mac OS X, and as J2EE web portal software.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where part-of-speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities to automate repetitive or lengthy tasks or to extend the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The JADT biennial international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16437</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16437"/>
		<updated>2018-12-06T15:46:17Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Synopsis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
[http://textometrie.ens-lyon.fr/?lang=en TXM] is a free, open-source, Unicode-, XML- &amp;amp; TEI-compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools to texts on the fly before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that provides online access to and analysis of corpora through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The JADT biennial international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16436</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=16436"/>
		<updated>2018-12-06T14:34:08Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* How to download or buy */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode-, XML- &amp;amp; TEI-compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities to automate repetitive or lengthy tasks or to extend the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that provides online corpus access and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
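&lt;br /&gt;
The last CQL example in the list above (a given lemma followed, within at most 3 words, by a verb form ending in "ing") can be approximated outside TXM to see what such a pattern selects. The following is a minimal Python sketch under stated assumptions, not CQP's implementation: the token list, the property names and the token_matches and find_collocations helpers are all hypothetical, and CQP's own matching strategy may differ in detail.&lt;br /&gt;

```python
import re

# Hypothetical annotated tokens: each has a word form, a lemma and a pos tag.
TOKENS = [
    {"word": "The", "lemma": "the", "pos": "DET"},
    {"word": "groups", "lemma": "group", "pos": "NOUN"},
    {"word": "were", "lemma": "be", "pos": "VERB"},
    {"word": "quietly", "lemma": "quietly", "pos": "ADV"},
    {"word": "aiming", "lemma": "aim", "pos": "VERB"},
    {"word": "high", "lemma": "high", "pos": "ADV"},
]


def token_matches(token, constraints):
    """True when every property regex matches the whole property value."""
    return all(re.fullmatch(rx, token[prop]) for prop, rx in constraints.items())


def find_collocations(tokens, first, last, max_gap=3):
    """Yield (start, end) index pairs where a token matching `first` is
    followed by a token matching `last` with at most `max_gap` tokens between."""
    for i, tok in enumerate(tokens):
        if not token_matches(tok, first):
            continue
        # Candidate end positions run from i + 1 (gap 0) to i + 1 + max_gap.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if token_matches(tokens[j], last):
                yield (i, j)


hits = list(find_collocations(
    TOKENS,
    first={"lemma": "group"},
    last={"pos": "VERB", "word": ".*ing"},
))
for start, end in hits:
    print(" ".join(t["word"] for t in TOKENS[start:end + 1]))
```

Each dictionary of constraints plays the role of one bracketed CQL token expression, and the regular expressions are anchored to the whole property value, which is why "were" does not satisfy the ".*ing" constraint.&lt;br /&gt;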
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
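&lt;br /&gt;
The XML-level import described above (tokenizing character data while keeping the element hierarchy as textual structure) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not TXM's import code, which is written in Groovy and XSL: the sample tree, the simple regular-expression tokenizer and the build_sample, tokenize and index_words helpers are all hypothetical.&lt;br /&gt;

```python
import re
import xml.etree.ElementTree as ET


def build_sample():
    """Build a tiny hypothetical TEI-like tree without parsing any file."""
    text = ET.Element("text")
    p = ET.SubElement(text, "p")
    p.text = "The quest "
    hi = ET.SubElement(p, "hi")
    hi.text = "of the Grail"
    hi.tail = " begins."
    return text


def tokenize(chunk):
    """Split character data into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", chunk or "")


def index_words(element, path=()):
    """Yield (structure path, token) pairs in document order."""
    here = path + (element.tag,)
    for token in tokenize(element.text):
        yield here, token
    for child in element:
        yield from index_words(child, here)
        # Tail text follows the child but belongs to the current element.
        for token in tokenize(child.tail):
            yield here, token


rows = list(index_words(build_sample()))
for path, token in rows:
    print("/".join(path), token)
```

Every token is paired with the chain of elements that contains it, which is the kind of information a structural query or a partition on text structures would rely on.&lt;br /&gt;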
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: the best results have been reported with the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) is published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Würzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The JADT biennial international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://textometrie.ens-lyon.fr/spip.php?rubrique61&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (Jan 2007 - Dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract Jun-Aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract Sept 2009 - Jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract Jan-Mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract Jan 2012 - Dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project May-Jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract Jun-Jul 2012: Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15776</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15776"/>
		<updated>2017-04-02T09:43:12Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]] [[Category: TEI:P4]] [[Category: TEI:P5]] __NOTOC__&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://teibyexample.org TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licensed under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of French Revolution parliamentary debates from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], French classical plays (Corneille, Molière, Racine, etc.)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI: go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licensed under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/?command=documentation&amp;amp;path=/GRAAL '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by the TXM text search and statistical analysis platform. The complete manuscript transcriptions (XML-TEI P5 source encoded with Menota extensions) and an ODD customization file, as well as the stylesheets used to produce the HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/stuartyeates/sampler URLs to diverse TEI files] in terms of language, structure, linguistics, coding, tools in use, hosting method, etc.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Works in French and TEI] (Baudelaire, Hugo, Rimbaud, Verlaine, Balzac, Descartes, La Fayette, Sade, Saint-Simon, etc.)&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a dictionary of medieval Latin (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré], a classical French dictionary encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15562</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15562"/>
		<updated>2016-12-23T12:21:04Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]] [[Category: TEI:P4]] [[Category: TEI:P5]] __NOTOC__&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://teibyexample.org TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licensed under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of French Revolution parliamentary debates from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], French classical plays (Corneille, Molière, Racine, etc.)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System (APIS). Approx 145,000 XML files released under a Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, with some documents under a Creative Commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by TXM text search and statistical analysis platform. The complete source XML-TEI P5 encoded with Menota extensions manuscript transcriptions and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/stuartyeates/sampler URLs to diverse TEI files] in terms of language, structure, linguistics, coding, tools in use, hosting method, etc.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Works in French and TEI] (Baudelaire, Hugo, Rimbaud, Verlaine, Balzac, Descartes, La Fayette, Sade, Saint-Simon, etc.)&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15561</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15561"/>
		<updated>2016-12-23T11:18:13Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: dramacode and oeuvres update&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]] [[Category: TEI:P4]] [[Category: TEI:P5]] __NOTOC__&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://teibyexample.org TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of French Revolution parliamentary debates from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], French plays (Corneille, Molière, Racine, etc.)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System (APIS). Approx 145,000 XML files released under a Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, with some documents under a Creative Commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by TXM text search and statistical analysis platform. The complete source XML-TEI P5 encoded with Menota extensions manuscript transcriptions and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/stuartyeates/sampler URLs to diverse TEI files] in terms of language, structure, linguistics, coding, tools in use, hosting method, etc.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Works in French and TEI] (Baudelaire, Hugo, Rimbaud, Verlaine, Balzac, Descartes, La Fayette, Sade, Saint-Simon, etc.)&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15560</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15560"/>
		<updated>2016-12-23T10:58:22Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: minor edit of French Revolution&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]] [[Category: TEI:P4]] [[Category: TEI:P5]] __NOTOC__&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://teibyexample.org TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of French Revolution parliamentary debates from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], open-access theatre texts&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approximately 145,000 XML files released under a Creative Commons Attribution license (CC-BY) by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''')  is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are proposed in html, pdf and xml/tei on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by TXM text search and statistical analysis platform. The complete source XML-TEI P5 encoded with Menota extensions manuscript transcriptions and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/stuartyeates/sampler URLs to diverse TEI files] in terms of language, structure, linguistics, coding, tools in use, hosting method, etc.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/ and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15559</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15559"/>
		<updated>2016-12-23T10:55:02Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: updated front paragraph lines&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]] [[Category: TEI:P4]] [[Category: TEI:P5]] __NOTOC__&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners should you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://teibyexample.org TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to the version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization)--the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates during the French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], textes de théâtre en libre accès&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approximately 145,000 XML files released under a Creative Commons Attribution license (CC-BY) by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''')  is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are proposed in html, pdf and xml/tei on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by TXM text search and statistical analysis platform. The complete source XML-TEI P5 encoded with Menota extensions manuscript transcriptions and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/stuartyeates/sampler URLs to diverse TEI files] in terms of language, structure, linguistics, coding, tools in use, hosting method, etc.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/ and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15558</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15558"/>
		<updated>2016-12-23T10:53:35Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: updated http://teibyexample.org&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners should you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://teibyexample.org TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to the version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization)--the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates during the French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], textes de théâtre en libre accès&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approximately 145,000 XML files released under a Creative Commons Attribution license (CC-BY) by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication.  It is a work on Buddhist tantric religion:   Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here].  Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition.  It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce the HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/stuartyeates/sampler URLs to diverse TEI files] in terms of language, structure, linguistics, coding, tools in use, hosting method, etc.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15557</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15557"/>
		<updated>2016-12-23T10:52:12Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: added https://github.com/stuartyeates/sampler&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners should you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to the version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization)--the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates during French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], open-access theatre texts&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication.  It is a work on Buddhist tantric religion:   Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here].  Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition.  It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce the HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/stuartyeates/sampler URLs to diverse TEI files] in terms of language, structure, linguistics, coding, tools in use, hosting method, etc.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15556</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15556"/>
		<updated>2016-12-21T15:15:40Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners should you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to the version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization)--the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates during French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], the catalogue, open-access theatre texts&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''')  is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are proposed in html, pdf and xml/tei on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by TXM text search and statistical analysis platform. The complete source XML-TEI P5 encoded with Menota extensions manuscript transcriptions and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries on free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better way out.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary, encoded in TEI-P5 and freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15555</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15555"/>
		<updated>2016-12-21T15:14:42Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute the text; please check with text owners should you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates of French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], le catalogue, textes de théâtre en libre accès&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''')  is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are proposed in html, pdf and xml/tei on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by TXM text search and statistical analysis platform. The complete source XML-TEI P5 encoded with Menota extensions manuscript transcriptions and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
* [https://textgrid.de/digitale-bibliothek '''TextGrid''' Digital Library] conversion from Zeno-XML-Markup to XML-TEI and additional markup&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries on free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better way out.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary, encoded in TEI-P5 and freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15554</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15554"/>
		<updated>2016-12-21T14:14:00Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute the text; please check with text owners should you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates of French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], le catalogue, textes de théâtre en libre accès&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts]) (32853 texts as of 2015-01-01)&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, with some documents under a Creative Commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three sample TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licensed under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation into modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, together with an ODD customization file, the stylesheets used to produce the HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/ and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15553</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15553"/>
		<updated>2016-12-21T14:11:36Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licensed under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates of the French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], le catalogue, textes de théâtre en libre accès&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts])&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, with some documents under a Creative Commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three sample TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licensed under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation into modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, together with an ODD customization file, the stylesheets used to produce the HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/ and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15552</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15552"/>
		<updated>2016-12-21T14:10:55Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licensed under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei '''FRDA'''] (French Revolution Digital Archive) TEI of full text for 82 volumes of parliamentary debates and deliberations of the French Revolution from 1787 to 1794&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], le catalogue, textes de théâtre en libre accès&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts])&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, with some documents under a Creative Commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''')  is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are proposed in html, pdf and xml/tei on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classic French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15551</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15551"/>
		<updated>2016-12-21T14:01:37Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei TEI of full text for 82 volumes of the Archives Parlementaires]&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://www.deutschestextarchiv.de/download '''DTA'''] Deutsches Textarchiv (2435 texts as of 2016-12-21)&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode], le catalogue, textes de théâtre en libre accès&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts])&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing; some documents are under a Creative Commons license (licensing info is not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''')  is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are proposed in html, pdf and xml/tei on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce HTML editions and a PDF printable version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classic French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15519</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15519"/>
		<updated>2016-12-19T16:35:05Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei TEI of full text for 82 volumes of the Archives Parlementaires]&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode, le catalogue, textes de théâtre en libre accès]&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts])&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI; go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing; some documents are under a Creative Commons license (licensing info is not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/oeuvres Œuvres en français et en TEI]&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on that screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation into modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI P5 and freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary, encoded in TEI P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/ and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15518</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15518"/>
		<updated>2016-12-19T16:32:28Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners before making any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei TEI of full text for 82 volumes of the Archives Parlementaires]&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [http://dramacode.github.io Dramacode, le catalogue, textes de théâtre en libre accès]&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership '''EEBO'''] collection Phase 1 TEI P5 XML versions of texts (Text Creation Partnership's Early English Books Online) ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts])&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML): aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System (APIS). Approx 145,000 XML files released under a Creative Commons Attribution license (CC-BY) by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI, go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing; some documents are under a Creative Commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons Attribution-NonCommercial-ShareAlike license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on that screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation into modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translating dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI P5 and freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary, encoded in TEI P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/ and [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself].&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15517</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15517"/>
		<updated>2016-12-19T16:26:14Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Texts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners before making any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate to version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licenced under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei TEI of full text for 82 volumes of the Archives Parlementaires]&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership Text Creation Partnership's Early English Books Online ('''EEBO''') collection Phase 1 TEI P5 XML versions of texts] ([https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv CSV file listing all the texts])&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML): aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System (APIS). Approx 145,000 XML files released under a Creative Commons Attribution license (CC-BY) by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI, go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing; some documents are under a Creative Commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons Attribution-NonCommercial-ShareAlike license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licenced under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on that screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation into modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15516</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15516"/>
		<updated>2016-12-19T16:23:23Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licensed under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei TEI of full text for 82 volumes of the Archives Parlementaires]&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact: [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/textcreationpartnership Text Creation Partnership's Early English Books Online (EEBO) collection Phase 1 TEI P5 XML versions of texts]&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approximately 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI, go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licensed under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation in modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions, encoded in XML-TEI P5 with Menota extensions, and an ODD customization file, as well as the stylesheets used to produce the HTML editions and a printable PDF version, are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15391</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15391"/>
		<updated>2016-11-03T21:40:29Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: added &amp;quot;key&amp;quot; tutorials to their link description and some bold&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__ __NOEDITSECTION__&lt;br /&gt;
This is a wiki devoted to the '''[http://www.tei-c.org/ Text Encoding Initiative (TEI)]'''. It is created by TEI-ers for TEI-ers, and if you wish to contribute something or join the discussions, you are most welcome – all you need to do is [[Special:Userlogin| login or register]]. Choose from the following:&lt;br /&gt;
&lt;br /&gt;
==== Technical matters: building and manipulating TEI XML ====&lt;br /&gt;
* [http://www.tei-c.org/Support/Learn/tutorials.xml#tut-gen XML, TEI by Example, What is the Text Encoding Initiative?, teiHeader '''tutorials''']&lt;br /&gt;
* [http://www.tei-c.org/Guidelines/P5/ TEI P5 '''Guidelines''']: [http://www.tei-c.org/Vault/P5/ current and archived past releases] in the [http://www.tei-c.org/Vault/ Vault]&lt;br /&gt;
** [http://www.tei-c.org/Support/Learn/tutorials.xml#tut-glp Project-specific encoding guidelines]&lt;br /&gt;
* Browse examples of markup or code submitted to this wiki:&lt;br /&gt;
** [[Samples|samples of TEI '''documents''']],&lt;br /&gt;
** [[:Category:Code|'''stylesheets''', '''scripts''', and other '''code''': XSLT, CSS, XQuery, Schematron, and more]].&lt;br /&gt;
* Read about [[:Category:Tools|useful '''software''' (editors, processors and such)]].&lt;br /&gt;
* Read about [[Roma]] and [[Vesta]], the software suites that produce TEI customizations, documentation and schemas, as well as [[ODD]], the language larger than TEI.&lt;br /&gt;
** [[:Category:Customization|customizations of TEI schemas (ODD files, DTD extensions)]],&lt;br /&gt;
* Find a [[Crosswalks|crosswalk]] (mapping) between a TEI header and another metadata format.&lt;br /&gt;
* See [[:Category:Projects|'''projects''' using TEI]] (in addition to [http://www.tei-c.org/Activities/Projects/ those listed on the TEI website]).&lt;br /&gt;
&lt;br /&gt;
==== TEI Community matters ====&lt;br /&gt;
* [[TEI Cheatsheets]]&lt;br /&gt;
* [[:Category:Community|the TEI community and related communities]]&lt;br /&gt;
* [[FAQ]]&lt;br /&gt;
* [[Conferences]]&lt;br /&gt;
* [[Current events]]&lt;br /&gt;
* [[TEI-C_Board_of_Directors|Board of Directors]]&lt;br /&gt;
* [[Council|Technical Council]]&lt;br /&gt;
* [[:Category:SIG|Special Interest Groups (SIGs)]]: [[SIG:Computer-Mediated_Communication|Computer-Mediated Communication]], [[SIG:Correspondence|Correspondence]], [[SIG:Education|Education]], [[FacsimileMarkup|Facsimile Markup]], [[SIG:Libraries|Libraries]], [[SIG:MSS|Manuscripts]], [[SIG:Music|Music]], [[SIG:Ontologies|Ontologies]], [[SIG:Overlap|Overlap]], [[SIG:Scholarly Publishing|Scholarly Publishing]], [[SIG:TEI_for_Linguists|TEI for Linguists]], [[SIG:Text&amp;amp;Graphic|Text and Graphics]], [[SIG:Tools|Tools]]&lt;br /&gt;
* [[:Category:Publications|Publication opportunities]]&lt;br /&gt;
* [[:Category:Grants|Grant opportunities]]&lt;br /&gt;
* [[:Category:Suggestions|Suggestions]]&lt;br /&gt;
&lt;br /&gt;
==== TEI Wiki ====&lt;br /&gt;
* [[Help:Contents|Editing help]]&lt;br /&gt;
* [[TEIWiki:IdleTalk|&amp;amp;lt;idleTalk/&amp;gt;]] – the local bulletin board&lt;br /&gt;
* [[Special:Wantedpages|Articles waiting to be born]]&lt;br /&gt;
* [[:Category:Articles that need extending|Articles that need extending]]&lt;br /&gt;
* [[Special:AllPages|All pages in the wiki]]&lt;br /&gt;
* [[Special:Categories|All categories in the wiki]]&lt;br /&gt;
* [[:Category:Users|Optional real TEIWiki-users list]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
See also the [http://www.tei-c.org/ main TEI web site] and the [https://github.com/TEIC/TEI TEI GitHub page]. The Text Encoding Initiative also has [http://en.wikipedia.org/wiki/Text_Encoding_Initiative an entry in Wikipedia].&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
'''Note:''' We require registration in order to avoid automated spam attacks and page vandalism. For the same reason, this front page has been locked and cannot be edited by normal users. In addition, throughout the wiki certain key words which are prevalent in spam attacks have been banned. If you need to use any of these words, or to suggest a change to the homepage, please '''contact one of [[Special:ListUsers/sysop|the &amp;quot;bureaucrat&amp;quot; users]]'''.&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15390</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15390"/>
		<updated>2016-11-03T20:55:53Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: move Tools link before the customizations links&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__ __NOEDITSECTION__&lt;br /&gt;
This is a wiki devoted to the '''[http://www.tei-c.org/ Text Encoding Initiative (TEI)]'''. It is created by TEI-ers for TEI-ers, and if you wish to contribute something or join the discussions, you are most welcome – all you need to do is [[Special:Userlogin| login or register]]. Choose from the following:&lt;br /&gt;
&lt;br /&gt;
==== Technical matters: building and manipulating TEI XML ====&lt;br /&gt;
* [http://www.tei-c.org/Support/Learn/tutorials.xml#tut-gen XML, TEI, teiHeader Tutorials]&lt;br /&gt;
* [http://www.tei-c.org/Guidelines/P5/ TEI P5 Guidelines]: [http://www.tei-c.org/Vault/P5/ current and archived past releases] in the [http://www.tei-c.org/Vault/ Vault]&lt;br /&gt;
** [http://www.tei-c.org/Support/Learn/tutorials.xml#tut-glp Project-specific encoding guidelines]&lt;br /&gt;
* Browse examples of markup or code submitted to this wiki:&lt;br /&gt;
** [[Samples|samples of TEI documents]],&lt;br /&gt;
** [[:Category:Code|'''stylesheets, scripts, and other code''': XSLT, CSS, XQuery, Schematron, and more]].&lt;br /&gt;
* Read about [[:Category:Tools|'''useful software''' (editors, processors and such)]].&lt;br /&gt;
* Read about [[Roma]] and [[Vesta]], the software suites that produce TEI customizations, documentation and schemas, as well as [[ODD]], the language larger than TEI.&lt;br /&gt;
** [[:Category:Customization|customizations of TEI schemas (ODD files, DTD extensions)]],&lt;br /&gt;
* Find a [[Crosswalks|crosswalk]] (mapping) between a TEI header and another metadata format.&lt;br /&gt;
* See [[:Category:Projects|projects using TEI]] (in addition to [http://www.tei-c.org/Activities/Projects/ those listed on the TEI website]).&lt;br /&gt;
&lt;br /&gt;
==== TEI Community matters ====&lt;br /&gt;
* [[TEI Cheatsheets]]&lt;br /&gt;
* [[:Category:Community|the TEI community and related communities]]&lt;br /&gt;
* [[FAQ]]&lt;br /&gt;
* [[Conferences]]&lt;br /&gt;
* [[Current events]]&lt;br /&gt;
* [[TEI-C_Board_of_Directors|Board of Directors]]&lt;br /&gt;
* [[Council|Technical Council]]&lt;br /&gt;
* [[:Category:SIG|Special Interest Groups (SIGs)]]: [[SIG:Computer-Mediated_Communication|Computer-Mediated Communication]], [[SIG:Correspondence|Correspondence]], [[SIG:Education|Education]], [[FacsimileMarkup|Facsimile Markup]], [[SIG:Libraries|Libraries]], [[SIG:MSS|Manuscripts]], [[SIG:Music|Music]], [[SIG:Ontologies|Ontologies]], [[SIG:Overlap|Overlap]], [[SIG:Scholarly Publishing|Scholarly Publishing]], [[SIG:TEI_for_Linguists|TEI for Linguists]], [[SIG:Text&amp;amp;Graphic|Text and Graphics]], [[SIG:Tools|Tools]]&lt;br /&gt;
* [[:Category:Publications|Publication opportunities]]&lt;br /&gt;
* [[:Category:Grants|Grant opportunities]]&lt;br /&gt;
* [[:Category:Suggestions|Suggestions]]&lt;br /&gt;
&lt;br /&gt;
==== TEI Wiki ====&lt;br /&gt;
* [[Help:Contents|Editing help]]&lt;br /&gt;
* [[TEIWiki:IdleTalk|&amp;amp;lt;idleTalk/&amp;gt;]] – the local bulletin board&lt;br /&gt;
* [[Special:Wantedpages|Articles waiting to be born]]&lt;br /&gt;
* [[:Category:Articles that need extending|Articles that need extending]]&lt;br /&gt;
* [[Special:AllPages|All pages in the wiki]]&lt;br /&gt;
* [[Special:Categories|All categories in the wiki]]&lt;br /&gt;
* [[:Category:Users|Optional real TEIWiki-users list]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
See also the [http://www.tei-c.org/ main TEI web site] and the [https://github.com/TEIC/TEI TEI GitHub page]. The Text Encoding Initiative also has [http://en.wikipedia.org/wiki/Text_Encoding_Initiative an entry in Wikipedia].&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
'''Note:''' We require registration in order to avoid automated spam attacks and page vandalism. For the same reason, this front page has been locked and cannot be edited by normal users. In addition, throughout the wiki certain key words which are prevalent in spam attacks have been banned. If you need to use any of these words, or to suggest a change to the homepage, please '''contact one of [[Special:ListUsers/sysop|the &amp;quot;bureaucrat&amp;quot; users]]'''.&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15389</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15389"/>
		<updated>2016-11-03T20:48:51Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: inserted 'XML, TEI, teiHeader Tutorials' before the Guidelines&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__ __NOEDITSECTION__&lt;br /&gt;
This is a wiki devoted to the '''[http://www.tei-c.org/ Text Encoding Initiative (TEI)]'''. It is created by TEI-ers for TEI-ers, and if you wish to contribute something or join the discussions, you are most welcome – all you need to do is [[Special:Userlogin| login or register]]. Choose from the following:&lt;br /&gt;
&lt;br /&gt;
==== Technical matters: building and manipulating TEI XML ====&lt;br /&gt;
* [http://www.tei-c.org/Support/Learn/tutorials.xml#tut-gen XML, TEI, teiHeader Tutorials]&lt;br /&gt;
* [http://www.tei-c.org/Guidelines/P5/ TEI P5 Guidelines]: [http://www.tei-c.org/Vault/P5/ current and archived past releases] in the [http://www.tei-c.org/Vault/ Vault]&lt;br /&gt;
** [http://www.tei-c.org/Support/Learn/tutorials.xml#tut-glp Project-specific encoding guidelines]&lt;br /&gt;
* Browse examples of markup or code submitted to this wiki:&lt;br /&gt;
** [[Samples|samples of TEI documents]],&lt;br /&gt;
** [[:Category:Code|'''stylesheets, scripts, and other code''': XSLT, CSS, XQuery, Schematron, and more]].&lt;br /&gt;
* Read about [[Roma]] and [[Vesta]], the software suites that produce TEI customizations, documentation and schemas, as well as [[ODD]], the language larger than TEI.&lt;br /&gt;
** [[:Category:Customization|customizations of TEI schemas (ODD files, DTD extensions)]],&lt;br /&gt;
* Find a [[Crosswalks|crosswalk]] (mapping) between a TEI header and another metadata format.&lt;br /&gt;
* Read about [[:Category:Tools|'''useful software''' (editors, processors and such)]].&lt;br /&gt;
* See [[:Category:Projects|projects using TEI]] (in addition to [http://www.tei-c.org/Activities/Projects/ those listed on the TEI website]).&lt;br /&gt;
&lt;br /&gt;
==== TEI Community matters ====&lt;br /&gt;
* [[TEI Cheatsheets]]&lt;br /&gt;
* [[:Category:Community|the TEI community and related communities]]&lt;br /&gt;
* [[FAQ]]&lt;br /&gt;
* [[Conferences]]&lt;br /&gt;
* [[Current events]]&lt;br /&gt;
* [[TEI-C_Board_of_Directors|Board of Directors]]&lt;br /&gt;
* [[Council|Technical Council]]&lt;br /&gt;
* [[:Category:SIG|Special Interest Groups (SIGs)]]: [[SIG:Computer-Mediated_Communication|Computer-Mediated Communication]], [[SIG:Correspondence|Correspondence]], [[SIG:Education|Education]], [[FacsimileMarkup|Facsimile Markup]], [[SIG:Libraries|Libraries]], [[SIG:MSS|Manuscripts]], [[SIG:Music|Music]], [[SIG:Ontologies|Ontologies]], [[SIG:Overlap|Overlap]], [[SIG:Scholarly Publishing|Scholarly Publishing]], [[SIG:TEI_for_Linguists|TEI for Linguists]], [[SIG:Text&amp;amp;Graphic|Text and Graphics]], [[SIG:Tools|Tools]]&lt;br /&gt;
* [[:Category:Publications|Publication opportunities]]&lt;br /&gt;
* [[:Category:Grants|Grant opportunities]]&lt;br /&gt;
* [[:Category:Suggestions|Suggestions]]&lt;br /&gt;
&lt;br /&gt;
==== TEI Wiki ====&lt;br /&gt;
* [[Help:Contents|Editing help]]&lt;br /&gt;
* [[TEIWiki:IdleTalk|&amp;amp;lt;idleTalk/&amp;gt;]] – the local bulletin board&lt;br /&gt;
* [[Special:Wantedpages|Articles waiting to be born]]&lt;br /&gt;
* [[:Category:Articles that need extending|Articles that need extending]]&lt;br /&gt;
* [[Special:AllPages|All pages in the wiki]]&lt;br /&gt;
* [[Special:Categories|All categories in the wiki]]&lt;br /&gt;
* [[:Category:Users|Optional real TEIWiki-users list]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
See also the [http://www.tei-c.org/ main TEI web site] and the [https://github.com/TEIC/TEI TEI GitHub page]. The Text Encoding Initiative also has [http://en.wikipedia.org/wiki/Text_Encoding_Initiative an entry in Wikipedia].&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
'''Note:''' We require registration in order to avoid automated spam attacks and page vandalism. For the same reason, this front page has been locked and cannot be edited by normal users. In addition, throughout the wiki certain key words which are prevalent in spam attacks have been banned. If you need to use any of these words, or to suggest a change to the homepage, please '''contact one of [[Special:ListUsers/sysop|the &amp;quot;bureaucrat&amp;quot; users]]'''.&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15388</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Main_Page&amp;diff=15388"/>
		<updated>2016-11-03T20:40:05Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: Reorganized the 'Technical matters: building and manipulating TEI XML' by topics&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__ __NOEDITSECTION__&lt;br /&gt;
This is a wiki devoted to the '''[http://www.tei-c.org/ Text Encoding Initiative (TEI)]'''. It is created by TEI-ers for TEI-ers, and if you wish to contribute something or join the discussions, you are most welcome – all you need to do is [[Special:Userlogin| login or register]]. Choose from the following:&lt;br /&gt;
&lt;br /&gt;
==== Technical matters: building and manipulating TEI XML ====&lt;br /&gt;
* [http://www.tei-c.org/Guidelines/P5/ TEI P5 Guidelines]: [http://www.tei-c.org/Vault/P5/ current and archived past releases] in the [http://www.tei-c.org/Vault/ Vault]&lt;br /&gt;
** [http://www.tei-c.org/Support/Learn/tutorials.xml#tut-glp Project-specific encoding guidelines]&lt;br /&gt;
* Browse examples of markup or code submitted to this wiki:&lt;br /&gt;
** [[Samples|samples of TEI documents]],&lt;br /&gt;
** [[:Category:Code|'''stylesheets, scripts, and other code''': XSLT, CSS, XQuery, Schematron, and more]].&lt;br /&gt;
* Read about [[Roma]] and [[Vesta]], the software suites that produce TEI customizations, documentation and schemas, as well as [[ODD]], the language larger than TEI.&lt;br /&gt;
** [[:Category:Customization|customizations of TEI schemas (ODD files, DTD extensions)]],&lt;br /&gt;
* Find a [[Crosswalks|crosswalk]] (mapping) between a TEI header and another metadata format.&lt;br /&gt;
* Read about [[:Category:Tools|'''useful software''' (editors, processors and such)]].&lt;br /&gt;
* See [[:Category:Projects|projects using TEI]] (in addition to [http://www.tei-c.org/Activities/Projects/ those listed on the TEI website]).&lt;br /&gt;
&lt;br /&gt;
==== TEI Community matters ====&lt;br /&gt;
* [[TEI Cheatsheets]]&lt;br /&gt;
* [[:Category:Community|the TEI community and related communities]]&lt;br /&gt;
* [[FAQ]]&lt;br /&gt;
* [[Conferences]]&lt;br /&gt;
* [[Current events]]&lt;br /&gt;
* [[TEI-C_Board_of_Directors|Board of Directors]]&lt;br /&gt;
* [[Council|Technical Council]]&lt;br /&gt;
* [[:Category:SIG|Special Interest Groups (SIGs)]]: [[SIG:Computer-Mediated_Communication|Computer-Mediated Communication]], [[SIG:Correspondence|Correspondence]], [[SIG:Education|Education]], [[FacsimileMarkup|Facsimile Markup]], [[SIG:Libraries|Libraries]], [[SIG:MSS|Manuscripts]], [[SIG:Music|Music]], [[SIG:Ontologies|Ontologies]], [[SIG:Overlap|Overlap]], [[SIG:Scholarly Publishing|Scholarly Publishing]], [[SIG:TEI_for_Linguists|TEI for Linguists]], [[SIG:Text&amp;amp;Graphic|Text and Graphics]], [[SIG:Tools|Tools]]&lt;br /&gt;
* [[:Category:Publications|Publication opportunities]]&lt;br /&gt;
* [[:Category:Grants|Grant opportunities]]&lt;br /&gt;
* [[:Category:Suggestions|Suggestions]]&lt;br /&gt;
&lt;br /&gt;
==== TEI Wiki ====&lt;br /&gt;
* [[Help:Contents|Editing help]]&lt;br /&gt;
* [[TEIWiki:IdleTalk|&amp;amp;lt;idleTalk/&amp;gt;]] – the local bulletin board&lt;br /&gt;
* [[Special:Wantedpages|Articles waiting to be born]]&lt;br /&gt;
* [[:Category:Articles that need extending|Articles that need extending]]&lt;br /&gt;
* [[Special:AllPages|All pages in the wiki]]&lt;br /&gt;
* [[Special:Categories|All categories in the wiki]]&lt;br /&gt;
* [[:Category:Users|Optional real TEIWiki-users list]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
See also the [http://www.tei-c.org/ main TEI web site] and the [https://github.com/TEIC/TEI TEI GitHub page]. The Text Encoding Initiative also has [http://en.wikipedia.org/wiki/Text_Encoding_Initiative an entry in Wikipedia].&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
'''Note:''' We require registration in order to avoid automated spam attacks and page vandalism. For the same reason, this front page has been locked and cannot be edited by normal users.  In addition, throughout the wiki certain key words which are prevalent in spam attacks have been banned.  If you need to use any of these words, or to suggest a change to the homepage, please '''contact one of [[Special:ListUsers/sysop|the &amp;quot;bureaucrat&amp;quot; users]]'''.&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Category:Projects&amp;diff=15383</id>
		<title>Category:Projects</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Category:Projects&amp;diff=15383"/>
		<updated>2016-11-03T12:32:25Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Root|Projects]]&lt;br /&gt;
&lt;br /&gt;
This page collects together any pages relating to Projects using TEI.  Any wiki pages which include '''&amp;lt;nowiki&amp;gt;[[Category:Projects]]&amp;lt;/nowiki&amp;gt;''' will appear here.&lt;br /&gt;
&lt;br /&gt;
'''Please help us document projects using TEI!  It's easy:'''&lt;br /&gt;
* Log in (create a user account for this wiki if you have not done so).&lt;br /&gt;
* Add a new page. To do this, add the name of the new page to the URL in your browser's address bar after &amp;lt;code&amp;gt;index.php/&amp;lt;/code&amp;gt;. For example, if you wish to create a new page called &amp;quot;MyProject&amp;quot;, just edit the URL in your web browser's address bar to look like this: &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;http://www.tei-c.org.uk/wiki/index.php/MyProject&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
*  Assuming no one else has made a page with the name you chose, you will be brought to a blank page. Click the edit button and do the following:&lt;br /&gt;
** '''Copy and paste [[Project template|this code]] to help structure the page.'''&lt;br /&gt;
** Wrap any non-wiki content (scripts, code, command-line setup instructions, etc.) in &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;pre&amp;gt; .... &amp;lt;/pre&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; tags&lt;br /&gt;
** Preview and then Save your page and you are finished. You can go back and edit at any time.&lt;br /&gt;
&lt;br /&gt;
[[Category:Community]]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Project_template&amp;diff=15382</id>
		<title>Project template</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Project_template&amp;diff=15382"/>
		<updated>2016-11-03T09:02:06Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: template for project description based on Tool template + tei web site project description input form&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;When creating or revising a page for a particular project, please consider copying and pasting this code to help structure the page.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[[Category:Projects]]&lt;br /&gt;
&lt;br /&gt;
===== Title (name of the project) =====&lt;br /&gt;
&lt;br /&gt;
* Host Institution:&lt;br /&gt;
* URL: [http://www.tei-c.org] (please place your url here)&lt;br /&gt;
&lt;br /&gt;
== Other institutions involved ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Subject area ====&lt;br /&gt;
&lt;br /&gt;
(please choose)&lt;br /&gt;
* Archival and Museum Information&lt;br /&gt;
* Language Composition and Teaching&lt;br /&gt;
* Classical and Medieval Literature&lt;br /&gt;
* Historical Materials&lt;br /&gt;
* Dictionaries and Lexicographies&lt;br /&gt;
* Language Corpora&lt;br /&gt;
* Electronic Publishing&lt;br /&gt;
* Literary Texts&lt;br /&gt;
* Miscellaneous&lt;br /&gt;
* Music Historical Texts&lt;br /&gt;
* Politics and Journalism&lt;br /&gt;
* Religious Texts&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Main Language ====&lt;br /&gt;
&lt;br /&gt;
(Historical and dialectal forms of modern languages should be classified under the nearest modern descendant or relative)&lt;br /&gt;
&lt;br /&gt;
* Multilingual&lt;br /&gt;
* Armenian&lt;br /&gt;
* Bulgarian&lt;br /&gt;
* Chinese&lt;br /&gt;
* Czech&lt;br /&gt;
* Danish&lt;br /&gt;
* Dutch&lt;br /&gt;
* English&lt;br /&gt;
* Estonian&lt;br /&gt;
* Finnish&lt;br /&gt;
* French&lt;br /&gt;
* German&lt;br /&gt;
* Greek&lt;br /&gt;
* Hebrew&lt;br /&gt;
* etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== TEI Guidelines Version ====&lt;br /&gt;
&lt;br /&gt;
(Version of TEI Guidelines used to encode texts in the project)&lt;br /&gt;
&lt;br /&gt;
(please choose)&lt;br /&gt;
* P5 (2007-present)&lt;br /&gt;
* P4 (2002-2007)&lt;br /&gt;
* P3 (1994-2002)&lt;br /&gt;
* P2 (1992-1994)&lt;br /&gt;
* P1 (1990-1993)&lt;br /&gt;
* Various Versions&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== General description ====&lt;br /&gt;
&lt;br /&gt;
(Explain the role of the project, its intended audience, the scope of the TEI resources being created, and their coverage)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Implementation description ====&lt;br /&gt;
&lt;br /&gt;
(Include comments on which TEI tagsets you use, any modifications made, etc.)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Related resources ====&lt;br /&gt;
&lt;br /&gt;
(Include links to any local manuals or articles)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Copyright information ====&lt;br /&gt;
&lt;br /&gt;
(Include links to any specific access restrictions)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:TEI Wiki]]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15381</id>
		<title>Samples of TEI texts</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Samples_of_TEI_texts&amp;diff=15381"/>
		<updated>2016-11-03T06:55:53Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category: Markup]]&lt;br /&gt;
[[Category: TEI:P4]]&lt;br /&gt;
[[Category: TEI:P5]]&lt;br /&gt;
__NOTOC__&lt;br /&gt;
&lt;br /&gt;
The availability of texts enables others to learn by example; fosters similar approaches to solving the same problems across the entire community of practice; and gives developers of TEI-based tools a broader sample of texts to test against. The fact that a text is listed here should not be taken as a licence to redistribute it; please check with the text owners if you wish to make any more in-depth use of these materials.&lt;br /&gt;
&lt;br /&gt;
== Explicitly Pedagogical Samples ==&lt;br /&gt;
&lt;br /&gt;
* [http://tbe.kantl.be/TBE/ TEI By Example] is a set of examples designed to teach the basics of TEI. Includes TEI P3 (SGML) and P5 examples. [http://creativecommons.org/licenses/by-sa/3.0/ CC licensed].&lt;br /&gt;
&lt;br /&gt;
== Texts == &lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/ala2004/redist/inscriptions/inscriptions.zip ala2004] ([[EpiDoc]] XML) from the [http://insaph.kcl.ac.uk/ala2004 '''Aphrodisias in Late Antiquity'''] publication. The downloadable .zip archive contains 230 XML files, each containing an ancient Greek inscription, which validate against version 4 of the [http://epidoc.sf.net/ EpiDoc] DTD (a TEI localization); the DTD is also included in the archive. These files are licensed under [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution], so please feel free to do whatever you like with them! (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://archimedespalimpsest.net/ '''Archimedes Palimpsest'''], XML files containing the transcriptions of the Archimedes text, released (like all the Palimpsest data and metadata) under Creative Commons Attribution 3.0 Unported. Texts validate to TEI P5. One XML file per folio page (scroll down list of hi-res photographs in each directory). Format: TEI P5&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/StanfordUniversityLibraries/ap_tei TEI of full text for 82 volumes of the Archives Parlementaires]&lt;br /&gt;
&lt;br /&gt;
* [http://www.ota.ox.ac.uk/headers/2493.xml The '''Auchinleck Manuscript'''], made available by the [http://www.ota.ox.ac.uk/ Oxford Text Archive] (contact [mailto:ota-info@rt.oucs.ox.ac.uk ota-info@rt.oucs.ox.ac.uk]). This text originates from the [http://www.nls.uk/auchinleck/ Auchinleck Manuscript Project] at the National Library of Scotland; please see their website for more contextual material. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/CopticScriptorium/corpora Coptic SCRIPTORIUM corpora] for the [http://copticscriptorium.org/ Coptic SCRIPTORIUM]&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/papyri/idp.data '''Duke Databank'''/Heidelberg/APIS] ([[EpiDoc]] XML) aggregated data from the Duke Databank of Documentary Papyri (DDbDP: transcribed Greek texts), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV: metadata), and the Advanced Papyrological Information System. Approx. 145,000 XML files released under Creative Commons Attribution license (CC-BY), by the [http://idp.atlantides.org/trac/idp/wiki Integrating Digital Papyrology] project. Format: TEI P5.&lt;br /&gt;
&lt;br /&gt;
* [http://epidoc.cch.kcl.ac.uk/inscriptions/index.html '''EpiDoc Demo''' Website], a growing collection of sample [[EpiDoc]] XML files, including examples from epigraphic, papyrological, and other ancient projects. XML downloadable from each transformed inscription. (Vintage 2007.) (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://www.folgerdigitaltexts.org/ '''Folger Digital Texts''']: From the Folger Shakespeare Library, &amp;quot;Each play in Folger Digital Texts is rigorously encoded: every word, every punctuation mark, every space, within a sophisticated, TEI-compliant XML structure.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
* A subset of [http://www.gutenberg.org/ Project '''Gutenberg'''] is available as TEI: go to [http://www.gutenberg.org/catalog/world/search http://www.gutenberg.org/catalog/world/search] and select &amp;quot;TEI Text Encoding Initiative (tei)&amp;quot; as the file type.&lt;br /&gt;
&lt;br /&gt;
* [http://insaph.kcl.ac.uk/iaph2007/inscriptions/xml-repo.html '''IAph2007''' ([[EpiDoc]] XML files)] from the [http://insaph.kcl.ac.uk/iaph2007/ Inscriptions of Aphrodisias (2007)] publication. There are approx 1500 XML files available (either in a single .zip or as individual files either downloadable or linkable directly for dynamic processing), each containing an ancient Greek or Latin inscription. All files validate to the [[EpiDoc]] DTD (version 5). These files are licensed under [http://creativecommons.org/licenses/by/2.0/uk/ Creative Commons Attribution (UK)], so please feel free to do exciting things with them. (Format: TEI P4)&lt;br /&gt;
&lt;br /&gt;
* [http://irt.kcl.ac.uk/irt2009/inscr/xmlrepo.html '''Inscriptions of Roman Tripolitania''' 2009] ([[EpiDoc]] XML), about 1000 Latin and Greek inscriptions available for download under Creative Commons Attribution (CC-BY) licence. Format: TEI P4.&lt;br /&gt;
&lt;br /&gt;
* [http://www.sbl-site.org/Resources/Resources_ManuscriptMarkup.aspx Files] referenced in Timothy J. Finney, &amp;quot;'''Manuscript Markup''',&amp;quot; in ''The Freer Biblical Manuscripts: Fresh Studies of an American Treasure Trove'' (ed. Larry W. Hurtado; SBLTCS 6; Atlanta: Society of Biblical Literature, 2006), 263-87. These include a partial [http://www.sbl-site.org/assets/U16/U16.xml transcription] of the Freer manuscript of Paul (Gregory-Aland I 016), a [http://www.sbl-site.org/assets/U16/U16.xsl transform], a [http://www.sbl-site.org/assets/U16/U16.css stylesheet] and a [http://www.sbl-site.org/assets/U16/U16.htm web page] produced from the transcription by the transform. (Format: TEI P5)&lt;br /&gt;
&lt;br /&gt;
* The [http://www.nzetc.org/ NZETC] has a range of '''New Zealand and Pacific-Islands''' texts. The texts are P5 encoded and the TEI is generally downloadable from the document table of contents. Features include:&lt;br /&gt;
** Use of &amp;lt;revisionDesc&amp;gt; and &amp;lt;change&amp;gt; tags to implement workflow&lt;br /&gt;
** &amp;lt;name&amp;gt; tag used extensively for personal, ship, place, organisation and work names (keyed to external authority at [http://authority.nzetc.org/])&lt;br /&gt;
** Use of  xml:lang=&amp;quot;en&amp;quot; and  xml:lang=&amp;quot;mi&amp;quot; for texts with English and Maori (plus small amounts of other languages)&lt;br /&gt;
** Page images, facsimile PDFs and typeset PDFs  (some texts only, for example [http://www.nzetc.org/tm/scholarly/tei-JCB-001.html this letter])&lt;br /&gt;
** Document-by-document licensing, some documents under a creative commons license (licensing info not currently stored in the TEI).&lt;br /&gt;
&lt;br /&gt;
* The University of [http://www.ota.ox.ac.uk/ '''Oxford Text Archive'''] (OTA) is home to some 2685 TEI P5 texts, including all of the ECCO texts which are in the public domain, all available under CC licences, plus some TEI P5 linguistic corpora, and others following older editions of the guidelines, with legacy licences. The OTA exists as a community resource, and projects and people are encouraged to offer texts for deposit in the archive.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.perseus.tufts.edu/hopper/opensource Perseus Project] makes its TEI P4 XML collections in Greek, Latin, and English available from http://www.perseus.tufts.edu/hopper/opensource under a Creative Commons ShareAlike/Non-Commercial/Attribution license.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/BZA/bzaComCatWeb.html '''Samyukta Agama''' Project] at Dharma Drum Buddhist College provides access to its more than 1000 TEI source files. Click on any cluster and find the link to the TEI source at the bottom of each column. The files are in Chinese, Pali and Sanskrit. Markup documentation, schemas and stylesheets are available as a zip archive at the website.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/biographies/gis/ '''Chinese Buddhist Bibliographies''' Project] at Dharma Drum Buddhist College provides access to different collections with more than 1000 biographies marked up in TEI for place and person names as well as dates. The archives contain basic documentation, schema etc. The data is in Chinese, linked to authority databases and available through three different interfaces visualizing it as GIS light, social network and on a timeline. All the data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://buddhistinformatics.ddbc.edu.tw/fosizhi/ '''Chinese Buddhist Temple Gazetteers''' Project] at Dharma Drum Buddhist College provides access to topographical descriptions of Buddhist temples marked up for place and person names as well as dates. Altogether there are 237 gazetteers, 13 of which are available with TEI markup and new punctuation. The archives contain the TEI, image files referenced in the TEI, schema, METS wrapper etc. The data is linked to authority databases and available through an interface that displays the marked-up edition next to the images. The data is published under a CC licence.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.tei-c.org/Activities/MI/Samples/ '''Migration Samples'''] page on the main TEI website includes sample texts from (inter alia) the British National Corpus, the Thomas McGreevey Archive, Early English Books Online, Multext East, Documenting the American South, and the Women Writers Project which were prepared as part of the TEI P4 Migration Work Group, the purpose of which was to demonstrate how to migrate TEI P3 (SGML) to TEI P4 (XML). Most of the material here is therefore of a certain antiquity.&lt;br /&gt;
&lt;br /&gt;
* The [http://www.bvh.univ-tours.fr/ BVH] project ('''Virtual Humanistic Libraries''') is a virtual library of high-quality digitised documents, offering a selection of Renaissance books located in the libraries of the Région Centre, Paris, Poitiers, Lyons, Troyes, etc. Three samples of TEI texts are offered in HTML, PDF and XML/TEI on [http://www.bvh.univ-tours.fr:8080/xtf/search?title=&amp;amp;creator=&amp;amp;year=&amp;amp;keyword=&amp;amp;type=tei Epistemon]. These files are licensed under Creative Commons Attribution.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/iulibdcs/tei_text TEI and Plain Text from Digital Collections Services, Indiana University Libraries]&lt;br /&gt;
&lt;br /&gt;
* TEI in dspace example http://dspace.nitle.org/handle/10090/11695 (P4?) (seems broken May 2012)&lt;br /&gt;
&lt;br /&gt;
* The [http://sarit.indology.info SARIT] project has recently brought out an electronic TEI-encoded edition of a 2007 print publication. It is a work on Buddhist tantric religion: Christian K. Wedemeyer, ed., ''Āryadeva's Lamp that Integrates the Practices (Caryāmelāpakapradīpa): The Gradual Path of Vajrayāna Buddhism According to the Esoteric Community Noble Tradition - Part Three: Critically Edited Sanskrit Text of Āryadeva's Caryāmelāpakapradīpa,'' (New York: The American Institute of Buddhist Studies at Columbia University in New York with Columbia University's Center for Buddhist Studies and Tibet House US, 2007). E-details and full text can be seen [http://sarit.indology.info/newphilo/navigate.pl?indologica.16 here]. Clicking [http://sarit.indology.info/downloads.shtml Downloads] on the above screen offers downloadable TEI, PDF and HTML versions of this e-text, and several others. The interesting thing about this e-text from the TEI point of view is the encoding and display of the manuscript variants to the critical edition. It was good of the publishers and editors to give their permission for the e-dissemination of this work just three years after print publication. Best, Dr Dominik Wujastyk.&lt;br /&gt;
&lt;br /&gt;
* [http://txm.bfm-corpus.org/txm/ '''La Queste del Saint Graal'''] (The Quest of the Holy Grail) online interactive edition offers a parallel multi-level (normalized, diplomatic and imitative) transcription of the Lyon MN PA 77 manuscript along with manuscript images and a translation into modern French, powered by the TXM text search and statistical analysis platform. The complete source manuscript transcriptions (encoded in XML-TEI P5 with Menota extensions), an ODD customization file, and the stylesheets used to produce the HTML editions and a printable PDF version are freely available for [http://txm.bfm-corpus.org/txm/images/graal_src.zip download] under a CC BY-NC-SA 3.0 license.&lt;br /&gt;
&lt;br /&gt;
* [http://docsouth.unc.edu/southlit/poe/menu.html &amp;quot;Tales&amp;quot; by Edgar Allan Poe] at University of North Carolina at Chapel Hill. Uses mnemonic entity references for non-ASCII characters.&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/TEI-examples/tei-examples tei-examples] -- Examples of TEI documents dealing with different use-cases.&lt;br /&gt;
&lt;br /&gt;
== Dictionaries ==&lt;br /&gt;
* [[FreeDict]] is a repository of various TEI-encoded bilingual translation dictionaries under free licenses (http://www.freedict.org/). Most of the dictionaries have been converted from TEI P4 to TEI P5, but not all of the changes can be found in the official releases yet. Visiting [http://freedict.svn.sourceforge.net/viewvc/freedict/trunk/ the SVN repository] directly may be the better option.&lt;br /&gt;
* [http://ducange.enc.sorbonne.fr/ Du Cange] is a Medieval Latin dictionary (mostly written during the 17th and 18th centuries). The printed text is encoded in TEI-P5, freely available at http://svn.code.sf.net/p/ducange/code/xml/ as an [http://sourceforge.net/p/ducange/wiki/Home/ open source project]. The TEI choices are [http://svn.code.sf.net/p/ducange/code/xml/ducange.html documented (in French)].&lt;br /&gt;
* [http://algone.net/littre/ Littré] is a classical French dictionary, encoded in TEI-P5, freely available at https://svn.code.sf.net/p/javacrim/code/littre/xml/, [https://svn.code.sf.net/p/javacrim/code/littre/xml/schema.html documented in French with the words of Littré himself]&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14995</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14995"/>
		<updated>2016-06-29T09:45:00Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Support for TEI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and their analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://groupes.renater.fr/wiki/txm-info/xml_tei_txm&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The biennial international JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14390</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14390"/>
		<updated>2015-08-02T15:39:35Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Support for TEI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' to give online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
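&lt;br /&gt;
The CQL word pattern examples in the list above can be emulated outside TXM to see what each one selects. The following is only a minimal Python sketch, not TXM or CQP code: the token list, the property names and the helper functions are hypothetical and invented for illustration. It assumes, as in CQP, that a regular expression has to match a whole property value.&lt;br /&gt;

```python
import re

# Hypothetical annotated tokens (word, lemma, pos), invented for illustration.
TOKENS = [
    ("The", "the", "DET"),
    ("groups", "group", "NOUN"),
    ("are", "be", "VERB"),
    ("now", "now", "ADV"),
    ("aiming", "aim", "VERB"),
    ("for", "for", "ADP"),
    ("the", "the", "DET"),
    ("ceiling", "ceiling", "NOUN"),
]


def token_matches(token, word=None, lemma=None, pos=None):
    """Emulate one CQL token expression: every given pattern must
    match the whole value of the corresponding property."""
    values = dict(zip(("word", "lemma", "pos"), token))
    wanted = {"word": word, "lemma": lemma, "pos": pos}
    return all(
        re.fullmatch(pattern, values[name]) is not None
        for name, pattern in wanted.items()
        if pattern is not None
    )


def find_words(tokens, **constraints):
    """Return the word forms of all tokens satisfying the constraints."""
    return [tok[0] for tok in tokens if token_matches(tok, **constraints)]


def find_collocations(tokens, first, last, max_gap=3):
    """Emulate a 'first []{0,max_gap} last' pattern: pairs of word forms
    with at most max_gap tokens in between."""
    hits = []
    for i, tok in enumerate(tokens):
        if not token_matches(tok, **first):
            continue
        stop = min(i + 2 + max_gap, len(tokens))
        for j in range(i + 1, stop):
            if token_matches(tokens[j], **last):
                hits.append((tokens[i][0], tokens[j][0]))
    return hits


if __name__ == "__main__":
    print(find_words(TOKENS, word="aiming"))
    print(find_words(TOKENS, word=".*ing"))
    print(find_words(TOKENS, pos="VERB", word=".*ing"))
    print(find_collocations(TOKENS, {"lemma": "group"},
                            {"pos": "VERB", "word": ".*ing"}))
```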
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
Experiments with other TEI sources:&lt;br /&gt;
* Victorian Women Writers Project (English): http://www.dlib.indiana.edu/collections/vwwp&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Wurzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The biennial international JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14389</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14389"/>
		<updated>2015-08-02T15:29:06Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* User community */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' to give online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All published documentation (for users and for developers) is available on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
&lt;br /&gt;
The TXM community uses two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available:&lt;br /&gt;
* Monthly in Lyon (France): https://groupes.renater.fr/wiki/txm-users/public/ateliers_txm&lt;br /&gt;
* Regularly in workshops: [http://dh2015.org/workshops Sydney 2015], [http://www.germanistik.uni-wuerzburg.de/lehrstuehle/computerphilologie/aktuelles/veranstaltungen/workshop_introduction_to_the_txm_content_analysis_platform Würzburg 2014]&lt;br /&gt;
* and Summer schools: MISAT 2011, [http://www.iqla.org/IQLA-GIAT%20Summer%20School%202013.pdf Padova 2013] &lt;br /&gt;
&lt;br /&gt;
The biennial JADT international conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14388</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14388"/>
		<updated>2015-08-02T15:19:44Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* User community */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
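&lt;br /&gt;
The CQL word patterns listed above can be read as per-token constraints in which each property value is a regular expression matched against the whole value. The Python sketch below only illustrates that reading on a made-up token list; it is not how TXM or CQP are implemented, and the names TOKENS, token_matches, find_tokens and find_sequences are hypothetical.&lt;br /&gt;

```python
import re

# Hypothetical toy corpus: each token is a dict of word-level properties.
# This is an illustration only, not the data model used by TXM or CQP.
TOKENS = [
    {"word": "The", "lemma": "the", "pos": "DET"},
    {"word": "group", "lemma": "group", "pos": "NOUN"},
    {"word": "is", "lemma": "be", "pos": "VERB"},
    {"word": "aiming", "lemma": "aim", "pos": "VERB"},
    {"word": "high", "lemma": "high", "pos": "ADV"},
    {"word": "this", "lemma": "this", "pos": "DET"},
    {"word": "morning", "lemma": "morning", "pos": "NOUN"},
]


def token_matches(token, constraints):
    """Return True if every listed property fully matches its regular expression."""
    return all(
        re.fullmatch(regex, token.get(prop, "")) is not None
        for prop, regex in constraints.items()
    )


def find_tokens(tokens, constraints):
    """Return the positions of the tokens that satisfy all constraints."""
    return [i for i, tok in enumerate(tokens) if token_matches(tok, constraints)]


def find_sequences(tokens, first, last, max_gap):
    """Return (start, end) positions where a token matching `first` is followed
    by a token matching `last` with at most `max_gap` tokens in between."""
    hits = []
    for start in find_tokens(tokens, first):
        # `end` ranges over the positions start+1 .. start+max_gap+1.
        for end in range(start + 1, min(start + max_gap + 2, len(tokens))):
            if token_matches(tokens[end], last):
                hits.append((start, end))
    return hits
```

With this toy data, find_tokens(TOKENS, {"word": ".*ing"}) selects both "aiming" and "morning", while adding the constraint "pos": "VERB" keeps only "aiming". The call find_sequences(TOKENS, {"lemma": "group"}, {"pos": "VERB", "word": ".*ing"}, 3) mirrors the last pattern of the list, with the gap of at most 3 tokens passed as max_gap.&lt;br /&gt;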
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All published documentation (for users and for developers) is available on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your feedback has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback either:&lt;br /&gt;
#* in the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT).&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14387</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14387"/>
		<updated>2015-08-02T15:05:23Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Tech support */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) is published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
&lt;br /&gt;
Tech support is community based.&lt;br /&gt;
&lt;br /&gt;
Feedback and bug reports:&lt;br /&gt;
# Check whether your issue has already been reported in the [http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues?query_id=31 TXM platform bug tracker]&lt;br /&gt;
# If not, send your feedback in one of the following ways:&lt;br /&gt;
#* on the [https://lists.sourceforge.net/lists/listinfo/txm-open txm-open mailing list]&lt;br /&gt;
#* by directly editing the [https://groupes.renater.fr/wiki/txm-users/public/retours_de_bugs_logiciel/txm_0.7.7 txm-users wiki]&lt;br /&gt;
#* by chatting directly with the developers on the #txm IRC channel of the 'irc.freenode.net' server ([http://webchat.freenode.net/?channels=txm web access])&lt;br /&gt;
#* by contacting the TXM team at the 'textometrie AT ens-lyon.fr' address&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient latin and greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14386</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14386"/>
		<updated>2015-08-02T14:58:56Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Documentation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML and TEI compatible text/corpus analysis environment and graphical client based on the CQP search engine and the R statistical software. It is available as desktop software for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools to texts on the fly before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Exports''' any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities to automate repetitive or lengthy tasks or to extend the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that gives online access to corpora and analysis tools through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: built on established open source components for text analysis (CQP, R, and Java &amp;amp; XSLT libraries)&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
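The single-token CQL examples listed above under the qualitative analysis tools can be read as regular-expression tests applied to the properties of each token. The Python sketch below emulates them on a small hand-made token list. It does not call CQP: the `match_token` and `find` helpers, the sample tokens and their `pos` values are assumptions made for illustration only.&lt;br /&gt;

```python
import re

# Hypothetical annotated tokens: each token carries word, lemma and pos.
TOKENS = [
    {"word": "The", "lemma": "the", "pos": "DET"},
    {"word": "group", "lemma": "group", "pos": "NOUN"},
    {"word": "is", "lemma": "be", "pos": "VERB"},
    {"word": "aiming", "lemma": "aim", "pos": "VERB"},
    {"word": "for", "lemma": "for", "pos": "ADP"},
    {"word": "spring", "lemma": "spring", "pos": "NOUN"},
]


def match_token(token, **conditions):
    """True if every property matches its regex over the whole value."""
    return all(
        re.fullmatch(pattern, token[prop]) is not None
        for prop, pattern in conditions.items()
    )


def find(tokens, **conditions):
    """Return the word forms of all tokens satisfying the conditions."""
    return [t["word"] for t in tokens if match_token(t, **conditions)]


# ".*ing" alone tests the word form only.
print(find(TOKENS, word=".*ing"))
# Adding a pos condition keeps verb forms only.
print(find(TOKENS, word=".*ing", pos="VERB"))
```

The sketch uses `re.fullmatch` on the assumption that a pattern has to cover the whole property value. On this sample, the word-only pattern also returns the noun "spring", which is why combining it with a part-of-speech condition is useful.&lt;br /&gt;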
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (Classical Latin, Ancient Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM's own pivot format (any language): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
&lt;br /&gt;
* TXM manual (in French) http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/Manuel%20de%20TXM%200.7%20FR.pdf&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in French) https://groupes.renater.fr/wiki/txm-info&lt;br /&gt;
* All documentation (for users and for developers) is published on SourceForge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use 3 different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient latin and greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14385</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14385"/>
		<updated>2015-08-02T14:50:54Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Text/Corpus Language(s) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML and TEI compatible text/corpus analysis environment and graphical client based on the CQP search engine and the R statistical software. It is available as desktop software for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools to texts on the fly before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Exports''' any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities to automate repetitive or lengthy tasks or to extend the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that gives online access to corpora and analysis tools through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: built on established open source components for text analysis (CQP, R, and Java &amp;amp; XSLT libraries)&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (classical Latin, old Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&amp;lt;br/&amp;gt;&lt;br /&gt;
* ZH: best results have been reported by using the ZPar tokenizer &amp;amp;amp; tagger - http://sourceforge.net/projects/zpar&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge : http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use 3 different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list : txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list : txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14384</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14384"/>
		<updated>2015-08-02T14:46:56Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Support for TEI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools to texts on the fly before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in '''Groovy''', a dynamic language for the Java platform)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that gives online access to corpora and analysis tools through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
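&lt;br /&gt;
The first three word-pattern examples in the list above can be illustrated with a minimal Python sketch. It is not TXM or CQP code: the token list, the tagset and the helper name matches are hypothetical, and it assumes that a CQL regular expression has to match the whole value of a word property, which Python's re.fullmatch emulates.&lt;br /&gt;

```python
import re

# Hypothetical tagged tokens as (word, lemma, pos) triples; the tagset is illustrative only.
tokens = [
    ("groups", "group", "NOUN"),
    ("are", "be", "VERB"),
    ("aiming", "aim", "VERB"),
    ("at", "at", "PREP"),
    ("nothing", "nothing", "PRON"),
]


def matches(token, word=None, pos=None):
    """Return True when every given constraint matches the whole property value."""
    form, _lemma, tag = token
    if word is not None and re.fullmatch(word, form) is None:
        return False
    if pos is not None and re.fullmatch(pos, tag) is None:
        return False
    return True


# Equivalent in spirit to the ".*ing" pattern: any word form ending in "ing".
ing_words = [t[0] for t in tokens if matches(t, word=".*ing")]

# Equivalent in spirit to combining the word constraint with pos="VERB".
ing_verbs = [t[0] for t in tokens if matches(t, word=".*ing", pos="VERB")]

print(ing_words)
print(ing_verbs)
```

On this toy data the word-only pattern also picks up the pronoun "nothing", while adding the part-of-speech constraint keeps only "aiming", which is why the list notes that the combined query needs Part of Speech annotation.&lt;br /&gt;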
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library (classical Latin, old Greek): http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid (German): http://www.textgrid.de/en&lt;br /&gt;
* Base de Français Médiéval - BFM (Old French): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon (Middle French): http://www.bvh.univ-tours.fr/Epistemon &amp;amp;amp; https://groupes.renater.fr/wiki/txm-users/public/mise_en_ligne_du_corpus_bvh_avec_txm&lt;br /&gt;
* PROCLAC Mesopotamian tablets (Akkadian): https://groupes.renater.fr/wiki/txm-users/public/umr_proclac_corpus_akkadien&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet (French): http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* NLTK - Brown Corpus - TEI XML Version (American English): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org (French): http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* Frantext - libre (French): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* XML-TXM - TXM own pivot format (any language): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge : http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use 3 different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list : txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list : txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14383</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=14383"/>
		<updated>2015-08-02T14:36:54Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Source code and licensing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools to texts on the fly before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in '''Groovy''', a dynamic language for the Java platform)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that gives online access to corpora and analysis tools through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32-bit or 64-bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32-bit or 64-bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence: [http://sourceforge.net/p/txm/code/HEAD/tree/trunk http://sourceforge.net/p/txm/code/HEAD/tree/trunk].&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; (in French) for an example of a TEI encoding practice interpreted by TXM: http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library: http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid: http://www.textgrid.de/en&lt;br /&gt;
* NLTK - Brown Corpus (TEI XML Version): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Frantext (libre): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* Base de Français Médiéval (BFM): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon: http://www.bvh.univ-tours.fr/Epistemon&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet: http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org: http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* TXM (TXM's own pivot format): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge: http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use three different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* 0.7.7 (2015-07-31)&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
http://perso.ens-lyon.fr/serge.heiden/txm/files/software/TXM/0.7.7/&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012: Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=Category:Editing_tools&amp;diff=13941</id>
		<title>Category:Editing tools</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=Category:Editing_tools&amp;diff=13941"/>
		<updated>2014-11-07T20:55:11Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
These are XML editors and other tools for editing TEI texts.&lt;br /&gt;
&lt;br /&gt;
For a comparison between these tools, see the '[[Editors|Editors for TEI, sorted by Beginner-friendliness]]' table.&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
&lt;br /&gt;
* http://en.wikipedia.org/wiki/XML_editor&lt;br /&gt;
* http://epu.ucc.ie/articles/extreme06&lt;br /&gt;
* http://texteditors.org/cgi-bin/wiki.pl?XMLEditorFamily&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13856</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13856"/>
		<updated>2014-10-08T14:46:43Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* History of versions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchronization)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly to texts before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' that gives online access to corpora and their analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32-bit or 64-bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32-bit or 64-bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence.&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; (in French) for an example of a TEI encoding practice interpreted by TXM: http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library: http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid: http://www.textgrid.de/en&lt;br /&gt;
* NLTK - Brown Corpus (TEI XML Version): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Frantext (libre): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* Base de Français Médiéval (BFM): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon: http://www.bvh.univ-tours.fr/Epistemon&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet: http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org: http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* TXM (TXM's own pivot format): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge: http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use three different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* TXM desktop: Current version is 0.7.6 released July 2014&lt;br /&gt;
* TXM portal: Current version is 0.6alpha released June 2014&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
TXM is free to download and use:&lt;br /&gt;
* desktop (Windows, Mac, Linux):&lt;br /&gt;
** First point your browser to http://sourceforge.net/projects/txm&lt;br /&gt;
** Then click on the green Download button to download the setup for your architecture.&lt;br /&gt;
* portal (J2EE):&lt;br /&gt;
** First choose the archive for your architecture at [https://sourceforge.net/projects/txm/files/software/TXM%20portal https://sourceforge.net/projects/txm/files/software/TXM portal]&lt;br /&gt;
** Then follow installation instructions at https://sourceforge.net/apps/mediawiki/txm/index.php?title=TXM_WEB:_Quick_Install&lt;br /&gt;
** See also the demo portal http://portal.textometrie.org/demo/?locale=en&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012: Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13855</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13855"/>
		<updated>2014-10-08T14:46:16Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* History of versions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux, Mac OS X and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis tools through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence.&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library: http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid: http://www.textgrid.de/en&lt;br /&gt;
* NLTK - Brown Corpus (TEI XML Version): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Frantext (libre): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* Base de Français Médiéval (BFM): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon: http://www.bvh.univ-tours.fr/Epistemon&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet: http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org: http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* TXM (TXM own pivot format): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge : http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use 3 different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list : txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list : txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* TXM desktop: Current version is 0.7.6 released July 2014&lt;br /&gt;
* TXM portal: Current version is 0.6alpha released June 2014&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See the software development project report page: http://forge.cbp.ens-lyon.fr/redmine/projects/txm/issues/report.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
TXM is free to download and use:&lt;br /&gt;
* desktop (Windows, Mac, Linux):&lt;br /&gt;
** First point your browser to http://sourceforge.net/projects/txm&lt;br /&gt;
** Then click on the green Download button to download the setup for your architecture.&lt;br /&gt;
* portal (J2EE):&lt;br /&gt;
** First choose the archive for your architecture at [https://sourceforge.net/projects/txm/files/software/TXM%20portal https://sourceforge.net/projects/txm/files/software/TXM portal]&lt;br /&gt;
** Then follow installation instructions at https://sourceforge.net/apps/mediawiki/txm/index.php?title=TXM_WEB:_Quick_Install&lt;br /&gt;
** See also the demo portal http://portal.textometrie.org/demo/?locale=en&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13854</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13854"/>
		<updated>2014-10-08T14:38:17Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Current version number and date of release */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where part-of-speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis tools through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence.&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library: http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid: http://www.textgrid.de/en&lt;br /&gt;
* NLTK - Brown Corpus (TEI XML Version): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Frantext (libre): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* Base de Français Médiéval (BFM): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon: http://www.bvh.univ-tours.fr/Epistemon&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet: http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org: http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* TXM (TXM own pivot format): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge : http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use 3 different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list : txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list : txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* TXM desktop: Current version is 0.7.6 released July 2014&lt;br /&gt;
* TXM portal: Current version is 0.6alpha released June 2014&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See the Roadmap section on the developer's wiki at http://sourceforge.net/apps/mediawiki/txm.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
TXM is free to download and use:&lt;br /&gt;
* desktop (Windows, Mac, Linux):&lt;br /&gt;
** First point your browser to http://sourceforge.net/projects/txm&lt;br /&gt;
** Then click on the green Download button to download the setup for your architecture.&lt;br /&gt;
* portal (J2EE):&lt;br /&gt;
** First choose the archive for your architecture at [https://sourceforge.net/projects/txm/files/software/TXM%20portal https://sourceforge.net/projects/txm/files/software/TXM portal]&lt;br /&gt;
** Then follow installation instructions at https://sourceforge.net/apps/mediawiki/txm/index.php?title=TXM_WEB:_Quick_Install&lt;br /&gt;
** See also the demo portal http://portal.textometrie.org/demo/?locale=en&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13853</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13853"/>
		<updated>2014-10-08T14:36:35Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where part-of-speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence.&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus Digital Library: http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid: http://www.textgrid.de/en&lt;br /&gt;
* NLTK - Brown Corpus (TEI XML Version): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Frantext (libre): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* Base de Français Médiéval (BFM): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon: http://www.bvh.univ-tours.fr/Epistemon&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet: http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org: http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* TXM (TXM own pivot format): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge : http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use 3 different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list : txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list : txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* TXM desktop: Current version is 0.7.5 released February 2014&lt;br /&gt;
* TXM portal: Current version is 0.6alpha released June 2014&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See the Roadmap section on the developer's wiki at http://sourceforge.net/apps/mediawiki/txm.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
TXM is free to download and use:&lt;br /&gt;
* desktop (Windows, Mac, Linux):&lt;br /&gt;
** First point your browser to http://sourceforge.net/projects/txm&lt;br /&gt;
** Then click on the green Download button to download the setup for your architecture.&lt;br /&gt;
* portal (J2EE):&lt;br /&gt;
** First choose the archive for your architecture at [https://sourceforge.net/projects/txm/files/software/TXM%20portal https://sourceforge.net/projects/txm/files/software/TXM portal]&lt;br /&gt;
** Then follow installation instructions at https://sourceforge.net/apps/mediawiki/txm/index.php?title=TXM_WEB:_Quick_Install&lt;br /&gt;
** See also the demo portal http://portal.textometrie.org/demo/?locale=en&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13406</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13406"/>
		<updated>2014-06-06T19:10:29Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Sample implementations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, pos...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where part-of-speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Large spectrum of input formats&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to various projects practice: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchro)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts in relation of translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and pos tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Export'''s any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in the '''Groovy'''/Java dynamic language)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' giving online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open Source under GPL V3 licence.&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding semantics '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* words identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus: http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid: http://www.textgrid.de/en&lt;br /&gt;
* NLTK - Brown Corpus (TEI XML Version): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Frantext (libre): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* Base de Français Médiéval (BFM): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon: http://www.bvh.univ-tours.fr/Epistemon&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet: http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org: http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* TXM (TXM own pivot format): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal  version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge : http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use 3 different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list : txm-open AT lists.sourceforge.net (very low activity for the moment)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* The mostly French-speaking mailing list : txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* TXM desktop: Current version is 0.7.5 released February 2014&lt;br /&gt;
* TXM portal: Current version is 0.6alpha released June 2014&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See the Roadmap section on the developer's wiki at http://sourceforge.net/apps/mediawiki/txm.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
TXM is free to download and use:&lt;br /&gt;
* desktop (Windows, Mac, Linux):&lt;br /&gt;
** First point your browser to http://sourceforge.net/projects/txm&lt;br /&gt;
** Then click on the green Download button to download the setup for your architecture.&lt;br /&gt;
* portal (J2EE):&lt;br /&gt;
** First choose the archive for your architecture at [https://sourceforge.net/projects/txm/files/software/TXM%20portal https://sourceforge.net/projects/txm/files/software/TXM portal]&lt;br /&gt;
** Then follow installation instructions at https://sourceforge.net/apps/mediawiki/txm/index.php?title=TXM_WEB:_Quick_Install&lt;br /&gt;
** See also the demo portal http://portal.textometrie.org/demo/?locale=en&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en:&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (p. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13405</id>
		<title>TXM</title>
		<link rel="alternate" type="text/html" href="https://wiki.tei-c.org/index.php?title=TXM&amp;diff=13405"/>
		<updated>2014-06-06T19:09:41Z</updated>

		<summary type="html">&lt;p&gt;Sheiden: /* Support for TEI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Tools]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Administrative tools]]&lt;br /&gt;
[[Category:Development tools]]&lt;br /&gt;
[[Category:Conversion and preprocessing tools]]&lt;br /&gt;
[[Category:Publishing and delivery tools]]&lt;br /&gt;
[[Category:Querying tools]]&lt;br /&gt;
[[Category:Analysis tools]]&lt;br /&gt;
[[Category:All-in-one Tools]]&lt;br /&gt;
[[Category:Interfaces]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Discovering]]&lt;br /&gt;
[[Category:Comparing]]&lt;br /&gt;
[[Category:Sampling]]&lt;br /&gt;
[[Category:Illustrating]]&lt;br /&gt;
[[Category:Representing]]&lt;br /&gt;
&lt;br /&gt;
== Synopsis ==&lt;br /&gt;
TXM is a free, open-source, Unicode, XML &amp;amp; TEI compatible text/corpus analysis environment and graphical client based on CQP and R. It is available for Microsoft Windows, Linux and Mac OS X, and as a J2EE web portal.&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
* Provides qualitative analysis tools:&lt;br /&gt;
** KWIC '''concordances''' of word patterns based on the efficient [http://cwb.sourceforge.net CQP] full-text search engine and its powerful CQL query language&lt;br /&gt;
** word pattern '''frequency lists''' based on any word property (graphical form or type, lemma, POS...)&lt;br /&gt;
** word pattern '''progression graphics'''&lt;br /&gt;
** Examples of word patterns, expressed in the CQL query language which is based on word &amp;amp; structural level properties:&lt;br /&gt;
*** &amp;quot;aiming&amp;quot; to simply search for the word 'aiming'&lt;br /&gt;
*** &amp;quot;.*ing&amp;quot; to search for words ending in &amp;quot;ing&amp;quot; (including mainly verb forms)&lt;br /&gt;
*** [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for verb forms ending in &amp;quot;ing&amp;quot; (where Part of Speech annotation is present)&lt;br /&gt;
*** [lemma=&amp;quot;group&amp;quot;] []{0,3} [pos=&amp;quot;VERB&amp;quot; &amp;amp; word=&amp;quot;.*ing&amp;quot;] to search for the collocation &amp;lt;group lemma&amp;gt; followed by a &amp;lt;verb with progressive aspect&amp;gt; with at most 3 words in between&lt;br /&gt;
** rich HTML-based text edition navigation with links from all other tools&lt;br /&gt;
* Provides quantitative analysis tools, based on [http://www.r-project.org R packages]:&lt;br /&gt;
** '''factorial correspondence analysis'''&lt;br /&gt;
** '''cluster analysis'''&lt;br /&gt;
** '''specific''' word patterns analysis&lt;br /&gt;
** '''collocations''' analysis&lt;br /&gt;
* Helps to build various corpus configurations: '''sub-corpora''' or '''partitions''' (for contrastive analysis between text structures or word selections)&lt;br /&gt;
* Supports a wide range of input formats:&lt;br /&gt;
** several text formats (from raw to rich):&lt;br /&gt;
*** '''Unicode TXT'''&lt;br /&gt;
*** '''ODT'''&lt;br /&gt;
*** '''XML'''&lt;br /&gt;
*** '''XML/w''' (where some or all word limits and properties can be pre-encoded)&lt;br /&gt;
*** XML-'''TEI P4''' (according to Perseus project practice)&lt;br /&gt;
*** XML-'''TEI P5''' (according to the practice of various projects: BFM, BVH, NLTK, etc.)&lt;br /&gt;
** speech transcription: XML-'''TRS''' (from Transcriber software, with time synchronization)&lt;br /&gt;
** aligned corpora: XML-'''TMX''' (with texts related by translation or versioning)&lt;br /&gt;
** news portal articles: XML-'''PPS''' (Factiva), Europresse&lt;br /&gt;
** etc.&lt;br /&gt;
* Applies various NLP tools on the fly on texts before analysis (e.g. '''TreeTagger''' for lemmatization and POS tagging)&lt;br /&gt;
* Indexes words and their properties as well as hierarchical structure of texts&lt;br /&gt;
* Indexes external or internal text metadata or speaker metadata&lt;br /&gt;
* '''Exports''' any result in CSV, XML or SVG format&lt;br /&gt;
* Provides scripting facilities for automating repetitive or lengthy tasks or for extending the platform (in '''Groovy''', a dynamic language for the Java platform)&lt;br /&gt;
* Includes a complete '''text editor''' to edit data sources, results and scripts&lt;br /&gt;
* Runs as a desktop application for '''Windows''', '''Mac OS X''' or '''Linux'''&lt;br /&gt;
* Also runs as a '''web portal''' providing online access to corpora and analysis through any web browser (including account and access control management)&lt;br /&gt;
* '''Open source''' licence: based on the best open source components for text analysis: CQP, R and Java &amp;amp; XSLT libraries&lt;br /&gt;
* Modular architecture (Eclipse RCP OSGi and J2EE conformant): one toolbox connecting all core components to build the applications&lt;br /&gt;
* Efficient Eclipse or Netbeans powered development framework&lt;br /&gt;
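&lt;br /&gt;
The CQL word patterns listed above can be illustrated outside TXM with a minimal Python sketch. It is not TXM or CQP code: the toy token list and the helper names token_matches, find_words and find_collocations are hypothetical, and the sketch only assumes that each property value must match its regular expression in full.&lt;br /&gt;
&lt;br /&gt;
```python
import re

# Hypothetical toy corpus: each token carries the word, pos and lemma properties.
TOKENS = [
    {"word": "The", "pos": "DET", "lemma": "the"},
    {"word": "groups", "pos": "NOUN", "lemma": "group"},
    {"word": "were", "pos": "VERB", "lemma": "be"},
    {"word": "quietly", "pos": "ADV", "lemma": "quietly"},
    {"word": "aiming", "pos": "VERB", "lemma": "aim"},
    {"word": "at", "pos": "ADP", "lemma": "at"},
    {"word": "nothing", "pos": "NOUN", "lemma": "nothing"},
]


def token_matches(token, constraints):
    """Return True if every listed property fully matches its regular expression."""
    return all(re.fullmatch(rx, token[prop]) for prop, rx in constraints.items())


def find_words(tokens, constraints):
    """Emulate a one-token pattern, e.g. pos VERB combined with word .*ing."""
    return [t["word"] for t in tokens if token_matches(t, constraints)]


def find_collocations(tokens, first, second, max_gap=3):
    """Emulate 'first []{0,max_gap} second' and return (start, end) index pairs."""
    hits = []
    for i, tok in enumerate(tokens):
        if not token_matches(tok, first):
            continue
        # The second token may follow after 0 to max_gap intervening tokens.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if token_matches(tokens[j], second):
                hits.append((i, j))
    return hits
```
&lt;br /&gt;
Under these assumptions, find_collocations(TOKENS, {&amp;quot;lemma&amp;quot;: &amp;quot;group&amp;quot;}, {&amp;quot;pos&amp;quot;: &amp;quot;VERB&amp;quot;, &amp;quot;word&amp;quot;: &amp;quot;.*ing&amp;quot;}) pairs &amp;quot;groups&amp;quot; with &amp;quot;aiming&amp;quot;, which are separated by two intervening words.&lt;br /&gt;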
&lt;br /&gt;
== User comments ==&lt;br /&gt;
'''Please sign all comments.'''&lt;br /&gt;
&lt;br /&gt;
== System requirements ==&lt;br /&gt;
The desktop version runs on:&lt;br /&gt;
* Windows - 32bit or 64bit (tested on XP, Vista, 7 and 8)&lt;br /&gt;
* Mac OS X (tested on 10.5, 10.6, 10.7, 10.8 and 10.9)&lt;br /&gt;
* Linux - 32bit or 64bit (tested on Ubuntu and Debian)&lt;br /&gt;
&lt;br /&gt;
The portal server runs on any J2EE capable platform (tested in Tomcat and Glassfish).&lt;br /&gt;
&lt;br /&gt;
== Source code and licensing ==&lt;br /&gt;
Open source under the GPL v3 licence.&lt;br /&gt;
&lt;br /&gt;
== Support for TEI ==&lt;br /&gt;
Supports TEI and TEI Lite &amp;quot;out of the box&amp;quot; '''at XML level semantics''': words will be tokenized inside any #PCDATA and all the XML structure will be imported directly as textual structure.&lt;br /&gt;
&lt;br /&gt;
Supports various flavours of TEI P5 encoding '''at TEI level semantics''':&lt;br /&gt;
* words and their properties: &amp;lt;nowiki&amp;gt;#PCDATA, &amp;lt;w&amp;gt;, &amp;lt;num&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* editorial markup: &amp;lt;nowiki&amp;gt;&amp;lt;sic&amp;gt;, &amp;lt;corr&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* texts and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;TEI&amp;gt;, &amp;lt;text&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* intermediate text structures and their properties: &amp;lt;nowiki&amp;gt;&amp;lt;div&amp;gt;, &amp;lt;p&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;pb/&amp;gt;, &amp;lt;p&amp;gt;, &amp;lt;lb/&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* what should not be indexed but considered for edition rendering: &amp;lt;nowiki&amp;gt;&amp;lt;teiHeader&amp;gt;, &amp;lt;note&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* alignment between texts: &amp;lt;nowiki&amp;gt;&amp;lt;teiCorpus&amp;gt;, &amp;lt;linkGrp&amp;gt;, &amp;lt;link&amp;gt;...&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* word identifier policy: &amp;lt;nowiki&amp;gt;@xml:id&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
* language declaration policy: &amp;lt;nowiki&amp;gt;@xml:lang&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
See the &amp;quot;BFM encoding manual&amp;quot; for an example of TEI encoding practice interpreted by TXM, in French, http://bfm.ens-lyon.fr/article.php3?id_article=158.&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;TEI P5 BFM&amp;quot; TXM import module consists of Groovy and XSL scripts: they can be adapted directly by the user to any specific TEI encoding usage.&lt;br /&gt;
&lt;br /&gt;
TXM Import Modules also provide various import parameters to tune each import process to specific data sources.&lt;br /&gt;
&lt;br /&gt;
TEI sources from the following projects are currently imported into TXM at TEI level semantics:&lt;br /&gt;
* Perseus: http://www.perseus.tufts.edu/hopper&lt;br /&gt;
* TextGrid: http://www.textgrid.de/en&lt;br /&gt;
* NLTK - Brown Corpus (TEI XML Version): http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml&lt;br /&gt;
* Frantext (libre): http://www.cnrtl.fr/corpus/frantext&lt;br /&gt;
* Base de Français Médiéval (BFM): http://bfm.ens-lyon.fr&lt;br /&gt;
* BVH Epistemon: http://www.bvh.univ-tours.fr/Epistemon&lt;br /&gt;
* Bouvard&amp;amp;Pécuchet: http://dossiers-flaubert.ish-lyon.cnrs.fr&lt;br /&gt;
* Presses Universitaires de Caen (PUC), MRSH de Caen - Revues.org: http://www.unicaen.fr/recherche/mrsh/document_numerique/outils ([http://discours.revues.org?lang=en DISCOURS scientific journal])&lt;br /&gt;
* TXM (TXM's own pivot format): https://sourceforge.net/apps/mediawiki/txm/index.php?title=XML-TXM&lt;br /&gt;
&lt;br /&gt;
TEI sources are preprocessed by several XSL stylesheets, which can be found in the TXM source code.&lt;br /&gt;
Some of those stylesheets are available in the online TXM XSL stylesheets library:&lt;br /&gt;
http://sourceforge.net/projects/txm/files/library/xsl&lt;br /&gt;
&lt;br /&gt;
== Language(s) ==&lt;br /&gt;
&lt;br /&gt;
=== User Interface Language(s) ===&lt;br /&gt;
The user interface is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
** Russian (RU)&lt;br /&gt;
* portal version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
&lt;br /&gt;
=== Documentation Language(s) ===&lt;br /&gt;
The documentation is currently available in:&lt;br /&gt;
* desktop version:&lt;br /&gt;
** English (EN)&lt;br /&gt;
** French (FR)&lt;br /&gt;
* portal version:&lt;br /&gt;
** French (FR) (tutorial - alpha state)&lt;br /&gt;
&lt;br /&gt;
=== Text/Corpus Language(s) ===&lt;br /&gt;
TXM works natively with any Unicode-conformant corpus.&amp;lt;br/&amp;gt;&lt;br /&gt;
Language support is specific to each NLP tool used (for example, TreeTagger can tag the following languages: BG, DE, EN, ES, ET, FR, FRO, GL, IT, LA, PT, RU, SW, ZH).&lt;br /&gt;
&lt;br /&gt;
=== Programming Language(s) ===&lt;br /&gt;
TXM is written in the following programming languages:&lt;br /&gt;
* '''C''' for the CQP search engine (independent open source project http://cwb.sourceforge.net)&lt;br /&gt;
* '''C''' and '''R''' for the statistical packages (independent open source project http://www.r-project.org)&lt;br /&gt;
* '''Java''' for the Toolbox and the Applications (driven by an independent open consortium http://jcp.org/en/home/index)&lt;br /&gt;
** Eclipse RCP framework used for the desktop version (independent open source project http://wiki.eclipse.org/index.php/Rich_Client_Platform)&lt;br /&gt;
** GWT framework used for the web portal version (independent open source project http://code.google.com/intl/fr/webtoolkit)&lt;br /&gt;
* '''Groovy''' for the import modules and command scripts (independent open source project http://groovy.codehaus.org)&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
* Main entry point for documentation on TXM at the Textométrie project web site: http://textometrie.ens-lyon.fr/spip.php?article98&amp;amp;lang=en&lt;br /&gt;
** See for example the TXM manual (in French) at http://txm.svn.sourceforge.net/viewvc/txm/trunk/doc/Manuel%20de%20TXM%200.7%20FR.pdf?revision=2332&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users (includes a FAQ)&lt;br /&gt;
* TXM developers wiki (in English) on Sourceforge: http://sourceforge.net/apps/mediawiki/txm&lt;br /&gt;
* All available documentation (for users and for developers) published on Sourceforge: http://sourceforge.net/projects/txm/files/documentation&lt;br /&gt;
&lt;br /&gt;
== Tech support ==&lt;br /&gt;
Tech support is mainly provided through two mailing lists (see below).&lt;br /&gt;
&lt;br /&gt;
Users can also use three different trackers:&lt;br /&gt;
* Bug Reports - to describe bugs encountered in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190738&lt;br /&gt;
* Feature requests - to describe the features, changes in interface or any other improvements required in the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190851&lt;br /&gt;
* Request for help - to describe a very difficult technical problem encountered in using the software: https://sourceforge.net/tracker/?group_id=247041&amp;amp;atid=1190852&lt;br /&gt;
&lt;br /&gt;
== User community ==&lt;br /&gt;
Currently, the TXM user community communicates using two mailing lists and a wiki:&lt;br /&gt;
* International mailing list: txm-open AT lists.sourceforge.net (currently very low activity)&lt;br /&gt;
** See archives at http://sourceforge.net/mailarchive/forum.php?forum_name=txm-open&lt;br /&gt;
* Mostly French-speaking mailing list: txm-users AT cru.fr (the most active)&lt;br /&gt;
** See archives at https://listes.cru.fr/sympa/arc/txm-users&lt;br /&gt;
* TXM user community wiki (in French) at https://listes.cru.fr/wiki/txm-users&lt;br /&gt;
&lt;br /&gt;
Training in the use of TXM is available every year at the CNRS summer school « Computing and Statistical Methods in Text Analysis » (MISAT), see http://laseldi.univ-fcomte.fr/ecole.&lt;br /&gt;
&lt;br /&gt;
The JADT conference (http://jadt.org) is the main meeting place for the TXM user community.&lt;br /&gt;
&lt;br /&gt;
== Sample implementations ==&lt;br /&gt;
The desktop version of TXM is delivered with several sample corpora included, which can be directly analyzed from within TXM after installation.&lt;br /&gt;
&lt;br /&gt;
The portal version of TXM has a demo running online at http://portal.textometrie.org/demo/?locale=en (work in progress).&lt;br /&gt;
&lt;br /&gt;
An earlier experimental web application based on TXM, applied to a single TEI-encoded text, can be found at http://txm.ish-lyon.cnrs.fr/txm.&lt;br /&gt;
&lt;br /&gt;
== Current version number and date of release ==&lt;br /&gt;
* TXM desktop: Current version is 0.7.5 released February 2014&lt;br /&gt;
* TXM portal: Current version is 0.6alpha released June 2014&lt;br /&gt;
&lt;br /&gt;
== History of versions ==&lt;br /&gt;
See the Roadmap section on the developer's wiki at http://sourceforge.net/apps/mediawiki/txm.&lt;br /&gt;
&lt;br /&gt;
== How to download or buy ==&lt;br /&gt;
TXM is free to download and use:&lt;br /&gt;
* desktop (Windows, Mac, Linux):&lt;br /&gt;
** First point your browser to http://sourceforge.net/projects/txm&lt;br /&gt;
** Then click on the green Download button to download the setup for your architecture.&lt;br /&gt;
* portal (J2EE):&lt;br /&gt;
** First choose the archive for your architecture at [https://sourceforge.net/projects/txm/files/software/TXM%20portal https://sourceforge.net/projects/txm/files/software/TXM portal]&lt;br /&gt;
** Then follow installation instructions at https://sourceforge.net/apps/mediawiki/txm/index.php?title=TXM_WEB:_Quick_Install&lt;br /&gt;
** See also the demo portal http://portal.textometrie.org/demo/?locale=en&lt;br /&gt;
&lt;br /&gt;
== Additional notes ==&lt;br /&gt;
For publications related to TXM, please visit the Textométrie project web site at http://textometrie.ens-lyon.fr/spip.php?article82&amp;amp;lang=en&lt;br /&gt;
* See for example:&amp;lt;br/&amp;gt;Heiden, S. (2010b). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In K. I. Ryo Otoguro (Ed.), 24th Pacific Asia Conference on Language, Information and Computation - [http://www.compling.jp/paclic24 PACLIC24] (pp. 389-398). Institute for Digital Enhancement of Cognitive Development, Waseda University, Sendai, Japan. [http://halshs.archives-ouvertes.fr/halshs-00549764/en Online].&lt;br /&gt;
&lt;br /&gt;
Sponsors &amp;amp; Contributors:&lt;br /&gt;
* Initial design and development of TXM (jan 2007- dec 2011) supported by French ANR grant #ANR-06-CORP-029&lt;br /&gt;
* Currently the platform continues its development through various contracts:&lt;br /&gt;
** ENS-LYON contract jun-aug 2009 (Rhône-Alpes region Cluster 13 grant): Queste del saint Graal web prototype&lt;br /&gt;
** ENS-LYON contract sept 2009 - jul 2010 (ANR CORPTEF Research Project funding): portal development&lt;br /&gt;
** Lyon 3 University contract jan-mar 2011: XML-Transcriber import, R GUI&lt;br /&gt;
** CNRS contract 2011 (DGLFLF grant): GGHF corpus processing&lt;br /&gt;
** Paris 1 University contract jan 2012 - dec 2014 (Matrice Equipex): TXM development and infrastructure for historians&lt;br /&gt;
* Other independent projects also improve TXM (community of developers):&lt;br /&gt;
** LASLA project 2011: import of ancient Latin and Greek corpora&lt;br /&gt;
** GREYC-PUC project may-jul 2011: PUC corpora import, improvement of portal, test on Glassfish&lt;br /&gt;
** PhD thesis on micro-finance 2011-: Factiva and Calibre import&lt;br /&gt;
** ANR-DFG SRCMF contract jun-jul 2012 : Tiger Search module, import &amp;amp; syntactic concordances&lt;/div&gt;</summary>
		<author><name>Sheiden</name></author>
		
	</entry>
</feed>