GROBID
Revision as of 01:43, 9 November 2015
Contents
- 1 Synopsis
- 2 Features
- 3 User commentary
- 4 System requirements
- 5 Source code and licensing
- 6 Support for TEI
- 7 Language(s)
- 8 Documentation
- 9 Tech support
- 10 User community
- 11 Sample implementations
- 12 Current version number and date of release
- 13 History of versions
- 14 How to download or buy
- 15 Additional notes
- 16 References
Synopsis
"Grobid is a machine learning library for extracting, parsing and TEI-encoding of bibliographical information at large, with a particular focus on technical and scientific articles."<ref>https://github.com/kermitt2/grobid</ref> See a brief overview.
Features
According to the Grobid website:<ref>https://github.com/kermitt2/grobid</ref>
- Written in Java (with JNI call).
- High performance: on a modern but low-profile MacBook Pro, header extraction from 4000 PDFs in 10 minutes and parsing of 3000 references in 18 seconds.
- Modular and reusable machine learning models. The extractions are based on Linear Chain Conditional Random Fields, which are currently the state of the art in bibliographical information extraction and labeling.
- Full encoding in TEI, both for the training corpus and the parsed results.
- Reinforcement of extracted bibliographical data via an optional online call to Crossref, export in OpenURL, etc., for easier integration into Digital Library environments.
- Rich bibliographical processing: fine-grained parsing of author names, dates, affiliations, addresses, etc., as well as fairly reliable automatic attachment of affiliations to the corresponding authors.
- "Automatic Generation" of pre-formatted training data based on new PDF documents, to support semi-automatic training data generation.
User commentary
Please sign all comments.
System requirements
"Grobid should run properly on MacOS X, Linux (32 & 64) and Windows (32) environments 'out of the box'."<ref>https://github.com/kermitt2/grobid</ref>
Source code and licensing
"Grobid is distributed under Apache 2.0 license."<ref>https://github.com/kermitt2/grobid</ref>
Support for TEI
Output created in TEI P5 XML.
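Because the output is TEI P5 XML, it can be read with any namespace-aware XML parser. The following is a minimal sketch in Python, not Grobid's own code: the sample document is a hand-written, simplified fragment (it is not schema-valid TEI and is not actual Grobid output), and the helper name read_header is hypothetical. The only fixed detail is the TEI namespace URI, http://www.tei-c.org/ns/1.0.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; every TEI element lives in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written, simplified fragment for illustration only.
# It is NOT real Grobid output and is not schema-valid TEI.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Example Article</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename>Ada</forename><surname>Lovelace</surname></persName>
            </author>
            <author>
              <persName><forename>Alan</forename><surname>Turing</surname></persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def read_header(tei_xml):
    """Return the title and author names found in a TEI header (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )
    authors = []
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        authors.append(f"{forename} {surname}".strip())
    return {"title": title.strip(), "authors": authors}


if __name__ == "__main__":
    print(read_header(SAMPLE_TEI))
```

The element paths used here (titleStmt, analytic, persName) are common TEI header elements; the exact layout of a given Grobid result may differ and should be checked against real output.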
Language(s)
"Written in Java (with JNI call)."<ref>https://github.com/kermitt2/grobid</ref>
Documentation
http://grobid.readthedocs.org/
Tech support
User community
Sample implementations
Current version number and date of release
History of versions
How to download or buy
https://github.com/kermitt2/grobid
Additional notes
References
<references/>