Difference between revisions of "GROBID"

Latest revision as of 17:47, 16 February 2017

1 Synopsis
2 Features
3 User commentary
4 System requirements
5 Source code and licensing
6 Support for TEI
7 Language(s)
8 Documentation
9 Tech support
10 User community
11 Sample implementations
12 Current version number and date of release
13 History of versions
14 How to download or buy
15 Additional notes
16 References

Synopsis

"GROBID is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured TEI-encoded documents with a particular focus on technical and scientific publications."<ref>https://github.com/kermitt2/grobid</ref> See a brief overview at http://ercim-news.ercim.eu/en100/r-i/grobid-information-extraction-from-scientific-publications

Features

According to the Grobid website:<ref>https://github.com/kermitt2/grobid</ref>

Written in Java (with JNI call).
High performance: on a modern but low-profile MacBook Pro, header extraction from 4000 PDFs takes 10 minutes and parsing of 3000 references takes 18 seconds.
Modular and reusable machine learning models. The extractions are based on Linear Chain Conditional Random Fields, which the site describes as the current state of the art in bibliographical information extraction and labeling.
Full encoding in TEI, both for the training corpus and the parsed results.
Optional reinforcement of extracted bibliographical data via an online call to Crossref, export in OpenURL, etc., for easier integration into digital library environments.
Rich bibliographical processing: fine-grained parsing of author names, dates, affiliations, addresses, etc., as well as quite reliable automatic attachment of affiliations to their corresponding authors.
"Automatic Generation" of pre-formatted training data based on new PDF documents, to support semi-automatic training data generation.
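
To give a rough idea of what labeling with a linear-chain model involves, here is a minimal Python sketch of Viterbi decoding, the standard way to find the best label sequence in such a model. It is not GROBID's code: the labels, tokens, and the hand-written `emission` and `transition` scores are hypothetical stand-ins for what a trained Conditional Random Field would learn from its training corpus.

```python
def viterbi(tokens, labels, emission, transition):
    """Return the highest-scoring label sequence for a linear-chain model.

    emission(token, label) and transition(prev_label, label) return additive
    scores. Both are hypothetical stand-ins for the feature-weighted scores
    that a trained CRF would supply.
    """
    if not tokens:
        return []
    # best[i][y]: best score of any labeling of tokens[:i + 1] that ends in y
    best = [{y: emission(tokens[0], y) for y in labels}]
    back = []  # back[i][y]: label chosen at position i when position i + 1 is y
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transition(p, y))
            scores[y] = best[-1][prev] + transition(prev, y) + emission(tok, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to the start.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


LABELS = ["author", "title", "date"]


def emission(token, label):
    # Toy scores: four-digit numbers look like dates, initials like authors.
    if token.isdigit() and len(token) == 4:
        return 2.0 if label == "date" else -2.0
    if token.endswith("."):
        return {"author": 1.5, "title": 0.0, "date": -1.0}[label]
    return {"author": 0.0, "title": 0.2, "date": -1.0}[label]


def transition(prev_label, label):
    # Toy scores: favor staying in a field or moving author -> title -> date.
    if prev_label == label:
        return 0.5
    if (prev_label, label) in {("author", "title"), ("title", "date")}:
        return 0.0
    return -1.0


if __name__ == "__main__":
    tokens = ["Smith", "J.", "Deep", "parsing", "2016"]
    for token, label in zip(tokens, viterbi(tokens, LABELS, emission, transition)):
        print(token, label)
```

The transition scores are what make the model a chain: each token's label depends on its neighbor's label, so an ambiguous token such as "Smith" can still be labeled consistently with the tokens around it.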

User commentary

Please sign all comments.

System requirements

"Grobid should run properly on MacOS X, Linux (32 & 64) and Windows (32) environments 'out of the box'."<ref>https://github.com/kermitt2/grobid</ref>

Source code and licensing

"Grobid is distributed under Apache 2.0 license."<ref>https://github.com/kermitt2/grobid</ref>

Support for TEI

Output is created in TEI P5 XML.
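
Because the output is namespaced TEI XML, it can be read with any standard XML library. The Python sketch below pulls the title and author names out of a small TEI header. The sample document is hand-written for illustration and is an assumption, not actual GROBID output; real results may nest or name elements differently. `read_header` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; element lookups must be qualified with it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample for illustration only (not produced by GROBID).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Example Article</title>
      </titleStmt>
      <publicationStmt><p>Hypothetical sample.</p></publicationStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename>Ada</forename><surname>Example</surname></persName>
            </author>
            <author>
              <persName><forename>Bo</forename><surname>Sample</surname></persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def read_header(tei_xml):
    """Return (title, [author names]) from a TEI document given as a string."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    authors = []
    for pers in root.iterfind(".//tei:author/tei:persName", TEI_NS):
        forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        authors.append(f"{forename} {surname}".strip())
    return title, authors


if __name__ == "__main__":
    title, authors = read_header(SAMPLE_TEI)
    print(title)
    print(", ".join(authors))
```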

Language(s)

"Written in Java (with JNI call)."<ref>https://github.com/kermitt2/grobid</ref>

Documentation

http://grobid.readthedocs.org/

Tech support

User community

Sample implementations

demo site: http://scite-it.eu/ (not working as of 2017-01-05)
online demo: http://cloud.science-miner.com/grobid/

Current version number and date of release

History of versions

How to download or buy

https://github.com/kermitt2/grobid

Additional notes

References

@@ Line 4: / Line 4: @@
 == Synopsis ==
-"Grobid is a machine learning library for extracting, parsing and TEI-encoding of bibliographical information at large, with a particular focus on technical and scientific articles."<ref>https://github.com/kermitt2/grobid</ref>
+"GROBID is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured TEI-encoded documents with a particular focus on technical and scientific publications."<ref>https://github.com/kermitt2/grobid</ref> See a [http://ercim-news.ercim.eu/en100/r-i/grobid-information-extraction-from-scientific-publications brief overview].
 == Features ==
@@ Line 33: / Line 33: @@
 == Documentation ==
-See "Usage" section of https://github.com/kermitt2/grobid .
+http://grobid.readthedocs.org/
 == Tech support ==
@@ Line 42: / Line 42: @@
 == Sample implementations ==
+* [http://scite-it.eu/ demo site] (not working as of 2017-01-05)
+* [http://cloud.science-miner.com/grobid/ online demo]
 == Current version number and date of release ==
