The original motivation for creating NIF was quite simple. In order to integrate NLP tools into the LOD2 stack, they were required to produce RDF. Instead of writing an individual RDF wrapper for each tool, it made perfect sense to create a common format expressive enough to potentially cover all NLP tools. Furthermore, instead of creating a conceptual mapping between the outputs of the tools, several already existing linguistic ontologies could be reused to unify tag sets and other NLP dialects. Although NIF is being generalized to provide additional benefits, the main rationale remains the integration of NLP tools into the LOD2 stack.
NLP Interchange Format (NIF) 2.0 – Overview and Documentation
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. NIF consists of specifications, ontologies and software (see the overview below), which are combined under the version identifier “NIF 2.0” but are versioned individually.
This document contains pointers to all the important resources relevant for the NLP Interchange Format (NIF), Version 2.0. Although the road to complete interoperability is still long, NIF is already successful in providing best practices and a solid foundation for the most frequent use cases. This foundation is created by:
- Reusing existing standards such as RDF, OWL 2, the PROV Ontology, LAF (ISO 24612) and RFC 5147
- NIF identifiers are also used in the Internationalization Tag Set (ITS) Version 2.0
- All parts of NIF are royalty-free and published under an open license.
- NIF comprises a set of RDF vocabularies and ontologies, which have stable identifiers, persistent hosting, an open license and a community approved meaning.
- NIF publishes and maintains a set of specifications (NIF 2.0 Core Spec, Public API Spec, Version Information) with best practices, complementary implementations and examples on how to use the ontologies.
- NIF is driven by its open, welcome-to-join community project NLP2RDF, consisting of a mailing list, a GitHub project and a blog
- NIF has received good uptake by industry, open-source projects and developers; we would like to thank all contributors in the attribution section.
- September 6th, 2013: Added resource list below.
- July 7th, 2013: Initial draft of API parameters published.
- July 7th, 2013: Initial draft of Versioning and License specifications published.
- May 17, 2013: Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. Accepted at ISWC 2013
What is NIF 2.0?
NIF 2.0 is a set of resources which constitutes a major, non-backward-compatible improvement over the previous version, NIF 1.0. NIF 2.0 is very diverse: it consists of specifications, ontologies, implementations and corpora. NIF is maintained by the NLP2RDF community project. If you are interested in NLP2RDF, you can write emails to the nlp2rdf discussion list or sign up directly below:
How to attribute NIF?
We try to maintain an up-to-date page with acknowledgements to the large community of contributors. If you refer to NIF in an academic context, please cite the recent paper published in the ISWC In-Use track 2013:
- Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 12th International Semantic Web Conference, 21-25 October 2013, Sydney, Australia, (2013)
There is also a list of further NIF-related scientific publications.
List of Resources
- corpus: Wiki-Link Corpus in RDF
- demo: NIF Combinator Demo
- demo: NIF Conversion Demo Service (★ important)
- ontology: NIF 2.0 Core Ontology – Inference Model
- ontology: NIF Validator TestCase Suite
- ontology: RLOG – an RDF Logging Ontology
- ontology: STC – Simple Test Case Ontology
- ontology: NIF 2.0 Core Ontology – Validation Model
- ontology: NIF 2.0 Core Ontology – Terminological Model (★ important)
- poster: Promotional poster for Meta Forum (★ important)
- software: NIF 2.0 Validator Tool (★ important)
- software: NIF 2.0 Stanford Core Implementation (★ important)
- specification: NIF 2.0 Stanbol Profile Specification
- specification: License, Persistence, Versioning
- specification: Public API Specification (★ important)
- specification: W3C Conversion of NIF to ITS 2.0
- specification: NIF 2.0 Core Specification (★ important)
- specification: W3C Conversion of ITS 2.0 to NIF
- wikipage: How to publish a text corpus with NIF as Linked Data
NIF Use Cases
NIF aims to address certain disadvantages commonly found in NLP frameworks, processes and tools. Its basic advantages lie in the way annotations are represented and in which kinds of annotations are used. Combined, these two aspects provide structural as well as conceptual interoperability.
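As a minimal sketch of how such an annotation is represented: a NIF document treats the whole text as a nif:Context and anchors each annotated substring to it via RFC 5147-style character offsets in the URI. The document URI, text and DBpedia link below are invented for illustration; the class and property names follow the NIF 2.0 Core Ontology.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# The whole document is a nif:Context; its URI ends in RFC 5147-style offsets.
<http://example.org/doc1#char=0,39>
    a nif:Context ;
    nif:isString   "My favorite actress is Natalie Portman." ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "39"^^xsd:nonNegativeInteger .

# One annotated substring, anchored to its context by character offsets.
<http://example.org/doc1#char=23,38>
    a nif:String ;
    nif:referenceContext <http://example.org/doc1#char=0,39> ;
    nif:anchorOf   "Natalie Portman" ;
    nif:beginIndex "23"^^xsd:nonNegativeInteger ;
    nif:endIndex   "38"^^xsd:nonNegativeInteger ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Natalie_Portman> .
```

Because the offsets are part of the URI, any tool annotating the same string in the same document mints the same identifier, which is what makes the outputs of independent tools mergeable by plain RDF union.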
Here are some claims which we believe can be made about NIF (i.e. we have not yet found evidence indicating otherwise with respect to technical feasibility):
- NIF provides global interoperability: if an NLP tool incorporates a NIF parser and a NIF serializer, it is compatible with all other tools that implement NIF.
- NIF achieves this interoperability by defining a common denominator for annotations. This means that some standard annotations are required. At the same time, NIF is flexible and allows NLP tools to add any extra annotations at will.
- NIF allows tool chains to be created without a large amount of up-front development work. As the output of each tool is compatible, you can quickly test whether the tools you selected actually produce what you need to solve a certain task.
- As NIF is based on RDF/OWL, you can choose from a broad range of tools and technologies to work with it:
- RDF makes data integration easy: URIs, LinkedData
- OWL is based on Description Logics (Types, Type inheritance)
- Availability of open data sets (access and licence)
- Reusability of Vocabularies and Ontologies
- Diverse serializations for annotations: XML, Turtle, among others
- Scalable tool support (Databases, Reasoning)
- Data is flexible and can be queried / transformed in many ways
Comparison to UIMA and Gate
NIF is almost completely orthogonal to frameworks such as Gate and UIMA: by definition it is a format that represents NLP output, while Gate and UIMA are software frameworks for NLP. Here is a rough guideline for when to use NIF and when not:
Use UIMA and Gate, when:
- You need to annotate a very large amount of text on a daily basis.
- You already know which tools and annotations you need, and adapters and plugins for UIMA or Gate already exist.
- You want to solve a few specialized tasks, such as identifying keywords or finding certain facts. For this you are planning a single custom application and have no additional requirements for RDF or interoperability.
Use NIF, when:
- You are using the LOD2 Stack
- The rest of your data is already in RDF
- You want to query your text documents with SPARQL
- You are not sure which tools to use and want to try them out and test the results first.
- You have a fixed text collection (or a low daily throughput) and want to unlock the implicit meaning. The text can be processed once, saved as RDF and then transformed easily or queried in a triple store.
- You need annotations for several languages (multilingualism) in a uniform way
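To make the SPARQL use case concrete: once NIF output has been loaded into a triple store, a single query can retrieve all entity mentions across all processed documents. The property names below follow the NIF core ontology; using itsrdf:taIdentRef for entity links is an assumption of this sketch.

```sparql
PREFIX nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>

# All substrings that an NLP tool linked to an entity,
# together with the text of the document they occur in.
SELECT ?mention ?entity ?documentText
WHERE {
  ?annotation nif:anchorOf         ?mention ;
              itsrdf:taIdentRef    ?entity ;
              nif:referenceContext ?context .
  ?context    nif:isString         ?documentText .
}
```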
Definitely refrain from trying to build a scalable application that uses RDF/OWL as an internal data format. RDF and OWL are great for flexibility, reasoning and data integration, but not for performance. Consider instead using UIMA or Gate and then serialising the output as NIF.
Furthermore, NLP2RDF provides:
- reference implementations of NIF
- a collaboration platform
- tutorials and example source code
- a mailing list for questions and support
You can join at http://nlp2rdf.org.