The original motivation for creating NIF was quite simple. To integrate NLP tools into the LOD2 stack, the tools were required to produce RDF. Instead of writing an individual RDF wrapper for each tool, it made perfect sense to create a common format that is expressive enough to potentially cover all NLP tools. Furthermore, instead of creating a new conceptual mapping between the outputs of the tools, several existing linguistic ontologies could be reused to unify tag sets and other NLP dialects. Although NIF is being generalized to provide additional benefits, the main rationale remains the integration of NLP tools into the LOD2 stack.
What does NIF do?
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. NIF consists of specifications, ontologies and software (overview), which are combined under the version identifier “NIF 2.0” but are versioned individually.
This document contains pointers to all the important resources relevant for the NLP Interchange Format (NIF), Version 2.0. Although the road to complete interoperability is still long, NIF is already successful in providing best practices and a solid foundation for the most frequent use cases. This foundation is created by:
- Reusing existing standards such as RDF, OWL 2, the PROV Ontology, LAF (ISO 24612) and RFC 5147
- Furthermore, NIF identifiers are used in the Internationalization Tag Set (ITS) Version 2.0
- All parts of NIF are royalty-free and are published under an open license.
- NIF comprises a set of RDF vocabularies and ontologies, which have stable identifiers, persistent hosting, an open license and a community-approved meaning.
- NIF publishes and maintains a set of specifications (NIF 2.0 Core Spec, Public API Spec, Version Information) with best practices, complementary implementations and examples of how to use the ontologies.
- NIF is driven by its open and welcome-to-join community project NLP2RDF, consisting of a mailing list, a GitHub project and a blog website.
- NIF has received good uptake by industry, open-source projects and developers. We would like to thank all contributors in the attribution section.
NIF Use Cases
NIF aims to improve upon certain disadvantages commonly found in NLP frameworks, processes and tools. Its basic advantages lie in how annotations are represented and in what kind of annotations are used. Combined, these two aspects provide structural interoperability as well as conceptual interoperability.
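To make the structural side more concrete, here is a minimal sketch of how a substring of a document can be given its own identifier and described by a few statements. It assumes the RFC 5147 character-range fragment syntax (`#char=begin,end`) mentioned in the list of reused standards. The property names (`referenceContext`, `beginIndex`, `endIndex`, `anchorOf`) and the namespace reflect our reading of the NIF Core ontology and should be checked against the NIF 2.0 Core Spec. The function names and the `example.org` document URI are hypothetical, and plain tuples stand in for real RDF triples.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumed namespace of the NIF Core ontology; verify against the NIF 2.0 Core Spec.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def string_uri(doc_uri: str, begin: int, end: int) -> str:
    """Build an RFC 5147-style character-range fragment identifier."""
    return f"{doc_uri}#char={begin},{end}"


def describe_substring(text: str, doc_uri: str, begin: int, end: int) -> List[Triple]:
    """Return triples that anchor text[begin:end] in its context document."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError("offsets must lie within the text")
    context = string_uri(doc_uri, 0, len(text))  # identifier of the whole text
    subject = string_uri(doc_uri, begin, end)    # identifier of the substring
    return [
        (subject, NIF + "referenceContext", context),
        (subject, NIF + "beginIndex", str(begin)),
        (subject, NIF + "endIndex", str(end)),
        (subject, NIF + "anchorOf", text[begin:end]),
    ]


text = "My favourite actress is Natalie Portman."
triples = describe_substring(text, "http://example.org/doc1", 24, 39)
for s, p, o in triples:
    print(s, p, o)
```

Because the identifier is derived only from the document URI and the character offsets, any two tools that agree on the text will produce the same identifier for the same substring without coordinating with each other.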
Here are some claims that we believe can be made about NIF (i.e. we have not yet found evidence against them or against their technical feasibility):
- NIF provides global interoperability. If an NLP tool incorporates a NIF parser and a NIF serializer, it is compatible with all other tools that implement NIF.
- NIF achieves this interoperability by using and defining a common denominator for annotations. This means that some standard annotations are required. On the other hand, NIF is flexible and allows NLP tools to add any extra annotations at will.
- NIF allows tool chains to be created without a large amount of up-front development work. As the output of each tool is compatible, you can quickly test whether the tools you selected actually produce what you need to solve a certain task.
- As NIF is based on RDF/OWL, you can choose from a broad range of tools and technologies to work with it:
  - RDF makes data integration easy: URIs, Linked Data
  - OWL is based on Description Logics (types, type inheritance)
  - Availability of open data sets (access and license)
  - Reusability of vocabularies and ontologies
  - Diverse serializations for annotations: XML, Turtle
  - Scalable tool support (databases, reasoning)
  - Data is flexible and can be queried / transformed in many ways
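The tool-chain claim above can be illustrated with a small sketch. It assumes that every tool returns statements about substrings identified by the same offset-based URIs, so that combining tools reduces to a set union. Both "tools" here are hypothetical toys, the predicate URIs under `example.org` are placeholders rather than NIF vocabulary, and plain tuples stand in for RDF triples. A real chain would exchange serialized RDF and reuse existing linguistic ontologies for the tag values.

```python
from typing import Callable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Tool = Callable[[str, str], Set[Triple]]

# Hypothetical predicate names, for illustration only.
POS_TAG = "http://example.org/vocab#posTag"
ENTITY_TYPE = "http://example.org/vocab#entityType"


def token_spans(text: str) -> Iterator[Tuple[int, int]]:
    """Yield (begin, end) offsets of whitespace-separated tokens."""
    pos = 0
    for token in text.split():
        begin = text.index(token, pos)
        end = begin + len(token)
        yield begin, end
        pos = end


def toy_tagger(text: str, doc_uri: str) -> Set[Triple]:
    """Hypothetical tool 1: marks capitalised tokens as proper nouns."""
    return {
        (f"{doc_uri}#char={b},{e}", POS_TAG, "NNP" if text[b].isupper() else "X")
        for b, e in token_spans(text)
    }


def toy_entity_spotter(text: str, doc_uri: str) -> Set[Triple]:
    """Hypothetical tool 2: flags tokens found in a tiny gazetteer."""
    gazetteer = {"Sydney": "Place"}
    return {
        (f"{doc_uri}#char={b},{e}", ENTITY_TYPE, gazetteer[text[b:e]])
        for b, e in token_spans(text)
        if text[b:e] in gazetteer
    }


def run_chain(text: str, doc_uri: str, tools: List[Tool]) -> Set[Triple]:
    """Merge the output of all tools; shared identifiers make this a set union."""
    merged: Set[Triple] = set()
    for tool in tools:
        merged |= tool(text, doc_uri)
    return merged


text = "ISWC met in Sydney"
graph = run_chain(text, "http://example.org/doc2", [toy_tagger, toy_entity_spotter])
sydney = "http://example.org/doc2#char=12,18"
about_sydney = sorted((p, o) for s, p, o in graph if s == sydney)
print(about_sydney)
```

Neither toy tool knows about the other, yet the merged result can be queried for everything said about one substring. Swapping a tool in or out only changes the list passed to `run_chain`.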
How to attribute NIF?
We try to maintain an up-to-date page with acknowledgements to the large community of contributors. If you refer to NIF in an academic context, please cite the recent paper published in the In-Use track of ISWC 2013:
- Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 12th International Semantic Web Conference, 21-25 October 2013, Sydney, Australia.
There is also a list of further NIF-related scientific publications.