NIF Roadmap 2012 and pointers

Just a repost of an email I wrote to the Stanbol Dev mailing list. See here for the discussion.

Below is a copy of the email:
Over the past year, we have been working on the NLP Interchange Format (NIF).
NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

What NIF currently is:
1. In Sept. 2011, we published the specification 1.0: https://nlp2rdf.org/nif-1-0 . We know of about 8-12 implementations out there (see the demo at 5.).
2. One of the latest draft papers about it can be found here: http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
3. The basic idea is to use # fragments to give URIs to strings, e.g.:
http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 represents the first occurrence of “Semantic Web” in http://www.w3.org/DesignIssues/LinkedData.html
Of course, you can then use this URI as the subject and add any annotation you want, e.g.:
<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729> its:mentions dbpedia:Semantic_Web .
4. There is a Web annotator making use of the Hash URI scheme or NIF:

5. There is a demonstrator (will be much nicer in a couple of days):
with eye candy, but minor bug:
6. Apart from that, NIF also tries to find best practices for annotation, e.g. OLiA identifiers for part-of-speech tags, NERD, or the lemon model.
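
To make the offset-based URIs from point 3 more concrete, here is a minimal Python sketch. It assumes the two numbers in the fragment are the begin index and the exclusive end index of the string, which matches the example above (729 - 717 = 12, the length of “Semantic Web”). The function names offset_uri and triple, the sample text and the example.org document URI are made up for illustration and are not part of the NIF specification.

```python
def offset_uri(doc_uri: str, text: str, phrase: str) -> str:
    """Build a hash URI for the first occurrence of `phrase` in `text`.

    Assumption: the fragment is offset_<begin>_<end>, with <end> exclusive.
    """
    begin = text.find(phrase)
    if begin < 0:
        raise ValueError(f"{phrase!r} does not occur in the text")
    end = begin + len(phrase)
    return f"{doc_uri}#offset_{begin}_{end}"


def triple(subject_uri: str, predicate: str, obj: str) -> str:
    """Format one annotation as a single Turtle-style statement."""
    return f"<{subject_uri}> {predicate} {obj} ."


# Hypothetical document and text, used only to show the calls.
sample_text = "Linked Data and the Semantic Web: the Semantic Web is a web of data."
uri = offset_uri("http://example.org/doc.txt", sample_text, "Semantic Web")
print(triple(uri, "its:mentions", "dbpedia:Semantic_Web"))
```

Only the first occurrence is addressed here; a later occurrence of the same phrase would get a different URI because its offsets differ.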

What is planned for NIF:
a) A new spec NIF 2.0 within this year. Discussion will be on this mailing list: http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
NIF will be simplified (simpler URI schemes and annotations), consolidated (better implementations) and extended (ability to express confidence values, string sets, etc.).
b) We plan to have implementations for NERD http://nerd.eurecom.fr , DBpedia Spotlight, Zemanta.com and DKPro http://www.ukp.tu-darmstadt.de/research/current-projects/dkpro/
c) Inclusion of XPointer as a NIF URI scheme and creation of a mapping to “string URIs”. This should somehow be compatible with the Internationalisation Tag Set (ITS) 2.0 http://www.w3.org/TR/its20/ , but we are still working together on a bidirectional bridge. There has been plenty of discussion, partly in this thread: http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html
d) NIF should be compatible with PROV-AQ: Provenance Access and Query http://www.w3.org/TR/2012/WD-prov-aq-20120619/

What I am hoping for or my ideas about how Stanbol and NIF overlap:
I) From reading your documentation, you guys seem to be able to provide very good use cases and feedback for NIF 2.0. We would really like to include that and also tailor NIF 2.0 to your needs. We are currently setting up a Wiki (still ugly, sorry): http://wiki.nlp2rdf.org/ Please mail me for accounts.
II) I would assume that you need some OWL model for all the enhancer output. NIF standardizes NLP tool output and tries to be blank-node free and lightweight, but still as expressive as possible. For you, this would mean that you could really save time, as ontology modelling is really tedious. By reusing NIF you would get a free data model and spec, and you could focus on the implementation of the Stanbol engine. I got a 404 on http://incubator.apache.org/enhancer/enhancementstructure.html
I read “fise” somewhere. What is it? How does it compare to NIF? What URIs do you use? How many triples do you have per annotation?
III) With NIF we focused on the RDF output of tools, not on the workflow. Stanbol seems to focus on the workflow as well, right? It might be easy to implement a NIF engine with Stanbol. This could be a good showcase for NIF and Stanbol. With a Debian package, we could include Stanbol in the LOD2 Stack http://stack.lod2.eu/
