"title" "" . "content" "Improvements upon NIF 1.0 will be collected and incorporated in NIF 2.0. \r\nPlease have a look at the Wiki, where we currently collect and discuss all issues:\r\nhttp://wiki.nlp2rdf.org/wiki/Issues\r\n\r\nIf you want to Get Involved, sign up on the mailing list here:\r\nhttp://nlp2rdf.org/get-involved/\r\n\r\n\r\nOpen issues:\r\nOntology Versioning\r\nAlthough NIF 1.0 is specified the used ontologies can still change as they are not versioned as of now. \r\n\r\nShould information already encoded in the URI be duplicated?\r\nThe URIs already contain all information to uniquely find the string and also calculate things such as begin and index and also the inclusion of str:anchorOf duplicates the reference string and might lead to scalability problems.\r\nOn the other hand many things are already duplicated and can even be simplified further and/or removed such as the class str:OffsetbasedString, the property str:anchorOf or the substring relations are useful for iterating the model, but are actually redundant. \r\n\r\nShould the client or the server implement the conceptual interoperability?\r\nFor each tool the interoperability to one reference ontology has only to be implemented once, while on the other hand many clients would have to implement it. The difficulty is however that the servers have to provide exact and versioned implementations so the client does not get a conflict when merging. \r\n\r\n" . "title" "" . "content" "Slides\r\n\r\nJune 2011, Christian Chiarcos, Modelling linguistic corpora and their annotations with OWL/DL \u00E2\u0080\u0093 Link to slides \r\nAugust 2011, Sebastian Hellmann, at Korea-Germany Joint Workshop for LOD2 Development and Application, link to slides\r\n\r\nSeptember 2011, Sebastian Hellmann, at Indian-summer school on Linked Data, link to slides\r\n\tSeptember 2011, Sebastian Hellmann, at W3C Workshop Program: A Local Focus for the Multilingual Web, Limerick, link to slides\r\n\r\nOctober 2011, Sebastian Hellmann, at Invited Talk at the Multilingual Semantic Web Workshop, ISWC 2012, link to slides\r\n\r\nEU Deliverables\r\n\r\n\t\u00C2\u00A0Sebastian Hellmann 2011, LOD2 EU Deliverable D 3.2.1 \u00E2\u0080\u0093 NLP2RDF\r\n" . "title" "" . "content" "This page lists the involved people and their contributions in the creation and adoption of NIF.\r\n\r\n\r\n\tKonrad H\u00F6ffner contributed a Tutorial Challenge: Semantic Yellow Pages \r\n Christian Chiarcos maintains the Ontologies of Linguistic Annotation\r\n Giuseppe Rizzo and Raphael Troncy maintain the NERD Ontology\r\nMarkus Ackermann created the Topic Ontology for Mallet\r\n\r\n\r\n\r\nWe especially thank all participants in the field test of NIF 1.0:\r\n\r\n\tMarkus Ackermann for for Mallet\r\n\tMartin Br\u00FCmmer for OpenNLP and for UIMA \r\n\tDidier Cherix for Gate Annie \r\n\tMarcus Nitzschke for MontyLingua (Python) \r\n\tRobert Schulze for DBpedia Spotlight (node-js) \r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\nWe are adding all other people soon (LOD2, etc)\r\n" . "title" "" . "content" "The easiest way to get questions answered or to exchange ideas is the NLP2RDF mailing list. Please send an email to nlp2rdf@lists.informatik.uni-leipzig.de (subscribe per email, subscribe per browser], archives). All feedback is welcome. The mailing list should also keep you in the loop about current discussions. 
The blog also has an RSS feed for all posts and also for each category.\r\n\r\n\r\nNormally, mailing list discussions contain a lot of useful information and thoughts, which is then lost in the archives. This is why, we have a wiki at http://wiki.nlp2rdf.org, where we try to document all important information. If you want to add something to the wiki please sign up. Registration is open:\r\n\r\n" . "title" "" . "content" "This page provides the specification of NIF 1.0. For general information, Use Cases and the rationale behind NIF see the About page\r\n\r\nStatus of this page:\r\nThis document \u00E2\u0080\u0093 the specification of NIF 1.0 \u00E2\u0080\u0093 will remain mostly stable. The only corrections will either be clarifications, improving the readability of the text or spelling mistakes or additional NLP domain vocabularies. Major changes will be collected on the NIF 2.0 Draft page and included in the next version of NIF.\r\n\r\nThe NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.\r\n\r\nThe core of NIF consists of a vocabulary, which can represent Strings as RDF resources. A special URI Design is used to pinpoint annotations to a part of a document. These URIs can then be used to attach arbitrary annotations to the respective character sequence. Employing these URIs, annotations can be published on the Web as Linked Data and interchanged between different NLP tools and applications. \r\n\r\n\r\nNIF 1.0 in a nutshell\r\nNIF consists of the following three components:\r\n\r\n\tStructural Interoperability : URI recipes are used to anchor annotations in documents with the help of fragment identifiers. The URI recipes are complemented by two ontologies (String Ontology and Structured Sentence Ontology), which are used to describe the basic types of these URIs (i.e. String, Document, Word, Sentence) as well as the relations between them (subString, superString, nextWord, previousWord, etc.).\r\n\tConceptual Interoperability: The Structured Sentence Ontology (SSO) was especially developed to connect existing ontologies with the String Ontology and thus attach common annotations to the text fragment URIs. The NIF ontology can easily be extended and integrates several NLP ontologies such as OLiA for the morpho-syntactical NLP domains, the SCMS Vocabulary and DBpedia for entity linking, as well as the NERD Ontology (below for details on the ontologies).\r\n\tAccess Interoperability: A REST interface description for NIF components and web services allows NLP tools to interact on a programmatic level.\r\n\r\nNIF-1.0 is stable and can be implemented. The experience and feedback collected during this implementation will be collected as NIF-2.0-draft. This specification is complemented by the information in this Wordpress CMS, which contains documentation on how to integrate NLP tools and adapt them to NIF. Also reference implementations are available on the NLP2RDF Google code project and as blog posts in the Implementations category. A simple demo allows to test the web services\r\n An overview of the architecture can be found in the next section.\r\n\r\n\r\n\r\n\r\nArchitecture Overview\r\n\r\n\r\nStructural Interoperability\r\nStructural Interoperability is concerned with how the RDF is structured to represent annotations.\r\nNIF-1.0 URI Recipes\r\nNIF-1.0 currently supports 2 recipes: one offset-based and one hash-based. 
The principal task of the presented URI recipes is to address a part of a document and assign a URI to it that can serve as an annotation anchor. A URI assigned in this way must be unique, and it should be possible to identify the substring in the document with the information contained in the URI.\r\n\r\nNIF specifies how to create an identifier for uniquely locating arbitrary substrings in a document.\r\nRunning Example. In July 2011, the website http://www.w3.org/DesignIssues/LinkedData.html was an HTML document consisting of 25482 characters. The term \"Semantic Web\" occurs 9 times at different positions. If we want to annotate the substring \"Semantic Web\" in the section \"The four rules\" in the sentence \"If it doesn't use the universal URI set of symbols, we don't call it Semantic Web.\", a fragment identifier such as #Semantic%20Web would definitely be insufficient and lack the necessary information to distinguish the different occurrences. It would not be unique.\r\nGeneral\r\nA NIF-1.0 URI is made up of a prefix and the actual identifier. To annotate web resources it is straightforward to use the existing URI as the basis for the prefix. NIF-1.0 does not dictate the use of '/' or '#' and considers it part of the prefix. The following guidelines should be met for NIF-1.0 URI recipes:\r\n\r\n\tAs NIF is designed to have a client-server architecture, the client must be able to dictate the prefix to the server. All components must therefore have a parameter prefix which determines how the produced NIF URIs will start.\r\n\tThe NIF-URIs should be produced by a concatenation of the prefix and the identifier.\r\n\tThe NIF component must use the prefix of the client transparently without any corrections.\r\n\t(non-normative) For practical reasons, it is recommended that the client uses a prefix that ends with # or /.\r\n\r\nRunning Example. Recommended prefixes for http://www.w3.org/DesignIssues/LinkedData.html are:\r\n\r\n\tOption 1: http://www.w3.org/DesignIssues/LinkedData.html/\r\n\tOption 2: http://www.w3.org/DesignIssues/LinkedData.html#\r\n\r\nNIF Recipe: Offset-based URIs\r\nNOTE: This recipe is compatible with the position and range definition of RFC 5147 (especially Section 2.1.1) and builds upon it in terms of encoding and counting character positions. The syntax and everything else is different, though. See the design choice section for a discussion.\r\nThe Offset-based URIs are constructed from 4 parts separated by an underscore \"_\":\r\n\r\n\tan identifier, in this case the string \"offset\"\r\n\tthe begin index\r\n\tthe end index (the indexes count the gaps between characters, starting with 0)\r\n\ta human readable part consisting of the first 20 (or less, if the string is shorter) characters of the addressed string, urlencoded according to RFC 3986 (using '%20' instead of '+' for white space).\r\n\r\n\r\nRunning Example. \"http://www.w3.org/DesignIssues/LinkedData.html#\" serves as the prefix. The rest is \"offset_14406_14418_Semantic%20Web\". 
The human readable part should match the output produced by the following Java code:\r\n\r\nString first20Chars = (anchoredPart.length() < 20) ? anchoredPart : anchoredPart.substring(0, 20);\r\nString prefix = \"http://www.w3.org/DesignIssues/LinkedData.html#\";\r\nString uri;\r\n//variant A\r\nuri = prefix + URLEncoder.encode(first20Chars, \"UTF-8\").replaceAll(\"\\+\", \"%20\");\r\n//variant B\r\nURI uriB = new URI(\"http\", \"www.w3.org\", \"/DesignIssues/LinkedData.html\", first20Chars);\r\nuri = uriB.toASCIIString();\r\n\r\nor by this PHP method:\r\n\r\n$prefix = \"http://www.w3.org/DesignIssues/LinkedData.html#\";\r\n$uri = $prefix.rawurlencode(substr($anchoredPart,0,20));\r\n\r\nand together the final URI will look like this:\r\n\r\nhttp://www.w3.org/DesignIssues/LinkedData.html#offset_14406_14418_Semantic%20Web\r\n\r\nNIF Recipe: Context-Hash-based URIs\r\nThe greatest disadvantage of the offset-based recipe is that it is not stable w.r.t. changes in the document. \r\nIn case of a change to the document (insertion or deletion), all offset-based NIF-URIs after the position where the change occurred become invalid. The hash-based recipe is designed to remain more robust against insertion and deletion. Some additional implications of the Context-Hash-based URIs can be found in the Design Choices Section below. The hash-based URIs are constructed from 5 parts separated by an underscore \"_\":\r\n\r\n\tan identifier, in this case the string \"hash\"\r\n\tthe context length (number of characters to the left and right used in the message for the hash digest)\r\n\tthe overall length of the addressed string\r\n\tthe message digest, a 32 character HEXDIGIT md5 hash created from the string and the context. The message M consists of a certain number C of characters (see 2. context length above) to the left of the string, a bracket '(', the string itself, another bracket ')' and C characters to the right of the string: \"leftContext(String)rightContext\"\r\n\ta human readable part consisting of the first 20 (or less, if the string is shorter) characters of the addressed string, urlencoded according to RFC 3986 (using '%20' instead of '+' for white space).\r\n\r\nRunning Example. This example uses a context length of 4, the digest therefore is:\r\nmd5(\" it (Semantic Web)." . "title" "" . "content" "NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF). NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The output of NLP tools can be converted into RDF and used in the LOD2 Stack. Currently, a NIF 1.0 version is being created on this website (go to the NIF-1.0 Specification). Implementation of NIF 1.0 is progressing and a draft for NIF 2.0 will be refined based on the experience gained during the field test of NIF-1.0.\r\n\r\nThis project is open for anyone to join and there are several ways to get involved. For slides and further reading please have a look at the publications page. 
\r\n\r\nRationale\r\nThe original motivation of creating NIF was quite simple. In order to integrate NLP tools into the LOD2 stack they were required to produce RDF. Instead of writing an individual RDF wrapper for each tool, it made perfectly sense to create a common format, which was expressive enough to potentially cover all NLP tools. Furthermore instead of creating a conceptual mapping between the output of the tools, several linguistic ontologies already existed and could be reused to unify tag sets and other NLP dialects. Although NIF is being generalized to provide additional benefits the main rationale is the integration of NLP tools into the LOD2 stack.\r\n\r\nNIF Use Cases\r\nNIF aims to improve upon certain disadvantages commonly found in NLP frameworks, processes and tools. Its basic advantages are the way how annotations are represented and what kind of annotations are used. These two aspects combined provide structural interoperability as well a conceptual interoperability.\r\n\r\nHere are some claims, which we believe can be made about NIF (i.e. we have not yet found evidence that indicate otherwise respective technical feasibility):\r\n\r\n\tNIF provides global interoperability. If an NLP tool incorporates a NIF parser and a NIF serializer, it is compatible with all other tools, which implement NIF.\r\n\tNIF achieves this interoperability by using and defining a most common denominator for annotations. This means that some standard annotations are required to be used. On the other hand NIF is flexible and allows the NLP tools to add any extra annotations at will.\r\n\tNIF allows to create tool chains without a large amount of up-front development work. As the output of each tool is compatible, you can try and test really fast, whether the tools you selected actually produce what you need to solve a certain task.\r\n\tAs NIF is based on RDF/OWL, you can choose from a broad range of tools and technologies to work with it:\r\n\r\n\tRDF makes data integration easy: URIs, LinkedData\r\n\tOWL is based on Description Logics (Types, Type inheritance)\r\n\tAvailability of open data sets (access and licence)\r\n\tReusability of Vocabularies and Ontologies\r\n\tDiverse serializations for annotations: XML, Turtle,\r\nRDFa+XHTML\r\n\tScalable tool support (Databases, Reasoning)\r\n\tData is flexible and can be queried / transformed in many ways\r\n\r\n\r\n\r\nComparison to UIMA and Gate\r\nNIF is almost completely orthogonal to frameworks such as Gate and UIMA. Per definitionem it is a format that represents NLP output, while Gate and UIMA are software frameworks for NLP. Here is a rough guideline, when to use NIF and when not:\r\n\r\nUse UIMA and Gate, when:\r\n\r\n\tYou need to annotate a really high amount of text on a daily basis.\r\n\tYou already know, which tools and annotations you need and there are already adapters and plugins for UIMA or Gate .\r\n\tYou want to solve few specialized task, such as identifying keywords or find certain facts. For this you are planning one custom application and you do not have any additional requirements for RDF or interoperability.\r\n\r\nUse NIF, when:\r\n\r\n You are using the LOD2 Stack\r\n The rest of your data is already in RDF\r\n You want to query your text documents with SPARQL\r\n\tYou are not sure which tools to use and want to first try them and test the results.\r\n\tYou have a fixed text collection (or a low daily throughput) and want to unlock the implicit meaning. 
The text can be processed once, saved as RDF and then transformed easily or queried in a triple store.\r\n\tYou need annotations for several languages (multilingualism) in a uniform way\r\n\t\r\n\r\nDefinitely refrain from trying to build a scalable application that uses RDF/OWL as an internal data format. RDF and OWL are great for flexibility, reasoning and data integration, but NOT for performance.\r\n\r\nRather consider using UIMA and Gate and then serialising the output as NIF.\r\n\r\nFurthermore, NLP2RDF provides:\r\n\r\n\tdocumentation\r\n\treference implementations of NIF\r\n\ta collaboration platform\r\n\ttutorials / example source code\r\n\ta mailing list for questions and support\r\n\tthe possibility to join on http://nlp2rdf.org\r\n\r\n" . "title" "Documentation (Wiki)" . "content" "" . "title" "About" . "content" "NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF). NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The output of NLP tools can be converted into RDF and used in the LOD2 Stack. Currently, a NIF 1.0 version is being created on this website (go to the NIF-1.0 Specification). Implementation of NIF 1.0 is progressing and a draft for NIF 2.0 will be refined based on the experience gained during the field test of NIF-1.0.\r\n\r\nThis project is open for anyone to join and there are several ways to get involved. For slides and further reading please have a look at the publications page. \r\n\r\n

Rationale

\r\nThe original motivation for creating NIF was quite simple. In order to integrate NLP tools into the LOD2 stack, they were required to produce RDF. Instead of writing an individual RDF wrapper for each tool, it made perfect sense to create a common format that was expressive enough to potentially cover all NLP tools. Furthermore, instead of creating a conceptual mapping between the outputs of the tools, several linguistic ontologies already existed and could be reused to unify tag sets and other NLP dialects. Although NIF is being generalized to provide additional benefits, the main rationale is the integration of NLP tools into the LOD2 stack.\r\n\r\n

NIF Use Cases

\r\nNIF aims to improve upon certain disadvantages commonly found in NLP frameworks, processes and tools. Its basic advantages are the way annotations are represented and the kind of annotations that are used. These two aspects combined provide structural interoperability as well as conceptual interoperability.\r\n\r\nHere are some claims, which we believe can be made about NIF (i.e. we have not yet found evidence that indicates otherwise with respect to technical feasibility):\r\n
  1. NIF provides global interoperability. If an NLP tool incorporates a NIF parser and a NIF serializer, it is compatible with all other tools which implement NIF.
  2. NIF achieves this interoperability by using and defining a common denominator for annotations. This means that some standard annotations are required to be used. On the other hand, NIF is flexible and allows the NLP tools to add any extra annotations at will.
  3. NIF allows the creation of tool chains without a large amount of up-front development work. As the output of each tool is compatible, you can quickly try and test whether the tools you selected actually produce what you need to solve a certain task.
  4. As NIF is based on RDF/OWL, you can choose from a broad range of tools and technologies to work with it:\r\n
     • RDF makes data integration easy: URIs, Linked Data
     • OWL is based on Description Logics (types, type inheritance)
     • Availability of open data sets (access and licence)
     • Reusability of vocabularies and ontologies
     • Diverse serializations for annotations: XML, Turtle, RDFa+XHTML
     • Scalable tool support (databases, reasoning)
     • Data is flexible and can be queried / transformed in many ways
\r\n

Comparison to UIMA and Gate

\r\nNIF is almost completely orthogonal to frameworks such as Gate and UIMA. By definition, it is a format that represents NLP output, while Gate and UIMA are software frameworks for NLP. Here is a rough guideline on when to use NIF and when not:\r\n\r\nUse UIMA and Gate, when:\r\n
  • You need to annotate a very large amount of text on a daily basis.
  • You already know which tools and annotations you need and there are already adapters and plugins for UIMA or Gate.
  • You want to solve a few specialized tasks, such as identifying keywords or finding certain facts. For this you are planning one custom application and you do not have any additional requirements for RDF or interoperability.
\r\n
\r\nUse NIF, when:\r\n
  • You are using the LOD2 Stack
  • The rest of your data is already in RDF
  • You want to query your text documents with SPARQL
  • You are not sure which tools to use and want to first try them and test the results.
  • You have a fixed text collection (or a low daily throughput) and want to unlock the implicit meaning. The text can be processed once, saved as RDF and then transformed easily or queried in a triple store.
  • You need annotations for several languages (multilingualism) in a uniform way
\r\n
\r\nDefinitely refrain from trying to build a scalable application that uses RDF/OWL as an internal data format. RDF and OWL are great for flexibility, reasoning and data integration, but NOT for performance.\r\n\r\nRather consider using UIMA and Gate and then serialising the output as NIF.\r\n\r\nFurthermore, NLP2RDF provides:\r\n
  • documentation
  • reference implementations of NIF
  • a collaboration platform
  • tutorials / example source code
  • a mailing list for questions and support
  • the possibility to join on http://nlp2rdf.org
\r\n
\r\n" . "title" "NIF-1.0" . "content" "This page provides the specification of NIF 1.0. For general information, Use Cases and the rationale behind NIF see the About page\r\n\r\nStatus of this page:\r\nThis document - the specification of NIF 1.0 - will remain mostly stable. The only corrections will either be clarifications, improving the readability of the text or spelling mistakes or additional NLP domain vocabularies. Major changes will be collected on the NIF 2.0 Draft page and included in the next version of NIF.\r\n\r\nThe NLP Interchange Format (NIF)\"\" is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.\r\n\r\nThe core of NIF consists of a vocabulary, which can represent Strings as RDF resources. A special URI Design is used to pinpoint annotations to a part of a document. These URIs can then be used to attach arbitrary annotations to the respective character sequence. Employing these URIs, annotations can be published on the Web as Linked Data and interchanged between different NLP tools and applications. \r\n\r\n\r\n

NIF 1.0 in a nutshell

\r\nNIF consists of the following three components:\r\n
  1. Structural Interoperability: URI recipes are used to anchor annotations in documents with the help of fragment identifiers. The URI recipes are complemented by two ontologies (String Ontology and Structured Sentence Ontology), which are used to describe the basic types of these URIs (i.e. String, Document, Word, Sentence) as well as the relations between them (subString, superString, nextWord, previousWord, etc.).
  2. Conceptual Interoperability: The Structured Sentence Ontology (SSO) was especially developed to connect existing ontologies with the String Ontology and thus attach common annotations to the text fragment URIs. The NIF ontology can easily be extended and integrates several NLP ontologies such as OLiA for the morpho-syntactic NLP domains, the SCMS Vocabulary and DBpedia for entity linking, as well as the NERD Ontology (see below for details on the ontologies).
  3. Access Interoperability: A REST interface description for NIF components and web services allows NLP tools to interact on a programmatic level.
\r\n
\r\nNIF-1.0 is stable and can be implemented. The experience and feedback collected during this implementation will be collected as the NIF-2.0 draft. This specification is complemented by the information in this WordPress CMS, which contains documentation on how to integrate NLP tools and adapt them to NIF. Reference implementations are also available on the NLP2RDF Google code project and as blog posts in the Implementations category. A simple demo allows testing the web services.\r\nAn overview of the architecture can be found in the next section.\r\n\r\n\r\n\r\n\r\n

Architecture Overview

\r\n\"\"\r\n\r\n

Structural Interoperability

\r\nStructural Interoperability is concerned with how the RDF is structured to represent annotations.\r\n

NIF-1.0 URI Recipes

\r\nNIF-1.0 currently supports two recipes: one offset-based and one hash-based. The principal task of the presented URI recipes is to address a part of a document and assign a URI to it that can serve as an annotation anchor. A URI assigned in this way must be unique, and it should be possible to identify the substring in the document with the information contained in the URI.\r\n\r\n
NIF specifies how to create an identifier for uniquely locating arbitrary substrings in a document.
\r\nRunning Example. In July 2011, the website http://www.w3.org/DesignIssues/LinkedData.html was an HTML document consisting of 25482 characters. The term \"Semantic Web\" occurs 9 times at different positions. If we want to annotate the substring \"Semantic Web\" in the section \"The four rules\" in the sentence \"If it doesn't use the universal URI set of symbols, we don't call it Semantic Web.\", a fragment identifier such as #Semantic%20Web would definitely be insufficient and lack the necessary information to distinguish the different occurrences. It would not be unique.\r\n

General

\r\nA NIF-1.0 URI is made up of a prefix and the actual identifier. To annotate web resources it is straight-forward to use the existing URI as the basis for the prefix. NIF-1.0 does not dictate the use of the '/' or the '#' and considers it part of the prefix. The following guidelines should be met for NIF-1.0 URI recipes:\r\n
  1. As NIF is designed to have a client-server architecture, the client must be able to dictate the prefix to the server. All components must therefore have a parameter prefix which determines how the produced NIF URIs will start.
  2. The NIF-URIs should be produced by a concatenation of the prefix and the identifier.
  3. The NIF component must use the prefix of the client transparently without any corrections.
  4. (non-normative) For practical reasons, it is recommended that the client uses a prefix that ends with # or /.
\r\nRunning Example. Recommended prefixes for http://www.w3.org/DesignIssues/LinkedData.html are:\r\n
  • Option 1: http://www.w3.org/DesignIssues/LinkedData.html/
  • Option 2: http://www.w3.org/DesignIssues/LinkedData.html#
\r\n

NIF Recipe: Offset-based URIs

\r\n
NOTE: This recipe is compatible with the position and range definition of RFC 5147 (Especially Section 2.1.1 ) and builds upon it in terms of encoding and counting character positions. The syntax and everything else is different though. See the design choice section for a discussion.
\r\nThe Offset-based URIs are constructed from 4 parts separated by an underscore \"_\":\r\n
  1. an identifier, in this case the string \"offset\"
  2. the begin index
  3. the end index (the indexes count the gaps between characters, starting with 0)
  4. a human readable part consisting of the first 20 (or less, if the string is shorter) characters of the addressed string, urlencoded according to RFC 3986 (using '%20' instead of '+' for white space).
\r\n\r\nRunning Example. \"http://www.w3.org/DesignIssues/LinkedData.html#\" serves as the prefix. The rest is \"offset_14406_14418_Semantic%20Web\". The human readable part should match the output produced by the following Java code:\r\n
\r\nString first20Chars = (anchoredPart.length() < 20) ? anchoredPart : anchoredPart.substring(0, 20);\r\nString prefix = \"http://www.w3.org/DesignIssues/LinkedData.html#\";\r\nString uri;\r\n//variant A\r\nuri = prefix + URLEncoder.encode(first20Chars, \"UTF-8\").replaceAll(\"\\+\", \"%20\");\r\n//variant B\r\nURI uriB = new URI(\"http\", \"www.w3.org\", \"/DesignIssues/LinkedData.html\", first20Chars);\r\nuri = uriB.toASCIIString();\r\n
\r\nor by this PHP method:\r\n
\r\n$prefix = \"http://www.w3.org/DesignIssues/LinkedData.html#\"\r\n$uri = $prefix.rawurlencode(substr($anchoredPart,0,20)\r\n
\r\nand together the final URI will look like this:\r\n\r\nhttp://www.w3.org/DesignIssues/LinkedData.html#offset_14406_14418_Semantic%20Web\r\n\r\n
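For illustration only, here is a minimal Java sketch that assembles a complete offset-based NIF URI from a prefix, the begin and end index and the anchored string; the class and method names are made up for this example, and the encoding follows variant A above.

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class OffsetUriExample {

    // prefix + "offset_" + begin + "_" + end + "_" + URL-encoded first 20 characters
    static String buildOffsetUri(String prefix, int begin, int end, String anchoredPart)
            throws UnsupportedEncodingException {
        String first20Chars = (anchoredPart.length() < 20) ? anchoredPart : anchoredPart.substring(0, 20);
        // RFC 3986 style: use %20 instead of '+' for white space
        String encoded = URLEncoder.encode(first20Chars, "UTF-8").replaceAll("\\+", "%20");
        return prefix + "offset_" + begin + "_" + end + "_" + encoded;
    }

    public static void main(String[] args) throws Exception {
        // prints http://www.w3.org/DesignIssues/LinkedData.html#offset_14406_14418_Semantic%20Web
        System.out.println(buildOffsetUri("http://www.w3.org/DesignIssues/LinkedData.html#",
                14406, 14418, "Semantic Web"));
    }
}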

NIF Recipe: Context-Hash-based URIs

\r\nThe greatest disadvantage of the offset-based recipe is that it is not stable w.r.t. changes in the document. \r\nIn case of a change to the document (insertion or deletion), all offset-based NIF-URIs after the position where the change occurred become invalid. The hash-based recipe is designed to remain more robust against insertion and deletion. Some additional implications of the Context-Hash-based URIs can be found in the Design Choices Section below. The hash-based URIs are constructed from 5 parts separated by an underscore \"_\":\r\n
  1. an identifier, in this case the string \"hash\"
  2. the context length (number of characters to the left and right used in the message for the hash digest)
  3. the overall length of the addressed string
  4. the message digest, a 32 character HEXDIGIT md5 hash created from the string and the context. The message M consists of a certain number C of characters (see 2. context length above) to the left of the string, a bracket '(', the string itself, another bracket ')' and C characters to the right of the string: \"leftContext(String)rightContext\"
  5. a human readable part consisting of the first 20 (or less, if the string is shorter) characters of the addressed string, urlencoded according to RFC 3986 (using '%20' instead of '+' for white space).
\r\nRunning Example. This example uses a context length of 4, the digest therefore is:\r\n
md5(\" it (Semantic Web).\r\n\r\nThe resulting URI is this:\r\nhttp://www.w3.org/DesignIssues/LinkedData.html#hash_4_12_79edde636fac847c006605f82d4c5c4d_Semantic%20Web\r\n\r\n\r\n\"NIF-1.0\r\n\r\n

String Ontology

\r\nThe String Ontology is a vocabulary to describe Strings and builds the foundation for the NLP Interchange Format. It has a class String and a property anchorOf to anchor URIs in a given text and describe the relations between these string URIs. The ontology can be found here: http://nlp2rdf.lod2.eu/schema/string/\r\n\r\nThe available properties are descriptive. Only some properties are actually required for NIF; all others are optional, and it is even discouraged to use them by default, as they will cause quite a large number of logical assertions (there are many transitive properties). Overall, only these terms are primarily important:\r\n
  • http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString
  • http://nlp2rdf.lod2.eu/schema/string/ContextHashBasedString
  • http://nlp2rdf.lod2.eu/schema/string/Document
  • http://nlp2rdf.lod2.eu/schema/string/anchorOf
  • http://nlp2rdf.lod2.eu/schema/string/subString
\r\n\r\n

Structured Sentence Ontology

\r\nThe Structured Sentence Ontology (SSO) is built upon the String Ontology and additionally provides classes for three basic units: Sentences, Phrases and Words. Properties such as sso:nextWord and sso:previousWord can be used to express the relations between these units. Furthermore properties are provided for the most common annotations such as the data type properties for stem, lemma, statistics, etc. The ontology can be found here: http://nlp2rdf.lod2.eu/schema/sso/\r\n\r\n

Normative requirements

\r\nThe URI recipes of NIF are designed to make it possible to have zero overhead and only use one triple per annotation, such as in the following example:\r\n
@prefix ld:  .\r\n@prefix str:  .\r\n@prefix revyu:  .\r\nld:offset_14406_14418_Semantic%20Web rev:hasComment \"Hey Tim, good idea that Semantic Web!\" . 
\r\n\r\nSome additional triples are added, however, to ease programmatic access. \r\n\r\n
1. All URIs created by the above-mentioned URI recipes should be typed with the respective OWL Class for the recipe (str:OffsetBasedString or str:ContextHashBasedString)
\r\nThis produces one additional triple per generated URI:\r\n\r\n
@prefix ld:  .\r\n@prefix str:  .\r\n@prefix revyu:  .\r\nld:offset_14406_14418_Semantic%20Web rev:hasComment \"Hey Tim, good idea that Semantic Web!\" . \r\nld:offset_14406_14418_Semantic%20web rdf:type str:OffsetBasedString . \r\n
\r\n\r\n
2. In each returned NIF model there should be at least one uri that relates to the document as a whole and either references the page with the property str:sourceUrl or includes the whole text of the document with str:sourceString
\r\nThe definition of Document in NIF is closely tied to the request issued to annotate it. So each piece of text that is sent to a service is treated as a document. This produces three additional triples per Document (or request).\r\n\r\n
@prefix ld:  .\r\n@prefix str:  .\r\nld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F rdf:type str:OffsetBasedString . \r\nld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F rdf:type str:Document . \r\nld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F str:sourceUrl  . \r\n
\r\n\r\n
3. For each document in a NIF model, all other strings that use NIF URIs should be reachable via the subStringTrans property by inference. Either str:subString must be used for this or a sub-property thereof
\r\nProgrammatically, it might be required to iterate over the strings contained in the document. This means that they have to be connected via a property. To achieve this requirement, either str:subString must be added between the URIs (where appropriate) or another property that is a rdfs:subPropertyOf of str:subString. Examples of such sub-properties are sso:word, sso:firstWord, sso:lastWord, sso:child.\r\n\r\n\r\n
@prefix ld:  .\r\n@prefix str:  .\r\nld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F rdf:type str:Document . \r\nld:offset_0_25482_%3Chtml%20xmlns%3D%22http%3A%2F%2F str:subString ld:offset_14406_14418_Semantic%20web \r\n
\r\n\r\n
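As an illustration of requirement 3, the following sketch (assuming a current Apache Jena version and a hypothetical input file nif-output.ttl) iterates over the strings of a document by following str:subString. A complete client would also have to follow sub-properties such as sso:word or sso:child, or materialize str:subStringTrans beforehand.

import org.apache.jena.rdf.model.*;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.vocabulary.RDF;

public class SubStringIteration {
    static final String STR = "http://nlp2rdf.lod2.eu/schema/string/";

    public static void main(String[] args) {
        Model model = RDFDataMgr.loadModel("nif-output.ttl"); // hypothetical NIF model
        Property subString = model.createProperty(STR, "subString");
        Resource documentClass = model.createResource(STR + "Document");
        // find all document resources and print their direct substrings
        ResIterator docs = model.listResourcesWithProperty(RDF.type, documentClass);
        while (docs.hasNext()) {
            Resource doc = docs.next();
            StmtIterator parts = doc.listProperties(subString);
            while (parts.hasNext()) {
                System.out.println(parts.next().getObject());
            }
        }
    }
}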

Design Choices and Future Work

\r\nThis section explains the rationale behind the design choices of this specification. Discussion and proposals are vital and possible via the comment section below or as written on the Get Involved page.\r\n
  • The additional brackets \"(\" and \")\" in the Context-Hash-based URIs were introduced to make the hash more distinguishable. If there is a sentence \"The dog barks.\" and the context size is too big, e.g. 10, then \"The\", \"dog\" and \"barks\" would have the same hash. Compare md5(\"The dog barks.\") vs. md5(\"The (dog) barks.\") vs. md5(\"The dog (barks).\")
  • Note that Context-Hash-based URIs are unique identifiers of a specific string only if the context size is chosen sufficiently large. If, for example, a complete sentence is repeated in the document, parts of the preceding and/or subsequent sentences have to be included to make the reference to the string unique. However, in many NLP applications a unique reference to a specific string is not necessary; rather, all word forms within the same minimal context (e.g., one preceding and one following word) are required to be analysed in the same way. Then, a Context-Hash-based URI refers uniquely to a word class, not one specific string. Using a small context, a programmer can therefore refer to a whole class of words rather than an individual word, e.g., she can assign every occurring string \"the\" (with one preceding and one following white space as context) the part-of-speech tag \"DT\": md5(\" (the) \"). The resulting URI is http://www.w3.org/DesignIssues/LinkedData.html#hash_md5_1_5_8dc0d6c8afa469c52ac4981011b3f582_the
  • The two ontologies should not be included in the NIF output by default (via owl:import). Some axioms will produce a lot of entailed triples, especially the transitive properties. It is generally better to write some additional client code to add the required annotations, such as nextWordTrans or sub- and superString, than to infer them with a reasoner such as Pellet. SPARQL CONSTRUCT is the way to go in this case (see the sketch after this list).\r\n
    CONSTRUCT {?word1 sso:nextWordTrans ?word2} \r\nWHERE {?word1 sso:nextWord ?word2};
    \r\nexecuted once and\r\n
    CONSTRUCT {?word1 sso:nextWordTrans ?word3} \r\nWHERE {?word1 sso:nextWordTrans ?word2 . ?word2 sso:nextWord ?word3};
    \r\nexecuted N-1 times (N is the number of words of the longest sentence) might just serve the same purpose and will be a lot faster.
  • At the moment only two URI recipes are included, which operate on text in general. Text is defined as either a sequence of characters or simply everything that can be opened in a text editor. This also entails that HTML, XML, source code, CSS and everything else (except e.g. binary formats) can be addressed with NIF-URIs. This is the most general use case and in the future more content-specific URI recipes such as XPath/XPointer for XML might be included.
  • The explicit typing of the URIs with the class OffsetBasedString is useful for determining the type of the URI before parsing it. Of course, the explicit typing is redundant, because the information is already included in the URI. This redundancy might be removed in the future.
  • The compatibility problem with RFC 5147 originates in the dilemma that fragment ids of URIs are normally media-type specific. RFC 5147 is valid for plain text (i.e. text without markup), which is disjoint from HTML. Achieving full compatibility comes with additional (unnecessary) ballast, while providing hardly any advantages for NLP tools. See the full discussion here. Final email.
\r\n
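To illustrate the client-side materialization suggested above, here is a non-normative Jena/ARQ sketch that applies the two CONSTRUCT queries until no new sso:nextWordTrans triples are produced; the file name nif-output.ttl is a placeholder.

import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class MaterializeNextWordTrans {
    static final String PREFIX = "PREFIX sso: <http://nlp2rdf.lod2.eu/schema/sso/>\n";

    public static void main(String[] args) {
        Model model = RDFDataMgr.loadModel("nif-output.ttl"); // hypothetical NIF model
        // seed: copy every nextWord link into nextWordTrans
        String seed = PREFIX + "CONSTRUCT {?w1 sso:nextWordTrans ?w2} WHERE {?w1 sso:nextWord ?w2}";
        model.add(QueryExecutionFactory.create(seed, model).execConstruct());
        // propagate: extend nextWordTrans by one nextWord step until a fixpoint is reached
        String step = PREFIX + "CONSTRUCT {?w1 sso:nextWordTrans ?w3} "
                + "WHERE {?w1 sso:nextWordTrans ?w2 . ?w2 sso:nextWord ?w3}";
        long before;
        do {
            before = model.size();
            model.add(QueryExecutionFactory.create(step, model).execConstruct());
        } while (model.size() > before);
    }
}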
\r\n\r\n

Conceptual Interoperability

\r\nConceptual Interoperability is concerned with which ontologies and vocabularies are used to represent the actual annotations in RDF. We divided the different output of tools into different NLP domains. For each domain a vocabulary was or will be chosen that serves the most common use cases and facilitates interoperability. In simple cases a property has been created in the Structured Sentence Ontology. In the more complex cases fully developed linguistic ontologies already existed and were reused.\r\nThis part of the specification is also extended on the fly, i.e. everything in this section is stable (backward-compatible), but additional NLP domains might be appended, any time. New NLP domains can be requested or proposed here.\r\n\r\n

Golden rule of conceptual interoperability

\r\nHere is the golden rule of interoperability with respect to a reference ontology:\r\n
Alongside the local annotations used by the tool, minimal information must be added to be able to disambiguate the local annotations with the help of a reference ontology for data integration and interoperability.
\r\n\r\n\r\n

NLP Domains

\r\n

Begin/end index and context

\r\nAdding extra triples for indexes and context produces redundant information as the begin and end index as well as the context can be calculated from the URI. Properties for describing context and begin and end index can be found in the String Ontology.\r\n
@prefix ld:  .\r\n@prefix str:  .\r\n@prefix xsd:  .\r\nld:offset_14406_14418_Semantic%20web str:beginIndex \"14406\"^^xsd:int  .\r\nld:offset_14406_14418_Semantic%20web str:endIndex \"14418\"^^xsd:int  .\r\nld:offset_14406_14418_Semantic%20web str:leftContext \" it \" .\r\nld:offset_14406_14418_Semantic%20web str:leftContext4 \" it \" .\r\nld:offset_14406_14418_Semantic%20web str:beginIndex \".\r\n

Lemmata, stems, stop words, etc.

\r\nLemma and stem annotations are realized as simple data type properties in the Structured Sentence Ontology. \r\nA class is used for stopwords (sso:StopWord)\r\n
@prefix ld:  .\r\n@prefix sso:  .\r\nld:offset_14406_14414_Semantic sso:stem \"Semant\"  .\r\nld:offset_14406_14414_Semantic sso:lemma \"Semantic\"  .\r\nld:offset_14406_14414_Semantic a sso:StopWord  .\r\n
\r\n

Part of speech tags

\r\nAnnotations for part of speech tags must make use of Ontologies of Linguistic Annotations (OLiA) .\r\nOLiA connects local annotation tag sets with a global reference ontology. Therefore it allows to keep the specific part of speech tag at a fine granularity, while at the same time having a coarse grained reference model. The RDF output must contain the original tag as plain literal using the property sso:posTag as well as a link to the respective OLiA individual in the annotation model for the used tag set using the property sso:oliaLink. \r\n\r\nHere is an example using the Penn tag set:\r\n
@prefix ld:  .\r\n@prefix sso:  .\r\n@prefix penn:  .\r\nld:offset_14406_14414_Semantic sso:posTag \"JJ\"  .\r\nld:offset_14406_14414_Semantic sso:oliaLink penn:JJ  .\r\n
\r\n\r\nThe extended RDF output can additionally contain the following: 1. all classes of the OLiA reference ontology (especially from olia.owl and olia-top.owl) that belong to this individual can be copied and added to the str:String via rdf:type 2. the rdfs:subClassOf axioms between these classes must be copied as well.\r\nAll necessary information has to be copied for conciseness and self-containedness. If information is copied in this way it is unnecessary for the client to include the three OLiA reference ontologies: \r\nhttp://nachhalt.sfb632.uni-potsdam.de/owl/olia-top.owl, http://nachhalt.sfb632.uni-potsdam.de/owl/olia.owl, http://nachhalt.sfb632.uni-potsdam.de/owl/system.owl\r\nIn total, these three models amount to almost 1000 OWL classes. An overview of OLiA can be found here. \r\n\r\n\r\nHere is an example of the extended output:\r\n
@prefix ld:  .\r\n@prefix sso:  .\r\n@prefix penn:  .\r\n@prefix olia-top:  .\r\n@prefix olia:  .\r\nld:offset_14406_14414_Semantic sso:posTag \"JJ\"  .\r\nld:offset_14406_14414_Semantic sso:oliaLink penn:JJ  .\r\nld:offset_14406_14414_Semantic rdf:type olia:Adjective .\r\nld:offset_14406_14414_Semantic rdf:type olia-top:MorphosyntacticCategory .\r\nld:offset_14406_14414_Semantic rdf:type olia-top:LinguisticConcept .\r\nolia:Adjective rdfs:subClassOf olia-top:MorphosyntacticCategory.\r\nolia-top:MorphosyntacticCategory rdfs:subClassOf olia-top:LinguisticConcept.\r\n
\r\n\r\n
Currently, there are some small disadvantages, which might be fixed in the next NIF version:\r\n
  • The ontologies might still change a little now and then and are unversioned. In the next NIF version, we will provide stable releases.
  • The semantics are not modelled exactly, as instances of str:String are intuitively not instances of olia-top:LinguisticConcept. Normally you would model these classes as disjoint, but the way it is currently done reduces the number of RDF triples. Note that this saves an extra pattern in each SPARQL query.
  • Also, for the sake of conciseness, subClassOf axioms from OLiA are repeated. This might formally be considered ontology hijacking. Again, it is more concise than importing 1000 OWL classes.
\r\n
\r\n
\r\n \r\n

Syntax Analysis

\r\nin development\r\n\r\n

Topic Models

\r\nAn ontology draft for Topic models is available here:\r\nhttp://nlp2rdf.lod2.eu/schema/topic/\r\n

Named Entity Recognition and Entity Linking

\r\nFor describing Named Entity Recognition (NER), NIF uses one property from the Semantic Content Management System (SCMS) EU project. \r\nFor entity linking, the property http://ns.aksw.org/scms/means must be used.\r\n
@prefix ld:  .\r\n@prefix scms:     .\r\n@prefix dbpedia:     .\r\nld:offset_22849_22852_W3C scms:means dbpedia:World_Wide_Web_Consortium .\r\n
\r\nFor entity typing the Named Entity Recognition and Disambiguation Ontology (http://nerd.eurecom.fr/ontology/) must be used alongside the local annotation types. \r\n\r\n\r\n\r\nHere are three examples for Spotlight, Zemanta and Extractiv to showcase how important a reference ontology is, even for such a trivial annotation as organization. Note the three different spellings \"Organisation\", \"organization\", \"ORGANIZATION\";\r\n\r\n
\r\n@prefix dbo:     .\r\n@prefix nerd:     .\r\nld:offset_22849_22852_W3C rdf:type dbo:Organisation .\r\nld:offset_22849_22852_W3C rdf:type nerd:Organization .\r\n
\r\n
\r\n@prefix zemanta:      .\r\n@prefix nerd:     .\r\nld:offset_22849_22852_W3C rdf:type zemanta:organization .\r\nld:offset_22849_22852_W3C rdf:type nerd:Organization .\r\n
\r\n
\r\n@prefix extractiv:     .\r\n@prefix nerd:     .\r\nld:offset_22849_22852_W3C rdf:type extractiv:ORGANIZATION .\r\nld:offset_22849_22852_W3C rdf:type nerd:Organization .\r\n
\r\n\r\n\r\n

Access Interoperability

\r\nMost aspects of access interoperability are already tackled by using the RDF standard. This section contains several smaller additions, which further improve interoperability and accessibility of NIF components. They are normative, but play a minor role compared to Structural and Conceptual Interoperability. While the first two allow general interoperability, Access Interoperability provides easier integration and off-the-shelf solutions. \r\n\r\n

Workflow

\r\nNIF itself is a format which can be used for importing and exporting data from and to NLP tools. NIF therefore enables the creation of ad-hoc workflows following a client-server model or the SOA principle. In this approach, the client is responsible for implementing the workflow. The diagram below shows the communication model. The client sends requests to the different tools either as text or RDF and then receives an answer in RDF. This RDF can be aggregated into a local RDF model. Transparently, external data in RDF can also be requested and added without any additional formalism. For acquiring and merging external data from knowledge bases, the plenitude of existing RDF techniques (such as Linked Data or SPARQL) can be used.\r\n\r\n[Communication model diagram]\r\n\r\n
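A minimal client-side sketch of this workflow, assuming Apache Jena and purely illustrative service URLs: the RDF answers of two NIF services and some external Linked Data are merged into one local model.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class NifClientWorkflow {
    public static void main(String[] args) {
        // hypothetical NIF responses from two tools (e.g. a POS tagger and an NER service)
        Model posModel = RDFDataMgr.loadModel("http://example.org/pos-tagger-response.ttl");
        Model nerModel = RDFDataMgr.loadModel("http://example.org/ner-response.ttl");

        // aggregate everything into one local RDF model
        Model aggregate = ModelFactory.createDefaultModel();
        aggregate.add(posModel).add(nerModel);

        // external knowledge referenced from the annotations can be merged the same way
        aggregate.add(RDFDataMgr.loadModel("http://dbpedia.org/data/World_Wide_Web_Consortium.ttl"));

        RDFDataMgr.write(System.out, aggregate, RDFFormat.TURTLE);
    }
}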

Interfaces

\r\nCurrently the main interface is a wrapper that provides the NIF Web service. Other interfaces, such as CLI or a Java interface (using Jena) are easily possible and can be provided in addition to a Web service. The Web service must:\r\n
  1. be stateless
  2. treat POST and GET the same
  3. accept the parameters in the next section
\r\n\r\nThe component may have any number of additional parameters to configure and fine tune results. \r\n\r\n

Parameters

\r\nNote that to send the parameters to a Web service they must be url encoded first. \r\n\r\nThese parameters must be used for each request (required): \r\n
  • input-type = \"text\" | \"nif-owl\"\r\nDetermines the content required for the next parameter input, either plain text or RDF/XML in NIF
  • input = text | rdf/xml\r\nEither the text or the RDF in XML format in NIF
\r\n\r\nThese parameters can be used for each request (optional): \r\n
  • nif = \"true\" | \"nif-1.0\" \r\nIf the application already has a web service, use this parameter to enable NIF output (this allows developing NIF in parallel to the existing output, legacy support). It is recommended that a client always sends this with each request.
  • format = \"rdfxml\" | \"ntriples\" | \"turtle\" | \"n3\" | \"json\" \r\nThe RDF serialisation format. Standard RDF frameworks generally support all the different formats.
  • prefix = uriprefix \r\nA prefix, which is used to create any URIs. The client should therefore ensure that the prefix is valid when used at the start of a URI, e.g. http://test.de/test#. If input-type=\"nif-owl\" was used, then the server must not change existing URIs and must only use the prefix when creating new URIs. It is recommended that the client matches the prefix to the previously used prefix. If the parameter is missing, it should be substituted by a sensible default (e.g. the web service URI).
  • urirecipe = \"offset\" | \"context-hash\" \r\nThe URI recipe that should be used (default is \"offset\")
  • context-length = integer \r\nIf the given URI recipe is context-hash, the client can determine the length of the context with this parameter. The default must be 10.
  • debug = \"true\" | \"false\" \r\nThis option determines whether additional debug messages and errors should be written within the RDF. See also the next section.
\r\n
\r\n\r\n
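As a non-normative illustration of these parameters, the following Java sketch sends a GET request with URL-encoded parameters to a hypothetical NIF wrapper endpoint and prints the returned RDF.

import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class NifRequestExample {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://example.org/nif-service"; // hypothetical NIF wrapper endpoint
        String text = "My favorite actress is Natalie Portman!";

        // required and optional parameters, URL-encoded as described above
        String query = "input-type=text"
                + "&nif=true"
                + "&format=turtle"
                + "&urirecipe=offset"
                + "&prefix=" + URLEncoder.encode("http://example.org/doc#", "UTF-8")
                + "&input=" + URLEncoder.encode(text, "UTF-8");

        // the component must treat GET and POST the same; GET is used here for brevity
        try (InputStream in = new URL(endpoint + "?" + query).openStream()) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}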

Errors

\r\nFor easier debugging, errors and messages have to be written as RDF output. If a fatal error occurs (i.e. the component was unable to produce any output), the RDF output should contain a message using the underspecified error ontology provided by NLP2RDF: http://nlp2rdf.lod2.eu/schema/error/\r\nErrors can be given any URI or blank node, must be typed as error:Error, must specify whether they are fatal or not and must contain an error message. Here is an example:\r\n
\r\n@prefix error:     .\r\nld:error_1 rdf:type error:Error\r\nld:error_1 error:fatal \"1\"^^xsd:boolean .\r\nld:error_1 error:hasMessage  \"Wrong input parameter ..., could not parse ontology\" .\r\n
\r\n\r\nIf the debug parameter is set, the component is allowed to mix error messages with the RDF that contains the annotations:\r\n
\r\n@prefix error:     .\r\nld:error_1 rdf:type error:Error\r\nld:error_1 error:fatal \"0\"^^xsd:boolean .\r\nld:error_1 error:hasMessage  \" Sentence 5 and 6 had a low confidence value for sentence splitting. \" .\r\nld:error_2 rdf:type error:Error\r\nld:error_2 error:fatal \"0\"^^xsd:boolean .\r\nld:error_2 error:hasMessage  \"Could not add extended output for OLiA.\" .\r\n
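A client may want to inspect the returned RDF for errors before processing any annotations. The following sketch (assuming Apache Jena; the file name and helper method are made up) looks for error:Error resources and fatal flags as described above.

import org.apache.jena.rdf.model.*;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.vocabulary.RDF;

public class NifErrorCheck {
    static final String ERROR = "http://nlp2rdf.lod2.eu/schema/error/";

    // returns true if the model contains an error:Error with error:fatal set to true
    static boolean hasFatalError(Model model) {
        Property fatal = model.createProperty(ERROR, "fatal");
        Property hasMessage = model.createProperty(ERROR, "hasMessage");
        Resource errorClass = model.createResource(ERROR + "Error");
        boolean foundFatal = false;
        ResIterator errors = model.listResourcesWithProperty(RDF.type, errorClass);
        while (errors.hasNext()) {
            Resource error = errors.next();
            Statement message = error.getProperty(hasMessage);
            if (message != null) {
                System.err.println(message.getString());
            }
            Statement fatalStmt = error.getProperty(fatal);
            if (fatalStmt != null && fatalStmt.getBoolean()) {
                foundFatal = true;
            }
        }
        return foundFatal;
    }

    public static void main(String[] args) {
        Model response = RDFDataMgr.loadModel("nif-response.ttl"); // hypothetical service response
        System.out.println(hasFatalError(response) ? "fatal error, no annotations" : "ok");
    }
}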
\r\n\r\n\r\n\r\n" . "title" "Tutorials & Challenges" . "content" "The tutorials for NLP2RDF and NIF come in two flavours. The first tutorial type are simple code snippets and functional help (such as how to use NIF Webservices with Jena). The second tutorial types are called Tutorial Challenges. This means that at first they are NLP challenges posed by the community, but as soon as somebody provides a solution as a blog post, they will become tutorials, as the post should describe how the challenge has been solved.\r\n\r\nIf you think you can provide an interesting task and want to see how others are solving it, you can submit it to this page and then see how other people solved the problem.\r\nPlease have a look at the Get Involved page to see how submission works.\r\n\r\n

Code Snippets

\r\n\r\n\r\n\r\n

Current unsolved challenges

\r\n" . "title" "Get Involved" . "content" "The easiest way to get questions answered or to exchange ideas is the NLP2RDF mailing list. Please send an email to (, subscribe per browser, archives). All feedback is welcome. The mailing list should also keep you in the loop about current discussions. The blog also has an RSS feed for all posts and also for each category.\r\n\r\nNormally, mailing list discussions contain a lot of useful information and thoughts, which are then lost in the mailing archives. This is why, we have a wiki at http://wiki.nlp2rdf.org, where we try to document and collect all important information. If you want to add something to the wiki please sign up. Registration is open." . "title" "Involved People" . "content" "This page lists the involved people and their contributions in the creation and adoption of NIF (this list is never complete, please email , if you are missing).\r\n\r\nWe especially thank all participants in the field test of NIF 1.0:\r\n\r\nWe are adding all other people soon (LOD2, etc)" . "title" "Publications" . "content" "

Publications

\r\n
  1. Sebastian Hellmann and Claus Stadler and Jens Lehmann: The German DBpedia: A Sense Repository for Linking Entities. - Linked Data in Linguistics, 2012. PDF
  2. Sebastian Hellmann and Jens Lehmann and Sören Auer: Linked-Data Aware URI Schemes for Referencing Text Fragments. - EKAW 2012. PDF
  3. Sören Auer and Sebastian Hellmann: The Web of Data: Decentralized, collaborative, interlinked and interoperable. - LREC, European Language Resources Association, 2012. PDF
  4. Giuseppe Rizzo and Raphaël Troncy and Sebastian Hellmann and Martin Bruemmer: NERD meets NIF: Lifting NLP extraction results to the linked data cloud. - LDOW, 5th Workshop on Linked Data on the Web, April 16, 2012, Lyon, France. PDF
  5. Christian Chiarcos and Sebastian Hellmann and Sebastian Nordhoff: Towards a linguistic linked open data cloud: The open linguistics working group. - Traitement automatique des langues (to appear). PDF
  6. Christian Chiarcos: A Generic Formalism to Represent Linguistic Corpora in RDF and OWL/DL. - 8th International Conference on Language Resources and Evaluation (LREC-2012). Istanbul, Turkey, May 2012. PDF
  7. Christian Chiarcos: Ontologies of linguistic annotation: Survey and perspectives. - LREC, European Language Resources Association, 2012. PDF
\r\n
\r\nPublications mentioning NLP2RDF and NIF are collected in the Wiki\r\n

Presentation and Slides

\r\nMissing:\r\nkeynote LREC 2012 by Sören, http://www.lrec-conf.org/lrec2012/?Keynote-Speeches-and-Invited-Talk\r\nLDOW2012 Giuseppe Rizzo http://events.linkeddata.org/ldow2012/\r\n\r\n

EU Deliverables

\r\n\r\n\"\"" . "title" "Demo & Development" . "content" "This page links to all available documentation and demos. The \r\n\r\n\r\n

Links

\r\n
  • There is a web demo and validator available (currently under development)
  • A blog post (with Linked Data) about each known implementation can be found under the category Implementations. Each implementation normally has a demo web service. For example, the full Stanford Core NLP NIF output can be seen here:\r\n\"My favorite actress is Natalie Portman!\"
  • Tutorials (for clients) can be found on the Tutorials & Challenges page.
  • Reusable Java code is available as open source in the . A readme file in the repository explains the basics.
\r\n\r\n\r\n
\r\n\r\n\r\n\r\n\r\n

Checklist

\r\nThe checklist is separated in three parts. The first part states the minimal requirements, which allow basic interoperability. This allows for an initial low-effort integration. Then there is a further list for all requirements of NIF components. Note that there is a third list with things that are optional. Developers should look at this list especially, because it can help them to save some time. This list includes things, which might be provided by a central component or which just produce unnecessary triples.\r\n\r\n

Minimal requirements

\r\nThe NIF wrapper ...\r\n
    \t\r\n
  1. takes text as input over some interface (CLI, Web Service, API)
  2. has implemented the parameter prefix
  3. produces well-formed RDF output (no syntax errors; N3 or RDF/XML recommended)
  4. produces RDF with Offset-based URIs
  5. fulfils the normative requirements in the Specification
  6. fulfils the golden rule of conceptual interoperability
\r\nA minimal sketch of what such a wrapper could emit for points 3 and 4 is given below.\r\n\r\n
\r\n\r\n
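To make the minimal requirements more concrete, here is a small sketch (Java with Jena, not a reference implementation) of the kind of RDF a wrapper could emit for a given prefix and input text. The str: namespace URI and the OffsetBasedString class name are assumptions for illustration only; the normative URIs are defined in the Specification.

// Hedged sketch of a minimal NIF wrapper output (Jena 2.x, com.hp.hpl.jena packages).
// The str: namespace and the OffsetBasedString class name are assumptions here;
// check the NIF 1.0 specification for the normative URIs.
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.RDF;

public class MinimalNifWrapper {

    static final String STR = "http://nlp2rdf.lod2.eu/schema/string/"; // assumed namespace

    public static void main(String[] args) {
        String prefix = "http://example.org/doc.txt#";             // value of the 'prefix' parameter
        String text   = "My favorite actress is Natalie Portman!"; // the input text

        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("str", STR);

        Property anchorOf = model.createProperty(STR, "anchorOf");
        Resource offsetBasedString = model.createResource(STR + "OffsetBasedString");

        // Offset-based URI for the whole input: offset_<begin>_<end>
        Resource whole = model.createResource(prefix + "offset_0_" + text.length());
        whole.addProperty(RDF.type, offsetBasedString);
        whole.addProperty(anchorOf, text);

        // Well-formed RDF output in N3, as recommended by the checklist.
        model.write(System.out, "N3");
    }
}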

Remaining requirements

\r\nThe NIF wrapper ...\r\n
    \r\n \t
  1. provides a web service and complies with the Web service requirements and parameters
  2. has implemented the Context-hash based URI recipe
  3. can read NIF and load it into the internal data structure of the tool (e.g. read tokens for POS taggers, or read POS tags from NIF for NER tools); a small sketch of this is given below
\r\n
\r\n\r\n
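As a hedged illustration of point 3, reading the tokens back out of a NIF model with Jena could look roughly like this. The str: and sso: namespace URIs and the Word class name are assumptions for illustration, not normative definitions.

// Hedged sketch: pull the anchored token strings out of a NIF model.
// The str:/sso: namespace URIs and the Word class are assumptions for illustration.
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.ResIterator;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.vocabulary.RDF;
import java.util.ArrayList;
import java.util.List;

public class NifTokenReader {

    static final String STR = "http://nlp2rdf.lod2.eu/schema/string/"; // assumed
    static final String SSO = "http://nlp2rdf.lod2.eu/schema/sso/";    // assumed

    public static List<String> readTokens(Model nif) {
        Property anchorOf = nif.createProperty(STR, "anchorOf");
        Resource wordClass = nif.createResource(SSO + "Word");
        List<String> tokens = new ArrayList<String>();
        ResIterator words = nif.listResourcesWithProperty(RDF.type, wordClass);
        while (words.hasNext()) {
            Statement anchor = words.nextResource().getProperty(anchorOf);
            if (anchor != null) {
                tokens.add(anchor.getString()); // the surface form of the word
            }
        }
        return tokens;
    }
}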

Optional additions

\r\n
    \r\n\t
  1. the properties sso:nextWord and sso:nextSentence are only necessary for some use cases and can otherwise be omitted; the next/previous properties entail transitive properties, which increases the reasoner workload.
  2. the properties str:beginIndex and str:leftContext duplicate information that is already contained in the URIs, but can be included for easier search.
  3. OLiA classes and their hierarchy are copied as specified and included in the output.
\r\n\r\n\r\n
\r\n\r\n\r\n

Maven

\r\n

Maven Dependencies

\r\nThe libraries are hosted in the AKSW Maven archiva (currently 1.1-SNAPSHOT is recommended):\r\nNIF Library and Ontology Bindings\r\nWeb Service\r\n\r\n

Maven Repository

\r\n
\r\n<repositories>\r\n  <repository>\r\n    <id>maven.aksw.internal</id>\r\n    <name>University Leipzig, AKSW Maven2 Repository</name>\r\n    <url>http://maven.aksw.org/repository/internal</url>\r\n  </repository>\r\n  <repository>\r\n    <id>maven.aksw.snapshots</id>\r\n    <name>University Leipzig, AKSW Maven2 Repository</name>\r\n    <url>http://maven.aksw.org/repository/snapshots</url>\r\n  </repository>\r\n</repositories>\r\n\r\n
\r\n\r\n

Validation of NIF Models

\r\n

OWL Validation

\r\nThe basic OWL syntax can be validated efficiently with these standard tools:\r\n\r\n\r\n\r\n\r\n\r\n" . "title" "NIF 2.0 Draft" . "content" "Improvements upon NIF 1.0 will be collected and incorporated in NIF 2.0.\r\nPlease have a look at the Wiki, where we currently collect and discuss all issues:\r\nhttp://wiki.nlp2rdf.org/wiki/Category:Issues\r\n\r\nIf you want to Get Involved, sign up on the mailing list here:\r\nhttp://nlp2rdf.org/get-involved/\r\n\r\n " . . "implements some features of NIF-1.0, but: GET is missing, 'input-type' is 'type'" . . . . "Entity Linking" . "title" "FOX" . "content" "FOX participated in the initial field test before NIF 1.0 and has not yet been updated.\r\nIt is best to try the online demo of FOX at http://fox.aksw.org\r\n\r\nCurrently, FOX only allows POST so the API can not be called within the browser but only with curl:\r\n\r\ncurl\u00A0 http://139.18.2.164:4444/api -d \"type=TEXT&nif=TRUE&task=NER&output=TURTLE&text=My%20favorite%20actress%20is%20Natalie%20Portman!\"\r\n\r\n" . . "Reference Implementation for NIF 1.0 ." . "Other languages are available with: stemmer=PorterStemmer or stemmer=HungarianStemmer and others from:\u00A0 http://lucene.apache.org/java/2_4_0/api/contrib-snowball/index.html " . . . . "title" "SnowballStemmer" . "content" "According to the Get Involved page each blog post has to start with a short introduction: I created this implementation to provide a reference implementation for NIF 1.0. \r\n\r\n\r\nThe SnowBall libraries provide basic implementations for stemming algorithms for a lot of languages.\u00A0 This NIF implementation encapsulates the stemmer.\r\n\r\n\r\n" . . "None" . "Reference Implementation for NIF 1.0 . Provides lemmas, POS tags and also (experimental) Syntax trees" . . . . "title" "Stanford CoreNLP" . "content" "According to the Get Involved page each blog post has to start with a short introduction: I created this implementation to provide a reference implementation for NIF 1.0. \r\n\r\n\r\nStanfordCore is an NLP tool, that combines lemmatizing, POS-tags, dependency parsers and many more layers. The tool currently only produces NIF output, but might be extended to read NIF input as well. There is a Demo Web service available\r\n\r\n \r\n\r\n" . "title" "Tutorial Challenge: Multilingual Part-Of-Speech Tagger" . "content" "The goal of this challenge is straight-forward: An HTML page has one text area, where you can post a text. The language of the text should be detected and then the following should be highlighted: Verbs should be highlighted in green, Nouns in red, Adjectives in orange and Articles in yellow.The highlighting should work for 5-10 languages of your choice. (The choice of colour is of course not strict, but it has to be the same across languages).\r\n\r\nA mockup can be found here: http://nlp2rdf.lod2.eu/tutorial/mutlilingual-pos/mockup.php\r\n\r\nCode: http://nlp2rdf.lod2.eu/tutorial/mutlilingual-pos/mockup.txt\r\n\r\nSome suggestions of resources that can be used, i.e. you can use anything else:\r\n
    \r\n\t
  • The connection between Stanford CoreNLP and OLiA is currently only implemented for the English pre-trained model and only for the Penn tag set.
  • In some months, KAIST will produce a NIF adapter for a Korean POS tagger.
  • Not all components required for this task have NIF adapters currently.
\r\n
" . "title" "Tutorial Challenge: Semantic Search" . "content" "According to the Get Involved page each blog post has to start with a short introduction: \r\nMy name is Sebastian and I wrote his challenge to give you a rough template for writing your own challenge. Besides I think, that the problem can be easily solved with NIF and it is a good showcase. \r\n\r\nThe goal of this challenge is to create a Semantic Search. In this context this means the following.\r\n\r\nFor a given text (see below) a user gets a search form and can enter one or several search terms. The search shall return all sentences that have \"something to do\" with the search term. Additional information should also be shown.\r\n\r\nMost of the following requirements should be met:\r\n
    \r\n\t
  • Synonyms should be included, i.e. searching for \"USA\" returns sentences with \"United States\".
  • Some form of normalisation (stemming, lemmatising, stopword removal) should be applied.
  • DBpedia instances that are in the text and match the search should be shown. They can also be shown to disambiguate the search, e.g. when searching for \"Bush\" or \"Madonna\".
  • Instances that are related or similar to the found DBpedia instances and that also occur in the same text should be shown, e.g. Barack Obama is related to United States.
\r\nA minimal keyword-matching sketch (the non-semantic baseline) is given below.\r\n
\r\n
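As a starting point only, here is a hedged sketch of that non-semantic baseline: matching sentences of a NIF model by substring. The sso:Sentence class and the namespace URIs are assumptions for illustration; synonyms, normalisation and the DBpedia-related requirements above still have to be layered on top.

// Hedged sketch of the naive baseline: return every sentence whose anchored
// text contains the search term. Namespaces and the Sentence class are assumed.
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.ResIterator;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.vocabulary.RDF;
import java.util.ArrayList;
import java.util.List;

public class NaiveSentenceSearch {

    static final String STR = "http://nlp2rdf.lod2.eu/schema/string/"; // assumed
    static final String SSO = "http://nlp2rdf.lod2.eu/schema/sso/";    // assumed

    public static List<String> search(Model nif, String term) {
        Property anchorOf = nif.createProperty(STR, "anchorOf");
        List<String> hits = new ArrayList<String>();
        ResIterator sentences =
            nif.listResourcesWithProperty(RDF.type, nif.createResource(SSO + "Sentence"));
        while (sentences.hasNext()) {
            Statement anchor = sentences.nextResource().getProperty(anchorOf);
            if (anchor != null
                    && anchor.getString().toLowerCase().contains(term.toLowerCase())) {
                hits.add(anchor.getString());
            }
        }
        return hits;
    }
}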

Given text

\r\nThis text should be used: http://nlp2rdf.lod2.eu/tutorial/semantic-search/search_text.txt\r\n

Mockup

\r\nA static mockup, where only \"USA\" can be searched, can be found here:\r\nhttp://nlp2rdf.lod2.eu/tutorial/semantic-search/mockup.php\r\nCode:\r\nhttp://nlp2rdf.lod2.eu/tutorial/semantic-search/mockup.txt\r\n\r\n\r\nSome suggestions of resources that can be used (you can also use anything else).\r\n\r\n " . "title" "Tutorial Challenge: Semantic Yellow Pages" . "content" "According to the Get Involved page each blog post has to start with a short introduction: \r\nHello, I am Konrad H\u00F6ffner and I am a student of computer science at the University of Leipzig. I love living in the future, but one of the things that I am still dissatisfied with is yellow pages. They are nationally limited, pestered by advertisements (and I *hate* ads), don't understand synonyms, are only indexed in the language of the country of origin and/or are generally dumb (try searching for \"delicious pizza nearby\" in Google Maps). Fortunately I think the Semantic Web is the technology that can alleviate this nuisance and it's only *you* who can save the world!\r\n\r\n

Challenge

\r\nYour goal is to create a Semantic Yellow Pages Search for LinkedGeoData. In a simple HTML search form a user can enter keywords or a search sentence. From the search input the following needs to be extracted:\r\n
    \r\n\t
  1. a location
  2. an amenity
  3. (optional) a restriction or filter condition
\r\n
\r\nThis information is then used to construct SPARQL queries against the LinkedGeoData SPARQL endpoint (or another knowledge base if you like) and to present the result to the user. Note that if no location can be found in the search string, the user's current location should be used instead. The position can be determined via the HTML5 geolocation feature (or given in another input text field for testing).\r\n

Example 1

\r\n\"I am looking for a optician in Paris.\" Here the location is the city of lgdo:Paris and the amenity is lgdo:Optician. There is no restriction or filter in this example. The city Paris only has a single geo point (her center). Since it is a lgdo:City a radius of 5 km is appropriate. Here is an example SPARQL query(click here to see the result):\r\n
# Note: the URIs in angle brackets were lost in conversion; the prefix, graph and resource URIs below are plausible reconstructions, not the original values.\r\nPrefix lgd: <http://linkedgeodata.org/triplify/>\r\nPrefix lgdo: <http://linkedgeodata.org/ontology/>\r\nSelect ?optician ?name ?opticiangeo from <http://linkedgeodata.org> {\r\n   ?paris owl:sameAs <http://dbpedia.org/resource/Paris> .\r\n   ?paris geo:geometry ?parisgeo .\r\n   ?optician a lgdo:Optician .\r\n   OPTIONAL {?optician rdfs:label ?name . }\r\n   ?optician geo:geometry ?opticiangeo .\r\n   Filter(bif:st_intersects(?parisgeo, ?opticiangeo, 5))\r\n}
\r\nAs an additional challenge you can adjust the radius not only for the type of the location (city:5km, country:100km), but also for the searched amenity (lgdo:Optician 5 km, lgdo:Toilets 0.4 km)\r\n

Example 2

\r\n\"cheap restaurant\": Here the task is to find cheap restaurants near the position of the user. The amenity in this case is lgdo:Restaurant. The restriction is hard to extract here and could best be translated to \"below a certain price point\" which even then still requires the application to a) find out the restaurant's prices and b) determine where that price lies (e.g. below the median or at least one standard deviation to the left of the average). Because the restriction handling is quite challenging, it is ok if you don't implement restrictions or only do it for basic cases like \"within 500 m\". If a restriction is present and the results are shown as a list, the results should be ordered according to the restriction criterion, e.g. for \"within 500 m\" they should be ordered by distance, ascending.\r\n\r\n

Requirements

\r\nMost of the following requirements should be met:\r\n
    \r\n\t
  • Synonyms should be included, i.e. searching for \"tooth doctor\" returns the same result as \"dentist\".
  • Other languages should be included, i.e. searching for \"Zahnarzt\" returns the same result as \"dentist\".
  • Search results should be shown as a table. The geo position and the name of the amenities should be shown along with their relevant properties (distance, opening times, etc.).
\r\n
\r\n\r\n

Suggested resources

\r\nSome suggestions of resources that can be used (you can also use anything else).\r\n" . "title" "Tutorial: How to call a NIF web service with your favorite SemWeb library" . "content" "The parameters for NIF 1.0 can be found in the Parameter Section of the spec.\r\nBelow are example code snippets for several client-side implementations. The result is always a combined RDF model of two NIF services.\r\n\r\n\r\n\r\n

curl

\r\nNote that there currently is no \"best\" RDF merge tool for the command line so we will use Jena CLI.\r\n\r\n\r\n# query snowball demo webservice\r\ncurl \"http://nlp2rdf.lod2.eu/demo/NIFStemmer?input=My%20favorite%20actress%20is%20Natalie%20Portman!&input-type=text&nif=true\" > snowball.owl\r\n# query stanford demo webservice\r\ncurl \"http://nlp2rdf.lod2.eu/demo/NIFStanfordCore?input=My%20favorite%20actress%20is%20Natalie%20Portman!&input-type=text&nif=true\" > stanford.owl\r\n#combine with Jena rdfcat\r\nrdfcat -x snowball.owl stanford.owl > combined.owl\r\n\r\n\r\n\r\n\r\n

Jena

\r\nSee http://jena.sourceforge.net\r\n
\r\n\r\n// Required imports:\r\n// import java.io.BufferedReader;\r\n// import java.io.InputStreamReader;\r\n// import java.net.URL;\r\n// import java.net.URLEncoder;\r\n// import com.hp.hpl.jena.rdf.model.Model;\r\n// import com.hp.hpl.jena.rdf.model.ModelFactory;\r\n\r\nModel model = ModelFactory.createDefaultModel();\r\nString text = \"My favorite actress is Natalie Portman!\";\r\n// Build the query string with the NIF 1.0 parameters (input, input-type, nif)\r\nStringBuilder p = new StringBuilder();\r\np.append(\"?input=\");\r\np.append(URLEncoder.encode(text, \"UTF-8\"));\r\np.append(\"&input-type=text\");\r\np.append(\"&nif=true\");\r\nURL stemmer = new URL(\"http://nlp2rdf.lod2.eu/demo/NIFStemmer\" + p.toString());\r\nURL stanford = new URL(\"http://nlp2rdf.lod2.eu/demo/NIFStanfordCore\" + p.toString());\r\n// Read both RDF/XML responses into the same model, i.e. merge them\r\nmodel.read(\r\n   new BufferedReader(new InputStreamReader(stemmer.openConnection().getInputStream())), null);\r\nmodel.read(\r\n   new BufferedReader(new InputStreamReader(stanford.openConnection().getInputStream())), null);\r\n\r\n
\r\n\r\n
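The merged model can then be written out again to check the result (N3 here; any serialization supported by Jena works):

// Serialize the combined model from both NIF services to standard output.
model.write(System.out, "N3");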

ARC2

\r\nSee http://arc.semsol.org. This is also the code used in the demo.\r\n\r\n
\r\ninclude_once(\"ARC2.php\"); // ARC2 library\r\n$text = \"My favorite actress is Natalie Portman!\"; // example input\r\n\r\n// Query both demo web services and parse their RDF/XML responses\r\n$stemmer = \"http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&input=\".urlencode($text);\r\n$parser = ARC2::getRDFXMLParser();\r\n$parser->parse($stemmer);\r\n$stemmertriples = $parser->getTriples();\r\n$stanford = \"http://nlp2rdf.lod2.eu/demo/NIFStanfordCore?input-type=text&nif=true&input=\".urlencode($text);\r\n$parser = ARC2::getRDFXMLParser();\r\n$parser->parse($stanford);\r\n$stanfordtriples = $parser->getTriples();\r\n// Merge the two triple sets and serialize the result as Turtle\r\n$alltriples = array_merge($stanfordtriples, $stemmertriples);\r\n$ser = ARC2::getTurtleSerializer();\r\n$output = $ser->getSerializedTriples($alltriples);\r\necho $output;\r\n
\r\n" . . "None" . "NIF 1.0 compliant without RDF/XML input and given error handling." . . . . "title" "MontyLingua" . "content" "My name is Marcus Nitzschke and I'm studying computer science at the University of Leipzig. This implementation was written as the practical course of the lecture \"Software aus Komponenten\" in autumn 2011. Generally I chose this topic because I'm interested in the techniques of the Semantic Web and in detail because the connection of these techniques and NLP applications meant a new experience to me.\r\n\r\nDue to the website, \"MontyLingua is a free, commonsense-enriched, end-to-end natural language understander for English\". The commonsense-enriched part let MontyLingua differ from various other NLP tools. MontyLingua combines a Tokenizer, Part-of-speech Tagger, Extractor, Lemmatiser and a so called NLGenerator, which generates naturalistic English sentences and text summaries.\r\n\r\nBecause MontyLingua is written in Python this is one of the first non-Java wrapper for NLP2RDF (Monty also provides a Java binary, but Python is more fun :)). The wrapper currently implements the Part-of-speech Tagger component of MontyLingua. For future work it would be interesting to extract informations of word relationships which are provided by MontyLingua.\r\n
" . . "None." . "NIF 1.0 compliant without RDF/XML input; JSONP output" . . . . "title" "DBpedia Spotlight" . "content" "My name is Robert Schulze and like Marcus I'm studying computer science at the University of Leipzig. For the practical course of the lecture \"Software aus Komponente\" I created a wrapper for the DBpedia Spotlight web service, that\u00A0generates NIF output.\r\n\r\nThe wrapper uses the Spotlight annotation endpoint to find named entities in a given text input. It's implemented in Node.js, runs as a web service itself and fulfills all normative and (almost) all\u00A0interface requirements given by the NIF-1.0 specification. Please have a look at the projects README\u00A0for a detailed overview. Additionally to the specification I added JSONP as a output format. JSONP output allows JavaScript developers to create client side\u00A0software on top of my implementation.\r\n\r\nFor future development I would like to support N-Triples, \u00A0Turtle and N3 as output formats. Because to the best of my knowledge there is no RDF framework/library/tool written in JavaScript or for Node.js, that supports transformations between these formats and RDF/XML, this is not the easiest goal to achieve. Furthermore I think it would be convenient to have a small reference implementation that uses the JSONP output.\r\n\r\n" . "title" "Gate ANNIE" . "content" " \r\n\r\nMy name is Didier Cehrix and like\u00C2\u00A0Marcus\u00C2\u00A0and Robert\u00C2\u00A0I\u00E2\u0080\u0099m studying computer science at the University of Leipzig. For the practical part of the lecture \u00E2\u0080\u009CSoftware aus Komponente\u00E2\u0080\u009D I created a wrapper for the NLP Software Gate, especially for the ANNIE plugin.\r\n\r\nThe wrapper use the embedded version of Gate. The ANNIE plugin is ein Part Of Speach tagger. For the NIF output must the Gate document format be converted. The Gate document use a tree and this tree must be traversed and so the RDF output generated.\r\n\r\nFor future work I will to implement a document converter from NIF to Gate. This enable to use NIF as input in Gate.\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n
Homepage: Gate
Additional parameter: None
Webservice url: NIFgateAnnie
Demo: Hello World
Code: Bitbucket
" . "title" "NIF Roadmap 2012 and pointers" . "content" "Just a repost of an email I wrote to the Stanbol Dev mailing list. See here for the discussion.\r\n\r\nBelow is a copy of the email:\r\nLast year, we have been working on the NLP Interchange Format (NIF).\r\nNIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.\r\n\r\nWhat NIF currently is:\r\n1. In Sept. 2011, we published the specification 1.0: http://nlp2rdf.org/nif-1-0 . There are about 8-12 implementations (see demo at 5.) out there, we know of.\r\n2. One of the latest draft papers about it can be found here: http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf\r\n3. Basic idea is to use # fragments to give URIs to Strings, e.g.:\r\n http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 represents the first occurence of \"Semantic Web\" in http://www.w3.org/DesignIssues/LinkedData.html\r\nOf course, you can then use this URI as subject and add any annotation you want.\r\ne.g.:\r\n:offset_717_729 its:mentions dbpedia:Semantic_Web .\r\n4. There is a Web annotator making use of the Hash URI scheme or NIF:\r\nhttp://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A//www.w3.org/DesignIssues/LinkedData.html%23frag_65b9eea6e1cc6bb9f0cd2a47751a186f\r\n5. There is a demonstrator (will be much nicer in a couple of days): http://nlp2rdf.lod2.eu/demo.php\r\nwith eye candy, but minor bug: http://nlp2rdf.lod2.eu/demo_new.php\r\n6. Apart from that NIF also tries to find best practices for annotation. E.g. OLiA idenitifers for Part of Speech tags http://www.sfb632.uni-potsdam.de/~chiarcos/ontologies.xml or NERD or the lemon model.\r\n\r\nWhat is planned for NIF:\r\na) A new spec NIF 2.0 within this year. Discussion will be on this mailing list: http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf\r\nNIF will be simplified (simpler URI Schemes and annotations), consolidated (Better implementations) and extended (ability to express confidence value and string sets, etc. )\r\nb) We plan to have implementations for NERD http://nerd.eurecom.fr , DBpedia Spotlight, Zemanta.com and DKPro http://www.ukp.tu-darmstadt.de/research/current-projects/dkpro/\r\nc) Inclusion of XPointer as NIF URI Scheme and creation of a mapping to \"string uris\". This should somehow be compatible with the Internationalisation Tag Set (ITS) 2.0 http://www.w3.org/TR/its20/ , but we are still working together on a bidirectional bridge. There have been a plethora of discussion partly at this thread: http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html\r\nd) NIF should be compatible with PROV-AQ: Provenance Access and Query http://www.w3.org/TR/2012/WD-prov-aq-20120619/\r\n\r\nWhat I am hoping for or my ideas about how Stanbol and NIF overlap:\r\nI) Reading your docu, you guys seem to be able to provide very good use cases and feedback for NIF 2.0 . We would really like to include that and also tailor NIF 2.0 to your needs. We are currently setting up a Wiki - still ugly sorry: http://wiki.nlp2rdf.org/ Please mail me for accounts.\r\nII) I would assume, that you need some OWL model for all the enhancer output. NIF standardizes NLP tool output and it tries to be blank-node free and lightweight, but still as expressive as possible. So for you this would mean that you could really save time, as ontology modelling is really tedious. 
By reusing NIF you would get a free data model and spec and you could focus on the implementation of the Stanbol engine. I got a 404 on http://incubator.apache.org/enhancer/enhancementstructure.html\r\nI read \"fise\" somewhere. What is it? How does it compare to NIF? What URIs do you use? How many triples do you have per annotation?\r\nIII) With NIF we focused on the RDF output for tools, not on the workflow. Stanbol seems to focus on the workflow as well, right? It might be easy to implement a NIF engine with Stanbol. This could be a good showcase for NIF and Stanbol. With a Debian package, we could include Stanbol into the LOD2 Stack http://stack.lod2.eu/" .