"content" "Improvements upon NIF 1.0 will be collected and incorporated in NIF 2.0. \r\nPlease have a look at the Wiki, where we currently collect and discuss all issues:\r\nhttp://wiki.nlp2rdf.org/wiki/Issues\r\n\r\nIf you want to Get Involved, sign up on the mailing list here:\r\nhttp://nlp2rdf.org/get-involved/\r\n\r\n\r\nOpen issues:\r\nOntology Versioning\r\nAlthough NIF 1.0 is specified the used ontologies can still change as they are not versioned as of now. \r\n\r\nShould information already encoded in the URI be duplicated?\r\nThe URIs already contain all information to uniquely find the string and also calculate things such as begin and index and also the inclusion of str:anchorOf duplicates the reference string and might lead to scalability problems.\r\nOn the other hand many things are already duplicated and can even be simplified further and/or removed such as the class str:OffsetbasedString, the property str:anchorOf or the substring relations are useful for iterating the model, but are actually redundant. \r\n\r\nShould the client or the server implement the conceptual interoperability?\r\nFor each tool the interoperability to one reference ontology has only to be implemented once, while on the other hand many clients would have to implement it. June 2011, Christian Chiarcos, Modelling linguistic corpora and their annotations with OWL/DL – Link to slides 
August 2011, Sebastian Hellmann, at Korea-Germany Joint Workshop for LOD2 Development and Application, link to slides

September 2011, Sebastian Hellmann, at Indian-summer school on Linked Data, link to slides
	September 2011, Sebastian Hellmann, at W3C Workshop Program: A Local Focus for the Multilingual Web, Limerick, link to slides

October 2011, Sebastian Hellmann, at Invited Talk at the Multilingual Semantic Web Workshop, ISWC 2012, link to slides

EU Deliverables

	 Sebastian Hellmann 2011, LOD2 EU Deliverable D 3.2.1 – NLP2RDF This page lists the involved people and their contributions in the creation and adoption of NIF.


	Konrad Höffner contributed a Tutorial Challenge: Semantic Yellow Pages 
 Christian Chiarcos maintains the Ontologies of Linguistic Annotation
 Giuseppe Rizzo and Raphael Troncy maintain the NERD Ontology
Markus Ackermann created the Topic Ontology for Mallet



We especially thank all participants in the field test of NIF 1.0:

	Markus Ackermann for Mallet
	Martin Brümmer for OpenNLP and for UIMA 
	Didier Cherix for Gate Annie 
	Marcus Nitzschke for MontyLingua (Python) 
	Robert Schulze for DBpedia Spotlight (node-js)
The easiest way to get questions answered or to exchange ideas is the NLP2RDF mailing list. Please send an email to nlp2rdf@lists.informatik.uni-leipzig.de (subscribe per email, subscribe per browser, archives). All feedback is welcome. The mailing list also keeps you in the loop about current discussions. The blog has an RSS feed for all posts and one for each category.


Normally, mailing list discussions contain a lot of useful information and thoughts, which is then lost in the archives. This is why, we have a wiki at http://wiki.nlp2rdf.org, where we try to document all important information. If you want to add something to the wiki please sign up. Registration is open: The only corrections will either be clarifications, improving the readability of the text or spelling mistakes or additional NLP domain vocabularies. Major changes will be collected on the NIF 2.0 Draft page and included in the next version of NIF.\r\n\r\nThe NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.\r\n\r\nThe core of NIF consists of a vocabulary, which can represent Strings as RDF resources. A special URI Design is used to pinpoint annotations to a part of a document. These URIs can then be used to attach arbitrary annotations to the respective character sequence. Employing these URIs, annotations can be published on the Web as Linked Data and interchanged between different NLP tools and applications. \r\n\r\n\r\nNIF 1.0 in a nutshell\r\nNIF consists of the following three components:\r\n\r\n\tStructural Interoperability : URI recipes are used to anchor annotations in documents with the help of fragment identifiers. The URI recipes are complemented by two ontologies (String Ontology and Structured Sentence Ontology), which are used to describe the basic types of these URIs (i.e. String, Document, Word, Sentence) as well as the relations between them (subString, superString, nextWord, previousWord, etc.).\r\n\tConceptual Interoperability: The Structured Sentence Ontology (SSO) was especially developed to connect existing ontologies with the String Ontology and thus attach common annotations to the text fragment URIs. 
The NIF ontology can easily be extended and integrates several NLP ontologies such as OLiA for the morpho-syntactical NLP domains, the SCMS Vocabulary and DBpedia for entity linking, as well as the NERD Ontology (below for details on the ontologies).\r\n\tAccess Interoperability: A REST interface description for NIF components and web services allows NLP tools to interact on a programmatic level.\r\n\r\nNIF-1.0 is stable and can be implemented. The experience and feedback collected during this implementation will be collected as NIF-2.0-draft. This specification is complemented by the information in this Wordpress CMS, which contains documentation on how to integrate NLP tools and adapt them to NIF. Also reference implementations are available on the NLP2RDF Google code project and as blog posts in the Implementations category. A simple demo allows to test the web services\r\n An overview of the architecture can be found in the next section.\r\n\r\n\r\n\r\n\r\nArchitecture Overview\r\n\r\n\r\nStructural Interoperability\r\nStructural Interoperability is concerned with how the RDF is structured to represent annotations.\r\nNIF-1.0 URI Recipes\r\nNIF-1.0 currently supports 2 recipes: one offset-based and one hash-based. The principal task of the presented URI recipes is to address a part of a document and assign a URI to it, that can serve as an annotation anchor. A URI assigned in this way must be unique and it should be possible to identify the substring in the document with the information contained in the URI.\r\n\r\nNIF specifies how to create an identifier for uniquely locating arbitrary substrings in a document.\r\nRunning Example. In July 2011, the website http://www.w3.org/DesignIssues/LinkedData.html was a html document consisting of 25482 characters. The term \u00E2\u0080\u009CSemantic Web\u00E2\u0080\u009D occurs 9 times at different position. 
If we want to annotate the substring \u00E2\u0080\u009CSemantic Web\u00E2\u0080\u009D in the section \u00E2\u0080\u009CThe four rules\u00E2\u0080\u009D in the sentence \u00E2\u0080\u009CIf it doesn\u00E2\u0080\u0099t use the universal URI set of symbols, we don\u00E2\u0080\u0099t call it Semantic Web.\u00E2\u0080\u009D, a fragment identifier such as #Semantic%20Web would definitely be insufficient and lack the necessary information to distinguish the different occurrences. It would not be unique.\r\nGeneral\r\nA NIF-1.0 URI is made up of a prefix and the actual identifier. To annotate web resources it is straight-forward to use the existing URI as the basis for the prefix. NIF-1.0 does not dictate the use of the \u00E2\u0080\u0098/\u00E2\u0080\u0099 or the \u00E2\u0080\u0098#\u00E2\u0080\u0099 and considers it part of the prefix. The following guidelines should be met for NIF-1.0 URI recipes:\r\n\r\n\tAs NIF is designed to have a client server architecture, the client must be able to dictate the prefix to the server. All components must therefore have a parameter prefix which determines how the produced NIF URIs will start.\r\n\tThe NIF-URIs should be produced by a concatenation of the prefix and the identifier.\r\n\tThe NIF component must use the prefix of the client transparently without any corrections.\r\n\t(non-normative) For practical reasons, it is recommended that the client uses a prefix that ends on # or /\r\n\r\nRunning Example. Recommended prefixes for http://www.w3.org/DesignIssues/LinkedData.html are:\r\n\r\n\tOption 1: http://www.w3.org/DesignIssues/LinkedData.html/\r\n\tOption 2: http://www.w3.org/DesignIssues/LinkedData.html#\r\n\r\nNIF Recipe: Offset-based URIs\r\nNOTE: This recipe is compatible with the position and range definition of RFC 5147 (Especially Section 2.1.1 ) and builds upon it in terms of encoding and counting character positions. The syntax and everything else is different though. 
See the design choice section for a discussion.\r\nThe Offset-based URIs are constructed from 4 parts separated by an underscore \"_\":\r\n\r\n\tan identifier, in this case the string \"offset\"\r\n\tthe begin index\r\n\tthe end index (the indexes count the gaps between characters, starting with 0)\r\n\ta human readable part consisting of the first 20 (or fewer, if the string is shorter) characters of the addressed string, urlencoded according to RFC 3986 (using '%20' instead of '+' for white space).\r\n\r\n\r\nRunning Example. \"http://www.w3.org/DesignIssues/LinkedData.html#\" serves as the prefix. The rest is \"offset_14406_14418_Semantic%20Web\". The human readable part should match the output produced by the following Java code:\r\n\r\nString first20Chars = (anchoredPart.length() > 20) ? 
anchoredPart.substring(0, 20) : anchoredPart;\r\nString prefix = \"http://www.w3.org/DesignIssues/LinkedData.html#\";\r\n// variant A: java.net.URLEncoder produces form encoding, so '+' is rewritten to '%20'\r\nString uriA = prefix + URLEncoder.encode(first20Chars, \"UTF-8\").replace(\"+\", \"%20\");\r\n// variant B: java.net.URI with the arguments scheme, host, path, fragment\r\nString uriB = new URI(\"http\", \"www.w3.org\", \"/DesignIssues/LinkedData.html\", first20Chars).toASCIIString();\r\n\r\nor by this PHP method:\r\n\r\n$prefix = \"http://www.w3.org/DesignIssues/LinkedData.html#\";\r\n$uri = $prefix . rawurlencode(substr($anchoredPart, 0, 20));\r\n\r\nand together the final URI will look like this:\r\n\r\nhttp://www.w3.org/DesignIssues/LinkedData.html#offset_14406_14418_Semantic%20Web\r\n\r\nNIF Recipe: Context-Hash-based URIs\r\nThe greatest disadvantage of the offset-based recipe is that it is not stable w.r.t. changes in the document. \r\nIf the document changes (insertion or deletion), all offset-based NIF-URIs after the position where the change occurred become invalid. The hash-based recipe is designed to remain more robust against insertion and deletion. Some additional implications of the Context-Hash-based URIs can be found in the Design Choice Section below. The hash-based URIs are constructed from 5 parts separated by an underscore \"_\":\r\n\r\n\tan identifier, in this case the string \"hash\"\r\n\tthe context length (number of characters to the left and right used in the message for the hash-digest)\r\n\tthe overall length of the addressed string\r\n\tthe message digest, a 32 character HEXDIGIT md5 hash created of the string and the context. The message M consists of a certain number C of characters (see 2. 
context length above) to the left of the string, a bracket \u00E2\u0080\u0098(\u00E2\u0080\u0098, the string itself, another bracket \u00E2\u0080\u0098)\u00E2\u0080\u0099 and C characters to the right of the string: \u00E2\u0080\u009CleftContext(String)rightContext\u00E2\u0080\u009D\r\n\ta human readable part consisting of the first 20 (or less, if the string is shorter) characters of the addressed string, urlencoded according to RFC 3986 (using \u00E2\u0080\u0098%20\u00E2\u0080\u00B2 instead of \u00E2\u0080\u0098+\u00E2\u0080\u0099 for white space).\r\n\r\nRunning Example. This example uses a context length of 4, the digest therefore is:\r\nmd5(\u00E2\u0080\u009D it (Semantic Web)." . "title" "" . "content" "NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF) . NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. . The output of NLP tools can be converted into RDF and used in the LOD2 Stack. Currently a NIF-1.0 Version is created on this website (Go to NIF-1.0 Specification). Implementation of NIF 1.0 is progressing and a draft for NIF 2.0 will be refined based on the experience gained during the field test of NIF-1.0.\r\n\r\nThis project is open for anyone to join and there are several ways to get involved. For slides and further reading please have a look at the publications page. \r\n\r\nRationale\r\nThe original motivation of creating NIF was quite simple. In order to integrate NLP tools into the LOD2 stack they were required to produce RDF. Instead of writing an individual RDF wrapper for each tool, it made perfectly sense to create a common format, which was expressive enough to potentially cover all NLP tools. Furthermore instead of creating a conceptual mapping between the output of the tools, several linguistic ontologies already existed and could be reused to unify tag sets and other NLP dialects. 
Although NIF is being generalized to provide additional benefits the main rationale is the integration of NLP tools into the LOD2 stack.\r\n\r\nNIF Use Cases\r\nNIF aims to improve upon certain disadvantages commonly found in NLP frameworks, processes and tools. Its basic advantages are the way how annotations are represented and what kind of annotations are used. These two aspects combined provide structural interoperability as well a conceptual interoperability.\r\n\r\nHere are some claims, which we believe can be made about NIF (i.e. we have not yet found evidence that indicate otherwise respective technical feasibility):\r\n\r\n\tNIF provides global interoperability. If an NLP tool incorporates a NIF parser and a NIF serializer, it is compatible with all other tools, which implement NIF.\r\n\tNIF achieves this interoperability by using and defining a most common denominator for annotations. This means that some standard annotations are required to be used. On the other hand NIF is flexible and allows the NLP tools to add any extra annotations at will.\r\n\tNIF allows to create tool chains without a large amount of up-front development work. As the output of each tool is compatible, you can try and test really fast, whether the tools you selected actually produce what you need to solve a certain task.\r\n\tAs NIF is based on RDF/OWL, you can choose from a broad range of tools and technologies to work with it:\r\n\r\n\tRDF makes data integration easy: URIs, LinkedData\r\n\tOWL is based on Description Logics (Types, Type inheritance)\r\n\tAvailability of open data sets (access and licence)\r\n\tReusability of Vocabularies and Ontologies\r\n\tDiverse serializations for annotations: XML, Turtle,\r\nRDFa+XHTML\r\n\tScalable tool support (Databases, Reasoning)\r\n\tData is flexible and can be queried / transformed in many ways\r\n\r\n\r\n\r\nComparison to UIMA and Gate\r\nNIF is almost completely orthogonal to frameworks such as Gate and UIMA. 
Per definitionem it is a format that represents NLP output, while Gate and UIMA are software frameworks for NLP. Here is a rough guideline, when to use NIF and when not:\r\n\r\nUse UIMA and Gate, when:\r\n\r\n\tYou need to annotate a really high amount of text on a daily basis.\r\n\tYou already know, which tools and annotations you need and there are already adapters and plugins for UIMA or Gate .\r\n\tYou want to solve few specialized task, such as identifying keywords or find certain facts. For this you are planning one custom application and you do not have any additional requirements for RDF or interoperability.\r\n\r\nUse NIF, when:\r\n\r\n You are using the LOD2 Stack\r\n The rest of your data is already in RDF\r\n You want to query your text documents with SPARQL\r\n\tYou are not sure which tools to use and want to first try them and test the results.\r\n\tYou have a fixed text collection (or a low daily throughput) and want to unlock the implicit meaning. The text can be processed once, saved as RDF and then transformed easily or queried in a triple store.\r\n\tYou need annotations for several languages (multilingualism) in a uniform way\r\n\t\r\n\r\nDefinitely refrain from trying to build a scalable application that uses RDF/OWL as an internal data format. RDF and OWL are great for flexibility, reasoning and data integration, but NOT performance.\r\n\r\nRather consider using UIMA and Gate and then serialise the output as NIF.\r\n\r\nFurthermore NLP2RDF provides:\r\n\r\n\tdocumentation\r\n\treference implementations of NIF\r\n\tcollaboration platform\r\n\ttutorials / example source code\r\n\tmailing list for questions and support\r\n\tpossible to join on http://nlp2rdf.org\r\n\r\n" . "title" "Documentation (Wiki)" . "content" "" . "title" "About" . "content" "NLP2RDF is a LOD2 Community project that is developing the NLP Interchange Format (NIF) . Code Snippets


Current unsolved challenges

  1. Sebastian Hellmann and Claus Stadler and Jens Lehmann: The German DBpedia: A Sense Repository for Linking Entities. - Linked Data in Linguistics, 2012.
  2. Sebastian Hellmann and Jens Lehmann and S\u00F6ren Auer: Linked-Data Aware URI Schemes for Referencing Text Fragments. - EKAW 2012.
  3. S\u00F6ren Auer and Sebastian Hellmann: The Web of Data: Decentralized, collaborative, interlinked and interoperable. - LREC, European Language Resources Association, 2012.
  4. Giuseppe Rizzo and Rapha\u00EBl Troncy and Sebastian Hellmann and Martin Bruemmer: NERD meets NIF: Lifting NLP extraction results to the linked data cloud. - LDOW, 5th Workshop on Linked Data on the Web, April 16, 2012, Lyon, France.
  5. Christian Chiarcos and Sebastian Hellmann and Sebastian Nordhoff: Towards a linguistic linked open data cloud: The open linguistics working group. - Traitement automatique des langues (to appear).
  6. Christian Chiarcos: A Generic Formalism to Represent Linguistic Corpora in RDF and OWL/DL. - 8th International Conference on Language Resources and Evaluation (LREC-2012). Istanbul, Turkey, May 2012.
  7. Christian Chiarcos: Ontologies of linguistic annotation: Survey and perspectives. - LREC, European Language Resources Association, 2012.
\r\nPublications mentioning NLP2RDF and NIF are collected in the Wiki\r\n

Presentation and Slides

\r\nMissing:\r\nkeynote LREC 2012 by S\u00F6ren, http://www.lrec-conf.org/lrec2012/?Keynote-Speeches-and-Invited-Talk\r\nLDOW2012 Giuseppe Rizzo http://events.linkeddata.org/ldow2012/\r\n\r\n

  • There is a web demo and validator available (currently under development).
  • A blog post (with Linked Data) about each known implementation can be found under the Category Implementations. Each implementation normally has a demo web service. For example, the full Stanford Core NLP NIF output can be seen here:\r\n\"My favorite actress is Natalie Portman!\"
  • Tutorials (for clients) can be found on the Tutorials & Challenges page.
  • Reusable Java code is available as open source. A readme file in the repository explains the basics.
Minimal requirements

\r\nThe NIF wrapper ...\r\n
  1. takes text as input over some interface (CLI, Web Service, API)
  2. has implemented the parameter prefix
  3. produces well-formed RDF output (no syntax errors, N3 or RDF/XML recommended)
  4. produces RDF with Offset-based URIs
  5. fulfils the normative requirements in the Specification
  6. fulfils the golden rule of conceptual interoperability
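The prefix parameter and the Offset-based URIs required above can be sketched in Java as follows. This is a minimal sketch and not the reference implementation: the class and method names are hypothetical, the example prefix is made up, and `URLEncoder` (here with the Java 10+ `Charset` overload) only approximates RFC 3986 percent-encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the NLP2RDF code base.
final class OffsetUriSketch {

    /** Percent-encodes the first 20 characters (or fewer) of the anchored string. */
    static String humanReadablePart(String anchored) {
        String first20 = anchored.length() > 20 ? anchored.substring(0, 20) : anchored;
        // URLEncoder emits form encoding: spaces become '+', a literal '+' becomes %2B.
        // Rewriting '+' to %20 therefore only touches encoded spaces.
        return URLEncoder.encode(first20, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Concatenates the client-supplied prefix with offset_begin_end_readablePart. */
    static String offsetUri(String prefix, String document, int begin, int end) {
        String anchored = document.substring(begin, end);
        return prefix + "offset_" + begin + "_" + end + "_" + humanReadablePart(anchored);
    }

    public static void main(String[] args) {
        String document = "We call it Semantic Web.";
        int begin = document.indexOf("Semantic Web");
        // The prefix is used exactly as given, without any correction.
        System.out.println(offsetUri("http://example.org/doc#", document, begin, begin + 12));
    }
}
```

The prefix is passed through untouched, so a client that wants a `#` or `/` separator has to include it in the prefix itself.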

Remaining requirements

\r\nThe NIF wrapper ...\r\n
    \r\n \t
  1. provides a web service and complies with the Web service requirements and parameters
  2. has implemented the Context-hash-based URI recipe
  3. can read NIF and load it into the internal data structure of the tool (e.g. read tokens for POS taggers, or read POS tags from NIF for NER tools)
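The Context-hash-based URI recipe named in this list can be sketched along the same lines. The recipe itself (the five parts and the message layout leftContext(String)rightContext) comes from the specification; everything else here is an assumption made for illustration: the class and method names, hashing the UTF-8 bytes of the message, lower-case hex digits, and clipping the context at the document boundaries.

```java
import java.math.BigInteger;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the NLP2RDF code base.
final class ContextHashUriSketch {

    /** Builds the message "leftContext(String)rightContext" with up to C context characters per side. */
    static String message(String document, int begin, int end, int contextLength) {
        int left = Math.max(0, begin - contextLength);               // assumption: clip at document start
        int right = Math.min(document.length(), end + contextLength); // assumption: clip at document end
        return document.substring(left, begin)
                + "(" + document.substring(begin, end) + ")"
                + document.substring(end, right);
    }

    /** Returns the MD5 digest of the message as 32 lower-case hex digits. */
    static String md5Hex(String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(message.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so the result always has 32 digits.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Concatenates prefix and hash_contextLength_stringLength_digest_readablePart. */
    static String hashUri(String prefix, String document, int begin, int end, int contextLength) {
        String anchored = document.substring(begin, end);
        String first20 = anchored.length() > 20 ? anchored.substring(0, 20) : anchored;
        String readable = URLEncoder.encode(first20, StandardCharsets.UTF_8).replace("+", "%20");
        return prefix + "hash_" + contextLength + "_" + anchored.length() + "_"
                + md5Hex(message(document, begin, end, contextLength)) + "_" + readable;
    }

    public static void main(String[] args) {
        String document = "call it Semantic Web.";
        int begin = document.indexOf("Semantic Web");
        System.out.println(message(document, begin, begin + 12, 4));
        System.out.println(hashUri("http://example.org/doc#", document, begin, begin + 12, 4));
    }
}
```

Because only the string and its immediate context enter the digest, an insertion elsewhere in the document leaves the URI unchanged, which is the robustness the recipe aims for.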

Optional additions

  1. The properties sso:nextWord and sso:nextSentence are only necessary for some use cases and should otherwise be omitted. The next/previous properties imply transitive properties, which increase the reasoner workload.
  2. The properties str:beginIndex and str:leftContext duplicate the information contained in the URIs and can be included for easier search.
  3. OLiA classes and their hierarchy are copied as specified and included in the output.
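Since the index properties only duplicate what the URI already carries, a client can also recover the indexes from an Offset-based URI directly. The following Java sketch illustrates this under the assumption that the identifier follows the offset_begin_end_readablePart layout of the recipe; the class name, method names and regular expression are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the NLP2RDF code base.
final class OffsetUriParserSketch {

    // Matches the identifier at the end of the URI; the readable part is
    // percent-encoded, so it contains no raw '#' or '/'.
    private static final Pattern OFFSET = Pattern.compile("offset_(\\d+)_(\\d+)_[^#/]*$");

    /** Returns {beginIndex, endIndex} as encoded in an offset-based NIF URI. */
    static int[] indices(String nifUri) {
        Matcher m = OFFSET.matcher(nifUri);
        if (!m.find()) {
            throw new IllegalArgumentException("Not an offset-based NIF URI: " + nifUri);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** Recovers the addressed substring from the document text and the URI alone. */
    static String anchorOf(String document, String nifUri) {
        int[] idx = indices(nifUri);
        return document.substring(idx[0], idx[1]);
    }

    public static void main(String[] args) {
        String uri = "http://example.org/doc#offset_8_20_Semantic%20Web";
        System.out.println(anchorOf("call it Semantic Web.", uri));
    }
}
```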
OWL Validation

\r\nThe basic OWL syntax can be validated efficiently with these standard tools:\r\n\r\n\r\n\r\n\r\n\r\n" . "title" "NIF 2.0 Draft" . "content" "Improvements upon NIF 1.0 will be collected and incorporated in NIF 2.0.\r\nPlease have a look at the Wiki, where we currently collect and discuss all issues:\r\nhttp://wiki.nlp2rdf.org/wiki/Category:Issues\r\n\r\nIf you want to Get Involved, sign up on the mailing list here:\r\nhttp://nlp2rdf.org/get-involved/\r\n\r\n " . . "implements some features of NIF-1.0, but: GET is missing, 'input-type' is 'type'" . . . . "Entity Linking" . "title" "FOX" . "content" "FOX participated in the initial field test before NIF 1.0 and has not yet been updated.\r\nIt is best to try the online demo of FOX at http://fox.aksw.org\r\n\r\nCurrently, FOX only allows POST so the API can not be called within the browser but only with curl:\r\n\r\ncurl\u00A0 -d \"type=TEXT&nif=TRUE&task=NER&output=TURTLE&text=My%20favorite%20actress%20is%20Natalie%20Portman!\"\r\n\r\n" . . "Reference Implementation for NIF 1.0 ." . "Other languages are available with: stemmer=PorterStemmer or stemmer=HungarianStemmer and others from:\u00A0 http://lucene.apache.org/java/2_4_0/api/contrib-snowball/index.html " . . . . "title" "SnowballStemmer" . "content" "According to the Get Involved page each blog post has to start with a short introduction: I created this implementation to provide a reference implementation for NIF 1.0. \r\n\r\n\r\nThe SnowBall libraries provide basic implementations for stemming algorithms for a lot of languages.\u00A0 This NIF implementation encapsulates the stemmer.\r\n\r\n\r\n" . . "None" . "Reference Implementation for NIF 1.0 . Provides lemmas, POS tags and also (experimental) Syntax trees" . . . . "title" "Stanford CoreNLP" . "content" "According to the Get Involved page each blog post has to start with a short introduction: I created this implementation to provide a reference implementation for NIF 1.0. 
\r\n\r\n\r\nStanfordCore is an NLP tool, that combines lemmatizing, POS-tags, dependency parsers and many more layers. The tool currently only produces NIF output, but might be extended to read NIF input as well. There is a Demo Web service available\r\n\r\n \r\n\r\n" . "title" "Tutorial Challenge: Multilingual Part-Of-Speech Tagger" . "content" "The goal of this challenge is straight-forward: An HTML page has one text area, where you can post a text. The language of the text should be detected and then the following should be highlighted: Verbs should be highlighted in green, Nouns in red, Adjectives in orange and Articles in yellow.The highlighting should work for 5-10 languages of your choice. (The choice of colour is of course not strict, but it has to be the same across languages).\r\n\r\nA mockup can be found here: http://nlp2rdf.lod2.eu/tutorial/mutlilingual-pos/mockup.php\r\n\r\nCode: http://nlp2rdf.lod2.eu/tutorial/mutlilingual-pos/mockup.txt\r\n\r\nSome suggestions of resources that can be used, i.e. you can use anything else:\r\n
  • The connection between Stanford CoreNLP and OLiA is currently only implemented for the English pre-trained model and only for the Penn tag set.
  • In some months, KAIST will produce a NIF adapter for a Korean POS tagger.
  • Not all components required for this task have NIF adapters currently.
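The highlighting step of this challenge could look roughly like the Java sketch below. It assumes the tokens have already been tagged and mapped to four coarse categories; the category names ("Verb", "Noun", "Adjective", "Article"), the class name and the token representation are placeholders chosen for illustration and are not OLiA identifiers. The colours are the ones suggested in the challenge text.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for the challenge, not an existing NLP2RDF component.
final class PosHighlightSketch {

    // One colour per coarse category, identical across languages.
    static final Map<String, String> COLOURS = Map.of(
            "Verb", "green",
            "Noun", "red",
            "Adjective", "orange",
            "Article", "yellow");

    /** Escapes the characters that would otherwise break the generated HTML. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Each element is {token, category}; tokens of unknown categories stay unstyled. */
    static String highlight(List<String[]> taggedTokens) {
        StringBuilder html = new StringBuilder();
        for (String[] t : taggedTokens) {
            if (html.length() > 0) {
                html.append(' ');
            }
            String colour = COLOURS.get(t[1]);
            if (colour == null) {
                html.append(escape(t[0]));
            } else {
                html.append("<span style=\"background-color:").append(colour).append("\">")
                    .append(escape(t[0])).append("</span>");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight(List.of(
                new String[] { "My", "Other" },
                new String[] { "favorite", "Adjective" },
                new String[] { "actress", "Noun" })));
    }
}
```

Keeping the colour table keyed by a language-independent category is what makes the same highlighting work for every language once the tag sets are unified.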
" . "title" "Tutorial Challenge: Semantic Search" . "content" "According to the Get Involved page each blog post has to start with a short introduction: \r\nMy name is Sebastian and I wrote his challenge to give you a rough template for writing your own challenge. Besides I think, that the problem can be easily solved with NIF and it is a good showcase. \r\n\r\nThe goal of this challenge is to create a Semantic Search. In this context this means the following.\r\n\r\nFor a given text (see below) a user gets a search form and can enter one or several search terms. The search shall return all sentences that have \"something to do\" with the search term. Additional information should also be shown.\r\n\r\nMost of the following requirements should be met:\r\n
  • Synonyms should be included, e.g. searching for \"USA\" returns sentences with \"United States\".
  • Some form of normalisation (stemming, lemmatising, stopword removal) should be applied.
  • DBpedia instances that occur in the text and match the search should be shown. They can also be shown to disambiguate the search, e.g. when searching for \"Bush\" or \"Madonna\".
  • Instances that are related or similar to the found DBpedia instances and also occur in the same text should be shown, e.g. Barack Obama is related to United States.

Given text

\r\nthis text should be used: http://nlp2rdf.lod2.eu/tutorial/semantic-search/search_text.txt\r\n


\r\nA static mockup, in which only \"USA\" can be searched, can be found here:\r\nhttp://nlp2rdf.lod2.eu/tutorial/semantic-search/mockup.php\r\nCode:\r\nhttp://nlp2rdf.lod2.eu/tutorial/semantic-search/mockup.txt\r\n\r\n\r\nSome suggestions of resources that can be used (you can also use anything else).\r\n\r\n " . "title" "Tutorial Challenge: Semantic Yellow Pages" . "content" "According to the Get Involved page each blog post has to start with a short introduction: \r\nHello, I am Konrad H\u00F6ffner and I am a student of computer science at the University of Leipzig. I love living in the future, but one of the things that I am still dissatisfied with is yellow pages. They are nationally limited, plagued by advertisements (and I *hate* ads), don't understand synonyms, are only indexed in the language of the country of origin and/or are generally dumb (try searching for \"delicious pizza nearby\" in Google Maps). Fortunately, I think the Semantic Web is the technology that can alleviate this nuisance and it's only *you* who can save the world!\r\n\r\n


\r\nYour goal is to create a Semantic Yellow Pages Search for LinkedGeoData. In a simple html search form a user can enter keywords or a search sentence. For the search the following needs to be extracted:\r\n
  1. a location
  2. an amenity
  3. (optional) a restriction or filter condition
\r\nThis information is now to be used to construct SPARQL queries on the LinkedGeoData SPARQL endpoint (or another knowledge base if you like) and present the result to the user. Note that if no location can be found in the search string, the user's current location should be used instead. The position can be found out via the HTML 5 feature geolocation (or given in another input text field for testing ).\r\n

Example 1

\r\n\"I am looking for an optician in Paris.\" Here the location is the city lgdo:Paris and the amenity is lgdo:Optician. There is no restriction or filter in this example. The city of Paris has only a single geo point (its center). Since it is a lgdo:City, a radius of 5 km is appropriate. Here is an example SPARQL query (click here to see the result):\r\n
Prefix lgd: \r\nPrefix lgdo: \r\nSelect ?optician ?name ?opticiangeo from  {\r\n   ?paris owl:sameAs  .\r\n   ?paris geo:geometry ?parisgeo .\r\n   ?optician a lgdo:Optician .\r\n   OPTIONAL {?optician rdfs:label ?name . }\r\n   ?optician geo:geometry ?opticiangeo .\r\n   Filter(bif:st_intersects(?parisgeo, ?opticiangeo, 5))\r\n}
\r\nAs an additional challenge you can adjust the radius not only for the type of the location (city:5km, country:100km), but also for the searched amenity (lgdo:Optician 5 km, lgdo:Toilets 0.4 km)\r\n
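One possible way to approach the radius selection is a simple lookup table, sketched here in Java. The radii are the ones mentioned in the text; everything else is an assumption for illustration: the class name, the class name lgdo:Country, the 5 km default, and the rule that the location type decides the radius when a location was extracted while the amenity type decides it when the search falls back to the user's current position.

```java
import java.util.Map;

// Hypothetical helper for the challenge, not an existing LinkedGeoData API.
final class SearchRadiusSketch {

    // Radii in km per location type (lgdo:Country is an assumed class name).
    static final Map<String, Double> LOCATION_RADIUS_KM = Map.of(
            "lgdo:City", 5.0,
            "lgdo:Country", 100.0);

    // Radii in km per amenity type, used around the user's own position.
    static final Map<String, Double> AMENITY_RADIUS_KM = Map.of(
            "lgdo:Optician", 5.0,
            "lgdo:Toilets", 0.4);

    static final double DEFAULT_RADIUS_KM = 5.0; // assumed fallback

    /**
     * locationType may be null when no location was found in the search string;
     * amenityType must not be null.
     */
    static double radiusKm(String locationType, String amenityType) {
        if (locationType != null) {
            return LOCATION_RADIUS_KM.getOrDefault(locationType, DEFAULT_RADIUS_KM);
        }
        return AMENITY_RADIUS_KM.getOrDefault(amenityType, DEFAULT_RADIUS_KM);
    }

    public static void main(String[] args) {
        System.out.println(radiusKm("lgdo:City", "lgdo:Optician"));
        System.out.println(radiusKm(null, "lgdo:Toilets"));
    }
}
```

The returned value would replace the fixed radius in the third argument of the intersection filter of the example query.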

Example 2

\r\n\"cheap restaurant\": Here the task is to find cheap restaurants near the position of the user. The amenity in this case is lgdo:Restaurant. The restriction is hard to extract here and could best be translated to \"below a certain price point\" which even then still requires the application to a) find out the restaurant's prices and b) determine where that price lies (e.g. below the median or at least one standard deviation to the left of the average). Because the restriction handling is quite challenging, it is ok if you don't implement restrictions or only do it for basic cases like \"within 500 m\". If a restriction is present and the results are shown as a list, the results should be ordered according to the restriction criterion, e.g. for \"within 500 m\" they should be ordered by distance, ascending.\r\n\r\n


\r\nMost of the following requirements should be met:\r\n
  • Synonyms should be included, e.g. searching for \"tooth doctor\" returns the same result as \"dentist\".
  • Other languages should be included, e.g. searching for \"Zahnarzt\" returns the same result as \"dentist\".
  • Search results should be shown as a table. The geo position and the name of the amenities should be shown along with their relevant properties (distance, opening times, etc.).

Suggested resources

\r\nSome suggestions of resources that can be used, i.e. you can use anything else.\r\n" . "title" "Tutorial: How to call a NIF web service with your favorite SemWeb library" . "content" "The parameters for NIF 1.0 can be found in the Parameter Section of the spec.\r\nBelow are example code snippets for several client side implementations. The result is always a combined RDF model of two NIF services.\r\n\r\n\r\n\r\n


\r\nNote that there currently is no \"best\" RDF merge tool for the command line so we will use Jena CLI.\r\n\r\n\r\n# query snowball demo webservice\r\ncurl \"http://nlp2rdf.lod2.eu/demo/NIFStemmer?input=My%20favorite%20actress%20is%20Natalie%20Portman!&input-type=text&nif=true\" > snowball.owl\r\n# query stanford demo webservice\r\ncurl \"http://nlp2rdf.lod2.eu/demo/NIFStanfordCore?input=My%20favorite%20actress%20is%20Natalie%20Portman!&input-type=text&nif=true\" > stanford.owl\r\n#combine with Jena rdfcat\r\nrdfcat -x snowball.owl stanford.owl > combined.owl\r\n\r\n\r\n\r\n\r\n


\r\nSee http://jena.sourceforge.net\r\n
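\r\n\r\n// Requires java.io.BufferedReader, java.io.InputStreamReader, java.net.URL, java.net.URLEncoder\r\n// and Jena's Model and ModelFactory (com.hp.hpl.jena.rdf.model in Jena 2.x, org.apache.jena.rdf.model in Jena 3+).\r\nModel model = ModelFactory.createDefaultModel();\r\nString text = \"My favorite actress is Natalie Portman!\";\r\n// build the query string shared by both services\r\nStringBuilder p = new StringBuilder();\r\np.append(\"?input=\");\r\np.append(URLEncoder.encode(text, \"UTF-8\"));\r\np.append(\"&input-type=text\");\r\np.append(\"&nif=true\");\r\nURL stemmer = new URL(\"http://nlp2rdf.lod2.eu/demo/NIFStemmer\" + p.toString());\r\nURL stanford = new URL(\"http://nlp2rdf.lod2.eu/demo/NIFStanfordCore\" + p.toString());\r\n// reading both responses into the same model merges the two RDF graphs\r\nmodel.read(\r\n   new BufferedReader(new InputStreamReader(stemmer.openConnection().getInputStream())), null);\r\nmodel.read(\r\n   new BufferedReader(new InputStreamReader(stanford.openConnection().getInputStream())), null);\r\n\r\n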


\r\nSee http://arc.semsol.org. This is also the code used in this .\r\n\r\n
\r\n$stemmer = \"http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&input=\".urlencode($text); \r\n$parser = ARC2::getRDFXMLParser();\r\n$parser->parse($stemmer);\r\n$stemmertriples = $parser->getTriples();\r\n$stanford = \"http://nlp2rdf.lod2.eu/demo/NIFStanfordCore?input-type=text&nif=true&input=\".urlencode($text); \r\n$parser = ARC2::getRDFXMLParser();\r\n$parser->parse($stanford);\r\n$stanfordtriples = $parser->getTriples();\r\n$alltriples = array_merge($stanfordtriples, $stemmertriples);\r\n$ser = ARC2::getTurtleSerializer();\r\n$output = $ser->getSerializedTriples($alltriples);\r\necho $output;\r\n
\r\n" . . "None" . "NIF 1.0 compliant without RDF/XML input and given error handling." . . . . "title" "MontyLingua" . "content" "My name is Marcus Nitzschke and I'm studying computer science at the University of Leipzig. This implementation was written for the practical course of the lecture \"Software aus Komponenten\" in autumn 2011. In general, I chose this topic because I'm interested in the techniques of the Semantic Web, and in particular because connecting these techniques with NLP applications was a new experience for me.\r\n\r\nAccording to the website, \"MontyLingua is a free, commonsense-enriched, end-to-end natural language understander for English\". The commonsense-enriched part sets MontyLingua apart from various other NLP tools. MontyLingua combines a Tokenizer, Part-of-speech Tagger, Extractor, Lemmatiser and a so-called NLGenerator, which generates naturalistic English sentences and text summaries.\r\n\r\nBecause MontyLingua is written in Python, this is one of the first non-Java wrappers for NLP2RDF (Monty also provides a Java binary, but Python is more fun :)). The wrapper currently implements the Part-of-speech Tagger component of MontyLingua. For future work it would be interesting to extract the information about word relationships that MontyLingua provides.\r\n
