X-Link
Revision as of 12:07, 22 February 2013
General Description
Persons responsible for editing/maintaining this page
- Pavlos Fafalios (fafalios@ics.forth.gr)
- Yannis Marketakis (marketak@ics.forth.gr)
- Julien Barde (julien.barde@ird.fr)
Description
The requirements concern the development of an application (library? web app?) that, based on a knowledge base, will be able to match named entities found in a file to URIs. The objective is to build on previous demonstrations of entity mining (highlighting terms in web pages) to serve the needs of the user community that will create new linked open data for different clients (e.g. search engines). The data to be turned into linked data include: bibliographic references, metadata (OGC from the geospatial cluster, RDF results from OpenSearch complying with the GENESI-DEC RDF schema), named entities in documents (Word, PDF files), etc.
In brief, the aforementioned application should:
a) read the content of a file (doc/pdf/XML/RDF) or web page as input,
b) discover named entities of interest (e.g. keywords, Species, Persons, Organizations, etc.) in that file,
c) match each discovered entity with one (ideally) or more entities from the underlying knowledge bases (i.e. URIs from TLO, FLOD, Ecoscope, etc.).
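Steps a) to c) can be illustrated with a minimal sketch. It assumes the simplest possible approach, a case-insensitive whole-word lookup of known labels in plain text; the actual entity mining technique is not yet decided. The knowledge base contents, the example.org URIs and the function name find_entities are hypothetical placeholders, not real TLO/FLOD/Ecoscope identifiers.

```python
import re

# Hypothetical toy knowledge base: lower-cased label -> candidate URIs.
# The example.org URIs are placeholders, not real TLO/FLOD/Ecoscope URIs.
KNOWLEDGE_BASE = {
    "mediterranean": ["http://example.org/kb/area/mediterranean"],
    "thunnus albacares": ["http://example.org/kb/species/thunnus_albacares"],
    "tuna": [
        "http://example.org/kb/species/thunnus_albacares",
        "http://example.org/kb/species/thunnus_obesus",
    ],
}


def find_entities(text, knowledge_base=KNOWLEDGE_BASE):
    """Return {label: [candidate URIs]} for every known label found in text."""
    lowered = text.lower()
    matches = {}
    for label, uris in knowledge_base.items():
        # Whole-word match, so "tuna" does not fire inside a longer word.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            matches[label] = list(uris)
    return matches


text = "Catches of Thunnus albacares in the Mediterranean declined."
matches = find_entities(text)
for label, uris in matches.items():
    print(label, "->", uris)
```

A label such as "tuna" maps to two candidate URIs here on purpose: it shows why step c) may return more than one match per entity.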
The supposed process is sketched in the following figure:
Examples:
1) From the following author description:
<foaf:Person>
<foaf:givenname>C.</foaf:givenname>
<foaf:surname>Mellon-Duval</foaf:surname>
</foaf:Person>
create the triple:
http_www_the_uri_of_mellon foaf:name C.Mellon-Duval
2) From the following:
<dc:subject>
<z:AutomaticTag>
<rdf:value>Mediterranean</rdf:value>
</z:AutomaticTag>
</dc:subject>
create the triple:
http_www_uri_of_mediterranean rdfs:label Mediterranean
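Example 1 could be sketched as follows, assuming the input is wrapped in an rdf:RDF element with namespace declarations (the snippet above omits them). The function lookup_uri is a hypothetical stand-in for the knowledge base lookup and returns a placeholder example.org URI. Joining the given name and surname with a space is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"

# Example 1, wrapped in rdf:RDF so that the prefixes are declared.
SNIPPET = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:givenname>C.</foaf:givenname>
    <foaf:surname>Mellon-Duval</foaf:surname>
  </foaf:Person>
</rdf:RDF>
""".strip()


def lookup_uri(name):
    """Hypothetical placeholder for the knowledge base lookup."""
    slug = name.replace(".", "").replace(" ", "_").lower()
    return "http://example.org/person/" + slug


def person_triples(rdf_xml):
    """Return one (subject URI, predicate URI, literal) triple per foaf:Person."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for person in root.iter("{%s}Person" % FOAF):
        given = person.findtext("{%s}givenname" % FOAF, default="").strip()
        surname = person.findtext("{%s}surname" % FOAF, default="").strip()
        name = (given + " " + surname).strip()
        triples.append((lookup_uri(name), FOAF + "name", name))
    return triples


triples = person_triples(SNIPPET)
for subject, predicate, literal in triples:
    print('<%s> <%s> "%s" .' % (subject, predicate, literal))
```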
Difficulties/Challenges:
a) We must limit the probability of erroneous “Entity-URI” matchings. Possible solution: a user approves the matchings (however, this may be laborious), or only unambiguous URIs are kept.
b) If more than one possible URI has been matched for an entity, which one should be selected? Possible solution: a user selects the right one (however, this may be laborious).
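The two possible solutions can be combined, as in the sketch below: entities with exactly one candidate URI are kept automatically, and the rest are queued for a user to resolve. This is only one way to split the work; the function name split_by_ambiguity and the example.org URIs are hypothetical.

```python
def split_by_ambiguity(matches):
    """Split {entity: [URIs]} into auto-accepted and needs-review parts.

    Returns (accepted, needs_review), where accepted maps an entity to its
    single URI and needs_review maps an entity to its sorted candidate URIs.
    Entities without any candidate URI are dropped.
    """
    accepted = {}
    needs_review = {}
    for entity, uris in matches.items():
        unique = sorted(set(uris))  # ignore duplicate candidates
        if len(unique) == 1:
            accepted[entity] = unique[0]
        elif len(unique) > 1:
            needs_review[entity] = unique
    return accepted, needs_review


# Hypothetical matcher output with placeholder URIs.
candidates = {
    "Mediterranean": ["http://example.org/kb/area/mediterranean"],
    "tuna": [
        "http://example.org/kb/species/thunnus_obesus",
        "http://example.org/kb/species/thunnus_albacares",
    ],
}
accepted, needs_review = split_by_ambiguity(candidates)
print("kept automatically:", accepted)
print("to be resolved by a user:", needs_review)
```

With this split, the laborious manual step is limited to the ambiguous entities only.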
Related iMarine WP/Tasks
It could be considered related to T10.4 - Semantic Data Analysis Facilities although it was not described in the corresponding milestone: Semantic_Data_Analysis
Related iMarine Deliverables
-
Related Milestones
-
Related Cluster
http://wiki.i-marine.eu/index.php/Semantic_cluster_achievements
Related Presentations/Tutorials
-
Current (development) status
Understand the problem and define the requirements against the challenges.
Demo Scenarios
(to describe one or more ideal scenarios)
The following figure (by Julien) depicts a possible application that exploits the functionality of X-Link:
Plans and Next Steps
A tentative plan is to:
a) understand the problem and define the requirements against the challenges (by end of Feb 2013),
b) decide what is required to be designed/implemented, and
c) have a first implementation.
Design (tentative)
X-Link could be an extension of X-Search.
INPUT: i) configuration ID (more on this below), ii) plain text
OUTPUT: A set of identified entities. Each entity must have one or more corresponding URIs (according to the underlying knowledge bases).
The user will define the desired categories of entities, as in the X-Search configuration page (http://139.91.183.72/x-search-fao/sesadmin.jsp). Each configuration can have a unique ID. Example: Detect FAO countries and Water Areas (from FLOD).
We could offer the aforementioned functionality as a:
- Java Library, and/or
- HTTP API (Output: JSON, XML, RDF?)
Then, one could create an application that will utilize the above library/API, e.g. a web application in which the user uploads an RDF file and gets back a set of triples.
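Since the output format is still open, the following is only a sketch of what a JSON response might look like. The field names (configurationId, entities, name, category, uris), the configuration ID value and the example.org URI are all assumptions made for illustration.

```python
import json


def build_response(config_id, entities):
    """Serialize identified entities to a JSON string.

    entities is a list of (name, category, [URIs]) tuples. All field names
    in the payload are hypothetical; no output schema has been agreed yet.
    """
    payload = {
        "configurationId": config_id,
        "entities": [
            {"name": name, "category": category, "uris": list(uris)}
            for name, category, uris in entities
        ],
    }
    return json.dumps(payload, indent=2)


response = build_response(
    "fao-countries-water-areas",  # hypothetical configuration ID
    [("Mediterranean", "Water Area",
      ["http://example.org/kb/area/mediterranean"])],
)
print(response)
```

Keeping uris as a list even for a single match lets the same structure carry ambiguous entities without a format change.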
Related Tickets
Requirements
Design
-
Implementation
-