X-Link

Latest revision as of 18:10, 3 September 2015

General Description

The requirements described on this page have led to the implementation of the X-Link system.

X-Link is a fully configurable, Linked Data-based Named Entity Extraction (NEE) tool. It lets the user or developer define the categories of entities that are of interest for the application at hand by exploiting one or more online semantic knowledge bases (Linked Data). The user can also update a category and specify how the identified entities are semantically linked and enriched. This configurability allows X-Link to be easily adapted to different contexts for building domain-specific applications (e.g. identifying drugs in a medical search system, or annotating and exploring fish species in a marine-related web page).

A detailed description of X-Link (functionality, configurability, etc.) can be found at https://wiki.gcube-system.org/gcube/index.php/X-Link

Requirements

Involved Persons

  • Pavlos Fafalios (fafalios@ics.forth.gr)
  • Manolis Baritakis (mbaritak@ics.forth.gr)
  • Julien Barde (julien.barde@ird.fr)

Description

The requirements concern the development of an application (a library, a RESTful service, or something else) that, based on a knowledge base, can match the named entities found in a file to URIs. The objective is to build on previous demonstrations of entity mining (highlighting terms in web pages) to meet the needs of a user community that will create new linked open data for different clients (e.g. search engines). The data to be turned into linked data include bibliographic references, metadata (OGC from the geospatial cluster, RDF results from opensearch complying with the GENESI-DEC RDF schema), named entities in documents (Word, PDF files), etc.

In brief, the aforementioned application should:

a) read the content of a file (doc/pdf/XML/RDF) or web page as input,

b) discover named entities of interest (e.g. keywords, Species, Persons, Organizations, etc.) in that file,

c) match each discovered entity with one (ideally) or more entities from the underlying knowledge bases (i.e. URIs of TLO, FLOD, Ecoscope, etc.).
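
Steps a) to c) can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the knowledge base is a small in-memory dictionary rather than a remote SPARQL endpoint, entity discovery is plain whole-word matching, step a) (extracting text from the file or web page) is assumed to be already done, and the names KNOWLEDGE_BASE, discover_entities and link_entities are hypothetical.

```python
import re

# Hypothetical in-memory stand-in for a knowledge base: entity name -> URIs.
# A real deployment would query TLO, FLOD, Ecoscope, etc. instead.
KNOWLEDGE_BASE = {
    "Mediterranean": [
        "http://www.ecoscope.org/ontologies/ecosystems/mediterranean_ecosystem"
    ],
    "C.Mellon-Duval": [
        "http://www.ecoscope.org/ontologies/agents/capucineMellon"
    ],
}


def discover_entities(text, names):
    """Step b: return the known names found in the text, by first appearance."""
    found = []
    for name in names:
        # Match the name only when it is not embedded in a longer word.
        match = re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text)
        if match:
            found.append((match.start(), name))
    return [name for _, name in sorted(found)]


def link_entities(text, knowledge_base):
    """Steps b and c: discover entities and attach their candidate URIs."""
    return {
        name: knowledge_base[name]
        for name in discover_entities(text, knowledge_base)
    }


# Step a is assumed: the text has already been extracted from the input file.
sample = "Tuna stocks in the Mediterranean were studied by C.Mellon-Duval."
links = link_entities(sample, KNOWLEDGE_BASE)
for name, uris in links.items():
    print(name, "->", uris)
```

Exact string matching is the simplest possible discovery strategy; it is used here only to keep the sketch self-contained.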

The envisaged process is sketched in the following figure:

[Figure: X-Search-Link process]

Examples:

1) From the following author description:

  <foaf:Person>
    <foaf:givenname>C.</foaf:givenname>
    <foaf:surname>Mellon-Duval</foaf:surname>
  </foaf:Person>

the tagger will find the triple of the related foaf:agent in the Ecoscope SPARQL endpoint (or other endpoints):

http://www.ecoscope.org/ontologies/agents/capucineMellon     foaf:name     C.Mellon-Duval

2) From the following subject tag:

  <dc:subject>
    <z:AutomaticTag>
      <rdf:value>Mediterranean</rdf:value>
    </z:AutomaticTag>
  </dc:subject>

the tagger will find the triple of the related ecosystem in the Ecoscope SPARQL endpoint (or other endpoints):

http://www.ecoscope.org/ontologies/ecosystems/mediterranean_ecosystem     rdfs:label     Mediterranean
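
Both examples boil down to asking an endpoint which resources carry a given literal for a given property (foaf:name in the first case, rdfs:label in the second). The following Python sketch shows how such a lookup query could be assembled. It is an assumption about one possible query shape, not the query X-Link actually issues; the helper names escape_literal and build_lookup_query are hypothetical, and sending the query to an endpoint is left out.

```python
# Standard property IRIs for the two prefixed names used in the examples.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(value):
    """Escape backslashes and double quotes for use in a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_lookup_query(property_iri, value):
    """Build a SELECT query returning the resources whose property equals value.

    str(?label) is compared so that literals with a language tag or datatype
    also match.
    """
    return (
        "SELECT DISTINCT ?uri WHERE { "
        f"?uri <{property_iri}> ?label . "
        f'FILTER (str(?label) = "{escape_literal(value)}") '
        "}"
    )


print(build_lookup_query(FOAF_NAME, "C.Mellon-Duval"))
print(build_lookup_query(RDFS_LABEL, "Mediterranean"))
```

The escaping shown covers only backslashes and double quotes; values containing line breaks would need additional handling before being placed in a query.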

Difficulties/Challenges:

a) We must limit the probability of erroneous entity-to-URI matches. Possible solution: a user approves the matches (although this may be laborious), or only unambiguous URIs are kept.

b) If more than one possible URI has been matched for an entity, which one should be selected? Possible solution: a user selects the right one (although this may be laborious).
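
The two possible solutions can be combined: unambiguous matches are kept automatically and only the ambiguous ones are queued for a user. The Python sketch below illustrates that policy; the function name split_by_ambiguity and the example.org URIs are made up for the illustration.

```python
def split_by_ambiguity(matches):
    """Separate entities with exactly one URI from those needing a user decision.

    matches maps an entity name to the list of candidate URIs found for it.
    Returns (accepted, needs_review): accepted maps a name to its single URI,
    needs_review maps a name to its sorted, de-duplicated candidate URIs.
    Entities without any candidate URI are dropped.
    """
    accepted, needs_review = {}, {}
    for name, uris in matches.items():
        unique = sorted(set(uris))
        if len(unique) == 1:
            accepted[name] = unique[0]
        elif len(unique) > 1:
            needs_review[name] = unique
    return accepted, needs_review


# Hypothetical candidate matches; the URIs are placeholders.
candidates = {
    "Mediterranean": ["http://example.org/kb/mediterranean"],
    "Tuna": ["http://example.org/kb/tuna_species", "http://example.org/kb/tuna_dish"],
    "Unknown": [],
}
accepted, needs_review = split_by_ambiguity(candidates)
print(accepted)
print(needs_review)
```

This keeps the laborious manual step limited to the entities that actually have competing URIs.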

Related D4Science WP/Tasks

T10.4 - Semantic Data Analysis Facilities. It has been described in the extended version of the DoW.

Related Cluster

Semantic_cluster_achievements

Related Presentations/Tutorials

  • X-Link presentation from the 6th TCOM (@ Skiathos): slides (http://goo.gl/H8Wueo)
  • X-Link presentation from the 7th TCOM (@ Rome): slides (http://goo.gl/kN6qs5)
  • X-Link presentation from the 8th TCOM (@ Athens): slides (http://goo.gl/HxhdAO)
  • X-Link presentation from the 9th TCOM (@ Heraklion): slides (http://goo.gl/7ou9Y1)

Plans and Next Steps

A tentative plan is to:

a) understand the problem and define the requirements against the challenges (by end of Feb 2013),

b) decide what is required to be designed/implemented (by end of Apr 2013), and

c) plan and have a first implementation.

Demo Scenarios

(to describe one or more ideal scenarios)

The following figure (by Julien) depicts a possible application that exploits the functionality of X-Search-Link:

[Figure: X-Search-Link possible application]

Design (tentative)

We propose to start from a software library (in the future it could easily be used to provide a web service or any other user interface, gCube-related or not).

A rough description follows:

Setup

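At setup time the user defines the desired categories and entity lists. To add a category (type) of entities, the user has to provide:

  • a category name,
  • a SPARQL endpoint,
  • the URI of the resource class (type), e.g. http://www.ecoscope.org/ontologies/ecosystems_def#shark (or a SPARQL query that returns the desired list of entity URIs).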

When a new category of entities is added, X-Search-Link i) stores the SPARQL endpoint to use, ii) constructs a SPARQL template query that will be used for retrieving the entities related to a string, and iii) stores a list of the named entities belonging to the category (i.e. the instances of the class) together with one or more corresponding URIs for each entity.
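
The three actions taken when a category is added can be sketched in Python as follows. This is a rough illustration, not the X-Search-Link implementation: the Category class, the add_category function, the shape of the template query and the example.org endpoint are all assumptions, and the entity list is passed in directly instead of being fetched from the endpoint.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Category:
    name: str
    endpoint: str                 # i) the SPARQL endpoint to use
    query_template: Template      # ii) template query with a $text placeholder
    entities: dict = field(default_factory=dict)  # iii) entity name -> URIs


def add_category(registry, name, endpoint, class_uri, entities):
    """Register a category and return it.

    entities is assumed to have been retrieved beforehand, e.g. by asking the
    endpoint for the labelled instances of class_uri.
    """
    template = Template(
        "SELECT DISTINCT ?uri ?label WHERE { "
        f"?uri a <{class_uri}> ; "
        "<http://www.w3.org/2000/01/rdf-schema#label> ?label . "
        'FILTER regex(str(?label), "$text", "i") }'
    )
    registry[name] = Category(name, endpoint, template, dict(entities))
    return registry[name]


registry = {}
shark = add_category(
    registry,
    name="Shark",
    endpoint="http://example.org/sparql",  # hypothetical endpoint
    class_uri="http://www.ecoscope.org/ontologies/ecosystems_def#shark",
    entities={"white shark": ["http://example.org/kb/white_shark"]},  # placeholder
)
# Fill the template to retrieve the entities related to a string.
print(shark.query_template.substitute(text="white shark"))
```

In this sketch the text is inserted into the query unescaped; a real implementation would have to escape quotes and regular-expression metacharacters first.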

Input

To apply the functionality to a particular document, the user has to specify:

  • the document to be analyzed (e.g. pdf, doc, rdf, xml, web page, etc.)
  • the categories of entities to be detected in the document (e.g. Countries, Species, Water Areas, etc., or all possible categories). These must be a subset of the categories defined at setup time.

Output

The desired result is a list of entities, each described by a name and one or more URI(s).
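
One way to represent such a result is shown in the Python sketch below. The Entity class and the to_json helper are hypothetical; the actual classes and method signatures are those given in the specification page, which is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    uris: tuple  # one or more URIs from the underlying knowledge bases

    def __post_init__(self):
        # Enforce the "one or more URI(s)" requirement.
        if not self.uris:
            raise ValueError("an entity needs at least one URI")


def to_json(entities):
    """Serialize a list of entities, e.g. for a future web service response."""
    return json.dumps([asdict(entity) for entity in entities], indent=2)


result = [
    Entity(
        "Mediterranean",
        ("http://www.ecoscope.org/ontologies/ecosystems/mediterranean_ecosystem",),
    ),
]
print(to_json(result))
```

Keeping the result as plain name and URI pairs makes it straightforward to expose later through other interfaces.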

A detailed candidate specification (classes and method signatures), plus an example of how a client could use it, is given on the following page: XSearchLink Specification (http://wiki.i-marine.eu/index.php/XSearchLink_Specification)

We estimate that a first implementation of the above specification will require around 1.5 PM (person-months).

Related Tickets

Requirements

Enriching RDF files with the URIs of Named Entities (#1187)

Implementation

X-Link: Enriching documents with the URIs of Named Entities (#1814)

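and Y. Tzitzikas, Configuring Named Entity Extraction through Real-Time Exploitation of Linked Data, 4th International Conference on Web Intelligence, Mining and Semantics (WIMS'14), Thessaloniki, Greece, June 2014 (pdf: http://www.ics.forth.gr/isl/X-Link/files/fafalios_2014_wims.pdf | slides: http://users.ics.forth.gr/~fafalios/files/ppts/fafalios_2014_xlink.pdf)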