Difference between revisions of "X-Search"

From D4Science Wiki
(Related Presentations/Tutorials)
(General Description)
 
(21 intermediate revisions by 3 users not shown)
Line 4: Line 4:
 
=General Description=
 
  
== Latest Presentation (July 2003) ==
+
[[File:xsearch.jpg|550px]]
  
* Presentation from the 6th TCOM (Skiathos, June 2013) [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=ca0f43d3-917a-4d45-b895-ecc4170ee30c slides]
+
== Latest Presentation (July 2014) ==
 +
 
 +
* Presentation from the 9th TCOM (Heraklion, July 2014) [http://goo.gl/7ou9Y1 slides]
  
 
==Persons responsible for editing/maintaining this page==
 
Line 20: Line 22:
 
==Description==
 
  
'''XSearch''' is a meta-search engine that offers semantic post-processing of results. It reads the description of an underlying search source (OpenSearch), and is able to query that source and analyze in various ways the returned results and also exploit the availability of semantic repositories (SPARQL endpoints).
+
X-Search is a meta-search engine that reads the description of an underlying search source and is able to query that source and analyze in various ways the returned results. It also exploits the availability of semantic repositories.
It also has a gCube version in which the underlying search system is gCube search.
+
 
Some of its key features are provision of textual ''clustering'' of the results, provision of snippet or content-based textual ''entity mining'', ability to fetch and display the ''semantic information'' for identified entities, etc.
+
=== Key features ===
 +
 
 +
;Provision of textual clustering of the results.
 +
:Clustering is performed either over the textual snippets or over the entire contents.  
 +
 
 +
;Provision of textual entity mining of the results.
 +
:Text entity mining can be performed either over the textual snippets or over the entire contents.
 +
 
 +
;Provision of faceted search-like exploration of the results.
 +
:The results of clustering, entity mining and metadata-based grouping are visualized and exploited according to the faceted exploration interaction paradigm: when the user clicks on a cluster or entity, the results are restricted to those that contain that cluster or entity.
 +
 
 +
;Ability to semantically explore an identified entity.
 +
:X-Search provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate Semantic Knowledge Bases, the user can retrieve more information about an entity by querying and browsing over these Knowledge Bases.
 +
 
 +
;Ability to apply entity mining and explore the identified entities during plain Web browsing.
 +
:X-Search also offers entity discovery and exploration while the user is browsing on the Web. Specifically, the user is able to inspect the entities of a particular Web page by clicking a bookmarklet (a special bookmark) and then to further retrieve more information about an entity by querying a Knowledge Base.
  
 
A detailed description of XSearch (functionality, use cases, components) can be found at https://gcube.wiki.gcube-system.org/gcube/index.php/X-Search
 
Line 31: Line 48:
  
 
==Related iMarine Deliverables==
 
D10.4 - iMarine Data Consumpion Software [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=64c9e4df-d492-4fb3-91d5-f781ea3c29b1 pdf] - '''DELIVERED'''
+
* D10.4 - iMarine Data Consumption Software [http://goo.gl/XWJjq0 pdf] - '''DELIVERED''' ''(October 2012)''
 
+
* D10.5 - iMarine Data Consumption Software [http://goo.gl/avrbpx pdf] - '''DELIVERED''' ''(July 2014)''
This activity will also contribute to the forthcoming D10.5 (M27, Jan 2014) and D11.4 (M28,  Feb 2014)
+
* D11.4 - Application Programming Interface Software [http://goo.gl/n759bj pdf] - '''DELIVERED''' ''(March 2014)''
  
 
==Related Milestones==
 
Line 41: Line 58:
 
Detailed: https://gcube.wiki.gcube-system.org/gcube/index.php/Semantic_Data_Analysis
 
  
==Related Cluster==
+
==Related Presentations/Tutorials==
  
http://wiki.i-marine.eu/index.php/Semantic_cluster_achievements
+
* XSearch-related presentations produced for the Review meetings
 +
** Presentation produced for the 1st Review [http://goo.gl/8plKik slides]
 +
** Presentation produced for the 2nd Review [http://goo.gl/7LdTA3 slides]
  
==Related Presentations/Tutorials==
+
* XSearch-related presentations from the iMarine TCOMs
 
+
** Presentation from the 1st TCOM [http://goo.gl/Lv08dx slides]
A presentation produced for the 1st Review can be found at [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=51f971f8-a407-467c-9b61-848448d29424 iMarine workspace]
+
** Presentation from the 2nd TCOM [http://goo.gl/qdf8uH slides]
 +
** Presentation from the 3rd TCOM [http://goo.gl/5AhGNW slides]
 +
** Document (Oct 2012) describing the implemented features: [http://goo.gl/qNDuK4 XSearch_Prototypes]
 +
** Presentation from the 4th TCOM [http://goo.gl/l5jse4 slides]
 +
** Presentation from the 5th TCOM [http://goo.gl/0k3Tyz slides]
 +
** Presentation from the 6th TCOM [http://goo.gl/H8Wueo slides]
 +
** Presentation from the 7th TCOM [http://goo.gl/PudH1C slides]
 +
** Presentation from the 8th TCOM [http://goo.gl/vMYcSx slides]
 +
** Presentation from the 9th TCOM [http://goo.gl/7ou9Y1 slides]
  
Some XSearch-related presentations from the iMarine TCOMs follow.
+
User's manual and screen captures
* Presentation from the 1st TCOM [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=f530a887-52dd-4656-8e11-c5e5aa31706e slides]
+
* XSearch user's manual [http://goo.gl/Nx08ij pdf]
* Presentation from the 2nd TCOM [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=a774a4d8-500e-47fa-b267-2da14528f794 slides]
+
* XSearch Demo [http://goo.gl/cij8nw avi]
* Presentation from the 3rd TCOM [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=254dfdc0-baa7-4cd5-a3f4-aa7314f04288 slides]
+
* Document (Oct 2012) describing the implemented features: [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=f224cc8d-c17a-429d-bab4-6cffabefec2e XSearch_Prototypes]
+
* Presentation from the 4th TCOM [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=b4212391-a70b-49c2-82ba-247fbb9a10ef slides]
+
* Presentation from the 5th TCOM [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=1a0de1ce-8474-4274-9c89-a06809e709cc slides]
+
* Presentation from the 6th TCOM [https://portal.i-marine.d4science.org/group/data-e-infrastructure-gateway/workspace?itemid=ca0f43d3-917a-4d45-b895-ecc4170ee30c slides]
+
* User's Manual (to add)
+
  
 
=Current Deployments=
 
Line 81: Line 102:
 
=Related Papers=
 
  
* P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis and Y. Tzitzikas, ''Web Searching with Entity Mining at Query Time'', Proceedings of the 5th Information Retrieval Facility Conference, (IRF 2012), Vienna, July 2012. [http://www.ics.forth.gr/~fafalios/files/pubs/fafalios_2012_irf.pdf paper], [http://www.ics.forth.gr/~fafalios/files/ppts/fafalios_2012_irfc_presentation.pdf presentation], [http://www.ics.forth.gr/~fafalios/files/bibs/fafalios2012websearching.bib bib]
+
* P. Fafalios and Y. Tzitzikas, Exploratory Professional Search through Semantic Post-Analysis of Search Results, [http://link.springer.com/book/10.1007/978-3-319-12511-4 Professional Search in the Modern World], Lecture Notes in Computer Science, Vol. 8830, Springer, 2014 ([http://users.ics.forth.gr/~fafalios/files/pubs/fafalios2014explProfSearch.pdf pdf]).
 
+
* P. Fafalios and Y. Tzitzikas, Post-Analysis of Keyword-based Search Results using Entity Mining, Linked Data and Link Analysis at Query Time, IEEE 8th International Conference on Semantic Computing (ICSC'14), Newport Beach, California, USA, June 2014 ([http://users.ics.forth.gr/~fafalios/files/pubs/fafalios_2014_ieee_icsc.pdf pdf] | [http://users.ics.forth.gr/~fafalios/files/ppts/fafalios_2014_ieee_icsc_slides.pdf slides]).
* P. Fafalios, M. Salampasis and Y. Tzitzikas, Exploratory Patent Search with Faceted Search and Configurable Entity Mining. Proceedings of the 1st International Workshop on Integrating IR technologies for Professional Search in conjunction with the 35th European Conference on Information Retrieval (ECIR’13), Moscow, Russia, March 2013. [http://users.ics.forth.gr/~fafalios/files/pubs/fafalios_2013_explPatSearch.pdf paper], [http://users.ics.forth.gr/~fafalios/files/bibs/fafalios2013explPatSearch.bib bib]
+
* P. Fafalios and P. Papadakos, Theophrastus: On Demand and Real-Time Automatic Annotation and Exploration of (Web) Documents using Open Linked Data, Web Semantics: Science, Services and Agents on the World Wide Web, Elsevier (ISSN: 1570-8268), 2014 ([http://users.ics.forth.gr/~fafalios/files/pubs/fafalios_2014_theophrastus.pdf pdf]).
 
+
* P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis and Y. Tzitzikas, Web Searching with Entity Mining at Query Time, In Proceedings of the 5th Information Retrieval Facility Conference (IRF'2012), Vienna, July 2012 ([http://www.ics.forth.gr/~tzitzik/publications/Tzitzikas_2012_IRF.pdf pdf] | [http://users.ics.forth.gr/~fafalios/files/ppts/fafalios_2012_irfc_presentation.pdf slides]).
* P. Fafalios and Y. Tzitzikas. X-ENS: Semantic Enrichment of Web Search Results at Real-Time. Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (demo paper), SIGIR 2013, Dublin, Ireland. [http://62.217.127.118/x-ens/ demo][http://users.ics.forth.gr/~fafalios/files/bibs/fafalios2013xens.bib bib]
+
* P. Fafalios, M. Salampasis and Y. Tzitzikas, Exploratory Patent Search with Faceted Search and Configurable Entity Mining, In Proceedings of the 1st International Workshop on Integrating IR technologies for Professional Search in conjunction with the 35th European Conference on Information Retrieval (ECIR'13), Moscow, Russia, March 2013 ([http://users.ics.forth.gr/~fafalios/files/pubs/fafalios_2013_explPatSearch.pdf pdf]).
 
+
* P. Fafalios and Y. Tzitzikas, X-ENS: Semantic Enrichment of Web Search Results at Real-Time (demo paper), In Proceedings of the 36th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'13), Dublin, Ireland, August 2013 ([http://users.ics.forth.gr/~fafalios/files/pubs/fafalios_2013_sigir.pdf pdf]).
  
 
=Demo Scenarios=
 
  
A number of demo scenarios could be described. This depends also on the underlying systems.  
+
A number of scenarios for demonstrating its value are possible,
An indicative scenario, e.g. for the deployment over FIGIS and FLOD, follows.
+
and obviously this  depends also on the underlying systems.
It is related with the  
+
Moreover the configurability that is offered
ticket 1190
+
(underlying search systems, categories, entities of interest, etc)
https://issue.imarine.research-infrastructures.eu/ticket/1190.
+
allows customizing  this service according to the needs of a community.
 +
An indicative scenario, e.g. for the deployment over FIGIS and FLOD, follows
 +
(it is related to ticket 1190
 +
https://issue.imarine.research-infrastructures.eu/ticket/1190).
  
==Demo Scenario 1==
+
==Indicative Scenario==
 
Suppose that a user is looking for publications about '''''tuna'''''. Specifically he wants to find experiments that were applied to several species of tuna. So, he submits the query ''tuna'' and gets a sorted list of results and various categories of entities like ''Regional Fisheries Body'', ''Species'', ''FAO Country'', etc.   
 
 
The user realizes that the category ''Species'' may contain interesting entities. He notices an entity with the label ''yellowfin'', a species of tuna found in pelagic waters of tropical and subtropical oceans worldwide, and an entity with the label ''skipjack tuna'', another species in the tuna family.
 
Line 103: Line 127:
  
 
'''Entity Enrichment:''' By clicking the small RDF icon next to the entity’s name, the user can instantly (at that moment) get information about that particular entity by querying the FLOD endpoint (or the forthcoming TLO-SPARQL endpoint). For example, clicking the icon next to ''yellowfin'' instantly returns more information about yellowfin tuna and lets the user explore its characteristics (e.g. a list of ''is predator of'', ''is prey of'', etc.).
 
 
 
 
= Management (plans, tickets, etc)=
 
== Current Activities ==
 
A shortcut showing (automatically) all the tickets that relate to Semantic Data Analysis functional area can be found [https://issue.imarine.research-infrastructures.eu/query?status=accepted&status=assigned&status=closed&status=new&status=reopened&group=area&area=Semantic+Data+Analysis&order=type&col=id&col=summary&col=status&col=type&col=owner&col=priority&col=milestone&col=component&report=23 here].
 
Below we provide a more updated view of the current situation.
 
 
* XSearch and gCubeSearch (various issues)
 
** XSearch Portlet Memory Consumption (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/628 #628])
 
** Ranking in gCube search (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/684 #684])
 
** GUI improvements for offering a homogenized view within the iMarine portal (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/1854 #1854])
 
 
* XSearch that exploits objects associations (exploitation of forthcoming TLO) (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/960 #960])
 
**So far XSearch exploits entity lists, which can be the results of SPARQL queries (e.g. see the [http://www.ics.forth.gr/~fafalios/files/pubs/fafalios_2012_irf.pdf IRF2012] paper). One open issue is whether (and why, and how) it should also exploit the associations between these entities (i.e. the RDF properties that connect them). For example, consider the case where the search results page contains two facets, ''Water Areas'' and ''Species'', whose entities have been mined from the snippets of the search results and are also described in the underlying RDF KB. In the KB, some entities of ''Water Areas'' could be connected with instances of ''Species'' through various RDF properties (in general these can be numerous). The question is how to exploit them in order to make the search/exploration process more powerful and/or more flexible/handy/effective. Should we exploit them only in the pop-up windows that show more information about each entity? Should we exploit them also for restricting the current answer set? How should we tackle the large number of properties that may exist? On what principles or generic solutions could the approach be founded? If we understand the above, the outcomes could be useful for exploiting the structuring of the forthcoming TLO. A tentative plan is to (a) try to understand the problem and sketch scenarios (by Apr 2013), (b) decide what to design/implement (by May 2013), and (c) have a first implementation (by June 2013).
 
 
* Enriching RDF files with the URIs of Named Entities (an XSearch Tagger like Agrotagger, i.e. an iMarine annotator) (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/1187 #1187], [https://issue.imarine.research-infrastructures.eu/ticket/1814 #1814]). There is now a wiki page for this:
 
http://wiki.i-marine.eu/index.php/XSearchLink
 
 
* XSearch over RDF results
 
 
== Past Activities ==
 
 
* Various xsearch-portlet activities for improving scalability and extended functionality
 
** Exploitation of IS in XSearch portlet (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/780 #780]) - '''CLOSED -  Feb 2013'''
 
** Implementation of the new incremental algorithm for extended functionality (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/1823 #1823]) - '''CLOSED - Jun 2013'''
 
**Exploitation of multiple xsearch-service instances (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/1828 #1828]) - '''CLOSED - Jun 2013'''
 
** Dynamic fetching of xsearch configuration files (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/783 #783]) - '''CLOSED - Nov 2012'''
 
** Retrieval of semantic information about the mined entities (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/1813 #1813]) - '''CLOSED - Jun 2013'''
 
 
* Various activities about xsearch and gCube search
 
** Support search results snippets (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/7 #7]) - '''CLOSED - Jun 2012'''
 
** Provision of textual snippets from gCube search (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/838 #838]) - '''CLOSED - Nov 2012'''
 
** Searching over multiple collections (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/839 #839]) - '''CLOSED - Mar 2013'''
 
** Configurability of TCPLocator (Ticket [https://issue.imarine.research-infrastructures.eu/ticket/627 #627]) - '''CLOSED - Feb 2013'''
 
 
== Log of plans ==
 
 
=== Proposed in Nov/Dec 2012 ===
 
The above steps should be demonstrated in the context of iMarine (certainly in the next review).
 
This is related to all activities of the semantic cluster and involved partners (IRD, FAO) and NKUA/CNR
 
(related ticket: 1190 https://issue.imarine.research-infrastructures.eu/ticket/1190)
 
 
'''Key Issue'''
 
 
During the 1st year we (FORTH) have made various prototypes over systems/artifacts from the partners (FAO and IRD), specifically P2, P3 and P4, in order to investigate the rising issues and evaluate various techniques. It seems quite reasonable (to us) to investigate whether the same functionality can be provided through gCube, i.e. to test P5 over systems of the community partners.
 
 
A possible scenario is the following. Two community search systems (i.e. FIGIS and ECOSCOPE) are  registered in the infrastructure, e.g. as external open search systems. The added value of gCube search  is that it will enable querying both of them. XSearch then analyzes the top-results and provides the  functionality that is currently exposed by the prototypes of the 1st year. We believe that this scenario could serve as a good testbed for evaluating various things (ranking by gCube search, ticket [https://issue.imarine.research-infrastructures.eu/ticket/684 #684], communication with X-Search, registration of FLOD SPARQL Endpoint, efficiency, etc.).
 
This also raises the issue about the markup of the fields that are appropriate for applying analysis (in this case the metadata returned with the hits should pass to X-Search), ticket [https://issue.imarine.research-infrastructures.eu/ticket/363 #363]. For instance, the search result from FIGIS apart from formatted HTML can be returned in XML format which uses Dublin Core schema to encapsulate bibliographic information. Each returned hit has various textual elements, including publication title and abstract. The first is around 9 words, the second cannot go beyond 3,000 characters. As regards ECOSCOPE, the results can also be returned in XML format which uses various schemata of ECOSCOPE together with Dublin Core, SKOS, Wordnet, etc. Specifically, title, description, preferred label and comment are interesting textual elements for further analysis. This would also allow testing the snippets that are generated by gCube search (ticket [https://issue.imarine.research-infrastructures.eu/ticket/838 #838]) and how gCube searches over multiple collections (ticket [https://issue.imarine.research-infrastructures.eu/ticket/839 #839]).
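As an illustration of the markup issue above, the following minimal sketch pulls the textual fields that are suitable for analysis out of an XML result list. The record layout, the `extract_text_fields` name and the sample data are assumptions made for illustration only; the actual FIGIS and ECOSCOPE responses may be structured differently. The Dublin Core namespace and the 3,000-character bound on abstracts are the only details taken from the text.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def extract_text_fields(xml_text, max_chars=3000):
    """Return the title and abstract of every hit in a (hypothetical) result list."""
    root = ET.fromstring(xml_text)
    hits = []
    for record in root.findall("record"):  # "record" is an assumed element name
        title = record.findtext("dc:title", default="", namespaces=DC)
        abstract = record.findtext("dc:description", default="", namespaces=DC)
        hits.append({"title": title.strip(), "text": abstract.strip()[:max_chars]})
    return hits


# Invented sample response, used only to exercise the function.
sample = """<results xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Yellowfin tuna tagging</dc:title>
    <dc:description>Results of a tagging experiment.</dc:description>
  </record>
</results>"""

fields = extract_text_fields(sample)
```

The extracted `title` and `text` values are the kind of input that snippet-based clustering or entity mining would then consume.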
 
 
Actions:  The involved partners (NKUA, FAO, IRD,  FORTH, CNR, ... ) use this as a scenario.
 

Latest revision as of 11:09, 18 November 2014

General Description

Xsearch.jpg

Latest Presentation (July 2014)

  • Presentation from the 9th TCOM (Heraklion, July 2014) slides

Persons responsible for editing/maintaining this page

  • Pavlos Fafalios (fafalios@ics.forth.gr)
  • Yannis Marketakis (marketak@ics.forth.gr)

Type

Libraries, Web application, deployed (and configured) applications, gCube version (service and portlet) over gCube search.

Description

X-Search is a meta-search engine that reads the description of an underlying search source and is able to query that source and analyze in various ways the returned results. It also exploits the availability of semantic repositories.

Key features

Provision of textual clustering of the results.
Clustering is performed either over the textual snippets or over the entire contents.
Provision of textual entity mining of the results.
Text entity mining can be performed either over the textual snippets or over the entire contents.
Provision of faceted search-like exploration of the results.
The results of clustering, entity mining and metadata-based grouping are visualized and exploited according to the faceted exploration interaction paradigm: when the user clicks on a cluster or entity, the results are restricted to those that contain that cluster or entity.
Ability to semantically explore an identified entity.
X-Search provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate Semantic Knowledge Bases, the user can retrieve more information about an entity by querying and browsing over these Knowledge Bases.
Ability to apply entity mining and explore the identified entities during plain Web browsing.
X-Search also offers entity discovery and exploration while the user is browsing on the Web. Specifically, the user is able to inspect the entities of a particular Web page by clicking a bookmarklet (a special bookmark) and then to further retrieve more information about an entity by querying a Knowledge Base.
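The faceted restriction described above (clicking an entity narrows the results to those that contain it) can be sketched as follows. This is only an illustration under stated assumptions, not X-Search's actual code: the `Hit` class, its fields, and the `restrict` and `facet_counts` functions are hypothetical names, and the hits are assumed to already carry the entities mined from them.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """A search result with its mined entities (hypothetical structure)."""
    rank: int
    title: str
    entities: dict = field(default_factory=dict)  # category -> set of entity labels


def restrict(hits, category, entity):
    """Keep only the hits that contain the clicked entity under the given category."""
    return [h for h in hits if entity in h.entities.get(category, set())]


def facet_counts(hits, category):
    """Count, for each entity of a category, how many hits mention it."""
    counts = {}
    for h in hits:
        for e in h.entities.get(category, set()):
            counts[e] = counts.get(e, 0) + 1
    return counts


# Invented sample data, used only to exercise the functions.
hits = [
    Hit(1, "Tuna fisheries overview", {"Species": {"skipjack tuna"}}),
    Hit(2, "Tagging experiments", {"Species": {"yellowfin", "skipjack tuna"}}),
    Hit(3, "Market report", {"FAO Country": {"Madagascar"}}),
]
selected = restrict(hits, "Species", "yellowfin")
```

Here `selected` contains only the second hit, and `facet_counts(hits, "Species")` gives the per-entity counts that a facet panel could display next to each label.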

A detailed description of XSearch (functionality, use cases, components) can be found at https://gcube.wiki.gcube-system.org/gcube/index.php/X-Search

Related iMarine WP/Tasks

T10.4 - Semantic Data Analysis Facilities

Related iMarine Deliverables

  • D10.4 - iMarine Data Consumption Software pdf - DELIVERED (October 2012)
  • D10.5 - iMarine Data Consumption Software pdf - DELIVERED (July 2014)
  • D11.4 - Application Programming Interface Software pdf - DELIVERED (March 2014)

Related Milestones

MS45 - Semantic Data Analysis Specification Cover Page

Detailed: https://gcube.wiki.gcube-system.org/gcube/index.php/Semantic_Data_Analysis

Related Presentations/Tutorials

  • XSearch-related presentations produced for the Review meetings
    • Presentation produced for the 1st Review slides
    • Presentation produced for the 2nd Review slides
  • XSearch-related presentations from the iMarine TCOMs
    • Presentation from the 1st TCOM slides
    • Presentation from the 2nd TCOM slides
    • Presentation from the 3rd TCOM slides
    • Document (Oct 2012) describing the implemented features: XSearch_Prototypes
    • Presentation from the 4th TCOM slides
    • Presentation from the 5th TCOM slides
    • Presentation from the 6th TCOM slides
    • Presentation from the 7th TCOM slides
    • Presentation from the 8th TCOM slides
    • Presentation from the 9th TCOM slides

User's manual and screen captures

  • XSearch user's manual pdf
  • XSearch Demo avi

Current Deployments

Key Features

X-Search has been designed to offer its functionality on top of other search systems. In particular (and according to the milestone) it offers:

  • Clustering of the results. Clustering is performed on the textual snippets of the returned results. Clustering of the textual contents is also supported. Furthermore a ranking on the identified clusters is performed.
  • Provision of extracted textual entities. Text entity mining can be performed either over the textual snippets or over the entire contents, and supports ranking of the identified entities.
  • Provision of gradual faceted search. The user is able to quickly explore the results space by exploiting the identified entities that have been mined and the results of clustering.
  • Ability to fetch semantic information about extracted entities. XSearch provides the necessary linkage between the mined entities and semantic information. In particular by exploiting appropriate knowledge bases (i.e. FactForge, DBPedia, FLOD, EcoScope KB, etc.) the user can retrieve more information about an entity by querying and browsing over these knowledge bases.
  • Exploitation of the offered services in any web page. Text entity mining can be performed over the whole contents of a particular result (HTML and PDF web pages).
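To make the entity mining and ranking features more concrete, here is a minimal sketch that mines entities from snippets with a simple dictionary (gazetteer) lookup and ranks them by the number of snippets that mention them. This is a stand-in technique chosen for brevity; the text does not say which mining or ranking method X-Search uses, and the `mine_entities` function and the sample gazetteer are assumptions.

```python
import re
from collections import Counter


def mine_entities(snippets, gazetteer):
    """Return {category: [(entity, snippet_count), ...]}, most frequent entities first."""
    counts = {category: Counter() for category in gazetteer}
    for snippet in snippets:
        text = snippet.lower()
        for category, names in gazetteer.items():
            for name in names:
                # Whole-word match, so a short name does not match inside a longer word.
                if re.search(r"\b" + re.escape(name.lower()) + r"\b", text):
                    counts[category][name] += 1
    # Rank by descending count, breaking ties alphabetically.
    return {
        category: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
        for category, c in counts.items()
    }


# Invented sample data, used only to exercise the function.
gazetteer = {"Species": ["yellowfin", "skipjack tuna"], "FAO Country": ["Madagascar"]}
snippets = [
    "Yellowfin and skipjack tuna catches off Madagascar",
    "Skipjack tuna stock assessment",
    "A fortunate season",
]
ranked = mine_entities(snippets, gazetteer)
```

The same function could be fed the full contents of each result instead of its snippet, which mirrors the snippet-based versus content-based choice listed above.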

Applications (click to run)

  • (P1) XSearch over Bing and TLO Warehouse (http://62.217.127.118/x-search/). This application runs on top of the Bing web search engine and analyzes the snippets of the top-K results (the default value of K is 50). To provide the linkage with semantic sources it uses the TLO Warehouse (accessed through a SPARQL endpoint). Upon user request, it also supports the analysis of more results (i.e. top 100, 200, 500), as well as analysis over the whole content of the results (rather than just the snippets). It is fully configurable in terms of the underlying web search engine (OpenSearch), the knowledge bases that are used, the categories of the mined entities, etc.
  • (P3) Bookmarklet: The functionality of XSearch can be applied to any web page (or PDF file). In particular, the user can trigger the bookmarklet while viewing a web page. The bookmarklet retrieves the contents of this web page (or PDF file), sends them to the XSearch service, and returns to the user the page he was viewing with the mined entities annotated (in the case of a PDF file, the entities are displayed in a sidebar). Furthermore, the user is able to get more information about the identified entities by exploiting the corresponding knowledge bases. The bookmarklet can be added to the user’s web browser from http://62.217.127.118/x-search/ or http://62.217.127.118/x-search-fao/ (see the upper right corner).
  • (P4) XSearch in gCube: The functionality of XSearch is now also offered over the gCube search system in an integrated environment (in particular the Liferay portal). The work carried out in this direction separated the functionality (the XSearch logic) from the actual presentation of the results. For this reason we created two separate components: (a) the xsearch-service, which is responsible for performing the analysis of the results, and (b) the xsearch-portlet, which is responsible for presenting the results to the user and carrying out the dialog with the user. The current version supports snippet-based results clustering, snippet-based text entity mining, entity enrichment and metadata-based grouping of the results. Both components (xsearh-service.1-0-2 and org.gcube.portlets-user.xsearch-portlet.1-0-1) have been released under the gCube 2.11.1 release. Both the XSearchPortlet and the XSearchService have been deployed in the portal https://portal.i-marine.d4science.org/ in the FCPPS VRE (Virtual Research Environment). A list of queries we've tried can be found at iMarine workspace.
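Since the underlying search engine is configured through an OpenSearch description, querying it for the top-K results essentially means filling in the URL template that the description provides. The sketch below shows one way this could be done. The `fill_template` function and the example template URL are assumptions for illustration; only the placeholder names `searchTerms`, `count` and `startIndex` and the trailing `?` for optional parameters come from the OpenSearch specification.

```python
import re
from urllib.parse import quote_plus


def fill_template(template, **params):
    """Substitute OpenSearch-style {name} and {name?} placeholders in a URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", replace, template)


# Hypothetical template, of the kind found in an OpenSearch description document.
template = "http://search.example.org/?q={searchTerms}&n={count}&start={startIndex?}"
url = fill_template(template, searchTerms="yellowfin tuna", count=50)
```

With this template, asking for more results (e.g. the top 100, 200 or 500) only requires a different `count` value, provided the underlying engine honours it.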

Related Papers

  • P. Fafalios and Y. Tzitzikas, Exploratory Professional Search through Semantic Post-Analysis of Search Results, Professional Search in the Modern World, Lecture Notes in Computer Science, Vol. 8830, Springer, 2014 (pdf).
  • P. Fafalios and Y. Tzitzikas, Post-Analysis of Keyword-based Search Results using Entity Mining, Linked Data and Link Analysis at Query Time, IEEE 8th International Conference on Semantic Computing (ICSC'14), Newport Beach, California, USA, June 2014 (pdf | slides).
  • P. Fafalios and P. Papadakos, Theophrastus: On Demand and Real-Time Automatic Annotation and Exploration of (Web) Documents using Open Linked Data, Web Semantics: Science, Services and Agents on the World Wide Web, Elsevier (ISSN: 1570-8268), 2014 (pdf).
  • P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis and Y. Tzitzikas, Web Searching with Entity Mining at Query Time, In Proceedings of the 5th Information Retrieval Facility Conference (IRF'2012), Vienna, July 2012 (pdf | slides).
  • P. Fafalios, M. Salampasis and Y. Tzitzikas, Exploratory Patent Search with Faceted Search and Configurable Entity Mining, In Proceedings of the 1st International Workshop on Integrating IR technologies for Professional Search in conjunction with the 35th European Conference on Information Retrieval (ECIR'13), Moscow, Russia, March 2013 (pdf).
  • P. Fafalios and Y. Tzitzikas, X-ENS: Semantic Enrichment of Web Search Results at Real-Time (demo paper), In Proceedings of the 36th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'13), Dublin, Ireland, August 2013 (pdf).

Demo Scenarios

A number of scenarios for demonstrating the value of X-Search are possible; which ones apply depends on the underlying systems. Moreover, the configurability that is offered (underlying search systems, categories, entities of interest, etc.) allows customizing this service according to the needs of a community. An indicative scenario, e.g. for the deployment over FIGIS and FLOD, follows (it is related to ticket 1190: https://issue.imarine.research-infrastructures.eu/ticket/1190).

Indicative Scenario

Suppose that a user is looking for publications about tuna. Specifically, he wants to find experiments that were applied to several species of tuna. So he submits the query tuna and gets a sorted list of results and various categories of entities such as Regional Fisheries Body, Species, FAO Country, etc.

The user realizes that the category Species may contain interesting entities. He notices an entity with the label yellowfin, a species of tuna found in pelagic waters of tropical and subtropical oceans worldwide, and an entity with the label skipjack tuna, another species in the tuna family. Both entities share one result: a related publication that is 17th in the ranked list. With just one click, the user can therefore locate a result that is highly relevant to what he is looking for.

Furthermore, the user can quickly locate results that are related to particular FAO countries, Regional Fisheries Bodies, Persons, etc. For example, there are 4 results about tuna that are related to Madagascar.

Entity Enrichment: By clicking the small RDF icon next to the entity’s name, the user can instantly (at that moment) get information about that particular entity by querying the FLOD endpoint (or the forthcoming TLO-SPARQL endpoint). For example, clicking the icon next to yellowfin instantly returns more information about yellowfin tuna and lets the user explore its characteristics (e.g. a list of is predator of, is prey of, etc.).
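The entity enrichment step amounts to sending a SPARQL query about the clicked entity to an endpoint. A minimal sketch of how such a request could be assembled is shown below. The query shape, the function names, the entity URI and the endpoint address are all assumptions for illustration; the queries that X-Search actually issues against FLOD are not given here. The `query` request parameter is the one defined by the SPARQL Protocol for GET requests.

```python
from urllib.parse import urlencode


def describe_entity_query(entity_uri, limit=100):
    """Build a SPARQL query that lists the properties and values of one entity."""
    # Reject characters that are not allowed inside an IRI reference.
    if any(ch in entity_uri for ch in '<>"{}|^`\\ '):
        raise ValueError("not a valid IRI: " + entity_uri)
    return (
        f"SELECT ?property ?value WHERE {{ <{entity_uri}> ?property ?value }} "
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Build the GET URL for a SPARQL endpoint (no request is sent here)."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical entity URI and endpoint address.
query = describe_entity_query("http://example.org/species/yellowfin", limit=10)
url = request_url("http://sparql.example.org/query", query)
```

The returned property/value pairs are what a pop-up window could render for the entity, for instance the is predator of and is prey of lists mentioned above.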