X-Search

From D4Science Wiki
Revision as of 14:09, 17 November 2014 by Pavlos.fafalios (Talk | contribs) (Related Cluster)

General Description

[Figure: Xsearch.jpg]

Latest Presentation (July 2013)

  • Presentation from the 6th TCOM (Skiathos, June 2013) slides

Persons responsible for editing/maintaining this page

  • Pavlos Fafalios (fafalios@ics.forth.gr)
  • Yannis Marketakis (marketak@ics.forth.gr)

Type

Libraries, Web application, deployed (and configured) applications, gCube version (service and portlet) over gCube search.

Description

X-Search is a meta-search engine that reads the description of an underlying search source, queries that source, and analyzes the returned results in various ways. It also exploits the availability of semantic repositories.

Key features

  • Provision of textual clustering of the results. Clustering is performed either over the textual snippets or over the entire contents.
  • Provision of textual entity mining of the results. Text entity mining can be performed either over the textual snippets or over the entire contents.
  • Provision of faceted search-like exploration of the results. The results of clustering, entity mining and metadata-based grouping are visualized and exploited according to the faceted exploration interaction paradigm: when the user clicks on a cluster or entity, the results are restricted to those that contain that cluster or entity.
  • Ability to semantically explore an identified entity. X-Search provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate Semantic Knowledge Bases, the user can retrieve more information about an entity by querying and browsing over these Knowledge Bases.
  • Ability to apply entity mining and explore the identified entities during plain Web browsing. X-Search also offers entity discovery and exploration while the user is browsing the Web. Specifically, the user can inspect the entities of a particular Web page by clicking a bookmarklet (a special bookmark) and then retrieve more information about an entity by querying a Knowledge Base.
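The faceted restriction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the X-Search implementation: the `Result` class, the function names and the sample data are all hypothetical, and each result is assumed to already carry the set of entities mined from it.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """A search result with the entities mined from it (hypothetical structure)."""
    rank: int
    title: str
    entities: set = field(default_factory=set)


def build_facet(results):
    """Map each entity label to the ranks of the results that mention it."""
    facet = {}
    for r in results:
        for e in r.entities:
            facet.setdefault(e, []).append(r.rank)
    return facet


def restrict(results, selected):
    """Keep only the results that contain every selected entity."""
    selected = set(selected)
    return [r for r in results if selected <= r.entities]


# Made-up sample data, for illustration only.
results = [
    Result(1, "Tuna fisheries overview", {"yellowfin"}),
    Result(2, "Skipjack stock assessment", {"skipjack tuna", "Madagascar"}),
    Result(3, "Tagging experiments", {"yellowfin", "skipjack tuna"}),
]

facet = build_facet(results)
narrowed = restrict(results, ["yellowfin", "skipjack tuna"])
print(facet["yellowfin"])           # ranks of results mentioning "yellowfin"
print([r.rank for r in narrowed])   # results that mention both entities
```

Clicking an entity corresponds to calling `restrict` with that entity added to the current selection; clusters could be handled the same way if each result also carried its cluster labels.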

A detailed description of XSearch (functionality, use cases, components) can be found at https://gcube.wiki.gcube-system.org/gcube/index.php/X-Search

Related iMarine WP/Tasks

T10.4 - Semantic Data Analysis Facilities

Related iMarine Deliverables

D10.4 - iMarine Data Consumption Software pdf - DELIVERED

This activity will also contribute to the forthcoming D10.5 (M27, Jan 2014) and D11.4 (M28, Feb 2014).

Related Milestones

MS45 - Semantic Data Analysis Specification Cover Page

Detailed: https://gcube.wiki.gcube-system.org/gcube/index.php/Semantic_Data_Analysis

Related Presentations/Tutorials

A presentation produced for the 1st Review can be found at iMarine workspace

Some XSearch-related presentations from the iMarine TCOMs follow.

  • Presentation from the 1st TCOM slides
  • Presentation from the 2nd TCOM slides
  • Presentation from the 3rd TCOM slides
  • Document (Oct 2012) describing the implemented features: XSearch_Prototypes
  • Presentation from the 4th TCOM slides
  • Presentation from the 5th TCOM slides
  • Presentation from the 6th TCOM slides

User's manual and screen captures

  • XSearch user's manual pdf
  • XSearch Demo avi

Current Deployments

Key Features

X-Search has been designed to offer its functionality on top of other search systems. In particular (and according to the milestone) it offers:

  • Clustering of the results. Clustering is performed on the textual snippets of the returned results. Clustering of the full textual contents is also supported. The identified clusters are also ranked.
  • Provision of extracted textual entities. Text entity mining can be performed either over the textual snippets or over the entire contents, and supports ranking of the identified entities.
  • Provision of gradual faceted search. The user is able to quickly explore the results space by exploiting the identified entities that have been mined and the results of clustering.
  • Ability to fetch semantic information about extracted entities. XSearch provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate knowledge bases (e.g. FactForge, DBPedia, FLOD, EcoScope KB), the user can retrieve more information about an entity by querying and browsing over these knowledge bases.
  • Exploitation of the offered services in any web page. Text entity mining can be performed over the whole contents of a particular result (HTML and PDF web pages).
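As a rough illustration of snippet-based entity mining with ranking, the sketch below matches per-category entity lists against result snippets and orders the entities of a category by how many snippets mention them. It is a simplified assumption, not the actual X-Search miner: the entity lists, the function names and the frequency-based ranking are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# Hypothetical entity lists per category (in practice such lists could be
# configured, e.g. filled from the results of SPARQL queries).
GAZETTEER = {
    "Species": ["yellowfin", "skipjack tuna"],
    "FAO Country": ["Madagascar"],
}


def mine_entities(snippets, gazetteer=GAZETTEER):
    """Return {category: {entity: [indices of the snippets that mention it]}}."""
    found = defaultdict(dict)
    for category, names in gazetteer.items():
        for name in names:
            # Whole-word, case-insensitive match of the entity name.
            pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
            hits = [i for i, s in enumerate(snippets) if pattern.search(s)]
            if hits:
                found[category][name] = hits
    return dict(found)


def rank_entities(entity_hits):
    """Order entities by number of snippets (descending), ties alphabetically."""
    return sorted(entity_hits, key=lambda e: (-len(entity_hits[e]), e))


# Made-up snippets, for illustration only.
snippets = [
    "Yellowfin catches off Madagascar increased.",
    "Skipjack tuna and yellowfin tagging experiment.",
    "Madagascar signs a new fisheries agreement.",
    "Market prices for canned fish.",
]

mined = mine_entities(snippets)
print(mined)
print(rank_entities(mined["Species"]))
```

The same functions could be run over full document contents instead of snippets; only the input strings would change.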

Applications (click to run)

  • (P1) XSearch over Bing and TLO Warehouse (http://62.217.127.118/x-search/). This application runs on top of the Bing web search engine and analyzes the snippets of the top-K results (the default value of K is 50). To provide the linkage with semantic sources, it uses the TLO Warehouse (accessed through a SPARQL endpoint). Upon user request, it also supports the analysis of more results (e.g. the top 100, 200 or 500), as well as analysis over the whole content of the results (rather than just the snippets). It is fully configurable in terms of the underlying web search engine (OpenSearch), the knowledge bases that are used, the categories of the mined entities, etc.
  • (P3) Bookmarklet: The functionality of XSearch can be applied to any web page (or PDF file). In particular, the user can trigger the bookmarklet while viewing a web page. The bookmarklet retrieves the contents of this web page (or PDF file), sends them to the XSearch service, and returns to the user the web page he was viewing with the mined entities annotated (in the case of a PDF file, the entities are displayed in a sidebar). Furthermore, the user can get more information about the identified entities by exploiting the corresponding knowledge bases. The bookmarklet can be added to the user’s web browser from http://62.217.127.118/x-search/ or http://62.217.127.118/x-search-fao/ (see the upper right corner).
  • (P4) XSearch in gCube: The functionality of XSearch is now also offered over the gCube search system in an integrated environment (in particular the Liferay portal). The work in this direction separated the functionality (the XSearch logic) from the presentation of the results. For this reason we created two separate components: (a) the xsearch-service, which is responsible for performing the analysis of the results, and (b) the xsearch-portlet, which is responsible for presenting the results to the user and carrying out the dialog with the user. The current version supports snippet-based results clustering, snippet-based text entity mining, entity enrichment and metadata-based grouping of the results. Both components (xsearch-service.1-0-2 and org.gcube.portlets-user.xsearch-portlet.1-0-1) have been released under the gCube 2.11.1 release. Both the XSearchPortlet and the XSearchService have been deployed in the portal https://portal.i-marine.d4science.org/ in the FCPPS VRE (Virtual Research Environment). A list of queries we've tried can be found at iMarine workspace.

Related Papers

  • P. Fafalios and Y. Tzitzikas, Exploratory Professional Search through Semantic Post-Analysis of Search Results, Professional Search in the Modern World, Lecture Notes in Computer Science, Vol. 8830, Springer, 2014 (pdf).
  • P. Fafalios and Y. Tzitzikas, Post-Analysis of Keyword-based Search Results using Entity Mining, Linked Data and Link Analysis at Query Time, IEEE 8th International Conference on Semantic Computing (ICSC'14), Newport Beach, California, USA, June 2014 (pdf | slides).
  • P. Fafalios and P. Papadakos, Theophrastus: On Demand and Real-Time Automatic Annotation and Exploration of (Web) Documents using Open Linked Data, Web Semantics: Science, Services and Agents on the World Wide Web, Elsevier (ISSN: 1570-8268), 2014 (pdf).
  • P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis and Y. Tzitzikas, Web Searching with Entity Mining at Query Time, In Proceedings of the 5th Information Retrieval Facility Conference (IRF'2012), Vienna, July 2012 (pdf | slides).
  • P. Fafalios, M. Salampasis and Y. Tzitzikas, Exploratory Patent Search with Faceted Search and Configurable Entity Mining, In Proceedings of the 1st International Workshop of Integrating IR technologies for Professional Search in conjunction with the 35th European Conference on Information Retrieval (ECIR'13), Moscow, Russia, March 2013 (pdf).
  • P. Fafalios and Y. Tzitzikas, X-ENS: Semantic Enrichment of Web Search Results at Real-Time (demo paper), In Proceedings of the 36th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'13), Dublin, Ireland, August 2013 (pdf).


Demo Scenarios

A number of scenarios can demonstrate the value of XSearch, depending on the underlying systems. Moreover, the configurability that is offered (underlying search systems, categories, entities of interest, etc.) allows customizing this service according to the needs of a community. An indicative scenario, e.g. for the deployment over FIGIS and FLOD, follows (it is related to ticket 1190, https://issue.imarine.research-infrastructures.eu/ticket/1190).

Indicative Scenario

Suppose that a user is looking for publications about tuna. Specifically, he wants to find experiments that were applied to several species of tuna. So he submits the query tuna and gets a ranked list of results and various categories of entities, such as Regional Fisheries Body, Species, FAO Country, etc. The user realizes that the category Species may contain interesting entities. He notices an entity with the label yellowfin, a species of tuna found in pelagic waters of tropical and subtropical oceans worldwide, and an entity with the label skipjack tuna, another species in the tuna family. Both entities contain one (common) result: a related publication that is 17th in the ranked list. So, with just one click, the user can locate a result that is very relevant to what he is looking for. Furthermore, the user can quickly locate results that are related to several FAO countries, Regional Fisheries Bodies, Persons, etc. For example, there are 4 results about tuna that are related to Madagascar.

Entity Enrichment: By clicking the small RDF icon next to the entity’s name, the user can get information about that particular entity on the spot; the information is fetched at that moment by querying the FLOD endpoint (or the forthcoming TLO-SPARQL endpoint). For example, by clicking the icon next to yellowfin we could instantly get more information about yellowfin tuna and explore its characteristics (e.g. the values of properties such as is predator of, is prey of, etc.).
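A request of this kind could look roughly like the following sketch, which builds a SPARQL query for the properties of one entity and encodes it as a GET request URL (the `query` parameter is the one defined by the SPARQL Protocol). The endpoint URL, the entity URI and the function names are placeholders invented for this example; the queries actually issued by XSearch are not documented on this page.

```python
from urllib.parse import urlencode


def build_entity_query(entity_uri, limit=50):
    """Build a SPARQL query listing the properties and values of one entity."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<{entity_uri}> ?property ?value "
        f"}} LIMIT {limit}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint and entity URI (assumptions, not real FLOD addresses).
ENDPOINT = "http://example.org/sparql"
ENTITY = "http://example.org/entity/yellowfin"

query = build_entity_query(ENTITY, limit=20)
url = build_request_url(ENDPOINT, query)
print(query)
print(url)
```

Fetching `url` with any HTTP client and parsing the returned bindings would then provide the property/value pairs shown to the user; that step is omitted here so the sketch runs without network access.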

Management (plans, tickets, etc)

Current Activities

A shortcut showing (automatically) all the tickets that relate to the Semantic Data Analysis functional area can be found here. Below we provide a more up-to-date view of the current situation.

  • XSearch and gCubeSearch (various issues)
    • XSearch Portlet Memory Consumption (Ticket #628)
    • Ranking in gCube search (Ticket #684)
    • GUI improvements for offering a homogenized view within the iMarine portal (Ticket #1854)
  • XSearch that exploits objects associations (exploitation of forthcoming TLO) (Ticket #960)
    • So far XSearch exploits entity lists, which can be the results of SPARQL queries (e.g. see the IRF2012 paper). One emerging issue is whether (and why, and how) it should also exploit the associations between these entities (i.e. the RDF properties that connect them). For example, consider the case where the search results page contains two facets, Water Areas and Species, whose entities have been mined from the snippets of the search results and are also described in the underlying RDF KB. In the KB, some entities of Water Areas could be connected with instances of Species through various RDF properties (in general these can be numerous). The question is how to exploit them in order to make the search/exploration process more powerful and/or more flexible, handy and effective. Should we exploit them only in the pop-up windows that show more information about each entity? Should we also exploit them for restricting the current answer set? How should we tackle the large number of properties that may exist? On what principles or generic solutions could this be founded? If we understand the above, the outcomes could be useful for exploiting the structure of the forthcoming TLO. A tentative plan is to (a) try to understand the problem and sketch scenarios (by Apr 2013), (b) decide what to design/implement (by May 2013), and (c) have a first implementation (by June 2013).
  • Enriching RDF files with the URIs of Named Entities (an XSearch Tagger like Agrotagger, i.e. an iMarine annotator) (Ticket #1187, #1814). There is now a wiki page for this:

http://wiki.i-marine.eu/index.php/XSearchLink

  • XSearch over RDF results

Past Activities

  • Various xsearch-portlet activities for improving scalability and extended functionality
    • Exploitation of IS in XSearch portlet (Ticket #780) - CLOSED - Feb 2013
    • Implementation of the new incremental algorithm for extended functionality (Ticket #1823) - CLOSED - Jun 2013
    • Exploitation of multiple xsearch-service instances (Ticket #1828) - CLOSED - Jun 2013
    • Dynamic fetching of xsearch configuration files (Ticket #783) - CLOSED - Nov 2012
    • Retrieval of semantic information about the mined entities (Ticket #1813) - CLOSED - Jun 2013
  • Various activities about xsearch and gCube search
    • Support search results snippets (Ticket #7) - CLOSED - Jun 2012
    • Provision of textual snippets from gCube search (Ticket #838) - CLOSED - Nov 2012
    • Searching over multiple collections (Ticket #839) - CLOSED - Mar 2013
    • Configurability of TCPLocator (Ticket #627) - CLOSED - Feb 2013