X-Search


General Description

Latest Presentation (July 2014)

  • Presentation from the 9th TCOM (Heraklion, July 2014) slides

Persons responsible for editing/maintaining this page

  • Pavlos Fafalios (fafalios@ics.forth.gr)
  • Yannis Marketakis (marketak@ics.forth.gr)

Type

Libraries, Web application, deployed (and configured) applications, gCube version (service and portlet) over gCube search.

Description

X-Search is a meta-search engine that reads the description of an underlying search source, queries that source, and analyzes the returned results in various ways. It also exploits the availability of semantic repositories.

Key features

  • Textual clustering of the results. Clustering is performed either over the textual snippets or over the entire contents.
  • Textual entity mining of the results. Entity mining can be performed either over the textual snippets or over the entire contents.
  • Faceted search-like exploration of the results. The results of clustering, entity mining and metadata-based grouping are visualized and exploited according to the faceted exploration interaction paradigm: when the user clicks on a cluster or entity, the results are restricted to those that belong to that cluster or contain that entity.
  • Semantic exploration of an identified entity. X-Search provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate Semantic Knowledge Bases, the user can retrieve more information about an entity by querying and browsing these Knowledge Bases.
  • Entity mining and exploration during plain Web browsing. X-Search also offers entity discovery and exploration while the user is browsing the Web. Specifically, the user can inspect the entities of a particular Web page by clicking a bookmarklet (a special bookmark) and then retrieve more information about an entity by querying a Knowledge Base.
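The faceted restriction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample results, the category names and the functions build_facets and restrict are hypothetical and are not taken from the X-Search code base.

```python
from collections import defaultdict

# Hypothetical mined data: result id -> {category: entities found in the result}.
results = {
    1: {"Species": {"yellowfin"}, "FAO Country": {"Madagascar"}},
    2: {"Species": {"skipjack tuna"}},
    3: {"Species": {"yellowfin", "skipjack tuna"}},
}

def build_facets(results):
    """Map each (category, entity) pair to the set of result ids containing it."""
    facets = defaultdict(set)
    for result_id, by_category in results.items():
        for category, entities in by_category.items():
            for entity in entities:
                facets[(category, entity)].add(result_id)
    return facets

def restrict(facets, all_ids, selections):
    """Keep only the results that contain every selected (category, entity) pair."""
    remaining = set(all_ids)
    for selection in selections:
        remaining &= facets.get(selection, set())
    return sorted(remaining)

facets = build_facets(results)
# One click on "yellowfin", then a second click on "skipjack tuna".
one_click = restrict(facets, results, [("Species", "yellowfin")])
two_clicks = restrict(
    facets, results, [("Species", "yellowfin"), ("Species", "skipjack tuna")]
)
print(one_click, two_clicks)
```

Each click simply intersects the current result set with the set of results attached to the selected entity, so successive clicks narrow the list gradually.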

A detailed description of XSearch (functionality, use cases, components) can be found at https://gcube.wiki.gcube-system.org/gcube/index.php/X-Search

Related iMarine WP/Tasks

T10.4 - Semantic Data Analysis Facilities

Related iMarine Deliverables

  • D10.4 - iMarine Data Consumption Software pdf - DELIVERED (October 2012)
  • D10.5 - iMarine Data Consumption Software pdf - DELIVERED (July 2014)
  • D11.4 - Application Programming Interface Software pdf - DELIVERED (March 2014)

Related Milestones

MS45 - Semantic Data Analysis Specification Cover Page

Detailed: https://gcube.wiki.gcube-system.org/gcube/index.php/Semantic_Data_Analysis

Related Presentations/Tutorials

  • XSearch-related presentations produced for the Review meetings
    • Presentation produced for the 1st Review slides
    • Presentation produced for the 2nd Review slides
  • XSearch-related presentations from the iMarine TCOMs
    • Presentation from the 1st TCOM slides
    • Presentation from the 2nd TCOM slides
    • Presentation from the 3rd TCOM slides
    • Document (Oct 2012) describing the implemented features: XSearch_Prototypes
    • Presentation from the 4th TCOM slides
    • Presentation from the 5th TCOM slides
    • Presentation from the 6th TCOM slides
    • Presentation from the 7th TCOM slides
    • Presentation from the 8th TCOM slides
    • Presentation from the 9th TCOM slides

User's manual and screen captures

  • XSearch user's manual pdf
  • XSearch Demo avi

Current Deployments

Key Features

X-Search has been designed to offer its functionality on top of other search systems. In particular (and according to the milestone) it offers:

  • Clustering of the results. Clustering is performed on the textual snippets of the returned results; clustering over the full textual contents is also supported. The identified clusters are also ranked.
  • Provision of extracted textual entities. Text entity mining can be performed either over the textual snippets or over the entire contents, and supports ranking of the identified entities.
  • Provision of gradual faceted search. The user is able to quickly explore the results space by exploiting the identified entities that have been mined and the results of clustering.
  • Ability to fetch semantic information about extracted entities. XSearch provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate knowledge bases (e.g. FactForge, DBpedia, FLOD, EcoScope KB) the user can retrieve more information about an entity by querying and browsing these knowledge bases.
  • Exploitation of the offered services in any web page. Text entity mining can be performed over the whole contents of a particular result (HTML and PDF web pages).
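As a rough illustration of snippet-based entity mining with ranking, the sketch below matches a small gazetteer against result snippets and ranks entities by the number of snippets that mention them. Both the gazetteer-lookup approach and the frequency-based ranking are assumptions made for the example; this page does not state which mining or ranking method X-Search uses, and the names GAZETTEER and mine_entities are hypothetical.

```python
import re
from collections import Counter

# Hypothetical gazetteer: category -> known entity names (lowercase).
GAZETTEER = {
    "Species": ["yellowfin", "skipjack tuna"],
    "FAO Country": ["madagascar"],
}

def mine_entities(snippets):
    """Count, per (category, entity), how many snippets mention the entity."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for category, names in GAZETTEER.items():
            for name in names:
                # Whole-word match, so a name does not fire inside a longer word.
                if re.search(r"\b" + re.escape(name) + r"\b", text):
                    counts[(category, name)] += 1
    return counts

snippets = [
    "Yellowfin and skipjack tuna catches near Madagascar",
    "Growth experiments on yellowfin",
    "Tuna fisheries report",
]

# Entities ordered from most to least frequently mentioned.
ranked = mine_entities(snippets).most_common()
for (category, name), hits in ranked:
    print(category, name, hits)
```

Running the same function over full document contents instead of snippets only changes the input strings, which mirrors the snippet versus full-content choice described above.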

Applications (click to run)

  • (P1) XSearch over Bing and TLO Warehouse (http://62.217.127.118/x-search/). This application runs on top of the Bing web search engine and analyzes the snippets of the top-K results (the default value of K is 50). To provide the linkage with semantic sources it uses the TLO Warehouse (accessed through a SPARQL endpoint). Upon user request it also supports the analysis of more results (i.e. top 100, 200, 500), as well as analysis over the whole content of the results (rather than just the snippets). It is fully configurable in terms of the underlying web search engine (OpenSearch), the knowledge bases that are used, the categories of the mined entities, etc.
  • (P3) Bookmarklet: The functionality of XSearch can be applied to any web page (or PDF file). In particular, the user can trigger the bookmarklet while viewing a web page. The bookmarklet retrieves the contents of this web page (or PDF file), sends them to the XSearch service and returns to the user the same web page with the mined entities annotated (in the case of a PDF file, the entities are displayed in a sidebar). Furthermore, the user can get more information about the identified entities by exploiting the corresponding knowledge bases. The bookmarklet can be added to the user’s web browser from http://62.217.127.118/x-search/ or http://62.217.127.118/x-search-fao/ (see the upper right corner).
  • (P4) XSearch in gCube: The functionality of XSearch is now also offered over the gCube search system in an integrated environment (in particular the Liferay portal). The work in this direction separated the functionality (the XSearch logic) from the presentation of the results. For this reason we created two separate components: (a) the xsearch-service, which performs the analysis of the results, and (b) the xsearch-portlet, which presents the results to the user and handles the interaction with the user. The current version supports snippet-based results clustering, snippet-based text entity mining, entity enrichment and metadata-based grouping of the results. Both components (xsearh-service.1-0-2 and org.gcube.portlets-user.xsearch-portlet.1-0-1) have been released under the gCube 2.11.1 release. Both the XSearchPortlet and the XSearchService have been deployed in the portal https://portal.i-marine.d4science.org/ in the FCPPS VRE (Virtual Research Environment). A list of queries we have tried can be found in the iMarine workspace.
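Since the underlying search engine is configured through OpenSearch, a request for the top-K results can be pictured as filling an OpenSearch URL template. The sketch below is a minimal illustration, not the X-Search implementation: the template URL and the function fill_template are hypothetical, and only the {searchTerms}, {count} and optional {startIndex?} parameter syntax comes from the OpenSearch 1.1 specification.

```python
import re
from urllib.parse import quote

# Hypothetical template, in the style of an OpenSearch description document.
TEMPLATE = "http://example.org/search?q={searchTerms}&n={count}&start={startIndex?}"

def fill_template(template, **params):
    """Substitute OpenSearch-style {name} and optional {name?} parameters."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")
    return re.sub(r"\{(\w+)(\??)\}", substitute, template)

# Ask the hypothetical source for the top 50 results, the default K mentioned above.
url = fill_template(TEMPLATE, searchTerms="yellowfin tuna", count=50)
print(url)
```

Switching to another OpenSearch-compatible source then only requires a different template, which is one way to read the configurability described for (P1).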

Related Papers

  • P. Fafalios and Y. Tzitzikas, Exploratory Professional Search through Semantic Post-Analysis of Search Results, Professional Search in the Modern World, Lecture Notes in Computer Science, Vol. 8830, Springer, 2014 (pdf).
  • P. Fafalios and Y. Tzitzikas, Post-Analysis of Keyword-based Search Results using Entity Mining, Linked Data and Link Analysis at Query Time, IEEE 8th International Conference on Semantic Computing (ICSC'14), Newport Beach, California, USA, June 2014 (pdf | slides).
  • P. Fafalios and P. Papadakos, Theophrastus: On Demand and Real-Time Automatic Annotation and Exploration of (Web) Documents using Open Linked Data, Web Semantics: Science, Services and Agents on the World Wide Web, Elsevier (ISSN: 1570-8268), 2014 (pdf).
  • P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis and Y. Tzitzikas, Web Searching with Entity Mining at Query Time, In Proceedings of the 5th Information Retrieval Facility Conference (IRF'2012), Vienna, July 2012 (pdf | slides).
  • P. Fafalios, M. Salampasis and Y. Tzitzikas, Exploratory Patent Search with Faceted Search and Configurable Entity Mining, In Proceedings of the 1st International Workshop of Integrating IR technologies for Professional Search in conjunction with the 35th European Conference on Information Retrieval (ECIR'13), Moscow, Russia, March 2013 (pdf).
  • P. Fafalios and Y. Tzitzikas, X-ENS: Semantic Enrichment of Web Search Results at Real-Time (demo paper), In Proceedings of the 36th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'13), Dublin, Ireland, August 2013 (pdf).

Demo Scenarios

A number of scenarios can demonstrate the value of X-Search, and the choice depends on the underlying systems. Moreover, the configurability that is offered (underlying search systems, categories, entities of interest, etc.) allows this service to be customized according to the needs of a community. An indicative scenario, e.g. for the deployment over FIGIS and FLOD, follows (it is related to ticket 1190, https://issue.imarine.research-infrastructures.eu/ticket/1190).

Indicative Scenario

Suppose that a user is looking for publications about tuna, and specifically for experiments that were applied to several species of tuna. The user submits the query tuna and gets a ranked list of results together with various categories of entities, such as Regional Fisheries Body, Species, FAO Country, etc. The user realizes that the category Species may contain interesting entities. It includes an entity with the label yellowfin, a species of tuna found in pelagic waters of tropical and subtropical oceans worldwide, and an entity with the label skipjack tuna, another species in the tuna family. The two entities share one result: a related publication that is 17th in the ranked list. With just one click, the user can locate that result, which is very relevant to the information need. Furthermore, the user can quickly locate results that are related to particular FAO countries, Regional Fisheries Bodies, Persons, etc. For example, there are 4 results about tuna that are related to Madagascar.

Entity Enrichment: By clicking the small RDF icon next to an entity’s name, the user can get information about that entity on the spot, retrieved at that moment by querying the FLOD endpoint (or the forthcoming TLO-SPARQL endpoint). For example, clicking the icon next to yellowfin returns more information about yellowfin tuna and lets the user explore its characteristics (e.g. its "is predator of" and "is prey of" relations).
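A request of this kind can be pictured as a SPARQL query sent over HTTP GET, with the query text in the query parameter as defined by the SPARQL 1.1 Protocol. The sketch below only builds the request URL and does not contact any server. The endpoint address, the use of rdfs:label to find the entity and the function build_enrichment_request are assumptions made for illustration; the actual queries issued by X-Search and the FLOD vocabulary are not described on this page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL; the real FLOD endpoint address is not given here.
ENDPOINT = "http://example.org/sparql"

def build_enrichment_request(label, limit=20):
    """Return a GET URL asking for all properties of resources labelled `label`."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?entity ?property ?value WHERE {\n"
        "  ?entity rdfs:label ?l .\n"
        f'  FILTER (lcase(str(?l)) = "{escaped.lower()}")\n'
        "  ?entity ?property ?value .\n"
        f"}} LIMIT {int(limit)}"
    )
    return ENDPOINT + "?" + urlencode({"query": query})

url = build_enrichment_request("yellowfin")
# Decode the query back out of the URL to inspect what would be sent.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

The returned property/value pairs are what a user interface could then present for browsing, for example the predator and prey relations mentioned above, provided the knowledge base exposes them.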