Difference between revisions of "X-Search"

From D4Science Wiki
Revision as of 09:23, 3 July 2013

General Description

Persons responsible for editing/maintaining this page

  • Pavlos Fafalios (fafalios@ics.forth.gr)
  • Yannis Marketakis (marketak@ics.forth.gr)

Type

Libraries, Web application, deployed (and configured) applications, gCube version over gCube search.

Description

XSearch is a meta-search engine that reads the description of an underlying search source (OpenSearch), queries that source, analyzes the returned results in various ways, and exploits the available semantic repositories (SPARQL endpoints). It also has a gCube version in which the underlying search system is gCube search. Its key features include textual clustering of the results, snippet-based or content-based textual entity mining, and the ability to fetch and display semantic information for the identified entities.

A detailed description of XSearch can be found at https://gcube.wiki.gcube-system.org/gcube/index.php/X-Search
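As a rough illustration of how a meta-search engine can read an OpenSearch description and query the described source, here is a minimal Python sketch. It is not XSearch code: the description document, the example.org address, and the function name query_url are assumptions made for the example, and the default of 50 results simply mirrors the default K mentioned for prototype P1.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Namespace of OpenSearch 1.1 description documents.
OS_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

# Hypothetical description document; example.org is a placeholder source.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example source</ShortName>
  <Url type="application/rss+xml"
       template="http://example.org/search?q={searchTerms}&amp;n={count?}"/>
</OpenSearchDescription>"""


def query_url(description_xml, terms, count=50):
    """Fill the first Url template of an OpenSearch description."""
    root = ET.fromstring(description_xml)
    template = root.find(OS_NS + "Url").get("template")
    url = template.replace("{searchTerms}", quote_plus(terms))
    # Optional parameters are written {name?}; only count is handled here.
    return url.replace("{count?}", str(count))


print(query_url(DESCRIPTION, "yellowfin tuna"))
# http://example.org/search?q=yellowfin+tuna&n=50
```

Because the source is described rather than hard-coded, switching to another search system only requires a different description document.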

Related iMarine WP/Tasks

T10.4 - Semantic Data Analysis Facilities

Related iMarine Deliverables

D10.4 - iMarine Data Consumption Software (pdf) - DELIVERED

This activity will also contribute to the forthcoming D10.5 (M27, Jan 2014) and D11.4 (M28, Feb 2014).

Related Milestones

MS45 - Semantic Data Analysis Specification Cover Page

Detailed: https://gcube.wiki.gcube-system.org/gcube/index.php/Semantic_Data_Analysis

Related Cluster

http://wiki.i-marine.eu/index.php/Semantic_cluster_achievements

Related Presentations/Tutorials

A presentation produced for the 1st Review can be found in the iMarine workspace.

Current (development) status

Link to a document that describes the features implemented by June 2012: XSearch_Prototypes

Current Deployments

Key Features

X-Search has been designed to offer its functionality on top of other search systems. In particular (and according to the milestone) it offers:

  • Clustering of the results. Clustering is performed on the textual snippets of the returned results. Clustering of the full textual contents is also supported. The identified clusters are also ranked.
  • Provision of extracted textual entities. Text entity mining can be performed either over the textual snippets or over the entire contents, and supports ranking of the identified entities.
  • Provision of gradual faceted search. The user can quickly explore the result space by exploiting the mined entities and the results of clustering.
  • Ability to fetch semantic information about extracted entities. XSearch provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate knowledge bases (e.g. FactForge, DBpedia, FLOD, EcoScope KB), the user can retrieve more information about an entity by querying and browsing these knowledge bases.
  • Exploitation of the offered services in any web page. Text entity mining can be performed over the whole contents of a particular result (HTML and PDF web pages).
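The entity mining and entity ranking features listed above can be pictured with a small Python sketch. This page does not say how XSearch mines entities, so the gazetteer lookup below is an assumption chosen for brevity; the gazetteer contents, the sample snippets, and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: category -> entity names.
GAZETTEER = {
    "Species": ["yellowfin", "skipjack tuna"],
    "FAO Country": ["Madagascar"],
}

# Hypothetical snippets, in the order returned by the underlying search system.
SNIPPETS = [
    "Tagging experiments on yellowfin and skipjack tuna.",
    "Tuna fisheries of Madagascar.",
    "Yellowfin catches off Madagascar.",
]


def mine_entities(snippets, gazetteer=GAZETTEER):
    """Map category -> entity -> ranks (1-based) of the snippets mentioning it."""
    facets = defaultdict(dict)
    for category, names in gazetteer.items():
        for name in names:
            pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
            hits = [rank for rank, text in enumerate(snippets, start=1)
                    if pattern.search(text)]
            if hits:
                facets[category][name] = hits
    return facets


def rank_entities(entities):
    """Order entities by number of hits, then by the best rank of a hit."""
    return sorted(entities, key=lambda e: (-len(entities[e]), entities[e][0]))


facets = mine_entities(SNIPPETS)
print(rank_entities(facets["Species"]))  # ['yellowfin', 'skipjack tuna']
```

Each category then becomes a facet whose entries point back to the results they were mined from, which is what makes the gradual exploration of the result space possible.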

Applications (click to run)

  • (P1) XSearch over Bing and FactForge: http://139.91.183.72/x-search/. This prototype runs on top of the Bing web search engine and analyzes the snippets of the top-K results (the default value of K is 50). To provide the linkage with semantic sources it uses the FactForge knowledge base (accessed through SPARQL). Upon user request it also supports the analysis of more results (i.e. top 100, 200, 500), as well as analysis over the whole content of the results rather than just the snippets. It is fully configurable in terms of the underlying web search engine, the knowledge bases that are used, the categories of the mined entities, etc.
  • (P2) XSearch over FIGIS and FLOD: http://139.91.183.72/x-search-fao/. This prototype uses FAO FIGIS as the underlying search system, which searches for publications about fisheries and aquaculture. The FLOD dataset is queried to support entity enrichment.
  • (P4) Bookmarklet: The functionality of XSearch can be applied in any web page. The user triggers the bookmarklet while viewing a web page. The bookmarklet retrieves the contents of that page, sends them to the XSearch service, and returns to the user the page he was looking at with the mined entities annotated. The user can also get more information about the identified entities by exploiting the corresponding knowledge bases. The bookmarklet can be added to the user’s web browser from http://139.91.183.72/x-search/ or http://139.91.183.72/x-search-fao/ (see the upper right corner).
  • (P5) XSearch in gCube: The functionality of XSearch can be exploited in the gCube search system in an integrated environment (in particular the Liferay portal). The work carried out in this direction was to separate the functionality (the XSearch logic) from the actual presentation of the results. For this reason we created two separate components: (a) the xsearch-service, which is responsible for performing the analysis of the results, and (b) the xsearch-portlet, which is responsible for presenting the results to the user and carrying out the dialog with the user. The current version supports snippet-based results clustering, snippet-based text entity mining, and provision of gradual faceted search. Both components (xsearch-service.1-0-2 and org.gcube.portlets-user.xsearch-portlet.1-0-1) have been released in gCube 2.11.1. Both the XSearchPortlet and the XSearchService have been deployed in the portal https://portal.i-marine.d4science.org/ in the FCPPS VRE (Virtual Research Environment). A list of queries we've tried can be found in the iMarine workspace.

Demo Scenarios

A number of demo scenarios could be described, depending on the underlying systems. An indicative scenario for the deployment over FIGIS and FLOD follows. It relates to ticket 1190 (https://issue.imarine.research-infrastructures.eu/ticket/1190).

Demo Scenario 1

Suppose that a user is looking for publications about tuna. Specifically, he wants to find experiments that were applied to several species of tuna. He submits the query tuna and gets a ranked list of results together with various categories of entities, such as Regional Fisheries Body, Species, FAO Country, etc. The user realizes that the category Species may contain interesting entities. He notices an entity with the label yellowfin, a species of tuna found in pelagic waters of tropical and subtropical oceans worldwide, and an entity with the label skipjack tuna, another species in the tuna family. Both entities contain one (common) result: a related publication that is 17th in the ranked list. With just one click the user can therefore locate a result that is very relevant to what he is looking for. The user can also quickly locate results that are related to particular FAO countries, Regional Fisheries Bodies, Persons, etc. For example, there are 4 results about tuna that are related to Madagascar.
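The click in this scenario amounts to intersecting the result sets of the selected entities. A minimal Python sketch follows, assuming each entity keeps the ranks of the results it was mined from. The rank 17 for the two species comes from the scenario; the ranks listed for Madagascar are invented for illustration, and the function name restrict is hypothetical.

```python
# Entity -> ranks of the results that mention it.
FACETS = {
    "Species": {"yellowfin": [17], "skipjack tuna": [17]},
    # The scenario only says there are 4 such results; these ranks are made up.
    "FAO Country": {"Madagascar": [2, 9, 17, 30]},
}


def restrict(results, facets, selected):
    """Keep the results that belong to every selected (category, entity) pair."""
    keep = set(results)
    for category, entity in selected:
        keep &= set(facets[category][entity])
    return sorted(keep)


top_k = range(1, 51)  # ranks of the top-50 results
print(restrict(top_k, FACETS, [("Species", "yellowfin"),
                               ("Species", "skipjack tuna")]))  # [17]
```

Selecting further entities only shrinks the answer set, which is why the exploration is described as gradual.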

Entity Enrichment: By clicking the small RDF icon next to an entity’s name, the user can get information about that entity on the spot, retrieved at that moment by querying the FLOD endpoint (or the forthcoming TLO-SPARQL endpoint). For example, by clicking the icon next to yellowfin we instantly get more information about yellowfin tuna and can explore its characteristics (e.g. the lists of its "is predator of" and "is prey of" relations).
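A query-time lookup of this kind could be sketched in Python as below. The query shape is an assumption and not the query XSearch actually sends, and the endpoint address is a placeholder because the FLOD endpoint URL is not given on this page. The only protocol fact relied on is that the SPARQL 1.1 Protocol allows a query to be sent via HTTP GET in the `query` parameter.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real FLOD endpoint address is not given here.
ENDPOINT = "http://example.org/sparql"


def enrichment_query(label, limit=20):
    """Build a SPARQL query for the properties of entities with this label."""
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?entity ?property ?value WHERE {\n"
        "  ?entity rdfs:label ?label .\n"
        f'  FILTER (lcase(str(?label)) = "{escaped.lower()}")\n'
        "  ?entity ?property ?value .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(label, endpoint=ENDPOINT):
    """Encode the query as a GET request URL for the endpoint."""
    return endpoint + "?" + urlencode({"query": enrichment_query(label)})


print(enrichment_query("yellowfin"))
print(request_url("yellowfin"))
```

Running the lookup only when the icon is clicked keeps the initial results page fast, since no knowledge base is contacted for entities the user never inspects.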

Related Papers

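  • P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis and Y. Tzitzikas, Web Searching with Entity Mining at Query Time, Proceedings of the 5th Information Retrieval Facility Conference (IRF 2012), Vienna, July 2012. (paper: http://www.ics.forth.gr/~fafalios/files/pubs/fafalios_2012_irf.pdf; presentation: http://www.ics.forth.gr/~fafalios/files/ppts/fafalios_2012_irfc_presentation.pdf; bib: http://www.ics.forth.gr/~fafalios/files/bibs/fafalios2012websearching.bib)
  • P. Fafalios, M. Salampasis and Y. Tzitzikas, Exploratory Patent Search with Faceted Search and Configurable Entity Mining, Proceedings of the 1st International Workshop on Integrating IR technologies for Professional Search in conjunction with the 35th European Conference on Information Retrieval (ECIR’13), Moscow, Russia, March 2013. (paper: http://users.ics.forth.gr/~fafalios/files/pubs/fafalios_2013_explPatSearch.pdf; bib: http://users.ics.forth.gr/~fafalios/files/bibs/fafalios2013explPatSearch.bib)
  • P. Fafalios and Y. Tzitzikas, X-ENS: Semantic Enrichment of Web Search Results at Real-Time, Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (demo paper), SIGIR 2013, Dublin, Ireland. (demo: http://62.217.127.118/x-ens/; bib: http://users.ics.forth.gr/~fafalios/files/bibs/fafalios2013xens.bib)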

Plans, Next Steps and Related Tickets

A shortcut that automatically shows all the tickets related to the Semantic Data Analysis functional area can be found here. That list gives the most up-to-date view of the current situation.


  • Exploitation of IS in XSearch portlet
  • XSearch and gCubeSearch (various issues)
    • Support search results snippets (Ticket #7)
    • Configurability of TCPLocator (Ticket #627)
    • XSearch Portlet Memory Consumption (Ticket #628)
    • Ranking in gCube search (Ticket #684)
    • Fetch XSearch configuration files during deployment time (Ticket #783)
    • Provision of textual snippets from gCube search (Ticket #838)
    • Searching over multiple collections (Ticket #839)
  • XSearch over gCubeSearch results (Web App)
    • XSearch webApp on top of gCubeSearch over ASL-HTTP of a production portal (Ticket #840)
  • XSearch that exploits objects associations (exploitation of forthcoming TLO) (Ticket #960)
    • So far XSearch exploits entity lists, which can be the results of SPARQL queries (e.g. see the IRF2012 paper). One arising issue is whether (and why, and how) it should also exploit the associations between these entities, i.e. the RDF properties that connect them. For example, consider the case where the search results page contains two facets, Water Areas and Species, whose entities have been mined from the snippets of the search results and are also described in the underlying RDF KB. In the KB, some entities of Water Areas could be connected with instances of Species through various RDF properties (in general these can be numerous). The question is how to exploit them in order to make the search/exploration process more powerful and/or more flexible/handy/effective. Should we exploit them only in the pop-up windows that show more information about each entity? Should we also exploit them for restricting the current answer set? How should we tackle the large number of properties that may exist? On what principles or generic solutions could this be founded? If we understand the above, the outcomes could be useful for exploiting the structure of the forthcoming TLO. A tentative plan is to (a) try to understand the problem and sketch scenarios (by Apr 2013), (b) decide what to design/implement (by May 2013), and (c) have a first implementation (by June 2013).
  • Enriching RDF files with the URIs of Named Entities (an XSearch Tagger like Agrotagger, i.e. an iMarine annotator) (Ticket #1187). There is now a wiki page for this:

http://wiki.i-marine.eu/index.php/XSearchLink

  • XSearch over RDF results
  • XSearch 2nd portlet version (Exploitation of LinkedData)
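One of the open questions under Ticket #960 above is whether the RDF properties that connect mined entities could be used to restrict the current answer set. The Python sketch below shows one possible reading of that idea and is not a design decision taken in the project. The triples, the property name livesIn, and the function name connected are invented for illustration.

```python
# Hypothetical (subject, property, object) triples from the underlying KB.
TRIPLES = [
    ("yellowfin", "livesIn", "Indian Ocean"),
    ("skipjack tuna", "livesIn", "Indian Ocean"),
    ("skipjack tuna", "livesIn", "Pacific Ocean"),
]


def connected(entity, candidates, triples=TRIPLES):
    """Return the candidates linked to entity, with the linking properties."""
    links = {}
    for subject, prop, obj in triples:
        if subject == entity and obj in candidates:
            links.setdefault(obj, set()).add(prop)
        elif obj == entity and subject in candidates:
            links.setdefault(subject, set()).add(prop)
    return links


# Selecting a water area narrows the Species facet to the connected species.
species_facet = {"yellowfin", "skipjack tuna", "bluefin"}
print(connected("Pacific Ocean", species_facet))  # {'skipjack tuna': {'livesIn'}}
```

Returning the linking properties alongside the entities leaves room to group or filter by property, which is one way to approach the large number of properties that may exist.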

Executive Summary for the Next Steps

Proposed in Nov/Dec 2012: The above steps should be demonstrated in the context of iMarine (certainly in the next review). This relates to all activities of the semantic cluster and of the involved partners (IRD, FAO) as well as NKUA/CNR (related ticket: 1190 https://issue.imarine.research-infrastructures.eu/ticket/1190).

Key Issue

During the 1st year we (FORTH) built various prototypes over systems/artifacts from the partners (FAO and IRD), specifically P2, P3 and P4, in order to investigate the arising issues and evaluate various techniques. It seems quite reasonable (to us) to investigate whether the same functionality can be provided through gCube, i.e. to test P5 over systems of the community partners.

A possible scenario is the following. Two community search systems (i.e. FIGIS and ECOSCOPE) are registered in the infrastructure, e.g. as external open search systems. The added value of gCube search is that it enables querying both of them. XSearch then analyzes the top results and provides the functionality that is currently exposed by the prototypes of the 1st year. We believe that this scenario could serve as a good testbed for evaluating various things (ranking by gCube search, ticket #684; communication with X-Search; registration of the FLOD SPARQL endpoint; efficiency; etc.). It also raises the issue of how to mark up the fields that are appropriate for analysis, since the metadata returned with the hits must be passed to X-Search (ticket #363). For instance, apart from formatted HTML, the search results from FIGIS can be returned in an XML format that uses the Dublin Core schema to encapsulate bibliographic information. Each returned hit has various textual elements, including the publication title and abstract. The title is around 9 words long; the abstract cannot exceed 3,000 characters. As regards ECOSCOPE, the results can also be returned in an XML format, which uses various ECOSCOPE schemata together with Dublin Core, SKOS, Wordnet, etc. Specifically, title, description, preferred label and comment are interesting textual elements for further analysis. This scenario would also allow testing the snippets that are generated by gCube search (ticket #838) and how gCube searches over multiple collections (ticket #839).
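To make the field markup issue concrete, the following Python sketch pulls the title and abstract out of one Dublin Core hit and joins them into the text that would be handed to the analysis. The record layout, the use of dc:description for the abstract, and the function name text_for_analysis are assumptions; the actual FIGIS response may be structured differently. The 3,000-character cap mirrors the abstract limit mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical hit; the real FIGIS record layout may differ.
HIT = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Tagging experiments on tropical tunas</dc:title>
  <dc:description>Results of tagging yellowfin and skipjack tuna.</dc:description>
</record>"""


def text_for_analysis(hit_xml, max_chars=3000):
    """Join the title and abstract of one hit into the text passed to mining."""
    root = ET.fromstring(hit_xml)
    title = root.findtext(DC + "title", default="").strip()
    abstract = root.findtext(DC + "description", default="").strip()
    # Skip missing fields so the result never starts with a stray separator.
    parts = [part for part in (title, abstract[:max_chars]) if part]
    return ". ".join(parts)


print(text_for_analysis(HIT))
# Tagging experiments on tropical tunas. Results of tagging yellowfin and skipjack tuna.
```

For ECOSCOPE the same pattern would apply with a different list of fields (title, description, preferred label, comment), which suggests keeping the field list configurable per source.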

Possible Actions: The involved partners (NKUA, FAO, IRD, FORTH, CNR, ... ) use this as a scenario.