Difference between revisions of "X-Search"
Revision as of 20:29, 3 July 2013
General Description
Latest Presentation (July 2003)
- to put
Persons responsible for editing/maintaining this page
- Pavlos Fafalios (fafalios@ics.forth.gr)
- Yannis Marketakis (marketak@ics.forth.gr)
Type
Libraries, Web application, deployed (and configured) applications, gCube version (service and portlet) over gCube search.
Description
XSearch is a meta-search engine that offers semantic post-processing of results. It reads the description of an underlying search source (OpenSearch), queries that source, analyzes the returned results in various ways, and exploits available semantic repositories (SPARQL endpoints). It also has a gCube version in which the underlying search system is gCube search. Its key features include textual clustering of the results, snippet-based or content-based textual entity mining, and the ability to fetch and display semantic information for the identified entities.
A detailed description of XSearch can be found at https://gcube.wiki.gcube-system.org/gcube/index.php/X-Search
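As a rough illustration of how a meta-search engine can query a source described through OpenSearch, the sketch below fills in an OpenSearch URL template. The template URL, the function name and the default values are assumptions made for this example; they are not taken from the XSearch code base.

```python
import re
from urllib.parse import quote_plus


def fill_opensearch_template(template, search_terms, count=50, start_index=1):
    """Substitute OpenSearch template parameters such as {searchTerms}.

    Parameters written as {name?} are optional and are left empty when no
    value is available for them.
    """
    values = {
        "searchTerms": quote_plus(search_terms),
        "count": str(count),
        "startIndex": str(start_index),
    }

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if optional:
            return ""  # optional parameter without a value
        raise KeyError(f"required parameter not supplied: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)


# Hypothetical template, as it might appear in the <Url> element of an
# OpenSearch description document.
template = "http://example.org/search?q={searchTerms}&n={count?}&lang={language?}"
url = fill_opensearch_template(template, "yellowfin tuna")
print(url)
```

The default of 50 results mirrors the default top-K mentioned for prototype P1; any other value can be passed through `count`.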
Related iMarine WP/Tasks
T10.4 - Semantic Data Analysis Facilities
Related iMarine Deliverables
D10.4 - iMarine Data Consumption Software pdf - DELIVERED
This activity will also contribute to the forthcoming D10.5 (M27, Jan 2014) and D11.4 (M28, Feb 2014)
Related Milestones
MS45 - Semantic Data Analysis Specification Cover Page
Detailed: https://gcube.wiki.gcube-system.org/gcube/index.php/Semantic_Data_Analysis
Related Cluster
http://wiki.i-marine.eu/index.php/Semantic_cluster_achievements
Related Presentations/Tutorials
A presentation produced for the 1st Review can be found at iMarine workspace
Some XSearch-related presentations from the iMarine TCOMs follow.
- Presentation from the 1st TCOM slides
- Presentation from the 2nd TCOM slides
- Presentation from the 3rd TCOM slides
- Presentation from the 4th TCOM slides
- Presentation from the 5th TCOM slides
- Presentation from the 6th TCOM slides
Current (development) status
Link to a document that describes the implemented features by June 2012: XSearch_Prototypes
Current Deployments
Key Features
X-Search has been designed to offer its functionality on top of other search systems. In particular (and according to the milestone) it offers:
- Clustering of the results. Clustering is performed on the textual snippets of the returned results. Clustering of the full textual contents is also supported. The identified clusters are also ranked.
- Provision of extracted textual entities. Text entity mining can be performed either over the textual snippets or over the entire contents, and supports ranking of the identified entities.
- Provision of gradual faceted search. The user is able to quickly explore the results space by exploiting the identified entities that have been mined and the results of clustering.
- Ability to fetch semantic information about extracted entities. XSearch provides the necessary linkage between the mined entities and semantic information. In particular, by exploiting appropriate knowledge bases (e.g. FactForge, DBPedia, FLOD, EcoScope KB) the user can retrieve more information about an entity by querying and browsing these knowledge bases.
- Exploitation of the offered services in any web page. Text entity mining can be performed over the whole contents of a particular result (HTML and PDF web pages).
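The following minimal sketch illustrates the idea of snippet-based entity mining with ranked entities grouped into categories. It uses plain dictionary lookup against a small hand-written gazetteer, which is an assumption made for illustration; the source does not specify the mining technique that XSearch actually uses, and all names and data below are hypothetical.

```python
from collections import defaultdict

# Hypothetical gazetteer: category -> known entity names.
GAZETTEER = {
    "Species": ["yellowfin", "skipjack tuna"],
    "FAO Country": ["Madagascar"],
}


def mine_entities(snippets):
    """Return {category: [(entity, [result positions]), ...]}.

    Entities are ranked by the number of results that mention them
    (ties are broken alphabetically).
    """
    hits = defaultdict(lambda: defaultdict(list))
    for position, snippet in enumerate(snippets, start=1):
        text = snippet.lower()
        for category, names in GAZETTEER.items():
            for name in names:
                if name.lower() in text:
                    hits[category][name].append(position)
    return {
        category: sorted(entities.items(), key=lambda item: (-len(item[1]), item[0]))
        for category, entities in hits.items()
    }


# Invented snippets standing in for the top results of a query.
snippets = [
    "Yellowfin and skipjack tuna catches off Madagascar",
    "Tagging experiments on yellowfin",
    "Fisheries report for Madagascar",
]
facets = mine_entities(snippets)
print(facets)
```

Each category then acts as a facet: clicking an entity restricts the view to the result positions listed next to it.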
Applications (click to run)
- (P1) XSearch over Bing and TLO Warehouse (http://62.217.127.118/x-search/). This prototype runs on top of the Bing web search engine and analyzes the snippets of the top-K results (the default value of K is 50). To provide the linkage with semantic sources it uses the TLO Warehouse (accessed through a SPARQL endpoint). Upon user request it also supports the analysis of more results (e.g. top 100, 200, 500), as well as analysis over the whole content of the results (rather than just the snippets). It is fully configurable in terms of the underlying web search engine (OpenSearch), the knowledge bases that are used, the categories of the mined entities, etc.
- (P2) XSearch over ECOSCOPE or FIGIS and TLO Warehouse (http://62.217.127.118/x-search-fao/). This prototype uses the search systems of the communities (ECOSCOPE or FAO FIGIS). The user can select the underlying search system through X-Search's configuration page (http://62.217.127.118/x-search-fao/login.jsp). The default search system is ECOSCOPE. To support entity enrichment, the TLO Warehouse dataset is queried.
- (P3) Bookmarklet: The functionality of XSearch can be applied to any web page (or PDF file). In particular, the user can trigger the bookmarklet while viewing a web page. The bookmarklet retrieves the contents of this web page (or PDF file), sends them to the XSearch service and returns to the user the web page he was looking at with the mined entities annotated (in the case of a PDF file, the entities are displayed in a sidebar). Furthermore, the user is able to get more information about the identified entities by exploiting the corresponding knowledge bases. The bookmarklet can be added to the user’s web browser from http://62.217.127.118/x-search/ or http://62.217.127.118/x-search-fao/ (see the upper right corner).
- (P4) XSearch in gCube: The functionality of XSearch can be exploited in the gCube search system in an integrated environment (in particular the Liferay portal). The work in this direction separated the functionality (the XSearch logic) from the presentation of the results. For this reason we created two separate components: (a) the xsearch-service, which is responsible for performing the analysis of the results, and (b) the xsearch-portlet, which is responsible for presenting the results to the user and carrying out the dialog with the user. The current version supports snippet-based results clustering, snippet-based text entity mining, entity enrichment and metadata-based grouping of the results. Both components (xsearch-service.1-0-2 and org.gcube.portlets-user.xsearch-portlet.1-0-1) were released as part of the gCube 2.11.1 release. Both the XSearchPortlet and the XSearchService have been deployed in the portal https://portal.i-marine.d4science.org/ in the FCPPS VRE (Virtual Research Environment). A list of queries we've tried can be found at iMarine workspace.
Demo Scenarios
A number of demo scenarios could be described, depending on the underlying systems. An indicative scenario, e.g. for the deployment over FIGIS and FLOD, follows. It is related to ticket 1190 https://issue.imarine.research-infrastructures.eu/ticket/1190.
Demo Scenario 1
Suppose that a user is looking for publications about tuna. Specifically, he wants to find experiments that were applied to several species of tuna. So he submits the query tuna and gets a sorted list of results and various categories of entities such as Regional Fisheries Body, Species, FAO Country, etc. The user realizes that the category Species may contain interesting entities. He notices an entity with the label yellowfin, a species of tuna found in pelagic waters of tropical and subtropical oceans worldwide, and an entity with the label skipjack tuna, another species in the tuna family. The two entities share one result: a related publication which is 17th in the ranked list. So with just one click the user can locate that result, which is very relevant to what he is looking for. Furthermore, the user can quickly locate results that are related to particular FAO countries, Regional Fisheries Bodies, Persons, etc. For example, there are 4 results about tuna that are related to Madagascar.
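The restriction step of this scenario can be sketched as a set intersection over the result ranks attached to each entity. The index below is invented for illustration (only the rank 17 and the count of 4 Madagascar results echo the scenario), and the function name is hypothetical.

```python
def restrict(facet_index, selected):
    """Return the ranks of the results that mention every selected entity."""
    rank_sets = [set(facet_index[entity]) for entity in selected]
    return sorted(set.intersection(*rank_sets)) if rank_sets else []


# Hypothetical index: entity label -> ranks of the results that mention it.
facet_index = {
    "yellowfin": [17],
    "skipjack tuna": [17],
    "Madagascar": [4, 9, 22, 30],
}

both_species = restrict(facet_index, ["yellowfin", "skipjack tuna"])
species_and_country = restrict(facet_index, ["yellowfin", "Madagascar"])
print(both_species, species_and_country)
```

Selecting further entities can only shrink the answer set, which is what makes the exploration gradual.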
Entity Enrichment: By clicking the small RDF icon next to the entity’s name, the user can get information about that particular entity on demand, retrieved at click time by querying the FLOD endpoint (or the forthcoming TLO-SPARQL endpoint). For example, by clicking the icon next to yellowfin we could instantly get more information about yellowfin tuna and explore its characteristics (e.g. its is predator of and is prey of lists, etc.).
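A minimal sketch of the kind of SPARQL query that such an enrichment request could send to an endpoint is given below. The query shape, the function name and the example URI are assumptions for illustration; the queries actually issued by XSearch against FLOD are not given in this page.

```python
def enrichment_query(entity_uri, limit=100):
    """Build a SPARQL query listing the properties and values of one entity."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in entity_uri for ch in '<>" {}|\\^`'):
        raise ValueError("not a valid IRI for SPARQL")
    return (
        "SELECT ?property ?value WHERE { "
        f"<{entity_uri}> ?property ?value . "
        "} "
        f"LIMIT {limit}"
    )


# Hypothetical entity URI; real URIs come from the knowledge base in use.
query = enrichment_query("http://example.org/species/yellowfin")
print(query)
```

The resulting string can be sent to any SPARQL endpoint over HTTP; the rows returned are the property/value pairs shown in the pop-up for the entity.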
Related Papers
- P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis and Y. Tzitzikas, Web Searching with Entity Mining at Query Time, Proceedings of the 5th Information Retrieval Facility Conference, (IRF 2012), Vienna, July 2012. paper, presentation, bib
- P. Fafalios, M. Salampasis and Y. Tzitzikas, Exploratory Patent Search with Faceted Search and Configurable Entity Mining. Proceedings of the 1st International Workshop on Integrating IR technologies for Professional Search in conjunction with the 35th European Conference on Information Retrieval (ECIR’13), Moscow, Russia, March 2013. paper, bib
- P. Fafalios and Y. Tzitzikas. X-ENS: Semantic Enrichment of Web Search Results at Real-Time. Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (demo paper), SIGIR 2013, Dublin, Ireland. demo, bib
Plans, Next Steps and Related Tickets
A shortcut showing (automatically) all the tickets that relate to Semantic Data Analysis functional area can be found here. Below we provide a more updated view of the current situation.
- XSearch and gCubeSearch (various issues)
- XSearch that exploits objects associations (exploitation of forthcoming TLO) (Ticket #960)
- So far XSearch exploits entity lists, which can be the results of SPARQL queries (e.g. see the IRF2012 paper). One emerging issue is whether (and why, and how) it should also exploit the associations between these entities (i.e. the RDF properties that connect them). For example, consider the case where the search results page contains two facets, Water Areas and Species, whose entities have been mined from the snippets of the search results and are also described in the underlying RDF KB. In the KB, some entities of Water Areas could be connected with instances of Species through various RDF properties (in general these can be numerous). The question is how to exploit them in order to make the search/exploration process more powerful and/or more flexible/handy/effective. Should we exploit them only in the pop-up windows that show more information about each entity? Should we also exploit them for restricting the current answer set? How should we tackle the large number of properties that may exist? On what principles or generic solutions could an approach be founded? If we understand the above, the outcomes could be useful for exploiting the structure of the forthcoming TLO. A tentative plan is to (a) try to understand the problem and sketch scenarios (by Apr 2013), (b) decide what to design/implement (by May 2013), and (c) have a first implementation (by June 2013).
- Enriching RDF files with the URIs of Named Entities (an XSearch Tagger like Agrotagger, i.e. an iMarine annotator) (Ticket #1187, #1814). There is now a wiki page for this:
http://wiki.i-marine.eu/index.php/XSearchLink
- XSearch over RDF results
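One possible first step for the association question raised above (Ticket #960) is simply to ask the knowledge base which RDF properties connect the entities of two facets, and how often. The sketch below builds such a SPARQL 1.1 query. It illustrates the question under stated assumptions and is not a design decision recorded in this page; the function name and the example URIs are hypothetical.

```python
def association_query(left_uris, right_uris):
    """Build a SPARQL 1.1 query that counts the properties linking two entity sets."""
    left = " ".join(f"<{uri}>" for uri in left_uris)
    right = " ".join(f"<{uri}>" for uri in right_uris)
    return (
        "SELECT ?property (COUNT(*) AS ?links) WHERE {\n"
        f"  VALUES ?left {{ {left} }}\n"
        f"  VALUES ?right {{ {right} }}\n"
        "  ?left ?property ?right .\n"
        "}\n"
        "GROUP BY ?property\n"
        "ORDER BY DESC(?links)"
    )


# Hypothetical URIs for entities mined into the Water Areas and Species facets.
water_areas = ["http://example.org/area/indian-ocean"]
species = ["http://example.org/species/yellowfin", "http://example.org/species/skipjack"]
query = association_query(water_areas, species)
print(query)
```

Ordering by the number of links gives one crude way to cope with a large number of properties: only the most frequent ones would be offered to the user.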
Past Activities
- Various xsearch-portlet activities for improving scalability and extended functionality
- Exploitation of IS in XSearch portlet (Ticket #780) - CLOSED - Feb 2013
- Implementation of the new incremental algorithm for extended functionality (Ticket #1823) - CLOSED - Jun 2013
- Exploitation of multiple xsearch-service instances (Ticket #1828) - CLOSED - Jun 2013
- Dynamic fetching of xsearch configuration files (Ticket #783) - CLOSED - Nov 2012
- Retrieval of semantic information about the mined entities (Ticket #1813) - CLOSED - Jun 2013
- Various activities about xsearch and gCube search
Possible Plan & Next Steps
Proposed in Nov/Dec 2012: The above steps should be demonstrated in the context of iMarine (certainly in the next review). This is related to all activities of the semantic cluster and involved partners (IRD, FAO) and NKUA/CNR (related ticket: 1190 https://issue.imarine.research-infrastructures.eu/ticket/1190)
Key Issue
During the 1st year we (FORTH) have made various prototypes over systems/artifacts from the partners (FAO and IRD), specifically P2, P3 and P4, in order to investigate the arising issues and evaluate various techniques. It seems quite reasonable (to us) to investigate whether the same functionality can be provided through gCube, i.e. to test P5 over systems of the community partners.
A possible scenario is the following. Two community search systems (i.e. FIGIS and ECOSCOPE) are registered in the infrastructure, e.g. as external open search systems. The added value of gCube search is that it will enable querying both of them. XSearch then analyzes the top results and provides the functionality that is currently exposed by the prototypes of the 1st year. We believe that this scenario could serve as a good testbed for evaluating various things (ranking by gCube search, ticket #684, communication with X-Search, registration of the FLOD SPARQL endpoint, efficiency, etc.). This also raises the issue of marking up the fields that are appropriate for analysis (in this case the metadata returned with the hits should be passed to X-Search), ticket #363. For instance, search results from FIGIS can be returned, apart from formatted HTML, in an XML format which uses the Dublin Core schema to encapsulate bibliographic information. Each returned hit has various textual elements, including publication title and abstract. The title is around 9 words long, and the abstract cannot exceed 3,000 characters. As regards ECOSCOPE, the results can also be returned in an XML format which uses various ECOSCOPE schemata together with Dublin Core, SKOS, Wordnet, etc. Specifically, title, description, preferred label and comment are interesting textual elements for further analysis. This would also allow testing the snippets that are generated by gCube search (ticket #838) and how gCube searches over multiple collections (ticket #839).
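To make the field-markup issue concrete, the sketch below extracts the textual elements of one hit that would be passed on for analysis. The record layout is hypothetical: it assumes a Dublin Core record in which the abstract is carried in dc:description, which may differ from the actual FIGIS or ECOSCOPE output. Only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace, in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"


def analyzable_text(record_xml, fields=("title", "description"), max_chars=3000):
    """Return the text of the chosen Dublin Core fields of one hit.

    max_chars is a defensive cap on each field; the value 3000 mirrors the
    abstract length limit mentioned for FIGIS.
    """
    record = ET.fromstring(record_xml)
    parts = []
    for field in fields:
        for element in record.iter(DC + field):
            if element.text and element.text.strip():
                parts.append(element.text.strip()[:max_chars])
    return parts


# Hypothetical hit; real records may use a different wrapper and more fields.
record = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Tuna tagging experiments in the Indian Ocean</dc:title>
  <dc:description>Results of tagging yellowfin and skipjack tuna.</dc:description>
  <dc:identifier>doc-17</dc:identifier>
</record>"""
texts = analyzable_text(record)
print(texts)
```

The `fields` tuple plays the role of the markup discussed above: it names which metadata fields are appropriate for clustering and entity mining, while fields such as the identifier are skipped.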
Possible Actions: The involved partners (NKUA, FAO, IRD, FORTH, CNR, ... ) use this as a scenario.