16.02.2012 T10.4 Conference Call

From D4Science Wiki

Agenda

Date/time: Thursday February 16, 2012, 15:00-16:00 CET

Topic: Discussion on Task T10.4 and related topics (interactions with T10.1 and WP11)

Related Document

Participants

FORTH: Yannis Tzitzikas, Yannis Marketakis, Pavlos Fafalios

FAO: Fabio Simeoni, Claudio Baldassarre

CNR: Pasquale Pagano

ENG: Ciro Formisano

NKUA: Gerasimos Farantatos, Rena Tsantouli

Executive Summary of the Meeting

XSearch services can function on top of the gCube Search System. In this way, all indexes currently used by the gCube Search System, as well as its distributed evaluation approach, are exploited. Some of the XSearch services (e.g. clustering) rely on the availability of textual snippets. A generic way to meet this requirement is the following: the OpenSearch description document of the source (here the gCube Search System), or the query response itself, can indicate which attribute holds textual descriptions (if any exist). Some of the XSearch services could, on demand, further process the actual contents of a hit. This requires an identifier for the hit and a resolver. This could be handled analogously: the OpenSearch description document of the gCube Search System (or its response) could indicate which attribute carries the identifier information.
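The idea of letting the description document name the relevant attributes can be sketched as follows. This is only an illustration under stated assumptions: the OpenSearch specification defines no such hint elements, so the namespace `http://example.org/xsearch/hints` and the element names `SnippetAttribute` and `IdentifierAttribute` are hypothetical, as is the class name `SourceHints`. Nothing here reflects the actual gCube or XSearch implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Reads two hypothetical hint elements from an OpenSearch description document. */
final class SourceHints {
    // Hypothetical extension namespace; it is NOT part of the OpenSearch specification.
    static final String HINT_NS = "http://example.org/xsearch/hints";

    final String snippetAttribute;    // null if the source declares no textual attribute
    final String identifierAttribute; // null if the source declares no identifier attribute

    SourceHints(String snippetAttribute, String identifierAttribute) {
        this.snippetAttribute = snippetAttribute;
        this.identifierAttribute = identifierAttribute;
    }

    /** Parses the description document and extracts the two hints, if present. */
    static SourceHints parse(String descriptionXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(descriptionXml.getBytes(StandardCharsets.UTF_8)));
            return new SourceHints(text(doc, "SnippetAttribute"),
                                   text(doc, "IdentifierAttribute"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable description document", e);
        }
    }

    // Returns the trimmed text of the first matching hint element, or null if absent.
    private static String text(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(HINT_NS, localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml =
            "<OpenSearchDescription xmlns=\"http://a9.com/-/spec/opensearch/1.1/\""
            + " xmlns:x=\"" + HINT_NS + "\">"
            + "<ShortName>Example source</ShortName>"
            + "<x:SnippetAttribute>title</x:SnippetAttribute>"
            + "<x:IdentifierAttribute>objectId</x:IdentifierAttribute>"
            + "</OpenSearchDescription>";
        SourceHints hints = parse(xml);
        System.out.println(hints.snippetAttribute + " / " + hints.identifierAttribute);
    }
}
```

A source that offers no textual description would simply omit the corresponding element, in which case the field stays `null` and snippet-based services such as clustering can be skipped for that source.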

Discussion

FORTH: Made an introduction, based on the contents of the document.

Lino: Regarding the requirement for textual snippets: not every resource is textual; there is a lot of heterogeneity (images, series, maps). However, some of these may have textual descriptions (e.g. the value of a title attribute). In D4S not everything is indexed and/or retrievable. Of course, if something is stored in the storage service, then there is a file for it. As regards the question “Ability to get the actual content of a digital object that is returned as a search result”: there is no single identity resolver for every identifier (different protocols are used).

Fabio: Discussed the tree-based model. Regarding the question of identity, he mentioned that in principle every node could have an identity if an identifier has been assigned to it. He asked FORTH to verify that the assumed backend is a search system rather than a data access service.

FORTH: Yes, in general it is a search system. It does not matter whether the backend is a database or an IR system; what we assume is the result of a search request, which consists of items, each probably described by a textual description plus metadata.

Fabio: The tree-based model can have metadata plus links to the actual content.

Gerasimos: Some of the resources which are retrievable through the gCube Search System are external ones.

Claudio: He finds the idea of clustering and grouping useful in general. He stressed the need for controlled categories. If such categories exist, they could also be useful for other purposes (indexing, annotation, etc.). He also stressed the heterogeneity of an object; they currently try to approach this problem by separating the notion of representation from that of interpretation (not very clear; Claudio, feel free to improve this).

Lino: Stressed that there is currently no common schema. He asked about the architectural perspective, i.e. how the gCube Search System and XSearch relate to each other.

FORTH: XSearch should be placed on top of the gCube Search System.

Fabio: Mentioned that if XSearch functions on top of the gCube Search System, then the interoperability achieved so far by the gCube Search System is exploited, so XSearch abstracts from the internal details and has a single point to search. Regarding the availability of textual snippets, he mentioned that the OpenSearch description of the underlying source could indicate the attributes that hold textual descriptions. The same approach could be used to indicate the attribute that holds identifiers.
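Once the source has indicated which attributes hold the textual description and the identifier, a client could apply them to each hit roughly as in the sketch below. The representation of a hit as a `Map<String, String>`, the class name `HitView`, and the attribute names `title` and `objectId` are assumptions made for illustration; the minutes do not specify the actual result format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A hit reduced to the two pieces of information that snippet-based services need. */
final class HitView {
    final String identifier; // may be null if the hit carries no identifier
    final String snippet;    // never null or blank

    HitView(String identifier, String snippet) {
        this.identifier = identifier;
        this.snippet = snippet;
    }

    /**
     * Keeps only the hits that have a non-empty textual description.
     * Hits are modelled as attribute maps, which is an assumption of this sketch.
     */
    static List<HitView> textualHits(List<Map<String, String>> hits,
                                     String snippetAttribute,
                                     String identifierAttribute) {
        List<HitView> result = new ArrayList<>();
        for (Map<String, String> hit : hits) {
            String snippet = hit.get(snippetAttribute);
            if (snippet == null || snippet.trim().isEmpty()) {
                continue; // e.g. an image or a map without a title
            }
            result.add(new HitView(hit.get(identifierAttribute), snippet.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> report = new HashMap<>();
        report.put("title", "Tuna catch statistics");
        report.put("objectId", "id-1");

        Map<String, String> image = new HashMap<>();
        image.put("objectId", "id-2"); // no textual description available

        for (HitView view : textualHits(Arrays.asList(report, image), "title", "objectId")) {
            System.out.println(view.identifier + ": " + view.snippet);
        }
    }
}
```

Skipping hits without text, rather than failing, matches the point raised earlier in the discussion that not every resource is textual.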

FORTH: Agreed that this is what they have in mind too. Asked whether this is clear to all participants and whether there are any objections.

Lino: Clear, no objection at this point.

Ciro: Nothing to add.

Fabio: Asked whether XSearch is implemented in Java. If so, XSearch could resolve URIs in the usual way.

FORTH: Yes, the implementation is in Java.
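Resolving a URI "in the usual way" in Java could look like the minimal sketch below, which assumes that the identifier of a hit is an absolute URI with a scheme the JVM can open. The class name `ContentFetcher` is hypothetical. Note that `java.net.URL` only supports schemes with a registered protocol handler (such as `http`, `https`, `file` and `jar` by default), so identifiers that use other protocols, as mentioned earlier in the discussion, would need their own resolver.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Fetches the content behind a hit identifier using the JVM's built-in URL handlers. */
final class ContentFetcher {

    /** Reads all bytes behind the given absolute URI. */
    static byte[] fetch(URI identifier) {
        try (InputStream in = identifier.toURL().openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            // Covers unknown schemes (MalformedURLException) and unreachable content.
            throw new UncheckedIOException("Cannot resolve " + identifier, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary local file stands in for the content of a hit.
        Path tmp = Files.createTempFile("hit", ".txt");
        Files.write(tmp, "sample content".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(fetch(tmp.toUri()), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```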

Fabio: Asked NKUA to verify that a query to the gCube Search System will be distributed over the infrastructure.

Gerasimos: Confirmed that a query to the gCube Search System is distributed and evaluated over the various indexes and mappings that exist in the infrastructure.

Fabio: Asked FORTH whether the assumption is to query a single service (in particular the gCube Search System).

FORTH: Yes; however, we also have in mind the case where more than one service is searched, mainly to improve the quality of the services provided by XSearch.