8th TCom Meeting: 4th February 2014 Discussions and Notes

From D4Science Wiki

8th TCom Meeting: Agenda

8th TCom Meeting: Participants

Join Live: https://plus.google.com/hangouts/_/7ecpiv9rbp9k7j23bktfanoiqs?authuser=0

WP9 - Data Access

Presenter: Massimiliano Assante (CNR) - Slides

  • New Home Library features in gCube 3.0

This has implications for all HL clients, which will have to deal with ACLs.

The folder created to store search results should be a "non-removable" folder; applications can set whatever privileges they need.

Consider indexing the user workspace via the NKUA machinery. This requires an approach to (incrementally) harvest the workspace content and to cope with the evolving workspace.

The new version of the HL is largely backward compatible; in any case, the changes required of clients are minor.

The medium-term plan is to replace the Storage Management Lib with the HL, because of the facilities the HL offers, including policy management.

WP9 - Data Transfer

Presenter: M. Simon (CERN) - Slides

  • Overview of the enhancements to the Storage Manager portlet, with demo
  • Presentation of the Service and API, to encourage integration with the rest of the system

The facility should be revised to support the new version of the HL, namely its ACLs.

The layout should be revised to serve the average user, e.g. by simplifying the options offered.

Since WP9 has ended, CERN plans to report this activity under WP11 and WP5 (to be checked).

For Scheduled Transfer, it would be useful to filter the transferred data so that each scheduled run transfers only the files that have changed.

CERN will estimate the effort required to implement this change.
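One possible way to implement such a filter is to keep a manifest of file sizes and modification times for each scheduled run and to transfer only the entries that differ from the previous run. The sketch below assumes that a size plus modification-time comparison is an acceptable proxy for "changed". The manifest format and the `snapshot` and `changed_files` names are hypothetical and are not part of the Storage Manager or Data Transfer API.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical manifest: relative path -> (size in bytes, mtime in nanoseconds)
Manifest = Dict[str, Tuple[int, int]]


def snapshot(root: str) -> Manifest:
    """Record the size and modification time of every file under root."""
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def changed_files(current: Manifest, previous: Manifest) -> List[str]:
    """Return paths that are new, or whose size/mtime differ from the last run."""
    return sorted(path for path, sig in current.items() if previous.get(path) != sig)
```

A scheduled run would call `snapshot` on the source, transfer only `changed_files(current, previous)`, and then store `current` for the next run. Deleted files are not reported by this sketch. A content hash would be more robust than size and modification time, but it requires reading every file on each run.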

WP10 - Data Retrieval

Presenter: A. Antoniadis (NKUA) - Slides

  • Federated Search Status and Plan

WP10 - Data Ingestion

Presenter: J. Gerbesiotis (NKUA) - Slides

It might be useful to decouple DB access information (e.g. JDBC) from data schema information when configuring the SQL2XML adapter.

  • A ticket has been created for this: #2600.
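The following minimal sketch illustrates the suggested decoupling. It assumes that the adapter configuration can be split into two independent parts. The first part holds access information (which database to open). The second holds the schema mapping (which table and columns become which XML elements). SQLite from the Python standard library stands in for a JDBC source. `ConnectionConfig`, `SchemaMapping`, `open_connection` and `rows_to_xml` are hypothetical names and do not reflect the actual SQL2XML configuration format.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConnectionConfig:
    """Access information only (hypothetical; a JDBC URL in a real setup)."""
    database: str  # SQLite path here; ":memory:" for an in-memory database


@dataclass(frozen=True)
class SchemaMapping:
    """Schema information only: which table/columns map to which XML elements."""
    table: str
    record_element: str
    columns: Tuple[str, ...]


def open_connection(cfg: ConnectionConfig) -> sqlite3.Connection:
    """Open a connection using only the access information."""
    return sqlite3.connect(cfg.database)


def rows_to_xml(conn: sqlite3.Connection, mapping: SchemaMapping) -> List[str]:
    """Serialise each row of the mapped table as one XML record string."""
    # Table/column names come from trusted configuration, not from user input.
    query = f"SELECT {', '.join(mapping.columns)} FROM {mapping.table}"
    records = []
    for row in conn.execute(query):
        record = ET.Element(mapping.record_element)
        for column, value in zip(mapping.columns, row):
            ET.SubElement(record, column).text = "" if value is None else str(value)
        records.append(ET.tostring(record, encoding="unicode"))
    return records
```

With this split, the same mapping could be reused against a test database and a production database by changing only the connection part.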

An approach to accessing the actual payload should be devised; access is currently based on IDs.

Priority should be given to:

  • data sources that are not yet served, including:
    • Biodiversity data, via SPD and DarwinCore-Archive files;
    • GIS data sources, e.g. via GeoNetwork, i.e. via CSW;
  • These options should be investigated.

WP10 - Data Publishing

Presenter: N. Laskaris (NKUA) - Slides

  • New OAI-PMH and OAI-ORE

A long discussion took place on making data / metadata publicly available. It was clarified that the actual data / resources should not be mixed up with the metadata exposed via OAI-PMH.

From a technical point of view, the machinery is in place. It is up to WP3 to define the policy to be implemented.

WP10 - XSearch

Presenter: Pavlos Fafalios (FORTH) (remotely) - Slides

  • XSearch and X-Link related activities

The configuration of entity linking (regarding X-Link) is based on categories of entities. Specifically, for each category the user/admin can provide the SPARQL endpoint and the SPARQL template query.
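The per-category configuration described above can be pictured as a mapping from each category to an (endpoint, template query) pair. This is a minimal sketch based only on that assumption. The `{{entity}}` placeholder syntax, the class and function names, and the example endpoint are hypothetical and do not reflect X-Link's actual configuration format. The sketch builds a request URL following the SPARQL 1.1 Protocol (GET with a `query` parameter) but does not send it.

```python
from dataclasses import dataclass
from typing import Dict
from urllib.parse import urlencode

PLACEHOLDER = "{{entity}}"  # hypothetical placeholder syntax for the template


@dataclass(frozen=True)
class CategoryConfig:
    """Per-category settings: where to query and what to ask."""
    endpoint: str        # SPARQL endpoint URL
    query_template: str  # SPARQL query containing PLACEHOLDER


def render_query(cfg: CategoryConfig, entity: str) -> str:
    """Fill the template with the entity name, escaped for a string literal."""
    escaped = entity.replace("\\", "\\\\").replace('"', '\\"')
    return cfg.query_template.replace(PLACEHOLDER, escaped)


def request_url(cfg: CategoryConfig, entity: str) -> str:
    """Build a SPARQL 1.1 Protocol GET URL using the 'query' parameter."""
    return cfg.endpoint + "?" + urlencode({"query": render_query(cfg, entity)})


# Hypothetical example configuration: one entry per entity category.
CONFIGS: Dict[str, CategoryConfig] = {
    "species": CategoryConfig(
        endpoint="http://example.org/sparql",
        query_template=(
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
            'SELECT ?s WHERE { ?s rdfs:label "{{entity}}" }'
        ),
    ),
}
```

Quotes and backslashes are escaped before substitution, so an entity name cannot break out of the string literal in the template.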

The services are not yet implemented. Currently, FORTH is working on the output format.

Although X-Link does not currently support the analysis of a collection of documents, adding this support is straightforward.

Comments from Claudio Baldassarre (FAO)

  • X-Link has good potential to support use-case applications like SmartFish, although X-Link has not yet been adopted by any iMarine use-case partner whose feedback could ground more positive expectations.
  • The capability of scanning a document for entity mining at runtime is interesting. For SmartFish it should be available in a form that can target a document collection and run in batch, producing output that avoids live processing of each single document in the result set.
  • The capability of customizing the source SPARQL endpoint(s) and the set of relationships linking the entities together is interesting for an adoption of X-Link by SmartFish.
  • The design of the X-Link low-level library, the generic client library, and the XSearch client library is also promising for an adoption of X-Link by SmartFish.
  • A deeper investigation of the subgraph selection algorithm should be considered, to see whether it can be adopted as an additional asset by SmartFish.
  • I would like to have PDFs of the publications mentioned during the presentation.

WP10 - Semantic Data Analysis

Presenters: Carlo Allocca (FORTH), Nikos Minadakis (FORTH), Yannis Tzitzikas (FORTH) - Slides

  • Marine TLO
  • Warehouse
  • Warehouse construction process

An Android application, Ichthys, has been developed. FORTH should check whether its "About" section contains adequate acknowledgments.

On TLO future versions:

  • CNR (Gianpaolo) provided FORTH with a revised version of a number of data sources produced via the SPD discovery facility;
    • This was integrated, although FORTH focused on "marine fishes" ... to be clarified;
  • Provenance management should be reinforced by relying on existing "standards", e.g.:
    • it is important to capture when a given piece of information was collected;
    • it is important to capture how a given piece of information was collected / produced;
    • data providers have their own policies specifying how their data should be "cited";

On ByCatch modeling:

  • it is very important to have feedback from IRD (because of their institutional mandate);
  • FAO will also analyse the proposal;

WP10 - OWS

Presenter: Hervé Caumont (Terradue) (remotely) - Slides

  • OWS Context API and Visualization tools
  • Plan for exploitation in iMarine

CNR cannot allocate effort to this before mid-March. What about the other partners?

  • T2 can allocate Francesco, so that during February he can do what is needed to better serve iMarine needs;
  • a telco should be organised between CNR and T2 (at least) to reach a common understanding;

WP9/10 WPS and SOS service

Presenter: Hervé Caumont (Terradue) (remotely) - Slides

  • WPS status and plans
  • Using OGC SOS and O&M in iMarine

Is SOS powerful enough to capture "big data", e.g. 1 billion records such as the GEBCO measurements?

  • in particular, GP is facing problems related to the environmental enrichment process (where the scale is different);
    • Hervé replied that this is probably not the scenario SOS is expected to serve. SOS seems better suited to indexing services that host sensor data than to providing their full content.

A telco should be organised to agree on the next steps.

On WPS deployment in iMarine production:

  • the current service (including the cluster) is hosted by Terradue;
    • one of the open issues concerns incompatibility between Hadoop clusters;
    • a ticket should be created to monitor this activity;
      • a ticket already exists for this: #1274

WP9 - Tabular Data Manager

Presenter: P. Pagano (CNR) - Slides

  • Tabular Data Manager and its libraries
  • Release status: features and capabilities
  • Plan for the next releases

Versioning is a challenging task. The following cases have been discussed:

  • CLs can evolve, and new versions can be produced and published;
    • it should be decided whether an import task should lead to a new TR or to a new version of an existing TR;

SPREAD is expected to be integrated via a dedicated operation. However, like any other algorithm, it might have limitations in terms of processing capacity:

  • this should be properly managed;

Y. Laurent asked when he can repeat the validation.

  • a complete validation is not feasible right now, simply because a number of GUIs are still missing;