Catalogue:Services

From D4Science Wiki
Revision as of 13:35, 12 July 2013 by Andrea.manzi


iMarine inherited its software stack (gCube) from two previous EU projects, D4Science and D4Science II. The software has been further extended during the iMarine project to strengthen this foundation and to build Marine Applications on top of it, concentrating the effort on three main functional areas:

  • Core Facilities: dedicated to providing users with a range of services for the operation and management of the whole infrastructure. They are detailed in this section.
  • Data Management Facilities: dedicated to providing users with a rich array of services for the management of data in the context of the whole infrastructure.
  • Data Consumption Facilities: dedicated to providing users with a rich array of services for the exploitation of data in the context of the whole infrastructure.


Data Management Facilities

Data Consumption Facilities

The Data Consumption Facilities can be further grouped into five areas, each comprising a series of components.

Data Retrieval

gCube provides Information Retrieval facilities over large heterogeneous environments. The architecture and mechanisms provided by the framework ensure flexibility, scalability, high performance and availability. In particular:

  • Declarative Query Language over a heterogeneous environment

The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL (Contextual Query Language) standard.
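The page does not describe how a CQL query is mapped onto each source, so the following is only a minimal sketch. It assumes that every Data Source publishes a mapping from unified CQL index names to its own native field names. The source names (`sourceA`, `sourceB`), the field names and `rewrite_clause` are all hypothetical. The sketch handles a single `index relation "term"` clause and nothing more.

```python
import re

# Hypothetical per-source mappings from unified CQL index names to the
# field names each Data Source uses natively (illustrative only).
FIELD_MAPPINGS = {
    "sourceA": {"title": "dc_title", "species": "scientificName"},
    "sourceB": {"title": "name", "species": "taxon"},
}

# Matches a single CQL search clause of the form: index relation "term"
CLAUSE = re.compile(r'^\s*(\w+)\s*(=|==|any|all|adj)\s*"([^"]*)"\s*$')


def rewrite_clause(cql: str, source: str) -> str:
    """Rewrite one CQL clause into the native field vocabulary of `source`."""
    match = CLAUSE.match(cql)
    if match is None:
        raise ValueError(f"unsupported CQL clause: {cql!r}")
    index, relation, term = match.groups()
    native = FIELD_MAPPINGS[source].get(index)
    if native is None:
        raise KeyError(f"{source} does not support index {index!r}")
    return f'{native} {relation} "{term}"'


if __name__ == "__main__":
    query = 'species = "Thunnus albacares"'
    for source in FIELD_MAPPINGS:
        print(source, "->", rewrite_clause(query, source))
```

The same unified query is expressed once and rewritten for each source. This is the property that lets heterogeneous sources sit behind one declarative language.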

  • On the fly Integration of Data Sources

A Data Source that publishes its Information Retrieval (IR) capabilities can be involved in the IR process on the fly.

  • Scalability in the number of Data Sources

Planning and optimization mechanisms determine the minimum set of Sources that must be involved in answering a query, along with an optimal execution plan.
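The planner's actual algorithm is not given here. The sketch below illustrates only the "minimum number of Sources" part, treating it as a set-cover problem. It assumes each source advertises the collections it can answer for. The catalogue contents and `minimal_sources` are hypothetical, and building the execution plan itself is not modelled.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical catalogue: which collections each Data Source can answer for.
SOURCES = {
    "s1": {"fish", "vessels"},
    "s2": {"fish"},
    "s3": {"catches", "vessels"},
    "s4": {"catches"},
}


def minimal_sources(required: set[str], sources: dict[str, set[str]]) -> tuple[str, ...]:
    """Return a smallest set of sources whose combined coverage includes `required`.

    Exhaustive search over subsets of increasing size: exact, but only
    practical for a modest number of candidate sources.
    """
    if not required:
        return ()
    # Only sources that contribute something to the query are worth considering.
    candidates = sorted(name for name, cov in sources.items() if cov & required)
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            covered = set().union(*(sources[name] for name in combo))
            if required <= covered:
                return combo
    raise ValueError("no combination of sources covers the query")
```

For example, `minimal_sources({"fish", "catches"}, SOURCES)` returns `("s1", "s3")`. With many sources, an exhaustive search becomes too expensive, and a greedy set-cover approximation would be a natural substitute.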

  • Direct Integration of External Information Providers

Through the OpenSearch standard, external Information Providers can be queried dynamically, and the results they return can be aggregated with the other results during query answering.
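OpenSearch providers describe their query interface with URL templates. In these templates, parameters such as `{searchTerms}` are substituted, and a trailing `?` (as in `{count?}`) marks a parameter as optional. The sketch below fills such a template and merges result lists. It assumes results are dictionaries carrying a `link` key for de-duplication. The endpoint, `expand_template` and `aggregate` are hypothetical, not gCube's API.

```python
from __future__ import annotations

import re
from urllib.parse import quote

# Template parameters, optionally namespaced (e.g. {geo:box}) and optionally
# marked as optional with a trailing '?'.
PARAM = re.compile(r"\{([\w:]+)(\?)?\}")


def expand_template(template: str, values: dict[str, object]) -> str:
    """Substitute OpenSearch template parameters; unset optional ones become empty."""
    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"required OpenSearch parameter missing: {name}")
    return PARAM.sub(substitute, template)


def aggregate(*result_lists: list[dict]) -> list[dict]:
    """Merge results from several providers, dropping duplicates by 'link'."""
    seen: set[str] = set()
    merged: list[dict] = []
    for results in result_lists:
        for item in results:
            if item["link"] not in seen:
                seen.add(item["link"])
                merged.append(item)
    return merged
```

Usage: `expand_template("https://example.org/search?q={searchTerms}&count={count?}", {"searchTerms": "yellowfin tuna"})` yields `https://example.org/search?q=yellowfin%20tuna&count=`.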

  • Indexing Capabilities for Replication and High Availability

Multidimensional and full-text indexing capabilities, built on an architecture that efficiently supports replication and high availability.
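How replication and availability interact can be shown with a toy model. In this minimal sketch, writes are applied to every replica of a full-text inverted index, and reads rotate over the replicas while skipping any that are unavailable. The classes are hypothetical. Multidimensional indexing and the resynchronisation of a replica that comes back online are not modelled.

```python
from __future__ import annotations

import itertools
import re
from collections import defaultdict


class FullTextIndex:
    """A toy in-memory inverted index: term -> set of document ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.available = True

    def add(self, doc_id: str, text: str) -> None:
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


class ReplicatedIndex:
    """Writes go to every replica; reads rotate over the available replicas."""

    def __init__(self, replicas: int) -> None:
        self.replicas = [FullTextIndex() for _ in range(replicas)]
        self._next = itertools.cycle(range(replicas))

    def add(self, doc_id: str, text: str) -> None:
        for replica in self.replicas:
            replica.add(doc_id, text)

    def search(self, query: str) -> set[str]:
        # Try each replica at most once, starting from the next in rotation.
        for _ in range(len(self.replicas)):
            replica = self.replicas[next(self._next)]
            if replica.available:
                return replica.search(query)
        raise RuntimeError("no index replica available")
```

Round-robin reads spread the query load. Because every replica holds the full index, a query still succeeds as long as one replica is reachable.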

  • Distributed Execution Environment offering High Performance and Flexibility

Efficient execution of search plans over a large heterogeneous environment.

Services in this area:

Data Manipulation

gCube provides Data Manipulation Facilities responsible for transforming content and metadata among different formats and specifications. The architecture and mechanisms provided by the framework satisfy the requirements for arbitrary transformation or homogenization of content and metadata. Its features are useful for:

  • information retrieval
  • information presentation
  • processing and exporting

In particular it offers:

  • Automatic transformation path identification

Given the content type of a source object and the target content type, the framework identifies the appropriate transformation to use. It can also dynamically compose a path of several transformation steps to produce the final format; shorter paths are preferred.
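Path identification can be modelled as a graph search. Content types are nodes, and each available transformation is a directed edge. Breadth-first search then finds a chain with the fewest steps, which matches the preference for short paths. This is a minimal sketch, not necessarily the algorithm gCube uses. The registry contents and program names are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Hypothetical registry: (source type, target type) -> transformation program name.
TRANSFORMATIONS = {
    ("text/xml", "application/json"): "xml2json",
    ("application/json", "text/csv"): "json2csv",
    ("text/xml", "text/html"): "xslt-html",
    ("text/html", "application/pdf"): "html2pdf",
}


def find_path(source: str, target: str) -> list[str] | None:
    """Breadth-first search over content types; returns the program names
    of a shortest transformation chain, or None if no chain exists."""
    if source == target:
        return []
    # Build adjacency lists from the registry.
    graph: dict[str, list[tuple[str, str]]] = {}
    for (src, dst), program in TRANSFORMATIONS.items():
        graph.setdefault(src, []).append((dst, program))
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, steps = queue.popleft()
        for nxt, program in graph.get(current, []):
            if nxt in visited:
                continue
            if nxt == target:
                return steps + [program]
            visited.add(nxt)
            queue.append((nxt, steps + [program]))
    return None
```

For example, `find_path("text/xml", "text/csv")` returns `["xml2json", "json2csv"]`. If transformations had different costs, Dijkstra's algorithm would replace BFS.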

  • Pluggable algorithms for content transformation

A generic transformation framework based on pluggable components termed transformation programs. Transformation programs define the transformation capabilities of the framework, which makes it possible to provide domain- and application-specific data transformations.
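One common way to make transformations pluggable is a registry that programs join by declaring the content types they convert between. The decorator-based registration, `transform` and the JSON-to-CSV program below are a hypothetical minimal sketch, not the gCube transformation-program interface.

```python
from __future__ import annotations

import csv
import io
import json
from collections.abc import Callable

# Registry of transformation programs keyed by (source type, target type).
# The registration mechanism is an illustrative assumption.
PROGRAMS: dict[tuple[str, str], Callable[[bytes], bytes]] = {}


def transformation_program(source: str, target: str):
    """Decorator that plugs a function into the registry."""
    def register(func: Callable[[bytes], bytes]) -> Callable[[bytes], bytes]:
        PROGRAMS[(source, target)] = func
        return func
    return register


@transformation_program("application/json", "text/csv")
def json_to_csv(data: bytes) -> bytes:
    """Convert a JSON array of flat objects into CSV with a header row."""
    rows = json.loads(data)
    if not rows:
        return b""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue().encode("utf-8")


def transform(data: bytes, source: str, target: str) -> bytes:
    """Dispatch to whichever program is registered for this type pair."""
    program = PROGRAMS.get((source, target))
    if program is None:
        raise LookupError(f"no program registered for {source} -> {target}")
    return program(data)
```

Adding a domain-specific format then means registering one more decorated function. The framework code that calls `transform` does not change.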

  • Exploitation of Distributed Infrastructure

Integration with a Workflow Engine gives access to vast amounts of processing power and makes it possible to handle virtually any transformation task; the framework thus constitutes the standard Data Manipulation facility for gCube applications.
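The Workflow Engine's interface is not described on this page. As a rough illustration of what such an engine contributes, the sketch below runs tasks in dependency order and executes every task whose dependencies are met concurrently. A local thread pool stands in for distributed nodes. `run_workflow` is hypothetical, and it schedules in waves (each wave finishes before the next starts), which is a simplification.

```python
from __future__ import annotations

from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def run_workflow(tasks: dict[str, Callable[[dict], object]],
                 deps: dict[str, set[str]], workers: int = 4) -> dict[str, object]:
    """Run tasks in dependency order; tasks whose dependencies are satisfied
    run concurrently. `deps` must list every task (empty set if none).
    Each task receives a snapshot of the results computed so far."""
    results: dict[str, object] = {}
    pending = dict(deps)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            ready = [t for t, d in pending.items() if d.issubset(results)]
            if not ready:
                raise ValueError("cycle or missing dependency in workflow")
            # Submit the whole wave of ready tasks at once.
            futures = {t: pool.submit(tasks[t], dict(results)) for t in ready}
            for t, fut in futures.items():
                results[t] = fut.result()
                del pending[t]
    return results
```

In a real deployment, each task would be a transformation step dispatched to a remote node rather than a local thread.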

Services in this area:

Data Mining

Data Mining facilities include a set of features, services and methods for performing data processing and mining on biological information sets. These features address several aspects of biological data processing, ranging from ecological modeling to niche modeling experiments. Algorithms are executed in a parallel and possibly distributed fashion on worker nodes. Furthermore, the Services performing Data Mining operations are deployed according to a distributed architecture, in order to balance the load of those procedures that require local resources.
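The page does not say how work is distributed across worker nodes. One standard heuristic for balancing load is longest-processing-time-first (LPT) scheduling: jobs are sorted by decreasing estimated cost, and each is assigned to the node with the least load so far. The sketch assumes per-job cost estimates are available. The job and node names are hypothetical.

```python
from __future__ import annotations

import heapq


def assign_jobs(job_costs: dict[str, float], nodes: list[str]) -> dict[str, list[str]]:
    """LPT scheduling: give each job, largest first, to the least-loaded node."""
    if not nodes:
        raise ValueError("at least one worker node is required")
    heap = [(0.0, node) for node in nodes]  # (current load, node name)
    heapq.heapify(heap)
    plan: dict[str, list[str]] = {node: [] for node in nodes}
    # Ties in cost are broken by job name so the result is deterministic.
    for job, cost in sorted(job_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        plan[node].append(job)
        heapq.heappush(heap, (load + cost, node))
    return plan
```

With costs `{"j1": 5, "j2": 4, "j3": 3, "j4": 3}` on two nodes, `n1` receives `j1` and `j4` (load 8) and `n2` receives `j2` and `j3` (load 7).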

Data Visualization

Data Visualisation facilities include a set of features, software and methods for visualising data. Data Visualisation is meant particularly for geo-spatial data, a kind of information that naturally lends itself to visualisation. Data are rendered on interactive maps and can be explored with several inspection tools. Map visualisation relies on querying a central GeoNetwork instance that indexes several geo-spatial data sources.
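The page does not name the protocol used to query GeoNetwork. GeoNetwork does expose an OGC Catalogue Service for the Web (CSW) endpoint, so this sketch assumes CSW 2.0.2 with key-value-pair encoding and builds a `GetRecords` request that searches the `AnyText` queryable. The endpoint URL and `csw_getrecords_url` are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint of the central GeoNetwork instance.
CSW_ENDPOINT = "https://geonetwork.example.org/geonetwork/srv/eng/csw"


def csw_getrecords_url(text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request (KVP encoding) that searches the
    full-text queryable 'AnyText' for `text`."""
    safe_text = text.replace("'", "''")  # escape quotes inside the CQL literal
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText like '%{safe_text}%'",
        "maxRecords": str(max_records),
    }
    return f"{CSW_ENDPOINT}?{urlencode(params)}"
```

A map viewer could issue such a request to discover layers matching a keyword and then load the matching layers onto the interactive map.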

Semantic Data Analysis

Semantic Data Analysis comprises a set of libraries and services that bridge the gap between communities and link distributed data across community boundaries. The introduction of the Semantic Web and the publication of expressive metadata in a shared knowledge framework enable the deployment of services that can use Web resources intelligently.
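A common Linked Data technique for joining data across communities is to assert `owl:sameAs` links between identifiers that denote the same thing, then merge their descriptions. The plain-Python sketch below does this over a tiny set of triples, without an RDF library. The `fishery:` and `biodiv:` prefixes and both helper functions are hypothetical, and this is not a description of gCube's semantic libraries.

```python
from __future__ import annotations

SAME_AS = "owl:sameAs"

# Hypothetical triples published by two communities about the same species.
TRIPLES = {
    ("fishery:YFT", "fishery:commonName", "yellowfin tuna"),
    ("biodiv:Thunnus_albacares", "biodiv:habitat", "pelagic"),
    ("fishery:YFT", SAME_AS, "biodiv:Thunnus_albacares"),
}


def equivalents(resource: str, triples: set[tuple[str, str, str]]) -> set[str]:
    """All resources linked to `resource` via owl:sameAs (both directions, transitively)."""
    found, frontier = {resource}, [resource]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
    return found


def describe(resource: str, triples: set[tuple[str, str, str]]) -> set[tuple[str, str]]:
    """Merge the non-sameAs properties of a resource and all its equivalents."""
    names = equivalents(resource, triples)
    return {(p, o) for s, p, o in triples if s in names and p != SAME_AS}
```

Asking either community's identifier for a description returns the properties published by both communities. This is the kind of cross-boundary linking the paragraph above refers to.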