Catalogue:Services

From D4Science Wiki
Revision as of 13:53, 12 July 2013 by Andrea.manzi (Talk | contribs) (Semantic Data Analysis)

iMarine inherited its software stack (gCube) from two previous EU projects, D4Science and D4Science II. During the iMarine project the software has been further extended to strengthen this foundation and to build marine applications on top of it, with the effort concentrated in three main functional areas:

  • Core Facilities: provide users with a range of services for the operation and management of the whole infrastructure. They are detailed in this section.
  • Data Management Facilities: provide users with a rich array of services for managing data in the context of the whole infrastructure.
  • Data Consumption Facilities: provide users with a rich array of services for exploiting data in the context of the whole infrastructure.


Data Management Facilities

Data Consumption Facilities

The Data Consumption facilities can be further categorized into five areas, each grouping a series of components.

Data Retrieval

gCube provides Information Retrieval facilities over large heterogeneous environments. The architecture and mechanisms provided by the framework ensure flexibility, scalability, high performance and availability. In particular:

  • Declarative Query Language over a heterogeneous environment. The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL standard.
  • On-the-fly Integration of Data Sources. A Data Source that publishes its Information Retrieval capabilities can be involved in the IR process on the fly.
  • Scalability in the number of Data Sources. Planning and Optimization mechanisms detect the minimum number of Sources that need to be involved in answering a query, along with an optimal plan for execution.
  • Direct Integration of External Information Providers. Through the OpenSearch standard, external Information Providers can be queried dynamically. The results they provide can be aggregated with the other results during query answering.
  • Indexing Capabilities for Replication and High Availability. Multidimensional and full-text indexing capabilities, using an architecture that efficiently supports replication and high availability.
  • Distributed Execution Environment offering High Performance and Flexibility. Efficient execution of search plans over a large heterogeneous environment.
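
As an illustration of how an external provider can be queried through OpenSearch, the sketch below fills in an OpenSearch 1.1 URL template of the kind a provider publishes in its description document. It is a minimal sketch and not the gCube implementation: the function name fill_opensearch_template and the example.org endpoint are hypothetical, and only the {name} and {name?} template syntax comes from the OpenSearch specification.

```python
import re
from urllib.parse import quote


def fill_opensearch_template(template, params):
    """Substitute {name} and optional {name?} placeholders in a URL template.

    Hypothetical helper for illustration only; not part of gCube.
    """
    def substitute(match):
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in params:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(params[name]), safe="")
        if optional:
            # Optional parameters without a value become the empty string.
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return re.sub(r"\{([A-Za-z][A-Za-z0-9:]*)(\??)\}", substitute, template)


# Assumed template, as it might appear in a provider's description document.
template = "https://example.org/search?q={searchTerms}&start={startIndex?}&format=rss"
url = fill_opensearch_template(template, {"searchTerms": "tuna catch"})
print(url)
```

The resulting URL can then be fetched and its result feed merged with the results coming from the other Data Sources.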

Services in this area:

Data Manipulation

gCube provides Data Manipulation Facilities responsible for transforming content and metadata among different formats and specifications. The architecture and mechanisms provided by the framework satisfy the requirements for arbitrary transformation or homogenization of content and metadata. Its features are useful for:

  • information retrieval
  • information presentation
  • processing and exporting

In particular it offers:

  • Automatic transformation path identification. Given the content type of a source object and the target content type, the framework identifies the appropriate transformation to use. It can also dynamically compose a path of several transformation steps to produce the final format; shorter paths are preferred.
  • Pluggable algorithms for content transformation. A generic transformation framework based on pluggable components termed transformation programs. Transformation programs expose the transformation capabilities of the framework. This approach makes it possible to supply domain- and application-specific data transformations.
  • Exploitation of Distributed Infrastructure. Integration with a Workflow Engine gives access to vast amounts of processing power and makes it possible to handle virtually any transformation task, making this the standard Data Manipulation facility for gCube applications.
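
The path identification described above can be pictured as a shortest-path search over a graph whose nodes are content types and whose edges are transformation programs. The sketch below uses a breadth-first search, which returns a path with the fewest steps. It is only an illustration under stated assumptions: the function find_transformation_path, the program names and the content types are hypothetical, and the actual gCube framework may select paths differently.

```python
from collections import deque


def find_transformation_path(programs, source_type, target_type):
    """Return the shortest list of program names converting source to target.

    programs is a list of (name, input_type, output_type) tuples.
    Returns [] if no conversion is needed and None if no path exists.
    """
    # Build an adjacency list: content type -> [(program name, output type)].
    graph = {}
    for name, in_type, out_type in programs:
        graph.setdefault(in_type, []).append((name, out_type))

    queue = deque([(source_type, [])])
    visited = {source_type}
    while queue:
        current, path = queue.popleft()
        if current == target_type:
            return path
        for name, out_type in graph.get(current, []):
            if out_type not in visited:
                visited.add(out_type)
                queue.append((out_type, path + [name]))
    return None


# Hypothetical transformation programs, for illustration only.
programs = [
    ("tiff_to_png", "image/tiff", "image/png"),
    ("png_to_jpeg", "image/png", "image/jpeg"),
    ("tiff_to_jpeg", "image/tiff", "image/jpeg"),
    ("jpeg_to_gif", "image/jpeg", "image/gif"),
]
path = find_transformation_path(programs, "image/tiff", "image/gif")
print(path)
```

Because breadth-first search explores paths in order of length, the two-step route through image/jpeg is found before the three-step route through image/png.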

Services in this area:

Data Mining

Data Mining facilities include a set of features, services and methods for performing data processing and mining on information sets. These features address several aspects of biological data processing, ranging from ecological modeling to niche modeling experiments. Algorithms are executed in a parallel and possibly distributed fashion. Furthermore, Services performing Data Mining operations are deployed according to a distributed architecture, in order to balance the load of those procedures requiring local resources.

By means of the above features, Data Mining in iMarine aims to address problems such as

  • the prediction of the impact of climate changes on biodiversity,
  • the prevention of the spread of invasive species,
  • the identification of geographical and ecological aspects of disease transmission,
  • conservation planning,
  • the prediction of suitable habitats for marine species.

Services in the area:

Data Visualization

Data Visualization facilities include a set of features, software and methods for performing visualization of data. Data Visualization is particularly meant for geo-spatial data, a kind of information that naturally lends itself to visualization. Data are reproduced on interactive maps and can be explored by means of several inspection tools. In particular it offers:

  • Uniform access over geospatial GIS layers
    • investigation of layers indexed by GeoNetwork;
    • visualization of distributed layers;
    • addition of remote layers published in standard OGC formats (WMS or WFS);
  • Filtering and analysis capabilities
    • possibility to apply CQL filters to layers;
    • possibility to trace transect charts;
    • possibility to select areas in order to investigate environmental features;
  • Search and indexing capabilities
    • possibility to sort a very large number of layers by title;
    • possibility to search a very large number of layers by title and name;
    • possibility to index layers by invoking GeoNetwork functionalities;
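
To make the notion of a remote layer published through WMS more concrete, the sketch below builds a WMS 1.3.0 GetMap request URL using only the Python standard library. It is a minimal sketch and not the gCube client code: the function build_getmap_url, the endpoint https://example.org/wms and the layer name sea_surface_temperature are assumptions made for illustration, while the request parameters are those defined by the WMS 1.3.0 standard.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height,
                     crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL for a single layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the layer's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
getmap_url = build_getmap_url(
    "https://example.org/wms",
    "sea_surface_temperature",
    bbox=(-10, 30, 40, 60),
    width=800,
    height=480,
)
print(getmap_url)
```

A map viewer would fetch this URL and draw the returned image as one layer of the interactive map.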

Services in the area:

Semantic Data Analysis

Semantic Data Analysis comprises a set of libraries and services to bridge the gap between communities and link distributed data across community boundaries. The introduction of the Semantic Web and the publication of expressive metadata in a shared knowledge framework enable the deployment of services that can intelligently use Web resources.

In particular it offers:

  • Provision of results clustering over any search system that returns textual snippets and for which there is an OpenSearch description.
  • Provision of snippet-based or content-based entity recognition, both generic and vertical, based on predetermined entity categories and lists that can be obtained by querying SPARQL endpoints.
  • Provision of gradual, session-based faceted search, which allows the answer to be restricted step by step based on the selected entities and/or clusters.
  • Ability to fetch and display semantic information about an identified entity, achieved by querying appropriate SPARQL endpoints.
  • Ability to apply these services to any web page through a web browser, using bookmarklets.
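
The gradual, session-based restriction described above can be sketched as a small session object that keeps the current result set and narrows it each time an entity is selected. This is an illustrative sketch only: the class FacetedSession, its methods and the sample results are hypothetical and do not reflect the actual gCube service interface.

```python
from collections import Counter


class FacetedSession:
    """Hold a result set and restrict it step by step by selected entities."""

    def __init__(self, results):
        # Each result is a dict with a "title" and a set of "entities".
        self.results = list(results)
        self.selected = []

    def facets(self):
        """Count how many current results mention each entity."""
        counts = Counter()
        for result in self.results:
            counts.update(result["entities"])
        return counts

    def restrict(self, entity):
        """Keep only the results that mention the selected entity."""
        self.selected.append(entity)
        self.results = [r for r in self.results if entity in r["entities"]]
        return self.results


# Hypothetical search results with recognized entities.
session = FacetedSession([
    {"title": "A", "entities": {"Thunnus albacares", "Indian Ocean"}},
    {"title": "B", "entities": {"Thunnus albacares", "Atlantic Ocean"}},
    {"title": "C", "entities": {"Gadus morhua", "Atlantic Ocean"}},
])
initial_facets = session.facets()
after_species = [r["title"] for r in session.restrict("Thunnus albacares")]
after_area = [r["title"] for r in session.restrict("Atlantic Ocean")]
print(after_species, after_area)
```

Each call to restrict narrows the answer further, and facets can be recomputed on the remaining results to offer the next set of choices.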

Services in the area: