Difference between revisions of "Catalogue:Services"

Latest revision as of 16:26, 17 July 2013

iMarine inherited its software stack (gCube) from two previous EU projects, D4Science and D4Science II. The software has been further extended during the iMarine project to strengthen its foundations and to build Marine Applications on top of it, concentrating the effort on three main functional areas:

Core Facilities: provide users with a range of services for the operation and management of the whole infrastructure. They are detailed in the Catalogue:Infrastructure section.
Data Management Facilities: provide users with a rich array of services for the management of data.
Data Consumption Facilities: provide users with a rich array of services for the exploitation of data.

Data Management Facilities

The Data Management facilities can be further grouped into three areas, each comprising a set of components.

Data Access and Storage

The data access and storage components provide secure, scalable, efficient, standards-based storage and retrieval of data, where the data may be maintained by the system or outside the system and may vary in structure, size, and semantics. The key features of this family of components are:

uniform model and access API over structured data:
- dynamically pluggable architecture of model and API transformations to and from internal and external data sources;
- plugins for document, biodiversity, statistical, and semantic data sources, including sources with custom APIs and standards-based APIs;
fine-grained access to structured data:
- horizontal and vertical filtering based on pattern matching; URI-based resolution; in-place remote updates;
scalable access to structured data:
- autonomic service replication with infrastructure-wide load balancing;
- efficient and scalable storage of structured data based on graph database technology;
uniform modelling and access API over document data:
- rich descriptions of document content, metadata, annotations, parts, and alternatives;
- transformations from the model and API of key document sources, including OAI providers;
uniform modelling and access API over semantic data, based on tree views over RDF graph data:
- transformations from the model and API of key semantic data sources, including SPARQL endpoints;
uniform modelling and access over biodiversity data:
- access API tailored to biodiversity data sources; dynamically pluggable architecture of transformations from external sources of biodiversity data;
- plugins for key biodiversity data sources, including OBIS, GBIF and Catalogue of Life;
efficient and scalable storage of files:
- unstructured storage back-end based on MongoDB, providing replication and high availability, automatic balancing as load and data distribution change, no single point of failure, and data sharding;
standards-based and structured storage of files:
- POSIX-like client API;
- support for hierarchical folder structures;
uniform model and access API over environmental data:
- investigation of heterogeneous external data sources;
- uniform access to OGC-compliant services.
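Horizontal and vertical filtering can be pictured with a small sketch. It assumes a toy model in which each record is a nested dictionary and a "pattern" maps field names to predicates or to nested patterns; `matches`, `project` and `filter_trees` are hypothetical names and are not the Tree Manager API. Horizontal filtering keeps only the trees that match the pattern; vertical filtering keeps only the requested fields of each match.

```python
from typing import Any, Dict, Iterable, List

Tree = Dict[str, Any]
# Hypothetical pattern format: field name -> predicate on the value,
# or field name -> nested pattern for a sub-tree.
Pattern = Dict[str, Any]


def matches(tree: Tree, pattern: Pattern) -> bool:
    """Horizontal filtering: True if every field in the pattern exists and matches."""
    for key, condition in pattern.items():
        if key not in tree:
            return False
        value = tree[key]
        if isinstance(condition, dict):
            # Nested pattern: the value must itself be a matching sub-tree.
            if not isinstance(value, dict) or not matches(value, condition):
                return False
        elif not condition(value):
            return False
    return True


def project(tree: Tree, fields: Iterable[str]) -> Tree:
    """Vertical filtering: keep only the requested top-level fields."""
    return {f: tree[f] for f in fields if f in tree}


def filter_trees(trees: Iterable[Tree], pattern: Pattern, fields: List[str]) -> List[Tree]:
    return [project(t, fields) for t in trees if matches(t, pattern)]


records = [
    {"id": "r1", "taxon": {"rank": "species", "name": "Thunnus albacares"}, "year": 2010},
    {"id": "r2", "taxon": {"rank": "genus", "name": "Thunnus"}, "year": 2012},
    {"id": "r3", "taxon": {"rank": "species", "name": "Gadus morhua"}, "year": 2008},
]

species_after_2009 = {"taxon": {"rank": lambda r: r == "species"}, "year": lambda y: y > 2009}
print(filter_trees(records, species_after_2009, ["id", "year"]))
# [{'id': 'r1', 'year': 2010}]
```

In a remote setting the point of such filtering is that only the selected trees and fields cross the network; this in-memory sketch shows the selection logic only.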

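The services belonging to this area are:
Tree Manager service
Species Product Discovery service
Storage Manager service
Environmental Service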

Data Transfer

The Data Transfer components implement reliable data transfer mechanisms between the nodes of a gCube-based infrastructure. The key features are:

Point-to-point transfer: one writer and one reader as the core functionality
Intuitive stream- and iterator-based interface: simplified usage with reasonable default behavior for common use cases, and a variety of features for increased usability and flexibility
Multiple protocol support: data transfer currently supports the TCP and HTTP protocols
Reliable data transfer between Infrastructure Data Sources and Data Storages: achieved by exploiting the uniform access interfaces provided by gCube
Structured and unstructured data transfer: both tree-based and file-based transfer, covering the full range of use cases
Transfers to local nodes for data staging: data staging for particular use cases can be enabled on each node of the infrastructure
Advanced transfer scheduling and transfer optimization: a dedicated gCube service is responsible for data transfer scheduling and transfer optimization
Transfer statistics availability: transfers are logged by the system and made available to interested consumers.
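The one-writer/one-reader model with an iterator-based reader can be sketched in-process as follows. This is an illustration only: `PointToPointChannel` and `produce` are hypothetical names, and the sketch uses a bounded in-memory queue between two threads, whereas an actual gCube transfer runs between nodes over TCP or HTTP, which is not modelled here.

```python
import queue
import threading
from typing import Iterable, Iterator

_END = object()  # sentinel marking the end of the stream


class PointToPointChannel:
    """Hypothetical one-writer/one-reader channel with a bounded buffer.

    The bounded queue gives back-pressure: a fast writer blocks when the
    reader falls behind, instead of buffering the whole stream in memory.
    """

    def __init__(self, capacity: int = 16) -> None:
        self._buffer = queue.Queue(maxsize=capacity)

    def write(self, record: object) -> None:
        self._buffer.put(record)

    def close(self) -> None:
        self._buffer.put(_END)

    def __iter__(self) -> Iterator[object]:
        # The reader consumes records through the ordinary iterator protocol.
        while True:
            item = self._buffer.get()
            if item is _END:
                return
            yield item


def produce(channel: PointToPointChannel, records: Iterable[object]) -> None:
    try:
        for r in records:
            channel.write(r)
    finally:
        channel.close()  # always signal end-of-stream, even if the writer fails


channel = PointToPointChannel(capacity=4)
writer = threading.Thread(target=produce, args=(channel, range(10)))
writer.start()
received = list(channel)  # the reader simply iterates
writer.join()
print(received)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The "reasonable default" here is the buffer capacity; callers that need more throughput can raise it without changing the reading code.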

The services belonging to the area are:
Result Set
Data Transfer and Scheduler services

Data Harmonization

The Data Harmonization components give unified views over diverse data items; in particular, they offer:

workflow-oriented tabular data manipulation
- definition and execution of user-defined workflows of data manipulation steps
- rich array of data manipulation facilities offered 'as-a-Service'
- rich array of data mining facilities offered 'as-a-Service'
- rich array of data visualisation facilities offered 'as-a-Service'
reference-data management support
- uniform model for reference-data representation, including versioning and provenance
data curation and enrichment support
- enrichment of species occurrence data with environmental data dynamically acquired from data providers
- data provenance recording
standards-based data presentation
- OGC standards-based geospatial data presentation
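A workflow of tabular manipulation steps, with a minimal provenance trail, can be sketched as an ordered list of named functions applied to a table. The `Workflow` class, the step helpers and the sample rows are hypothetical and only illustrate the idea; they do not reproduce the Tabular Data Manager interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Table = List[Row]
Step = Callable[[Table], Table]


@dataclass
class Workflow:
    """Hypothetical user-defined workflow: an ordered list of named table steps."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, step: Step) -> "Workflow":
        self.steps.append((name, step))
        return self  # returning self allows steps to be chained

    def run(self, table: Table) -> Tuple[Table, List[str]]:
        provenance: List[str] = []
        for name, step in self.steps:
            table = step(table)
            # Record which step ran and how many rows it left, as a provenance trail.
            provenance.append(f"{name}: {len(table)} rows")
        return table, provenance


def drop_missing(column: str) -> Step:
    return lambda t: [r for r in t if r.get(column) is not None]


def normalise_name(column: str) -> Step:
    # "  thunnus ALBACARES" -> "Thunnus albacares"
    return lambda t: [{**r, column: str(r[column]).strip().capitalize()} for r in t]


table = [
    {"species": "  thunnus ALBACARES", "catch_t": 12.5},
    {"species": None, "catch_t": 3.0},
    {"species": "gadus morhua ", "catch_t": 7.0},
]
workflow = (Workflow()
            .add("drop rows without species", drop_missing("species"))
            .add("normalise species names", normalise_name("species")))
result, provenance = workflow.run(table)
print(result)
print(provenance)
```

Keeping each step a pure function of the table makes the workflow easy to re-run and the provenance log easy to trust.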

The services belonging to this area can be further categorized according to the data type:
Tabular Data
- Tabular Data Flow Manager
- Tabular Data Manager
Time Series
- TimeSeries Manager
- CodeList Manager
Biodiversity Data
- Occurrence Data Reconciliation
- Occurrence Data Enrichment Service
- Taxon Names Reconciliation Service

Data Consumption Facilities

The Data Consumption facilities can be further grouped into five areas, each comprising a set of components.

Data Retrieval

gCube provides Information Retrieval facilities over large heterogeneous environments. The architecture and mechanisms provided by the framework ensure flexibility, scalability, high performance and availability. In particular:

Declarative Query Language over a heterogeneous environment. The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL standard.
On-the-fly Integration of Data Sources. A Data Source that publishes its Information Retrieval capabilities can be involved in the IR process on the fly.
Scalability in the number of Data Sources. Planning and optimization mechanisms determine the minimum number of sources that must be involved in answering a query, along with an optimal execution plan.
Direct Integration of External Information Providers. Through the OpenSearch standard, external Information Providers can be queried dynamically, and their results can be aggregated with the rest of the results during query answering.
Indexing Capabilities for Replication and High Availability. Multidimensional and full-text indexing capabilities, based on an architecture that efficiently supports replication and high availability.
Distributed Execution Environment offering High Performance and Flexibility. Efficient execution of search plans over a large heterogeneous environment.
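Choosing as few sources as possible to answer a query is an instance of set cover, which is NP-hard in general, so a greedy approximation is a common practical choice. The sketch below uses that greedy heuristic, assuming each source advertises the set of collections it can answer; it is illustrative only, the source names are made up, and it is not the algorithm used by the gCube planner.

```python
from typing import Dict, List, Set


def select_sources(required: Set[str], capabilities: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the source covering the most uncovered collections."""
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        # Iterating over sorted names makes ties resolve alphabetically (deterministic).
        best = max(sorted(capabilities),
                   key=lambda s: len(capabilities[s] & uncovered), default=None)
        gain = capabilities[best] & uncovered if best is not None else set()
        if not gain:
            raise ValueError(f"no source can answer: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical sources and the collections each one can answer.
capabilities = {
    "aggregator": {"occurrences", "taxa"},
    "occurrence_db": {"occurrences"},
    "stats_db": {"statistics"},
    "taxonomy_db": {"taxa"},
}
print(select_sources({"occurrences", "taxa", "statistics"}, capabilities))
# ['aggregator', 'stats_db']
```

Greedy selection is within a logarithmic factor of the true minimum; a real planner would also weigh costs such as latency or load, which this sketch ignores.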

Services in this area:
Data Sources
Search System service

Data Manipulation

gCube provides Data Manipulation Facilities responsible for transforming content and metadata among different formats and specifications. The architecture and mechanisms provided by the framework satisfy the requirements for arbitrary transformation or homogenization of content and metadata. Its features are useful for:

information retrieval
information presentation
processing and exporting

In particular it offers:

Automatic transformation path identification. Given the content type of a source object and the target content type, the framework identifies the appropriate transformation to use. In addition, it can dynamically compose a path of several transformation steps to produce the final format; shorter paths are preferred.
Pluggable algorithms for content transformation. A generic transformation framework based on pluggable components (transformation programs and algorithms). Transformation programs and algorithms expose the transformation capabilities of the framework; this approach makes it possible to provide domain- and application-specific data transformations.
Exploitation of Distributed Infrastructure. Integration with a Workflow Engine gives access to vast amounts of processing power and makes it possible to handle virtually any transformation task, making the framework the standard Data Manipulation facility for gCube applications.
Advanced geospatial analytical and modeling features (e.g. R geospatial, reallocation, aggregation). Advanced geospatial processes, such as those required for reallocation, aggregation and interpolation, can be defined, designed and implemented. In particular, a WPS standard interface is offered on top of the Hadoop framework, exploiting the power of distributed computation.
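Automatic transformation path identification can be modelled as a shortest-path search over a graph whose nodes are content types and whose edges are available transformations. When every step counts equally, breadth-first search returns a path with the fewest steps, which matches the preference for short paths described above. The registry below is hypothetical and does not reflect the transformation programs actually deployed in gCube.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical registry: each entry says "a transformation exists from type A to type B".
TRANSFORMERS: Dict[str, List[str]] = {
    "application/msword": ["application/pdf", "text/plain"],
    "application/pdf": ["image/png", "text/plain"],
    "text/plain": ["text/xml"],
    "text/xml": ["application/json"],
}


def find_path(source: str, target: str, graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Breadth-first search: a path with the fewest transformation steps, or None."""
    if source == target:
        return [source]
    previous: Dict[str, str] = {}
    frontier = deque([source])
    seen = {source}
    while frontier:
        current = frontier.popleft()
        for nxt in graph.get(current, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            previous[nxt] = current
            if nxt == target:
                # Walk back through the predecessors to rebuild the path.
                path = [target]
                while path[-1] != source:
                    path.append(previous[path[-1]])
                return path[::-1]
            frontier.append(nxt)
    return None


print(find_path("application/msword", "application/json", TRANSFORMERS))
# ['application/msword', 'text/plain', 'text/xml', 'application/json']
```

If transformations had different costs (for example, lossy or slow conversions), Dijkstra's algorithm over weighted edges would replace the plain breadth-first search.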

Services in this area:
Data Transformation Service
WPS-Hadoop Service
Legacy Application Integration

Data Mining

Data Mining facilities include a set of features, services and methods for performing data processing and mining on information sets. These features address several aspects of biological data processing, ranging from ecological modeling to niche modeling experiments. Algorithms are executed in a parallel and possibly distributed fashion. Furthermore, the services performing Data Mining operations are deployed according to a distributed architecture, in order to balance the load of procedures that require local resources.

By means of the above features, Data Mining in iMarine aims to address problems such as:

the prediction of the impact of climate change on biodiversity,
the prevention of the spread of invasive species,
the identification of geographical and ecological aspects of disease transmission,
conservation planning,
the prediction of suitable habitats for marine species.
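The parallel execution mentioned above can be illustrated, under simplifying assumptions, with a simple environmental-envelope model: a grid cell is considered suitable for a species when every environmental variable falls within the species' tolerated range. The grid is split into chunks that are scored concurrently. This is a generic sketch, not one of the iMarine Data Mining algorithms; the variable names and ranges are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Cell = Dict[str, float]
Envelope = Dict[str, Tuple[float, float]]  # variable -> (min, max) tolerated by the species


def suitable(cell: Cell, envelope: Envelope) -> bool:
    """A cell is suitable if every environmental variable lies within the species' range."""
    return all(lo <= cell[var] <= hi for var, (lo, hi) in envelope.items())


def score_chunk(chunk: Sequence[Cell], envelope: Envelope) -> List[bool]:
    return [suitable(c, envelope) for c in chunk]


def parallel_suitability(cells: Sequence[Cell], envelope: Envelope,
                         workers: int = 4, chunk_size: int = 2) -> List[bool]:
    # Partition the grid; each chunk is an independent unit of work.
    chunks = [cells[i:i + chunk_size] for i in range(0, len(cells), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so results line up with the input cells.
        results = list(pool.map(score_chunk, chunks, [envelope] * len(chunks)))
    return [flag for chunk_result in results for flag in chunk_result]


envelope = {"sst_c": (10.0, 20.0), "depth_m": (0.0, 200.0)}
cells = [
    {"sst_c": 12.0, "depth_m": 50.0},
    {"sst_c": 25.0, "depth_m": 50.0},
    {"sst_c": 15.0, "depth_m": 500.0},
    {"sst_c": 19.5, "depth_m": 10.0},
    {"sst_c": 8.0, "depth_m": 100.0},
]
print(parallel_suitability(cells, envelope))  # [True, False, False, True, False]
```

A `ThreadPoolExecutor` keeps the example self-contained; for CPU-bound work in Python, a `ProcessPoolExecutor` or a distributed back-end would be the natural substitute, since the chunking logic stays the same.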

Services in the area:
Statistical Manager
Ecological Modeling
Signal Processing

Data Visualization

Data Visualisation facilities include a set of features, software and methods for performing visualisation of data. Data Visualisation is particularly meant for geospatial data, a kind of information that naturally lends itself to visualisation. Data are reproduced on interactive maps and can be explored by means of several inspection tools. In particular, it offers:

uniform access to geospatial GIS layers
- exploration of layers indexed by GeoNetwork;
- visualization of distributed layers;
- addition of remote layers published through standard OGC services (WMS or WFS);
filtering and analysis capabilities
- application of CQL filters to layers;
- plotting of transect charts;
- selection of areas in which to investigate environmental features;
search and indexing capabilities
- sorting of a large number of layers by title;
- searching over the titles and names of a large number of layers;
- indexing of layers through GeoNetwork functionalities.
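Layers published through standard OGC services can be requested directly over HTTP. The sketch below builds a WMS 1.1.1 GetMap URL and, optionally, attaches a CQL filter. The endpoint and layer name are placeholders, and `CQL_FILTER` is a GeoServer vendor parameter rather than part of the WMS standard, so the sketch assumes the layer is served by GeoServer; this page does not state which server is used.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def getmap_url(endpoint: str, layer: str, bbox: Tuple[float, float, float, float],
               width: int = 512, height: int = 256, cql_filter: str = "") -> str:
    """Build a WMS 1.1.1 GetMap URL (EPSG:4326, bbox as minx,miny,maxx,maxy)."""
    minx, miny, maxx, maxy = bbox
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": f"{minx},{miny},{maxx},{maxy}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "true",
    }
    if cql_filter:
        # GeoServer vendor parameter; other WMS servers may ignore or reject it.
        params["CQL_FILTER"] = cql_filter
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and layer name, for illustration only.
url = getmap_url("https://geoserver.example.org/wms", "demo:occurrences",
                 (-30.0, 30.0, 40.0, 70.0), cql_filter="year >= 2000")
query = parse_qs(urlsplit(url).query)
print(query["LAYERS"], query["BBOX"], query["CQL_FILTER"])
# ['demo:occurrences'] ['-30.0,30.0,40.0,70.0'] ['year >= 2000']
```

`urlencode` takes care of escaping the filter expression; building the query from a dictionary avoids hand-concatenating parameters.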

Services in the area:
GISViewer
GeoExplorer
Tiff Uploader

Semantic Data Analysis

Semantic Data Analysis comprises a set of libraries and services to bridge the gap between communities and link distributed data across community boundaries. The introduction of the Semantic Web and the publication of expressive metadata in a shared knowledge framework enable the deployment of services that can intelligently use Web resources.

In particular it offers:

Provision of results clustering over any search system. Applicable to any system that returns textual snippets and for which there is an OpenSearch description.
Provision of snippet- or content-based entity recognition. Both generic and vertical recognition are supported, the latter based on predetermined entity categories and lists that can be obtained by querying SPARQL endpoints.
Provision of gradual faceted (session-based) search. Allows the answer to be gradually restricted based on the selected entities and/or clusters.
Ability to fetch and display semantic information about an identified entity. Achieved by querying appropriate SPARQL endpoints.
Ability to apply these services to any web page through a web browser. Achieved using bookmarklets.
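Gradual faceted search can be sketched as a session that keeps the entities selected so far and, at each step, shows only results mentioning all of them, together with counts for the entities still available for refinement. `FacetedSession` and the sample results are hypothetical; the sketch assumes entity recognition has already annotated each result and does not reproduce X-Search's implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FacetedSession:
    """Hypothetical session-based faceted search over entity-annotated results."""
    results: Dict[str, Set[str]]  # result id -> entities recognised in its snippet
    selected: Set[str] = field(default_factory=set)

    def current(self) -> List[str]:
        # A result survives if it mentions every entity selected so far.
        return sorted(r for r, ents in self.results.items() if self.selected <= ents)

    def facets(self) -> Dict[str, int]:
        # Count the not-yet-selected entities among the surviving results.
        return dict(Counter(e for r in self.current() for e in self.results[r] - self.selected))

    def select(self, entity: str) -> List[str]:
        self.selected.add(entity)  # each selection further restricts the answer
        return self.current()


session = FacetedSession({
    "doc1": {"Thunnus albacares", "Indian Ocean"},
    "doc2": {"Thunnus albacares", "Atlantic Ocean"},
    "doc3": {"Gadus morhua", "Atlantic Ocean"},
})
print(session.select("Atlantic Ocean"))  # ['doc2', 'doc3']
print(session.facets())                  # counts for 'Thunnus albacares' and 'Gadus morhua'
print(session.select("Gadus morhua"))    # ['doc3']
```

Clusters could be handled the same way, by treating a cluster label as one more facet value attached to each result.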

Services in the area:

X-Search

