Catalogue:Services
Revision as of 14:17, 12 July 2013
iMarine inherited its software stack (gCube) from two previous EU projects, D4Science (an e-Infrastructure operated by the D4Science.org initiative) and D4Science-II. The software has been further extended during the iMarine project in order to enhance the foundation and build Marine Applications on top of it, concentrating the effort in three main functional areas:
- Core Facilities: dedicated to providing users with a range of services for the operation and management of the whole infrastructure. They are detailed in this section.
- Data Management Facilities: dedicated to providing users with a rich array of services for the management of data across the whole infrastructure.
- Data Consumption Facilities: dedicated to providing users with a rich array of services for the exploitation of data across the whole infrastructure.
Data Management Facilities
The Data Management facilities can be further categorized into three areas, each grouping a series of components.
Data Access and Storage
The data access and storage components provide secure, scalable, efficient, standards-based storage and retrieval of data, where the data may be maintained inside or outside the system and may vary in structure, size, and semantics. The key features of this family of components are:
- uniform model and access API over structured data: a dynamically pluggable architecture of model and API transformations to and from internal and external data sources; plugins for document, biodiversity, statistical, and semantic data sources, including sources with custom APIs and standards-based APIs;
- fine-grained access to structured data: horizontal and vertical filtering based on pattern matching; URI-based resolution; in-place remote updates;
- scalable access to structured data: autonomic service replication with infrastructure-wide load balancing; efficient and scalable storage of structured data based on graph database technology;
- uniform modelling and access API over document data: rich descriptions of document content, metadata, annotations, parts, and alternatives; transformations from the model and API of key document sources, including OAI providers;
- uniform modelling and access API over semantic data: tree-views over RDF graph data; transformations from the model and API of key semantic data sources, including SPARQL endpoints;
- uniform modelling and access over biodiversity data: an access API tailored to biodiversity data sources; a dynamically pluggable architecture of transformations from external sources of biodiversity data; plugins for key biodiversity data sources, including OBIS, GBIF and Catalogue of Life;
- efficient and scalable storage of files: an unstructured storage back-end based on MongoDB for replication and high availability, automatic balancing for changes in load and data distribution, no single points of failure, and data sharding;
- standards-based and structured storage of files: a POSIX-like client API; support for hierarchical folder structures;
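The horizontal and vertical filtering mentioned above can be illustrated with a minimal Python sketch. The function and record layout here are purely illustrative assumptions, not the gCube API: horizontal filtering keeps only records whose values match a pattern, while vertical filtering projects each record onto a subset of fields.

```python
import re

def filter_records(records, pattern=None, fields=None):
    """Horizontal filtering: keep only records whose values match a pattern.
    Vertical filtering: project each record onto a subset of fields."""
    out = []
    for rec in records:
        # horizontal filter: skip records with no value matching the pattern
        if pattern and not any(re.search(pattern, str(v)) for v in rec.values()):
            continue
        # vertical filter: keep only the requested fields
        out.append({k: v for k, v in rec.items() if fields is None or k in fields})
    return out

# hypothetical structured records
species = [
    {"name": "Thunnus albacares", "family": "Scombridae", "depth_m": 250},
    {"name": "Gadus morhua", "family": "Gadidae", "depth_m": 600},
]
print(filter_records(species, pattern="Scombridae", fields={"name", "depth_m"}))
```

In a real deployment the filtering would be pushed down to the remote source rather than applied client-side, which is what makes the access fine-grained.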
Data Consumption Facilities
The Data Consumption facilities can be further categorized into five areas, each grouping a series of components.
Data Retrieval
gCube provides Information Retrieval facilities over large heterogeneous environments. The architecture and mechanisms provided by the framework ensure flexibility, scalability, high performance and availability. In particular:
- Declarative Query Language over a heterogeneous environment. The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL standard.
- On-the-fly integration of Data Sources. A Data Source that publishes its Information Retrieval capabilities can be involved on the fly in the IR process.
- Scalability in the number of Data Sources. Planning and optimization mechanisms detect the minimum set of Sources that need to be involved in query answering, along with an optimal plan for execution.
- Direct integration of External Information Providers. Through the OpenSearch standard, external Information Providers can be queried dynamically, and the results they provide can be aggregated with the rest of the results during query answering.
- Indexing capabilities for Replication and High Availability. Multidimensional and full-text indexing capabilities using an architecture that efficiently supports replication and high availability.
- Distributed Execution Environment offering High Performance and Flexibility. Efficient execution of search plans over a large heterogeneous environment.
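The source-selection step described above can be sketched as a greedy set-cover problem: given sources that each advertise the fields they can answer, pick the fewest sources that together cover the query. This is only an illustration of the planning idea; the source names and field sets are hypothetical, and gCube's actual planner is not exposed here.

```python
def plan_sources(query_fields, sources):
    """Greedy set cover: pick the fewest sources whose advertised
    fields together cover every field referenced by the query."""
    uncovered = set(query_fields)
    plan = []
    while uncovered:
        # choose the source covering the most still-uncovered fields
        best = max(sources, key=lambda s: len(uncovered & sources[s]), default=None)
        if best is None or not (uncovered & sources[best]):
            raise ValueError(f"no source can answer fields: {uncovered}")
        plan.append(best)
        uncovered -= sources[best]
    return plan

# hypothetical sources and the query fields each can answer
sources = {
    "obis":  {"species", "location"},
    "gbif":  {"species", "taxonomy"},
    "local": {"location"},
}
print(plan_sources({"species", "location"}, sources))
```

Greedy set cover is a standard approximation; a production planner would also weigh source cost and expected result quality when choosing the execution plan.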
Services in this area:
Data Manipulation
gCube provides Data Manipulation facilities responsible for transforming content and metadata among different formats and specifications. The architecture and mechanisms provided by the framework satisfy the requirements for arbitrary transformation or homogenization of content and metadata. These features are useful for:
- information retrieval
- information presentation
- processing and exporting
In particular it offers:
- Automatic transformation path identification. Given the content type of a source object and the target content type, the framework identifies the appropriate transformation to use. In addition, it can dynamically compose a path of several transformation steps to produce the final format; the shortest path is preferred.
- Pluggable algorithms for content transformation. A generic transformation framework based on pluggable components termed transformation programs, which expose the transformation capabilities of the framework. This approach makes it possible to provide domain- and application-specific data transformations.
- Exploitation of the Distributed Infrastructure. Integration with a Workflow Engine provides access to vast amounts of processing power and enables virtually any transformation task to be handled, making this the standard Data Manipulation facility for gCube applications.
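The path-identification idea above can be sketched as a breadth-first search over a registry of transformation steps: BFS naturally finds the path with the fewest steps first, matching the shortest-path preference. The registry contents here are illustrative assumptions, not gCube's actual transformation programs.

```python
from collections import deque

# illustrative registry of available transformation steps: (source type, target type)
TRANSFORMS = {
    ("text/xml", "application/json"),
    ("application/json", "text/csv"),
    ("text/xml", "application/pdf"),
}

def shortest_path(src, dst):
    """Breadth-first search over registered transformations; BFS
    guarantees the first complete path found has the fewest steps."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for a, b in TRANSFORMS:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None  # no transformation path exists

print(shortest_path("text/xml", "text/csv"))
# ['text/xml', 'application/json', 'text/csv']
```

A real implementation would attach an executable transformation program to each edge and run the composed pipeline, but the planning step reduces to exactly this kind of graph search.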
Services in this area:
Data Mining
Data Mining facilities include a set of features, services and methods for performing data processing and mining on information sets. These features address several aspects of biological data processing, ranging from ecological modelling to niche modelling experiments. Algorithms are executed in a parallel and possibly distributed fashion. Furthermore, services performing Data Mining operations are deployed according to a distributed architecture, in order to balance the load of procedures requiring local resources.
By means of the above features, Data Mining in iMarine aims to address problems such as:
- the prediction of the impact of climate changes on biodiversity,
- the prevention of the spread of invasive species,
- the identification of geographical and ecological aspects of disease transmission,
- conservation planning,
- the prediction of suitable habitats for marine species.
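As a flavour of the niche-modelling experiments listed above, here is a deliberately simplified environmental-envelope model in Python (the BIOCLIM-style idea: a location is suitable if every environmental variable falls within the range observed at known presences). The presence records and variables are invented for illustration; gCube's actual Data Mining algorithms are considerably more sophisticated.

```python
def envelope_model(observations):
    """Fit a rectilinear environmental envelope: for each variable,
    record the (min, max) range seen across presence observations."""
    keys = observations[0].keys()
    return {k: (min(o[k] for o in observations),
                max(o[k] for o in observations)) for k in keys}

def suitable(envelope, cell):
    """A cell is suitable if every variable lies inside the envelope."""
    return all(lo <= cell[k] <= hi for k, (lo, hi) in envelope.items())

# hypothetical presence records (sea-surface temperature, salinity)
presences = [{"sst": 18.0, "salinity": 35.1},
             {"sst": 22.5, "salinity": 36.0},
             {"sst": 20.1, "salinity": 35.4}]
env = envelope_model(presences)
print(suitable(env, {"sst": 21.0, "salinity": 35.5}))  # inside the envelope
print(suitable(env, {"sst": 27.0, "salinity": 35.5}))  # too warm
```

Running such a model over every cell of a gridded ocean dataset is the kind of embarrassingly parallel workload that the distributed deployment described above is designed to balance.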
Services in the area:
Data Visualization
Data Visualisation facilities include a set of features, software and methods for performing visualisation of data. Data Visualisation is particularly aimed at geo-spatial data, a kind of information that lends itself naturally to visualisation. Data are reproduced on interactive maps and can be explored by means of several inspection tools. In particular it offers:
- uniform access over geospatial GIS layers:
  - investigation over layers indexed by GeoNetwork;
  - visualisation of distributed layers;
  - addition of remote layers published in standard OGC formats (WMS, Web Map Service, or WFS, Web Feature Service);
- filtering and analysis capabilities:
  - application of CQL filters on layers;
  - tracing of transect charts;
  - selection of areas for investigating environmental features;
- search and indexing capabilities:
  - sorting by title over a huge quantity of layers;
  - searching over titles and names of a huge quantity of layers;
  - indexing of layers by invoking GeoNetwork functionalities;
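Since remote layers are added via standard OGC protocols, requesting a map image boils down to building a WMS GetMap URL. The sketch below assembles one for WMS 1.1.1; the endpoint and layer name are placeholders, not real gCube services.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build an OGC WMS 1.1.1 GetMap request URL for a single layer.
    bbox is (minx, miny, maxx, maxy) in the SRS's units."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                 # default style
        "SRS": "EPSG:4326",           # WGS84 lon/lat
        "BBOX": ",".join(str(c) for c in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)

# placeholder endpoint and layer name, for illustration only
url = wms_getmap_url("http://example.org/geoserver/wms",
                     "species:thunnus_albacares_distribution",
                     (-180, -90, 180, 90))
print(url)
```

Fetching that URL from any WMS-compliant server returns a rendered PNG tile, which is how interactive map clients compose the visualisations described above.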
Services in the area:
Semantic Data Analysis
Semantic Data Analysis comprises a set of libraries and services to bridge the gap between communities and link distributed data across community boundaries. The introduction of the Semantic Web and the publication of expressive metadata in a shared knowledge framework enable the deployment of services that can intelligently use Web resources.
In particular it offers:
- Provision of results clustering over any search system that returns textual snippets and for which an OpenSearch description is available.
- Provision of snippet- or content-based entity recognition, both generic and vertical, based on predetermined entity categories and lists which can be obtained by querying SPARQL endpoints.
- Provision of gradual, faceted (session-based) search, allowing the answer to be progressively restricted based on the selected entities and/or clusters.
- Ability to fetch and display semantic information about an identified entity, achieved by querying appropriate SPARQL endpoints.
- Ability to apply these services to any web page through a web browser, using bookmarklets.
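The gradual faceted restriction described above can be sketched in a few lines: each result carries the entities recognised in it, and every selection the user makes narrows the result set to items annotated with all selected entities. The result records and entity names are invented for illustration.

```python
def restrict(results, selected_entities):
    """Gradual faceted restriction: keep only results annotated
    with every entity the user has selected so far."""
    return [r for r in results
            if set(selected_entities) <= set(r["entities"])]

# hypothetical search results with their recognised entities
results = [
    {"title": "Tuna stocks in the Pacific", "entities": {"Thunnus", "Pacific"}},
    {"title": "Cod fisheries report",       "entities": {"Gadus", "Atlantic"}},
    {"title": "Pacific tuna migration",     "entities": {"Thunnus", "Pacific", "migration"}},
]
step1 = restrict(results, {"Thunnus"})            # first facet selection
step2 = restrict(step1, {"Thunnus", "Pacific"})   # session narrows further
print([r["title"] for r in step2])
```

Because each step filters the previous step's output, the session state is just the growing set of selected entities, which is what makes the search gradual.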
Services in the area: