Difference between revisions of "Catalogue:Applications"
(→GeosCube) |
|||
(4 intermediate revisions by 2 users not shown) | |||
Line 2: | Line 2: | ||
|| __TOC__ | || __TOC__ | ||
|} | |} | ||
− | The D4Science | + | The D4Science infrastructure combines the functionality of more than 500 components into a coherent and centrally managed infrastructure of hardware, software, and data resources. Together, these offer a platform that can host a variety of applications. These applications share a common theme: to provide a service to a Community of Practice. Unlike other infrastructures that boast size, power, performance, or the latest technology, D4Science puts the community first. This does not imply we make concessions on quality or performance, but we see it as our mission to offer quality and performance to communities that have no resources of their own to jump high hurdles. |
The infrastructure resembles an archipelago where applications emerge as islands of services, resting on an underlying infrastructure bedrock. The islands specialize in one or more domains, yet are not isolated 'atolls'. Every island is well connected to others, and island-hopping is strongly encouraged. Each island offers a standard set of features that can be extended by selecting services from several topical bundles. | The infrastructure resembles an archipelago where applications emerge as islands of services, resting on an underlying infrastructure bedrock. The islands specialize in one or more domains, yet are not isolated 'atolls'. Every island is well connected to others, and island-hopping is strongly encouraged. Each island offers a standard set of features that can be extended by selecting services from several topical bundles. | ||
− | The | + | The infrastructure currently offers 4 main domain bundles that can be customized and/or enriched into flexible, purpose-built applications. Each application in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other applications. |
Through the enabling environment of [https://gcube.wiki.gcube-system.org/gcube/index.php/GCube_Wiki gCube], all users benefit from [[Catalogue:Infrastructure | Infrastructure Services]], but where to start? | Through the enabling environment of [https://gcube.wiki.gcube-system.org/gcube/index.php/GCube_Wiki gCube], all users benefit from [[Catalogue:Infrastructure | Infrastructure Services]], but where to start? | ||
− | For new users, | + | For new users, D4Science offers several domain-oriented solutions for 4 categories of users: data managers and analysts, biologists, spatial data managers, and policy oriented 'omnivores'. |
For each of these, a bundle of relevant [https://gcube.wiki.gcube-system.org/gcube/index.php/GCube_Wiki gCube software components] is available in a 'Cube'. | For each of these, a bundle of relevant [https://gcube.wiki.gcube-system.org/gcube/index.php/GCube_Wiki gCube software components] is available in a 'Cube'. | ||
This bundle can be limited to receive (and pay for) only those resources actually needed or consumed. | This bundle can be limited to receive (and pay for) only those resources actually needed or consumed. | ||
− | A bundle can also be extended with resources coming from other bundles; our aim is to offer bundles characterized by the domain tools | + | A bundle can also be extended with resources coming from other bundles; our aim is to offer bundles characterized by the domain tools and not by domain boundaries. |
− | In our experience, most experts rather manage their information in a bundle of domain specific software | + | In our experience, most experts rather manage their information in a bundle of domain specific software and are only consumers of data from other bundles. |
Thus, in most use scenarios, a user would be a data manager in a bundle, but only a consumer in another. | Thus, in most use scenarios, a user would be a data manager in a bundle, but only a consumer in another. | ||
− | The 4 key-applications that | + | The 4 key-applications that D4Science has delivered and continues to enrich are: |
{| | {| | ||
|- | |- | ||
|| [[File:BiolCube.png|100px]] | || [[File:BiolCube.png|100px]] | ||
− | || '''[[#BiolCube | BiolCube]]'''; focuses on the management and interpretation of | + | || '''[[#BiolCube | BiolCube]]'''; focuses on the management and interpretation of biodiversity data. |
|- | |- | ||
|| [[File:StatsCube.png|100px]] | || [[File:StatsCube.png|100px]] | ||
Line 34: | Line 34: | ||
The bundle approach, by itself an abstraction over a host of services, is expected to offer more 'flavors' in the near future. | The bundle approach, by itself an abstraction over a host of services, is expected to offer more 'flavors' in the near future. | ||
− | For instance a focused approach for infrastructure support for Mobile Apps is foreseen: | + | For instance, a focused approach for infrastructure support for Mobile Apps is foreseen: |
− | * '''[[#AppsCube | AppsCube]]'''; | + | * '''[[#AppsCube | AppsCube]]'''; offers an integrated approach to mobile app development. The infrastructure organizes the content and data-exchange with mobile apps. Please note that the App itself is not developed with D4Science; rather, it relies on the infrastructure to maintain and manage the data collected with and exposed through this App. |
− | * '''[[#IceCube | IceCube]]'''; An Integrated Computing Environment | + | * '''[[#IceCube | IceCube]]'''; an Integrated Computing Environment that offers access to infrastructure and cloud computing resources "as a Service". An instance of such a Cube will offer users access to predefined data and algorithms that can be applied to these data. |
== BiolCube == | == BiolCube == | ||
− | '''BiolCube''' is available as a suite that packs many useful features in one environment | + | '''BiolCube''' is available as a suite that packs many useful features in one research environment where marine ecologists are offered a complete private work-space to manage species names and occurrence data. The main areas where '''BiolCube''' offers services are: |
'''Taxonomic and occurrence data discovery and management''' | '''Taxonomic and occurrence data discovery and management''' | ||
− | * Occurrence data finder | + | * Occurrence data finder: Download public datasets from world-class biodiversity occurrence data repositories to your private environment where you can prepare datasets for use in further analysis with D4Science tools for data curation, filtering, merging and duplicate detection. Occurrence data can be directly visualized on maps using the geo-explorer, downloaded in several formats, and shared with or sent to other environments. |
− | * Species name finder | + | * Species name finder: Not sure about a species name? Then D4Science offers tools to search, download and verify taxonomic and vernacular names of marine species. |
− | * Species name matcher | + | * Species name matcher: Correcting spelling mistakes or incomplete names can be very time-consuming. With D4Science tools you can validate the species names in your data to ensure they comply with the standard of your choice. D4Science offers powerful matching and reconciliation services, already in use at FAO, to identify close matches for the names in your datasets. The infrastructure makes several key reference datasets available for consultation and reconciliation. These include the FAO ASFIS species list, FishBase for finfishes, and WoRMS, the World Register of Marine Species. If you wish, you can add your own reference list. |
− | * Environmental enrichment of data | + | * Environmental enrichment of data: In a shared service with [[#GeosCube | GeosCube]], this service adds environmental information to occurrence data to improve their quality and usefulness in modelling and analytical exercises. The service provides an estimate of a range of dynamically computed environmental parameters such as water temperature, ocean color, salinity, aragonite content, or BOD. The service can identify the nearest observations in space and time and will return a computed average or nearest observation that can document an occurrence. The innovative D4Science tools allow you to specify what 'nearest' means, i.e. a distance, a distance over a gradient, a seasonal average, or a depth range. |
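The species name matcher described in the list above reconciles names in a dataset against a reference list. The following is only a minimal sketch of that idea, not the D4Science matching service itself: it uses approximate string matching from the Python standard library, and the reference list, function name and cutoff value are hypothetical choices made for illustration.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real one would come from a source such as
# the FAO ASFIS species list, FishBase or WoRMS.
REFERENCE_NAMES = [
    "Gadus morhua",
    "Thunnus albacares",
    "Thunnus thynnus",
    "Sardina pilchardus",
    "Salmo salar",
]


def match_species_names(names, reference=REFERENCE_NAMES, cutoff=0.8):
    """Map each input name to its closest reference names, best match first."""
    # Compare case-insensitively, but return the reference spelling.
    lowered = {ref.lower(): ref for ref in reference}
    result = {}
    for name in names:
        hits = get_close_matches(
            name.strip().lower(), list(lowered), n=3, cutoff=cutoff
        )
        result[name] = [lowered[hit] for hit in hits]
    return result


if __name__ == "__main__":
    raw_names = ["Gadus morha", "Thunus albacares", "Unknown sp."]
    for raw, candidates in match_species_names(raw_names).items():
        print(raw, "->", candidates or "no close match")
```

A higher cutoff returns fewer but safer candidates. Names without a candidate above the cutoff are left for manual review rather than being forced onto the nearest entry.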
'''Modeling and analysis of distribution data''' | '''Modeling and analysis of distribution data''' | ||
− | * Biodiversity mapping tools | + | * Biodiversity mapping tools: The first D4Science species distribution and biodiversity mapping tool enabled the production of the well-known AquaMaps. With D4Science, the generation became faster, more robust, and results are shared in a collaborative environment. In addition to AquaMaps, many other biodiversity analytical and predictive tools are available. These include the toolset of OpenModeler and custom build Neural Network driven analytical services. |
− | * Species fact-sheets generator | + | * Species fact-sheets generator: With scientists spread over the globe, generating consistent information sheets on marine species is no sinecure. That is why the [https://i-marine.d4science.org/group/fishfindervre FishFinderVRE] was designed. It offers a complete templating and reporting work-flow operated by scientists, for scientists. The results, species fact-sheets, can be disseminated in a variety of formats, in particular those established by FAO for its now famous species and regional catalogues, field guides, and the more recent pocket guides. |
− | * Trend-analysis of data | + | * Trend-analysis of data: In a shared service with [[#StatsCube | StatsCube]], Trendylyzer offers services to identify and visualize trends in time-series of data. Trendylyzer was developed to specifically address skewness and gaps in datasets. |
− | * Spatial analysis of data | + | * Spatial analysis of data: In a shared service with [[#StatsCube | StatsCube]], clustering, probability, and other spatial analytical features. |
− | BiolCube is an independent yet not isolated bundle of specialized services for | + | BiolCube is an independent yet not isolated bundle of specialized services for marine ecologists and natural aquatic resource managers. Well embedded in the D4Science e-Infrastructure, it provides access to auxiliary services that turn BiolCube into a multi-purpose toolbox for biodiversity data analysis. D4Science enables near-seamless access to powerful statistical analysis software through [[#StatsCube | StatsCube]], and to advanced plotting and geospatial data production through [[#GeosCube | GeosCube]]. |
With BiolCube and [[#StatsCube | StatsCube]] services combined, developers are now working to develop an integrated environment where species distribution can be studied in space and over time, with occurrence data analyzed using measured environmental observations, rather than estimated large scale average values. | With BiolCube and [[#StatsCube | StatsCube]] services combined, developers are now working to develop an integrated environment where species distribution can be studied in space and over time, with occurrence data analyzed using measured environmental observations, rather than estimated large scale average values. | ||
Line 69: | Line 69: | ||
== StatsCube == | == StatsCube == | ||
− | '''StatsCube''' offers a complete data suite to manage the entire data | + | '''StatsCube''' offers a complete data suite to manage the entire data cycle from collection to archiving. With D4Science technologies exciting new capabilities are added to the life-cycle management and analysis of especially time-series data. StatsCube is developed using state-of-the-art OpenSource components that are brought together in a managed infrastructure. This enables a very cost-effective offer to resource-poor institutes in need of sophisticated data services. Other benefits are the availability of shared services for reference data management, and harmonization of data repository services. |
StatsCube relies on continued support and ongoing development of a bundle of services. This bundle offers services that together support a complete life-cycle for statistical data, but can also connect to services offered through other bundles to establish a network of cross-domain services. | StatsCube relies on continued support and ongoing development of a bundle of services. This bundle offers services that together support a complete life-cycle for statistical data, but can also connect to services offered through other bundles to establish a network of cross-domain services. | ||
− | The StatsCube bundle offers a set of services available to VRE managers. They can select from this bundle to compose one or more VRE's, and decide who can access such services. This allows for a fine-grained approach to sometimes complex data-workflows, where data flow from detailed field level data through several aggregation and review stages until summary statistics can be produced. At each stage of such work-flow, other resources can be mobilized in support of specific activities such as geo-referencing, enrichment with environmental data, statistical | + | The StatsCube bundle offers a set of services available to VRE managers. They can select from this bundle to compose one or more VRE's, and decide who can access such services. This allows for a fine-grained approach to sometimes complex data-workflows, where data flow from detailed field level data through several aggregation and review stages until summary statistics can be produced. At each stage of such work-flow, other resources can be mobilized in support of specific activities such as geo-referencing, enrichment with environmental data, statistical modelling or analysis. With StatsCube, D4Science implements key data services: |
'''Data Work-flow''' | '''Data Work-flow''' | ||
− | If you need to manage data-flows, | + | If you need to manage data-flows, D4Science offers life-cycle support where data enter the system as observations or batch data, and can then be harmonized and validated before being added to a repository. Not only are data well described by metadata during this process, but the processing steps are also captured as process metadata. The entire process is under the control of a 'visor' that protects the data from unauthorized access and modifications. |
− | The harmonization can rely on powerful matching features that enable to establish matches between datasets that would be very time-consuming to establish manually. Just as one would expect in a work-flow, the matching results are kept for re-use and reference. The matching is usually performed against a (long) code list, that | + | The harmonization can rely on powerful matching features that make it possible to establish matches between datasets that would be very time-consuming to establish manually. Just as one would expect in a work-flow, the matching results are kept for re-use and reference. The matching is usually performed against a (long) code list that is fully managed through the D4Science infrastructure. A specialized code list manager enables the ingestion (of existing SDMX code lists), creation, and maintenance of reference lists. |
'''Data Analysis''' | '''Data Analysis''' | ||
− | + | D4Science excels in offering advanced data analysis facilities to users. The clear separation of data and analytical resources makes it also easy to work with these analytical tools. The infrastructure stores the data, and no complicated steps are needed other than to select and filter the datasets and load these to the required analytical environment. For analysis, several environments are proposed, ranging from a bare-bone R-studio, parallelized R-servers, VRE-based analytical and predictive algorithms such as AquaMaps, to the Statistical manager, where users can integrate their own logic. This logic can exploit infrastructure computing resources, or interact with external Cloud or Hadoop clusters. With D4Science, the threshold for exploiting such resources is lowered considerably, making them accessible to a much wider, geographically dispersed EA-CoP. | |
− | Examples of analytical features implemented in | + | Examples of analytical features implemented in D4Science are: |
* Tools include R, WPS, Hadoop, WEKA data mining and access to Cloud resources; | * Tools include R, WPS, Hadoop, WEKA data mining and access to Cloud resources; | ||
* Algorithms in the statistical service include DBSCAN, Neural Networks, Clustering, and trend analysis. | * Algorithms in the statistical service include DBSCAN, Neural Networks, Clustering, and trend analysis. | ||
Line 87: | Line 87: | ||
'''Data reporting and visualization''' | '''Data reporting and visualization''' | ||
After a dataset has been added to the infrastructure, or once an analysis has been performed, the results are available in the same infrastructure to enrich reports, repositories or other infrastructure resources that can access them. | After a dataset has been added to the infrastructure, or once an analysis has been performed, the results are available in the same infrastructure to enrich reports, repositories or other infrastructure resources that can access them. | ||
− | + | Datasets are easily enriched and re-used in sometimes surprising new contexts. Some advanced facilities to work with statistical data are: | |
* geo-referencing time-series, and display these on maps; | * geo-referencing time-series, and display these on maps; | ||
* include time-series in reports; | * include time-series in reports; | ||
Line 100: | Line 100: | ||
**[https://gcube.wiki.gcube-system.org/gcube/index.php/TimeSeries TimeSeries Manager] | **[https://gcube.wiki.gcube-system.org/gcube/index.php/TimeSeries TimeSeries Manager] | ||
**[https://gcube.wiki.gcube-system.org/gcube/index.php/Codelist_Manager CodeList Manager] | **[https://gcube.wiki.gcube-system.org/gcube/index.php/Codelist_Manager CodeList Manager] | ||
− | * Data manipulation, mining and | + | * Data manipulation, mining and modelling |
** [https://gcube.wiki.gcube-system.org/gcube/index.php/Data_Transformation_Service_Specification Data Transformation Service] | ** [https://gcube.wiki.gcube-system.org/gcube/index.php/Data_Transformation_Service_Specification Data Transformation Service] | ||
** [https://gcube.wiki.gcube-system.org/gcube/index.php/Geospatial_Data_Processing WPS-Hadoop Service] | ** [https://gcube.wiki.gcube-system.org/gcube/index.php/Geospatial_Data_Processing WPS-Hadoop Service] | ||
Line 107: | Line 107: | ||
** [https://gcube.wiki.gcube-system.org/gcube/index.php/Ecological_Modeling Ecological Modeling] | ** [https://gcube.wiki.gcube-system.org/gcube/index.php/Ecological_Modeling Ecological Modeling] | ||
** [https://gcube.wiki.gcube-system.org/gcube/index.php/Signal_Processing Signal Processing] | ** [https://gcube.wiki.gcube-system.org/gcube/index.php/Signal_Processing Signal Processing] | ||
− | |||
− | |||
Examples of StatsCube implementations are | Examples of StatsCube implementations are | ||
Line 117: | Line 115: | ||
== GeosCube == | == GeosCube == | ||
− | '''GeosCube''' is the | + | '''GeosCube''' is the D4Science answer to the large and complex issue of understanding fisheries and biodiversity data in the spatial domain. Through GeosCube, spatial services are offered to consumers of the infrastructure, be they other D4Science tools or VRE's, or external organizations wishing to use D4Science web-services. |
− | Through GeosCube | + | Through GeosCube, D4Science aims to offer an INSPIRE directive compliant bundle of services that will enable the generation and management of geospatially explicit data for practitioners who have no resources to develop and maintain their own spatial data infrastructure. From the outset of D4Science, GeosCube was seen as a service provider to several business cases. The set of services, standards and protocols that together comprise the bundle relies on W*Ss, GeoNetwork, GeoServer, and THREDDS. In D4Science, a catalogue is implemented using the CS-W protocol through a GeoNetwork service. GeosCube bundles a range of OGC compliant resources that can be made available either in its entirety or as a selection of services that can be mounted in a customized environment, such as a VRE. |
These VRE's are vertically integrated, and horizontally interoperable. They rest on the gCube infrastructure, and are thus managed through a well-defined environment, while at the same time seamlessly benefit from data and processing resources made available through that infrastructure. | These VRE's are vertically integrated, and horizontally interoperable. They rest on the gCube infrastructure, and are thus managed through a well-defined environment, while at the same time seamlessly benefit from data and processing resources made available through that infrastructure. | ||
Line 148: | Line 146: | ||
* [https://wiki.i-marine.eu/index.php/Catalogue:Services#Data_Visualization Data Visualization] | * [https://wiki.i-marine.eu/index.php/Catalogue:Services#Data_Visualization Data Visualization] | ||
− | Example products that rely on services made available through this bundle in the | + | Example products that rely on services made available through this bundle in the D4Science infrastructure are: |
* AquaMaps; use this State-of-the-art suite to generate predictive species distribution maps; | * AquaMaps; use this State-of-the-art suite to generate predictive species distribution maps; | ||
* ICIS; Georeference Statistical datasets; | * ICIS; Georeference Statistical datasets; | ||
* Species Products Discovery; species occurrence geospatial datasets discovery and sharing (KML / GML); | * Species Products Discovery; species occurrence geospatial datasets discovery and sharing (KML / GML); | ||
− | * GeoExplorer; Visualize species information, environmental | + | * GeoExplorer; Visualize species information, environmental information, borders and competence areas and other geospatially explicit data. View details, select layers of information and share the results. |
== ConnectCube == | == ConnectCube == | ||
− | '''ConnectCube''' aims to deliver information to | + | '''ConnectCube''' aims to deliver information to policymakers from a variety of sources as an integrated view. These are generated using a variety of approaches, including semantic technologies. |
ConnectCube offers flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities. These are primarily offered as data-driven indicators and topical fact sheets. These facilities can only be effective if a modern toolset is available to enrich or annotate existing data with relevant information in the form of e.g. uri's. | ConnectCube offers flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities. These are primarily offered as data-driven indicators and topical fact sheets. These facilities can only be effective if a modern toolset is available to enrich or annotate existing data with relevant information in the form of e.g. uri's. | ||
− | ConnectCube includes several semantic technologies. One important objective is to identify and link equivalent concepts from different resources, in order to allow a harmonized search over datasets. The current semantic network includes entities and relationships from | + | ConnectCube includes several semantic technologies. One important objective is to identify and link equivalent concepts from different resources, in order to allow a harmonized search over datasets. The current semantic network includes entities and relationships from the domains of marine species, water areas, land areas, exclusive economic zones, and capture. It serves software applications in the domains of statistics and GIS. The main information outlets are currently semantic factsheets. The content is also exposed either via SPARQL endpoints (suitable for semantic applications) or via a Java API to be embedded in consumers' application code (see also the [http://wiki.i-marine.eu/index.php/Semantic_technologies_cluster Semantic Cluster technologies wiki page]). |
− | The use of | + | The use of the infrastructure makes it possible to focus on the needs of policy makers, who need to rely on dynamic reports, extracted in near-real-time from data coming in from multiple directions, and with varying quality and accessibility policies attached to these data flows. |
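As an illustration of how a semantic application might consume content exposed through a SPARQL endpoint, the sketch below builds a query request following the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter). The endpoint URL is a placeholder and not an actual ConnectCube address, and the query is deliberately generic because the vocabulary of the semantic network is not described here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; replace with the address of a real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Generic query that lists a few triples without assuming any vocabulary.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    # The request is only printed here; pass it to urllib.request.urlopen
    # to actually send it to a live endpoint.
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the example runnable without network access and makes the encoded query easy to inspect.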
− | * Organizational features of | + | * Organizational features of D4Science; Workspace, messaging, emailing, user management |
* Social tool; | * Social tool; | ||
* Semantic search and fact-sheets; | * Semantic search and fact-sheets; | ||
Line 175: | Line 173: | ||
* Smartfish; semantic factsheets on top of 3 data repositories; | * Smartfish; semantic factsheets on top of 3 data repositories; | ||
* FishFinder; factsheets of marine species enriched with semantic annotations. | * FishFinder; factsheets of marine species enriched with semantic annotations. | ||
+ | |||
+ | Some of the most indicative services for this bundle are: | ||
+ | * [https://gcube.wiki.gcube-system.org/gcube/index.php/X-Search X-Search] | ||
+ | * [https://gcube.wiki.gcube-system.org/gcube/index.php/Search_Planning_and_Execution_Specification Search Planning and Execution] | ||
+ | * [https://gcube.wiki.gcube-system.org/gcube/index.php/Data_Sources_Specification Data Sources Specification] | ||
Examples of products that already rely on services offered through this bundle are: | Examples of products that already rely on services offered through this bundle are: | ||
Line 183: | Line 186: | ||
== AppsCube == | == AppsCube == | ||
− | The rapidly growing use of mobile apps for data collection and dissemination requires that content and reference data are managed from an integrated data perspective. With ever more versatile and demanding apps, data often cannot be kept in one central repository that fits all sizes. Very often, apps mash-up data from e.g. | + | The rapidly growing use of mobile apps for data collection and dissemination requires that content and reference data are managed from an integrated data perspective. With ever more versatile and demanding apps, data often cannot be kept in one central repository that fits all sizes. Very often, apps mash-up data from e.g. geospatial and statistical data resources, or, when used in data collection, rely on constantly updated reference data, such as of names of species, vessel characteristics, or local reporting requirements. |
− | Modern apps | + | Modern apps need an infrastructure that was designed specifically with data discovery, access, and manipulation features in mind, and that combines this with search and retrieval functionality over multiple resources. With its infrastructure, D4Science offers a very powerful backbone to mobile apps. |
− | In | + | In D4Science, mobile apps are considered as data clients for data managed through the infrastructure, which are exposed to the apps (or vice-versa) through web-services. Examples are map-display in the AppliFish mobile app, and the infrastructure search enabled in the search mobile app. The D4Science infrastructure can make data available to apps through reliable connectors and can offer services that collect and validate mobile application data. |
− | The first mobile applications in | + | The first mobile applications in D4Science that provide evidence of the suitability of the infrastructure are: |
− | * AppliFish; The FAO species fact sheets enriched with domain specific data (+4000 downloads!) | + | * AppliFish; The FAO species fact sheets enriched with domain-specific data (+4000 downloads!) |
* MobileSearch; | * MobileSearch; | ||
== IceCube == | == IceCube == | ||
− | A key benefit of | + | A key benefit of D4Science is the ease of setting up scalable data processing solutions. A scalable solution may be needed because you have to manage any combination of a lot of users, a lot of data, a lot of processing, and a lot of new functionality. This requires expertise that is usually not found in one place. An infrastructure can offer more than one solution; offering a dedicated computing environment, parallelization, access to a grid or cloud environment, or outsourcing computations to external infrastructures are all options to consider. With D4Science expertise, you can ask for a technology solution, where several options can be discussed. |
− | The | + | The D4Science Integrated Computation Environment Bundle (ICE-Cube) aims to speed up not only the computational processes, but also the administrative and organizational process to select, tune, and test a new infrastructure. |
− | The services available on demand can be separated | + | The services available on-demand can be separated into several categories: |
* Manage administrative scalability | * Manage administrative scalability | ||
Line 214: | Line 217: | ||
* Manage Load scalability | * Manage Load scalability | ||
** If your computations take more time than expected, or are growing fast in number or size, more resources can be dynamically added; | ** If your computations take more time than expected, or are growing fast in number or size, more resources can be dynamically added; | ||
− | ** If your computation is complex or | + | ** If your computation is complex or unstable, D4Science can offer expertise from trained computer experts to analyze the code and propose alternative solutions. |
* Manage geographic scalability | * Manage geographic scalability |
Latest revision as of 18:59, 3 December 2020
|
The D4Science infrastructure combines the functionality of more than 500 components into a coherent and centrally managed infrastructure of hardware, software, and data resources. Together, these offer a platform that can host a variety of applications. These applications share a common theme: to provide a service to a Community of Practice. Unlike other infrastructures that boast size, power, performance, or the latest technology, D4Science puts the community first. This does not imply we make concessions on quality or performance, but we see it as our mission to offer quality and performance to communities that have no resources of their own to jump high hurdles.
The infrastructure resembles an archipelago where applications emerge as islands of services, resting on an underlying infrastructure bedrock. The islands specialize in one or more domains, yet are not isolated 'atolls'. Every island is well connected to others, and island-hopping is strongly encouraged. Each island offers a standard set of features that can be extended by selecting services from several topical bundles.
The infrastructure currently offers 4 main domain bundles that can be customized and/or enriched into flexible, purpose-built applications. Each application in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other applications.
Through the enabling environment of gCube, all users benefit from Infrastructure Services, but where to start? For new users, D4ScienceAn e-Infrastructure operated by the D4Science.org initiative. offers several domain-oriented solutions for 4 categories of users: data managers and analysts, biologists, spatial data managers, and policy oriented 'omnivores'. For each of these, a bundle of relevant gCube software components is available in a 'Cube'. This bundle can be limited to receive (and pay for) only those resources actually needed or consumed. A bundle can also be extended with resources coming from other bundles; our aim is to offer bundles characterized by the domain tools and not by domain boundaries. In our experience, most experts rather manage their information in a bundle of domain specific software and are only consumers of data from other bundles. Thus, in most use scenarios, a user would be a data manager in a bundle, but only a consumer in another.
The 4 key applications that D4Science has delivered and continues to enrich are:
- BiolCube; focuses on the management and interpretation of biodiversity data.
- StatsCube; a complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools.
- GeosCube; tightly connected to BiolCube, this framework, based on OGC compliant tools and services, manages the storage and interpretation of geospatial explicit information, including WPS processing.
- ConnectCube; brings semantic technologies for publishing structured data so that it can be interlinked and become more useful to end-users, enabling them to produce LOD and to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.
The bundle approach, by itself an abstraction over a host of services, is expected to offer more 'flavors' in the near future. For instance, a focused approach for infrastructure support for Mobile Apps is foreseen:
- AppsCube; offers an integrated approach to mobile app development. The infrastructure organizes the content and data exchange with mobile apps. Please note that the app itself is not developed with D4Science; rather, it relies on the infrastructure to maintain and manage the data collected with and exposed through the app.
- IceCube; an Integrated Computing Environment that offers access to infrastructure and cloud computing resources "as a Service". An instance of such a Cube will offer users access to predefined data and algorithms that can be applied to these data.
BiolCube
BiolCube is available as a suite that packs many useful features into one research environment, where marine ecologists are offered a complete private work-space to manage species names and occurrence data. The main areas where BiolCube offers services are:
Taxonomic and occurrence data discovery and management
- Occurrence data finder: Download public datasets from world-class biodiversity occurrence data repositories to your private environment, where you can prepare datasets for use in further analysis with D4Science tools for data curation, filtering, merging and duplicate detection. Occurrence data can be directly visualized on maps using the geo-explorer, downloaded in several formats, and shared with or sent to other environments.
- Species name finder: Not sure about a species name? D4Science offers tools to search, download and verify taxonomic and vernacular names of marine species.
- Species name matcher: Correcting spelling mistakes or incomplete names can be very time-consuming. With D4Science tools you can validate the species names in your data to ensure they comply with the standard of your choice. D4Science offers powerful matching and reconciliation services, already in use at FAO, to identify close matches to the names in your datasets. The infrastructure makes several key reference datasets available for consultation and reconciliation. These include the FAO ASFIS species list, FishBase for finfishes, and WoRMS, the World Register of Marine Species. If you wish, you can add your own reference list.
- Environmental enrichment of data: In a shared service with GeosCube, this service adds environmental information to occurrence data to improve their quality and usefulness in modelling and analytical exercises. The service provides estimates of a range of dynamically computed environmental parameters such as water temperature, ocean color, salinity, aragonite content, or BOD. It can identify the nearest observations in space and time and will return a computed average or the nearest observation that can document an occurrence. The innovative D4Science tools let you specify what 'nearest' means; i.e. a distance, a distance over a gradient, a seasonal average, or a depth range.
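To make the idea of environmental enrichment more concrete, the following is a minimal sketch of how an occurrence record could be matched against nearby observations. It is not the D4Science implementation: the `Observation` record, the `enrich` function, and the `max_km` / `max_days` parameters are hypothetical names chosen for illustration, and 'nearest' is simplified here to a great-circle distance plus a time window.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Observation:
    """Hypothetical environmental observation (not a D4Science data type)."""
    lat: float
    lon: float
    time: datetime
    value: float  # e.g. water temperature in degrees Celsius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def enrich(lat, lon, when, observations, max_km, max_days):
    """Return (mean_value, nearest_value) for observations inside the
    space/time window, or None when no observation qualifies."""
    candidates = []
    for obs in observations:
        dist = haversine_km(lat, lon, obs.lat, obs.lon)
        days = abs((obs.time - when).total_seconds()) / 86400.0
        if dist <= max_km and days <= max_days:
            candidates.append((dist, days, obs.value))
    if not candidates:
        return None
    mean_value = sum(value for _, _, value in candidates) / len(candidates)
    # Nearest in space; ties are broken by the smaller time difference.
    nearest_value = min(candidates)[2]
    return mean_value, nearest_value
```

A caller would pass the coordinates and date of an occurrence together with a list of observations, and choose whether to keep the averaged or the nearest value. Other definitions of 'nearest', such as a depth range or a seasonal average, would replace the distance and time tests in the loop.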
Modeling and analysis of distribution data
- Biodiversity mapping tools: The first D4Science species distribution and biodiversity mapping tool enabled the production of the well-known AquaMaps. With D4Science, map generation became faster and more robust, and results are shared in a collaborative environment. In addition to AquaMaps, many other biodiversity analytical and predictive tools are available. These include the openModeller toolset and custom-built Neural Network driven analytical services.
- Species fact-sheets generator: With scientists spread over the globe, generating consistent information sheets on marine species is no sinecure. That is why the FishFinderVRE was designed. It offers a complete templating and reporting work-flow operated by scientists, for scientists. The results, species fact-sheets, can be disseminated in a variety of formats, in particular those established by FAO for its now famous species and regional catalogues, field guides, and the more recent pocket guides.
- Trend-analysis of data: In a shared service with StatsCube, Trendylyzer offers services to identify and visualize trends in time-series of data. Trendylyzer was developed to specifically address skewness and gaps in datasets.
- Spatial analysis of data: In a shared service with StatsCube, clustering, probability, and other spatial analytical features are available.
BiolCube is an independent yet not isolated bundle of specialized services for marine ecologists and natural aquatic resource managers. Well embedded in the D4Science e-Infrastructure, it provides access to auxiliary services that turn BiolCube into a multi-purpose toolbox for biodiversity data analysis. D4Science enables near-seamless access to powerful statistical analysis software through StatsCube, and to advanced plotting and geospatial data production through GeosCube.
With BiolCube and StatsCube services combined, developers are now working to develop an integrated environment where species distribution can be studied in space and over time, with occurrence data analyzed using measured environmental observations, rather than estimated large scale average values.
The services that are most characteristic of this bundle are:
- Species Product Discovery service
- Occurrence Data Reconciliation
- Occurrence Data Enrichment Service
- Taxon Names Reconciliation Service
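As an illustration of what taxon name reconciliation involves, the sketch below matches possibly misspelled names against a reference list using approximate string matching from Python's standard library. It is a simplified stand-in, not the matching service used in the infrastructure: the `match_names` function and the `cutoff` value are assumptions made for this example, and the short reference list is a toy substitute for lists such as ASFIS or WoRMS.

```python
from difflib import get_close_matches


def match_names(names, reference, cutoff=0.8):
    """Map each input name to a list of close matches from the reference list.

    Comparison is case-insensitive; an empty list means no candidate
    reached the similarity cutoff.
    """
    lookup = {ref.lower(): ref for ref in reference}
    result = {}
    for name in names:
        hits = get_close_matches(name.strip().lower(), list(lookup), n=3, cutoff=cutoff)
        result[name] = [lookup[hit] for hit in hits]
    return result


# Toy reference list for illustration only.
reference = ["Thunnus albacares", "Thunnus obesus", "Katsuwonus pelamis"]
matches = match_names(["Thunus albacares", "Gadus morhua"], reference)
for name, candidates in matches.items():
    print(name, "->", candidates or "no match")
```

Names without a candidate above the cutoff are returned with an empty list, so they can be reviewed manually or checked against an additional reference list.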
If you wish to learn more about using BiolCube or specific services, please contact us.
StatsCube
StatsCube offers a complete data suite to manage the entire data cycle from collection to archiving. With D4Science technologies, exciting new capabilities are added to the life-cycle management and analysis of data, especially time-series data. StatsCube is developed using state-of-the-art open-source components that are brought together in a managed infrastructure. This enables a very cost-effective offer to resource-poor institutes in need of sophisticated data services. Other benefits are the availability of shared services for reference data management, and harmonization of data repository services.
StatsCube relies on continued support and ongoing development of a bundle of services. This bundle offers services that together support a complete life-cycle for statistical data, but can also connect to services offered through other bundles to establish a network of cross-domain services.
The StatsCube bundle offers a set of services available to VRE managers. They can select from this bundle to compose one or more VREs, and decide who can access such services. This allows for a fine-grained approach to sometimes complex data work-flows, where data flow from detailed field-level data through several aggregation and review stages until summary statistics can be produced. At each stage of such a work-flow, other resources can be mobilized in support of specific activities such as geo-referencing, enrichment with environmental data, statistical modelling or analysis. With StatsCube, D4Science implements key data services:
Data Work-flow: If you need to manage data-flows, D4Science offers life-cycle support where data enter the system as observations or batch data, and can then be harmonized and validated before being added to a repository. Not only are data well described by metadata during this process, but the processing steps are also captured as process metadata. The entire process is under the control of a 'visor' that protects the data from unauthorized access and modifications. The harmonization can rely on powerful matching features that establish matches between datasets that would be very time-consuming to establish manually. Just as one would expect in a work-flow, the matching results are kept for re-use and reference. The matching is usually performed against a (long) code list that is fully managed through the D4Science infrastructure. A specialized code list manager enables the ingestion (of existing SDMX code lists), creation, and maintenance of reference lists.
Data Analysis: D4Science excels in offering advanced data analysis facilities to users. The clear separation of data and analytical resources also makes it easy to work with these analytical tools. The infrastructure stores the data, and no complicated steps are needed other than to select and filter the datasets and load these into the required analytical environment. For analysis, several environments are proposed, ranging from a bare-bones RStudio and parallelized R servers, through VRE-based analytical and predictive algorithms such as AquaMaps, to the Statistical manager, where users can integrate their own logic. This logic can exploit infrastructure computing resources, or interact with external Cloud or Hadoop clusters. With D4Science, the threshold for exploiting such resources is lowered considerably, making them accessible to a much wider, geographically dispersed EA-CoP. Examples of analytical features implemented in D4Science are:
- Tools include R, WPS, Hadoop, WEKA data mining and access to Cloud resources;
- Algorithms in the statistical service include DBSCAN, Neural Networks, Clustering, and trend analysis.
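For readers unfamiliar with DBSCAN, one of the algorithms listed above, the following is a minimal textbook-style sketch of density-based clustering in plain Python. It only illustrates the technique and makes no claim about how the statistical service implements it; the function name `dbscan` and the parameters `eps` and `min_pts` follow common usage and are not taken from the infrastructure.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point when at least min_pts points (itself included)
    lie within distance eps of it.
    """
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The neighbour search here is a simple linear scan, which is adequate for small examples; larger datasets would normally use a spatial index.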
Data reporting and visualization: After a dataset has been added to the infrastructure, or once an analysis has been performed, the results are available in the same infrastructure to enrich reports, repositories or other infrastructure resources that can access them. Datasets are easily enriched and re-used in sometimes surprising new contexts. Some advanced facilities to work with statistical data are:
- geo-referencing time-series, and display these on maps;
- include time-series in reports;
- data-graphs;
- infrastructure services for download, sharing and sending datasets.
A few key services of this bundle are:
- Tabular Data
- Time Series
- Data manipulation, mining and modelling
Examples of StatsCube implementations are:
- ICIS; a complete solution for the collection and dissemination of fisheries capture data.
- Tuna Atlas; a focused ICIS implementation, with extended mapping capabilities provided through GeosCube.
- TimeSeries Environment; An open free-to-use private solution of ICIS.
- Trendylyzer; A trend-analysis toolkit for time series that have evolved over time, and have incorporated inconsistencies, gaps, and discrepancies. Trendylyzer employs a range of mining and manipulation techniques to first prepare a harmonized data-set, and then discover trends, if the data allow.
GeosCube
GeosCube is the D4Science answer to the large and complex issue of understanding fisheries and biodiversity data in the spatial domain. Through GeosCube, spatial services are offered to consumers of the infrastructure, be they other D4Science tools or VREs, or external organizations wishing to use D4Science web-services.
Through GeosCube, D4Science aims to offer an INSPIRE directive compliant bundle of services that will enable the generation and management of geospatial explicit data for practitioners who have no resources to develop and maintain their own spatial data infrastructure. From the onset of D4Science, GeosCube was seen as a service provider to several business cases. The set of services, standards and protocols that together comprise the bundle relies on W*Ss, GeoNetwork, GeoServer, and THREDDS. In D4Science, a catalogue is implemented using the CS-W protocol through a GeoNetwork service. GeosCube bundles a range of OGC compliant resources that can be made available either in their entirety or as a selection of services that can be mounted in a customized environment, such as a VRE. These VREs are vertically integrated and horizontally interoperable. They rest on the gCube infrastructure, and are thus managed through a well-defined environment, while at the same time seamlessly benefiting from data and processing resources made available through that infrastructure.
GeosCube bridges the gap between powerful infrastructure-based geospatial tools and data, and lightweight web map solutions with limited processing capacity. It thus enables the use of these powerful tools for resource limited users and organizations.
GeosCube bundles the tools to:
- Upload large datasets and overlay them with thousands of other layers;
- Share edit or view access with small or large groups;
- Export data to standard formats;
- Make use of powerful online geospatial tools;
- Predictive mapping using world-class algorithms such as AquaMaps;
- Analytical features such as clustering and trend-analysis with the custom-built statistical manager;
- Legacy applications for e.g. interpolation and map comparison using WPS/Hadoop;
- Use our DIY approach to convert and host your application;
- Georeference statistical data, occurrence data, fact-sheets, and documents online;
- Publish one’s data to the world or to just a few collaborators.
GeosCube is constantly being enriched with features. We are working hard on:
- Annotation and commenting on maps;
- Create and edit maps and link map features to rich media content including LOD;
- Validation of geospatial explicit data such as names, location, and movements;
- Interpolate environmental data sets to add information to occurrence data;
- Mobile client;
- Field-data collection.
Interested users can select services from this bundle described in detail here:
Example products that rely on services made available through this bundle in the D4Science infrastructure are:
- AquaMaps; use this State-of-the-art suite to generate predictive species distribution maps;
- ICIS; Georeference Statistical datasets;
- Species Products Discovery; discovery and sharing of species occurrence geospatial datasets (KML / GML);
- GeoExplorer; Visualize species information, environmental information, borders and competence areas and other geospatial explicit data. View details, select layers of information and share the results.
ConnectCube
ConnectCube aims to deliver information to policymakers from a variety of sources as an integrated view. These are generated using a variety of approaches, including semantic technologies.
ConnectCube offers flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities. These are primarily offered as data-driven indicators and topical fact sheets. These facilities can only be effective if a modern toolset is available to enrich or annotate existing data with relevant information in the form of e.g. URIs.
ConnectCube includes several semantic technologies. One important objective is to identify and link equivalent concepts from different resources, in order to allow a harmonized search over datasets. The current semantic network includes entities and relationships from the domains of marine species, water areas, land areas, exclusive economic zones, and capture. It serves software applications in the domains of statistics and GIS. The main information outlets are currently semantic factsheets. The content is also exposed either via SPARQL endpoints (suitable for semantic applications), or via a Java API to be embedded in consumers' application code (see also the Semantic Cluster technologies wiki page).
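To give an impression of how a client could talk to a SPARQL endpoint such as those mentioned above, the sketch below builds a standard SPARQL query request that looks up a concept by label together with any concepts declared equivalent to it. Everything specific here is an assumption made for illustration: the endpoint URL is a placeholder, the function name `build_sparql_request` is hypothetical, and the use of `rdfs:label` and `owl:sameAs` is a guess at a plausible vocabulary, not a description of the actual semantic network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint URL; replace with the address of a real SPARQL service.
ENDPOINT = "https://example.org/sparql"


def build_sparql_request(label, endpoint=ENDPOINT):
    """Build a GET request that looks up concepts by label and their equivalents."""
    # Escape backslashes and quotes so the label is a valid SPARQL string literal.
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?concept ?equivalent WHERE {\n"
        f'  ?concept rdfs:label "{safe}" .\n'
        "  OPTIONAL { ?concept owl:sameAs ?equivalent . }\n"
        "} LIMIT 20"
    )
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("Thunnus albacares")
print(request.full_url)
```

The request object can be passed to `urllib.request.urlopen` once a real endpoint address is filled in; the example deliberately stops before any network call.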
The use of the infrastructure makes it possible to focus on the needs of policy makers, who need to rely on dynamic reports, extracted in near-real-time from data coming in from multiple directions, with varying quality and accessibility policies attached to these data flows.
- Organizational features of D4Science; Workspace, messaging, emailing, user management
- Social tool;
- Semantic search and fact-sheets;
- Ontology engineering and use, especially in the fisheries domain;
- Linked Open Data engineering and maintenance;
- Plugins for remote information (OAI, OpenSearch)
Expected products that use semantic services from the ConnectCube bundle are:
- Ecoscope; semantic fact sheets for tuna fisheries;
- Smartfish; semantic factsheets on top of 3 data repositories;
- FishFinder; factsheets of marine species enriched with semantic annotations.
Some of the most indicative services for this bundle are:
Examples of products that already rely on services offered through this bundle are:
- The reporting VREs FCPPS and FishFinderVRE;
- The iSearch VRE;
- All VREs equipped with the social tools and workspace.
AppsCube
The rapidly growing use of mobile apps for data collection and dissemination requires that content and reference data are managed from an integrated data perspective. With ever more versatile and demanding apps, data often cannot be kept in one central repository that fits all sizes. Very often, apps mash up data from e.g. geospatial and statistical data resources, or, when used in data collection, rely on constantly updated reference data, such as names of species, vessel characteristics, or local reporting requirements.
Modern apps find in D4Science an infrastructure that was designed specifically with data discovery, access, and manipulation features in mind, and that combines these with search and retrieval functionality over multiple resources. D4Science thus offers a very powerful backbone to mobile apps.
In D4Science, mobile apps are considered data clients for data managed through the infrastructure, which are exposed to the apps (or vice-versa) through web-services. Examples are the map display in the AppliFish mobile app, and the infrastructure search enabled in the search mobile app. The D4Science infrastructure can make data available to apps through reliable connectors and can offer services that collect and validate mobile application data.
The first mobile applications in D4Science that provide evidence of the suitability of the infrastructure are:
- AppliFish; The FAO species fact sheets enriched with domain-specific data (+4000 downloads!)
- MobileSearch;
IceCube
A key benefit of D4Science is the ease of setting up scalable data processing solutions. A scalable solution may be needed because you have to manage any combination of a lot of users, a lot of data, a lot of processing, and a lot of new functionality. This requires expertise that is usually not found in one place. An infrastructure can offer more than one solution; offering a dedicated computing environment, parallelization, access to a grid or cloud environment, or outsourcing computations to external infrastructures are all options to consider. With D4Science expertise, you can ask for a technology solution, where several options can be discussed.
The D4Science Integrated Computation Environment Bundle (ICE-Cube) aims to speed up not only the computational processes, but also the administrative and organizational process to select, tune, and test a new infrastructure.
The services available on-demand can be separated into several categories:
- Manage administrative scalability
  - Manage users;
  - Manage virtual Organizations.
- Manage Functionality
  - Manage data in a pre-processing environment;
  - Select the processes you wish to apply to your data;
  - Perform the computation and monitor progress, intervene if needed;
  - Share the results, or use them in another process in the same infrastructure, eliminating the need to transfer data;
  - Keep a trail of the applied processes with the data results, boosting reproducibility and credibility.
- Manage Load scalability
  - If your computations take more time than expected, or are growing fast in number or size, more resources can be dynamically added;
  - If your computation is complex or unstable, D4Science can offer expertise from trained computer experts to analyze the code and propose alternative solutions.
- Manage geographic scalability
  - Keep your data and processes together to ensure confidentiality;
  - Bring your computation to your data to reduce bandwidth use.
ICE-Cube is available and ready to be further exploited.