Difference between revisions of "Catalogue:Applications"
Revision as of 14:49, 26 July 2013
The D4Science / iMarine infrastructure combines the functionality of more than 500 components into a coherent and centrally managed infrastructure of hardware, software, and data resources. Together, these offer a platform that can host a variety of applications. These applications share a common theme: they provide a service to a Community of Practice, that is, a "virtual" group of individuals held together by their connections with each other and by shared problems or areas of interest, rather than by a formal structure such as a department or project team. Unlike other infrastructures that boast size, power, performance, or the latest technology, D4Science (an e-Infrastructure operated by the D4Science.org initiative) puts the community first. In the context of iMarine, this is taken even further, quite literally, as the Ecosystem Approach Community of Practice is spread around the globe. No other infrastructure equals iMarine in supporting real-life scenarios that must overcome 'low' hurdles: low resources, low training, low connectivity, low data quality. We are glad to leave the high hurdles to specialists; we would rather serve communities that work to achieve the UN Millennium Development Goals. This does not imply that we make concessions on quality or performance; we see it as our mission to offer quality and performance to communities that have no resources of their own to jump the hurdles.
The infrastructure resembles an archipelago where applications emerge as islands of services, resting on an underlying infrastructure bedrock. The islands specialize in one or more domains, yet are not isolated 'atolls'. Every island is well connected to others, and island-hopping is strongly encouraged. Each island offers a standard set of features that can be extended by selecting services from several topical bundles.
The iMarine infrastructure currently offers 4 main domain bundles that can be customized and / or enriched into flexible, purpose-built applications. Each application in the infrastructure is tightly integrated with the underlying gCube enabling software, and can access and re-purpose data from other iMarine applications.
FYI Examples of other offers
Through the enabling environment of gCube, all users benefit from Infrastructure Services.
The four key applications that iMarine has delivered and continues to enrich are:
- BiolCube; focuses on the management and interpretation of biological and ecological data in the environment.
- StatsCube; a complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools.
- GeosCube; tightly connected to BiolCube, this framework, based on OGC-compliant tools and services, manages the storage and interpretation of geospatially explicit information, including WPS processing.
- PoliCube; brings semantic technologies for publishing structured data so that it can be interlinked and become more useful to end-users, enabling them to produce Linked Open Data (LOD) and to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.
The bundle approach, by itself an abstraction over a host of services, is expected to offer more 'flavors' in the near future. For instance, a focused approach to organizing access to computing resources and a support infrastructure for mobile apps are foreseen:
- IceCube; will offer easy access to infrastructure, cloud, and grid-based computing resources for non-specialists.
- AppsCube; will offer an integrated approach to mobile app development. The infrastructure organizes the content and data exchange with mobile apps. Please note that the app itself is not developed with iMarine; rather, it relies on the infrastructure to maintain and manage the data collected with and exposed through the app.
BiolCube
BiolCube is available as a suite that packs many useful features into one environment, giving biologists who work with occurrence data and review species names a complete work-space. It offers services in two main areas: taxonomic and occurrence data discovery and management, and modeling and analysis of distribution data.
Taxonomic and occurrence data discovery and management
- Occurrence data finder; download public datasets from world-class biodiversity repositories to your private environment. In this private environment you can prepare datasets for further analysis with iMarine tools for data sanitation, filtering, merging, and duplicate detection. Occurrence data can be visualized directly on maps using the geo-explorer, downloaded in several formats, and shared with or sent to other environments.
- Species name finder; not sure about a species name? iMarine offers tools to search for, download, and verify taxonomic and vernacular names of marine species.
- Species name matcher; correcting spelling mistakes or incomplete names can be very time-consuming. With iMarine tools you can validate the species names in your data to ensure they comply with the standard of your choice. iMarine offers powerful matching and reconciliation services, already in use at FAO, to identify close matches for the names in your datasets. The infrastructure makes several key reference datasets available for consultation and reconciliation. These include the FAO ASFIS species list and the WoRMS register of marine species. If you wish, you can add your own reference list.
- Environmental enrichment of data. In a shared service with GeosCube, this service adds environmental information to occurrence data to improve their quality and usefulness in modeling and analytical exercises. The service returns estimates of a range of dynamically computed environmental parameters such as water temperature, ocean color, salinity, aragonite, or BOD. It can identify the nearest observations in space and time, and returns a computed average or the nearest observation, which can then be added to an observation. The iMarine tools let you specify what 'nearest' means, e.g. a distance, a distance over a gradient, a seasonal average, or a depth range.
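The nearest-observation idea behind environmental enrichment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the iMarine service: the `EnvObservation` record, the `enrich` function, and its `max_km` / `max_dt` parameters are hypothetical names, 'nearest' is reduced to a great-circle distance within a time window, and gradients, seasonal averages, and depth ranges are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional


@dataclass
class EnvObservation:
    """Hypothetical environmental measurement, e.g. a water temperature."""
    lat: float
    lon: float
    time: datetime
    value: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def enrich(lat: float, lon: float, when: datetime,
           observations: List[EnvObservation],
           max_km: float, max_dt: timedelta,
           mode: str = "nearest") -> Optional[float]:
    """Return the nearest value, or the mean of all values, inside the window.

    Returns None when no observation lies within max_km and max_dt.
    """
    candidates = []
    for obs in observations:
        if abs(obs.time - when) > max_dt:
            continue  # outside the time window
        dist = haversine_km(lat, lon, obs.lat, obs.lon)
        if dist <= max_km:
            candidates.append((dist, obs))
    if not candidates:
        return None
    if mode == "mean":
        return sum(obs.value for _, obs in candidates) / len(candidates)
    # Default: value of the spatially closest observation in the window.
    return min(candidates, key=lambda pair: pair[0])[1].value
```

An occurrence record would be passed in with its own coordinates and date; the returned value (or `None`) can then be stored next to the record together with the window that was used, so that the enrichment remains reproducible.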
Modeling and analysis of distribution data
- Biodiversity mapping tools. The first iMarine species distribution and biodiversity mapping tools enabled the production of the well-known AquaMaps. With iMarine, the generation became faster and more robust, and results are shared in a collaborative environment. In addition to AquaMaps, many other biodiversity analytical and predictive tools are available. These include the openModeller toolset and custom-built Neural Network driven analytical services.
- Species fact-sheets generator. With scientists spread over the globe, generating consistent information sheets on marine species is no easy task. That is why the FishFinderVRE was designed. It offers a complete templating and reporting work-flow operated by scientists, for scientists. The results, species fact-sheets, can be disseminated in a variety of formats.
- Trend-analysis of data. In a shared service with StatsCube, Trendylyzer offers services to identify and visualize trends in time-series of data. Trendylyzer was developed specifically to address skewness and gaps in datasets.
- Spatial analysis of data. In a shared service with StatsCube, iMarine offers clustering, probability, and other spatial analytical features.
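To make concrete why skewness and gaps matter in trend analysis, here is a minimal Python sketch of a Theil-Sen slope estimate. The source does not say which method Trendylyzer uses, so this is a stand-in technique chosen for illustration: the function name `theil_sen_slope` and the use of `None` for missing years are assumptions.

```python
from statistics import median
from typing import Optional, Sequence


def theil_sen_slope(times: Sequence[float],
                    values: Sequence[Optional[float]]) -> Optional[float]:
    """Median of all pairwise slopes, skipping missing (None) values.

    The median makes the estimate robust to skewed values and outliers,
    and gaps are simply left out of the pairwise comparison.
    """
    points = [(t, v) for t, v in zip(times, values) if v is not None]
    slopes = [
        (v2 - v1) / (t2 - t1)
        for i, (t1, v1) in enumerate(points)
        for (t2, v2) in points[i + 1:]
        if t2 != t1
    ]
    return median(slopes) if slopes else None


# Yearly catches with one missing year and one extreme value.
years = [2000, 2001, 2002, 2003, 2004, 2005]
catch = [10.0, 12.0, None, 16.0, 100.0, 20.0]
print(theil_sen_slope(years, catch))
```

An ordinary least-squares fit on the same series would be pulled upward by the single extreme year, whereas the median of pairwise slopes stays close to the underlying increase.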
BiolCube is an independent yet not isolated bundle of specialized services for biologists. Well embedded in the iMarine e-Infrastructure, it provides access to auxiliary services that turn BiolCube into a multi-purpose toolbox for biodiversity data analysis. iMarine enables near-seamless access to powerful statistical analysis software through StatsCube, and to advanced plotting and geospatial data production through GeosCube.
With BiolCube and StatsCube services combined, developers are now building an integrated environment where species distribution can be studied in space and over time, with occurrence data analyzed using measured environmental observations rather than estimated large-scale average values.
The services that are most characteristic of this bundle are:
- Species Product Discovery service
- Occurrence Data Reconciliation
- Occurrence Data Enrichment Service
- Taxon Names Reconciliation Service
If you wish to learn more about using BiolCube or specific services, please contact us.
StatsCube
Example 'competitor' GSIM
StatsCube offers a complete data suite to manage the entire data-cycle from collection to archiving. With iMarine technologies, exciting new capabilities are added to life-cycle management, especially of time-series data. StatsCube is developed using state-of-the-art open-source components that are brought together in a managed infrastructure. This enables a very cost-effective offer to resource-poor institutes in need of sophisticated data services. Other benefits are the availability of shared services for reference data management, and the harmonization of data repository services.
StatsCube relies on the continued support and ongoing development of a bundle of services. This bundle offers services that together support a complete life-cycle for statistical data, but can also connect to services offered through other bundles to establish a network of cross-domain services.
The StatsCube bundle makes a set of services available to VRE (Virtual Research Environment) managers. They can select from this bundle to compose one or more VREs, and decide who can access such services. This allows for a fine-grained approach to sometimes complex data work-flows, where data flow from detailed field-level data through several aggregation and review stages until summary statistics can be produced. At each stage of such a work-flow, other resources can be mobilized in support of specific activities such as geo-referencing, enrichment with environmental data, statistical modeling, or analysis. With StatsCube, iMarine implements key data services:
Data Work-flow
If you need to manage data-flows, iMarine offers life-cycle support in which data enter the system as observations or batch data, and can then be harmonized and validated before being added to a repository. Not only are the data well described by metadata during this process, but the processing steps are also captured as process metadata. The entire process is under the control of a 'visor' that protects the data from unauthorized access and modification. The harmonization can rely on powerful matching features that establish matches between datasets that would be very time-consuming to establish manually. Just as one would expect in a work-flow, the matching results are kept for re-use and reference. The matching is usually performed against (long) code lists, which are fully managed through the iMarine infrastructure. A specialized code list manager enables the ingestion (of existing SDMX code lists), creation, and maintenance of reference lists.
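The matching step of such a work-flow can be illustrated with a short Python sketch that reconciles raw names against a code list. It is a simplified stand-in, not the iMarine matching service: it uses the standard-library `difflib` similarity ratio, the function name `match_names` and the `cutoff` value are assumptions, and the four-entry code list is only a tiny sample in the style of the FAO ASFIS list.

```python
import difflib
from typing import Dict, Iterable, Optional, Tuple

# Tiny illustrative code list (scientific name -> 3-alpha code).
# Real reference lists are far longer and managed centrally.
CODE_LIST: Dict[str, str] = {
    "Thunnus albacares": "YFT",
    "Katsuwonus pelamis": "SKJ",
    "Thunnus obesus": "BET",
    "Gadus morhua": "COD",
}


def match_names(raw_names: Iterable[str],
                code_list: Dict[str, str],
                cutoff: float = 0.8) -> Dict[str, Optional[Tuple[str, str]]]:
    """Map each raw name to (reference name, code), or None if no close match.

    Names are compared case-insensitively after collapsing whitespace.
    The returned dictionary can be stored and re-used as a record of
    the matching decisions.
    """
    lookup = {name.lower(): name for name in code_list}
    results: Dict[str, Optional[Tuple[str, str]]] = {}
    for raw in raw_names:
        key = " ".join(raw.lower().split())
        hits = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        if hits:
            canonical = lookup[hits[0]]
            results[raw] = (canonical, code_list[canonical])
        else:
            results[raw] = None  # left for manual review
    return results


if __name__ == "__main__":
    incoming = ["Thunus albacares", "KATSUWONUS  PELAMIS", "Gadus morhau", "Unknown fish"]
    for raw, hit in match_names(incoming, CODE_LIST).items():
        print(f"{raw!r} -> {hit}")
```

Keeping unmatched names as `None` rather than guessing mirrors the validation stage: only confident matches are harmonized automatically, and the rest are flagged for a reviewer.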
Data Analysis
iMarine excels in offering advanced data analysis facilities to users. The clear separation of data and analytical resources also makes it easy to work with these analytical tools. The infrastructure stores the data, and no complicated steps are needed other than to select and filter the dataset and load it into the required analytical environment. For analysis, several environments are proposed, ranging from a bare-bones RStudio and parallelized R servers, through VRE-based analytical and predictive algorithms such as AquaMaps, to the Statistical manager, where users can integrate their own logic. This logic can exploit infrastructure computing resources, or interact with external Cloud or Hadoop clusters. With iMarine, the threshold for exploiting such resources is lowered considerably, making them accessible to a much wider, geographically dispersed EA-CoP (Ecosystem Approach Community of Practice). Examples of analytical features implemented in iMarine are:
- Tools include R, WPS, Hadoop, WEKA data mining and access to Cloud resources;
- Algorithms in the statistical service include DBSCAN, Neural Networks, Clustering, and trend analysis.
Data reporting and visualization
After a dataset has been added to the infrastructure, or once an analysis has been performed, the results are available in the same infrastructure to enrich reports, repositories, or other infrastructure resources that can access them. Datasets in iMarine are easily enriched and re-used in sometimes surprising new contexts. Some advanced facilities to work with statistical data are:
- geo-referencing time-series, and displaying these on maps;
- including time-series in reports;
- data-graphs;
- infrastructure services for download, sharing and sending datasets.
A few key services of this bundle are:
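- Tabular Data
  - Tabular Data Flow Manager (https://gcube.wiki.gcube-system.org/gcube/index.php/Tabular_Data_Flow_Manager)
  - Tabular Data Manager (https://gcube.wiki.gcube-system.org/gcube/index.php/Tabular_Data_Manager)
- Time Series
  - TimeSeries Manager (https://gcube.wiki.gcube-system.org/gcube/index.php/TimeSeries)
  - CodeList Manager (https://gcube.wiki.gcube-system.org/gcube/index.php/Codelist_Manager)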
For more information on getting started with and using StatsCube, the iMarine website offers many resources. You can also register to the iMarine gateway to experience some of the components.
Examples of StatsCube implementations are:
- ICIS; a complete solution for the collection and dissemination of fisheries capture data.
- Tuna Atlas; a focused ICIS implementation, with extended mapping capabilities provided through GeosCube.
- TimeSeries Environment; An open free-to-use private solution of ICIS.
GeosCube
GeosCube is the iMarine answer to the large and complex issue of understanding fisheries and biodiversity data in the spatial domain. Through GeosCube, spatial services are offered to consumers of the iMarine infrastructure, be they other iMarine tools or VREs, or external organizations wishing to use iMarine's web services.
Through GeosCube, iMarine aims to offer a bundle of services compliant with the INSPIRE directive that enables the generation and management of geospatially explicit data for practitioners who have no resources to develop and maintain their own spatial data infrastructure. From the outset of iMarine, GeosCube was seen as a service provider to several business cases, and not as a complete Spatial Data Infrastructure. The set of services, standards, and protocols that together comprise the bundle relies on OGC W*S web services, GeoNetwork, GeoServer, and THREDDS. In iMarine, a catalogue is implemented using the CSW protocol through GeoNetwork. GeosCube bundles a range of OGC-compliant resources that can be made available either in their entirety or as a selection of services mounted in a customized environment, such as a VRE. These VREs are vertically integrated and horizontally interoperable. They rest on the gCube infrastructure, and are thus managed through a well-defined environment, while at the same time benefiting seamlessly from data and processing resources made available through that infrastructure.
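As an illustration of how a client might query a CSW catalogue such as the one described above, the following Python sketch builds a CSW 2.0.2 `GetRecords` request as a key-value URL. The endpoint shown is a placeholder, not the actual iMarine catalogue address, and the helper name `csw_getrecords_url` is an assumption; the query parameters themselves follow the OGC CSW 2.0.2 key-value encoding.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the address of a real CSW catalogue.
CSW_ENDPOINT = "https://example.org/geonetwork/srv/eng/csw"


def csw_getrecords_url(endpoint: str, text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords URL for a free-text search.

    The search term is placed in a CQL 'AnyText LIKE' constraint;
    single quotes in the term are doubled so the CQL stays valid.
    """
    safe_text = text.replace("'", "''")
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{safe_text}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the request URL for metadata records mentioning "tuna".
    print(csw_getrecords_url(CSW_ENDPOINT, "tuna"))
```

The response to such a request is an XML document of brief metadata records, which a client can parse to discover the map layers and datasets registered in the catalogue.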
Through this bundle several tools are available:
- Data discovery, access, and visualization
GeosCube services are made available through iMarine portlets, VREs, remote services, and OGC-compliant tools for discovery and access. These can be accessed either as individual components or services (see the detailed descriptions here), or pre-configured in a bundle that supports a range of services. Some examples of such bundles include:
- GeoExplorer;
- GisViewer;
- Analyze geospatial information.
Compare maps, download and share. Enrich datasets emanating from other bundles with environmental information.
- Publish and share
Smaller components that leverage a specific task at infrastructure level are:
Example products that rely on services made available through this bundle in the iMarine infrastructure are:
- Species distribution map-products,
- Species occurrence geospatial datasets (KML / GML)
PoliCube
The primary aim of PoliCube is to deliver information to policy makers from a variety of sources as an integrated view generated using a variety of approaches, including semantic technologies.
PoliCube offers flexible reporting, search and retrieval, aggregation, and projection facilities. These are primarily offered as data-driven indicators and topical fact sheets. These facilities can only be effective if a modern toolset is available to enrich or annotate existing data with relevant information in the form of e.g. URIs. This
The use of an infrastructure makes it possible to focus on the needs of policy makers, who need to rely on dynamic reports extracted in near-real time from data coming in from multiple directions, with varying quality and accessibility policies attached to these data flows.
- Organizational features of iMarine; Workspace, messaging, mailing, user management
- Social tool
- Semantic search and factsheets
- Plugins for remote information (OAI, OpenSearch)
IceCube
A key benefit of iMarine is the ease of setting up scalable data processing solutions. A scalable solution may be needed because you have to manage any combination of a lot of users, a lot of data, a lot of processing, and a lot of new functionality. This requires expertise that is usually not found in one place. An infrastructure can offer more than one solution; offering a dedicated computing environment, parallelization, access to a grid or cloud environment, or outsourcing computations to external infrastructures are all options to consider. With iMarine expertise, you can ask for a technology solution and discuss several options. The services available on demand can be separated into several categories:
- Manage administrative scalability
- Manage users
- Manage virtual Organizations
- Manage Functionality
- Manage Load scalability
- Manage geographic scalability
- Keep your data and processes together to reduce
- Bring your computation to your data to reduce bandwidth use
AppsCube
The quickly growing use of mobile apps for data collection and dissemination requires that content and reference data are managed from an integrated data perspective. With ever more versatile and demanding apps, data often cannot be kept in one central repository that fits all sizes. Very often, apps mash up data from e.g. geospatial and statistical data resources or, when collecting data, rely on constantly updated reference data, such as names of species, vessel characteristics, or local reporting requirements.
To manage a multitude of data collection and dissemination apps, an infrastructure that offers ...
In iMarine