Geospatial cluster


The main purpose of the Cluster work plan (see the Cluster Work Plans in D4Science template) is to provide the D4Science Board with a management tool usable as a framework for planning activities and as a guide for carrying out that work. Its scope is thus the interface between the Board and the activities of the project's Work Packages. After drafting, a work plan needs approval from the D4Science Board, following the Board procedures.

Executive Summary

The D4Science Geospatial Cluster maintains and promotes a Work Plan (this document) aimed at:

  • organizing collections of requirements gathered from the D4Science Business Cases
  • providing recommendations for the implementation of the D4Science infrastructure.

The requirements are inputs for the cluster. They come from the D4Science Business Cases, which are grouped as follows:

  • the EU Common Fishery Policy
  • the FAO deep seas fisheries programme
  • and the UN EAF Ecosystem Approach to fisheries - Tropical Pelagic LME

The recommendations are outputs from the cluster, primarily intended for the D4Science Board, the D4Science project partners (Work Packages) and the Communities of Practice (CoP) identified within the Ecosystem Approach. They are aimed at releasing infrastructure services such as:

  • Enrichment of Species Occurrences data with profiles of environmental parameters
  • Comparison of Species Distribution Maps

Such Infrastructure Services are needed by the D4Science eScience services (VREs & Apps).

Introduction and Background (The Problems)

“For many biological observations, we have no data on the prevailing environmental conditions; either this information was never recorded (as is the case for many museum specimens, especially older ones), or the data was collected, but by others than the biologists, and different data streams were never re-united. Digging through archives of sampling campaigns of many years ago is tedious, if not impossible, by loss of essential information on the sampling event.”

Edward Vanden Berghe, Executive Director, Ocean Biogeographic Information System, April 2012

Environmental conditions such as salinity, temperature, or acidity are essential for conducting studies and developing applications related to Marine Species Distributions. Nevertheless, collecting such parameters and presenting them coherently to scientists is still a tedious task. Through the activities of its Geospatial Cluster, the D4Science project is developing an approach and infrastructure tools to tackle this issue. These activities aim at leveraging the spatial and temporal dimensions of environmental measurements, which are the bridges that can join environmental variables with the current assessments of species occurrences.

Way forward

Oceanographers have created large archives of data, the majority of them publicly accessible. Some examples discussed or assessed within the D4Science project:

There are also vast libraries of remotely-sensed data, some of them in the public domain. Mining these data sources makes it possible to reconstruct the environmental conditions in the neighbourhood and at the time of the biological observations.
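As a rough illustration of this kind of enrichment, the sketch below attaches the value of the nearest cell of a regular 2D grid (for example a remotely-sensed sea surface temperature layer) to each species occurrence. The grid layout, the function names and the `sst` key are assumptions made for the example; they do not describe an existing D4Science service.

```python
from typing import Dict, List, Optional, Sequence

Grid = Sequence[Sequence[float]]


def grid_value_at(grid: Grid, lat_origin: float, lon_origin: float,
                  cell_deg: float, lat: float, lon: float) -> Optional[float]:
    """Return the value of the grid cell whose centre is nearest to (lat, lon).

    Assumed layout (hypothetical): grid[row][col], where row 0 / col 0 is the
    cell centred on (lat_origin, lon_origin), rows increase northward and
    columns increase eastward, with square cells of cell_deg degrees.
    """
    row = round((lat - lat_origin) / cell_deg)
    col = round((lon - lon_origin) / cell_deg)
    if 0 <= row < len(grid) and 0 <= col < len(grid[row]):
        return grid[row][col]
    return None  # the occurrence falls outside the grid


def enrich(occurrences: List[Dict], grid: Grid, lat_origin: float,
           lon_origin: float, cell_deg: float, key: str = "sst") -> List[Dict]:
    """Return copies of the occurrences with the gridded value added under key."""
    enriched = []
    for occ in occurrences:
        value = grid_value_at(grid, lat_origin, lon_origin, cell_deg,
                              occ["lat"], occ["lon"])
        enriched.append({**occ, key: value})
    return enriched


# Toy 2 x 2 grid with 1-degree cells, origin at (0 N, 0 E).
toy_grid = [[10.0, 11.0],
            [12.0, 13.0]]
toy_occurrences = [{"species": "Thunnus albacares", "lat": 0.9, "lon": 0.2},
                   {"species": "Thunnus obesus", "lat": 5.0, "lon": 5.0}]
result = enrich(toy_occurrences, toy_grid, 0.0, 0.0, 1.0)
```

A nearest-cell lookup ignores the time dimension and any gaps in the grid; it only shows where the spatial join between occurrences and environmental layers takes place.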

It is unlikely that there will be a complete collection of environmental data for each and every point of interest corresponding to biological observations (e.g. latitude, longitude, depth and time for the OBIS observations). So we will need a step of interpolation between the existing environmental data. This interpolation can be either a statistical interpolation, or based on a model of the variable of interest.

Some further thoughts

A statistical interpolation will most probably be based on a weighted average of measurements of the parameter under consideration in the neighbourhood of our 4D point of interest. The problem is that the weighting has to be done over dimensions that do not all behave the same; the most obvious difference is between the spatial dimensions and time. I am not sure which models exist for the parameters we’re interested in, and how easily they would be available for our work.

There is a difference between remote sensing data and in-situ data. Remote sensing data is, first of all, spatially only 2D, which dramatically reduces complexity; its geographic scope is also usually very large, which means that we probably have measurements close to our points of interest. Another type of 2D data is bathymetry; here, the data do not change much in time (at least not on the time scales we’re interested in), so we have to deal with only a single ‘layer’.

The main source of in-situ data I am aware of is the World Ocean Database (WOD), and the World Ocean Atlas (WOA), which is derived from the WOD. Both are maintained by the World Ocean Data Center (WDC) in Silver Spring, near Washington DC; the WDC is operated by the National Oceanographic Data Center of the USA, which is part of NOAA. Obviously, the WDC people know how to create the WOA based on the raw data from the WOD; we might want to look for their collaboration (I have some good contacts there).
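The weighted average mentioned above could be sketched, for instance, as an inverse-distance weighting in which each dimension is divided by its own length scale before the distances are combined. This is only one possible approach, not a method adopted by the cluster: the scale values (100 km, 50 m, 30 days), the type names and the function names are illustrative assumptions, and a real application would have to choose scales per parameter.

```python
import math
from typing import NamedTuple, Sequence

EARTH_RADIUS_KM = 6371.0


class Point(NamedTuple):
    lat: float    # degrees north
    lon: float    # degrees east
    depth: float  # metres below the surface
    time: float   # days since an arbitrary reference date


class Sample(NamedTuple):
    point: Point
    value: float  # measured parameter, e.g. temperature


def scaled_distance(a: Point, b: Point, h_scale_km: float = 100.0,
                    z_scale_m: float = 50.0, t_scale_days: float = 30.0) -> float:
    """Dimensionless 4D distance: each axis is divided by its own scale.

    Horizontal distance uses an equirectangular approximation, which is
    adequate for nearby points but ignores the date line.
    """
    mean_lat = math.radians((a.lat + b.lat) / 2.0)
    dx = math.radians(b.lon - a.lon) * math.cos(mean_lat) * EARTH_RADIUS_KM
    dy = math.radians(b.lat - a.lat) * EARTH_RADIUS_KM
    h = math.hypot(dx, dy) / h_scale_km
    z = (b.depth - a.depth) / z_scale_m
    t = (b.time - a.time) / t_scale_days
    return math.sqrt(h * h + z * z + t * t)


def idw_estimate(target: Point, samples: Sequence[Sample],
                 power: float = 2.0, **scales: float) -> float:
    """Inverse-distance weighted average of the sample values at target."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sample in samples:
        d = scaled_distance(target, sample.point, **scales)
        if d == 0.0:
            return sample.value  # exact hit: use the measurement itself
        weight = 1.0 / d ** power
        weighted_sum += weight * sample.value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no samples to interpolate from")
    return weighted_sum / weight_total


# Two measurements at the same place, 10 days before and after the target.
here = Point(lat=0.0, lon=0.0, depth=0.0, time=0.0)
measurements = [Sample(Point(0.0, 0.0, 0.0, -10.0), 10.0),
                Sample(Point(0.0, 0.0, 0.0, 10.0), 20.0)]
estimate = idw_estimate(here, measurements)
```

Changing the scales changes how much a measurement that is close in space but distant in time contributes, which is exactly the difficulty described above.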

What are the environmental variables of interest?

  • Bathymetry: easily available, and in a resolution that is sufficient for our purposes; several sources exist, such as ETOPO and GEBCO. We could derive some extra parameters from bathymetry, like distance from continent, rugosity or aspect, but these are lower priority.
  • Salinity and temperature: the classic in-situ data. A lot has been collected, mainly because these two parameters influence the speed of sound in water, and so are needed for the interpretation of sonar signals; in other words, they have military implications. For this reason, some countries refuse to make the data from their coastal waters public, but there is still *a lot* of data around. For biodiversity and environmental envelope modelling, they are a priority.
  • pH: very important if we want to be able to deal with global change, including ocean acidification. It’s an in-situ measurement, and not all that much is available.
  • Ocean colour, productivity: the first is used as a proxy for the second; ocean colour reflects chlorophyll. This is remote sensing data, so it should be relatively easy to deal with?
  • Nutrients: in situ, not as well measured as salinity and temperature, or even pH; lower priority? But they are available in WOA and WOD, just like temperature, salinity and pH, so they might be low-hanging fruit.
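To give an idea of how extra parameters might be derived from bathymetry, the sketch below computes slope and aspect by central differences on a regular grid, plus the standard deviation of elevation in a 3 x 3 window as a crude stand-in for rugosity. The grid orientation, the cell size in metres and the roughness definition are assumptions for illustration; published rugosity indices differ, and neither ETOPO nor GEBCO prescribes this calculation.

```python
import math
from statistics import pstdev
from typing import Optional, Sequence, Tuple

# Elevation in metres (negative below sea level). Assumed orientation:
# rows increase northward, columns increase eastward, square cells.
Grid = Sequence[Sequence[float]]


def slope_and_aspect(grid: Grid, row: int, col: int,
                     cell_size_m: float) -> Tuple[float, Optional[float]]:
    """Slope (degrees) and downslope compass bearing (degrees) of an interior cell.

    Aspect is None where the seabed is flat. Edge cells are not handled.
    """
    dz_dx = (grid[row][col + 1] - grid[row][col - 1]) / (2.0 * cell_size_m)
    dz_dy = (grid[row + 1][col] - grid[row - 1][col]) / (2.0 * cell_size_m)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0.0 and dz_dy == 0.0:
        return slope, None
    # The downslope direction is minus the gradient; atan2(east, north)
    # converts it to a compass bearing (0 = north, 90 = east).
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


def roughness(grid: Grid, row: int, col: int) -> float:
    """Standard deviation of elevation in the 3 x 3 window around an interior cell."""
    window = [grid[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    return pstdev(window)


# Seabed that deepens by 100 m per 100 m cell towards the east.
tilted = [[-100.0 * c for c in range(3)] for _ in range(3)]
slope, aspect = slope_and_aspect(tilted, 1, 1, cell_size_m=100.0)
```

On real bathymetry the cell size in metres varies with latitude when the grid is defined in degrees, so a conversion step would be needed before applying this.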

Goals and Objectives (The Outputs)

Outputs of the cluster are Roadmaps, Tradeoff analysis and Guidelines for the development, deployment and maintenance of infrastructure services involving geospatial resources and technology, such as:

Such Infrastructure Services are needed by the D4Science eScience services (VREs & Apps) and other web service endpoints.

A validation process aims at matching the cluster outputs with 'consuming' eScience services such as:

  • [VTI]: a Virtual Research Environment (VRE) for Vessel Transmitted Information management and modeling
  • [ICIS]: a VRE for Integrated Capture Information System with Timeseries management
  • ICIS - SPREAD: a VRE extension for Geospatial Reallocation features
  • [AquaMaps]: a VRE for species predictive modeling / Ecological Niche Modelling
  • ...

Identifying the relevant D4Science eScience services and matching them with cluster outputs is a demanding task for the D4Science Clusters. This is because of the many business process implications, such as creating matches between the Ecosystem Approach and the Infrastructure implementation tasks, or between the requirements of the selected EA Business Cases and User communities and the D4Science governance for addressing them in terms of Infrastructure services (including SLAs, Terms and Conditions, ...).

To coordinate this effort across Clusters, and between Clusters and the D4Science Board, some key guidelines are maintained:

Resources and Constraints (The Inputs)

The Business Case requirements are inputs for the cluster. They come from 3 Business Cases that are grouped as follows:

  • the EU Common Fishery Policy
  • the FAO deep seas fisheries programme
  • and the UN EAF Ecosystem Approach to fisheries

Other inputs

Data sources

  • FAO Tuna Atlas (tropical tuna data)
  • IRD Tuna Atlas (tropical tuna data)
  • Other fisheries data: catches of fisheries targeting tuna, bycatch of tuna fisheries, scientific tagging data
  • FAO Global and Regional datasets
  • Species distributions and occurrence data from other fisheries databases
  • Environmental data and model outputs (physical, chemical, biological parameters), both managed in netCDF: SST, wind, chlorophyll, etc.
  • Other GIS relevant products, e.g. FAO Intersection Engine products

Constraints

Strategy and Actions (from Inputs to Outputs)

Building on the strengths and skills of the D4Science partners contributing to the Geospatial Cluster, the following action plans have been conducted or are underway:

  • Leveraging the existing base of THREDDS servers
  • Implementing the OGC WPS specification
  • Leveraging the OGC OWS Context 1.0 for sharing resources of interest from a research activity or experiment, and that can be consumed by an application (processing, visualization...)
  • Leveraging the Hadoop Processing framework
  • Legacy applications: wrapping the processors (WPS Hadoop deployment Use Cases)
  • Leveraging and/or implementing OGC/ISO Metadata standards, metadata encoding towards data and services sharing
  • Thematic Mapping
  • ...

For each of them, it is envisioned (by January 2013) to review and benchmark their added value according to the following D4Science standard review:

  • Who are the Users
  • Who are the co-funding partners
  • What are the D4Science infrastructure resources involved
  • What are the outcomes that do match the D4Science Description of Work
  • How do they fit in the EA-CoP business cases
  • How do they contribute to the sustainability of an EA-CoP
  • How far are they re-usable with clear benefits to EA-CoP representatives, and proven compatibility with EA-CoP resources
  • How far are they consistent with EC regulations/strategies such as INSPIRE

Cluster Participants and Roles

Appendix A - Resources

Software

Computing resources

Connectivity

External Services endpoints (cf. Introduction - Way forward section for the external data sources)

D4Science Partners Services endpoints

D4Science infrastructure and eScience services (development)

D4Science infrastructure and eScience services (validated)

Algorithms / Processors / Scientific Applications

Expertise areas

...

Appendix B - Budget

Appendix C - Schedule

The Geospatial Cluster aligns its work plan with the milestones of its primary 'customer', namely the planned D4Science Board meetings scheduled throughout the lifetime of the D4Science project:

  • Semester 1 (Nov 2011 - Apr. 2012);
    • Mobilization phase: identification of opportunities for collaboration and technologies
    • Geospatial Cluster support:
  • Semester 2 (May 2012 - Oct. 2012);
    • Stabilization phase: validation of opportunities and definition of the technology scope
    • Geospatial Cluster support:
  • Semester 3 (Nov 2012 - Apr. 2013);
    • Experimentation phase: with technologies, and with expansion of the EA-CoP user base
    • Geospatial Cluster support:
  • Semester 4 (May 2013 - Oct. 2013);
    • Validation phase: collaboration structures and EA-CoP requirements consolidation
    • Geospatial Cluster support:
  • Semester 5 (Nov 2013 - Apr. 2014);
    • Exploitation phase: operations through EA-CoP collaboration frameworks
    • Geospatial Cluster support:

Appendix D - Documents

Working draft documents

12/12/03 - Thematic Mapping Engine, Time series Map visualization, v2.2, Emmanuel Blondel (FAO), Y.Laurent, F.Brito (Terradue)

13/01/22 - Guidelines for Data and Service Providers, Julien Barde (IRD), Norbert Billet (IRD), Emmanuel Blondel (FAO)

Approved documents

...

TCOM Documents

Appendix E - Other

D4Science Technical Guidelines

  • D4Science Data Sources Assessment

D4Science Governance Rules

EA-CoP Guidelines and Best Practices

EA-CoP Data Access and Sharing Policies

D4Science Clusters and Boards

The Clusters' coordinated Work Plans for the Boards

D4Science CoP EA (Ecosystems Approach)

The business cases

The D4Science eScience services: VREs, Apps, Web Services endpoints