Geospatial cluster

From D4Science Wiki
Revision as of 18:23, 10 January 2013 by Herve.caumont

Enrichment of Species Occurrences data with profiles of environmental parameters

Introduction

“For many biological observations, we have no data on the prevailing environmental conditions; either this information was never recorded (as is the case for many museum specimens, especially older ones), or the data was collected, but by others than the biologists, and different data streams were never re-united. Digging through archives of sampling campaigns of many years ago is tedious, if not impossible, by loss of essential information on the sampling event.” Edward Vanden Berghe, Executive Director, Ocean Biogeographic Information System, April 2012

Environmental conditions such as salinity, temperature, and acidity are essential for conducting studies and developing applications related to marine species distributions. Nevertheless, collecting such parameters and presenting them coherently to scientists remains a tedious task. Through the activities of its Geospatial Cluster, the iMarine project is developing an approach and infrastructure tools to tackle this issue. These activities aim to leverage the spatial and temporal dimensions of environmental measurements, which are the bridges that can join environmental variables with current assessments of species occurrences.

The iMarine Geospatial Cluster organizes the requirements gathered from the iMarine business cases (EU Common Fishery Policy, FAO deep seas fisheries programme, and the UN Ecosystem Approach to Fisheries, EAF) and provides infrastructure implementation recommendations. These recommendations are primarily intended for the iMarine project partners and the Communities of Practice (CoP) identified within the Ecosystem Approach.

Way forward

Oceanographers have created large archives of data, the majority of them publicly accessible. Some examples discussed or assessed within the iMarine project:
• The OBIS Ocean Biogeographic Information System: http://www.iobis.org/
• The Fishbase global information system: http://www.fishbase.org
• The SeaLifebase biological information for biodiversity and ecosystem studies: http://www.sealifebase.org
• The WorldFish center ReefBase: http://www.reefbase.org
• The World Register of Marine Species (WoRMS): http://www.marinespecies.org
• The Catalogue of Life indexing the world’s known species: http://www.catalogueoflife.org
• …

Also, there are vast libraries of remotely-sensed data that are in the public domain. Mining these data sources makes it possible to reconstruct the environmental conditions in the neighbourhood and at the time of the biological observations.
• The GMES myOcean Monitoring and Forecasting system for marine applications: http://www.myocean.eu.org
• The pan-European infrastructure for ocean and marine data management: http://seadatanet.maris2.nl
• …

It is unlikely that there will be a complete collection of environmental data for each and every point of interest corresponding to biological observations (e.g. latitude, longitude, depth and time for the OBIS observations). We will therefore need an interpolation step between the existing environmental data. This interpolation can be either statistical or based on a model of the variable of interest.

Some further thoughts

A statistical interpolation will most probably be based on a weighted average of measurements of the parameter under consideration in the neighbourhood of our 4D point of interest. The problem is that the weighting has to be done over dimensions that do not all behave the same; the most obvious difference is between the spatial dimensions and time. I am not sure which models exist for the parameters we’re interested in, or how easily they would be available for our work.

There is a difference between remote sensing data and in-situ data. Remote sensing data are, first of all, spatially only 2D, which dramatically reduces complexity; and their geographic scope is usually very large, which means that we probably have measurements close to our points of interest. Another type of 2D data is bathymetry; here, the data do not change much in time (at least not on the time scales we’re interested in), so we have to deal with only a single ‘layer’.

The main source of in-situ data I am aware of is the World Ocean Database (WOD), and the World Ocean Atlas (WOA), which is derived from the WOD. Both are maintained by the World Ocean Data Center (WDC) in Silver Spring, near Washington DC; the WDC is operated by the National Oceanographic Data Center of the USA, which is part of NOAA. Obviously, the WDC people know how to create the WOA based on the raw data from the WOD; we might want to seek their collaboration (I have some good contacts there).
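To make the weighting problem concrete, here is a minimal sketch of a weighted average around a 4D point of interest. It is only one possible scheme, not a method chosen by the project: it assumes Gaussian weights and gives each dimension its own length scale so that kilometres, metres of depth and days become comparable. The scale values (`scale_km`, `scale_m`, `scale_days`), the `Measurement` type and the function names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    lat: float    # degrees north
    lon: float    # degrees east
    depth: float  # metres below the surface
    day: float    # days since an arbitrary reference date
    value: float  # measured parameter, e.g. temperature in deg C


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def weighted_estimate(lat, lon, depth, day, measurements,
                      scale_km=100.0, scale_m=50.0, scale_days=30.0):
    """Gaussian-weighted average of nearby measurements.

    Each separation is divided by its own (assumed) length scale before
    being combined, so space, depth and time can be weighted differently.
    Returns None when no measurement carries any weight.
    """
    num = 0.0
    den = 0.0
    for m in measurements:
        dh = great_circle_km(lat, lon, m.lat, m.lon) / scale_km
        dz = (depth - m.depth) / scale_m
        dt = (day - m.day) / scale_days
        w = math.exp(-0.5 * (dh * dh + dz * dz + dt * dt))
        num += w * m.value
        den += w
    return num / den if den > 0.0 else None
```

Choosing the three scales is exactly the open question raised above: they would have to be tuned per parameter, and probably per region.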

What are the environmental variables of interest?

• Bathymetry: easily available, and in a resolution that is sufficient for our purposes; several sources: ETOPO, GEBCO. We could derive some extra parameters from bathymetry, like distance from continent, rugosity or aspect, but these are lower priority.
• Salinity and temperature: the classic in-situ data. A lot has been collected, mainly because these two parameters influence the speed of sound in water, so they are needed for the interpretation of sonar signals; in other words, they have military implications. For this reason, some countries refuse to make the data in their coastal waters public; but there is still *a lot* of data around. And for biodiversity and environmental envelope modelling, they are a priority.
• pH: very important if we want to be able to deal with global change, including ocean acidification. It’s an in-situ measurement, and not all that much is available.
• Ocean colour, productivity: the first is used as a proxy for the second; ocean colour reflects chlorophyll. Remote sensing data, so should be relatively easy to deal with?
• Nutrients: in situ, not as well measured as salinity and temperature, or even pH; lower priority? But they are available in WOA and WOD, just as temperature, salinity and pH are, so they might be low-hanging fruit.
• Distance from ice/ice cover: can be calculated on the basis of data available from another section of NOAA. Important if we want to model some specific species, and if we want to deal with global change.

What are the use cases?

• Calculating the environmental ranges for taxa: OBIS currently does this on the basis of WOA (not WOD!!), and makes the data available on the OBIS web site and the Encyclopedia of Life. Through iMarine, we could aim at replacing WOA with WOD for existing parameters, and at extending the list of available parameters.
• Quality control by checking for outliers in environmental space: having the environmental data in sufficiently high resolution will facilitate this; we need to look into an appropriate statistical procedure for this. As I mentioned before, there was someone working on this within the OBIS team, but she left before this was made operational; she was using a form of hierarchical clustering. I can look up the details later. Alternatively, we could look into Multidimensional Scaling.
• Environmental Envelope Modelling: for the ‘comprehensive’ approach we should wait until the openModeller work done within OpenBio is available to iMarine. But there are things we could do in the meantime in relation to AquaMaps.
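As a rough illustration of the first two use cases, the sketch below computes per-taxon ranges for one environmental parameter and flags univariate outliers. It does not reproduce the OBIS procedure, nor the hierarchical clustering or Multidimensional Scaling mentioned above; as a simple stand-in it uses the modified z-score based on the median absolute deviation, with the conventional constant 0.6745 and an assumed threshold of 3.5. Function names and the record layout are hypothetical.

```python
import statistics
from collections import defaultdict


def environmental_ranges(records):
    """Summarise one parameter per taxon.

    records: iterable of (taxon, value) pairs, where value is the
    environmental parameter already matched to each occurrence.
    """
    by_taxon = defaultdict(list)
    for taxon, value in records:
        by_taxon[taxon].append(value)
    return {
        taxon: {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
        }
        for taxon, values in by_taxon.items()
    }


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    This is a univariate stand-in for a proper check in environmental
    space; the threshold is an assumption, not a project decision.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, nothing can be flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]
```

A check of this kind looks at one parameter at a time; occurrences that are unusual only in the combination of several parameters would still need a multivariate procedure.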

Sub-setting Approach

We leverage the sub-setting capability offered by OPeNDAP servers as the approach for the environmental reconciliation of occurrences:
• Analyse the temporal and spatial distribution of the occurrences as an R analysis process
• Use the information above to identify the MyOcean products required and publish them behind THREDDS server(s)
• Process the occurrences in parallel with OPeNDAP subsetting in R, as follows (as per Anton's wishes):
1. Start with one variable, e.g. Sea Surface Temperature (SST)
2. For each occurrence we want the average SST of the 30 days before the observation for the half-degree cell in which the observation falls, and the average SST for that month for the last five years (monthly products)
3. Later we also want the min and max observations, the mean and the standard deviation
4. Then we also want the 'number' or 'sample' used to generate the average, that is, the number of maps used and the number of data points (there may be 1 map with 100 points used, or 10 maps with 10 points)
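The per-occurrence summary in steps 2 to 4 can be sketched as follows. The sketch is in Python rather than R and makes no OPeNDAP request at all: `daily_maps` is a hypothetical in-memory stand-in for the values a subsetting request would return. It also assumes that "the 30 days before the observation" excludes the observation day itself, and that half-degree cells are aligned on multiples of 0.5 degrees.

```python
import math
import statistics
from datetime import date, timedelta


def half_degree_cell(lat, lon):
    """Return the (south, west) corner of the 0.5-degree cell holding the point."""
    return (math.floor(lat * 2) / 2, math.floor(lon * 2) / 2)


def sst_summary(obs_date, obs_lat, obs_lon, daily_maps, days=30):
    """Summarise SST in the occurrence's cell over the days before obs_date.

    daily_maps: {date: [(lat, lon, sst), ...]}, one entry per daily map
    (hypothetical layout standing in for subsetted product data).
    Returns None if no data point falls in the cell and time window.
    """
    cell = half_degree_cell(obs_lat, obs_lon)
    start = obs_date - timedelta(days=days)
    values = []
    maps_used = 0
    for day, points in daily_maps.items():
        if not (start <= day < obs_date):
            continue  # outside the window before the observation
        in_cell = [sst for lat, lon, sst in points
                   if half_degree_cell(lat, lon) == cell]
        if in_cell:
            maps_used += 1
            values.extend(in_cell)
    if not values:
        return None
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        "n_maps": maps_used,      # number of maps contributing data
        "n_points": len(values),  # number of data points averaged
    }
```

The five-year monthly average of step 2 could reuse the same cell logic on monthly products, with the date filter replaced by a match on the calendar month.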