Environmental Data Enrichment

From D4Science Wiki

Context

Enriching biogeographical data with general environmental data has been one of the objectives of the biodiversity cluster since the inception of the iMarine project. Environmental information is essential for characterizing biological data, and a sine qua non for Species Distribution Modelling (also known as Environmental Envelope Modelling or Ecological Niche Modelling). Most biogeographical data lack environmental context: for many of the museum specimens in OBIS, for example, this information was never collected in the first place. For other biological data, it may have been collected, but the biological and environmental data were later separated, and it is now difficult to re-associate the biological data with the original environmental data.

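For this reason, in OBIS we started associating data points with environmental data from two global sources: ETOPO for bathymetry, and the World Ocean Atlas (WOA) for oceanographic parameters such as temperature, salinity and oxygen.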

ETOPO is used to read the bottom depth at every position with one or more OBIS data points. The ETOPO version used has a resolution of 1 arc-minute (1 nautical mile, or about 1.85 km at the equator), which is sufficient for virtually all purposes. Bathymetry is static rather than fluctuating in time (disregarding sea level rise, which operates at time scales that make it irrelevant for the analysis of OBIS data), so we need only a single ETOPO value for each position we have data for. We consider bathymetry adequately covered. The only remaining task we could set ourselves is a comparison between ETOPO and GEBCO (General Bathymetric Chart of the Oceans, http://www.gebco.net/), the two rival sources of global bathymetry, but this is outside the scope of the environmental data enrichment activity.
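Because bathymetry does not change in time, the lookup reduces to finding the nearest grid node for each distinct position and reading that node once, however many OBIS records share it. The Python sketch below illustrates this under an assumed layout: a grid-registered 1 arc-minute grid with nodes at whole arc-minutes, ordered south to north and west to east. The names `etopo_node` and `unique_nodes` are hypothetical, and the layout of the ETOPO file actually used in OBIS may differ.

```python
# Sketch: nearest-node lookup on a 1 arc-minute bathymetry grid.
# Assumed layout (hypothetical; check it against the ETOPO file actually used):
# grid-registered nodes at whole arc-minutes, rows running from south (-90)
# to north (+90) and columns from west (-180) to east (+180).
ARC_MINUTES_PER_DEGREE = 60
N_ROWS = 180 * ARC_MINUTES_PER_DEGREE + 1  # 10801 latitude nodes
N_COLS = 360 * ARC_MINUTES_PER_DEGREE + 1  # 21601 longitude nodes


def etopo_node(lat: float, lon: float) -> tuple:
    """Return (row, col) of the grid node nearest to (lat, lon)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"position out of range: {lat}, {lon}")
    row = round((lat + 90.0) * ARC_MINUTES_PER_DEGREE)
    col = round((lon + 180.0) * ARC_MINUTES_PER_DEGREE)
    return row, col


def unique_nodes(positions):
    """Bathymetry is static, so each node has to be read only once,
    however many OBIS records fall on it."""
    return sorted({etopo_node(lat, lon) for lat, lon in positions})


positions = [(51.2, 2.9), (51.2001, 2.9001), (-33.9, 18.4)]
print(unique_nodes(positions))  # the first two records share one node
```

Deduplicating before reading keeps the number of lookups proportional to the number of distinct positions rather than to the number of records.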

The resolution of WOA is 1 degree. Unlike bathymetry, the parameters vary with depth, so several values must be stored for each position: WOA records values at 40 standard depth levels (33 of them between the surface and 5500 m). Parameter values also vary in time: WOA stores monthly, seasonal and yearly averages per decade, as well as a grand climatological mean, for each of the ‘cells’ (latitude x longitude x depth level, 180 x 360 x 40 cells). In OBIS, we associated each biological datum with the climatological mean of the parameters for the relevant cell; to facilitate this, these climatological means were extracted from the WOA CDs and uploaded into the PostgreSQL database.
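The OBIS approach of linking each record to a WOA cell and the closest standard depth can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the OBIS code: the helper names are hypothetical, the cell indexing assumes rows running south to north and columns west to east, and the depth list is the 33-level WOA09 set between the surface and 5500 m, which should be checked against the edition actually loaded.

```python
import bisect
import math

# Hypothetical constant: the 33 WOA standard depth levels (m) between the
# surface and 5500 m, as used in WOA09; check against the edition in use.
STANDARD_DEPTHS = [0, 10, 20, 30, 50, 75, 100, 125, 150, 200, 250, 300,
                   400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300,
                   1400, 1500, 1750, 2000, 2500, 3000, 3500, 4000, 4500,
                   5000, 5500]


def woa_cell(lat: float, lon: float) -> tuple:
    """(row, col) of the 1-degree cell containing (lat, lon).

    Rows 0..179 run south to north, columns 0..359 west to east;
    lat = 90 and lon = 180 are folded into the last row/column.
    """
    row = min(math.floor(lat + 90.0), 179)
    col = min(math.floor(lon + 180.0), 359)
    return row, col


def nearest_level(depth: float) -> int:
    """Index of the standard level closest to depth (m);
    ties go to the shallower level."""
    i = bisect.bisect_left(STANDARD_DEPTHS, depth)
    if i == 0:
        return 0
    if i == len(STANDARD_DEPTHS):
        return len(STANDARD_DEPTHS) - 1
    shallower, deeper = STANDARD_DEPTHS[i - 1], STANDARD_DEPTHS[i]
    return i - 1 if depth - shallower <= deeper - depth else i
```

For example, a record at 51.5°N, 3.2°E and 14 m depth falls in cell (141, 183) and is matched to the 10 m level.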

By using only the climatological mean in OBIS, we effectively discarded the temporal dimension of the environment. A first step could be to bring the temporal variation back in by looking up the appropriate temporal mean in WOA instead of the climatological mean. The ‘appropriate’ temporal mean depends on how precisely the date of the observation is known: often the date is known only to the year, in which case associating the biological observation with anything finer than a yearly mean is futile.
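The choice of the ‘appropriate’ temporal mean can be expressed as a simple rule: use the finest WOA aggregation that the date precision of the observation justifies, and fall back to the climatological mean when the date is unknown. The sketch below is hypothetical; in particular, it assumes WOA seasons of January-March, April-June, July-September and October-December, and the `decade_offset` parameter has to be set to match the decadal periods of the WOA edition in use.

```python
from enum import Enum
from typing import Optional


class DatePrecision(Enum):
    DAY = 1
    MONTH = 2
    SEASON = 3
    YEAR = 4
    UNKNOWN = 5


def woa_period(precision: DatePrecision, year: Optional[int] = None,
               month: Optional[int] = None, decade_offset: int = 0) -> tuple:
    """Pick the finest WOA aggregation justified by the date precision.

    decade_offset is hypothetical: set it so that decades start where the
    decadal periods of the WOA edition in use start (0 means 1980-1989, ...).
    Seasons are assumed to be Jan-Mar, Apr-Jun, Jul-Sep and Oct-Dec.
    """
    if precision is DatePrecision.UNKNOWN or year is None:
        return ("climatology",)
    decade = year - (year - decade_offset) % 10
    if precision is DatePrecision.YEAR or month is None:
        return ("annual", decade)
    if precision is DatePrecision.SEASON:
        return ("seasonal", decade, (month - 1) // 3 + 1)
    return ("monthly", decade, month)  # DAY or MONTH precision
```

Keeping the rule in one function makes it easy to revise once the actual WOA period definitions are confirmed.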

The finest temporal resolution available in WOA (monthly averages over a decadal period) is probably sufficient for analysing OBIS data; it would be a matter of building appropriate tools for reading these data off the NODC servers. It is not clear whether it is feasible or desirable to duplicate the whole of WOA as a relational database on the iMarine infrastructure. Associating OBIS data with yearly means would already facilitate investigating global change; associating them with monthly or seasonal means would allow seasonality to be incorporated into the analysis, useful if only to remove the seasonal signal from long-term trends and thus make those trends easier to detect. This could easily be translated into a ‘use case’ demonstrating the advantages of improved access to WOA data, by looking at long-term trends and shifts in the distribution of a suitable group with sufficient data.
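Removing the seasonal signal before looking for a long-term trend can be illustrated with a minimal example: subtract each calendar month's mean (a monthly climatology) from every value, then fit an ordinary least-squares slope to the resulting anomalies. The functions below are a hypothetical pure-Python sketch, assuming each record is a (year, month, value) tuple for one location; a real analysis would also have to handle gaps and uneven sampling.

```python
from collections import defaultdict


def monthly_anomalies(records):
    """records: (year, month, value) tuples. Returns (time, anomaly) pairs,
    where the anomaly is the value minus the mean of all values for that
    calendar month, i.e. the seasonal signal is removed."""
    records = list(records)
    by_month = defaultdict(list)
    for _, month, value in records:
        by_month[month].append(value)
    climatology = {m: sum(v) / len(v) for m, v in by_month.items()}
    # Time in decimal years, placing each value in the middle of its month.
    return [(year + (month - 0.5) / 12, value - climatology[month])
            for year, month, value in records]


def linear_trend(points):
    """Ordinary least-squares slope of value against time (units per year)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return sxy / sxx


# Synthetic data: a seasonal cycle plus a trend of 0.02 units per year.
seasonal = [5, 3, 1, -1, -3, -5, -5, -3, -1, 1, 3, 5]
records = [(y, m, 0.02 * (y - 2000) + seasonal[m - 1])
           for y in range(2000, 2010) for m in range(1, 13)]
anomalies = monthly_anomalies(records)
print(round(linear_trend(anomalies), 3))
```

With ten years of this synthetic data, the slope estimated from the anomalies comes out close to the 0.02 units per year that was built in.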

The vertical resolution of WOA is probably sufficient for most types of analyses of OBIS data, but the horizontal resolution clearly is not. One degree (60 nautical miles, or roughly 110 km at the equator) might be enough for open-ocean studies, but definitely not near the coast or the sea ice. Because of this, we will have to go back to the WOD (World Ocean Database, http://www.nodc.noaa.gov/OC5/WOD09/pr_wod09.html), the database of raw data from which WOA is derived. We will need tools to interpolate between the WOD values to find a value corresponding to the 4D position (latitude, longitude, depth and time) of each biological observation. The functionality of such a tool closely resembles what was needed to create WOA from the raw data stored in WOD: at NODC, WOD data were interpolated to obtain values at the regularly spaced positions and times of WOA. What we need within iMarine is a similar process, with the regularly spaced WOA positions and times replaced by the positions and times of our biological observations.
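As a stand-in for a proper interpolation tool, the sketch below estimates a value at the 4D position of a biological observation by inverse distance weighting of nearby WOD-style samples, with horizontal distance, depth difference and time difference each divided by a scale before being combined. This is a deliberately simple, swapped-in technique and not the objective analysis NODC uses to build WOA; the `Obs` record, the scales (50 km, 10 m, 15 days) and the cut-off are hypothetical choices. The number of samples used is returned as a crude ‘reliability’ indicator.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Obs:
    """One hypothetical WOD-style measurement."""
    lat: float
    lon: float
    depth: float  # metres
    day: float    # days since an arbitrary epoch
    value: float


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(a, 1.0)))


def idw_4d(target, samples, scale_km=50.0, scale_m=10.0, scale_days=15.0,
           power=2.0, max_dist=3.0):
    """Inverse-distance-weighted estimate at target = (lat, lon, depth, day).

    Each axis is divided by its scale, so a scaled distance of 1 means
    "one scale unit away". Samples beyond max_dist are ignored. Returns
    (estimate, number of samples used), or (None, 0) if none are close enough.
    """
    lat, lon, depth, day = target
    weighted_sum, weight_total, used = 0.0, 0.0, 0
    for s in samples:
        d = math.sqrt((great_circle_km(lat, lon, s.lat, s.lon) / scale_km) ** 2
                      + ((depth - s.depth) / scale_m) ** 2
                      + ((day - s.day) / scale_days) ** 2)
        if d > max_dist:
            continue
        if d == 0.0:
            return s.value, 1  # a sample exactly at the target position
        w = d ** -power
        weighted_sum += w * s.value
        weight_total += w
        used += 1
    if used == 0:
        return None, 0
    return weighted_sum / weight_total, used
```

The scales encode how far, in each dimension, a measurement is still considered representative; choosing them per parameter and region would be part of designing the real tool.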

The desired resolution: the original idea was to have environmental data as close as possible to the biological data, so the resolution should always be ‘as fine as possible’. The ‘use case’ for the biodiversity cluster is to use these data layers as environmental variables in a Species Distribution Mapping exercise; more concrete use cases are being developed, and priorities for data ingestion and resolution will be defined by them. The precision with which we know the positions and times of biological observations is very high (at least for modern records, and for coastal records also for older ones). It is clearly impossible to create a 3D+time grid of sufficiently high resolution to satisfy this requirement, so we will need tools to associate environmental data with the biological observations on demand. Vertically, the spatial resolution should be much finer near the surface than in the middle of the water column (say, better than 10% of the depth for anything shallower than 200 m, and better than 20% for deeper values?). A temporal resolution finer than a day is probably rarely relevant (except where tides are important, but we have no system to pull in tide information).
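The rule of thumb for vertical resolution can be written down directly, which makes it easy to check whether a given set of depth levels meets it. In the sketch below, the 10% and 20% fractions and the 200 m threshold come from the text; the 1 m floor is a hypothetical addition (10% of 0 m would otherwise be 0 m), and evaluating the tolerance at the shallower level of each pair is one possible interpretation, not a settled requirement.

```python
def vertical_tolerance(depth_m: float, floor_m: float = 1.0) -> float:
    """Coarsest acceptable vertical spacing (m) at depth_m: 10% of the
    depth above 200 m, 20% below. floor_m is a hypothetical minimum,
    since 10% of 0 m would be 0 m."""
    fraction = 0.10 if depth_m < 200.0 else 0.20
    return max(fraction * depth_m, floor_m)


def coarse_gaps(levels, floor_m: float = 1.0):
    """Pairs of consecutive levels whose spacing exceeds the tolerance
    evaluated at the shallower level of the pair."""
    levels = sorted(levels)
    return [(upper, lower) for upper, lower in zip(levels, levels[1:])
            if lower - upper > vertical_tolerance(upper, floor_m)]


print(coarse_gaps([100, 105, 200, 230]))
```

Applied to a candidate data source, the list of flagged gaps shows at which depths its vertical resolution falls short.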

If there are many biological observations and the 4D resolution is very fine, we only need the value of the environmental parameter at the position of each biological observation (plus possibly a ‘reliability’ of the interpolation). The natural variability of the environment should then be captured by having sufficiently large numbers of biological observations, each with its associated environmental value. Where the spatial or temporal resolution with which we know the biological observation is too coarse, it is probably better to look at a window of environmental values rather than a single value (plus standard error). For example, if the position of the biological observation is known precisely but only the year is known, not the exact date, we might want to work with the minimum and maximum temperature (and salinity…).
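When only the year of an observation is known, the single value could be replaced by a summary over the months in which the observation may have taken place. The sketch assumes a hypothetical structure mapping month (1-12) to a value for one cell and one year; passing a single month reduces the window to one value when the date is known precisely.

```python
from statistics import mean
from typing import Optional


def value_window(monthly: dict, months) -> Optional[dict]:
    """Summarise environmental values over the months in which the
    observation may have taken place. `monthly` maps month (1-12) to a
    value for one cell and year; missing months are skipped."""
    values = [monthly[m] for m in months if monthly.get(m) is not None]
    if not values:
        return None
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "n_months": len(values)}


temps = {m: float(m) for m in range(1, 13)}   # toy monthly temperatures
print(value_window(temps, range(1, 13)))      # only the year is known
print(value_window(temps, [5]))               # the month is known
```

Reporting `n_months` alongside the range makes it visible when the window rests on only a few monthly values.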

Accessing oceanographic environmental data is not a trivial task, even when the data are publicly available and property rights do not come into play. Some of the issues are discussed here; in summary, the sheer volume of the data, the great variety of formats used, and the highly specialised tools needed to access the different data sets make it difficult for non-specialist users to access these great resources. By facilitating access to oceanographic environmental data, the iMarine project could make a real contribution to the marine biodiversity community.

Biodiversity Mapping use cases

In the end, the iMarine infrastructure should be generally useful, independent of the specific uses an end user might want to make of it. In order to make fast progress and to show results before the end of the project, a number of use cases were defined to help set priorities. Each of these use cases will mimic a scientific or managerial question and demonstrate how the iMarine infrastructure will make it easier to answer such questions.

Use cases defined so far:

  • Comparing the effect of acidification on different groups of planktonic calcifiers (pelagic calcifiers)
  • Oxygen minimum zones and bottom-dwelling fishes

Variables of interest

Here is a list of variables that might be interesting for Species Distribution Modelling. The list does not take into account which data would be easy to capture, or which data would be directly relevant to one of our use cases. It’s just a wish list.

  • Atmospheric: irradiance, temperature, cloud cover, rainfall
  • Surface: temperature, salinity, transparency, chlorophyll (colour?), production (derived?), ice cover, ice thickness, distance from ice, tidal amplitude, depth of mixed layer, some measure of wave energy, height of the ocean surface above datum
  • Water column: temperature, salinity, oxygen, Dissolved Organic Carbon, Particulate Organic Carbon, pH, CO2, Aragonite saturation, Calcite saturation
  • Bottom: (same as in water column, plus) bathymetry, sediment thickness, sediment grain size (but are there global layers?), geomorphology à la GOLD, benthic biomass (derived), shear stress (speed and direction of current over the bottom)
  • Administrative: marine protected areas, LME, EEZ, FAO areas, IHO regional seas

Priorities for the use cases:

  • General: bathymetry, salinity, temperature, oxygen, pH (other layers within WOD could be included as ‘low hanging fruit’)
  • Fishes use case: geomorphology, oxygen
  • Pelagic calcifiers use case: CO2, Aragonite saturation, (Calcite saturation??)

Potential data sources for environmental layers

NODC Silver Spring:

  • World Ocean Database (WOD): exists in many different editions, one every couple of years since 194. In practice, only the latest one is relevant. Available information: temperature, salinity, oxygen, phosphate, silicate, nitrate, pH, CO2, chlorophyll, plankton… The number of observations varies by parameter and is most extensive for temperature and salinity. Available on CD/DVD and online through WODSelect (which requires interacting with a user interface) or as NetCDF (which can be automated through OPeNDAP or Live Access Server)
  • World Ocean Atlas (WOA): derived from WOD, but produced only for the parameters with sufficient data for a meaningful global atlas; it also exists in different editions, in tandem with the WOD editions. In particular, pH is not available. The grid cell is 1 degree square; WOA is also available with 5 degree squares. Different temporal aggregations are also available; OBIS used the 1 degree long-term means of all available variables to enrich data on the OBIS web site and to facilitate querying by physical oceanography. WOA2013 will have increased resolution: 102 depth levels between the surface and 5500 m, and ¼ degree cells. For temperature and salinity, this enhanced resolution already exists for WOA 2001 (a new version with improved calculations was published in 2003, but it does not appear to have been updated with new data).
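Automated access through OPeNDAP lets a client request only the subset it needs, by appending a constraint expression after `?` in the request URL; in DAP2 each dimension is selected with a `[start:stride:stop]` hyperslab, where `stop` is inclusive. The sketch builds such an expression for a latitude/longitude box on a 1 degree grid. The variable name `t_an`, the dimension order (time, depth, latitude, longitude) and the grid layout are assumptions modelled on WOA NetCDF files and should be checked against the actual dataset description before use.

```python
import math


def lat_index(lat: float) -> int:
    """Row of the 1-degree cell containing lat (0 = 90S to 89S)."""
    return min(max(math.floor(lat + 90.0), 0), 179)


def lon_index(lon: float) -> int:
    """Column of the 1-degree cell containing lon (0 = 180W to 179W)."""
    return min(max(math.floor(lon + 180.0), 0), 359)


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """DAP2 hyperslab for one dimension; stop is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def woa_constraint(variable: str, south: float, north: float,
                   west: float, east: float,
                   levels=(0, 0), time_index: int = 0) -> str:
    """Constraint expression for a lat/lon box (not crossing the dateline),
    assuming dimensions ordered (time, depth, lat, lon)."""
    if south > north or west > east:
        raise ValueError("expected south <= north and west <= east")
    return (variable
            + hyperslab(time_index, time_index)
            + hyperslab(levels[0], levels[1])
            + hyperslab(lat_index(south), lat_index(north))
            + hyperslab(lon_index(west), lon_index(east)))
```

For a box from 50.2°N to 52.8°N and 2.1°E to 4.9°E at the surface level, the expression is `t_an[0:1:0][0:1:0][140:1:142][182:1:184]`, which would be appended after `?` to the dataset's request URL.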

Other NOAA sections

MyOcean.eu

  • Sea ice forecasts
  • Chlorophyll? resolution 4 km; starting only from March 2013, so not relevant for historical data?
  • Sea Surface Temperature: resolution 0.25 degree, starting from September 2009.

In general, the emphasis is more on presenting model results than on the historical data we need for environmental data enrichment. We have not studied the myOcean.eu data in full, but the web site mainly contains short-term forecasts produced by models; these are not a priority for species distribution modeling at this stage.

British Oceanographic Data Centre (BODC, http://www.bodc.ac.uk/)

  • GEBCO (General Bathymetric Chart of the Oceans, http://www.gebco.net/). Data manager: Pauline Weatherall; technical contact: Ray Cramer.

Bio-ORACLE

A good collection of environmental layers, but only gridded data. Data were taken from several sources, both satellite and in situ, and rasters were derived with a uniform grid size and land mask. Available from Bio-ORACLE. See also this note.

Sources of in-situ data: WOD09 (gridding done using Diva GIS). Only surface data!

Sources of satellite data: Aqua-MODIS and SeaWiFS (http://oceancolor.gsfc.nasa.gov)

GOLD (Global Ocean Life Distribution)

  • Seafloor morphology, which should be very interesting; it is not clear from the presentation whether these are plans or whether actual layers exist that can be incorporated in the infrastructure. If the latter, that would be good. View the new Global Seafloor Geomorphology Mapping project at http://www.grida.no/marine/news.aspx?id=5290 and http://geoiq.grida.no/maps/1136; the project is scheduled for completion by the second quarter of 2013.

Global biomass map

Could be requested from the author (Chih-Lin Wei: weic@tamug.edu; now in Newfoundland with Paul Snelgrove)

Coral reefs

From www.reefbase.org or WCMC (www.unep-wcmc.org)

IPCC Scenarios

http://data1.gfdl.noaa.gov/CM2.X/. Taking the relevant rasters from Bio-ORACLE could save us some work.

Kansas Geological Survey (KGS) Mapper

This site had a very extensive collection of environmental layers to support species distribution modeling. The list of layers and their metadata can still be accessed through the Hexacorals portal (http://hercules.kgs.ku.edu/hexacoral/envirodata/fulldb/hex_modfilt_secondstep3dev1.cfm), but the download and modeling server seems to be down.

Data already available in the iMarine infrastructure

According to the list of available parameters (a text file received from GP Coro in June 2013), most of the layers are extracted from myOcean.eu and from WOA. While the variables in the WOA collection do include some of the ‘priority’ variables, their spatial resolution is too coarse for detailed modeling efforts. OBIS data are 'enriched' with WOA data: each OBIS position is linked with WOA data from the relevant square and the closest depth. The date and time of the biological observation are not taken into account; the time-averaged WOA data are used instead. For these OBIS data, we could therefore do better by using the temporal resolution available within WOA, and this could definitely be the subject of a use case:

  • detect seasonality and long-term trends in a planktonic group.

SAHFOS has already done this for their own Continuous Plankton Recorder data; see, for example:

  • Alvarez-Fernandez, S., Lindeboom, H. and Meesters, E., 2012. Temporal changes in plankton of the North Sea: community shifts and environmental drivers. Marine Ecology Progress Series, 462: 21-32
  • Beaugrand, G., 2012. Unanticipated biological changes and global warming. Marine Ecology Progress Series, 445: 293-301
  • other papers listed on http://www.sahfos.ac.uk/research/publications/recent-publications.aspx.

Continuous Plankton Recorder data and similar data sets are available in OBIS, and thus in the iMarine infrastructure. A first step could be to try to replicate the SAHFOS analysis, and afterwards to apply a similar analysis to other datasets.