Difference between revisions of "Ecosystem Approach Community of Practice: SpeciesModeller"

Latest revision as of 10:36, 22 August 2013

Product Description

The SpeciesModeller services currently rely on an interplay of several VREs (Virtual Research Environments). At the time of the last edit of this proposal (Aug 2013), no decision had been taken to bring these resources together in one VRE. Since the modelling activity has to rely on a variety of data from different domains, and requires expertise in several areas, the focus currently is on developing the building blocks for a species modelling environment. The urgency of bringing all this expertise into one environment is not immediately clear, since the underlying infrastructure is already capable of discovering and accessing the data required. Rather than a single product, this page thus describes an advanced workflow spanning several user nodes. Ultimately, and if a user community shows interest, a separate VRE could be created based on these individual components from other VREs; a unified interface offering access to all these components through a single point of access would make the learning curve less steep.

As a first activity to enable Environmental Envelope Modelling (EEM), a quality control step should be defined. EEM is basically a process of extrapolating geography on the basis of observed environmental parameters. If there are outliers in the dataset (either through mistakes in identification or georeferencing, or genuine data for a specimen that was completely outside the normal range of its species), this could degrade the quality of the predictions. Gianpaolo Coro and Edward Vanden Berghe started investigating clustering algorithms as a tool to automate the detection of outliers in environmental space. This work was originally started by Phoebe Zhang, then working at OBIS; it should now be continued and made operational, so that any data sent to EEM algorithms first passes a quality control check.
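The quality control idea above can be illustrated with a minimal sketch. It is not the clustering algorithm investigated by the authors, which this page does not specify; it is a simplified density check in which a record is flagged when too few other records lie close to it in standardized environmental space. The function names, the `eps` and `min_neighbours` thresholds, and the sample values are all hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(records):
    """Scale each environmental variable to zero mean and unit variance."""
    columns = list(zip(*records))
    means = [mean(c) for c in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in records
    ]


def flag_outliers(records, eps=1.0, min_neighbours=2):
    """Return one boolean per record: True when the record is isolated.

    A record is isolated when fewer than `min_neighbours` other records
    lie within distance `eps` of it in standardized environmental space.
    Both thresholds are assumptions chosen for illustration.
    """
    scaled = standardize(records)
    flags = []
    for i, p in enumerate(scaled):
        neighbours = sum(
            1
            for j, q in enumerate(scaled)
            if j != i and math.dist(p, q) <= eps
        )
        flags.append(neighbours < min_neighbours)
    return flags


# Hypothetical (temperature in deg C, salinity) values at occurrence points.
occurrences = [
    (18.2, 35.1), (18.9, 35.0), (19.4, 35.3), (18.6, 34.9),
    (19.1, 35.2), (2.5, 31.0),  # last record sits far from the rest
]
flags = flag_outliers(occurrences)
```

Flagged records would be candidates for review rather than automatic removal, since a genuine observation outside the usual range may still be valid data.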

The technical team at CNR has already expanded the possibilities of AquaMaps by making available more environmental layers and by providing an interface to select the layers that go into a model run. Integrating openModeller in the D4Science infrastructure was a separate activity under the OpenBio project. The integration is currently being tested, and once testing is concluded satisfactorily, openModeller will be made available to iMarine as well. That way, users will be able to select not only the environmental layers going into the model run, but also the algorithm. Edward would be more than happy to be involved in the testing and validation of the various components. A more concrete work plan should be developed in collaboration with iMarine leadership, and with Vanderlei and Townsend.

One of the key points in EEM work in iMarine was the closer linking of biological and environmental data, i.e., obtaining the closest possible values or estimates of the environmental parameters at the moment of the collection or observation event. Currently, many of the environmental data are taken from averages for a 1-degree square, and the temporal component is completely averaged out. The World Data Center for Oceanography (WDC) at Silver Spring, Maryland, compiles the World Ocean Database (WOD), integrating all available observations of ocean temperature, salinity, pH, oxygen, and various nutrients. On the basis of WOD, they create the World Ocean Atlas (WOA), interpolating values from WOD to the WOA's regular grid in 3D space and time. Replacing the WOA regular grid with the time and 3D position of the biological observations to be modelled would result in the desired close linking of biological and environmental data. Clearly iMarine could benefit from the experience of the WDC staff. Edward, Fabrice Brito and others will contact WDC to discuss a strategy.
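To make the idea of estimating environmental values at the position and time of a biological observation more concrete, here is a minimal sketch using inverse-distance weighting in space and time. This is an illustrative stand-in, not the interpolation method used by WDC to build the WOA. The record layout (`lat`, `lon`, `depth_m`, `day`, `value`) and the three scaling constants are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def scaled_distance(obs, m, km_scale=100.0, depth_scale_m=50.0, day_scale=30.0):
    """Dimensionless separation in space and time.

    The three scales are illustrative assumptions; they decide how much a
    kilometre, a metre of depth and a day count relative to one another.
    """
    horizontal = haversine_km(obs["lat"], obs["lon"], m["lat"], m["lon"]) / km_scale
    vertical = (obs["depth_m"] - m["depth_m"]) / depth_scale_m
    temporal = (obs["day"] - m["day"]) / day_scale
    return math.sqrt(horizontal ** 2 + vertical ** 2 + temporal ** 2)


def estimate_at(obs, measurements, power=2.0):
    """Inverse-distance-weighted estimate of a parameter at an observation."""
    weighted_sum = 0.0
    weight_total = 0.0
    for m in measurements:
        d = scaled_distance(obs, m)
        if d == 0.0:
            return m["value"]  # exact match in position and time
        w = 1.0 / d ** power
        weighted_sum += w * m["value"]
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical temperature measurements; `day` counts days from an arbitrary origin.
measurements = [
    {"lat": 40.0, "lon": -20.0, "depth_m": 10.0, "day": 0.0, "value": 20.0},
    {"lat": 40.0, "lon": -20.0, "depth_m": 10.0, "day": 60.0, "value": 10.0},
]
observation = {"lat": 40.0, "lon": -20.0, "depth_m": 10.0, "day": 15.0}
estimate = estimate_at(observation, measurements)
```

In this example the observation is much closer in time to the first measurement, so the estimate stays close to that measurement's value instead of falling back to a long-term mean.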

The main goal is to increase the scale of modelling and prediction, both in terms of time and space, in order to be more accurate and precise in the high seas for migratory species such as tuna, and for heavily exploited coastal areas. Several possible avenues for future work in EEM will be investigated, and a selection of them implemented as iMarine project deliverables:

  1. Investigating the improvements of the predictions resulting from working with environmental data that are more tightly coupled with the biological observations, both in 3D and in time. Time has currently disappeared completely from the AquaMaps models. Bringing time back in (by looking at quarterly or monthly means, rather than global means) would facilitate modelling of seasonal shifts, and would allow detection of long-term (decadal) shifts that have occurred in the recent past.
  2. Working on a variable-size grid, where the size of the grid cells depends on either (1) the quantity of the available data, or (2) the steepness of the environmental gradients. These two are likely highly correlated. Increasing the size of the more homogeneous open-ocean cells would decrease the number of computing cycles and increase the number of available observations in each of the cells. Increasing the resolution in cells where the gradients are steeper will result in more precise predictions (hopefully also more accurate).
  3. Replacing the geographic constraint in AquaMaps, now based on FAO fishing areas, with a continuous constraint based on the distance to the closest actual observation.
  4. Running the models in 3D, rather than as a single 2D model as in AquaMaps.
  5. Replacing the ‘squares’ of the unprojected maps with a hexagonal equal-area tessellation. Squares in the unprojected maps are, on the ground, not at all square; towards the poles they become very elongated trapeziums. A hexagonal tessellation would have the advantage of better reflecting the reality on the ground. Also, since a hexagon is very close to a circle on the ground, the intra-cell distances would be small in comparison with the area, resulting in a better description of the environment.
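As an illustration of item 3 in the list above, the following sketch computes a continuous weight that decays with the great-circle distance from a grid cell to the closest recorded observation. The exponential form, the `decay_km` parameter, and the coordinates are assumptions made for the example; the page does not prescribe a particular functional form.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_constraint(cell, observations, decay_km=500.0):
    """Weight in (0, 1] that decays with distance to the closest observation.

    `cell` and each observation are (latitude, longitude) pairs. The
    exponential decay and its 500 km scale are illustrative assumptions.
    """
    nearest = min(
        haversine_km(cell[0], cell[1], lat, lon) for lat, lon in observations
    )
    return math.exp(-nearest / decay_km)


# Hypothetical occurrence points and candidate grid-cell centres.
observations = [(0.0, 0.0), (2.0, 1.0)]
weight_at_observation = distance_constraint((0.0, 0.0), observations)
weight_nearby = distance_constraint((1.0, 0.0), observations)
weight_far = distance_constraint((10.0, 0.0), observations)
```

Multiplying a model's predicted suitability by such a weight would let predictions fade out gradually with distance from known occurrences, instead of being cut off at the boundary of a fishing area.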

For each of these possible topics, a short document could be prepared, to start a discussion on the scientific value, the potential contribution to iMarine, the resources needed, and time frame in which the work can be completed. This will make it possible for iMarine project leadership to contemplate the potential merit of these topics, and decide on priorities.

Species modelling requires knowledge in three information domains:

  • Environmental Data Enrichment adds information on specific environmental parameters to species occurrence data.
  • Biology. The species envelopes encompass the preferred and tolerated range of environmental variables.
  • The relation between biology and environment; the models. These are expected to rely on the Statistical service.
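The notion of a species envelope with a preferred and a tolerated range can be sketched as follows, assuming a trapezoidal response curve: suitability is 1 inside the preferred range, falls linearly to 0 at the limits of the tolerated range, and is 0 outside it. Combining the per-variable scores by multiplication is also an assumption, as are the function names and the sample bounds; the page itself does not specify the model form.

```python
def envelope_suitability(value, minimum, preferred_min, preferred_max, maximum):
    """Trapezoidal response of one environmental variable.

    Returns 0 outside [minimum, maximum], 1 inside
    [preferred_min, preferred_max], and a linear ramp in between.
    """
    if value < minimum or value > maximum:
        return 0.0
    if value < preferred_min:
        return (value - minimum) / (preferred_min - minimum)
    if value > preferred_max:
        return (maximum - value) / (maximum - preferred_max)
    return 1.0


def cell_suitability(cell_values, envelopes):
    """Multiply the per-variable responses for one grid cell."""
    result = 1.0
    for name, bounds in envelopes.items():
        result *= envelope_suitability(cell_values[name], *bounds)
    return result


# Hypothetical envelope: (minimum, preferred_min, preferred_max, maximum).
envelopes = {
    "temperature": (10.0, 15.0, 25.0, 30.0),
    "salinity": (30.0, 33.0, 36.0, 38.0),
}
cell = {"temperature": 12.5, "salinity": 35.0}
suitability = cell_suitability(cell, envelopes)
```

Here the cell's salinity lies in the preferred range while its temperature lies halfway up the lower ramp, so the combined score is 0.5.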

Progress

Add link to TRAC ticket

Priority to CoP (Community of Practice)

List proposed solution priority following the iMarine Board priority setting criteria:
  • Potential target community;
  • Users;
  • Potential for co-funding;
  • Structural allocation of resources;
  • Referred in DoW;
  • Business Cases;
  • How does the proposed action generally support sustainability aspects;
  • How consistent it is with EC regulations/strategies (e.g. INSPIRE);
  • Re-usability – benefits – compatibility;

Parentage

Relation to CoP Software
Relation to D4S technologies

Productivity

Are the proposed measures effective?
Does it reduce a known workload?

Presentation

How must the component be delivered to users? (UI Design / on-line help / training material / support)

Policy

Are there any policies available that describe data access and sharing?

Add link here. Chapter in [1]

Have the Copyright / attribution / metadata / legal aspects been addressed from a user and technology perspective?