Difference between revisions of "Biodiversity Draft Work Plan Q2-3- 2012"

From D4Science Wiki
Revision as of 10:45, 5 June 2012

The expectations of the Biodiversity Cluster (mainly related to BC 2) were presented at the iMarine Board meeting in Rome in March. After the meeting they were further detailed using WP3 conference calls and meetings with EA-CoP representatives. The iMarine Vice-Chair (E. Vanden Berghe) was instrumental in collecting the descriptions.

Abstract or Executive Summary

One of the 4 currently identified clusters in iMarine is 'biodiversity'. The term is sufficiently vague to enable the collection of requirements that operate on, or benefit from, the 'biodiversity' domain. The boundary of this domain is far from sharp, and immediate relations with e.g. the geospatial and statistical clusters are evident.

The work plan is not domain-specific, and is technology-neutral. It describes how the iMarine Board can be involved in the specification of use cases, data policies, and harmonization issues, to name a few. This page is deliberately written from the end-users' point of view, without details of implementation, but concentrating on functionality. This page is seen as a springboard from which to define a concrete work plan that can be passed on to the development teams.


Introduction and Background (The Problems)

The iMarine Board is responsible for the implementation of 2 Business Cases in the project, and brings a wealth of community expertise to the technical e-infrastructure. The EA-CoP needs to search over multiple resources, to extract data from several of them, and to integrate the data from many sources in information products that cross the boundaries between data sets and even scientific disciplines. For example, the ultimate aim of Business Case 2 is to perform analysis and create information products that will support the FAO Vulnerable Marine Ecosystem project; these products will have to be based on biogeographical data warehouses such as OBIS and GBIF (in themselves compilations of a large number of individual data sets), and environmental data from oceanography, meteorology, bathymetry...

The opportunity was presented and discussed at the iMarine Board meeting in Rome (March 19-21), and later further elaborated and discussed by project partners with EA-CoP input.

Goals and Objectives (The Outputs)

The ultimate goal of the Biodiversity Cluster, and of Business Case 2, is to support the ecosystem approach to fisheries. In the first instance this will be done through activities supporting FAO's Vulnerable Marine Ecosystems project, and CBD's Ecologically and Biologically Significant Areas activities. The ultimate goal of BC2 is to be able to bring biodiversity information into the process of defining VMEs and/or EBSAs, by predicting hot-spots of biodiversity, by providing information on distribution of rare and/or endangered species, by providing information on areas essential in the life cycle of marine species... The iMarine activities, and hence the D4Science infrastructure, will be used to enhance the workflows to integrate data within the warehouses for biogeographic data, by enabling the creation of tools for quality control, and by bringing biodiversity data together with non-biodiversity environmental data.

The cluster discussion at the iMarine Board meeting was summarized by OBIS (Edward Vanden Berghe). Five goals were identified as products that can be delivered; for each of these, an initial set of objectives emerged that requires further discussion. The goals are: Taxon name access, Taxon name reconciliation, Occurrence data access, Occurrence data reconciliation, and Occurrence data enrichment.


Taxon Names Access

The work of CNR will have to be reviewed and validated once the service is completed by the end of April 2012. It is noted that plug-ins now exist to consume WoRMS services, and that Catalogue of Life is also available within the infrastructure. Three more reference lists used at OBIS at this point are ITIS, IRMNG and NCBI. Currently these lists are kept up-to-date by downloading a copy of these databases and uploading it to the OBIS PostgreSQL database; this is a time-consuming process, and access to these name lists in an environment where the OBIS names are also available would enhance the workflow for OBIS.

GBIF is exposing its taxonomic names as well; this could be explored as a separate reference list.

Efforts in this context will mainly be of benefit through combination with Taxonomic Name Reconciliation.

Taxon Name Reconciliation

Currently, OBIS (Edward) uses SQL statements to merge taxonomic lists, based on a number of rules that govern the merging.

The proposed service will produce a list of pairs of taxa, each pair with a probability of similarity between the two taxa. CNR and FIN will take the lead in specifying services for taxon data. FAO has developed a very similar tool for vessel disambiguation that includes a well-designed UI; this will be reviewed too.
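
As an illustration only, the pairing idea can be sketched in a few lines of Python. This is not the service CNR and FIN will specify: the function names, the 0.85 threshold and the sample names are assumptions made for the example, and the score is the plain string similarity ratio from the standard library's difflib, a stand-in for a real probability of similarity.

```python
from difflib import SequenceMatcher
from itertools import product


def normalise(name: str) -> str:
    """Lower-case a scientific name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def candidate_pairs(list_a, list_b, threshold=0.85):
    """Return (name_a, name_b, score) triples with score >= threshold.

    The score is difflib's similarity ratio in [0, 1]. It is only a
    stand-in for a 'probability of similarity', not a calibrated one.
    """
    pairs = []
    for a, b in product(list_a, list_b):
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    # Best matches first, so a curator reviews the most likely pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical extracts from two reference lists (one contains a misspelling).
list_one = ["Gadus morhua", "Thunnus albacares"]
list_two = ["Gadus morhua", "Thunnus albacores", "Salmo salar"]

matches = candidate_pairs(list_one, list_two)
for name_a, name_b, score in matches:
    print(f"{name_a!r} ~ {name_b!r}: {score}")
```

Comparing every name against every other name is quadratic in the list sizes, so a real service working on full reference lists would need blocking (for example by genus) or an index; the sketch only shows the shape of the output.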

Data availability depends on the number of plugins the infrastructure is equipped with, one plugin for each data source / provider. It is therefore evident that the first Use Case has to be operational.


Occurrence data access

CNR is already implementing a first schedule:

  • first, develop occurrence point data access, i.e. work on services giving access to occurrence points from a number of data providers. The occurrence data service will be based on a species name, and spatial and temporal parameters;
  • then, in a second phase, the taxonomic data.
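
The query parameters named above (species name, spatial and temporal parameters) can be made concrete with a small in-memory sketch. The record fields, the bounding-box convention and the sample data are assumptions for illustration; the actual service and the provider plug-ins are not described in this page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Occurrence:
    """A single occurrence point (hypothetical minimal record)."""
    species: str
    lat: float
    lon: float
    observed: date
    provider: str


def query(records, species, bbox, start, end):
    """Filter records by species name, bounding box and date range.

    bbox is (min_lat, min_lon, max_lat, max_lon); all bounds are inclusive.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    wanted = species.strip().lower()
    return [
        r for r in records
        if r.species.strip().lower() == wanted
        and min_lat <= r.lat <= max_lat
        and min_lon <= r.lon <= max_lon
        and start <= r.observed <= end
    ]


# Made-up sample records standing in for what provider plug-ins would return.
records = [
    Occurrence("Gadus morhua", 60.0, 5.0, date(2010, 5, 1), "OBIS"),
    Occurrence("Gadus morhua", 61.5, 4.0, date(2011, 7, 9), "GBIF"),
    Occurrence("Gadus morhua", 10.0, 5.0, date(2010, 5, 1), "GBIF"),
    Occurrence("Salmo salar", 60.5, 5.5, date(2010, 6, 1), "OBIS"),
]

hits = query(records, "gadus morhua", (55.0, 0.0, 65.0, 10.0),
             date(2010, 1, 1), date(2010, 12, 31))
```

The simple bounding box does not handle regions that cross the antimeridian; a real service would have to.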


Occurrence Data Reconciliation

There may be overlaps and gaps between the datasets contained in 2 (or more) repositories. With millions of occurrence records, support is needed to identify both the gaps and overlaps, not only at data level, but also at dataset level.

  • OBIS and GBIF can serve data through an 'occurrences service';
  • The project partners have to consider how to define an 'occurrence service' for 'singleton'/'duplicate' identification;
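
One possible reading of 'singleton'/'duplicate' identification is sketched below, under stated assumptions: records are plain dictionaries, two records are treated as the same occurrence when species, date and coordinates rounded to three decimals agree, and the field names are invented for the example. The partners may well settle on a different matching rule.

```python
from collections import Counter


def record_key(rec, decimals=3):
    """Comparison key: species, rounded position and date (an assumed rule)."""
    return (
        rec["species"].strip().lower(),
        round(rec["lat"], decimals),
        round(rec["lon"], decimals),
        rec["date"],
    )


def reconcile(repo_a, repo_b):
    """Data-level comparison: keys found in both repositories or in one only."""
    keys_a = {record_key(r) for r in repo_a}
    keys_b = {record_key(r) for r in repo_b}
    return {
        "duplicates": keys_a & keys_b,
        "only_in_a": keys_a - keys_b,
        "only_in_b": keys_b - keys_a,
    }


def dataset_coverage(repo_a, repo_b):
    """Dataset-level view: share of each dataset in repo_a also found in repo_b."""
    keys_b = {record_key(r) for r in repo_b}
    totals, shared = Counter(), Counter()
    for r in repo_a:
        totals[r["dataset"]] += 1
        if record_key(r) in keys_b:
            shared[r["dataset"]] += 1
    return {d: shared[d] / totals[d] for d in totals}


# Made-up records; "ds1" and "ds2" are hypothetical dataset identifiers.
repo_a = [
    {"species": "Gadus morhua", "lat": 54.5, "lon": 3.25, "date": "2010-05-01", "dataset": "ds1"},
    {"species": "Gadus morhua", "lat": 55.0, "lon": 3.0, "date": "2010-05-02", "dataset": "ds1"},
    {"species": "Salmo salar", "lat": 60.0, "lon": 5.0, "date": "2011-01-01", "dataset": "ds2"},
]
repo_b = [
    {"species": "gadus morhua", "lat": 54.5001, "lon": 3.25, "date": "2010-05-01", "dataset": "x"},
    {"species": "Thunnus albacares", "lat": 0.0, "lon": 0.0, "date": "2009-01-01", "dataset": "y"},
]

result = reconcile(repo_a, repo_b)
coverage = dataset_coverage(repo_a, repo_b)
```

A dataset with coverage 1.0 is fully contained in the other repository, and one with coverage 0.0 is a gap there. Rounding coordinates is a crude tolerance: two nearby points can still fall on either side of a rounding boundary.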

CNR has plans to initiate work in May.


Occurrence data enrichment

By the end of April CNR expects to complete the activity on 'occurrence point access'. The enrichment will come in a subsequent phase, also because it depends on the results of other clusters (namely the Geo-spatial one).

The occurrence data enrichment would see a user call a service that, in either on-line or batch mode, takes a set of spatio-temporal parameters and a set of occurrence points, and queries an external environmental data repository to extract geospatially explicit information. For example, for 10,000 points, the nearest 1000 sea surface temperatures are interpolated over a 1-month period and returned as average, max, min and std for each point.

For flagging outliers on land, gazetteers are available; however, in a marine environment the notion of space is different, and iMarine can contribute truly innovative solutions.

CNR sees a role for the other project partners and the iMarine Board to guide the classification of occurrence points, e.g. survey data rather than specimen.
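
A minimal sketch of the sea surface temperature example follows, assuming observations are simple (lat, lon, date, value) tuples held in memory. It selects the observations within the time window, keeps the k nearest by great-circle distance, and reports summary statistics; it does not interpolate, and the function names, the tuple layout and the sample values are all invented for the illustration.

```python
import math
import statistics
from datetime import date, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def enrich(point, observations, k=1000, window_days=30):
    """Summarise the k nearest observations within the time window.

    point is (lat, lon, date); observations are (lat, lon, date, value).
    Returns None when no observation falls inside the window.
    """
    lat, lon, when = point
    half_window = timedelta(days=window_days / 2)
    in_window = [o for o in observations if abs(o[2] - when) <= half_window]
    nearest = sorted(in_window, key=lambda o: haversine_km(lat, lon, o[0], o[1]))[:k]
    values = [o[3] for o in nearest]
    if not values:
        return None
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
        "std": statistics.pstdev(values),  # population standard deviation
    }


# Made-up observations: two close in space and time, one far away, one out of window.
sst = [
    (0.0, 0.1, date(2012, 6, 10), 10.0),
    (0.0, 0.2, date(2012, 6, 20), 12.0),
    (0.0, 5.0, date(2012, 6, 15), 20.0),
    (0.0, 0.05, date(2012, 1, 1), 30.0),
]
summary = enrich((0.0, 0.0, date(2012, 6, 15)), sst, k=2)
```

Sorting every observation for every point is fine for a toy example but would not scale to 10,000 points against a large repository; a spatial index or a gridded product would be the natural next step.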

Resources and Constraints (The Inputs)

The iMarine project was designed with a clear vision on the need for support to challenging data access and management scenarios. It also anticipated that specialized resources would have to be identified after the project started, e.g. in establishing collaborations with specialized departments in project partners' institutions (FAO, IRD), and related EA-CoP projects such as AgInfra. A quick assessment of some potential resources to include can be used to identify the next steps to bring them to the e-Infrastructure. The resources in this project that can be included are listed by contributing project partner.

The tables below list the resources by:

  • Name: a short identifier;
  • Source: a URL or other resource identifier;
  • MoSCoW: Must, Should, or Would the resource be exploitable through the e-Infrastructure;
  • Purpose: in what scenario / Use Cases the resource is needed.

OBIS - Use case description, data provider and developer.

Name  | Source        | MoSCoW | Purpose
WoRMS | Link to WoRMS | Must   | Marine Species Occurrences


Taxonomic data:

  • WoRMS
  • Catalogue of Life
  • ITIS
  • IRMNG
  • NCBI (=GenBank)

[OBIS taxonomy is already available, but should not be used as a source for taxonomy – OBIS is a ‘consumer’ of the taxonomy, not an authoritative source; same with GBIF]

Biogeographical data:

  • GBIF
  • OBIS

[There are several more ‘thematic sub-networks’ of GBIF – such as VertNet. We could check whether GBIF has all these data of VertNet, just as we can do for OBIS. Let’s concentrate on OBIS to build the tools, then afterwards we can try and ‘sell’ this to the VertNet people. ‘VertNet’ is Vertebrates Network, and is itself a network with specialised nodes dealing with mammals, reptiles, fish, birds.]

Environmental data: I will leave this for Fabrice and others to fill in. The datasets I have been using are:

  • ETOPO (I’m using the 1 minute grid); we could also load GEBCO (seems to be better these days). There are a number of parameters we might want to derive from the bathymetry (such as rugosity, slope and aspect).
  • WOD [I am not using it for OBIS at the time being]
  • WOA [derived from the WOD; it would be good to check with the NODC people how they do this – instead of reducing WOD to the regular-spaced points/intervals of the WOA, we might try and adapt their algorithms to give us the best value for the position and time of the OBIS data]. The one thing missing from WOA is pH – which is very important.
  • Others we could include: distance from ice (can be derived from datasets available from a NOAA web site); ocean colour (as a proxy for production); distance from coast and from 200-m depth contour…
  • Last but not least: IPCC predictions for change of temperature, oxygen and pH; apparently there are some models that predict these not only for the surface but over the water column.

This list is likely to grow – it would make sense to have it on a Wiki page.

FAO - Use case description, data provider

CNR - Tools and application provider, developer

CRIA - Tools and application provider, developer

FIN - .....

Specific constraints are the low level of expertise in gCube technology development in the EA-CoP and with some partners that have developed biodiversity tools. In addition, many data are volatile or incomplete, and will require specialized curation.

Strategy and Actions (from Inputs to Outputs)

This schedule will have to be further elaborated, and discussed with the iMarine Board in May. Their response can then be used at the TCom in Greece in June.

The goals and objectives were defined and discussed at the iMarine Board meeting in Rome in March. There, it was also decided that a biodiversity cluster be established to define objectives, and prepare outlines for VREs, applications and services. These will then be presented to the iMarine Board and the wider EA-CoP (May 2012).

In June, the results from the EA-CoP consultation will be discussed at the TCom, to establish feasibility, usability, and usefulness of the identified Use Cases and components.

The feedback from the TCom and technical boards will then be discussed with the iMarine Board and selected EA-CoP representatives for follow-up actions.

Meanwhile, project partners can already spend effort on the first 4 Use Cases: taxon name discovery, taxon name reconciliation, occurrence data access, and occurrence data reconciliation. Together these cover the Taxon service, data access to biodiversity data repositories, and the discovery and download of species occurrence data.

Appendices (Planned Effort, Resources, Meeting notes, Schedule and Others)

Planned effort & resources

FAO

OBIS

CNR

CRIA

FIN ...


Meeting notes

Schedule