Ecosystem Approach Community of Practice: TaxonReconciliation

From D4Science Wiki
Revision as of 18:19, 4 September 2012 by Anton.ellenbroek

Taxon Reconciliation
The taxon matching facility will allow users in the biodiversity community to load their datasets describing species and to reconcile the names and other descriptive features, using matching algorithms, against a pre-defined range of structured taxonomic name repositories. The output of the process will be a set of identifiers marking mismatches between the entries in the private dataset and those in the remote repositories. The environment will also let users modify their taxonomic entries, persist the changes they have made, and publish their datasets to other e-infrastructure users, notifying subscribed e-mail addresses of when and which data have been reconciled.
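As a rough illustration of the matching step, the Python sketch below compares local names with a reference checklist and flags each entry as an exact match, a fuzzy match or a mismatch. It assumes plain string similarity from the standard-library difflib module with an arbitrary 0.85 cutoff; the names MatchResult, normalise and reconcile are hypothetical, and the matching algorithms actually used by the facility are not specified here.

```python
import difflib
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class MatchResult:
    local_name: str
    matched_name: Optional[str]  # None when nothing in the reference list is close enough
    score: float                 # similarity ratio in [0, 1]
    status: str                  # "exact", "fuzzy" or "mismatch"


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.split()).lower()


def reconcile(local_names: Iterable[str], reference_names: Iterable[str],
              threshold: float = 0.85) -> List[MatchResult]:
    """Match each local name against a reference checklist (hypothetical helper)."""
    by_key = {normalise(n): n for n in reference_names}
    results = []
    for name in local_names:
        key = normalise(name)
        if key in by_key:
            results.append(MatchResult(name, by_key[key], 1.0, "exact"))
            continue
        # Best reference candidate whose similarity ratio reaches the threshold.
        close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=threshold)
        if close:
            score = difflib.SequenceMatcher(None, key, close[0]).ratio()
            results.append(MatchResult(name, by_key[close[0]], score, "fuzzy"))
        else:
            # No candidate is similar enough: report the entry as a mismatch.
            results.append(MatchResult(name, None, 0.0, "mismatch"))
    return results
```

For example, reconciling "Thunus obesus" against a list containing "Thunnus obesus" yields a fuzzy match, while a name with no close counterpart is reported as a mismatch. Normalisation here only removes case and whitespace differences; authorship strings and synonymy are not handled by this sketch.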
Priority to CoP
Proposed solution priority, following the iMarine Board priority-setting criteria:
  • The taxon matching facility targets the biodiversity community involved in reconciling differences in taxonomic description between datasets. However, several use cases have been identified beyond this target community. FAO already uses a very similar approach to entity mapping for its vessel registry, and is willing to contribute to, and exploit, functionality emerging from this product;
  • First-level users (data owners) will be from FIN, IOC Unesco and FAO. Second-level users (data managers) will be partners of FIN and IOC Unesco involved in maintaining and managing taxonomies. Third-level users (consumers) could potentially be any biologist in need of access to detailed taxonomic information;
  • Potential for co-funding: this can only be assessed after a prototype has been released (Q3 2013);
  • Structural allocation of resources: to be discussed;
  • The harmonization services are referred to in the DoW (T3.3, WP6 and WP9) and are thus highly relevant to the project objectives;
  • Business Case 2, supporting the biodiversity community, requires reconciliation services so that the EA-CoP can interpret observational data across reporting and monitoring systems;
  • The proposed product aims to re-use gCube components for data access and management. In addition, it will provide specific, high-value computational services to biodiversity data providers, enabling them to reduce a specific and demanding workload. These two aspects support the sustainability of the e-infrastructure by reducing development and operational costs for the EA-CoP;
  • The consistency and compatibility of the product with EA-CoP regulations and strategies (e.g. INSPIRE) have yet to be investigated;
  • The services re-use the components developed in the context of the Biodiversity Research Environment, and focus on increasing their usability by adding matching algorithms between selected taxonomic entities, a means of persisting the results in a re-usable and re-accessible format, and sharing and notification services. The benefits to taxonomists lie in single-point data harmonization (eliminating duplicated work and multiple error-prone data entry steps), data processing (e.g. matching large datasets is very demanding), and flexibility. The aim is to offer all users a biodiversity 'compatibility' tool that allows them to use other organizations' taxonomic data with trust: trust in the structure, quality, and completeness of the imported data.
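The persistence and notification features described in this list could be sketched as follows. This is a minimal example, assuming that results are stored as JSON and that a plain-text summary is enough for subscribers; ReconciledEntry, to_json, from_json and notification_summary are hypothetical names, not existing gCube APIs.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ReconciledEntry:
    local_name: str
    reference_name: Optional[str]  # None when no match was found
    provider: str                  # e.g. "OBIS", "GBIF", "CoL"
    status: str                    # e.g. "exact", "fuzzy", "mismatch", "confirmed"


def to_json(entries: List[ReconciledEntry]) -> str:
    """Serialise results so they can be stored and re-loaded later."""
    return json.dumps([asdict(e) for e in entries], indent=2)


def from_json(text: str) -> List[ReconciledEntry]:
    """Rebuild entries from a previously persisted JSON document."""
    return [ReconciledEntry(**item) for item in json.loads(text)]


def notification_summary(dataset_id: str, entries: List[ReconciledEntry]) -> str:
    """Short text for subscribers describing what was reconciled (hypothetical format)."""
    counts = Counter(e.status for e in entries)
    parts = ", ".join(f"{n} {status}" for status, n in sorted(counts.items()))
    return f"Dataset {dataset_id}: {len(entries)} entries reconciled ({parts})"
```

JSON is chosen here only because it round-trips through the standard library; any re-accessible store would serve the same purpose.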
Parentage
The relation to existing CoP software and manual activities is evident:
  1. OBIS;
  2. Tony Rees' TaxonMatcher;
  3. FAO vessel registry tool.
Relation to D4S technologies: CNR to advise.
Productivity
Are the proposed measures effective?
Does it reduce a known workload?
Presentation
How must the component be delivered to users? (UI Design / on-line help / training material / support)

The community requires that several services are available for data source registration and communication:

  1. Reference data providers: taxonomic data repositories such as OBIS, GBIF, CoL, ...
  2. User management
  3. Data dissemination / feed-back services
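A minimal in-memory sketch of these registration services follows. It assumes that users carry one of the three roles described for this product (data owner, data manager, consumer) and that dissemination is driven by per-dataset e-mail subscriptions; the Registry class and its methods are hypothetical, and any endpoint URL passed to it is a placeholder rather than a real service address.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed roles, mirroring the three user levels (owner, manager, consumer).
ROLES = {"owner", "manager", "consumer"}


@dataclass(frozen=True)
class ReferenceProvider:
    name: str        # e.g. "OBIS", "GBIF", "CoL"
    endpoint: str    # hypothetical placeholder URL, not a real service address


class Registry:
    """In-memory registry of providers, users and dataset subscriptions."""

    def __init__(self) -> None:
        self.providers: Dict[str, ReferenceProvider] = {}
        self.users: Dict[str, str] = {}             # user id -> role
        self.subscribers: Dict[str, Set[str]] = {}  # dataset id -> e-mail addresses

    def register_provider(self, provider: ReferenceProvider) -> None:
        self.providers[provider.name] = provider

    def register_user(self, user_id: str, role: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.users[user_id] = role

    def subscribe(self, dataset_id: str, email: str) -> None:
        # A set ignores duplicate subscriptions for the same address.
        self.subscribers.setdefault(dataset_id, set()).add(email)

    def recipients(self, dataset_id: str) -> List[str]:
        """Addresses to notify when dataset_id has been reconciled, sorted."""
        return sorted(self.subscribers.get(dataset_id, set()))
```

A production service would persist this state and authenticate users; the sketch only shows how the three service groups relate to each other.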

For the actual reconciliation work, they expect to be able to step through the process in five steps:

  1. Data load: a Darwin Core (Archive) dataset is loaded into a user account;
  2. Data reconciliation: select the target data provider and the taxonomic rank from which to match downward, and boost or remove matching levels (e.g. boost family-level matching, ignore subspecies information);
  3. Data processing information;
  4. Data review, with options to manually confirm or edit matches;
  5. Data sharing and feedback, as a report to data owners.
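Steps 1 and 2 can be sketched in Python under some simplifying assumptions: the core data file is read as tab-delimited text whose header row already uses Darwin Core term names, and the score is the weighted fraction of ranks that agree, starting from a chosen rank and going down. The weighting scheme and the names load_taxa and rank_score are hypothetical; they only illustrate what boosting or removing matching levels might mean.

```python
import csv
import io
from typing import Dict, List

# Darwin Core classification terms, from highest to lowest rank.
RANKS = ["kingdom", "phylum", "class", "order", "family",
         "genus", "specificEpithet", "infraspecificEpithet"]


def load_taxa(tsv_text: str) -> List[Dict[str, str]]:
    """Step 1 (simplified): read a tab-delimited core file into dicts.

    A real Darwin Core Archive is a zip whose meta.xml maps data-file columns
    to Darwin Core terms; here the header row is assumed to hold the terms.
    """
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def rank_score(local: Dict[str, str], reference: Dict[str, str],
               start_rank: str = "family", weights: Dict[str, float] = None,
               ignore: tuple = ()) -> float:
    """Step 2 (hypothetical scheme): weighted agreement from start_rank down.

    `weights` boosts individual ranks (default weight 1.0); ranks listed in
    `ignore` are skipped, as are ranks empty in both records.
    """
    weights = weights or {}
    total = agree = 0.0
    for rank in RANKS[RANKS.index(start_rank):]:
        if rank in ignore:
            continue
        a = (local.get(rank) or "").strip().lower()
        b = (reference.get(rank) or "").strip().lower()
        if not a and not b:
            continue
        w = weights.get(rank, 1.0)
        total += w
        if a == b:
            agree += w
    return agree / total if total else 0.0
```

For example, a local record filed under a different genus from the reference record, but with the same family and specific epithet, scores 2/3 with default weights and 0.8 with weights={"family": 3.0}, because the agreeing family rank then counts three times. Passing ignore=("infraspecificEpithet",) drops subspecies information from the comparison.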
Policy
Are there any policies available that describe data access and sharing?
Have the Copyright / attribution / metadata / legal aspects been addressed from a user and technology perspective?