Ecosystem Approach Community of Practice: SpeciesNameFinder

From D4Science Wiki

Elena Balestri of FAO is in charge of uploading data to the Fisheries website, but regularly encounters issues with species / taxon names that contain spelling errors.

We therefore seek a “mini”-VRE (Virtual Research Environment) that offers the following facilities on a per-file basis:

  1. Upload a CSV file with 5 columns: id, plus ‘arrays’ of names of target species, associated species, discard species and protected species. (The arrays here are comma-separated strings.)
  2. Split the ‘arrays’ (normalize over all columns, which is a new feature), or enable another feature to identify the string values between consecutive commas.
    1. (I would normalize to a table with columns id / speciesType / name.)
    2. I would also add some columns to hold the results: returnName / returnSource / error.
  3. For each string, use the ICIS CLM to match against the CLM species. Accept ‘some level’ of discrepancy (e.g. 3 wrong characters for name strings longer than 8 characters).
    1. After this first check, allow users to continue manually on the ASFIS list
    2. Fill the columns returnName / returnSource / error
    3. After this check is complete, ask if user wants to continue
  4. For all unmatched records, perform a similar match against WoRMS names
    1. ONLY perform this for the records where no name was found in ASFIS
    2. First find matching names, using a similar discrepancy acceptance as above (or taxamatch)
    3. Allow a manual search phase after the automatic phase has ended
  5. Allow users to override any values added to the returnName column.
    1. If such an action is performed, ensure that the returnSource and error fields are also updated
    2. Maintain for these records a roll-back feature
  6. Generate a return set with some metadata on the process
    1. For example: of x records, y were 100% matched against ASFIS automatically, z were partially matched against ASFIS, etc.
    2. Generate a denormalized data file identical to the input, but with one column each for matched and unmatched values
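
The normalization, tolerant matching and summary steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the ICIS CLM or WoRMS implementation: the column names (target, associated, discard, protected), the function names, the use of plain edit distance as the discrepancy measure, storing that distance in the error column, and requiring an exact match for names of 8 characters or fewer are all assumptions. The reference lists are passed in as plain Python lists; no real ASFIS or WoRMS service is called.

```python
import csv
import io
from collections import Counter

# Assumed column names; the request only specifies "id" plus four species 'arrays'.
SPECIES_COLUMNS = ["target", "associated", "discard", "protected"]


def normalize(csv_text):
    """Split each comma-separated 'array' into one row per id / speciesType / name."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for species_type in SPECIES_COLUMNS:
            for name in (record.get(species_type) or "").split(","):
                if name.strip():
                    rows.append({"id": record["id"], "speciesType": species_type,
                                 "name": name.strip(), "returnName": None,
                                 "returnSource": None, "error": None})
    return rows


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def allowed_errors(name):
    # From the request: e.g. 3 wrong characters for names longer than 8 characters.
    # Requiring an exact match for shorter names is an assumption.
    return 3 if len(name) > 8 else 0


def best_match(name, reference_names):
    """Return (reference name, distance) for the closest name within tolerance, else None."""
    key = name.casefold()
    scored = [(edit_distance(key, ref.casefold()), ref) for ref in reference_names]
    if not scored:
        return None
    distance, ref = min(scored)
    return (ref, distance) if distance <= allowed_errors(name) else None


def match_rows(rows, sources):
    """Try each (source name, reference list) in order; later sources only see unmatched rows."""
    for source_name, reference_names in sources:
        for row in rows:
            if row["returnName"] is not None:
                continue
            hit = best_match(row["name"], reference_names)
            if hit:
                row["returnName"], row["error"] = hit  # error holds the edit distance (assumption)
                row["returnSource"] = source_name
    return rows


def summarize(rows):
    """Count exact, partial and unmatched rows per source for the return-set metadata."""
    counts = Counter()
    for row in rows:
        if row["returnName"] is None:
            counts["unmatched"] += 1
        else:
            kind = "exact" if row["error"] == 0 else "partial"
            counts[(row["returnSource"], kind)] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up input and reference lists, for illustration only.
    sample = ('id,target,associated,discard,protected\n'
              '1,"Gadus morhau,Thunnus alalunga",Solea solea,Xyz,\n')
    rows = match_rows(normalize(sample),
                      [("ASFIS", ["Gadus morhua", "Thunnus alalunga"]),
                       ("WoRMS", ["Solea solea"])])
    for row in rows:
        print(row)
    print(summarize(rows))
```

Passing the sources as an ordered list keeps the rule that the second list is consulted only for records the first one left unmatched. The manual search phases, the user override with roll-back, and the denormalized output file are not covered by this sketch.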

In order to further structure Elena’s request, can we please agree on the following activities:

  1. Inform the consortium of the planned activity in the next PEB (22 April);
  2. Describe the use case, expected activities and benefits on an iMarine products page (22 April – Anton, with update after PEB);
  3. Once the page is described and reviewed, contact developers for an assessment of implementation costs (02 May earliest);
  4. Review the cost / benefits at PEB and SB level (end May);
  5. Implement the feature as a VRE (Virtual Research Environment), if permission has been granted by project management (TBD).