Ecosystem Approach Community of Practice: SpeciesNameFinder
From D4Science Wiki
Elena Balestri of FAO is in charge of uploading data to the Fisheries website, but regularly encounters issues with species / taxon names that contain spelling errors.
We therefore seek a facility in a “mini”-VRE (Virtual Research Environment) that offers the following features on a per-file basis:
- Upload a CSV file with 5 columns: id, plus ‘arrays’ of names of target species, associated species, discard species and protected species. (The arrays here are comma-separated strings.)
- Split the ‘arrays’ (normalizing over all columns, which is a new feature) or enable another feature to identify the string values between consecutive commas
- (I would normalize to a table with columns id / speciesType / name)
- I would also add some columns to hold the results; returnName / returnSource / error
- For each string, use the ICIS CLM to match against the CLM species. Accept ‘some level’ of discrepancy (e.g. 3 wrong characters for name strings longer than 8 characters)
- After this first check, allow users to continue manually on the ASFIS list
- Fill the columns returnName / returnSource / error
- After this check is complete, ask if user wants to continue
- For all unmatched records, perform a similar match against WoRMS names
- ONLY perform this for the records where no name was found in ASFIS
- First find matching names, using a similar discrepancy acceptance as above (or taxamatch)
- Allow a manual search phase after the automatic phase has ended
- Allow users to override any values added to the returnName column.
- If such an action is performed, ensure that the returnSource and error fields are also updated
- Maintain a roll-back feature for these records
- Generate a return set with some metadata on the process
- For example: of x records, y were 100% matched against ASFIS automatically, z were partially matched against ASFIS, etc.
- Generate a denormalized data file identical to the input, but with columns for matched and unmatched values
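To make the requested steps more concrete, the following is a minimal Python sketch of the normalize / match / report flow, not a proposed implementation. Several points are assumptions that the request does not settle: the file is taken to be semicolon-delimited with header names id, target, associated, discard and protected; the discrepancy is measured as plain edit distance (taxamatch or the ICIS CLM matcher could replace it); names of 8 characters or fewer must match exactly; the error column holds the number of differing characters; and the ASFIS and WoRMS lists are tiny hypothetical samples rather than the real reference data.

```python
import csv
import io

# Hypothetical header names for the four species 'array' columns.
SPECIES_COLUMNS = ["target", "associated", "discard", "protected"]


def normalize(csv_text):
    """Split the comma-separated 'arrays' into id / speciesType / name rows."""
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    for record in reader:
        for species_type in SPECIES_COLUMNS:
            for name in (record.get(species_type) or "").split(","):
                name = name.strip()
                if name:
                    rows.append({"id": record["id"], "speciesType": species_type,
                                 "name": name, "returnName": "",
                                 "returnSource": "", "error": ""})
    return rows


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def allowed_errors(name):
    # From the request: 3 wrong characters for names longer than 8 characters.
    # Assumption: shorter names must match exactly.
    return 3 if len(name) > 8 else 0


def match_names(rows, reference_names, source):
    """Fill returnName / returnSource / error for rows that are still unmatched."""
    for row in rows:
        if row["returnName"]:
            continue  # already matched against an earlier list
        query = row["name"].lower()
        best_name, best_distance = None, None
        for candidate in reference_names:
            distance = edit_distance(query, candidate.lower())
            if best_distance is None or distance < best_distance:
                best_name, best_distance = candidate, distance
        if best_name is not None and best_distance <= allowed_errors(row["name"]):
            row["returnName"] = best_name
            row["returnSource"] = source
            row["error"] = best_distance  # assumption: number of differing characters
        else:
            row["error"] = "no match"
    return rows


def summarize(rows):
    """Count exact, partial and unmatched records for the return metadata."""
    summary = {"total": len(rows), "exact": 0, "partial": 0, "unmatched": 0}
    for row in rows:
        if not row["returnName"]:
            summary["unmatched"] += 1
        elif row["error"] == 0:
            summary["exact"] += 1
        else:
            summary["partial"] += 1
    return summary


# Hypothetical sample input and reference lists, for illustration only.
sample = ("id;target;associated;discard;protected\n"
          "1;Thunus albacares,Gadus morhua;Katsuwonus pelamys;;\n")
asfis_sample = ["Thunnus albacares", "Katsuwonus pelamis"]
worms_sample = ["Gadus morhua"]

rows = normalize(sample)
match_names(rows, asfis_sample, "ASFIS")  # first pass
match_names(rows, worms_sample, "WoRMS")  # only rows still unmatched
summary = summarize(rows)
```

In this sketch the second call to match_names skips every record that already has a returnName, which mirrors the requirement to query WoRMS only where ASFIS gave no result. Manual override and roll-back are left out; they would need a history of previous values per record.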
In order to further structure Elena’s request, can we please agree on the following activities:
- Inform the consortium of the planned activity at the next PEB (22 April);
- Describe the use case, expected activities and benefits on an iMarine products page (22 April – Anton, with update after PEB);
- Once the page is described and reviewed, contact developers for an assessment of implementation costs (02 May earliest);
- Review the cost / benefits at PEB and SB level (end May);
- Implement the feature as a VRE, if permission has been granted by project management (TBD).