13.02.2014 BiolDiv

Meeting 13 February 2014, 11:00 am

Google Hangout

Present: Lino, Fabio, GP, Edward; Anton joined later

Notes

Update on web interface

Nicolas circulated a suggested update for the web interface by email. Unfortunately, neither Nicolas nor Casey was on the call, so an update on the implementation of the interface will have to wait until the next meeting.

Fabio created an interface for matching code lists at FAO: http://figisapps.fao.org/vrmf/comet/codelists/mapping/; this could also serve as inspiration for the BiOnym interface.

Versions of BiOnym; validation

There are currently 3 versions of the interface/workflow available:

  • a general one, excluding taxonomy-specific features such as GSAy, for matching with general authority files
  • BiOnym local, with a standard workflow: the interface geared towards 'naive' users
  • the complete configurable workflow, which accepts only a single name as input.

There is also a complete version available that allows long input lists; this interface is documented at https://gcube.wiki.gcube-system.org/gcube/index.php/Statistical_Manager_Tutorial. Edward will check this out ASAP. This version should allow the biologists to explore BiOnym and to validate the results.

There was a request from Ward Appeltans for information, and for a test run of 100 names new to OBIS. Results were discussed between GP and Ward, and led to a request from GP for an extra test at VLIZ; results of this extra test are pending. Some of the conclusions from this initial run (extracted from the email exchange 'first 100 unmatched snames', 30 January 2014):

  • BiOnym does not report infraspecific names, but resolves to the species level. To be investigated further; this might require some substantial work on the parser.
  • BiOnym is about resolving spelling mistakes/lexical variations, not synonym resolution; the latter could be incorporated as a post-processing step, but is not BiOnym proper at this point.

Article

GP made good progress. To be emulated by the other people in the author team!!

GP's test runs on real misspellings and on artificial misspellings generated by random character substitutions lead to very different conclusions. Edward created a set of artificial misspellings that should be a better mimic of real misspellings. GP will report on the conclusions later; for the article, we'll probably stick to the earlier results.
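For reference, generating artificial misspellings by random character substitution could look roughly like the sketch below. This is only an illustration, not the script used for the test runs: the function name substitute_chars, the restriction to ASCII replacement letters and the n_subs parameter are all assumptions.

```python
import random
import string


def substitute_chars(name, n_subs=1, rng=None):
    """Return a copy of `name` with `n_subs` letters replaced at random.

    Hypothetical helper: if the name has fewer than `n_subs` letters,
    every letter is replaced.
    """
    rng = rng or random.Random()
    chars = list(name)
    # Only letters are candidates, so the space between genus and
    # species epithet is never touched.
    positions = [i for i, c in enumerate(chars) if c.isalpha()]
    for i in rng.sample(positions, min(n_subs, len(positions))):
        # Pick a replacement that differs from the original letter.
        alternatives = [c for c in string.ascii_lowercase if c != chars[i].lower()]
        new = rng.choice(alternatives)
        # Preserve the case of the original letter.
        chars[i] = new.upper() if chars[i].isupper() else new
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(2014)  # fixed seed so a test set can be regenerated
    for name in ["Gadus morhua", "Salmo salar"]:
        print(name, "->", substitute_chars(name, n_subs=2, rng=rng))
```

Uniform substitutions like these treat every letter and every position as equally likely to be wrong, which is one plausible reason why they behave differently from real misspellings.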

TaxaMatch returns a maximum of four hits; we did our test runs for BiOnym with the maximum number of returned hits set to 10, 6, 2 and 1. To maximise comparability between BiOnym and TaxaMatch, we might want to consider running the test with a maximum of four hits returned by BiOnym as well.
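If ranked hit lists from the earlier runs are kept, the effect of a four-hit cap can be estimated without rerunning anything, by truncating each list before scoring. A minimal sketch, assuming the results are available as a mapping from each query name to its ranked list of returned names; the function name recall_at_k and both data structures are hypothetical and do not reflect the actual BiOnym or TaxaMatch output format.

```python
def recall_at_k(results, expected, k=4):
    """Fraction of queries whose expected name is among the first k hits.

    results:  dict mapping query name -> ranked list of returned names
    expected: dict mapping query name -> the correct reference name
    """
    if not results:
        return 0.0
    found = sum(
        1 for query, hits in results.items()
        if expected[query] in hits[:k]  # truncate the ranked list to k hits
    )
    return found / len(results)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    results = {
        "Gadus morhau": ["Gadus morhua", "Gadus macrocephalus"],
        "Salmo sala": ["Salmo trutta", "Salmo salar"],
    }
    expected = {"Gadus morhau": "Gadus morhua", "Salmo sala": "Salmo salar"}
    for k in (10, 6, 4, 2, 1):
        print(k, recall_at_k(results, expected, k))
```

This only works if the ranking is the same regardless of the cap, i.e. a run with a maximum of 10 hits returns the same first four names as a run capped at four; that would need to be checked.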

Next meeting

Thursday 20 February 2014, 11 am.