24.10.2013 BioDiv
Taxon name matching
Hangout meeting Thursday 24th October 2013 – 11:00-12:00 CET
Participants: Caselyn Aldemita, Nicolas Bailly, Gianpaolo Coro, Anton Ellenbroek, Fabio Fiorellato
Matchers vs Matchlets
After a discussion between Gianpaolo and Fabio on Wednesday 23rd Oct. 2013, a difference between the workflow approaches of SpecinaME and BiOnym was clarified:
- Matchlets: one matching method is applied to one atomic component of a full scientific name (G, S, A, Y) (SpecinaME's approach).
- Matchers: one matching method is applied to the full scientific name (BiOnym's approach).
The Matcher behavior can be emulated in SpecinaME by properly setting the parameters that activate the relevant Matchlets.
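The distinction can be illustrated with a minimal Python sketch. This is not SpecinaME or BiOnym code: the function names, the configuration format and the weights are hypothetical, and reading G, S, A, Y as genus, species, authority and year is an assumption. It only shows how a per-component configuration of Matchlets could approximate a whole-name Matcher.

```python
from difflib import SequenceMatcher

COMPONENTS = ("G", "S", "A", "Y")  # assumed: genus, species, authority, year


def exact(a, b):
    """Matchlet: case-insensitive exact comparison of one component."""
    return 1.0 if a.lower() == b.lower() else 0.0


def similarity(a, b):
    """Matchlet: fuzzy comparison of one component (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def run_matchlets(query, candidate, config):
    """Apply one Matchlet per active component and return the weighted mean.

    config maps a component key to (matchlet_function, weight); components
    missing from config are inactive. This format is hypothetical.
    """
    total_weight = sum(weight for _, weight in config.values())
    score = sum(weight * fn(query[key], candidate[key])
                for key, (fn, weight) in config.items())
    return score / total_weight


def full_name_matcher(query, candidate):
    """Matcher: one method applied to the full name string."""
    def join(name):
        return " ".join(name[key] for key in COMPONENTS)
    return similarity(join(query), join(candidate))


query = {"G": "Gadus", "S": "morhua", "A": "Linnaeus", "Y": "1758"}
candidate = {"G": "Gadus", "S": "morrhua", "A": "Linnaeus", "Y": "1758"}

# Emulating a Matcher: activate the same Matchlet on every component.
emulated = {key: (similarity, 1.0) for key in COMPONENTS}
print(run_matchlets(query, candidate, emulated))
print(full_name_matcher(query, candidate))
```

The two scores are close but not necessarily identical, because the emulation averages per-component scores while the Matcher compares the joined string.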
Transformations
In Edward’s R emulation, some Matchers also contain transformations (and indexations) of names that in principle should be run 1) independently for the TAFs, and 2) in the pre-processing phase for the datasets submitted by users.
However, the sequence of transformations may vary because the user can modify the setup during the run configuration phase.
Thus the transformations of the TAFs have to be run on the fly, which seems possible for the 250,000 names from OBIS/WoRMS, but may take too much time for CoL, for instance (or GNI/GNUB).
After discussion, it was decided to first implement ‘in’ SpecinaME the matchers that are quick and easy to implement, and to select one of a few usual sequences of transformations, or later one chosen from test results for the matchers that include such transformations. The issue will be revisited in a second round (e.g., by parallelising the transformation of the TAFs when it is run on the fly).
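As a rough illustration of a user-configurable transformation sequence applied on the fly, the sketch below uses Python. The transformation names and their definitions are hypothetical, since the ones used in the R emulation are not listed in these minutes. The same sequence would be applied to both the TAF names and the submitted names before matching.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Hypothetical transformations, registered by name so that a run
# configuration can list them in any order.
TRANSFORMS = {
    "lowercase": str.lower,
    "strip_punctuation": lambda s: re.sub(r"[^\w\s]", "", s),
    "collapse_spaces": lambda s: " ".join(s.split()),
    "collapse_double_letters":
        lambda s: re.sub(r"([a-z])\1+", r"\1", s, flags=re.IGNORECASE),
}


def apply_steps(step_names, name):
    """Apply the configured transformations to one name, in order."""
    for step in step_names:
        name = TRANSFORMS[step](name)
    return name


def transform_all(names, step_names, max_workers=None):
    """Transform a list of names, optionally spreading the work over a pool.

    With max_workers=None the names are processed sequentially.
    """
    if max_workers is None:
        return [apply_steps(step_names, n) for n in names]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the names.
        return list(pool.map(partial(apply_steps, step_names), names))


taf_names = ["Gadus  morrhua", "Solea solea (Linnaeus, 1758)"]
steps = ["lowercase", "strip_punctuation", "collapse_spaces",
         "collapse_double_letters"]
print(transform_all(taf_names, steps, max_workers=2))
```

A thread pool is used here only to keep the sketch simple. For CPU-bound transformations in Python, a process pool or a distributed execution would probably be needed to gain real speed on large name lists.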
BiOnym interface
Casey and Gianpaolo settled where to put the interface for viewing and improvement (not yet connected to functionalities). They will send the URL when it is installed. In any case, the final implementation will be a portlet in the Biodiversity Research Environment, calling the Workflow/Matchers/Matchlets from the Statistical Manager VRE (Virtual Research Environment).
TDWG presentation
Deadline for the final version: Tuesday 29 Oct. 2013, evening. The remaining work was distributed among the participants. A new version of the presentation will be posted in the workspace with the new indications (TDWG_BiOnym_131024-1.pptx in folder [1]).