7.11.2013 BiolDiv

Meeting 7 November 2013, 9:30 am

Google Hangout

Present: GP, Fabio, Edward; Anton for the first half of the meeting. Casey and Nicolas had connectivity problems.

Notes

TDWG Meeting

Nicolas and GP both attended the TDWG meeting in Firenze, 28 October - 1 November 2013. Nicolas gave a presentation on BiOnym, and GP one on the Statistical Manager.

Unfortunately Nicolas was not able to connect and brief us on the BiOnym presentation. GP attended that session and had useful discussions with other participants, who made several suggestions. The first was to apply pattern-matching techniques. Another was to make use of domain-expert knowledge, not only from taxonomic nomenclature but also from other domains such as OCR and lexicography.

BiOnym

GP reported on the work on interfaces to taxon matching now available on the D4Science infrastructure, through https://dev.d4science.org/group/devvre/sm (SM > Taxa). Two different versions of the BiOnym workflow are available: one is based on the workflow as presented by Edward in the bionym.R script; the other is more general and allows more customisation. GP invited comments. Two possible extensions of the present workflow are:

  • Allow for matching against more than one Taxonomic Authority File
  • Place the maximum number of matches for each of the matching operations under the control of the user
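
As an illustration of the two extensions listed above, here is a minimal sketch in Python of matching one name against several taxonomic authority files while letting the user cap the number of matches kept per file. The function name `match_name`, its parameters (`max_matches`, `cutoff`) and the representation of each authority file as a plain list of names are hypothetical. The scorer is `difflib.SequenceMatcher` string similarity, used only as a stand-in; it is not the BiOnym or YASMEEN matching algorithm.

```python
import difflib
from typing import Dict, List, Tuple


def match_name(name: str,
               authority_files: Dict[str, List[str]],
               max_matches: int = 3,
               cutoff: float = 0.8) -> Dict[str, List[Tuple[str, float]]]:
    """Hypothetical matcher: for each authority file, return at most
    `max_matches` candidates whose similarity to `name` is >= `cutoff`,
    best first. Similarity is difflib's ratio, a stand-in scorer."""
    results: Dict[str, List[Tuple[str, float]]] = {}
    query = name.lower()
    # Extension 1: iterate over more than one authority file.
    for source, names in authority_files.items():
        scored: List[Tuple[str, float]] = []
        for candidate in names:
            score = difflib.SequenceMatcher(None, query, candidate.lower()).ratio()
            if score >= cutoff:
                scored.append((candidate, round(score, 3)))
        # Highest similarity first.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        # Extension 2: the user controls how many matches are kept.
        results[source] = scored[:max_matches]
    return results
```

For example, `match_name("Gadus morua", {"file_a": ["Gadus morhua", "Thunnus thynnus"]}, max_matches=1)` keeps only "Gadus morhua" for `file_a`. Keeping results per authority file, rather than merging them, leaves it to a later step to decide how to reconcile matches from different sources.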

GP also used the infrastructure to run some performance experiments; some results are available at http://goo.gl/gdtVRg. We discussed performance and efficiency (computation time as a function of the number of names tested and the number of nodes used), and possible further experiments.

Fabio's matcher, formerly SpeciMEn and now renamed YASMEEN, is well documented at http://wiki.i-marine.eu/index.php/YASMEEN

We need to start analysing the effectiveness of different workflows; for this we urgently need to be able to apply the GSAy approach. Pending a final implementation of this approach from Casey, Fabio will incorporate an implementation in YASMEEN. As soon as the GSAy approach is available, we can start some experiments, such as comparing BiOnym's performance and effectiveness with Tony Rees' taxamatch.

Apparently the bionym.R script, written as a prototype illustrating a vision of taxon name matching, is not as effective a communication tool as initially hoped. Edward will create some pages on the wiki to serve as a basis for a proper specification.

Next meeting

Thursday 14 November, 11:00 am. Topics for this meeting:

  • Hear from Nicolas about the TDWG meeting
  • Further discuss development and priorities for the immediate future
  • Expectations for the end-of-project delivery; what functionality can we make available?
  • Long-term strategy