3.10.2013 BiolDiv

From D4Science Wiki

Meeting 3 October 2013, 11:00 am

Google Hangout

Present: Nicolas, Casey, Anton, Lino, GP, Fabio, Edward

Notes

BiOnym

Two of Casey’s approaches have been deployed on the iMarine infrastructure through the Statistical Manager. Both will become available to users once some technical issues have been sorted out.

Edward is still working on a prototype for BiOnym in R. It will be made available early next week, although it is unlikely to be finished by then. Hopefully it will nonetheless help clarify the vision.

GP discusses how he sees the integration between Casey’s work and Fabio’s. The original concept was to have a cascade of ‘switches/matchers’ with successively more relaxed matching criteria, so that more and more names from the test dataset match names from the reference list. At each step, the matched names are either sent on to the next switch or sent to the post-processing step. In such an approach, the switches that incorporate expert knowledge of taxonomic nomenclature (such as the GSAy approach) should come first, followed by the more general switches based on a general lexical approach (such as Fabio’s specimen.jar). There would be a single, linear set of switches between pre-processing and post-processing.
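The linear cascade described here can be illustrated with a minimal Python sketch. It is not the BiOnym implementation, and it does not reproduce the GSAy approach or specimen.jar: the three matchers (exact, case-insensitive, fuzzy via the standard library's difflib) are hypothetical stand-ins ordered from strict to relaxed. The routing rule is also an assumption: a name that finds a match leaves the cascade for post-processing, and only names that are still unmatched move on to the next switch.

```python
import difflib
from typing import Callable, Dict, List, Optional, Tuple

# A matcher receives one name and the reference list and returns the matched
# reference name, or None if it finds no match. (Hypothetical interface.)
Matcher = Callable[[str, List[str]], Optional[str]]


def exact_matcher(name: str, reference: List[str]) -> Optional[str]:
    """Strictest switch: the name must appear verbatim in the reference list."""
    return name if name in reference else None


def case_insensitive_matcher(name: str, reference: List[str]) -> Optional[str]:
    """Slightly relaxed switch: ignore differences in letter case."""
    lowered = name.lower()
    for ref in reference:
        if ref.lower() == lowered:
            return ref
    return None


def fuzzy_matcher(name: str, reference: List[str]) -> Optional[str]:
    """Most relaxed switch: closest reference name above a similarity cutoff."""
    hits = difflib.get_close_matches(name, reference, n=1, cutoff=0.8)
    return hits[0] if hits else None


def run_cascade(
    names: List[str], reference: List[str], matchers: List[Matcher]
) -> Tuple[Dict[str, Tuple[str, int]], List[str]]:
    """Run the matchers in order; each step only sees still-unmatched names.

    Returns (matched, unmatched), where matched maps each input name to
    (reference name, index of the step that matched it).
    """
    matched: Dict[str, Tuple[str, int]] = {}
    remaining = list(names)
    for step, matcher in enumerate(matchers):
        still_unmatched = []
        for name in remaining:
            hit = matcher(name, reference)
            if hit is None:
                still_unmatched.append(name)   # goes on to the next switch
            else:
                matched[name] = (hit, step)    # goes to post-processing
        remaining = still_unmatched
    return matched, remaining
```

Recording the index of the step that produced each match gives the post-processing step a simple indication of how relaxed the criteria were when the match was found.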

GP suggests expanding this to a process with at least two cascades running in parallel, the outcomes of which would be compared via an ‘ensemble’ analysis. A post-processing step would compare and consolidate the results from the different flows. Taxon name matches found by more than one cascade have a higher probability of being ‘good’ matches.
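As a rough illustration of the consolidation step, the Python sketch below merges the outputs of several cascades and records how many of them agree on each match. The function name, the data layout (one dictionary per cascade, mapping an input name to its matched reference name) and the agreement fraction are assumptions made for this example. The fraction is a crude agreement score, not a calibrated probability. The sketch covers only the comparison, not running the cascades in parallel.

```python
from typing import Dict, List, Tuple


def consolidate(
    results_by_cascade: Dict[str, Dict[str, str]]
) -> Dict[str, Tuple[str, float]]:
    """Merge the matches of several cascades (hypothetical helper).

    results_by_cascade maps a cascade label to that cascade's result, which
    maps each input name to the reference name it was matched to.
    Returns, per input name, the best-supported reference name and the
    fraction of cascades that proposed it.
    """
    # input name -> candidate reference name -> labels of supporting cascades
    votes: Dict[str, Dict[str, List[str]]] = {}
    for label, result in results_by_cascade.items():
        for name, candidate in result.items():
            votes.setdefault(name, {}).setdefault(candidate, []).append(label)

    n_cascades = len(results_by_cascade)
    consolidated: Dict[str, Tuple[str, float]] = {}
    for name, candidates in votes.items():
        # Most supported candidate wins; sorting first makes ties deterministic
        # (the alphabetically first candidate is kept).
        best = max(sorted(candidates), key=lambda c: len(candidates[c]))
        consolidated[name] = (best, len(candidates[best]) / n_cascades)
    return consolidated
```

With two cascades, a match proposed by both receives an agreement of 1.0 and a match proposed by only one receives 0.5, which mirrors the idea that matches found by more than one cascade deserve more trust.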

It was agreed by all that GP’s idea is well worth exploring, but priority should for now go to the linear, sequential approach. As soon as this is functional, it would be relatively easy to define single linear cascades that operate on either domain knowledge alone or general lexical matching alone, and then to combine the results of several such cascades, thereby implementing GP’s ensemble concept.

Fabio suggests investigating how Casey’s work can be incorporated into his specimen.jar approach, which is itself already a series of matchers. Placing matchers based on the GSAy work in front of the existing series could be one way of integrating both approaches.

Species Distribution Modelling

A call dedicated to the 4D interpolation needed for Environmental Data Enrichment is planned for tomorrow, 4 October 2013.

GP and Edward have made some progress on the SDM activities on cod; these will now be put on hold until after the TDWG meeting at the end of October.

Next meeting

The next meeting will be held on 10 October 2013 at 11:00 am. Lino will call the meeting to order; Edward will send a reminder earlier that day.