18.07.2013 Biodiv

Meeting Notes: call on VALID (Skype)
Date: 18-07-2013 12:00 - 13:00
Topics:
* R scripts and versions
* Casey and TaxaMatch
* Comparing and positioning the tools
* RuleFrame

Participants

E. vanden Berghe, G.P. Coro, A. Ellenbroek, P. Pagano, N. Bailly, C. Aldemita.

Notes

Use of R
1. E-infra mainly uses R 2.14, and in RStudio 2.11. Upgrading to R 3.0.1 would be a costly effort, but will be unavoidable in the long run. The R versions available in the Debian repositories could be used to keep R up to date.
2. EvB uses an Eclipse plugin. He will test his scripts with RStudio 3.11. He asks to investigate (as a low priority) also making Eclipse and the StatET plugin available.
Coro reports on FIN's Taxamatch, the first external algorithm to be added to the Statistical Manager
1. FIN is still working on refinements, compliant with the iMarine Maven approach.
EvB discussed the Finetti diagram and the relations between the components.
1. Some trouble with Fabio’s software. Plan to optimize the parameters to tune the software.
2. Integrate Dima’s parser (Dmitry Mozzherin, working for GNI, Global Names Initiative), and offer a choice of parsers. Compare results between Taxamatch and the other tools.
3. Proposes to use the GNI parser where the input structure is unknown. In other cases the FAO tool may produce better results. However, the FAO tool performs poorly in some cases, e.g. with sub-genus. A combination of both, with the GNI parser as a pre-processing tool for Fabio’s lexical matcher, might be the best solution.
4. MaxEnt was proposed by Coro to estimate lexical matching.
5. R could be considered
6. EvB asks for documentation on simulated annealing.
7. The usability of the Finetti diagram to compare results was discussed.
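Illustration of the combination proposed under item 3 (parser as pre-processing, then lexical matching). This is only a minimal sketch: `parse_name` is a hypothetical regex-based stand-in, not the GNI parser, and `best_match` uses Python's standard `difflib` similarity, not Fabio's matcher or Taxamatch. The reference list and the 0.8 threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference list of valid names (assumption for this sketch).
REFERENCE = ["Gadus morhua", "Thunnus albacares"]


def parse_name(raw):
    """Stand-in parser: drop parenthesised parts (e.g. a sub-genus) and
    keep the genus plus the first all-lowercase token as the species."""
    text = re.sub(r"\([^)]*\)", " ", raw)
    tokens = text.split()
    if not tokens:
        return ""
    genus = tokens[0].capitalize()
    # Authorship ("Linnaeus,") and years ("1758") are not all-lowercase words.
    epithets = [t for t in tokens[1:] if t.islower()]
    return " ".join([genus] + epithets[:1])


def best_match(canonical, reference, threshold=0.8):
    """Stand-in lexical matcher: return the most similar reference name,
    or None if no candidate reaches the threshold."""
    best_ref, best_score = None, 0.0
    for ref in reference:
        score = SequenceMatcher(None, canonical.lower(), ref.lower()).ratio()
        if score > best_score:
            best_ref, best_score = ref, score
    return best_ref if best_score >= threshold else None


def match_name(raw, reference=REFERENCE):
    """Pre-process with the parser, then run the lexical matcher."""
    return best_match(parse_name(raw), reference)


if __name__ == "__main__":
    # Misspelled epithet with sub-genus and authorship in the input.
    print(match_name("Gadus (Gadus) morhau Linnaeus, 1758"))
```

In this sketch the sub-genus and authorship are removed before matching, so the lexical matcher only sees the canonical "Genus species" string.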
Nicolas (difficult to understand)
1. Will provide a database with historic and currently valid names
2. EvB asks for a misspelling list to share with FAO for testing

Progress with Environmental data enrichment
1. No news from NODC; EvB to re-contact them
2. News from Julien: IRD uses a commercial tool for interpolation/kriging; the tool is not available at Terradue or in the D4Science infrastructure. Julien can help to establish contact between the relevant people at IRD and whoever is going to implement the interpolation (CNR??)
3. Still to create a priority-list of environmental layers. Investigate what relevant layers/resolutions are available from MyOcean (http://www.myocean.eu/)

Follow-up actions

Test RStudio; access is solved, but the version may be a problem.
Position the different tools in workflows.
Study MaxEnt for simulated annealing
Share the misspelling list
Work on Rule Frame
