18.07.2013 Biodiv

From D4Science Wiki

Date: 18-07-2013 12:00 - 13:00

Topics:

  • R scripts and versions
  • Casey and TaxaMatch
  • Comparing and positioning the tools
  • RuleFrame

Participants

  • E. vanden Berghe, GP Coro, A.Ellenbroek, P.Pagano, N.Bailly, C.Aldemita.

Notes

  1. Use of R
    1. The e-infrastructure mainly uses R 2.14, and 2.11 in RStudio. Upgrading to R 3.0.1 would be a costly effort, but in the long run it will be unavoidable. There are versions of R available in the Debian repositories that could be used to keep the R versions up to date.
    2. EvB uses an Eclipse plugin. He will test his scripts with RStudio 3.11. He asks to investigate (as a low priority) also making Eclipse and the StatET plugin available.
  2. Coro reports on FIN's TaxaMatch, the first external algorithm to be added to the Statistical Manager.
    1. FIN is still working on refinements, compliant with the iMarine Maven approach.
  3. EvB discussed the Finetti diagram and the relations between the components.
    1. Some trouble with Fabio’s software. Plan to optimize the parameters to tune the software.
    2. Integrate Dima’s (Dmitry Mozzherin, working for GNI, Global Names Initiative) parser, and offer a selection of parsers. Compare results between TaxaMatch and the other tools.
    3. Proposes to use the GNI parser where the input structure is unknown. In other cases the FAO tool may produce better results. However, in some cases (e.g. names with a sub-genus) the FAO tool performs poorly. A combination of both, with the GNI parser as a pre-processing tool for Fabio’s lexical matcher, might be the best solution.
    4. MaxEnt was proposed by Coro to estimate lexical matching.
    5. R could be considered.
    6. EvB asks for documentation on simulated annealing.
    7. The usability of the Finetti to compare results was discussed.
  4. Nicolas (difficult to understand)
    1. Will provide a database with historic and currently valid names
    2. EvB asks for a misspelling list to share with FAO for testing.
  5. Progress with Environmental data enrichment
    1. no news from NODC; EvB to re-contact them
    2. News from Julien: IRD uses a commercial tool for interpolation/kriging; the tool is not available at Terradue or in the D4Science infrastructure. Julien can help to establish contact between relevant people at IRD and whoever is going to implement the interpolation (CNR??)
    3. Still to create a priority-list of environmental layers. Investigate what relevant layers/resolutions are available from MyOcean (http://www.myocean.eu/)
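
Illustration of the combination proposed in note 3.3 (a parser as pre-processing step in front of a lexical matcher). This is only a minimal sketch under stated assumptions: it uses neither the GNI parser, nor the FAO tool, nor TaxaMatch. The regular expression, the function names (parse_name, canonical, best_match) and the 0.8 threshold are hypothetical stand-ins, and the matcher is Python's generic difflib similarity.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, much simplified pattern: Genus (Subgenus) species Authorship.
# A real name parser handles far more cases than this.
NAME_RE = re.compile(
    r"^\s*(?P<genus>[A-Z][a-z]+)"
    r"(?:\s*\((?P<subgenus>[A-Z][a-z]+)\))?"
    r"(?:\s+(?P<species>[a-z]+))?"
    r"(?:\s+(?P<authorship>.+?))?\s*$"
)


def parse_name(raw):
    """Split a raw name string into parts; return None if it does not fit."""
    m = NAME_RE.match(raw)
    return m.groupdict() if m else None


def canonical(parts):
    """Build 'Genus species', dropping sub-genus and authorship."""
    return " ".join(p for p in (parts["genus"], parts["species"]) if p)


def best_match(raw, reference, threshold=0.8):
    """Parse first, then fuzzy-match the canonical form against a list.

    Returns (matched_name_or_None, similarity_score).
    """
    if not reference:
        return (None, 0.0)
    parts = parse_name(raw)
    # Fall back to the raw string when the parser cannot handle the input.
    query = canonical(parts) if parts else raw.strip()
    scored = [
        (SequenceMatcher(None, query.lower(), ref.lower()).ratio(), ref)
        for ref in reference
    ]
    score, ref = max(scored)
    return (ref, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    names = ["Gadus morhua", "Thunnus albacares"]
    print(parse_name("Gadus (Gadus) morhua Linnaeus, 1758"))
    print(best_match("Gadus morrhua L.", names))
```

Stripping the sub-genus and authorship before matching is the point of the pre-processing step: the matcher then compares only the part of the name that is expected to appear in the reference list.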

Follow-up actions

  • Test RStudio: access is solved, but the version may be a problem.
  • Position the different tools in workflows.
  • Study MaxEnt for simulated annealing
  • Share mis-spelling list
  • Work on Rule Frame