18.07.2013 Biodiv
From D4Science Wiki
Date: 18-07-2013 12:00 - 13:00
Topics:
- R scripts and versions
- Casey and TaxaMatch
- Comparing and positioning the tools
- RuleFrame
Participants
- E. vanden Berghe, GP Coro, A. Ellenbroek, P. Pagano, N. Bailly, C. Aldemita.
Notes
- Use of R
- The e-infrastructure uses mainly R 2.14, and in RStudio 2.11. Upgrading to R 3.0.1 would be a costly effort, but will be unavoidable in the long run. Versions of R available in the Debian repositories could be used to keep the R versions up to date.
- EvB uses an Eclipse plugin. He will test his scripts with RStudio 3.11. He asks to investigate (as a low priority) also making Eclipse and the StatET plugin available.
- Coro reports on FIN's TaxaMatch, the first external algorithm to be added to the Statistical Manager.
- FIN is still working on refinements, compliant with the iMarine Maven approach.
- EvB discussed the Finetti diagram and the relations between the components.
- Some trouble with Fabio’s software. Plan to optimize the parameters to tune the software.
- Integrate Dima’s (Dmitry Mozzherin, working for GNI, Global Names Initiative) parser, and offer a choice of parsers. Compare results between TaxaMatch and the other tools.
- Proposes to use the GNI parser where the input structure is unknown. In other cases the FAO tool may produce better results. However, the FAO tool performs poorly with, e.g., sub-genus. A combination of both, with the GNI parser as a pre-processing tool for Fabio’s lexical matcher, might be the best solution.
- MaxEnt was proposed by Coro to estimate lexical matching.
- R could be considered
- EvB asks for documentation on simulated annealing.
- The usability of the Finetti diagram to compare results was discussed.
- Nicolas (difficult to understand)
- Will provide a database with historic and currently valid names
- EvB asks for a misspelling list to share with FAO for testing
- Progress with Environmental data enrichment
- no news from NODC; EvB to re-contact them
- News from Julien: IRD uses a commercial tool for interpolation/kriging; the tool is not available at Terradue or in the D4Science infrastructure. Julien can help to establish contact between relevant people at IRD and whoever is going to implement the interpolation (CNR??)
- Still to create a priority-list of environmental layers. Investigate what relevant layers/resolutions are available from MyOcean (http://www.myocean.eu/)
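The combination discussed under Coro's report above, a name parser used as pre-processing for a lexical matcher, can be sketched roughly as follows. This is only an illustration under stated assumptions: `parse_name`, `best_match` and `match_raw_name` are hypothetical names, the parsing rules are toy rules and not the GNI parser, and the similarity score comes from Python's standard `difflib`, not from TaxaMatch or the FAO tool.

```python
import re
from difflib import SequenceMatcher


def parse_name(raw):
    """Toy stand-in for a scientific-name parser (NOT the GNI parser).

    Drops parenthesised parts (e.g. a sub-genus) and anything from the
    first token that is not an all-lowercase word (e.g. authorship, year).
    """
    name = re.sub(r"\([^)]*\)", " ", raw)
    tokens = name.split()
    if not tokens:
        return ""
    genus = tokens[0].capitalize()
    epithets = []
    for tok in tokens[1:]:
        if tok.islower() and tok.isalpha():
            epithets.append(tok)
        else:
            break  # assume authorship or year starts here
    return " ".join([genus] + epithets)


def best_match(canonical, reference_names, threshold=0.8):
    """Return (name, score) for the closest reference name.

    The name is None when the best score is below the threshold.
    The threshold value is an arbitrary assumption for this sketch.
    """
    best_name, best_score = None, 0.0
    for ref in reference_names:
        score = SequenceMatcher(None, canonical.lower(), ref.lower()).ratio()
        if score > best_score:
            best_name, best_score = ref, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


def match_raw_name(raw, reference_names, threshold=0.8):
    """Parse first, then match lexically against the reference list."""
    return best_match(parse_name(raw), reference_names, threshold)


if __name__ == "__main__":
    reference = ["Gadus morhua", "Thunnus albacares"]
    for raw in ["Gadus (Gadus) morhua Linnaeus, 1758", "Gadus morrhua L."]:
        print(raw, "->", match_raw_name(raw, reference))
```

Keeping the parser and the matcher as separate functions would make it straightforward to swap in another parser and compare results on the same misspelling list.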
Follow-up actions
- Test RStudio; access is solved, but the version may be a problem.
- Position the different tools in workflows.
- Study MaxEnt for simulated annealing
- Share the misspelling list
- Work on Rule Frame