14.11.2013 BiolDiv
From D4Science Wiki
Revision as of 14:38, 16 November 2013 by Edward.vberghe
Meeting 14 November 2013, 11:00 am
Google Hangout
Present: GP, Lino, Fabio, Casey, Nicolas
Notes
- GP presented the performance file he sent.
- using an improved interface that lets the user select 5 matchers in any order (up to 20 possible);
- using a GSAy implementation with YASMEEN’s matchlets.
- The documentation will need to specify how the data are used in the workflow for each matcher: whether all matchers use all data, and only the analysis of the results discards wrong matches;
- or whether the unmatched output of one matcher becomes the input of the next;
- or whether, as in Bionym, the user has the choice.
- In any case, the first approach may not be realistic when comparing the 300,000 fish names of GBIF to the 87,000 of FishBase.
- It seems that FUZZYMATCH must be revised, as it generates unreliable ranks;
- The YASMEEN jar needs to go on a "diet" (it is currently 12 MB);
- Fabio will test Casey’s GSAy implementation.
- The current GSAy has to be replaced with Casey’s version.
- The main topic of next week will be to organise the testing and final performance evaluation.
- We need a benchmark made up of a list of raw strings with their associated correct transcriptions. GP will produce this for fin fishes and will calculate the performance with respect to WoRMS. GP will discuss this with Edward.
- CNR needs to support the import of DwC archives and the on-the-fly supply of these files to the workflow.
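As an aid to the discussion, the two data-flow options raised above (all matchers on all data, versus unmatched output feeding the next matcher) and the benchmark scoring can be sketched as follows. This is only a minimal illustration under stated assumptions: the matcher functions, the function names, and the three-name sample are hypothetical and are not the YASMEEN, GSAy, or Bionym API, and the choice of precision and recall as measures is an assumption, not something decided in the meeting.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumption: a matcher maps a raw name to a reference name, or None if no match.
Matcher = Callable[[str], Optional[str]]


def make_exact_matcher(reference: Iterable[str]) -> Matcher:
    """Hypothetical matcher: case-sensitive exact lookup."""
    names = set(reference)
    return lambda raw: raw if raw in names else None


def make_normalising_matcher(reference: Iterable[str]) -> Matcher:
    """Hypothetical matcher: ignores case and extra whitespace."""
    index = {" ".join(n.lower().split()): n for n in reference}
    return lambda raw: index.get(" ".join(raw.lower().split()))


def run_all_on_all(names: Iterable[str],
                   matchers: List[Matcher]) -> Dict[str, List[Optional[str]]]:
    """Option 1: every matcher sees every name; wrong matches are discarded later."""
    return {name: [m(name) for m in matchers] for name in names}


def run_cascade(names: Iterable[str],
                matchers: List[Matcher]) -> Dict[str, Optional[str]]:
    """Option 2: names left unmatched by one matcher are the input of the next."""
    results: Dict[str, Optional[str]] = {}
    pending = list(names)
    for matcher in matchers:
        still_unmatched = []
        for name in pending:
            hit = matcher(name)
            if hit is None:
                still_unmatched.append(name)
            else:
                results[name] = hit
        pending = still_unmatched
    for name in pending:  # nothing matched these names
        results[name] = None
    return results


def score(results: Dict[str, Optional[str]],
          benchmark: Dict[str, str]) -> Dict[str, float]:
    """Score results against a benchmark of raw string -> correct name."""
    attempted = [raw for raw in benchmark if results.get(raw) is not None]
    correct = [raw for raw in attempted if results[raw] == benchmark[raw]]
    precision = len(correct) / len(attempted) if attempted else 0.0
    recall = len(correct) / len(benchmark) if benchmark else 0.0
    return {"precision": precision, "recall": recall}


# Illustrative data only: two reference names and three raw strings.
reference = ["Gadus morhua", "Salmo salar"]
benchmark = {
    "Gadus morhua": "Gadus morhua",
    "  salmo  SALAR ": "Salmo salar",
    "Gadus morhau": "Gadus morhua",  # misspelling that neither matcher resolves
}
matchers = [make_exact_matcher(reference), make_normalising_matcher(reference)]

all_results = run_all_on_all(benchmark, matchers)
cascade_results = run_cascade(benchmark, matchers)
scores = score(cascade_results, benchmark)
```

In the cascade, each matcher is only called on the names its predecessors failed to match, so the number of calls shrinks at every step; in the all-on-all option it is always the number of names times the number of matchers, which is the cost concern raised for the GBIF and FishBase lists.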
Next meeting
Thursday 21 November, 11:00 am. Topics for this meeting:
- Hear from Nicolas about the TDWG meeting (as both Anton and Edward missed the 14 November meeting)
- Further discuss development, and priorities for the immediate future
- Expectations for the end-of-project delivery; what functionality can we make available?
- Long-term strategy