Difference between revisions of "14.11.2013 BiolDiv"
From D4Science Wiki
Latest revision as of 11:39, 18 November 2013
Meeting 14 November 2013, 11:00 am
Google Hangout
Present: GP, Fabio, Casey, Nicolas
Notes
- GP presented the performance file he sent: http://goo.gl/W4w5KX
  - using an improved interface with selection of 5 matchers in the order you want (up to 20 possible);
  - using a GSAy implementation with YASMEEN’s matchlets.
- The documentation will need to specify how the data are used in the workflow for each matcher: whether all matchers use all data, and only the result analysis discards wrong matches;
  - or whether the non-matched output of one matcher becomes the input to be matched by the next;
  - or whether, as in BiOnym, the user has the choice.
  - In any case, the first approach may not be realistic when comparing the 300,000 fish names of GBIF to the 87,000 of FishBase.
- It seems that FUZZYMATCH must be brought in line with the rest of the matchlets: it currently reports only binary results, while the other matchlets report a finer-grained score;
- The YASMEEN jar has to go on a "diet" (it is currently 12 MB);
- Fabio will test Casey’s GSAy implementation.
  - The current GSAy has to be replaced with Casey’s version.
- The main topic of next week will be to organise the testing and final performance evaluation.
  - We need a benchmark made up of a list of raw strings with associated correct transcriptions. GP will produce this for fin fishes and will calculate the performances with respect to WoRMS. GP will discuss this with Edward.
- CNR needs to support DwC archives import and on-the-fly supply of these files to the Workflow
- The BiOnym workflow is available here for testing purposes: https://dev.d4science.org/group/devvre/sm
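
To make the two data-flow options discussed above concrete, here is a minimal sketch in Python. It is not the BiOnym or YASMEEN API: the function names (exact_matcher, fuzzy_matcher, run_all_on_all, run_cascade), the 0.8 threshold and the use of difflib as a stand-in similarity measure are all assumptions made for illustration. Each hypothetical matcher returns graded scores rather than a binary hit, in the spirit of the finer-grained scores the matchlets report.

```python
from difflib import SequenceMatcher


def exact_matcher(name, reference):
    """Hypothetical matcher: score 1.0 if the name is in the reference list."""
    return {name: 1.0} if name in reference else {}


def fuzzy_matcher(name, reference, threshold=0.8):
    """Hypothetical matcher: graded similarity score for each close candidate.

    difflib is only a stand-in here; the threshold is an arbitrary assumption.
    """
    scores = {}
    for candidate in reference:
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score >= threshold:
            scores[candidate] = score
    return scores


def run_all_on_all(names, reference, matchers):
    """Option 1: every matcher sees every input name; filtering happens afterwards."""
    results = {}
    for name in names:
        best = {}
        for matcher in matchers:
            for candidate, score in matcher(name, reference).items():
                # Keep the highest score any matcher gave to this candidate.
                best[candidate] = max(score, best.get(candidate, 0.0))
        results[name] = best
    return results


def run_cascade(names, reference, matchers):
    """Option 2: names left unmatched by one matcher are the input of the next."""
    results = {}
    remaining = list(names)
    for matcher in matchers:
        unmatched = []
        for name in remaining:
            hits = matcher(name, reference)
            if hits:
                results[name] = hits
            else:
                unmatched.append(name)
        remaining = unmatched
    return results, remaining


if __name__ == "__main__":
    reference = ["Gadus morhua", "Salmo salar"]
    names = ["Gadus morhua", "Gadus morhu", "Xyz abc"]
    matched, leftover = run_cascade(names, reference, [exact_matcher, fuzzy_matcher])
    print(matched)
    print(leftover)
```

In the cascade, the more expensive fuzzy matcher only sees the names the exact matcher could not resolve. In the first option every matcher compares every input name with every reference name, which for 300,000 GBIF names against 87,000 FishBase names means 300,000 × 87,000 = 26.1 billion comparisons per matcher.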
Next meeting
Thursday 21 November, 11:00 am. Topics for this meeting:
- Hear from Nicolas about the TDWG meeting (as both Anton and Edward missed the 14 November meeting)
- Further discuss development and priorities for the immediate future
  - Further development of benchmarks
- Expectations for the end-of-project delivery; what functionality can we make available?
- Long-term strategy