16.01.2014 BiolDiv


Meeting 16 January 2014, 11:00 am

Google Hangout

Present: Edward, Anton, Fabio, GP

Notes

BiOnym Performance

GP presents the performance of the BiOnym workflow on the benchmark datasets provided by Edward. The performance is reported for several lengths of the output list and using two parsers (Dima's GNI and Fabio's SIMPLE). The report highlights the following points:

  • the BiOnym WF outperforms the WoRMS Taxamatch in all configurations
  • the WF using Fabio's SIMPLE parser performs better than the one using the GNI parser
  • the WF using the SIMPLE parser with only the Levenshtein distance as matcher is the best system at recognizing the automatically generated species names (this does not hold for real names)
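For reference, the single-matcher configuration mentioned in the last point can be pictured with the minimal sketch below. It is not the BiOnym implementation: the function names, the lower-casing step and the small reference list are assumptions made only for illustration. It ranks reference names by Levenshtein distance to the input and returns the k closest, which corresponds to the "length of the output list" used in the report.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def top_matches(query, reference_names, k=10):
    """Return the k reference names closest to the query, nearest first."""
    q = query.lower()
    ranked = sorted(reference_names,
                    key=lambda name: (levenshtein(q, name.lower()), name))
    return ranked[:k]


# Hypothetical reference list, for illustration only.
reference = ["Gadus morhua", "Gadus macrocephalus",
             "Thunnus thynnus", "Thunnus albacares"]
print(top_matches("Gadus morhau", reference, k=2))
```

A plain edit distance of this kind handles character-level typos well, which may be why it does best on automatically generated misspellings while falling behind on real names.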

Fabio gives a possible explanation for the different behavior of the SIMPLE and GNI parsers: the GNI parser is better on complex but well-formatted inputs. Edward notes that the observed behavior of the parsers is not surprising. Edward suggests contacting other people involved in taxa matching to look for collaborations; more specifically, we should seek collaboration with the people around GNI if we want, as per Yde de Jong's suggestion, to keep track of all the misspellings we have resolved.

Anton asks to have one BiOnym recognizer available on both the FAO and the i-Marine websites.

Next Steps

GP has to:

  • double check the WoRMS Taxamatch performance (GP just noticed there was a mistake in the description of the WoRMS Web Service, which could have influenced the evaluation)
  • produce a matrix reporting the number of complementary errors made by pairs of Matchers inside BiOnym (to estimate to what extent Levenshtein alone can substitute for the entire WF)
  • extend the matrix with the performance of single-matcher WFs; the table currently reports the performance for 10 names returned, and we need similar tables for 6, 2 and 1 names returned
  • introduce Beam-Search options on the Statistical Manager
  • develop a local/fast version of BiOnym to be used by the websites
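One possible reading of the complementary-errors matrix in GP's list is sketched below. It assumes each matcher's errors are available as a set of input names it failed on; the matcher names other than Levenshtein and the toy error sets are placeholders, not evaluation results. For every ordered pair (a, b) it counts the names that a got wrong and b got right.

```python
from itertools import permutations


def complementary_errors(errors_by_matcher):
    """Map each ordered pair (a, b) to the count of names a missed but b matched."""
    return {
        (a, b): len(errors_by_matcher[a] - errors_by_matcher[b])
        for a, b in permutations(errors_by_matcher, 2)
    }


# Placeholder error sets (hypothetical matcher names and inputs).
errors = {
    "levenshtein": {"name3", "name7"},
    "matcher_A": {"name3", "name5", "name9"},
    "matcher_B": {"name7"},
}

matrix = complementary_errors(errors)
for (a, b), count in sorted(matrix.items()):
    print(f"{a} wrong, {b} right: {count}")
```

Under this reading, low counts in the rows where Levenshtein is the failing matcher would suggest that the other Matchers rarely recover names Levenshtein misses, which is the question the matrix is meant to answer.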

Fabio has to:

  • check with FIN whether their implementation of GSAy with the YASMEEN framework can replace the one currently running

FIN has to:

  • give updates about the status of the BiOnym web interface
  • give feedback on the effectiveness of their GSAy implementation with the YASMEEN framework

Next meeting

Thursday 23 January 2014, 11 am.