16.01.2014 BiolDiv

Latest revision as of 15:36, 16 January 2014

Meeting 16 January 2014, 11:00 am

Google Hangout

Present: Edward, Anton, Fabio, GP

Notes

BiOnym Performance

GP presents the performance of the BiOnym workflow on the benchmark datasets provided by Edward. The performance is reported for several lengths of the output list and using two parsers (Dima's GNI and Fabio's SIMPLE). The report highlights the following points:

  • the BiOnym workflow (WF) outperforms WoRMS Taxamatch in all configurations
  • the WF using Fabio's SIMPLE parser performs better than the one using the GNI parser
  • the WF using the SIMPLE parser with only the Levenshtein distance as matcher is the best system at recognizing the automatically generated species names (this is not true for real names)
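As an illustration of the last point, the following is a minimal sketch of a Levenshtein-only matcher that returns an output list of a given length. It is not the BiOnym implementation: the function names, the lowercasing step, the tie-breaking rule and the example reference names are all assumptions made for this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def top_k_matches(query: str, reference_names: list[str], k: int = 10) -> list[str]:
    """Return the k reference names closest to the query (hypothetical helper).

    Ties are broken alphabetically; this is an assumption of the sketch.
    """
    ranked = sorted(
        reference_names,
        key=lambda name: (levenshtein(query.lower(), name.lower()), name),
    )
    return ranked[:k]


# Illustrative reference list, not taken from the benchmark datasets.
reference = ["Gadus morhua", "Gadus macrocephalus", "Thunnus thynnus"]
best = top_k_matches("Gadus morua", reference, k=1)
```

Varying `k` corresponds to the different output-list lengths for which the performance is reported.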

Fabio gives a possible explanation for the different behavior of the SIMPLE and GNI parsers: the GNI parser is better on complex but well-formatted inputs. Edward says the observed behavior of the parsers is not surprising. Edward suggests contacting other people involved in taxa matching to look for collaborations; more specifically, we should seek collaboration with the people around GNI if we want to keep track of all the misspellings we have resolved, as suggested by Yde de Jong.

Anton asks for one BiOnym recognizer to be made available on both the FAO and the i-Marine websites.

Next Steps

GP has to:

  • double-check the WoRMS Taxamatch performance (GP just noticed there was a mistake in the description of the WoRMS Web Service which could have influenced the evaluation)
  • produce a matrix reporting the number of complementary errors made by two pairs of Matchers inside BiOnym (to estimate to what extent Levenshtein can replace the entire WF)
  • expand the matrix with the performance of single-matcher WFs; the table currently reports the performance for 10 names returned; we need similar tables for 6, 2 and 1 names returned
  • introduce Beam-Search options on the Statistical Manager
  • develop a local/fast version of BiOnym to be used by the websites
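One possible reading of the complementary-errors matrix is sketched below: for each ordered pair of matchers (A, B), count the benchmark names that A gets wrong and B gets right. The data layout, the function name and the matcher labels are hypothetical, and the toy results are made up for illustration only; they are not evaluation figures.

```python
from itertools import permutations


def complementary_errors(results: dict[str, dict[str, bool]]) -> dict[tuple[str, str], int]:
    """Count, for each ordered pair (a, b), the names a misses and b resolves.

    `results` maps a matcher label to {input name: True if the correct
    name appears in that matcher's output list}. This layout is an
    assumption of the sketch.
    """
    matrix = {}
    for a, b in permutations(results, 2):
        matrix[(a, b)] = sum(
            1
            for name, correct in results[a].items()
            if not correct and results[b].get(name, False)
        )
    return matrix


# Toy, made-up outcomes for two hypothetical matchers on three inputs.
toy_results = {
    "matcher_A": {"name1": True, "name2": False, "name3": False},
    "matcher_B": {"name1": False, "name2": True, "name3": False},
}
matrix = complementary_errors(toy_results)
```

A low count in the cell (Levenshtein, other matcher) would suggest that the other matcher adds little beyond what Levenshtein already resolves.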

Fabio has to:

  • check with FIN whether their implementation of GSAy with the YASMEEN framework can replace the currently running one

FIN has to:

  • give updates on the status of the BiOnym web interface
  • give feedback on the effectiveness of their GSAy implementation with the YASMEEN framework

Next meeting

Thursday 23 January 2014, 11 am.