21.02.2014 BiolDiv

From D4Science Wiki

Revision as of 12:39, 21 February 2014

Meeting 20 February 2014, 11:00 am

Google Hangout

Present: Fabio, Nicolas, Casey, Edward

Notes

Going over yesterday's meeting notes

Nicolas will contact Aaike; FADA, the taxonomic reference used by BioFresh, could be one of the Taxonomic Authority Files for BiOnym.

Edward will contact Yde, Tony, GBIF (Markus, Tim) and Dima to brief them on progress with BiOnym; we'll wait until we have a functional implementation (including a user-friendly GUI and documentation) before contacting the wider biodiversity community.

User Interface

Casey presents the user interface as it stands now - see https://www.dropbox.com/s/4u22kyukj5srx95/Screenshot%202014-02-21%2018.24.38.png. Some points raised during the discussion:

  • Parser choice should be through a combo box, not a text box, and include the option 'NIL' for no parsing (for datasets where the authority has been split off the name proper already)
  • Input: offer a choice between the following (see Dima's parser for an example, http://gni.globalnames.org/parsers/new):
    • enter several names by cut-and-paste in a text box, each name (including authority) on a separate line
    • upload a file
  • Output: users should be able to choose how they want to receive the results:
    • in the interface for very short lists of names, when we can expect the results to be available fast enough to make this feasible
    • a download link emailed later, as soon as the matching process finishes (as is done, for example, in the World Ocean Database; see WODselect)
    • in the workspace for users who have an iMarine account
  • include examples of what the distances between names are, and explain the other options through help links
  • make the number of matchers variable, by adding an 'Add matcher' button and starting out with just one; up to a maximum of five matchers?

For the time being the interface is not functional; Casey will contact GP directly to sort out any remaining problems.

Article

Nicolas started on the introduction and will incorporate his parts into the Google document soon (the work was done off-line). Nicolas and Edward will agree on who writes what during a one-on-one Skype call on Monday.

We have to decide how far we want to go with BiOnym now, in terms of the complexity of the name strings we want the system to be able to handle. The bottleneck is mainly the parser. The 'Simple' parser currently only looks for uninomina (genera or above) and species. The GNI parser is more sophisticated, but is very sensitive to formatting and capitalisation errors. Edward and Nicolas will explore the GNI parser further, and will also produce pseudo-code or another structured explanation of how to read and interpret the results from the GNI parser.

For names including more than just a simple specific epithet, we can leave all the epithets in one string (i.e. all the nomenclatural 'atoms' following the genus name, but before the authority). This is done by FishBase.
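As an illustration of the "all epithets in one string" approach, the following is a minimal sketch that splits a name string into genus, a single epithets string, and authority. It assumes a hypothetical heuristic that is not taken from BiOnym, FishBase or the GNI parser: the authority starts at the first token after the genus that begins with an uppercase letter or '('. Lowercase rank markers such as "var." or "subsp." therefore stay in the epithets string. Authorities that begin in lowercase (e.g. "de Candolle") would be misparsed.

```python
import re
from typing import NamedTuple


class ParsedName(NamedTuple):
    genus: str      # first token (a uninomen on its own, or the genus of a species name)
    epithets: str   # every token between the genus and the authority, kept as one string
    authority: str  # everything from the first author-like token onward (may be empty)


# Hypothetical heuristic: an authority token starts with an uppercase letter or '(',
# e.g. "Linnaeus," or "(Linnaeus, 1758)". Not the rule used by any real parser.
_AUTHORITY_START = re.compile(r"^[A-Z(]")


def split_name(name_string: str) -> ParsedName:
    """Split a name string into genus, one combined epithets string, and authority."""
    tokens = name_string.split()
    if not tokens:
        raise ValueError("empty name string")
    genus, rest = tokens[0], tokens[1:]
    epithets = []
    for i, token in enumerate(rest):
        if _AUTHORITY_START.match(token):
            # The first author-like token: the rest of the string is the authority.
            return ParsedName(genus, " ".join(epithets), " ".join(rest[i:]))
        epithets.append(token)
    # No authority found: all remaining tokens are epithets (possibly none).
    return ParsedName(genus, " ".join(epithets), "")


print(split_name("Salmo trutta fario Linnaeus, 1758"))
```

With this split, a trinomen such as "Salmo trutta fario" is matched on "trutta fario" as a whole, rather than on each epithet separately.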

For hybrids, in principle we should parse the name string and replace it with two scientific names, one for each of the 'parents' of the hybrid: our TAFs don't list hybrids, but in principle they do list each of the parents separately. Dealing with this situation in BiOnym is probably too complicated at this point. For the time being, if we find that we're dealing with a hybrid (e.g. because it's flagged as such by the GNI parser, or because we've detected an 'x' between two spaces), we should not even try to match it (we would only increase the number of false positives).
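The "'x' between two spaces" rule can be sketched as a simple pre-filter that removes hybrids before the matchers run. This is a minimal sketch under stated assumptions: it only implements the space-delimited marker check (the GNI parser's own hybrid flag is not modelled), and accepting the multiplication sign '×' alongside 'x' is an addition that is not in the notes. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# A stand-alone hybrid marker surrounded by single spaces, as in "Salmo salar x Salmo trutta".
# Accepting '×' (U+00D7) as well as 'x' is an assumption, not part of the meeting notes.
_HYBRID_MARKER = re.compile(r" [x×] ")


def is_hybrid(name_string: str) -> bool:
    """Return True if the name contains a space-delimited hybrid marker."""
    return _HYBRID_MARKER.search(name_string) is not None


def names_to_match(name_strings: Iterable[str]) -> Iterator[str]:
    """Yield only the names that should go to the matchers; hybrids are skipped."""
    for name in name_strings:
        if not is_hybrid(name):
            yield name


print(list(names_to_match(["Salmo salar x Salmo trutta", "Gadus morhua"])))
```

An 'x' inside a word (e.g. "Xiphias") is not treated as a hybrid marker, because the check requires a space on both sides.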

Validation

Anton announces that the validation of the BiOnym activities has started, as this is a deliverable for iMarine due pretty soon. Anton and Edward are working on this together.

Next meetings

  • Monday 24 February, 11:00 am European time, for Nicolas and Edward
  • Thursday 28 February, 11:00 am, for the group