12.12.2013 BiolDiv

From D4Science Wiki

Meeting 12 December 2013, 11:00 am

Google Hangout

Present: Nicolas, Fabio, Anton, Edward

Notes

Development of BiOnym

Nicolas prepared a presentation with a mock-up of the interface for bionym-biodiv (the simple interface for bionym taxon name matching), and some points for discussion, available from http://goo.gl/OJiuJt. In preparation for this, Nicolas visited several web sites with taxon name matching systems, including the present bionym-biodiv one on the statistical manager.

Some issues raised during our discussion:

  • User should be able to set encoding
  • Interface should be able to accommodate both 'structured' and 'unstructured' data; the interface has to allow the user to specify the structure of their data set. Ideally, part of the interface should be dynamic, and be different for structured and unstructured data. For example, it does not make sense to apply a parser to structured data, where the information is already in separate fields, so the user should not be asked to choose one either
    • in unstructured data, both name proper and authority are on one line, in a single string
    • structured data has the authority in a separate field from the name proper; the name proper can be further separated into genus name, specific epithet, infraspecific... For the time being, the name proper will be reconstructed as a single string in those cases where the name is split into several parts.
  • Interface should allow selecting several TAFs rather than just one
  • Include authority in matching or not?
  • Allow fuzzy matching, stemming... Though for the simple interface, this might be a choice we make on behalf of the end user, and build into the system as default
  • How many matching names to return for each of the names to be tested; results should not be separated per matcher, as the user would then have to understand how the internals of our matching process work, which we don't expect for the simple interface.
  • Postprocessing should be seen as separate from matching/processing, also in the user interface. Possible things to configure in terms of postprocessing:
    • which fields to return (e.g., include classification? Include valid name? Include known synonyms to allow query expansion in other systems?...)
    • how to return the results: only as .csv, or also open the csv and display results in the interface?...
  • System should return second file with metadata: all settings of the matching process, version of the authority files and the software...
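
As an illustration of the structured/unstructured point above, here is a minimal Python sketch of how both kinds of input could be brought to a common form, with the name proper rebuilt as a single string when it arrives in several parts. The names NameRecord, from_structured and from_unstructured are hypothetical and are not part of BiOnym; this is only one possible way to do it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameRecord:
    """Hypothetical container for one input name; not part of BiOnym."""
    name: str                        # name proper as a single string
    authority: Optional[str] = None  # None when not supplied as a separate field


def from_structured(genus, epithet="", infraspecific="", authority=None):
    """Rebuild the name proper as one string from its separate parts."""
    parts = [p.strip() for p in (genus, epithet, infraspecific) if p and p.strip()]
    return NameRecord(name=" ".join(parts), authority=authority or None)


def from_unstructured(line):
    """Keep name and authority in one string; only collapse stray whitespace.

    Splitting off the authority is left to whichever parser the user selects.
    """
    return NameRecord(name=" ".join(line.split()))
```

With this shape, the parser choice only needs to be offered when from_unstructured is the entry point, which matches the idea of a dynamic interface.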

In order not to overwhelm the user, it might be wise to have the configuration on several tab sheets, or sections...

Settings should be re-usable

  • saved (automatically?) with a given data set and presented as default when the matching is done a second time on those same data
  • saved under a named configuration file, to be re-used with other data sets in the future
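
A possible way to make settings re-usable is sketched below in Python, assuming the settings are kept as a simple JSON file. The setting names and default values in DEFAULT_SETTINGS are assumptions made up for this example, loosely following the issues listed above; nothing here reflects an agreed file format.

```python
import json
from pathlib import Path

# Hypothetical defaults; keys and values are illustrative only.
DEFAULT_SETTINGS = {
    "encoding": "utf-8",
    "structured": False,
    "tafs": [],
    "match_authority": False,
    "max_matches": 5,
}


def save_settings(settings, path):
    """Write the settings to a named configuration file as JSON."""
    text = json.dumps(settings, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def load_settings(path):
    """Read a configuration file; fall back to defaults for anything missing."""
    p = Path(path)
    stored = json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
    return {**DEFAULT_SETTINGS, **stored}
```

The same two functions would cover both cases: a file stored next to a given data set and loaded as the default on the next run, or a named file picked by the user for other data sets.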

Nicolas revised the presentation, now available as file BiOnymInterface_v3_131213.pptx from http://goo.gl/ELRiem

Next meeting

Thursday 19 December 2013, 11 am. Please confirm availability ASAP; if too many people can't make it, we'll look for a better date and time.