10.09.2013 BiolDiv

From D4Science Wiki

Revision as of 10:09, 11 September 2013

Meeting notes: 10.09.2013 Afternoon

Participants: N.Bailly, E.Vanden Berghe, GP Coro, P.Pagano

Notes

Workflow for matching names and taxa

Elements and workflow:

  1. Submitted file type: Text, Unstructured names, Structured names
  2. Dima’s Parser: If the file type is Text or Unstructured names, apply Dima’s parser.
  3. Reference files: CoL, WoRMS, CofF, FB, SLB. Possibility to select one or several or all.
  4. Classification tree: Specify the taxon (or taxa) of the reference files within which to search.
  5. GSAy: Step by step approach: default selection of steps + possibility to remove/add some.
  6. TaxaMatch fuzzy matching (Tony Rees): Removing or replacing letters. Selection of letters. Possibility to select genus and/or species.
  7. Lexical distances (Fabio, Casey): Levenshtein, other? Selection of distances. Selection of threshold. Possibility to select genus and/or species.
  8. Soundex (Fabio, Casey): Java function. Selection of threshold. Possibility to select genus and/or species.
  9. Taxonomic disambiguation: Check if the matched name corresponds to one or several taxa.
  10. Taxonomic resolution (4D4Life classification comparison): Check whether the family provided by the user, if any, corresponds to the family in the reference file. If not, use the 4D4Life tool.
  11. For names that remain unmatched at the end: visual check of the genera within the lowest taxon provided (usually the family), with lists of species names by genus, by family, and by any other taxon provided.

If the taxon was restricted in step 4, propose extending the search to other taxa for the unmatched names, and restart the process.

In Steps 5 to 9, and within each of them, every matching event yields a set of matched names and a set of unmatched names. The matched names are proposed for display so they can be checked visually; the unmatched names are sent to the next step.
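The cascade in Steps 5 to 9, where each step only sees the names that earlier steps failed to match, can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `run_cascade` and the two toy matchers (exact and case-insensitive) are hypothetical stand-ins, not the actual implementations of Steps 5 to 9.

```python
from typing import Callable

# Hypothetical matcher signature: takes the names still unmatched and returns
# {submitted name: reference name} for the names it managed to match.
Matcher = Callable[[list[str]], dict[str, str]]


def run_cascade(names: list[str], steps: list[tuple[str, Matcher]]):
    """Apply (label, matcher) steps in order; only unmatched names reach the next step.

    Returns (matched_by_step, still_unmatched). matched_by_step keeps each step's
    matches separately so they can be displayed for visual control.
    """
    remaining = list(names)
    matched_by_step: dict[str, dict[str, str]] = {}
    for label, matcher in steps:
        found = matcher(remaining)
        matched_by_step[label] = found
        remaining = [n for n in remaining if n not in found]
    return matched_by_step, remaining


# Toy reference list and matchers, for illustration only.
REFERENCE = ["Gadus morhua", "Thunnus albacares"]


def exact(names):
    return {n: n for n in names if n in REFERENCE}


def case_insensitive(names):
    lookup = {r.lower(): r for r in REFERENCE}
    return {n: lookup[n.lower()] for n in names if n.lower() in lookup}


matched, unmatched = run_cascade(
    ["Gadus morhua", "thunnus albacares", "Salmo salar"],
    [("exact", exact), ("case-insensitive", case_insensitive)],
)
print(matched)    # matches grouped by the step that produced them
print(unmatched)  # ['Salmo salar']
```

Restarting the process after extending a restricted taxon (step 4) would then amount to calling `run_cascade` again on `unmatched` with a wider reference list.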

Step description

File type

Dima's Parser
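Steps 1 and 2 route the submitted file according to its type. These notes do not describe Dima's parser, so the sketch below uses a deliberately naive regular expression as a placeholder: a capitalised word followed by a lower-case word. The `FileType` enum, `naive_parse` and `prepare_names` are hypothetical names. Structured input is also assumed, for illustration, to contain one "Genus species" per line.

```python
import re
from enum import Enum


class FileType(Enum):
    TEXT = "text"
    UNSTRUCTURED_NAMES = "unstructured names"
    STRUCTURED_NAMES = "structured names"


# Placeholder pattern only: capitalised genus + lower-case epithet.
# This is NOT Dima's parser.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{2,})\b")


def naive_parse(raw: str) -> list[tuple[str, str]]:
    """Extract (genus, species) pairs from free text."""
    return BINOMIAL.findall(raw)


def prepare_names(raw: str, file_type: FileType) -> list[tuple[str, str]]:
    """Parse Text/Unstructured input; split Structured input line by line."""
    if file_type in (FileType.TEXT, FileType.UNSTRUCTURED_NAMES):
        return naive_parse(raw)
    pairs = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # assumed layout: "Genus species ..."
            pairs.append((parts[0], parts[1]))
    return pairs


print(prepare_names("catches of Gadus morhua and Salmo salar", FileType.TEXT))
print(prepare_names("Gadus morhua\nThunnus albacares", FileType.STRUCTURED_NAMES))
```

The pattern would also pick up any sentence-initial word followed by a lower-case word (for example "Catches of"). A proper name parser is needed for Text input for exactly this reason.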

Nomenclatural and taxonomic files

Classification tree
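Steps 3 and 4 select one, several, or all reference files and can then restrict the search to a taxon. A minimal sketch follows, assuming a hypothetical flat `RefRecord` layout that models only the family level of the tree. Real CoL, WoRMS, or FishBase exports are structured differently and carry the full hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RefRecord:
    source: str   # reference file the record comes from, e.g. "WoRMS" or "FB"
    genus: str
    species: str
    family: str   # hypothetical: only the family level of the tree is modelled


def select_references(
    records: list[RefRecord],
    sources: Optional[set[str]] = None,
    family: Optional[str] = None,
) -> list[RefRecord]:
    """Keep records from the selected reference files (all files if sources is None),
    optionally restricted to one family of the classification tree."""
    return [
        r for r in records
        if (sources is None or r.source in sources)
        and (family is None or r.family == family)
    ]


records = [
    RefRecord("WoRMS", "Gadus", "morhua", "Gadidae"),
    RefRecord("FB", "Gadus", "morhua", "Gadidae"),
    RefRecord("FB", "Thunnus", "albacares", "Scombridae"),
]
print(select_references(records, sources={"FB"}, family="Gadidae"))
```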

GSAy

TaxaMatch fuzzy matching
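Tony Rees's TaxaMatch uses its own set of normalisation rules. The sketch below is not that rule set. It only illustrates the Step 6 idea of removing or replacing user-selected letters before comparing names, with separate switches for genus and species. `DEFAULT_RULES`, `normalise`, and `letter_match` are hypothetical.

```python
# Hypothetical user-selected rules: (letters to find, replacement).
# An empty replacement removes the letters. NOT the actual TaxaMatch rule set.
DEFAULT_RULES = (("ae", "e"), ("ph", "f"), ("y", "i"))


def normalise(word, rules=DEFAULT_RULES):
    """Lower-case a word, then apply each replacement/removal rule in order."""
    w = word.lower()
    for old, new in rules:
        w = w.replace(old, new)
    return w


def letter_match(submitted, reference, rules=DEFAULT_RULES,
                 use_genus=True, use_species=True):
    """Match (genus, species) pairs on normalised spelling.

    Only the selected parts are normalised; an unselected part must match
    exactly (ignoring case). Returns {submitted pair: first reference pair hit}.
    """
    def key(pair):
        genus, species = pair
        genus = normalise(genus, rules) if use_genus else genus.lower()
        species = normalise(species, rules) if use_species else species.lower()
        return genus, species

    index = {}
    for ref in reference:
        index.setdefault(key(ref), ref)
    return {sub: index[key(sub)] for sub in submitted if key(sub) in index}


reference = [("Haemulon", "plumierii"), ("Sphyraena", "barracuda")]
print(letter_match([("Hemulon", "plumierii"), ("Sfiraena", "barracuda")], reference))
```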

Lexical distances
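Step 7 names Levenshtein distance with a selectable threshold. Below is a minimal sketch of the standard dynamic-programming edit distance and a threshold-based matcher. The treatment of the genus/species switches is an assumption: when a part is not selected, it must match exactly. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def distance_match(submitted, reference, threshold=2,
                   use_genus=True, use_species=True):
    """Map each submitted (genus, species) pair to the closest reference pair whose
    summed distance over the selected parts is <= threshold."""
    result = {}
    for sub in submitted:
        best, best_d = None, threshold + 1
        for ref in reference:
            d = 0
            for i, fuzzy in ((0, use_genus), (1, use_species)):
                x, y = sub[i].lower(), ref[i].lower()
                if fuzzy:
                    d += levenshtein(x, y)
                elif x != y:
                    d += threshold + 1  # unselected part must match exactly
            if d < best_d:
                best, best_d = ref, d
        if best is not None:
            result[sub] = best
    return result


reference = [("Gadus", "morhua"), ("Gadus", "macrocephalus")]
print(distance_match([("Gadus", "morua")], reference))
```

The threshold here is an absolute number of edits. A length-normalised distance would be an alternative choice for long epithets.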

Soundex
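Step 8 refers to an existing Java function. The Python below is an independent illustration of the standard American Soundex coding, not that function. The Step 8 "threshold" is interpreted here, as an assumption, as `min_shared`: the number of leading code characters that must agree (4 means the full codes are equal). `soundex_match` and `shared_prefix` are hypothetical names.

```python
CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """American Soundex: first letter + three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != last:
            result += code
        if c not in "hw":  # h/w do not separate letters with the same code
            last = code
    return (result + "000")[:4]


def shared_prefix(a: str, b: str) -> int:
    """Number of leading characters two codes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def soundex_match(submitted, reference, min_shared=4,
                  use_genus=True, use_species=True):
    """Return {submitted pair: [candidate reference pairs]}; unselected parts
    must match exactly (ignoring case)."""
    result = {}
    for sub in submitted:
        hits = []
        for ref in reference:
            ok = True
            for i, phonetic in ((0, use_genus), (1, use_species)):
                if phonetic:
                    ok = ok and shared_prefix(soundex(sub[i]), soundex(ref[i])) >= min_shared
                else:
                    ok = ok and sub[i].lower() == ref[i].lower()
            if ok:
                hits.append(ref)
        if hits:
            result[sub] = hits
    return result


print(soundex("Tunnus"), soundex("Thunnus"))  # T520 T520
print(soundex_match([("Tunnus", "albacares")],
                    [("Thunnus", "albacares"), ("Thunnus", "obesus")]))
```

Several candidates can be returned because phonetic codes are coarse. This fits the notes' plan to propose matched names for visual control.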

Taxonomic disambiguation
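Step 9 checks whether a matched name corresponds to one taxon or to several (homonyms). A minimal sketch follows, assuming a hypothetical reference layout of `(name, taxon_id, higher_taxon)` tuples. The taxon IDs in the example are made up. Prunella is used because the same genus name exists for both birds and plants.

```python
from collections import defaultdict


def taxa_for_names(matched_names, reference):
    """reference: iterable of (name, taxon_id, higher_taxon) tuples (hypothetical layout).
    Returns {name: sorted list of (taxon_id, higher_taxon)} for each matched name."""
    by_name = defaultdict(set)
    for name, taxon_id, higher in reference:
        by_name[name].add((taxon_id, higher))
    return {n: sorted(by_name.get(n, set())) for n in matched_names}


def ambiguous(taxa):
    """Names that point to more than one taxon and need disambiguation."""
    return {n: t for n, t in taxa.items() if len(t) > 1}


reference = [
    ("Prunella", "id-1", "Prunellidae"),  # made-up IDs
    ("Prunella", "id-2", "Lamiaceae"),
    ("Gadus", "id-3", "Gadidae"),
]
taxa = taxa_for_names(["Prunella", "Gadus"], reference)
print(ambiguous(taxa))
# {'Prunella': [('id-1', 'Prunellidae'), ('id-2', 'Lamiaceae')]}
```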

Taxonomic resolution
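Step 10 compares the family supplied by the user, if any, with the family in the reference file. The sketch below does not model the 4D4Life comparison tool. `to_resolve` is simply the list of disagreements that would be handed to it. The dict-based inputs and the function name are hypothetical. The user family for Thunnus albacares is deliberately wrong in the example.

```python
def check_families(matches, user_family, ref_family):
    """matches: {submitted name: reference name}
    user_family: {submitted name: family given by the user}
    ref_family: {reference name: family in the reference file}
    Returns (consistent, to_resolve); to_resolve holds (name, given, found)."""
    consistent, to_resolve = [], []
    for submitted, ref_name in matches.items():
        given = user_family.get(submitted)
        if given is None:
            continue  # no family supplied: nothing to compare
        found = ref_family.get(ref_name)
        if found is not None and given.lower() == found.lower():
            consistent.append(submitted)
        else:
            to_resolve.append((submitted, given, found))
    return consistent, to_resolve


matches = {"Gadus morhua": "Gadus morhua", "Thunnus albacares": "Thunnus albacares"}
user_family = {"Gadus morhua": "Gadidae", "Thunnus albacares": "Carangidae"}
ref_family = {"Gadus morhua": "Gadidae", "Thunnus albacares": "Scombridae"}
consistent, to_resolve = check_families(matches, user_family, ref_family)
print(consistent)  # ['Gadus morhua']
print(to_resolve)  # [('Thunnus albacares', 'Carangidae', 'Scombridae')]
```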

Unmatched names
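For names that are still unmatched after Step 10, Step 11 supports a visual check by listing reference species by genus and by family. A minimal sketch follows, assuming a hypothetical reference layout of `(genus, species, family)` tuples. The unmatched name "Gadus unknownus" is invented for illustration.

```python
from collections import defaultdict


def species_by(reference, level):
    """Group reference binomials by 'genus' or 'family'.
    reference: iterable of (genus, species, family) tuples (hypothetical layout)."""
    if level not in ("genus", "family"):
        raise ValueError("level must be 'genus' or 'family'")
    idx = 0 if level == "genus" else 2
    groups = defaultdict(list)
    for rec in reference:
        groups[rec[idx]].append(f"{rec[0]} {rec[1]}")
    return {k: sorted(v) for k, v in groups.items()}


def candidates_for(unmatched, reference, family=None):
    """Lists to check visually for one unmatched (genus, species) pair."""
    out = {"same genus": species_by(reference, "genus").get(unmatched[0], [])}
    if family is not None:
        out["same family"] = species_by(reference, "family").get(family, [])
    return out


reference = [
    ("Gadus", "morhua", "Gadidae"),
    ("Gadus", "macrocephalus", "Gadidae"),
    ("Melanogrammus", "aeglefinus", "Gadidae"),
]
print(candidates_for(("Gadus", "unknownus"), reference, family="Gadidae"))
```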