YASMEEN data formats
"Yet Another Species Matching Execution ENgine" - data formats specification
Raw input data
Input data are generally provided as a flat text file containing one unstructured entry (species name and authority) per line.
Example of unstructured input data
Gnathophis sp. 1 (dg)
Gymnothorax sp. (=sp. B of Chagos?)
Glossogobius sp. A cf. hoesei
Pseudocarcharias kamoharai e2
Hydrolagus deani [cf. 1x h. sp. a]
Lethrinus sp.
Starksia sp.
Chimaera sp? 07a
Centroscyllium nigrum 2b
Prionace glauca (Linnaeus, 1758)
Callogobius cf flavobrunneus
Squalus sp. (asper?)
Trimma cf macrophthalma
Trimma RW SP 70
Pseudocarcharias kamoharai d1
Saurida grandi/undo complex
Percina sp
Chromis sp
If the input data come from data sets that already store species names and authorship information separately, the two fields can be combined on a single line, using a semicolon as the separator.
Example of structured input data
Pamdea conica;[Quoy & Gaimard, 1827]
Chroococcus;Naegeli, 1849
Proterythropsis vigilians;Marshall 1925
Microcnecus cingulatus;
Pitar morrhuanum;Linsley 1848
Micropogonias megalops;Gilbert, 1893
Paraliparis avellaneum;Steinet al., 2001
Urosalpinx hanetti;(Petit, 1856)
Neoodax balteatum;(Valenciennes, 1840)
Acropora tenella;(G.H. Brook, 1892)
Metridia assymmetrica;Brodsky, 1950
Acanthochoris scabrator;Fabricius
Ponda carineola;Linnaeus
Dulichella;Stout, 1912
Caenopedina;A. Agassiz, 1869
;Linné 1732
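As an illustration of how such lines might be produced from a data set that keeps the two fields apart, here is a minimal Python sketch. It is not part of YASMEEN: the function name to_structured_line is hypothetical, and rejecting fields that already contain a semicolon is an assumption made here to keep each line unambiguous, not a rule stated by this specification.

```python
def to_structured_line(name: str, authority: str = "") -> str:
    """Join a scientific name and its authority with a semicolon.

    Hypothetical helper, not a YASMEEN API. Either field may be empty,
    as in the example entries above.
    """
    name = name.strip()
    authority = authority.strip()
    # Assumption: a semicolon inside a field would make the line ambiguous,
    # so it is rejected here rather than escaped.
    if ";" in name or ";" in authority:
        raise ValueError("fields must not contain ';'")
    return f"{name};{authority}"


# (name, authority) pairs taken from the structured example above.
records = [
    ("Chroococcus", "Naegeli, 1849"),
    ("Microcnecus cingulatus", ""),
    ("", "Linné 1732"),
]

lines = [to_structured_line(name, authority) for name, authority in records]
print("\n".join(lines))
```

Writing the resulting lines to a flat text file, one per line, gives a file in the structured input format.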
The structured input data format is best suited to the identity parser (more on this later), which applies no transformation to the structured entries other than the optional pre- and post-processing rules.
The unstructured input data format, by contrast, needs an actual parser that extracts (or attempts to extract) as much information as possible from the raw data. Nothing prevents users from applying the identity parser to unstructured input data, but the outcome will most likely be sub-optimal, because each raw entry will be treated in its entirety as a scientific name.
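The difference between the two cases can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not YASMEEN's implementation: the names ParsedEntry and identity_parse are hypothetical, splitting on the first semicolon is an assumption, and the optional pre- and post-processing rules are left out.

```python
from typing import NamedTuple


class ParsedEntry(NamedTuple):
    """Hypothetical result type: a scientific name plus its authority."""
    scientific_name: str
    authority: str


def identity_parse(entry: str) -> ParsedEntry:
    """Split an entry on the first semicolon and apply no other transformation.

    Assumption: everything before the first ';' is the scientific name and
    everything after it is the authority. Without a ';' the whole entry
    becomes the scientific name and the authority is empty.
    """
    name, _, authority = entry.partition(";")
    return ParsedEntry(name, authority)


# Structured entry: the two fields are recovered as provided.
structured = identity_parse("Urosalpinx hanetti;(Petit, 1856)")

# Unstructured entry: the authority stays embedded in the name.
unstructured = identity_parse("Prionace glauca (Linnaeus, 1758)")

print(structured)
print(unstructured)
```

In the second call the authority "(Linnaeus, 1758)" remains part of the scientific name, which is why a dedicated parser is preferable for unstructured input.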