YASMEEN input data parser

From D4Science Wiki
Revision as of 16:04, 25 October 2013 by Fabio.fiorellato

"Yet Another Species Matching Execution ENgine" - Input data parser CLI tool

Purposes

The YASMEEN Input data parser is the command-line (CLI) tool that implements the first step in the YASMEEN data flow.

It ingests a set of input data, provided as unstructured (or semi-structured) lines in a text file, then pre-processes, parses and post-processes those lines and converts the result into the structured format written to the output file.
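
The stages named above can be pictured as a small pipeline. The following is only an illustrative Python sketch, not YASMEEN's actual code: the whitespace rule, the regular expression, the trailing-punctuation rule, the column names and all function names are assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical pattern: a capitalised genus, a lowercase epithet,
# then whatever follows is treated as the authority.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<species>[a-z]+)\s*(?P<authority>.*)$"
)


def pre_process(line):
    """Hypothetical pre-parsing rule: collapse runs of whitespace and trim."""
    return re.sub(r"\s+", " ", line).strip()


def parse(line):
    """Hypothetical regexp-based parse; returns None when the line does not match."""
    match = NAME_PATTERN.match(line)
    return match.groupdict() if match else None


def post_process(record):
    """Hypothetical post-parsing rule: drop trailing spaces, full stops and commas."""
    cleaned = dict(record)
    cleaned["authority"] = cleaned["authority"].rstrip(" .,")
    return cleaned


def run_pipeline(lines, header=True):
    """Apply each stage in order and return the structured result as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["genus", "species", "authority"], lineterminator="\n"
    )
    if header:
        writer.writeheader()
    for raw in lines:
        parsed = parse(pre_process(raw))
        if parsed is not None:  # lines that do not match are skipped in this sketch
            writer.writerow(post_process(parsed))
    return out.getvalue()
```

In this sketch each stage is a plain function, so a different rule or parser can be swapped in without touching the others. How the real tool treats lines it cannot parse is not described here; skipping them is a simplification.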

Command line

java -jar YASMINE-parser-<version>.jar <options>

Launch it with the '-h' option to list the available options and their descriptions:

java -jar YASMINE-parser-<version>.jar -h

This prints:

usage: InputDataParser:
 -dataSourceId <arg>             Specify the identifier for the data source originating the input data. Defaults to
                                 'UserProvidedData' when not set
 -h                              Print this message
 -inFile <arg>                   Specify a path to the file containing unstructured input data (one per line)
 -noHeader                       Omit the CSV header in the produced parsed results file
 -outFile <arg>                  Specify a path to the file that will contain the structured parsed results
 -parser <arg>                   Specify one of the available input parsers among { GNI (Global Names Index), GNI_LEGACY (Global
                                 Names Index (legacy)), IDENTITY (No action), SIMPLE (Simple, regexp-based) }
 -postParsingRuleset <arg>       Specify an embedded post-parsing ruleset among { bionymPostparsingRules }
 -postParsingRulesetFile <arg>   Specify a file containing a post-parsing ruleset
 -preParsingRuleset <arg>        Specify an embedded pre-parsing ruleset among { commonPreparsingRules, otherPreparsingRules,
                                 bionymPreparsingRules }
 -preParsingRulesetFile <arg>    Specify a file containing a pre-parsing ruleset
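
As an illustration of how these options combine, the sketch below assembles a command line in Python. It uses only option names and parser identifiers that appear in the help text above. The helper name `build_parser_command` and the file names are hypothetical, and the `<version>` placeholder in the jar name must be replaced with a real version. The sketch always passes -inFile, -outFile and -parser; whether the tool actually requires all three is not stated in the help text.

```python
# Parser identifiers exactly as listed in the help output.
AVAILABLE_PARSERS = {"GNI", "GNI_LEGACY", "IDENTITY", "SIMPLE"}


def build_parser_command(jar, in_file, out_file, parser,
                         data_source_id=None, pre_ruleset=None,
                         post_ruleset=None, no_header=False):
    """Return the argument list for one run of the input data parser (hypothetical helper)."""
    if parser not in AVAILABLE_PARSERS:
        raise ValueError(f"Unknown parser {parser!r}; expected one of {sorted(AVAILABLE_PARSERS)}")

    command = ["java", "-jar", jar,
               "-inFile", in_file,
               "-outFile", out_file,
               "-parser", parser]

    # Optional settings are appended only when the caller supplies them.
    if data_source_id is not None:
        command += ["-dataSourceId", data_source_id]
    if pre_ruleset is not None:
        command += ["-preParsingRuleset", pre_ruleset]
    if post_ruleset is not None:
        command += ["-postParsingRuleset", post_ruleset]
    if no_header:
        command.append("-noHeader")
    return command


if __name__ == "__main__":
    # File names below are placeholders chosen for the example.
    cmd = build_parser_command(
        "YASMINE-parser-<version>.jar", "names.txt", "parsed.csv", "SIMPLE",
        pre_ruleset="commonPreparsingRules",
    )
    print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once a real jar path is in place. Building the arguments as a list rather than a single string avoids shell quoting problems when file paths contain spaces.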