YASMEEN input data parser
From D4Science Wiki
Revision as of 16:04, 25 October 2013 by Fabio.fiorellato
"Yet Another Species Matching Execution ENgine" - Input data parser CLI tool
Purposes
The YASMEEN Input data parser is the command line (CLI) tool that implements the first step in the YASMEEN data flow.
It ingests, pre-processes, parses, post-processes and converts into the proper format a set of input data provided as unstructured (or semi-structured) lines in a text file.
Command line
java -jar YASMINE-parser-<version>.jar <options>
Launch it with the '-h' option to list the available options and their descriptions:
java -jar YASMINE-parser-<version>.jar -h
This prints:
usage: InputDataParser:
 -dataSourceId <arg>             Specify the identifier for the data source originating the input data. Defaults to 'UserProvidedData' when not set
 -h                              Print this message
 -inFile <arg>                   Specify a path to the file containing unstructured input data (one per line)
 -noHeader                       Omit the CSV header in the produced parsed results file
 -outFile <arg>                  Specify a path to the file that will contain the structured parsed results
 -parser <arg>                   Specify one of the available input parsers among { GNI (Global Names Index), GNI_LEGACY (Global Names Index (legacy)), IDENTITY (No action), SIMPLE (Simple, regexp-based) }
 -postParsingRuleset <arg>       Specify an embedded post-parsing ruleset among { bionymPostparsingRules }
 -postParsingRulesetFile <arg>   Specify a file containing a post-parsing ruleset
 -preParsingRuleset <arg>        Specify an embedded pre-parsing ruleset among { commonPreparsingRules, otherPreparsingRules, bionymPreparsingRules }
 -preParsingRulesetFile <arg>    Specify a file containing a pre-parsing ruleset
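As an illustration, the options listed in the help output might be combined as in the sketch below. It is not taken from the tool's documentation: the file names names.txt and parsed_names.csv, the PARSER_JAR variable and the sample input lines are hypothetical, and the choice of the SIMPLE parser with the commonPreparsingRules ruleset is only one of the combinations the help text allows. The script writes a small input file (one name per line), prints the resulting command, and runs it only if the jar is actually found.

```shell
#!/bin/sh
# Sketch only: every file name below is a hypothetical placeholder.
# Set PARSER_JAR to the real jar path; <version> is left unfilled on purpose.
JAR="${PARSER_JAR:-YASMINE-parser-<version>.jar}"
IN_FILE="names.txt"          # hypothetical input: one unstructured name per line
OUT_FILE="parsed_names.csv"  # hypothetical output: structured parsed results

# Write a few illustrative input lines (scientific names with authorship).
printf '%s\n' \
  'Thunnus albacares (Bonnaterre, 1788)' \
  'Gadus morhua Linnaeus, 1758' \
  'Salmo salar L.' > "$IN_FILE"

# Collect the arguments, using only options listed in the '-h' output.
set -- \
  -inFile "$IN_FILE" \
  -outFile "$OUT_FILE" \
  -parser SIMPLE \
  -preParsingRuleset commonPreparsingRules \
  -dataSourceId UserProvidedData

# Show the command that would be executed.
echo "java -jar $JAR $*"

# Run the parser only when the jar is actually present.
if [ -f "$JAR" ]; then
  java -jar "$JAR" "$@"
fi
```

Adding -noHeader to the argument list would, according to the help text, omit the CSV header from the produced results file.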