YASMEEN input-output filter
"Yet Another Species Matching Execution ENvironment" - input-output filter CLI tool
Purposes
This is an optional YASMEEN CLI tool that extracts non-matching parsed input data, computed as the difference between an initial parsed input dataset and the results of a matching process run on those same inputs.
It is particularly useful in the context of an iterative matching workflow, when non-matching input data need to be re-processed by different matchers (assuming these can ingest input data in the YASMEEN parsed input data format) or by another run of the YASMEEN matching engine with different configurations. See, as potential usage scenarios, the M1C, M2C and MNC components in the BiOnym workflow specification.
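The filtering idea can be sketched with plain identifier lists. This is only a conceptual illustration: the identifiers (sp1 to sp4) and file names are hypothetical, and the real tool works on YASMEEN data files rather than on one-identifier-per-line text files.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical identifiers standing in for the parsed input records.
printf '%s\n' sp1 sp2 sp3 sp4 | sort > "$workdir/input_ids.txt"

# Hypothetical identifiers of the inputs that appear in the matching results.
printf '%s\n' sp2 sp4 | sort > "$workdir/matched_ids.txt"

# comm -23 keeps the lines found only in the first file: inputs with no match.
unmatched=$(comm -23 "$workdir/input_ids.txt" "$workdir/matched_ids.txt")
echo "$unmatched"

rm -r "$workdir"
```

Here sp1 and sp3 are printed: they are the inputs that would be passed on to the next matcher or to another matching run.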
Command line
java -jar YASMEEN-inout-filter-<version>.jar <options>
This CLI tool can be launched with the '-h' option to get a report of the available options:
java -jar YASMEEN-inout-filter-<version>.jar -h
Will give:
    -h                       Print this message
    -outFile <arg>           Specify the path to the file that will contain the filtered subset of the provided parsed input data according to filtering configuration
    -outFileFormat <arg>     Specify the format of the file that will contain the filtered subset of the provided parsed input data according to filtering configuration. Possible values are: {rawInput, parsedInput}
    -parsedInFile <arg>      Specify a path to a file containing YASMEEN input data in parsed input format
    -resultFile <arg>        Specify a path to the file containing YASMEEN matching results for the provided parsed input file
    -resultFileFormat <arg>  Specify the format of the file containing YASMEEN matching results for the provided parsed input file. Possible values are: {rawInput, parsedInput}
General command line options
-h
This option takes no arguments. When set, the tool prints the help message and exits.
Input file command line options
-parsedInFile
Mandatory.
Specifies the path to an input dataset file (in the YASMEEN parsed input data format) that has already been processed and has produced a matching result output file in one of the formats available out-of-the-box in the YASMEEN matching engine.
Result file command line options
-resultFile
Mandatory.
Specifies the path to a matching results output file (in any of the formats available out-of-the-box in the YASMEEN matching engine) produced by the matching engine for the specified input file.
-resultFileFormat
Optional.
Provides a hint about the format (among those available out-of-the-box in the YASMEEN matching engine) in which the matching results output file has been emitted.
When this option is not set, the YASMEEN input-output filter will attempt to guess the actual format by inspecting the content of the file itself.
Use identity as this option's value when the matching results output file is in the raw COMET XML format (i.e. XML output is enabled and no transformation is selected with the -xslTemplate matching engine option).
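As an illustration, a command line that passes the identity hint for a raw COMET XML results file might be assembled as follows. The file names are hypothetical placeholders and the jar name assumes version 1.1.1. The command is only printed for review, not executed.

```shell
# Assumed jar name for v1.1.1; adjust to the file actually downloaded.
JAR="YASMEEN-inout-filter-1.1.1.jar"

# Hypothetical file names. Only options documented on this page are used.
CMD="java -jar $JAR -parsedInFile input.parsed -resultFile results-comet.xml -resultFileFormat identity -outFile unmatched.parsed"

# Print the command instead of running it.
echo "$CMD"
```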
Output file format
The output file produced by the input-output filter will contain a copy of all the input data (found in the file specified via the -parsedInFile option) for which no match is identified in the matching results output file (specified via the -resultFile option).
-outFile
Mandatory.
Specifies the path to the file that will contain the actual filtered input data not appearing in the matching results output file.
-outFileFormat
Optional.
Specifies the format of the output file as a value among { rawInput, parsedInput }.
rawInput
The output file will be emitted in the raw input data format (either semi-structured or unstructured, according to the original input data file format). As such, it needs pre-parsing before it can be used as an input data set for the YASMEEN matching engine.
parsedInput
The output file will be emitted in the parsed input data format. As such, it can be immediately used as an input data set for the YASMEEN matching engine.
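Putting the options together, a complete run that writes the non-matching inputs in the parsed input format could look like the sketch below. All file names are hypothetical, the jar name assumes version 1.1.1, and -resultFileFormat is left out so that the filter guesses the results format. The tool is launched only if the jar is present in the current directory; otherwise the command is just printed.

```shell
# Assumed jar name for v1.1.1; all data file names are hypothetical.
JAR="YASMEEN-inout-filter-1.1.1.jar"
PARSED_IN="input.parsed"       # parsed input that was already matched
RESULTS="results.out"          # matching results produced for that input
UNMATCHED="unmatched.parsed"   # where the non-matching inputs will go
OUT_FORMAT="parsedInput"       # ready for another matching run

# The three mandatory options plus the optional output format.
set -- -parsedInFile "$PARSED_IN" -resultFile "$RESULTS" \
       -outFile "$UNMATCHED" -outFileFormat "$OUT_FORMAT"

CMD_PREVIEW="java -jar $JAR $*"

if [ -f "$JAR" ]; then
  java -jar "$JAR" "$@"
else
  echo "Would run: $CMD_PREVIEW"
fi
```

With parsedInput as the output format, the resulting file can be handed to another run of the matching engine without pre-parsing.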
Appendix
Download
You can download the YASMEEN input-output filter from one of these URLs:
- v1.1.1 (2.789KB - MD5 sum: 6ce646891c6883fc929edc0fc7bf43ef)
Changelog
- v1.1.1: first working implementation