Difference between revisions of "YASMEEN input-output filter"

From D4Science Wiki

Revision as of 19:14, 31 October 2013

"Yet Another Species Matching Execution ENvironment" - input-output filter CLI tool

Purposes

This is an optional YASMEEN CLI tool that extracts non-matching parsed input data, i.e. the entries of an initial parsed input dataset that have no match in the results of a matching process run on those same inputs.
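The filtering idea can be illustrated with a minimal sketch. It is not the tool's actual implementation: the record layout (dictionaries with an "id" key), the sample names and the function name filter_unmatched are all assumptions made for illustration, since the parsed input and result file formats are not described here.

```python
def filter_unmatched(parsed_inputs, matched_ids):
    """Return the parsed input records whose id has no match in the results."""
    matched = set(matched_ids)  # set lookup keeps the filter linear in input size
    return [record for record in parsed_inputs if record["id"] not in matched]


# Hypothetical parsed input records; the real YASMEEN format is richer.
parsed_inputs = [
    {"id": "1", "name": "Thunnus albacares"},
    {"id": "2", "name": "Thunus alalunga"},
    {"id": "3", "name": "Xiphias gladius"},
]

# Hypothetical ids of the inputs for which the matching process found a match.
matched_ids = ["1", "3"]

unmatched = filter_unmatched(parsed_inputs, matched_ids)
print(unmatched)
```

Only the record with id "2" is kept, because it is the only input with no entry among the matched ids.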

It is particularly useful in the context of an iterative matching workflow, when non-matching input data need to be re-processed by different matchers (assuming these can ingest input data in the YASMEEN parsed input data format) or by another run of the YASMEEN matching engine with different configurations.
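An iterative workflow of this kind could be sketched as follows. The matchers here are hypothetical stand-ins (plain Python functions over name strings), not YASMEEN matchers, and iterate_matchers is an illustrative name; the point is only that each pass receives what the previous pass left unmatched.

```python
def exact_matcher(reference):
    """Build a hypothetical matcher that accepts names found verbatim in reference."""
    def match(names):
        return {name for name in names if name in reference}
    return match


def case_insensitive_matcher(reference):
    """Build a hypothetical matcher that ignores letter case."""
    lowered = {name.lower() for name in reference}
    def match(names):
        return {name for name in names if name.lower() in lowered}
    return match


def iterate_matchers(inputs, matchers):
    """Run each matcher on the inputs left unmatched by the previous ones."""
    remaining = list(inputs)
    for matcher in matchers:
        if not remaining:
            break  # everything matched, no further pass needed
        matched = matcher(remaining)
        # This step plays the role of the input-output filter.
        remaining = [name for name in remaining if name not in matched]
    return remaining


reference = {"Thunnus albacares", "Xiphias gladius"}
inputs = ["Thunnus albacares", "xiphias gladius", "Thunus alalunga"]

leftover = iterate_matchers(
    inputs, [exact_matcher(reference), case_insensitive_matcher(reference)]
)
print(leftover)
```

In this toy run the first pass matches one name exactly, the second pass picks up the lower-case variant, and the misspelled name is left over for further processing.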

Command line

java -jar YASMEEN-inout-filter-<version>.jar <options>

This CLI tool can be launched with the '-h' option to get a report of the available options:

java -jar YASMEEN-inout-filter-<version>.jar -h

This prints:

 -h                        Print this message
 -outFile <arg>            Specify the path to the file that will contain the filtered subset of the provided parsed input data
                           according to filtering configuration
 -outFileFormat <arg>      Specify the format of the file that will contain the filtered subset of the provided parsed input
                           data according to filtering configuration. Possible values are: {rawInput, parsedInput}
 -parsedInFile <arg>       Specify a path to a file containing YASMEEN input data in parsed input format
 -resultFile <arg>         Specify a path to the file containing YASMEEN matching results for the provided parsed input file
 -resultFileFormat <arg>   Specify the format of the file containing YASMEEN matching results for the provided parsed input
                           file. Possible values are: {rawInput, parsedInput}

General command line options

-h

This option takes no arguments. When set, the tool prints the help message and exits.

Input file command line options

-parsedInFile

Mandatory.

Specifies the path to an input dataset file (in the YASMEEN parsed input data format) that has already been processed and has produced a matching result output file in one of the formats available out-of-the-box in the YASMEEN matching engine.

Result file command line options

-resultFile

Mandatory.

Specifies the path to a matching results output file (in any of the formats available out-of-the-box in the YASMEEN matching engine) produced by the matching engine for the specified input file.

-resultFileFormat

Optional.

Provides a hint about the actual format (among those available out-of-the-box in the YASMEEN matching engine) in which the matching results output file has been emitted.

When this option is not set, the YASMEEN input-output filter will attempt to guess the actual format by inspecting the content of the file itself.

Use identity as the value of this option when the matching results output file is in the raw COMET XML format (i.e. XML output is enabled and no transformation is applied via the -xslTemplate matching engine option).

Output file format

The output file produced by the input-output filter will contain a copy of all the input data (found in the file specified via the -parsedInFile option) for which no match was identified in the matching results output file (specified via the -resultFile option).

-outFile

Mandatory.

Specifies the path to the file that will contain the actual filtered input data not appearing in the matching results output file.

-outFileFormat

Optional.

Specifies the format of the output file as a value among { rawInput, parsedInput }.

rawInput

The output file will be emitted in the raw input data format (either semi-structured or unstructured, according to the original input data file format). As such, it needs pre-parsing before it can be used as an input data set for the YASMEEN matching engine.

parsedInput

The output file will be emitted in the parsed input data format. As such, it can be immediately used as an input data set for the YASMEEN matching engine.
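The options described above can be assembled into a command line with a small helper such as the one below. This is a sketch under assumptions: the helper name build_filter_command, the file names and the concrete jar name (the documented pattern with version 1.1.1 filled in) are illustrative, and the script only builds and prints the command rather than running it.

```python
import shlex

# Values documented for -outFileFormat.
VALID_OUT_FORMATS = {"rawInput", "parsedInput"}


def build_filter_command(jar_path, parsed_in_file, result_file, out_file,
                         result_file_format=None, out_file_format=None):
    """Build the argument list for one run of the input-output filter."""
    if out_file_format is not None and out_file_format not in VALID_OUT_FORMATS:
        raise ValueError(f"unsupported -outFileFormat value: {out_file_format}")

    # Mandatory options.
    command = [
        "java", "-jar", jar_path,
        "-parsedInFile", parsed_in_file,
        "-resultFile", result_file,
        "-outFile", out_file,
    ]
    # Optional options are appended only when a value is provided.
    if result_file_format is not None:
        command += ["-resultFileFormat", result_file_format]
    if out_file_format is not None:
        command += ["-outFileFormat", out_file_format]
    return command


# Hypothetical paths; "identity" is the hint for raw COMET XML results.
command = build_filter_command(
    "YASMEEN-inout-filter-1.1.1.jar",
    "parsed_input.xml",
    "matching_results.xml",
    "unmatched_input.xml",
    result_file_format="identity",
    out_file_format="parsedInput",
)
print(shlex.join(command))
```

Passing the resulting list to subprocess.run would launch the tool. Requesting parsedInput keeps the output directly usable as input for another run of the matching engine.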

Appendix

Download

You can download the YASMEEN input-output filter from one of these URLs:

Changelog

  • v1.1.1: first working implementation