YASMEEN matching results merger

From D4Science Wiki

"Yet Another Species Matching Execution ENvironment" - Matching results merger, filter and transformer CLI tool

Purposes

This optional YASMEEN CLI tool takes two or more matching results (in raw COMET XML format) and produces a single output result file that combines them.

The final output result file can be optionally filtered according to:

  • a user-specified maximum number of candidates retained for each input data item in the overall output result file
  • a user-specified minimum matching score that each match must reach to appear in the overall output result file
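
The two filters can be pictured with a minimal Python sketch. It is not the tool's implementation: the `Candidate` record, its field names, the sample names and scores, and the assumptions that candidates are ranked by descending score and that the score threshold is inclusive are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one matching candidate of an input data item."""
    name: str
    score: float


def retain(candidates, min_score: Optional[float] = None,
           max_candidates: Optional[int] = None):
    """Keep candidates scoring at least min_score, then the best max_candidates."""
    kept = [c for c in candidates if min_score is None or c.score >= min_score]
    # Assumption: "best" means highest score first; the sort is stable on ties.
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept if max_candidates is None else kept[:max_candidates]


# Made-up candidates and scores for a single input data item.
candidates = [
    Candidate("Thunnus albacares", 0.95),
    Candidate("Thunnus alalunga", 0.80),
    Candidate("Thunnus obesus", 0.60),
]
filtered = retain(candidates, min_score=0.7, max_candidates=1)
print([c.name for c in filtered])  # ['Thunnus albacares']
```

In this sketch the score filter runs first and the candidate limit second, so a low-scoring candidate never takes a slot that a higher-scoring one could fill.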

Additionally, the raw COMET XML output can be transformed into one of the formats available out of the box (see the -xslTemplate option in the matching engine CLI tool) or via a user-specified XSL template (see the -xslTemplateFile option in the matching engine CLI tool).

When no filtering option is specified on the command line, the overall output result file will have:

  • a maximum number of candidates set to the minimum value among all the partial output result files provided as input
  • a minimum matching score set to the maximum value among all the partial output result files provided as input
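
The default rule above amounts to taking the most restrictive setting found among the inputs. Here is a minimal sketch, assuming the settings of each partial result file have already been read into a (max_candidates, min_score) pair. How these values are stored in the raw XML is not described here, so the tuples and the `default_filters` helper are purely illustrative.

```python
def default_filters(partial_settings):
    """Return (max_candidates, min_score) for the merged result.

    partial_settings: iterable of (max_candidates, min_score) pairs, one per
    partial result file (hypothetical representation).
    """
    settings = list(partial_settings)
    if not settings:
        raise ValueError("at least one partial result is required")
    max_candidates = min(m for m, _ in settings)  # smallest candidate limit wins
    min_score = max(s for _, s in settings)       # highest score threshold wins
    return max_candidates, min_score


# Illustrative values only: three partial results with different settings.
merged_defaults = default_filters([(10, 0.5), (5, 0.3), (8, 0.7)])
print(merged_defaults)  # (5, 0.7)
```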

Command line

java -jar YASMEEN-merger-<version>.jar <options>

This CLI tool can be launched with the '-h' option to get a report of the available options:

java -jar YASMEEN-merger-<version>.jar -h

This prints:

usage:
 -embeddedTemplate <arg>       Optional. Specify an embedded transformation template to apply to the produced output. Value is
                               one among { stripped, simple, csv, csvNoHeader
 -externalTemplateFile <arg>   Optional. Specify an external transformation template (XSLT) to apply to the produced output
 -h                            Print this message
 -inFile <arg>                 Specify a path to a file containing YASMEEN matching results in raw XML format. This option can
                               be repeated multiple time on the command line.
 -outFile <arg>                Specify a path to the file that will contain the merged matching results
 -retainMaxCandidates <arg>    Optional. Specify the maximum number of candidates (per input data) that must be retained in the
                               merged matching results. Accepts a positive integer as value.
 -retainMinScore <arg>         Optional. Specify the minimum matching score that candidates must have to be retained in the
                               merged matching results. Accepts a decimal value in the range (0.0 .. 1.0]
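
As an illustration of how the options combine, the following Python sketch assembles a merger command line from the options listed in the help message. The jar name, file paths and filter values are placeholders, and the helper `build_merger_command` is hypothetical. Only the option names and the accepted value ranges come from the usage text.

```python
import shlex


def build_merger_command(jar, in_files, out_file, min_score=None,
                         max_candidates=None, embedded_template=None):
    """Assemble the argument list for the merger CLI (hypothetical helper)."""
    if not in_files:
        raise ValueError("at least one input file is required")
    if min_score is not None and not (0.0 < min_score <= 1.0):
        raise ValueError("min_score must be in the range (0.0 .. 1.0]")
    if max_candidates is not None and max_candidates < 1:
        raise ValueError("max_candidates must be a positive integer")

    cmd = ["java", "-jar", jar]
    for path in in_files:
        cmd += ["-inFile", path]  # -inFile is repeated once per input file
    cmd += ["-outFile", out_file]
    if min_score is not None:
        cmd += ["-retainMinScore", str(min_score)]
    if max_candidates is not None:
        cmd += ["-retainMaxCandidates", str(max_candidates)]
    if embedded_template is not None:
        cmd += ["-embeddedTemplate", embedded_template]
    return cmd


# Placeholder jar name and paths; replace <version> with the downloaded release.
cmd = build_merger_command(
    "YASMEEN-merger-<version>.jar",
    ["results_a.xml", "results_b.xml"],
    "merged.csv",
    min_score=0.5,
    max_candidates=3,
    embedded_template="csv",
)
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholders point to real files. Building the command as a list avoids shell quoting problems with paths that contain spaces.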

General command line options

-h

This option takes no arguments. When set, the tool prints the help message and exits.

Input file command line options

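-inFile

This option takes the path to a file containing YASMEEN matching results in raw XML format. It can be repeated on the command line, once per input file.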

Output file command line options

-outFile

This option takes the path to the file that will contain the merged matching results.

Filtering command line options

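-retainMinScore

Optional. This option takes the minimum matching score that candidates must have to be retained in the merged matching results. It accepts a decimal value in the range (0.0 .. 1.0].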

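-retainMaxCandidates

Optional. This option takes the maximum number of candidates (per input data item) to retain in the merged matching results. It accepts a positive integer.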

Transformation options

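-embeddedTemplate

Optional. This option names an embedded transformation template to apply to the produced output. The help message lists stripped, simple, csv and csvNoHeader among the accepted values.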

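-externalTemplateFile

Optional. This option takes the path to an external transformation template (XSLT) to apply to the produced output.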

Appendix

Download

You can download the YASMEEN matching results merger from one of the following URLs:

  • v1.1.1 (2.791KB - MD5 sum: 083057bb1ee190786fa792362002e1f8)
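
The MD5 sum listed above can be used to check the integrity of a downloaded jar. Here is a minimal sketch using Python's standard `hashlib` module. The helper names are illustrative, and the local file name in the usage comment is a placeholder for wherever the download was saved.

```python
import hashlib

EXPECTED_MD5 = "083057bb1ee190786fa792362002e1f8"  # value published for v1.1.1


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_download(path, expected=EXPECTED_MD5):
    """True when the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Usage (placeholder path, adjust to the location of the downloaded jar):
#   is_valid_download("path/to/downloaded-merger.jar")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before use.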

Changelog

  • v1.1.1: first working implementation