YASMEEN converter

From D4Science Wiki

"Yet Another Species Matching Execution ENvironment" - DWCA to TAF data converter CLI tool

Purposes

The YASMEEN converter is the command line (CLI) tool that implements the PRODUCE REFERENCE DATA step in the YASMEEN data flow.

It ingests DWCA files (or folders), extracts their content, produces indexes and creates two TAF files (one for taxa and the other for vernacular names data) for later consumption - as reference data files - by the YASMEEN matching engine tool.

YASMEEN already ships with a set of predefined reference data files (in TAF format) for many public sources. The YASMEEN converter should therefore be used only to produce missing TAF files from newly available DWCA sources.

Command line

java -jar YASMEEN-converter-<version>.jar <options>

This CLI tool can be launched with the '-h' option to get a report of the available options:

java -jar YASMEEN-converter-<version>.jar -h

This prints:

usage:
 -h                  Print this message
 -inFile <arg>       Specify an input file (either a DWCA file or a folder containing an exploded DWCA file content)
 -outDir <arg>       Specify the output folder that will contain the .taf.gz files resulting from the conversion of the input DWCA
 -providerId <arg>   Specify the provider ID. This will have impact on the name of the .taf.gz files generated by the
                     conversion, that will be <provider ID>_taxa.taf.gz and <provider ID>_vernacular.taf.gz

General command line options

-h

This option takes no arguments and, when set, prints the help message and exits (no parsing is performed).

Input data command line options

-inFile

Mandatory.

Specifies the input file. This can be either a DWCA file or a folder containing an exploded DWCA file content.

Output data command line options

-providerId

Mandatory.

Specifies the provider ID. This identifier will be used to actually name the TAF files generated by the conversion.

Taxa and vernacular TAF files produced out of the input DWCA will be named <provider ID>_taxa.taf.gz and <provider ID>_vernacular.taf.gz respectively.
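The naming rule can be sketched in a few lines of Python. This is only an illustration of the convention stated in the help message, not the converter's own code; the function name taf_file_names is made up for this example.

```python
from typing import Tuple


def taf_file_names(provider_id: str) -> Tuple[str, str]:
    """Return the (taxa, vernacular) TAF file names for a provider ID.

    Hypothetical helper: it only mirrors the naming convention documented
    for the -providerId option.
    """
    return (f"{provider_id}_taxa.taf.gz", f"{provider_id}_vernacular.taf.gz")


taxa_name, vernacular_name = taf_file_names("PRVD1")
print(taxa_name)        # PRVD1_taxa.taf.gz
print(vernacular_name)  # PRVD1_vernacular.taf.gz
```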

-outDir

Optional.

Specifies the output folder that will contain the TAF files resulting from the conversion of the input DWCA.

When this option is not explicitly set, the output directory is determined as follows:

  • If the input file is a DWCA file, the output directory is created in the same folder as that file and named 'out'
  • If the input is a folder containing the exploded content of a DWCA file, the output directory is created inside that folder and named 'out'
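A minimal Python sketch of the default rule above, assuming the caller already knows whether the input is a folder. The function name default_out_dir and the is_folder flag are illustrative and not part of the tool.

```python
from pathlib import PurePosixPath


def default_out_dir(in_file: str, is_folder: bool) -> PurePosixPath:
    """Compute the default output directory when -outDir is not given.

    Hypothetical helper mirroring the two documented cases:
    - DWCA file: 'out' is a sibling of the file
    - exploded DWCA folder: 'out' is created inside the folder
    """
    path = PurePosixPath(in_file)
    base = path if is_folder else path.parent
    return base / "out"


print(default_out_dir("/path/to/DWCA/file/Provider1_DWCA_file.zip", is_folder=False))
print(default_out_dir("/path/to/DWCA/folder/Provider1", is_folder=True))
```

The first call yields /path/to/DWCA/file/out, and the second yields /path/to/DWCA/folder/Provider1/out.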

Placeholder expansion

Actual values of the -inFile and -outDir options can use the {providerId} placeholder, which is replaced with the value of the -providerId option before the tool attempts to access the input file / folder and create the output folder specified by the corresponding options.
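The substitution can be pictured as a plain string replacement. The Python sketch below assumes that every occurrence of {providerId} is replaced literally; the function name expand_placeholders is hypothetical, and the converter's actual implementation may differ.

```python
PLACEHOLDER = "{providerId}"


def expand_placeholders(option_value: str, provider_id: str) -> str:
    """Replace every {providerId} occurrence with the given provider ID.

    Assumption: a literal, case-sensitive replacement of all occurrences.
    """
    return option_value.replace(PLACEHOLDER, provider_id)


in_file = expand_placeholders(
    "/path/to/DWCA/file/{providerId}/{providerId}_All_DWCA_file.zip", "PRVD1"
)
out_dir = expand_placeholders("/path/to/TAF/dir/{providerId}", "PRVD1")
print(in_file)  # /path/to/DWCA/file/PRVD1/PRVD1_All_DWCA_file.zip
print(out_dir)  # /path/to/TAF/dir/PRVD1
```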

Usage examples

Common invocation

  • Convert a DWCA file and store the results in a user-specified folder
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/file/Provider1_DWCA_file.zip -outDir /path/to/TAF/dir/Provider1 -providerId PRVD1

This will produce the PRVD1_taxa.taf.gz and PRVD1_vernacular.taf.gz files in the /path/to/TAF/dir/Provider1 folder.

DWCA folder as input

  • Convert DWCA from a folder and store the results in a user-specified folder
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/folder/Provider1 -outDir /path/to/TAF/dir/Provider1 -providerId PRVD1

This will produce the PRVD1_taxa.taf.gz and PRVD1_vernacular.taf.gz files in the /path/to/TAF/dir/Provider1 folder, assuming that the /path/to/DWCA/folder/Provider1 folder contains the meta.xml and the referenced .txt files as per the DWCA specification.
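Before running the converter on a folder, it can help to check that precondition. The Python sketch below reads meta.xml and lists the data files it references through its location elements, as laid out in the Darwin Core text guidelines. It is an independent pre-flight check, not part of the converter. The helper names and the sample file names (taxa.txt, vernacular.txt) are invented for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def referenced_files(meta_xml: str) -> List[str]:
    """Return the file names listed in <location> elements of a meta.xml."""
    root = ET.fromstring(meta_xml)
    # Compare local tag names so the check works with or without a namespace.
    return [
        element.text.strip()
        for element in root.iter()
        if element.tag.split("}")[-1] == "location" and element.text
    ]


def missing_files(folder: str) -> List[str]:
    """List meta.xml itself, or the referenced files, that are absent."""
    folder_path = Path(folder)
    meta = folder_path / "meta.xml"
    if not meta.is_file():
        return ["meta.xml"]
    names = referenced_files(meta.read_text(encoding="utf-8"))
    return [name for name in names if not (folder_path / name).is_file()]


# Hypothetical descriptor used only to demonstrate the parsing step.
SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxa.txt</location></files>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/VernacularName">
    <files><location>vernacular.txt</location></files>
  </extension>
</archive>"""

print(referenced_files(SAMPLE_META))  # ['taxa.txt', 'vernacular.txt']
```

Calling missing_files("/path/to/DWCA/folder/Provider1") returns an empty list when the folder is complete.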

No output folder specified

  • Convert a DWCA file and store the results in the default folder
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/file/Provider1_DWCA_file.zip -providerId PRVD1

This will produce the PRVD1_taxa.taf.gz and PRVD1_vernacular.taf.gz files in the /path/to/DWCA/file/out folder.

Placeholder substitution

  • Convert a DWCA file and store the results in a user-specified folder (using placeholders both in the input file and output dir options)
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/file/{providerId}/{providerId}_All_DWCA_file.zip -outDir /path/to/TAF/dir/{providerId} -providerId PRVD1

This will read the input DWCA file from /path/to/DWCA/file/PRVD1/PRVD1_All_DWCA_file.zip and produce the PRVD1_taxa.taf.gz and PRVD1_vernacular.taf.gz files in the /path/to/TAF/dir/PRVD1 folder.
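Because the {providerId} placeholder keeps the -inFile and -outDir values identical across providers, a small driver script can reuse one template for several conversions. The Python sketch below only assembles and prints the command lines; it does not launch Java. The build_command helper and the second provider ID (PRVD2) are made up for illustration, and the jar name keeps its <version> part unresolved.

```python
from typing import List, Optional


def build_command(jar: str, in_file: str, provider_id: str,
                  out_dir: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one converter run (nothing is executed)."""
    command = ["java", "-jar", jar, "-inFile", in_file, "-providerId", provider_id]
    if out_dir is not None:
        # -outDir is optional; omit it to get the default 'out' folder.
        command += ["-outDir", out_dir]
    return command


JAR = "YASMEEN-converter-<version>.jar"  # substitute the real jar file name
IN_TEMPLATE = "/path/to/DWCA/file/{providerId}/{providerId}_All_DWCA_file.zip"
OUT_TEMPLATE = "/path/to/TAF/dir/{providerId}"

for provider in ["PRVD1", "PRVD2"]:  # PRVD2 is a hypothetical second provider
    # The placeholders are passed through untouched; the converter expands them.
    print(" ".join(build_command(JAR, IN_TEMPLATE, provider, OUT_TEMPLATE)))
```

To actually run a conversion, each list could be handed to subprocess.run(command, check=True) once the real jar name is filled in.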

Appendix

Download

You can download the YASMEEN converter through one of these URLs:

  • v1.1.1 (11.986KB, MD5 sum: 3a7f271554a8527d644d165d60249a66)
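The listed MD5 sum can be used to check a downloaded file. The following Python sketch computes the MD5 digest of a binary stream with the standard hashlib module and compares it with the value above. The helper names are illustrative, and the demo hashes in-memory bytes rather than the real jar.

```python
import hashlib
import io
from typing import BinaryIO

EXPECTED_MD5 = "3a7f271554a8527d644d165d60249a66"  # value listed for v1.1.1


def md5_hex(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_expected(stream: BinaryIO, expected: str = EXPECTED_MD5) -> bool:
    """Tell whether the stream's MD5 digest equals the expected value."""
    return md5_hex(stream) == expected.lower()


# Demo on in-memory bytes; a real check would open the downloaded jar in "rb" mode.
print(md5_hex(io.BytesIO(b"hello")))  # 5d41402abc4b2a76b9719d911017c592
```

For the real file, open the downloaded jar in binary mode and pass the file object to matches_expected.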

Changelog

  • v1.1.1: first working implementation