YASMEEN converter
"Yet Another Species Matching Execution ENvironment" - DWCA to TAF data converter CLI tool
Purposes
The YASMEEN converter is the command line (CLI) tool that implements the PRODUCE REFERENCE DATA step in the YASMEEN data flow.
It ingests DWCA (Darwin Core Archive) files or folders, extracts their content, produces indexes and creates two TAF files (one for taxa data, the other for vernacular names data) for later consumption, as reference data files, by the YASMEEN matching engine tool.
YASMEEN already ships with a set of predefined reference data files (in TAF format) for many public sources. The YASMEEN converter tool should therefore be used only to produce missing TAF files from newly available DWCA sources.
Command line
java -jar YASMEEN-converter-<version>.jar <options>
This CLI tool can be launched with the '-h' option to get a report of the available options:
java -jar YASMEEN-converter-<version>.jar -h
This prints:
usage:
 -h                  Print this message
 -inFile <arg>       Specify an input file (either a DWCA file or a folder containing an exploded DWCA file content)
 -outDir <arg>       Specify the output folder that will contain the .taf.gz files resulting from the conversion of the input DWCA
 -providerId <arg>   Specify the provider ID. This will have impact on the name of the .taf.gz files generated by the conversion, that will be <provider ID>_taxa.taf.gz and <provider ID>_vernacular.taf.gz
General command line options
-h
This option takes no arguments. When set, the tool prints the help message and exits without parsing any input.
Input data command line options
-inFile
Mandatory.
Specifies the input file. This can be either a DWCA file or a folder containing the exploded content of a DWCA file.
Output data command line options
-providerId
Mandatory.
Specifies the provider ID. This identifier is used to name the TAF files generated by the conversion.
The taxa and vernacular TAF files produced from the input DWCA will be named:
<provider ID>_taxa.taf.gz
and
<provider ID>_vernacular.taf.gz
respectively.
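The naming rule (<provider ID>_taxa.taf.gz and <provider ID>_vernacular.taf.gz) lends itself to a quick check after a conversion. The following is a minimal sketch, not something shipped with the converter: the function name check_taf_outputs is hypothetical, and it relies only on the two file names documented here.

```sh
# Hypothetical helper (not part of YASMEEN): checks that both TAF files
# expected for a given provider ID exist in the given output folder.
check_taf_outputs() {
  out_dir=$1
  provider_id=$2
  status=0
  for kind in taxa vernacular; do
    file="$out_dir/${provider_id}_${kind}.taf.gz"
    if [ -f "$file" ]; then
      echo "found:   $file"
    else
      echo "missing: $file" >&2
      status=1
    fi
  done
  # Non-zero when at least one of the two files is missing
  return "$status"
}
```

For example, check_taf_outputs /path/to/TAF/dir/Provider1 PRVD1 succeeds only if both files are present in that folder.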
-outDir
Optional.
Specifies the output folder that will contain the TAF files resulting from the conversion of the input DWCA.
When this option is not explicitly set, the output directory is determined as follows:
- If the input is a DWCA file, the output directory is created in the folder that contains that file and is named 'out'
- If the input is a folder containing the exploded content of a DWCA file, the output directory is created inside the input folder and is named 'out'
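The two rules above can be sketched as a small shell function. This is only an illustration of the documented behaviour, not the converter's actual code; the name default_out_dir is hypothetical, and the sketch assumes that anything that is not an existing directory is treated as a DWCA file.

```sh
# Hypothetical helper: mirrors the documented default for -outDir.
default_out_dir() {
  in_file=$1
  if [ -d "$in_file" ]; then
    # Exploded DWCA folder: 'out' is created inside the input folder
    printf '%s\n' "${in_file%/}/out"
  else
    # DWCA file: 'out' is created in the folder that contains the file
    printf '%s\n' "$(dirname "$in_file")/out"
  fi
}
```

Calling default_out_dir /path/to/DWCA/file/Provider1_DWCA_file.zip prints /path/to/DWCA/file/out, which matches the "No output folder specified" example in this document.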
Placeholder expansion
The values of the -inFile and -outDir options can contain the
{providerId}
placeholder, which is replaced with the value of the -providerId option before the tool accesses the input file / folder and creates the output folder specified by those options.
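To preview what an option value looks like once {providerId} has been substituted, the replacement can be reproduced in the shell. This is a minimal sketch under stated assumptions: expand_placeholders is a hypothetical name, and the restriction of provider IDs to letters, digits, '_' and '-' is an assumption made here to keep the sed call safe, not a documented rule of the converter.

```sh
# Hypothetical helper: replaces every {providerId} in $1 with the provider ID in $2.
expand_placeholders() {
  case $2 in
    # Assumption of this sketch: reject empty IDs and IDs with other characters
    ''|*[!A-Za-z0-9_-]*)
      echo "unsupported provider ID: $2" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$1" | sed "s/{providerId}/$2/g"
}
```

For example, expand_placeholders '/path/to/TAF/dir/{providerId}' PRVD1 prints /path/to/TAF/dir/PRVD1.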
Usage examples
Common invocation
- Convert a DWCA file and store the results in a user-specified folder
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/file/Provider1_DWCA_file.zip -outDir /path/to/TAF/dir/Provider1 -providerId PRVD1
This will produce:
- PRVD1_taxa.taf.gz
- PRVD1_vernacular.taf.gz
in the folder:
/path/to/TAF/dir/Provider1
DWCA folder as input
- Convert a DWCA from a folder and store the results in a user-specified folder
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/folder/Provider1 -outDir /path/to/TAF/dir/Provider1 -providerId PRVD1
This will produce:
- PRVD1_taxa.taf.gz
- PRVD1_vernacular.taf.gz
in the folder:
/path/to/TAF/dir/Provider1
assuming that the input folder
/path/to/DWCA/folder/Provider1
contains the meta.xml file and the .txt files it references, as per the DWCA specification.
No output folder specified
- Convert a DWCA file and store the results in the default folder
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/file/Provider1_DWCA_file.zip -providerId PRVD1
This will produce:
- PRVD1_taxa.taf.gz
- PRVD1_vernacular.taf.gz
in the folder:
/path/to/DWCA/file/out
Placeholder substitution
- Convert a DWCA file and store the results in a user-specified folder (using placeholders in both the input file and output dir options)
java -jar YASMEEN-converter-<version>.jar -inFile /path/to/DWCA/file/{providerId}/{providerId}_All_DWCA_file.zip -outDir /path/to/TAF/dir/{providerId} -providerId PRVD1
This will read the input DWCA file from:
/path/to/DWCA/file/PRVD1/PRVD1_All_DWCA_file.zip
and produce:
- PRVD1_taxa.taf.gz
- PRVD1_vernacular.taf.gz
in the folder:
/path/to/TAF/dir/PRVD1
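Because the -inFile and -outDir values stay the same while only -providerId changes, the placeholder form is convenient when converting several sources in a row. The sketch below is a dry run that only prints the commands, one per provider ID. The function name print_batch_commands and the second ID PRVD2 used in the usage note are hypothetical, and the paths are the illustrative ones from the example above.

```sh
# Hypothetical dry-run helper: prints one converter command per provider ID.
# $1 is the path to the converter JAR; the remaining arguments are provider IDs.
print_batch_commands() {
  jar=$1
  shift
  for id in "$@"; do
    # The {providerId} placeholders are left as-is: the converter expands them itself
    printf '%s\n' "java -jar $jar -inFile /path/to/DWCA/file/{providerId}/{providerId}_All_DWCA_file.zip -outDir /path/to/TAF/dir/{providerId} -providerId $id"
  done
}
```

Calling print_batch_commands with the JAR path followed by PRVD1 PRVD2 prints two command lines that differ only in the value of -providerId.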
Appendix
Download
You can download the YASMEEN converter from one of the following URLs:
- v1.1.1 (11.986KB, MD5 sum: 3a7f271554a8527d644d165d60249a66)
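The MD5 sum listed above can be used to verify a download before running it. The following is a minimal sketch that assumes the md5sum utility (GNU coreutils) is available; verify_md5 is a hypothetical helper name, and the path of the downloaded JAR is whatever you saved it as.

```sh
# Hypothetical helper: succeeds only if the MD5 sum of file $1 equals $2.
verify_md5() {
  file=$1
  expected=$2
  # md5sum prints "<hash>  <file name>"; keep only the hash
  actual=$(md5sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}
```

For v1.1.1, the call would be verify_md5 followed by the path of the downloaded JAR and 3a7f271554a8527d644d165d60249a66; a non-zero exit status means the file does not match the published sum.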
Changelog
- v1.1.1: first working implementation