Difference between revisions of "YASMEEN matching engine"

From D4Science Wiki
Revision as of 14:02, 27 October 2013

"Yet Another Species Matching Execution ENgine" - Matching engine CLI tool

Purposes

The YASMEEN matching engine is the command line (CLI) tool that implements the MATCH DATA and PRODUCE MATCHING RESULTS steps in the YASMEEN data flow.

It takes a parsed input data file, a set of TAF files as reference data, and a set of matchlet configuration options. It then identifies matches between input and reference data entries and produces results in a format specified by the user.

Command line

java -jar YASMINE-engine-<version>.jar <options>

This CLI tool can be launched with the '-h' option to get a report of the available options:

java -jar YASMINE-engine-<version>.jar -h

This prints:

usage:
 -dontSkipHeader          Set this option if the parsed input data file doesn't start with a CSV header row
 -h                       Print this message
 -hfm                     Instructs the system to halt the current data process at the first valid matching (i.e. a matching
                          with an overall score higher than the minimum set)
 -inFile <arg>            Path to a text file containing the parsed input data (one per line)
 -law <arg>               Sets the different lexical algorithms weight for matchers that do perform lexical comparisons. The
                          syntax of this parameter is: <lev>:<sndx>:<trig>, with <lev> being the weight of the calculated
                          Levenshtein similarity, <sndx> being the weight of the calculated soundex comparison and <trig> being
                          the weight of the calculated trigram similarity. To enable Levenshtein similarity only, use -law
                          100:0:0. Conversely, to enable soundex only you should use: -law 0:100:0, to enable trigrams only you
                          should use -law 0:0:100 and to enable an equal mix of all three, you should use -law 100:100:100.
                          Valid values for each of these three weights are in the range [0, 100]
 -man                     Enables the authority name matching
 -mant <arg>              Sets the authority name matching results minimum score threshold (0.0, 1.0]
 -manw <arg>              Sets the authority name matching weight (0.0, n]
 -may                     Enables the authority year matching
 -mayt <arg>              Sets the authority year matching results minimum score threshold (0.0, 1.0]
 -mayw <arg>              Sets the authority year matching weight (0.0, n]
 -mc <arg>                Sets the maximum number of matching candidates for each entry [1, n]
 -mftm                    Enables the FuzzyTaxamatch matching
 -mftmt <arg>             Sets the FuzzyTaxamatch matching results minimum score threshold (0.0, 1.0]
 -mftmw <arg>             Sets the FuzzyTaxamatch matching weight (0.0, n]
 -mgn                     Enables the genus name matching
 -mgnt <arg>              Sets the genus name matching results minimum score threshold (0.0, 1.0]
 -mgnw <arg>              Sets the genus name matching weight (0.0, n]
 -mNgn                    Enables the normalized genus name matching
 -mNgnt <arg>             Sets the normalized genus name matching results minimum score threshold (0.0, 1.0]
 -mNgnw <arg>             Sets the normalized genus name matching weight (0.0, n]
 -mNsn                    Enables the normalized species name matching
 -mNsnt <arg>             Sets the normalized species name matching results minimum score threshold (0.0, 1.0]
 -mNsnw <arg>             Sets the normalized species name matching weight (0.0, n]
 -mSn                     Enables the scientific name matching
 -msn                     Enables the species name matching
 -msnt <arg>              Sets the species name matching results minimum score threshold (0.0, 1.0]
 -mSnt <arg>              Sets the scientific name matching results minimum score threshold (0.0, 1.0]
 -msnw <arg>              Sets the species name matching weight (0.0, n]
 -mSnw <arg>              Sets the scientific name matching weight (0.0, n]
 -mst <arg>               Sets the matching results minimum score threshold (0.0, 1.0]
 -mt                      If enabled, target data will be materialized in-memory before actually launching the process [
                          EXPERIMENTAL FEATURE ]
 -mtm                     Enables the Taxamatch matching
 -mtmw <arg>              Sets the Taxamatch matching weight (0.0, n]
 -outFile <arg>           Results will be written to this file. When not set defaults to standard output.
 -ps                      If the -pt option is enabled, each thread will be assigned a fraction of the input source data to
                          process against the target data [ EXPERIMENTAL FEATURE ]
 -pt <arg>                Specifies the number of threads for parallel execution. It can either be an absolute number (e.g. -pt
                          4 - use 4 parallel threads) or a relative number with respect to the number of cores (e.g. -pt 4.5x -
                          use a number of thread that is 4.5 times the number of available cores) [ EXPERIMENTAL FEATURE ]
 -refData <arg>           Specify coordinates for a reference data source. These are in the form:
                          @<TAXA SOURCE URL>(,<VERNACULAR NAMES SOURCE URL>
 -report                  Results are emitted in human-readable format
 -verbose                 Enables emitting some (very) verbose messages during the process
 -wait                    Request to wait for users hitting ENTER before starting the process
 -xml                     Results will be emitted in XML format
 -xslTemplate <arg>       Specifies an embedded transformation template for the XML output among { stripped, simple, csv,
                          csvNoHeader }
 -xslTemplateFile <arg>   Apply the given XSL stylesheet to the XML output before emitting the results
 
=== General command line options ===
==== -h ====

This option takes no arguments. When set, the tool prints the help message and exits without processing any input.

==== -wait ====

==== -verbose ====

=== Input file command line options ===
==== -inFile ====

Mandatory. Path to a text file containing the parsed input data (one entry per line).

==== -dontSkipHeader ====
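According to the help text, the parsed input file holds one entry per line and is expected to start with a CSV header row unless -dontSkipHeader is set. The following is a minimal Python sketch of that reading rule. The function name <code>read_entries</code>, the handling of blank lines and the sample rows are assumptions made for illustration and are not taken from the engine's code.

```python
from typing import Iterable, List


def read_entries(lines: Iterable[str], dont_skip_header: bool = False) -> List[str]:
    """Return the data rows of a parsed input file (illustrative only).

    Assumption: blank lines carry no entry and are ignored.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    # By default the first row is treated as a CSV header and dropped;
    # with -dontSkipHeader the first row is kept as data.
    return rows if dont_skip_header else rows[1:]


# Hypothetical sample content, not a real YASMEEN input file.
sample = ["genus,species\n", "Thunnus,albacares\n", "Gadus,morhua\n"]
print(read_entries(sample))                         # header dropped
print(read_entries(sample, dont_skip_header=True))  # first row kept as data
```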

=== Reference data command line options ===

==== -mt ====

==== -refData ====

=== Matching execution configuration options ===

==== -pt ====
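The help text says that -pt accepts either an absolute number of threads (e.g. <code>4</code>) or a multiple of the available cores (e.g. <code>4.5x</code>). The Python sketch below shows one way such a value could be resolved. The function name <code>resolve_thread_count</code>, the rounding down of fractional results and the minimum of one thread are assumptions; the engine's actual behaviour may differ.

```python
import math
import os
from typing import Optional


def resolve_thread_count(spec: str, cores: Optional[int] = None) -> int:
    """Turn a -pt value such as "4" or "4.5x" into a thread count.

    Assumptions (not confirmed by the source): fractional results are
    rounded down and at least one thread is always used.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    spec = spec.strip()
    if spec.lower().endswith("x"):
        # Relative form: a multiple of the number of available cores.
        factor = float(spec[:-1])
        if factor <= 0:
            raise ValueError("the core multiplier must be positive")
        threads = math.floor(factor * cores)
    else:
        # Absolute form: a plain number of threads.
        threads = int(spec)
        if threads <= 0:
            raise ValueError("the thread count must be positive")
    return max(1, threads)


print(resolve_thread_count("4", cores=8))     # absolute form
print(resolve_thread_count("4.5x", cores=2))  # relative form
```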
==== -ps ====
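The help text states that, when -pt is enabled, -ps assigns each thread a fraction of the input source data to process against the target data. How the engine actually divides the input is not documented here, so the Python sketch below only illustrates one plausible scheme: contiguous slices of near-equal size. The function name <code>split_input</code> is hypothetical.

```python
from typing import List, Sequence


def split_input(entries: Sequence[str], threads: int) -> List[List[str]]:
    """Split entries into `threads` contiguous slices of near-equal size.

    Assumption: slices are contiguous and the first slices absorb any
    remainder; the engine may partition its input differently.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    base, extra = divmod(len(entries), threads)
    slices: List[List[str]] = []
    start = 0
    for i in range(threads):
        size = base + (1 if i < extra else 0)
        slices.append(list(entries[start:start + size]))
        start += size
    return slices


print(split_input(["a", "b", "c", "d", "e"], 2))
```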

=== Matching process configuration options ===
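Going by the help text, -mst sets the minimum overall score in (0.0, 1.0], -mc caps the number of matching candidates per entry, and -hfm halts processing of an entry at the first valid matching. The Python sketch below shows how these three settings could interact. The function name <code>select_candidates</code>, the treatment of a score equal to the threshold as valid, and the best-first ordering are assumptions rather than documented engine behaviour.

```python
from typing import Iterable, List, Tuple

Candidate = Tuple[str, float]  # (reference entry, overall score)


def select_candidates(
    scored: Iterable[Candidate],
    min_score: float,
    max_candidates: int,
    halt_at_first: bool = False,
) -> List[Candidate]:
    """Filter scored candidates in the spirit of -mst, -mc and -hfm.

    Assumptions: a score equal to min_score counts as valid, and the
    surviving candidates are returned best-first.
    """
    if not 0.0 < min_score <= 1.0:
        raise ValueError("min_score must be in (0.0, 1.0]")
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    kept: List[Candidate] = []
    for name, score in scored:
        if score < min_score:
            continue  # below the minimum score threshold
        kept.append((name, score))
        if halt_at_first:
            break  # stop at the first valid matching
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:max_candidates]


# Hypothetical scores, for illustration only.
scored = [("a", 0.4), ("b", 0.9), ("c", 0.7), ("d", 0.95)]
print(select_candidates(scored, min_score=0.5, max_candidates=2))
print(select_candidates(scored, min_score=0.5, max_candidates=2, halt_at_first=True))
```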

==== -mst ====
==== -mc ====
==== -hfm ====

=== Matchlets configuration options ===
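Each matchlet in the help text comes with a weight option in the range (0.0, n] and a minimum score threshold option in the range (0.0, 1.0]. The source does not say how the engine combines matchlet scores into an overall score, so the Python sketch below is only an illustration of the two ranges together with an assumed weighted mean. The names <code>MatchletConfig</code> and <code>overall_score</code> are hypothetical, as is the rule that a score below its matchlet threshold counts as zero.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MatchletConfig:
    weight: float     # e.g. the value given to -mgnw, range (0.0, n]
    threshold: float  # e.g. the value given to -mgnt, range (0.0, 1.0]

    def __post_init__(self) -> None:
        if not self.weight > 0.0:
            raise ValueError("weight must be in (0.0, n]")
        if not 0.0 < self.threshold <= 1.0:
            raise ValueError("threshold must be in (0.0, 1.0]")


def overall_score(scores: Dict[str, float], configs: Dict[str, MatchletConfig]) -> float:
    """Weighted mean of matchlet scores (an assumed combination rule).

    Assumption: a matchlet score below its own threshold counts as 0.
    """
    if not configs:
        return 0.0
    total_weight = sum(cfg.weight for cfg in configs.values())
    weighted = 0.0
    for name, cfg in configs.items():
        score = scores.get(name, 0.0)
        if score < cfg.threshold:
            score = 0.0
        weighted += cfg.weight * score
    return weighted / total_weight


# Hypothetical configuration: genus counts twice as much as species.
configs = {
    "genus": MatchletConfig(weight=2.0, threshold=0.5),
    "species": MatchletConfig(weight=1.0, threshold=0.5),
}
print(overall_score({"genus": 1.0, "species": 0.4}, configs))
```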

==== -law ====
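The help text gives the -law syntax as <code><lev>:<sndx>:<trig></code>, with each weight in the range [0, 100], and notes that <code>100:100:100</code> yields an equal mix of the three algorithms. The Python sketch below parses such a value and combines three similarity scores. Normalizing by the sum of the weights is an assumption that is consistent with the "equal mix" remark but is not confirmed by the source; the function names <code>parse_law</code> and <code>combine</code> are hypothetical.

```python
from typing import Tuple


def parse_law(value: str) -> Tuple[int, int, int]:
    """Parse a -law value of the form <lev>:<sndx>:<trig>."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError("expected <lev>:<sndx>:<trig>")
    lev, sndx, trig = (int(p) for p in parts)
    for weight in (lev, sndx, trig):
        if not 0 <= weight <= 100:
            raise ValueError("each weight must be in [0, 100]")
    return lev, sndx, trig


def combine(weights: Tuple[int, int, int], lev: float, sndx: float, trig: float) -> float:
    """Weighted mean of the three lexical similarities (assumed rule)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return (weights[0] * lev + weights[1] * sndx + weights[2] * trig) / total


# Hypothetical similarity values, for illustration only.
print(combine(parse_law("100:0:0"), lev=0.8, sndx=0.2, trig=0.5))      # Levenshtein only
print(combine(parse_law("100:100:100"), lev=0.9, sndx=0.6, trig=0.3))  # equal mix
```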

==== -mSn ====
==== -mSnw ====
==== -mSnt ====

==== -mgn ====
==== -mgnw ====
==== -mgnt ====

==== -mNgn ====
==== -mNgnw ====
==== -mNgnt ====

=== Output file command line options ===
==== -outFile ====

=== Output format command line options ===
==== -xml ====
==== -report ====

== Usage examples ==
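As a starting point, the Python sketch below assembles an engine invocation from options that appear in the help text (-inFile, -refData, -mst, -xml, -outFile). The function name <code>build_command</code> and all file names are hypothetical. The -refData value is passed through unchanged, because its exact format is only partially shown in the help text, and the jar name keeps the <code><version></code> placeholder used on this page.

```python
from typing import List, Optional


def build_command(
    jar: str,
    in_file: str,
    ref_data: str,
    out_file: Optional[str] = None,
    min_score: Optional[float] = None,
    xml: bool = False,
) -> List[str]:
    """Assemble an argument list for the matching engine (illustrative)."""
    cmd = ["java", "-jar", jar, "-inFile", in_file, "-refData", ref_data]
    if min_score is not None:
        if not 0.0 < min_score <= 1.0:
            raise ValueError("-mst expects a value in (0.0, 1.0]")
        cmd += ["-mst", str(min_score)]
    if xml:
        cmd.append("-xml")
    if out_file is not None:
        # Without -outFile the engine writes to standard output.
        cmd += ["-outFile", out_file]
    return cmd


# Hypothetical paths; REF_DATA_COORDINATES stands in for a real -refData value.
command = build_command(
    "YASMINE-engine-<version>.jar",
    "parsed_input.csv",
    "REF_DATA_COORDINATES",
    out_file="results.xml",
    min_score=0.8,
    xml=True,
)
print(" ".join(command))
```

The resulting list can be handed to <code>subprocess.run</code> once the placeholders are replaced with real values.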

== Appendix ==

== Download ==

You can download the [[ YASMEEN ]] matching engine through one of these URLs:

* [http://goo.gl/ v1.1.1] ( KB)