YASMEEN string transformations

From D4Science Wiki
Revision as of 16:54, 26 October 2013 by Fabio.fiorellato (Talk | contribs) (Simplification)


"Yet Another Species Matching Execution ENgine" - common string transformations

The following is a list of common string transformations used in the YASMEEN data conversion and matching processes.

Simplification

This is the process of removing all unnecessary characters (symbols, multiple spaces, leading / trailing spaces) from a string, converting the result to the ASCII character set and returning the uppercase version of that conversion.

The removal of unnecessary characters is achieved by means of simple RegEx replacements, whilst the ASCII character set conversion is delegated to the ICU libraries.
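
The steps described above can be sketched roughly as follows. This is only an illustration in Python, not the YASMEEN implementation: the function name simplify is hypothetical, the set of characters treated as "unnecessary" (anything that is neither a letter, a digit nor whitespace) is an assumption, and the standard unicodedata module is used here as a stand-in for the ICU libraries, so results may differ from ICU for characters that have no canonical decomposition.

```python
import re
import unicodedata


def simplify(text: str) -> str:
    """Hypothetical sketch of the simplification step (not the YASMEEN code)."""
    # Compose first, so that accents stay attached to their base letters.
    text = unicodedata.normalize("NFC", text)
    # Assumption: every character that is not a letter, digit or whitespace
    # counts as "unnecessary" and is replaced by a space.
    text = re.sub(r"[^\w\s]|_", " ", text)
    # Collapse multiple spaces and drop leading / trailing spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # ASCII conversion: decompose accented letters and discard whatever
    # cannot be encoded as ASCII (stand-in for the ICU-based conversion).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Return the uppercase version of the converted string.
    return text.upper()


print(simplify("  Thunnus   alalunga (Bonnaterre, 1788) "))
# prints: THUNNUS ALALUNGA BONNATERRE 1788
```

Under these assumptions, two spellings of the same name that differ only in punctuation, spacing, accents or letter case are reduced to the same string, which is what makes the result suitable as a matching key.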

Stemming

Soundex

Trigrams