Difference between revisions of "YASMEEN string transformations"

Revision as of 16:56, 26 October 2013

"Yet Another Species Matching Execution ENgine" - common string transformations

Here follows a list of common string transformations involved in the YASMEEN data conversion and matching processes.

Simplification

This is the process of removing all non-letter and unnecessary characters (symbols, digits, multiple spaces, leading / trailing spaces) from a string, converting the result to the ASCII character set, and returning the uppercase version of that conversion.

Unnecessary characters are substituted by means of simple RegEx replacements, whilst the ASCII character set conversion is delegated to the ICU Java libraries. In particular, the transliterator ID actually used during the process is:

Any-Latin; NFD; [:nonspacing mark:] remove; NFC; Latin-ASCII;
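The simplification steps can be approximated with the Java standard library alone. The sketch below is not the YASMEEN implementation: the class name SimplifySketch, the method simplify, the regular expressions and the order of the steps are all assumptions made for illustration. It uses java.text.Normalizer as a rough stand-in for the "NFD; [:nonspacing mark:] remove; NFC" part of the transliterator ID, so it does not cover the "Any-Latin" and "Latin-ASCII" stages (non-Latin scripts, or letters such as "ß" that have no decomposition, are simply dropped here). With ICU4J on the classpath, the full ID would presumably be passed to com.ibm.icu.text.Transliterator.getInstance(...) and applied with transliterate(...) instead.

```java
import java.text.Normalizer;
import java.util.Locale;

public class SimplifySketch {

    // Hypothetical helper, not part of YASMEEN: approximates "simplification".
    static String simplify(String input) {
        // Decompose accented letters into base letter + combining marks (NFD).
        String s = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining (nonspacing) marks, then recompose (NFC).
        s = s.replaceAll("\\p{Mn}+", "");
        s = Normalizer.normalize(s, Normalizer.Form.NFC);
        // Assumption: each run of characters that are not ASCII letters
        // (symbols, digits, whitespace) is replaced by a single space.
        s = s.replaceAll("[^A-Za-z]+", " ");
        // Remove leading / trailing spaces and return the uppercase version.
        return s.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(simplify("  Thunnus   alalunga (Bonnaterre, 1788) "));
        System.out.println(simplify("Salmo trutta fário"));
    }
}
```

Replacing unwanted characters with a space rather than deleting them keeps adjacent words from being glued together; whether YASMEEN makes the same choice is not stated in this page.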

@@ Line 5: / Line 5: @@
 == Simplification ==
-This is the process of removing all unnecessary characters (symbols, multiple spaces, leading / trailing spaces) from a string, convert the result in the ASCII character set and return the uppercase version of such conversion.
+This is the process of removing all non-letter and unnecessary characters (symbols, digits, multiple spaces, leading / trailing spaces) from a string, convert the result in the ASCII character set and return the uppercase version of such conversion.
-Unnecessary character substitutions is achieved by means of simple RegEx replacements whilst the ASCII character set conversion is delegated to the [http://userguide.icu-project.org/icufaq/icu4j-faq ICU libraries]
+Unnecessary character substitutions is achieved by means of simple RegEx replacements whilst the ASCII character set conversion is delegated to the [http://icu-project.org/ ICU Java libraries]. In particular, the transliterator ID actually used during the process is:
+ Any-Latin; NFD; [:nonspacing mark:] remove; NFC; Latin-ASCII;
 == Stemming ==

Simplification

Stemming

Soundex

Trigrams
