Str.tokenizeAndFold

Tokenises string_ and performs folding on each token.

A token is a non-empty sequence of alphanumeric characters in the source string, separated by non-alphanumeric characters. An "alphanumeric" character for this purpose is one that matches g_unichar_isalnum() or g_unichar_ismark().

Each token is then (Unicode) normalised and case-folded. If asciiAlternates is non-NULL and some of the returned tokens contain non-ASCII characters, ASCII alternatives will be generated.
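As a rough illustration of the tokenise-and-fold steps described above, the following Python sketch approximates them with the standard unicodedata module. It is not GLib's implementation: the function names are hypothetical, str.isalnum() and the Unicode "M" categories only approximate g_unichar_isalnum() and g_unichar_ismark(), and the use of NFKC normalisation is an assumption.

```python
import unicodedata


def is_token_char(ch: str) -> bool:
    # Approximation of g_unichar_isalnum() || g_unichar_ismark():
    # alphanumeric characters plus combining marks (categories Mn, Mc, Me).
    return ch.isalnum() or unicodedata.category(ch).startswith("M")


def tokenize_and_fold(text: str) -> list[str]:
    # Split on every character that is not a token character,
    # keeping only non-empty runs.
    tokens = []
    current = []
    for ch in text:
        if is_token_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))

    # Normalise, then case-fold each token (NFKC is an assumption here).
    return [unicodedata.normalize("NFKC", t).casefold() for t in tokens]


print(tokenize_and_fold("Hello, World!"))  # ['hello', 'world']
```

Punctuation and whitespace act only as separators, so they never appear in the result, and empty tokens are never produced.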

The number of ASCII alternatives that are generated and the method for doing so is unspecified, but translitLocale (if specified) may improve the transliteration if the language of the source string is known.
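Because the method for generating ASCII alternatives is unspecified, the following Python sketch shows only one plausible approach, assumed purely for illustration: decompose the token, drop combining marks, and keep the result only if it is pure ASCII. The function name ascii_alternate is hypothetical, and locale-aware transliteration (the role of translitLocale) is not modelled.

```python
import unicodedata


def ascii_alternate(token: str):
    # Tokens that are already ASCII need no alternative.
    if token.isascii():
        return None
    # Decompose accented characters into base character + combining marks,
    # then drop the marks (one possible strategy, not GLib's actual method).
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Give up if non-ASCII characters remain (e.g. CJK text).
    return stripped if stripped.isascii() else None


print(ascii_alternate("café"))  # cafe
```

A locale-aware transliterator could do better for some languages, for example rendering a German "ü" as "ue" rather than "u", which is the kind of improvement a language hint makes possible.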

struct Str
static string[] tokenizeAndFold(string string_, string translitLocale, out string[] asciiAlternates)

Parameters

string_ string

a string

translitLocale string

the language code (like 'de' or 'en_GB') from which string_ originates

asciiAlternates string[]

a return location for ASCII alternates

Return Value

Type: string[]

the folded tokens

Meta

Since

2.40