Tailent.StringUtils.FuzzSetMatching

This method matches a given string against a dataset of strings, using a set of fuzzy text matching algorithms. It is more useful than traditional exact string matching when the provided string does not exactly match any item in the dataset.

Given the following example dataset:

“Apple Inc.”, “IBM”, “Oracle Ltd.”

If we are trying to locate “Apple Inc.” in the dataset using traditional string matching, we need to search for exactly that string. Using the FuzzSetMatching method, we can search with a partial or incomplete string instead - for instance, “Apple” or “App” would still produce a match. This also works when the target item in the dataset starts differently (for instance, “Apple” would still match “2020 Apple Inc.”).

The method works in two phases. First, it analyzes every item in the dataset and assigns each one a score between 0 and 100 (higher means a better match). Once scoring is complete, it builds a shortlist by discarding every item whose score is lower than the specified threshold. The remaining items are considered good matches, and the one with the highest score is returned.
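The score, shortlist and select steps described above can be sketched as follows. This is only an illustration, not the Tailent implementation: the class FuzzyMatchSketch, its Score and BestMatch methods, and the scoring rule itself (longest common substring as a percentage of the target length) are assumptions made for this example, since the actual scoring algorithms are not documented here. The sketch also assumes that a score equal to the threshold is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the score -> shortlist -> best-match flow.
public static class FuzzyMatchSketch
{
    // Stand-in scorer (an assumption, not the real algorithm): length of the
    // longest common substring as a percentage of the target length, 0 to 100.
    public static int Score(string target, string candidate, bool matchCase)
    {
        if (string.IsNullOrEmpty(target) || string.IsNullOrEmpty(candidate))
            return 0;
        if (!matchCase)
        {
            target = target.ToLowerInvariant();
            candidate = candidate.ToLowerInvariant();
        }

        int best = 0;
        // previous[j]: length of the common run ending at target[i - 2], candidate[j - 1]
        var previous = new int[candidate.Length + 1];
        for (int i = 1; i <= target.Length; i++)
        {
            var current = new int[candidate.Length + 1];
            for (int j = 1; j <= candidate.Length; j++)
            {
                if (target[i - 1] == candidate[j - 1])
                {
                    current[j] = previous[j - 1] + 1;
                    best = Math.Max(best, current[j]);
                }
            }
            previous = current;
        }
        return 100 * best / target.Length;
    }

    public static string BestMatch(string target, IEnumerable<string> dataSet,
                                   int threshold = 75, bool matchCase = false)
    {
        return dataSet
            // Scoring phase: every item receives a score.
            .Select(item => new { Item = item, Score = Score(target, item, matchCase) })
            // Shortlist: discard items scoring below the threshold.
            .Where(scored => scored.Score >= threshold)
            // Selection: the highest score wins; null if nothing is left.
            .OrderByDescending(scored => scored.Score)
            .Select(scored => scored.Item)
            .FirstOrDefault();
    }
}
```

With this stand-in scorer, searching the example dataset for “Apple” or “App” selects “Apple Inc.”, while a string unrelated to every item yields null because the shortlist ends up empty.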

Usage

The method can be called within C# Script action blocks, using the following syntax:

string Tailent.StringUtils.FuzzSetMatching(string TargetString,
    IEnumerable<string> DataSet, int Threshold = 75, bool MatchCase = false)

Returns

The best-matching string from the dataset (according to the set threshold), or null if the dataset is empty or no string in the dataset scored high enough to meet the threshold.

Parameters

string TargetString - the partial string used for searching.

IEnumerable<string> DataSet - any IEnumerable string collection, representing the dataset that will be searched.

int Threshold - (OPTIONAL) an integer value representing the minimum score a candidate needs in order to be considered a match. The higher the threshold, the more exact the match has to be. This needs to be adjusted depending on dataset consistency and usage scenarios. Default value is 75.

bool MatchCase - (OPTIONAL) a boolean value specifying whether the matching process is case sensitive. Default value is false.

Example usage

Tailent.StringUtils.FuzzSetMatching(Partner, Partners.Keys.Cast<string>(), 85);

The above example does a case-insensitive search (MatchCase is left at its default of false) for the string stored in the Partner variable (string), within the dataset (Partners variable of type Dictionary<string, string>). Because the dataset is a dictionary, its keys are passed as the collection to search, and the .Cast<string>() call exposes them as an IEnumerable<string>. The matching threshold is set to 85 in this case, to force a more exact match.
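Because the method returns null when nothing reaches the threshold, the result should be checked before it is used as a dictionary key. The sketch below shows one way to do that. The PartnerLookup class and its ResolvePartnerValue method are hypothetical names introduced for this example, and the matcher is passed in as a delegate so that the sketch stays self-contained. In a C# Script action block, the delegate would simply wrap the real call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: resolves a fuzzy-matched key to its dictionary value.
public static class PartnerLookup
{
    // matcher stands in for the real call; in a script it could be, for example:
    // (target, keys) => Tailent.StringUtils.FuzzSetMatching(target, keys, 85)
    public static string ResolvePartnerValue(
        Dictionary<string, string> partners,
        string partner,
        Func<string, IEnumerable<string>, string> matcher)
    {
        string matchedKey = matcher(partner, partners.Keys);

        // null means the dataset was empty or no key scored high enough.
        if (matchedKey == null)
            return null;

        // The returned string is one of the original keys, so the lookup is safe.
        return partners[matchedKey];
    }
}
```

Returning null from the helper keeps the "no match" case explicit, so the calling script can decide whether to retry with a lower threshold or report the missing partner.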
