Solutions

Solutions

Digital solutions for your business.
Combining strategy, technology, automation and people

Go to overview

Technology advisory
Navigate the fast-changing world.

Cloud engineering
Transformational change at scale and speed

Data solutions
Realise the untapped potential of data.

AI and machine learning
Leverage your data assets.

Application engineering
Optimise and grow your digital investment.

Maintenance and support
End-to-end application management.

Business process solutions
Manage business processes to reduce operating costs.

Quality solutions
Independent testing for your systems and software.
Industries

Industry

We provide solutions tailored to your sector to assist you in identifying opportunities, realising value and opening up new markets.

Go to overview

Financial services and insurance
Insurance, risk management, banks, and fintech

Healthcare
Patient empowerment, Lifesciences, and pharma

Retail
Functional and emotional customer experiences online and in-store

Travel
Airlines, online travel giants, niche disruptors

Media and publishing
Content consumption for the tech-driven audience

Hi-tech and IOT
Real-time information and operational agility and flexibility to respond to market changes

Logistics and supply chain
Reimagine a supply chain that is more flexible and resilient to change

Education
Create an exciting and engaging digital experience for students and departments
Our thinking

Our thinking

The latest updates to help future-focused organisations on the issues that matter most in business.

Go to overview

News
Keep up to date with news and announcements at NashTech.

Insights
We are dedicated to enhancing the tech sector and are enthusiastic about imparting our knowledge in technology, business and culture.

Blog
Expert guidance on everything from complex technological issues to current business trends.
Case studies
About us

Exploring Fuzzy Search and Levenshtein Distance in Data Retrieval

hoannguyen1

Image on Internet

Introduction

In the world of search engines and data retrieval, achieving high accuracy and relevance in the results is a constant challenge. One of the techniques used to improve search results is Fuzzy Search.

This blog post will delve into the concept of fuzzy search, its implementation using the Levenshtein Distance, and how to test its effectiveness.

Understanding Fuzzy Search

Fuzzy search is a powerful tool for finding approximate matches in a dataset.
It’s particularly useful when dealing with human language, where small typos or variations in spelling can make exact matching impractical.

Understanding the Levenshtein Distance

The Levenshtein distance, named after Vladimir Levenshtein, is a measure of the difference between two strings,
It’s defined as the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other.

Fuzzy Search Utilizing Levenshtein Distance

Fuzzy search becomes more powerful and flexible when combined with the Levenshtein distance. When performing a fuzzy search using the Levenshtein distance, we can specify a threshold of how “fuzzy” our search can be.

For instance, with a Levenshtein distance of 1, we can find all words in our dataset that can be obtained by changing the original word with at most one single-character edit.

Testing Fuzzy Search with a Sample

Sample 01: Fuzzy search for a dataset Composed of Single Words

Imagine we have a list of words: “brush,” “blush,” “crush,” “brash,” “bush,” “rush,” and “brunch.”

Now, let’s say we’re looking for the word “brush” using a fuzzy search, which means we’re okay with small changes in spelling.

We set a rule that allows only one change in the word we’re searching for, like replacing a letter, adding a letter, or removing a letter. This rule is called the Levenshtein distance of 1.

When we search for “brush” the fuzzy search algorithm finds similar words from our list that are within one change away. Here’s what it finds:

“brush” – Exactly the same, no changes needed.
“blush” – Replace ‘r’ with ‘l’ in “brush.”
“crush” – Replace ‘b’ with ‘c’ in “brush.”
“brash” – Replace ‘u’ with ‘a’ in “brush.”
“bush” – Delete ‘r’ from “brush.”
“rush” – Delete ‘b’ from “brush.”

But, it doesn’t include “brunch” because it requires more than one change to turn “brush” into “brunch.” We would need to change ‘s’ to ‘n’ and add a ‘c’ after ‘n,’ which is two changes, exceeding the allowed one change in the Levenshtein distance of 1.

So, “brunch” isn’t in the search results because it needs more changes than the fuzzy search allows (the Levenshtein distance of 1).

Sample 02: Fuzzy search for a dataset Composed of Two Words

Imagine we have a list of cool subjects we can learn about like [“data science”, “machine learning”, “artificial intelligence”, “natural language processing”].

Search keyword: “mahcine learnign”, with Levenshtein distance of 2.

To figure out if these two phrases are similar, we break them down into smaller parts, or tokens, like breaking a cookie into pieces. So, “machine learning” becomes [“machine”, “learning”].

For transforming “mahcine learnign” into “machine learning”, Here’s what it finds:

Delete “h” from “mahcine” to get “macine”.
Insert “h” after “c” in “macine” to get “machine”.
Delete “g” from “learnign” to get “learnin”.
Insert “g” after “n” in “learnin” to get “learning”.

So, it takes 4 operations (2 deletions and 2 insertions) to transform “mahcine learnign” into “machine learning”. This exceeds the maximum of 2 changes (Levenshtein distance of 2) we mentioned earlier. Therefore, “mahcine learnign” would not be a match for “machine learning” in a fuzzy search allowing a maximum of 2 changes.

Conclusion

Fuzzy search, especially when combined with techniques like the Levenshtein distance, can significantly enhance the user experience by returning more relevant search results and accommodating minor discrepancies in the search query. Like any algorithm, it’s important to continually test and refine the fuzzy search to ensure it meets the needs of the users.

Remember, the effectiveness of fuzzy search can depend on the specific use case and the nature of the data being searched. It’s always important to consider the context when deciding whether to use fuzzy search or another search method.

References

Fuzzy search – Azure AI Search | Microsoft Learn

What is a Fuzzy Search? | AddSearch

Understanding How Fuzzy Search Works: A Comprehensive Guide – Expertrec

hoannguyen1

I am a Senior Test Engineer with over 9 years of experience in the software testing industry. I am responsible for managing the testing process within a software development project to ensure the quality standards and expected performance.

Solutions

Industry

Our thinking

Exploring Fuzzy Search and Levenshtein Distance in Data Retrieval

hoannguyen1

Table of Contents

Introduction

Understanding Fuzzy Search

Understanding the Levenshtein Distance

Fuzzy Search Utilizing Levenshtein Distance

Testing Fuzzy Search with a Sample

Sample 01: Fuzzy search for a dataset Composed of Single Words

Sample 02: Fuzzy search for a dataset Composed of Two Words

Conclusion

References

hoannguyen1

Leave a Comment Cancel Reply

Suggested Article

NashTech

Solutions

Useful links

Connect with us

Our achievements