Last post Jun 20, 2017 08:03 AM by EvenMa
Jun 06, 2017 11:07 AM|0belix|LINK
I've implemented a Full Text Search using the FREETEXTTABLE function and it works fine, but i need to upgrade this to search for approximate words like for instance if you insert a typo on your search string like "uprade" when i wanted to search for "upgrade".
I've read about the Levenshtein algorithm but i'm not seeing how to implement this when searching on Text fields. I can see its usage when comparing with an ID, for instance, or a single word, but how can i implement this, or any other solution,
that allows me to search inside a text field with 1000 words or more on each row for instance, and return weighted results that are aproximate to the search string?
Any help is appreciated
Jun 07, 2017 05:53 AM|zxj|LINK
Use CONTAINS and CONTAINSTABLE to match words and phrases.
Use FREETEXT and FREETEXTTABLE to match the meaning, but not the exact wording.
You can use CONTAINS, CONTAINSTABLE, FREETEXT, or FREETEXTTABLE to search a table for a specific phrase.
You can use CONTAINS or CONTAINSTABLE to search for words or phrases with a specified prefix.
You can use CONTAINS, CONTAINSTABLE, FREETEXT, or FREETEXTTABLE to search for all the different tenses and conjugations of a verb or both the singular and plural forms of anoun (an inflectional search) or for synonymous forms of a specific word (a thesaurus
You can use CONTAINSTABLE to search for words or phrases and specify a weighting value.
Query with Full-Text Search
There are few other popular algorithms that can be used to do Fuzzy Search, such as
Jaccard Similarity Coefficient
Longest Common Subsequence and so on
SQL Server Fuzzy Search - Levenshtein Algorithm
String Comparisons in SQL: Edit Distance and the Levenshtein algorithm
Jun 07, 2017 02:49 PM|0belix|LINK
I'm not sure this works with my requirements... when searching a TEXT column in SQL, that has, lets say, 2000 words on each row, can i search for 'ultimats' (wrongly typed, missing the E between the T and the S) and produce results when somewhere in the
middle of the contents of the row on the TEXT column there is a word 'ultimates' without breaking the the full content of the TEXT column of each row into separate words? Will it produce any result at all? or will it produce a result like 2000-8 (8 because
its the number of letters that are a match for the 'ultimates' word when searching for 'ultimats')?
I ask this because all the examples and implementations i've seen so far are always comparing individual words or simple sentences with the searchstring, and never on a full text.
Jun 07, 2017 03:32 PM|PatriceSc|LINK
My first though would be perhaps to check if the word is found in the index and if not to suggest close words. The final search would be done with a "fixed" word rather than with a typo.
I believe other more general search products are doing that but I'm really not sure SQL Server handles typo out of the box (another option would be to include the typo as part of a thesaurus but I'm not sure it's that a good idea).
Jun 08, 2017 09:59 AM|EvenMa|LINK
According to your description, I suggest that you could compare the similarity of two strings, see the following statements:
CREATE FUNCTION DBO.FN_Resemble
DECLARE @num FLOAT,@len int
WHILE(LEN(@Cloumna)<>0 AND LEN(@CloumnB)<>0)
DECLARE @a NVARCHAR(4)
DECLARE @b NVARCHAR(4)
ELSE IF (LEN(@Cloumna)<LEN(@CloumnB))
SELECT DBO.FN_Resemble ('ABDC12345AG','ABDC123G')
If you have any other questions, please feel free to contact me any time.
Jun 14, 2017 05:26 PM|0belix|LINK
Thx for the response, but that is not the goal i'm attempting to achieve... I do not want to compare 2 simple strings... i want to perform a 'fuzzy' search on multiple texts that are on the database to find out which ones contain the word i'm searching for,
and order the results with the texts that have the exact word coming first, and then the results of the texts where approximate words were found, also ordered from the most aproximate to the less aproximate within a 'distance' of 4, for instance, so if i search
for 'batle' (instead of 'battle'), i can get the texts where the word 'batle' is included, but also the results where 'battle' was found, or 'bttle' or 'attle' or 'batte', etc... so i'm not comparing a word with another word, but rather finding if the word
is found on multiple texts and within which 'distance' (Levenshtein distance, that is).
If i was to compare the search string with the text and obtain a levenstein distance as you suggested, i would get a result of 996 for a text with 1000 characters where it would be needed to change 1 character of the actual word im searching, while if the
text had 2000 characters, and no change required to match the exact word i would get the result of 1995 for a word with 5 characters, so the exact match would show a bigger levenshtein distance (1995) than the one that was not an exact match (996), hence resulting
on the not so similar result appearing on top of the list. You could argue that then i need to order the results the other way round, but that would retain the issue as if the example was inverted, with the exact match found on the smaller text, and the fuzzy
match found on the larger one, so that is not an option.
Jun 20, 2017 08:03 AM|EvenMa|LINK
According to your description, it seemed so difficult to achieve this feature just by SQL, this is a very complex work.
I suggest you could join the search engine in your work to achieve this feature such as