Last post Jun 20, 2017 07:06 AM by TW88
Jun 19, 2017 04:29 AM|TW88|LINK
the platform run for years and store 3x,xxx records of person info (name, id, mobile, fax....)
We found many duplicates by users input
how can I sort out those likely similar records in SQL?
I know how to count by sql on specify fields but seems not acceptable on multiple fields records thx.
Jun 19, 2017 04:42 AM|kaushalparik27|LINK
Based on question description, I understand that you need to find duplicate records across more than one field. If this is correct, then please refer to below example.
This query finds and results duplicate records that has same ProductName AND ProductType. Example is to show how to find duplicate records on multiple SQL Server table columns:
select p1.ProductId, p2.*
from tblProduct p1
select ProductName, ProductType, count(*) as qty
group by ProductName, ProductType
having count(*) > 1
) p2 on p1.ProductType = p2.ProductType and p1.ProductName = p2.ProductName
Jun 19, 2017 05:50 AM|TW88|LINK
Thanks a lot. Let me give some example
Case 1 - name
Record 1 : Tony WONG
Record 2 : Tony MS WONG
Case 2 - name and mobile
Record 1 : Tony WONG 12345678
Record 2 : Tony WONG no mobile or different mobile phones
Possible to sort them out? content on fields are not exactly matched. thx.
Jun 19, 2017 06:13 AM|kaushalparik27|LINK
Well, in that case; you can modify above SQL Query to use LIKE operator and then manually decide by looking at the result of the query, that which record(s) "can be" duplicate record(s). Because quite possible that name are same but phone numbers are different.
Jun 19, 2017 07:43 AM|EvenMa|LINK
According to your description, I suggest you could use the following function to compare 2 strings to calculate the similarity.
1: Create function:
CREATE FUNCTION DBO.FN_getSimilarity
DECLARE @num FLOAT,@len int
WHILE(LEN(@Cloumna)<>0 AND LEN(@CloumnB)<>0)
DECLARE @a NVARCHAR(4)
DECLARE @b NVARCHAR(4)
ELSE IF (LEN(@Cloumna)<LEN(@CloumnB))
--you could adjust the similaritySELECT * FROM person WHERE dbo.FN_getSimilarity([name],'Tony WONG')>0.3
If you have any other questions, please feel free to contact me any time.
Jun 20, 2017 07:06 AM|TW88|LINK
not try SQL function before.
it seems it is the right direction to calculation similarity
Thanks a lot. Let me explore.