Jun 22, 2016 04:49 PM | pl1997
I have a table that has duplicate rows with the same primary key but different values in some of the columns. I wrote this statement to find all the repeated rows:
SELECT [Post-Visit].VisitRefNum, COUNT(*) AS NumberofRecords
FROM [Post-Visit]
GROUP BY [Post-Visit].VisitRefNum
HAVING COUNT(*) > 1
Since this resulted in 300+ entries, I was wondering if there is a fast and easy way to combine each set of repeated rows into a single row that merges all the different data. I know this may sound confusing, so please let me know if you need any clarification.
Thanks in advance.
Jun 22, 2016 10:19 PM | PatriceSc
If you really mean the same primary key with different values in some columns, you have quite a big design issue (a primary key should be unique by definition). Fixing this would likely require a number of steps (perhaps fixing the pk first, likely in multiple steps, even before merging some rows).
If you mean distinct primary key values with ALL other values being the same, you could delete the extra rows (i.e. reassign related rows to the lower pk, then delete the higher pk rows).
If you have distinct primary keys but only some values are the same, I'm not sure how you could deal with columns holding distinct values. How would you decide which value should be kept?
Or maybe you meant something else? For now we need to fully understand your issue. First of all: do you really have multiple rows with the SAME primary key value?
Jun 22, 2016 10:41 PM | pl1997
Thanks for the response.
The table essentially has double entries, with the data spread out between the two rows. For instance, if the VisitRefNum (PK) were 12345, the data would look something like this:
12345 A _ C _
12345 _ B _ D
The letters correspond to data and the _ to a blank cell. I need to combine the data so it looks like 12345 A B C D. Since there are 300+ instances of this duplication, I was wondering if there is an easier way to do this than fixing each duplication by hand. Let me know if this makes more sense to you...
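If it helps, here is a minimal script that reproduces the situation (Col1 through Col4 just stand in for the real column names):

CREATE TABLE [Post-Visit] (
    VisitRefNum INT,  -- note: no PRIMARY KEY constraint is declared here,
                      -- otherwise the duplicate 12345 rows would be rejected
    Col1 VARCHAR(10),
    Col2 VARCHAR(10),
    Col3 VARCHAR(10),
    Col4 VARCHAR(10)
);
INSERT INTO [Post-Visit] VALUES (12345, 'A', NULL, 'C', NULL);
INSERT INTO [Post-Visit] VALUES (12345, NULL, 'B', NULL, 'D');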
Jun 23, 2016 06:26 AM | PatriceSc
Ok, so if you are 100% sure you have only a single non-null value for each pk in every column, the final data you want should be given by:
SELECT pk, MAX(col1), MAX(col2), MAX(col3), MAX(col4) FROM MyTable GROUP BY pk
As aggregate functions ignore NULLs, MAX will retain the only non-null value found in each column, if any (and for pk values that have a single row, it will return that row unchanged).
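Applied to your [Post-Visit] table, that would look like this (Col1 through Col4 being placeholders for your actual column names):

SELECT VisitRefNum,
       MAX(Col1) AS Col1,  -- keeps 'A', the only non-null value for 12345
       MAX(Col2) AS Col2,  -- keeps 'B'
       MAX(Col3) AS Col3,  -- keeps 'C'
       MAX(Col4) AS Col4   -- keeps 'D'
FROM [Post-Visit]
GROUP BY VisitRefNum

For the 12345 example this returns the single row 12345 A B C D.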
If needed, I would first run additional queries to make 100% sure of the exact situation, for example something such as:
SELECT pk FROM MyTable GROUP BY pk HAVING COUNT(DISTINCT Col1)>1
to check that there really is at most one distinct non-null value in each column for a given pk; any pk returned by this query violates that (IF NOT TRUE YOU'LL LOSE DATA with the MAX merge). Run one such check per column.
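Rather than running that once per column, the checks can be combined into one query (still with the placeholder column names):

SELECT VisitRefNum
FROM [Post-Visit]
GROUP BY VisitRefNum
HAVING COUNT(DISTINCT Col1) > 1
    OR COUNT(DISTINCT Col2) > 1
    OR COUNT(DISTINCT Col3) > 1
    OR COUNT(DISTINCT Col4) > 1
-- any VisitRefNum returned here has conflicting values and needs a manual decision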
If all goes well, you could then use an INSERT INTO statement to copy that to a temp table, delete the existing rows, and finally insert the merged rows back into your final table (if you can't for some reason, you may have to temporarily add a column so that you can distinguish between original and new rows; you could then insert the new rows first and delete the old ones).
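As a sketch of that copy/delete/insert sequence in SQL Server (SELECT ... INTO creates the temp table; the column names are still placeholders):

-- copy the merged rows into a temp table
SELECT VisitRefNum, MAX(Col1) AS Col1, MAX(Col2) AS Col2,
       MAX(Col3) AS Col3, MAX(Col4) AS Col4
INTO #Merged
FROM [Post-Visit]
GROUP BY VisitRefNum

-- remove the old rows (the merge above also kept every non-duplicated row,
-- so nothing is lost), then put the merged rows back
DELETE FROM [Post-Visit]
INSERT INTO [Post-Visit] (VisitRefNum, Col1, Col2, Col3, Col4)
SELECT VisitRefNum, Col1, Col2, Col3, Col4 FROM #Merged

DROP TABLE #Merged

Running the delete and insert inside a transaction would make the swap atomic, so a failure part-way leaves the table untouched.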
Of course do that first on a database copy until it is fully tested and ready to be applied to the real db.
It is still weird that you can even run into this. With an actual primary key in place the duplicates would have been rejected, so it's surprising you can have different values for the same "primary key"; most likely no constraint is really defined on that column.
Also, once the current situation is fixed, make sure to fix the root cause so it can't happen any more (for example by adding a primary key constraint).
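For example, once the duplicates are gone:

ALTER TABLE [Post-Visit]
ADD CONSTRAINT PK_PostVisit PRIMARY KEY (VisitRefNum)
-- fails if any duplicate VisitRefNum values remain (which is exactly the point);
-- VisitRefNum must also be NOT NULL for the constraint to be accepted

(PK_PostVisit is just a made-up constraint name.)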