# Improvement to article: Deleting Duplicate Rows in a MySQL Database

## greenman

A reader has just pointed out an improvement to my most recent article. :)

On page 2, you can of course use:

```
CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM bad_table;
```

instead of:

```
CREATE TEMPORARY TABLE
bad_temp(id INT, name VARCHAR(20))
TYPE=HEAP;

INSERT INTO bad_temp(id, name) SELECT DISTINCT id, name FROM bad_table;
```

which saves you doing it in two steps.

Thanks to P-A Fredriksson. Hopefully the changes will reflect on the site soon.
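To see the whole round trip end-to-end, here is a minimal sketch using Python's standard-library `sqlite3` module (SQLite stands in for MySQL so the demo is self-contained; the table and sample rows are made up for illustration, and `TYPE=HEAP` has no SQLite equivalent):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# A table with fully duplicated rows and no unique key, as in the article.
cur.execute("CREATE TABLE bad_table (id INT, name VARCHAR(20))")
cur.executemany("INSERT INTO bad_table VALUES (?, ?)",
                [(1, "Things Fall Apart"), (1, "Things Fall Apart"),
                 (2, "The Famished Road")])

# One-step version: build the de-duplicated copy directly.
cur.execute("CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM bad_table")
# Swap the clean rows back into the original table.
cur.execute("DELETE FROM bad_table")
cur.execute("INSERT INTO bad_table SELECT * FROM bad_temp")

print(cur.execute("SELECT * FROM bad_table ORDER BY id").fetchall())
# [(1, 'Things Fall Apart'), (2, 'The Famished Road')]
```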

----------


## nicc777

You all probably read the article, especially:




> Be careful when using temporary tables in this way. With our tiny data set, there is not much of a problem, but huge tables could cause problems with memory. etc...


BUT creating a disk-based (non-HEAP) table for LARGE data sets just moves your memory problem from RAM to DISK SPACE. There is also additional disk I/O, which you might not want on an operational system.

There is another option, however: doing it programmatically. I didn't have time to write the whole Perl script, but here is the short version:

*STEP 1:* Determine which fields are duplicated. This can be done with a simple query:



```
mysql> select distinct name,count( name ) from bad_table2 group by 1 order by 2 desc;
+-------------------+---------------+
| name              | count( name ) |
+-------------------+---------------+
| Things Fall Apart |             3 |
| The Famished Road |             2 |
| Thirteen cents    |             2 |
+-------------------+---------------+
3 rows in set (0.00 sec)
```

We only need to process entries with a count of MORE than 1.

*STEP 2:* Loop through the results and delete the duplicates. The loop will produce something similar to:



```
mysql> DELETE FROM bad_table2 WHERE name = 'Things Fall Apart' LIMIT 2;
Query OK, 2 rows affected (0.00 sec)

mysql> select distinct name,count( name ) from bad_table2 group by 1 order by 2 desc;
+-------------------+---------------+
| name              | count( name ) |
+-------------------+---------------+
| The Famished Road |             2 |
| Thirteen cents    |             2 |
| Things Fall Apart |             1 |
+-------------------+---------------+
3 rows in set (0.00 sec)

mysql> select * from bad_table2 where name = 'Things Fall Apart';
+----+-------------------+
| id | name              |
+----+-------------------+
|  4 | Things Fall Apart |
+----+-------------------+
1 row in set (0.01 sec)
```

*NOTES :*  

1) NO additional memory was required.

2) By adding _LOW_PRIORITY_ to the DELETE statements in the script, the impact on a LIVE DB will be less.

3) Only the "newest" record is retained. Some tinkering might be necessary if you want to retain only the "oldest" - for whatever reason.
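The two steps can be sketched in Python rather than Perl. This is an untested-against-MySQL illustration using the standard-library `sqlite3` module so it runs anywhere; note that stock SQLite lacks MySQL's `DELETE ... LIMIT n`, so the sketch deletes by id instead, which also makes note 3's "keep the newest" behaviour explicit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE bad_table2 (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO bad_table2(name) VALUES (?)",
                [("Things Fall Apart",)] * 3 +
                [("The Famished Road",)] * 2 +
                [("Thirteen cents",)] * 2)

# STEP 1: find names occurring more than once, plus their counts.
dups = cur.execute(
    "SELECT name, COUNT(name) AS qty FROM bad_table2 "
    "GROUP BY name HAVING qty > 1").fetchall()

# STEP 2: for each duplicated name, delete every row except the
# "newest" one (the highest auto-incremented id).
for name, qty in dups:
    cur.execute(
        "DELETE FROM bad_table2 WHERE name = ? "
        "AND id < (SELECT MAX(id) FROM bad_table2 WHERE name = ?)",
        (name, name))

print(cur.execute("SELECT COUNT(*) FROM bad_table2").fetchone()[0])  # 3
```

Against a live MySQL server you would issue `DELETE LOW_PRIORITY ... LIMIT qty-1` per name instead, as described above.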

Now, this method is tried and tested. One of my operational tables for IP accounting contains nearly 4 GB of data. Creating TEMP tables to delete duplicates doesn't cut it.

Of course, smaller and even non-operational environments are different stories altogether.

Cheers

----------


## Angus

Hi there,

Found your example while doing a search on Google, but I am not having much luck and need a small hint on what I am not doing right...

I have an image gallery that has several duplicate records (with a unique ID / primary key) so was using your "Duplicates with Unique Keys" example...

I want to be able to only copy across records where the "bigimage" value is unique.

The query I was trying (and I used HEAPS of different combinations) is:

```
INSERT INTO photos_unique(user,userid,cat,date,title,description,keywords,bigimage,width,height,filesize,views,medwidth,medheight,medsize,approved,lastpost,rating,watermarked,allowprint,extra1,extra2,extra3,extra4,extra5,extra6,disporder)
SELECT DISTINCT bigimage user,userid,cat,date,title,description,keywords,bigimage,width,height,filesize,views,medwidth,medheight,medsize,approved,lastpost,rating,watermarked,allowprint,extra1,extra2,extra3,extra4,extra5,extra6,disporder FROM photos
```

I've added commas, brackets, moved DISTINCT all over the place, but keep getting errors. The above query runs BUT returns all records... not what I wanted...

Is there something dumb I am doing, or an easier way?

Angus

----------


## nicc777

I assume you noticed that there is a syntax error in the SQL you submitted:

_SELECT DISTINCT bigimage user,userid_ should read: _SELECT DISTINCT bigimage,user,userid_ ( notice the comma between bigimage and user )

With that out of the way, I see you mention that the SELECT query returns all records. I would like you to do an experiment:



```
SELECT 
   DISTINCT bigimage, COUNT( bigimage ) AS qty 
FROM 
   photos 
GROUP BY 1 HAVING qty > 1
```

The above code is untested, but should work. The idea is to display only those rows where 'qty' is more than 1 (duplicates). If nothing is returned, then there are no duplicates.

Let us know what the results are.

Cheers

----------


## Angus

> I assume you noticed that there is a syntax error in the SQL you submitted..

Again - I tried it MANY MANY ways. With and without the comma...



```
SELECT 
   DISTINCT bigimage, COUNT( bigimage ) AS qty 
FROM 
   photos 
GROUP BY 1 HAVING qty > 1
```

About 20 results returned...

If I run `SELECT * FROM photos WHERE 1` I get 16987 results...

If I run `SELECT DISTINCT bigimage FROM photos` I get 15819 results...

----------


## nicc777

Ok, and what happens if you do:



```
CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM photos
```

How many entries in the 'bad_temp' table?

Cheers

----------


## Angus

Wasn't sure how to run what you suggested - i.e. I couldn't find the table bad_temp...

Ran 

```
CREATE TABLE bad_temp AS SELECT DISTINCT * FROM photos
```

and got the 16,978 number... this would be correct though, wouldn't it, as the table has an id column which is unique for each row (an auto-incrementing value).

Appreciate the attempts at helping so far though. :)

----------


## nicc777

That was the output I expected. Now, the longer way...



```
CREATE TABLE bad_temp AS SELECT 
  DISTINCT     
   bigimage,user,userid,cat,date,
   title,description,keywords,width,
   height,filesize,views,medwidth,
   medheight,medsize,approved,
   lastpost,rating,watermarked,
   allowprint,extra1,extra2,extra3,
   extra4,extra5,extra6,disporder 
FROM 
   photos
```

You might need to drop the previously created temp table first.

Cheers

----------


## Angus

16978 records still....

----------


## greenman

Your problem seems to be that:

```
SELECT
  DISTINCT
   bigimage,user,userid,cat,date,
   title,description,keywords,width,
   height,filesize,views,medwidth,
   medheight,medsize,approved,
   lastpost,rating,watermarked,
   allowprint,extra1,extra2,extra3,
   extra4,extra5,extra6,disporder
FROM
   photos
```

does not return the 15819 results you expected. This can be due to:
1) A 'bug' in MySQL
2) That combination of columns actually containing 16978 unique combinations.

You obviously have 15819 distinct bigimages, but that does not necessarily mean the entire row is unique. For example:
image1,3
image1,4
Only one distinct 'image1', but 2 distinct rows. So double-check your data before concluding it's a 'bug'.
If you're sure it's MySQL misbehaving, try creating a (non-unique) index on some fields. Perhaps MySQL is not returning the correct data because its table metadata is not correct. You don't give the full table structure or say whether there are other indexes, but try ANALYZE/OPTIMIZE TABLE on them to make sure they're in shape.
If you're still having trouble, post your full table structure, and an example of a duplicate returned by your DISTINCT statement.
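The image1 point above can be demonstrated in a few lines of Python with the standard-library `sqlite3` module; `photos_demo` and its two columns are made-up stand-ins for the real `photos` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE photos_demo (bigimage TEXT, cat INT)")
# The same image appears in two categories: one distinct bigimage,
# but two distinct (bigimage, cat) rows.
cur.executemany("INSERT INTO photos_demo VALUES (?, ?)",
                [("image1", 3), ("image1", 4), ("image2", 3)])

distinct_images = cur.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT bigimage FROM photos_demo)"
).fetchone()[0]
distinct_rows = cur.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT bigimage, cat FROM photos_demo)"
).fetchone()[0]

print(distinct_images, distinct_rows)  # 2 3
```

So a DISTINCT over all the columns can legitimately return more rows than a DISTINCT on bigimage alone.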

----------


## nicc777

greenman's taking the words right out of my mouth - or is that keyboard :)

This is a very strange behaviour, and something that I have never seen before. 

All I can do is echo what greenman suggested.

Cheers

----------

