MySQL

How to Identify Duplicate Values of Row in a Table on MySQL

Suppose we have a table called fruits which stores many distinct fruit names.
mysql> select * from fruits;
+----+------------+
| id | name       |
+----+------------+
|  1 | apple      |
|  2 | banana     |
|  3 | cherry     |
|  4 | mango      |
|  5 | watermelon |
|  6 | banana     |
|  7 | kiwi       |
|  8 | pineapple  |
|  9 | orange     |
| 10 | guava      |
| 11 | mango      |
| 12 | banana     |
| 13 | apple      |
+----+------------+
13 rows in set (0.00 sec)

Theoretically, we can create a unique index on the column name, but we failed to create it due to duplicate values of fruit name.

Our goal is to identify the duplicate values in this table and remove them. So we play a trick to retrieve the duplicate values by grouping.
mysql> select *, count(*) count from fruits group by name having count > 1;
+----+--------+-------+
| id | name   | count |
+----+--------+-------+
|  1 | apple  |     2 |
|  2 | banana |     3 |
|  4 | mango  |     2 |
+----+--------+-------+
3 rows in set (0.00 sec)

Better, but the result only shows one row per duplicate value, we need to know all duplicate row id in order to remove them.

So, we evolve the above statement as a subquery in where clause to retrieve what we want.
mysql> select * from fruits where name in (select name from fruits group by name having count(*) > 1) order by name, id;
+----+--------+
| id | name   |
+----+--------+
|  1 | apple  |
| 13 | apple  |
|  2 | banana |
|  6 | banana |
| 12 | banana |
|  4 | mango  |
| 11 | mango  |
+----+--------+
7 rows in set (0.00 sec)

Great! Now we know apple is duplicated by row #13, banana is duplicated by row #6 and #12, and mango is duplicated by row #11.

Leave a Reply

Your email address will not be published. Required fields are marked *