Hello everyone! In this article, you will learn how to identify and handle duplicate data in MySQL. Often, we may come across databases that contain duplicate records due to errors in the system or repeated inserts. These duplicates can arise when unique constraints have not been set on the tables. Here we will show you how to find these duplicates and clean up your database effectively.
To find duplicate records in MySQL, you can use an SQL query that identifies rows sharing the same value in a specific column. Below is an SQL statement that will help you locate all the duplicate rows in a table.
Suppose you have a table called inscritos (registrants) that contains data about people, including a personal identification number (identity_number), name (name), and registration date (created_at). To find duplicate records based on the identification number, use the following query:
SELECT `identity_number`, `name`, `created_at` FROM `inscritos` WHERE `identity_number` IN ( SELECT `identity_number` FROM `inscritos` GROUP BY `identity_number` HAVING COUNT(`identity_number`) > 1 );
Query Explanation
1- Inner Subquery:
SELECT `identity_number` FROM `inscritos` GROUP BY `identity_number` HAVING COUNT(`identity_number`) > 1
This subquery groups records by identity_number and counts how many times each number appears. Then, the clause HAVING COUNT(identity_number) > 1 keeps only those numbers that appear more than once.
2- Main Query:
SELECT `identity_number`, `name`, `created_at` FROM `inscritos` WHERE `identity_number` IN (...)
The main query selects records from the inscritos table where the identity_number is in the list of duplicate numbers provided by the subquery.
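To see the query in action, here is a minimal sketch on a scratch table. The table name inscritos_demo, the column types, and the sample rows are all made up for illustration; only the three column names come from the example above.

```sql
-- Hypothetical scratch table; the column types are assumed, not taken from a real schema.
CREATE TABLE `inscritos_demo` (
  `id` INT AUTO_INCREMENT PRIMARY KEY,
  `identity_number` VARCHAR(20) NOT NULL,
  `name` VARCHAR(100) NOT NULL,
  `created_at` DATETIME NOT NULL
);

-- Made-up sample rows: 'A001' is inserted twice, the other numbers once.
INSERT INTO `inscritos_demo` (`identity_number`, `name`, `created_at`) VALUES
  ('A001', 'Ana',    '2024-01-10 09:00:00'),
  ('B002', 'Luis',   '2024-01-11 10:30:00'),
  ('A001', 'Ana M.', '2024-01-12 08:15:00'),
  ('C003', 'Eva',    '2024-01-13 14:45:00');

-- Same duplicate search as above, pointed at the scratch table.
SELECT `identity_number`, `name`, `created_at`
FROM `inscritos_demo`
WHERE `identity_number` IN (
  SELECT `identity_number`
  FROM `inscritos_demo`
  GROUP BY `identity_number`
  HAVING COUNT(`identity_number`) > 1
)
ORDER BY `created_at`;
```

With these rows, only the two entries that share A001 should come back; B002 and C003 appear once each and are left out.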
You can adjust this SQL query to your specific needs. For example, if you want to find duplicates based on different columns or include more details in the results, simply modify the fields and criteria of the query to suit your case.
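As a rough sketch of two such adjustments, the statements below first count how often each duplicated number occurs and then look for duplicates on a combination of columns. The alias occurrences and the choice of name plus identity_number as the combined key are assumptions for illustration.

```sql
-- Variation 1: list each duplicated identity_number with how many times it appears.
SELECT `identity_number`, COUNT(*) AS `occurrences`
FROM `inscritos`
GROUP BY `identity_number`
HAVING COUNT(*) > 1
ORDER BY `occurrences` DESC;

-- Variation 2: treat rows as duplicates only when both name and identity_number match.
SELECT `name`, `identity_number`, COUNT(*) AS `occurrences`
FROM `inscritos`
GROUP BY `name`, `identity_number`
HAVING COUNT(*) > 1;
```

The first form is handy for judging how large the problem is before deciding how to clean it up.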
How to Clean Duplicate Records
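Once you have identified the duplicate records, you can proceed to remove them. Here is an example query to remove duplicates, keeping only the first occurrence of each identity_number. It assumes the table has a unique numeric id column, such as an auto-increment primary key:
DELETE FROM `inscritos` WHERE `id` NOT IN ( SELECT `min_id` FROM ( SELECT MIN(`id`) AS `min_id` FROM `inscritos` GROUP BY `identity_number` ) AS `keep_rows` );
This query removes all rows that do not have the lowest id for each identity_number, leaving only one row for each value. The extra level of nesting (the derived table keep_rows) is needed because MySQL does not allow a DELETE to read from the table it is deleting from in a direct subquery; without it, the statement fails with error 1093.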
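Because a DELETE cannot be undone once it is committed, it may be worth previewing the affected rows first and, after the cleanup, adding a unique index so the duplicates cannot come back. The following is a sketch under the same assumption of a unique id column; the index name uq_inscritos_identity_number is hypothetical.

```sql
-- Preview: the rows the DELETE above would remove (every copy except the lowest id).
SELECT `id`, `identity_number`, `name`, `created_at`
FROM `inscritos`
WHERE `id` NOT IN (
  SELECT MIN(`id`)
  FROM `inscritos`
  GROUP BY `identity_number`
)
ORDER BY `identity_number`, `id`;

-- After the cleanup: reject future duplicates at the database level.
-- This statement fails if any duplicate identity_number values are still present.
ALTER TABLE `inscritos`
  ADD UNIQUE INDEX `uq_inscritos_identity_number` (`identity_number`);
```

Running the preview first lets you confirm that the rows about to disappear are really the ones you expect.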
I hope this article is useful to you and that you have learned something new or clarified something you already knew.