How to Find and Remove Duplicate Rows in MySQL

Hello everyone! In this article, you will learn how to identify and handle duplicate data in MySQL. Often, we may come across databases that contain duplicate records due to errors in the system or repeated inserts. These duplicates can arise when unique constraints have not been set on the tables. Here we will show you how to find these duplicates and clean up your database effectively.

Finding Duplicate Data in MySQL

To find duplicate records in MySQL, you can use an SQL query that identifies rows that have repeated values in a specific column. Below is an SQL statement that lists every row whose value in that column appears more than once in the table.

Suppose you have a table called inscritos (registrants) that contains data about people, including a personal identification number (identity_number), name (name), and registration date (created_at). To find duplicate records based on the identification number, use the following query:

SELECT `identity_number`, `name`, `created_at`
FROM `inscritos`
WHERE `identity_number` IN (
    SELECT `identity_number`
    FROM `inscritos`
    GROUP BY `identity_number`
    HAVING COUNT(`identity_number`) > 1
);
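To try the query above on a scratch database, a table along these lines would be enough. This is only a sketch: the column types, the auto-increment `id` primary key, and the sample rows are assumptions made for illustration, not the real schema.

```sql
-- Hypothetical schema: column types and sizes are assumptions.
CREATE TABLE `inscritos` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `identity_number` VARCHAR(20) NOT NULL,
  `name` VARCHAR(100) NOT NULL,
  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
);

-- Made-up sample rows: 1001 and 1002 appear twice, 1003 appears once.
INSERT INTO `inscritos` (`identity_number`, `name`, `created_at`) VALUES
  ('1001', 'Ana',   '2024-01-05 10:00:00'),
  ('1002', 'Luis',  '2024-01-06 11:30:00'),
  ('1001', 'Ana',   '2024-01-07 09:15:00'),
  ('1003', 'Marta', '2024-01-08 14:45:00'),
  ('1002', 'Luis',  '2024-01-09 16:20:00');
```

With these rows, the duplicate query returns four rows (both entries for 1001 and both for 1002) and leaves out 1003.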

Query Explanation

1- Inner Subquery:

SELECT `identity_number`
FROM `inscritos`
GROUP BY `identity_number`
HAVING COUNT(`identity_number`) > 1

This subquery groups records by identity_number and counts how many times each number appears. Then, the clause HAVING COUNT(identity_number) > 1 keeps only the numbers that appear more than once and discards the rest.

2- Main Query:

SELECT `identity_number`, `name`, `created_at`
FROM `inscritos`
WHERE `identity_number` IN (...)

The main query selects records from the inscritos table where the identity_number is in the list of duplicate numbers provided by the subquery.
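If you only need an overview rather than every duplicated row, the grouping from the subquery can be run on its own. The sketch below reports how many times each repeated identity_number occurs; the aliases `occurrences`, `first_seen`, and `last_seen` are names chosen here for illustration.

```sql
-- One row per repeated identity_number, most repeated first.
SELECT `identity_number`,
       COUNT(*)          AS `occurrences`,
       MIN(`created_at`) AS `first_seen`,
       MAX(`created_at`) AS `last_seen`
FROM `inscritos`
GROUP BY `identity_number`
HAVING COUNT(*) > 1
ORDER BY `occurrences` DESC;
```

This is a quick way to gauge how much cleanup is needed before looking at individual rows.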

Customize the Query

You can adjust this SQL query to your specific needs. For example, if you want to find duplicates based on different columns or include more details in the results, simply modify the fields and criteria of the query to suit your case.
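As one possible customization, the sketch below treats two rows as duplicates only when both identity_number and name match. It joins the table to a grouped derived table instead of using IN, which is one common way to compare on several columns; the aliases `r` and `d` are arbitrary.

```sql
-- Rows whose (identity_number, name) pair appears more than once.
SELECT r.`id`, r.`identity_number`, r.`name`, r.`created_at`
FROM `inscritos` AS r
JOIN (
    SELECT `identity_number`, `name`
    FROM `inscritos`
    GROUP BY `identity_number`, `name`
    HAVING COUNT(*) > 1
) AS d
  ON d.`identity_number` = r.`identity_number`
 AND d.`name` = r.`name`
ORDER BY r.`identity_number`, r.`id`;
```

Adding or removing columns in the GROUP BY and in the join condition changes what counts as a duplicate.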

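How to Clean Duplicate Records

Once you have identified the duplicate records, you can proceed to remove them. The example below relies on the table having a unique id column and keeps only the first occurrence (the lowest id) of each identity_number. MySQL does not let a DELETE read directly from the table it is deleting from in a subquery (it fails with error 1093), so the list of ids to keep is wrapped in a derived table:

DELETE FROM `inscritos`
WHERE `id` NOT IN (
    SELECT `keep_id`
    FROM (
        -- lowest id for each identity_number: the row to keep
        SELECT MIN(`id`) AS `keep_id`
        FROM `inscritos`
        GROUP BY `identity_number`
    ) AS `keepers`
);

This query removes all rows that do not have the lowest id for their identity_number, leaving only one row for each value.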
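Two optional steps often go with the DELETE above, sketched here under the same assumptions (a unique `id` column and duplicates defined by identity_number alone). The first previews exactly the rows that would be removed; the second adds a unique index so the same duplicates cannot be inserted again. The index name `uq_inscritos_identity_number` is a hypothetical choice.

```sql
-- 1. Preview: the rows that are NOT the lowest id for their identity_number.
--    Run this before the DELETE and check the result.
SELECT *
FROM `inscritos`
WHERE `id` NOT IN (
    SELECT MIN(`id`)
    FROM `inscritos`
    GROUP BY `identity_number`
);

-- 2. After the cleanup: reject future duplicates at the database level.
--    This statement fails if any duplicates are still present.
ALTER TABLE `inscritos`
  ADD UNIQUE KEY `uq_inscritos_identity_number` (`identity_number`);
```

The unique index addresses the cause mentioned at the start of the article: duplicates appear when no unique constraint has been set on the table.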

I hope this article was useful and that you learned something new or cleared up a few doubts.