MySQL - Find Duplicate Records
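Duplicate records in a table reduce the efficiency of a MySQL database: they increase execution time, take up unnecessary space, and so on. Finding duplicates is therefore an important part of using a database efficiently.
We can also prevent users from entering duplicate values into a table by adding constraints, such as PRIMARY KEY and UNIQUE, to the target columns.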
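To see how such a constraint blocks a duplicate, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made only so the example runs without a server). The UNIQUE constraint on NAME is hypothetical and not part of the CUSTOMERS table used later. In MySQL the equivalent second INSERT fails with error 1062 (ER_DUP_ENTRY, "Duplicate entry").

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical schema: NAME carries a UNIQUE constraint.
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ramesh')")

try:
    # Same NAME as the existing row, so the UNIQUE constraint rejects it.
    conn.execute("INSERT INTO CUSTOMERS VALUES (2, 'Ramesh')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```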
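However, duplicates can still end up in the database for various reasons, such as human error, application bugs, or data imported from external sources. In that case, there are several ways to find them. Using the SQL GROUP BY and HAVING clauses is one of the most common ways to filter out the records that contain duplicates.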
Finding Duplicate Records
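Before searching a table for duplicate records, we need to define what makes two records duplicates, that is, which columns must contain matching values. This can be done in two steps -
First, use the GROUP BY clause to group all rows by the column(s) you want to check for duplicates.
Then, use the HAVING clause with the COUNT() function to check whether any of the groups formed above contains more than one row.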
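The two steps can be mirrored outside the database with a short Python sketch. It assumes the rows are held as dictionaries, and `find_duplicates` is a hypothetical helper, not part of MySQL or any library: building the key corresponds to GROUP BY, and keeping counts above 1 corresponds to HAVING COUNT(*) > 1.

```python
from collections import Counter


def find_duplicates(rows, key_columns):
    """Return {key: count} for every key that occurs more than once.

    Step 1 (like GROUP BY): build a grouping key from key_columns.
    Step 2 (like HAVING COUNT(*) > 1): keep only groups with several rows.
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# A few rows shaped like the CUSTOMERS table used below.
customers = [
    {"ID": 1, "NAME": "Ramesh", "SALARY": 2000.00},
    {"ID": 2, "NAME": "Khilan", "SALARY": 1500.00},
    {"ID": 3, "NAME": "Kaushik", "SALARY": 2000.00},
]
print(find_duplicates(customers, ["SALARY"]))  # {(2000.0,): 2}
```

Passing several column names in `key_columns` corresponds to grouping by several columns at once.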
Example
First, let us create a table named CUSTOMERS using the following query -
CREATE TABLE CUSTOMERS ( ID INT NOT NULL, NAME VARCHAR (20) NOT NULL, AGE INT NOT NULL, ADDRESS CHAR (25), SALARY DECIMAL (18, 2), PRIMARY KEY (ID) );
Now, let us insert some records into the table created above using the INSERT statement. Two of the customers, Ramesh and Kaushik, have the same SALARY and AGE values -
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh', 23, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00), (3, 'Kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00), (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'Hyderabad', 4500.00), (7, 'Muffy', 24, 'Indore', 10000.00);
The table is created as follows -
ID | NAME | AGE | ADDRESS | SALARY |
---|---|---|---|---|
1 | Ramesh | 23 | Ahmedabad | 2000.00 |
2 | Khilan | 25 | Delhi | 1500.00 |
3 | Kaushik | 23 | Kota | 2000.00 |
4 | Chaitali | 25 | Mumbai | 6500.00 |
5 | Hardik | 27 | Bhopal | 8500.00 |
6 | Komal | 22 | Hyderabad | 4500.00 |
7 | Muffy | 24 | Indore | 10000.00 |
In the following query, we use the MySQL COUNT() function to return how many times each SALARY value occurs in the table -
SELECT SALARY, COUNT(SALARY) AS "COUNT" FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY;
Output
The output of the above query is as follows -
SALARY | COUNT |
---|---|
1500.00 | 1 |
2000.00 | 2 |
4500.00 | 1 |
6500.00 | 1 |
8500.00 | 1 |
10000.00 | 1 |
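One detail about the query above: COUNT(SALARY) counts only non-NULL values, while GROUP BY still places all NULL salaries into a single group. Since SALARY is nullable in the CUSTOMERS table, such a group would report COUNT(SALARY) = 0 and be missed by a HAVING COUNT(SALARY) > 1 filter, whereas COUNT(*) counts every row. The sketch below shows this with Python's `sqlite3` module standing in for MySQL (both follow standard SQL here); the two customers without a salary are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption);
# the table is reduced to the columns needed for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, SALARY REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [
        (1, "Ramesh", 2000.00),
        (3, "Kaushik", 2000.00),
        (8, "Hypothetical A", None),  # hypothetical rows with no salary
        (9, "Hypothetical B", None),
    ],
)

# COUNT(SALARY) skips NULLs; COUNT(*) counts every row in the group.
rows = conn.execute(
    "SELECT SALARY, COUNT(SALARY), COUNT(*) FROM CUSTOMERS "
    "GROUP BY SALARY ORDER BY SALARY"
).fetchall()
for row in rows:
    print(row)
# (None, 0, 2)
# (2000.0, 2, 2)
```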
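Using the HAVING Clause
The HAVING clause in MySQL filters the groups produced by GROUP BY according to a condition, whereas WHERE filters individual rows before they are grouped. Here, we use the HAVING clause together with the COUNT() function to find duplicate values in one or more columns of a table.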
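Duplicate Values in a Single Column
The following are the steps to find duplicate values in a single column of a table:
Step 1: First, use the GROUP BY clause to group all rows by the column you want to check for duplicates.
Step 2: Then, to find the duplicated groups, use the COUNT() function in the HAVING clause to check whether any group contains more than one row.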
Example
Using the following query, we can find all SALARY values that occur more than once in the CUSTOMERS table -
SELECT SALARY, COUNT(SALARY) FROM CUSTOMERS GROUP BY SALARY HAVING COUNT(SALARY) > 1;
Output
The output is as follows -
SALARY | COUNT(SALARY) |
---|---|
2000.00 | 2 |
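The HAVING query returns only the duplicated values, not the rows that hold them. A common way to list the complete rows is to use the HAVING query as an IN (...) subquery. The sketch below runs both queries against an in-memory SQLite database through Python's `sqlite3` module, standing in for MySQL; the CUSTOMERS table is reduced to ID, NAME and SALARY for brevity (an assumption).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, SALARY REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [
        (1, "Ramesh", 2000.00),
        (2, "Khilan", 1500.00),
        (3, "Kaushik", 2000.00),
        (4, "Chaitali", 6500.00),
        (5, "Hardik", 8500.00),
        (6, "Komal", 4500.00),
        (7, "Muffy", 10000.00),
    ],
)

# Steps 1 and 2: group by SALARY, keep only groups with more than one row.
duplicate_values = conn.execute(
    "SELECT SALARY, COUNT(SALARY) FROM CUSTOMERS "
    "GROUP BY SALARY HAVING COUNT(SALARY) > 1"
).fetchall()

# Feed the HAVING query into IN (...) to list the complete duplicate rows.
duplicate_rows = conn.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY IN ("
    " SELECT SALARY FROM CUSTOMERS GROUP BY SALARY HAVING COUNT(*) > 1"
    ") ORDER BY ID"
).fetchall()

print(duplicate_values)  # [(2000.0, 2)]
print(duplicate_rows)    # [(1, 'Ramesh', 2000.0), (3, 'Kaushik', 2000.0)]
```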
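Duplicate Values in Multiple Columns
To find duplicates across several columns, list all of those columns in the GROUP BY clause. Each group then corresponds to one combination of values, so a row counts as a duplicate only when the whole combination is repeated. Conditions on the groups can be combined in the HAVING clause with the AND operator.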
Example
In the following query, we find the combinations of SALARY and AGE that occur more than once in the CUSTOMERS table -
SELECT SALARY, COUNT(SALARY), AGE, COUNT(AGE) FROM CUSTOMERS GROUP BY SALARY, AGE HAVING COUNT(SALARY) > 1 AND COUNT(AGE) > 1;
Output
The output is as follows -
SALARY | COUNT(SALARY) | AGE | COUNT(AGE) |
---|---|---|---|
2000.00 | 2 | 23 | 2 |
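To try the multi-column query locally, here is a sketch that runs it on an in-memory SQLite database via Python's `sqlite3` module, used only as a stand-in for MySQL (an assumption; the GROUP BY and HAVING syntax is the same in both). The rows follow the output above, where Ramesh and Kaushik both have SALARY 2000.00 and AGE 23. Khilan and Chaitali share AGE 25 but not SALARY, so they are not reported. Because no column here contains NULLs, HAVING COUNT(*) > 1 gives the same groups as the AND condition.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (1, "Ramesh", 23, 2000.00),  # same SALARY and AGE as Kaushik
        (2, "Khilan", 25, 1500.00),
        (3, "Kaushik", 23, 2000.00),
        (4, "Chaitali", 25, 6500.00),  # same AGE as Khilan, different SALARY
        (5, "Hardik", 27, 8500.00),
        (6, "Komal", 22, 4500.00),
        (7, "Muffy", 24, 10000.00),
    ],
)

# The query from the text: both counts must exceed 1.
with_and = conn.execute(
    "SELECT SALARY, COUNT(SALARY), AGE, COUNT(AGE) FROM CUSTOMERS "
    "GROUP BY SALARY, AGE HAVING COUNT(SALARY) > 1 AND COUNT(AGE) > 1"
).fetchall()

# Same groups: with no NULLs, every count in a group equals COUNT(*).
with_count_star = conn.execute(
    "SELECT SALARY, AGE, COUNT(*) FROM CUSTOMERS "
    "GROUP BY SALARY, AGE HAVING COUNT(*) > 1"
).fetchall()

print(with_and)         # [(2000.0, 2, 23, 2)]
print(with_count_star)  # [(2000.0, 23, 2)]
```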
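Using the ROW_NUMBER() Function with PARTITION BY
In MySQL 8.0 and later, the ROW_NUMBER() window function combined with the PARTITION BY clause can also be used to find duplicate records. PARTITION BY divides the rows of the table into partitions based on one or more columns, and ROW_NUMBER() then numbers the rows within each partition, starting again at 1 in every partition. Rows with the same values in the partitioning columns fall into the same partition, so any row whose number is greater than 1 is a duplicate of an earlier row in its partition.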
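Example
In the following query, we assign a row number to each row within the partitions formed by rows that share the same SALARY and AGE values. Ordering each partition by ID makes the numbering deterministic -
SELECT *, ROW_NUMBER() OVER ( PARTITION BY SALARY, AGE ORDER BY ID ) AS row_numbers FROM CUSTOMERS;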
Output
The output of the above query is as follows -
ID | NAME | AGE | ADDRESS | SALARY | row_numbers |
---|---|---|---|---|---|
2 | Khilan | 25 | Delhi | 1500.00 | 1 |
1 | Ramesh | 23 | Ahmedabad | 2000.00 | 1 |
3 | Kaushik | 23 | Kota | 2000.00 | 2 |
4 | Chaitali | 25 | Mumbai | 6500.00 | 1 |
5 | Hardik | 27 | Bhopal | 8500.00 | 1 |
6 | Komal | 22 | Hyderabad | 4500.00 | 1 |
7 | Muffy | 24 | Indore | 10000.00 | 1 |
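To keep only the duplicates, the usual MySQL 8.0 pattern is to wrap the query above in a derived table and select the rows WHERE row_numbers > 1, because a window function cannot be referenced in the WHERE clause of the query that computes it. The Python sketch below only emulates what ROW_NUMBER() OVER (PARTITION BY SALARY, AGE ORDER BY ID) computes, so the numbering and filtering can be followed step by step. `row_numbers` is a hypothetical helper, and the rows assume Ramesh has AGE 23, as in the output above.

```python
from itertools import groupby


def row_numbers(rows, partition_cols, order_col):
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_cols ORDER BY order_col).

    Returns (row, number) pairs; numbering restarts at 1 in each partition.
    """
    def partition_key(row):
        return tuple(row[col] for col in partition_cols)

    # Sort so each partition is contiguous and ordered by order_col inside it.
    ordered = sorted(rows, key=lambda row: (partition_key(row), row[order_col]))
    numbered = []
    for _, partition in groupby(ordered, key=partition_key):
        for number, row in enumerate(partition, start=1):
            numbered.append((row, number))
    return numbered


customers = [
    {"ID": 1, "NAME": "Ramesh", "AGE": 23, "SALARY": 2000.00},
    {"ID": 2, "NAME": "Khilan", "AGE": 25, "SALARY": 1500.00},
    {"ID": 3, "NAME": "Kaushik", "AGE": 23, "SALARY": 2000.00},
    {"ID": 4, "NAME": "Chaitali", "AGE": 25, "SALARY": 6500.00},
    {"ID": 5, "NAME": "Hardik", "AGE": 27, "SALARY": 8500.00},
    {"ID": 6, "NAME": "Komal", "AGE": 22, "SALARY": 4500.00},
    {"ID": 7, "NAME": "Muffy", "AGE": 24, "SALARY": 10000.00},
]

numbered = row_numbers(customers, ["SALARY", "AGE"], "ID")
# Rows numbered above 1 repeat an earlier row of their partition.
extra_copies = [row["ID"] for row, number in numbered if number > 1]
print(extra_copies)  # [3]
```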
Finding Duplicate Records Using a Client Program
We can also find duplicate records using a client program.
Syntax
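To find duplicate records in a PHP program, group the rows by the column(s) with the GROUP BY clause and count the occurrences with the COUNT() function. To do this, execute the SELECT statement with the mysqli function query() as follows -
$sql = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
$mysqli->query($sql);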
To find duplicate records in a JavaScript (Node.js) program, execute the same GROUP BY and COUNT() query with the query() function of the mysql2 library as follows -
sql = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
con.query(sql);
To find duplicate records in a Java program, execute the same GROUP BY and COUNT() query with the JDBC function executeQuery() as follows -
String sql = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
statement.executeQuery(sql);
To find duplicate records in a Python program, execute the same GROUP BY and COUNT() query with the execute() function of a MySQL Connector/Python cursor as follows -
duplicate_records_query = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY"
cursorObj.execute(duplicate_records_query)
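Example
The following programs create a Pets table containing some duplicate rows and count how many times each record occurs -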
PHP
<?php
$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$db = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);
if ($mysqli->connect_errno) {
   printf("Connect failed: %s\n", $mysqli->connect_error);
   exit();
}
//printf("Connected successfully.\n");

// Create a table without a primary key, so duplicate rows are allowed
$sql = "CREATE TABLE Pets (ID int, DOG_NAME varchar(30) not null, AGE int not null, OWNER_NAME varchar(30) not null)";
if ($mysqli->query($sql)) {
   printf("Pets table created successfully...!\n");
}
// Now insert some records, including one duplicate
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal')";
if ($mysqli->query($sql)) {
   printf("First record inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal')";
if ($mysqli->query($sql)) {
   printf("Second record inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(2, 'Harry', 2, 'Jack')";
if ($mysqli->query($sql)) {
   printf("Third record inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(3, 'Sheero', 1, 'Rose')";
if ($mysqli->query($sql)) {
   printf("Fourth record inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(4, 'Simba', 2, 'Rahul')";
if ($mysqli->query($sql)) {
   printf("Fifth record inserted successfully...!\n");
}
// Display the table records
$sql = "SELECT * FROM Pets";
if ($result = $mysqli->query($sql)) {
   printf("Table records:\n");
   while ($row = $result->fetch_assoc()) {
      printf("ID: %d, DOG_NAME: %s, AGE: %d, OWNER_NAME: %s\n", $row['ID'], $row['DOG_NAME'], $row['AGE'], $row['OWNER_NAME']);
   }
}
// Group all rows to count how many times each record occurs
// (every non-aggregated column must appear in GROUP BY under ONLY_FULL_GROUP_BY)
$sql = "SELECT ID, DOG_NAME, AGE, OWNER_NAME, COUNT(*) AS 'Count' FROM Pets GROUP BY ID, DOG_NAME, AGE, OWNER_NAME ORDER BY ID";
if ($result = $mysqli->query($sql)) {
   printf("Table duplicate records:\n");
   while ($row = $result->fetch_assoc()) {
      printf("ID: %d, DOG_NAME: %s, AGE: %d, OWNER_NAME: %s, Count: %d\n", $row['ID'], $row['DOG_NAME'], $row['AGE'], $row['OWNER_NAME'], $row['Count']);
   }
}
if ($mysqli->error) {
   printf("Error message: %s\n", $mysqli->error);
}
$mysqli->close();
Output
The output obtained is as follows -
Pets table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Third record inserted successfully...!
Fourth record inserted successfully...!
Fifth record inserted successfully...!
Table records:
ID: 1, DOG_NAME: Fluffy, AGE: 1, OWNER_NAME: Micheal
ID: 1, DOG_NAME: Fluffy, AGE: 1, OWNER_NAME: Micheal
ID: 2, DOG_NAME: Harry, AGE: 2, OWNER_NAME: Jack
ID: 3, DOG_NAME: Sheero, AGE: 1, OWNER_NAME: Rose
ID: 4, DOG_NAME: Simba, AGE: 2, OWNER_NAME: Rahul
Table duplicate records:
ID: 1, DOG_NAME: Fluffy, AGE: 1, OWNER_NAME: Micheal, Count: 2
ID: 2, DOG_NAME: Harry, AGE: 2, OWNER_NAME: Jack, Count: 1
ID: 3, DOG_NAME: Sheero, AGE: 1, OWNER_NAME: Rose, Count: 1
ID: 4, DOG_NAME: Simba, AGE: 2, OWNER_NAME: Rahul, Count: 1
Node.js
var mysql = require('mysql2');
var con = mysql.createConnection({
   host: "localhost",
   user: "root",
   password: "password"
});

// Connect to MySQL
con.connect(function (err) {
   if (err) throw err;
   console.log("Connected!");
   console.log("--------------------------");

   // Create a new database and switch to it
   let sql = "CREATE DATABASE TUTORIALS";
   con.query(sql);
   sql = "USE TUTORIALS";
   con.query(sql);

   // Create the Pets table (no primary key, so duplicate rows are allowed)
   sql = "CREATE TABLE Pets (ID int, DOG_NAME varchar(30) not null, AGE int not null, OWNER_NAME varchar(30) not null);";
   con.query(sql);
   sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1,'Fluffy', 1, 'Micheal'),(1,'Fluffy', 1, 'Micheal'),(2,'Harry', 2, 'Jack'),(3,'Sheero', 1, 'Rose'),(4,'Simba', 2, 'Rahul'),(3,'Sheero', 1, 'Rose'),(3,'Sheero', 1, 'Rose');";
   con.query(sql);

   sql = "SELECT * FROM Pets;";
   con.query(sql, function (err, result) {
      if (err) throw err;
      console.log("**Records in Pets Table**");
      console.log(result);
      console.log("--------------------------");
   });

   // Count how many times each (ID, DOG_NAME, OWNER_NAME) combination occurs
   sql = "SELECT ID, DOG_NAME, OWNER_NAME, COUNT(*) AS 'Count' FROM Pets GROUP BY ID, DOG_NAME, OWNER_NAME ORDER BY ID";
   con.query(sql, function (err, result) {
      if (err) throw err;
      console.log("**Count of duplicate records:**");
      console.log(result);
      con.end(); // close the connection so the script can exit
   });
});
Output
The output obtained is as follows -
Connected!
--------------------------
**Records in Pets Table**
[
  { ID: 1, DOG_NAME: 'Fluffy', AGE: 1, OWNER_NAME: 'Micheal' },
  { ID: 1, DOG_NAME: 'Fluffy', AGE: 1, OWNER_NAME: 'Micheal' },
  { ID: 2, DOG_NAME: 'Harry', AGE: 2, OWNER_NAME: 'Jack' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' },
  { ID: 4, DOG_NAME: 'Simba', AGE: 2, OWNER_NAME: 'Rahul' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' }
]
--------------------------
**Count of duplicate records:**
[
  { ID: 1, DOG_NAME: 'Fluffy', OWNER_NAME: 'Micheal', Count: 2 },
  { ID: 2, DOG_NAME: 'Harry', OWNER_NAME: 'Jack', Count: 1 },
  { ID: 3, DOG_NAME: 'Sheero', OWNER_NAME: 'Rose', Count: 3 },
  { ID: 4, DOG_NAME: 'Simba', OWNER_NAME: 'Rahul', Count: 1 }
]
Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FindDuplicates {
   public static void main(String[] args) {
      String url = "jdbc:mysql://localhost:3306/TUTORIALS";
      String user = "root";
      String password = "password";
      // try-with-resources closes the statement and the connection automatically
      try (Connection con = DriverManager.getConnection(url, user, password);
           Statement st = con.createStatement()) {
         //System.out.println("Database connected successfully...!");
         String sql = "CREATE TABLE Pets (ID int, DOG_NAME varchar(30) not null, AGE int not null, OWNER_NAME varchar(30) not null)";
         st.execute(sql);
         System.out.println("Table Pets created successfully...!");
         // Insert some records, including one duplicate
         String sql1 = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal'), (1, 'Fluffy', 1, 'Micheal'), (3, 'Sheero', 1, 'Rose'), (4, 'Simba', 2, 'Rahul')";
         st.execute(sql1);
         System.out.println("Records inserted successfully....!");
         String sql2 = "SELECT * FROM Pets";
         ResultSet rs = st.executeQuery(sql2);
         System.out.println("Table records:");
         while (rs.next()) {
            String id = rs.getString("ID");
            String dog_name = rs.getString("DOG_NAME");
            String age = rs.getString("AGE");
            String owner_name = rs.getString("OWNER_NAME");
            System.out.println("Id: " + id + ", Dog_name: " + dog_name + ", Age: " + age + ", Owner_name: " + owner_name);
         }
         // Count how many times each record occurs (AGE must also be in GROUP BY)
         String sql3 = "SELECT ID, DOG_NAME, AGE, OWNER_NAME, COUNT(*) AS 'Count' FROM Pets GROUP BY ID, DOG_NAME, AGE, OWNER_NAME ORDER BY ID";
         rs = st.executeQuery(sql3);
         System.out.println("Table records (with duplicate counts):");
         while (rs.next()) {
            String id = rs.getString("ID");
            String dog_name = rs.getString("DOG_NAME");
            String age = rs.getString("AGE");
            String owner_name = rs.getString("OWNER_NAME");
            String t_count = rs.getString("Count");
            System.out.println("Id: " + id + ", Dog_name: " + dog_name + ", Age: " + age + ", Owner_name: " + owner_name + ", T_count: " + t_count);
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
Output
The output obtained is as follows -
Table Pets created successfully...!
Records inserted successfully....!
Table records:
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal
Id: 3, Dog_name: Sheero, Age: 1, Owner_name: Rose
Id: 4, Dog_name: Simba, Age: 2, Owner_name: Rahul
Table records (with duplicate counts):
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal, T_count: 2
Id: 3, Dog_name: Sheero, Age: 1, Owner_name: Rose, T_count: 1
Id: 4, Dog_name: Simba, Age: 2, Owner_name: Rahul, T_count: 1
Python
import mysql.connector

# Establish the connection
connection = mysql.connector.connect(
    host='localhost',
    user='root',
    password='password',
    database='TUTORIALS'
)
# Create a cursor object
cursorObj = connection.cursor()
# Create the table "Pets" (no primary key, so duplicate rows are allowed)
create_table_query = '''
CREATE TABLE Pets (
    ID int,
    DOG_NAME varchar(30) not null,
    AGE int not null,
    OWNER_NAME varchar(30) not null
);
'''
cursorObj.execute(create_table_query)
print("Table 'Pets' is created successfully!")
# Insert records into the "Pets" table
sql = "INSERT IGNORE INTO Pets (ID, DOG_NAME, AGE, OWNER_NAME) VALUES (%s, %s, %s, %s)"
values = [
    (1, 'Fluffy', 1, 'Micheal'),
    (1, 'Fluffy', 1, 'Micheal'),
    (2, 'Harry', 2, 'Jack'),
    (3, 'Sheero', 1, 'Rose'),
    (4, 'Simba', 2, 'Rahul'),
    (3, 'Sheero', 1, 'Rose'),
    (3, 'Sheero', 1, 'Rose')
]
cursorObj.executemany(sql, values)
connection.commit()  # Connector/Python does not autocommit by default
print("Values inserted successfully")
# Display the table
display_table = "SELECT * FROM Pets;"
cursorObj.execute(display_table)
# Print the "Pets" table
results = cursorObj.fetchall()
print("\nPets Table:")
for result in results:
    print(result)
# Return how many times each record occurs
duplicate_records_query = """
SELECT ID, DOG_NAME, OWNER_NAME, COUNT(*) AS Count
FROM Pets
GROUP BY ID, DOG_NAME, OWNER_NAME
ORDER BY ID;
"""
cursorObj.execute(duplicate_records_query)
dup_rec = cursorObj.fetchall()
print("\nDuplicate records:")
for record in dup_rec:
    print(record)
# Close the cursor and the connection
cursorObj.close()
connection.close()
Output
The output obtained is as follows -
Table 'Pets' is created successfully!
Values inserted successfully

Pets Table:
(1, 'Fluffy', 1, 'Micheal')
(1, 'Fluffy', 1, 'Micheal')
(2, 'Harry', 2, 'Jack')
(3, 'Sheero', 1, 'Rose')
(4, 'Simba', 2, 'Rahul')
(3, 'Sheero', 1, 'Rose')
(3, 'Sheero', 1, 'Rose')

Duplicate records:
(1, 'Fluffy', 'Micheal', 2)
(2, 'Harry', 'Jack', 1)
(3, 'Sheero', 'Rose', 3)
(4, 'Simba', 'Rahul', 1)