MySQL - 删除重复记录
MySQL 删除重复记录
在数据库中(包括 MySQL),重复记录非常常见。MySQL 数据库以由行和列组成的表的形式存储数据。现在,当数据库表中两行或多行具有相同的值时,该记录即为重复记录。
这种冗余可能由于多种原因而发生 -
- 该行可能被插入两次。
- 从外部源导入原始数据时。
- 数据库应用程序中可能存在错误。
无论原因是什么,删除此类冗余记录对于提高数据准确性、减少错误或提高数据库性能效率都至关重要。
查找重复值
在删除重复记录之前,我们必须确定它们是否存在于表中。可以使用以下方法实现 -
GROUP BY 子句
COUNT() 方法
示例
首先,我们创建一个名为"CUSTOMERS"的表,其中包含重复值 -
CREATE TABLE CUSTOMERS( ID int, NAME varchar(100) );
使用以下 INSERT 查询,将几条记录插入"CUSTOMERS"表。此处,我们将"John"作为重复记录添加了 3 次 -
INSERT INTO CUSTOMERS VALUES (1,'John'), (2,'Johnson'), (3,'John'), (4,'John');
获得的 CUSTOMERS 表如下 -
id | name |
---|---|
1 | John |
2 | Johnson |
3 | John |
4 | John |
现在,我们使用 COUNT() 方法和 GROUP BY 子句检索表中重复的记录,如下查询所示 -
SELECT NAME, COUNT(NAME) FROM CUSTOMERS GROUP BY NAME HAVING COUNT(NAME) > 1;
输出
以下是获得的输出 -
NAME | COUNT(NAME) |
---|---|
John | 3 |
删除重复记录
要从数据库表中删除重复记录,我们可以使用 DELETE 命令。但是,此 DELETE 命令有两种删除重复记录的方法 -
使用 DELETE...JOIN
使用 ROW_NUMBER() 函数
使用 DELETE...JOIN
要使用 DELETE...JOIN 命令从表中删除重复记录,我们需要对其自身进行内连接。这适用于不完全相同的情况。
例如,假设客户记录中有重复的客户详细信息,但序列号不断递增。这里,即使 ID 不同,记录也会重复。
示例
在以下查询中,我们使用之前创建的 CUSTOMERS 表,通过 DELETE...JOIN 命令删除重复记录 -
DELETE t1 FROM CUSTOMERS t1 INNER JOIN CUSTOMERS t2 WHERE t1.id < t2.id AND t1.name = t2.name;
输出
以下是获得的输出 -
Query OK, 2 rows affected (0.01 sec)
验证
我们可以使用以下 SELECT 语句验证重复记录是否已被删除 -
SELECT * FROM CUSTOMERS;
我们可以在得到的表中看到,该查询删除了重复项,并在表中保留了不同的记录 -
ID | NAME |
---|---|
2 | Johnson |
4 | John |
使用 ROW_NUMBER() 函数
MySQL 中的 ROW_NUMBER() 函数用于为查询结果集中的每一行分配一个从 1 开始的序列号。
使用此函数,MySQL 允许您检测重复行,并使用 DELETE 语句删除这些重复行。
示例
在这里,我们将 ROW_NUMBER() 函数应用于 CUSTOMERS 表,该表的"NAME"列中有重复值。我们将使用以下查询根据"NAME"列在分区内分配行号 -
SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num FROM CUSTOMERS;
以下是获得的输出 -
id | row_num |
---|---|
1 | 1 |
3 | 2 |
4 | 3 |
2 | 1 |
现在,使用以下语句删除重复行(行号大于 1 的行)-
DELETE FROM CUSTOMERS WHERE id IN( SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num FROM CUSTOMERS) AS temp_table WHERE row_num>1 );
我们得到如下所示的输出 -
Query OK, 2 rows affected (0.00 sec)
要验证重复记录是否已被删除,请使用以下 SELECT 查询 -
SELECT * FROM CUSTOMERS;
生成的结果如下 -
ID | NAME |
---|---|
1 | John |
2 | Johnson |
使用客户端程序删除重复记录
我们也可以使用客户端程序删除重复记录。
语法
要通过 PHP 程序删除重复记录,我们需要使用 mysqli 函数 query() 执行包含"DELETE"命令的内连接,如下所示 -
$sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; $mysqli->query($sql);
要通过 JavaScript 程序删除重复记录,我们需要使用 mysql2 库的 query() 函数,通过"DELETE"命令执行内连接,如下所示 -
sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; con.query(sql)
要通过 Java 程序删除重复记录,我们需要使用 JDBC 函数 execute() 执行包含"DELETE"命令的内连接,如下所示 -
String sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; statement.execute(sql);
要通过 Python 程序删除重复记录,我们需要使用 MySQL Connector/Python 的 execute() 函数,通过"DELETE"命令执行内连接,如下所示 -
delete_query = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name" cursorObj.execute(delete_query)
示例
以下是程序 -
$dbhost = 'localhost'; $dbuser = 'root'; $dbpass = 'password'; $db = 'TUTORIALS'; $mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db); if ($mysqli->connect_errno) { printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); //让我们创建一个表 $sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100))"; if($mysqli->query($sql)){ printf("DuplicateDeleteDemo table created successfully...! "); } //现在让我们插入一些重复的记录; $sql = "INSERT INTO DuplicateDeleteDemo VALUES(1,'John')"; if($mysqli->query($sql)){ printf("First record inserted successfully...! "); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(2,'Johnson')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...! "); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(3,'John')"; if($mysqli->query($sql)){ printf("Third records inserted successfully...! "); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(4,'John')"; if($mysqli->query($sql)){ printf("Fourth record inserted successfully...! "); } //显示表记录 $sql = "SELECT * FROM DuplicateDeleteDemo"; if($result = $mysqli->query($sql)){ printf("Table records(before deleting): "); while($row = mysqli_fetch_array($result)){ printf("ID: %d, NAME %s", $row['ID'], $row['NAME']); printf(" "); } } //现在让我们计算重复记录 $sql = "SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1"; if($result = $mysqli->query($sql)){ printf("Duplicate records: "); while($row = mysqli_fetch_array($result)){ print_r($row); } } //删除重复的记录 $sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; if($mysqli->query($sql)){ printf("Duplicate records deleted successfully...! "); } $sql = "SELECT ID, NAME FROM DuplicateDeleteDemo"; if($result = $mysqli->query($sql)){ printf("Table records after deleting: "); while($row = mysqli_fetch_row($result)){ print_r($row); } } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();
输出
获得的输出如下所示 -
DuplicateDeleteDemo table created successfully...! First record inserted successfully...! Second record inserted successfully...! Third records inserted successfully...! Fourth record inserted successfully...! Table records(before deleting): ID: 1, NAME John ID: 2, NAME Johnson ID: 3, NAME John ID: 4, NAME John Duplicate records: Array ( [0] => John [NAME] => John [1] => 3 [COUNT(NAME)] => 3 ) Duplicate records deleted successfully...! Table records after deleting: Array ( [0] => 2 [1] => Johnson ) Array ( [0] => 4 [1] => John )
var mysql = require('mysql2'); var con = mysql.createConnection({ host: "localhost", user: "root", password: "Nr5a0204@123" }); // 连接到 MySQL con.connect(function (err) { if (err) throw err; console.log("Connected!"); console.log("--------------------------"); // 创建新数据库 sql = "Create Database TUTORIALS"; con.query(sql); sql = "USE TUTORIALS"; con.query(sql); sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100));" con.query(sql); sql = "INSERT INTO DuplicateDeleteDemo VALUES(1,'John'),(2,'Johnson'),(3,'John'),(4,'John');" con.query(sql); sql = "SELECT * FROM DuplicateDeleteDemo;" con.query(sql, function(err, result){ if (err) throw err console.log("**Records of DuplicateDeleteDemo Table:**"); console.log(result); console.log("--------------------------"); }); //获取表中重复的记录 sql = "SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1;" con.query(sql, function(err, result){ if (err) throw err console.log("**Records that are duplicated in the table:**"); console.log(result); console.log("--------------------------"); }); sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; con.query(sql); sql = "SELECT * FROM DuplicateDeleteDemo;" con.query(sql, function(err, result){ if (err) throw err console.log("**Records after deleting Duplicates:**"); console.log(result); }); });
输出
获得的输出如下所示 -
Connected! -------------------------- **Records of DuplicateDeleteDemo Table:** [ { ID: 1, NAME: 'John' }, { ID: 2, NAME: 'Johnson' }, { ID: 3, NAME: 'John' }, { ID: 4, NAME: 'John' } ] -------------------------- **Records that are duplicated in the table:** [ { NAME: 'John', 'COUNT(NAME)': 3 } ] -------------------------- **Records after deleting Duplicates:** [ { ID: 2, NAME: 'Johnson' }, { ID: 4, NAME: 'John' } ]
import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement; public class DeleteDuplicates { public static void main(String[] args) { String url = "jdbc:mysql://localhost:3306/TUTORIALS"; String user = "root"; String password = "password"; ResultSet rs; try { Class.forName("com.mysql.cj.jdbc.Driver"); Connection con = DriverManager.getConnection(url, user, password); Statement st = con.createStatement(); //System.out.println("Database connected successfully...!"); String sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100))"; st.execute(sql); System.out.println("Table DuplicateDeleteDemo created successfully...!"); //让我们在其中插入一些记录... String sql1 = "INSERT INTO DuplicateDeleteDemo VALUES (1,'John'), (2,'Johnson'), (3,'John'), (4,'John')"; st.execute(sql1); System.out.println("Records inserted successfully....!"); //打印表记录 String sql2 = "SELECT * FROM DuplicateDeleteDemo"; rs = st.executeQuery(sql2); System.out.println("Table records(before deleting the duplicate rcords): "); while(rs.next()) { String id = rs.getString("id"); String name = rs.getString("name"); System.out.println("Id: " + id + ", Name: " + name); } //使用删除连接删除重复记录 String sql3 = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; st.execute(sql3); System.out.println("Duplicate records deleted successfully....!"); String sql4 = "SELECT * FROM DuplicateDeleteDemo"; rs = st.executeQuery(sql4); System.out.println("Table records(after deleting the duplicate rcords): "); while(rs.next()) { String id = rs.getString("id"); String name = rs.getString("name"); System.out.println("Id: " + id + ", Name: " + name); } }catch(Exception e) { e.printStackTrace(); } } }
输出
获得的输出如下所示 -
Table DuplicateDeleteDemo created successfully...! Records inserted successfully....! Table records(before deleting the duplicate rcords): Id: 1, Name: John Id: 2, Name: Johnson Id: 3, Name: John Id: 4, Name: John Duplicate records deleted successfully....! Table records(after deleting the duplicate rcords): Id: 2, Name: Johnson Id: 4, Name: John
import mysql.connector # 建立连接 connection = mysql.connector.connect( host='localhost', user='root', password='password', database='tut' ) # 创建游标对象 cursorObj = connection.cursor() # 创建表"DuplicateDeleteDemo" create_table_query = '''CREATE TABLE DuplicateDeleteDemo(ID int, NAME varchar(100))''' cursorObj.execute(create_table_query) print("Table 'DuplicateDeleteDemo' is created successfully!") # 将记录插入"DuplicateDeleteDemo"表 sql = "INSERT INTO DuplicateDeleteDemo (ID, NAME) VALUES (%s, %s);" values = [(1, 'John'), (2, 'Johnson'), (3, 'John'), (4, 'John')] cursorObj.executemany(sql, values) print("Values inserted successfully") # 显示表 display_table = "SELECT * FROM DuplicateDeleteDemo;" cursorObj.execute(display_table) # 打印表"DuplicateDeleteDemo" results = cursorObj.fetchall() print(" DuplicateDeleteDemo Table:") for result in results: print(result) # 检索重复记录 duplicate_records_query = """ SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1; """ cursorObj.execute(duplicate_records_query) dup_rec = cursorObj.fetchall() print(" Duplicate records:") for record in dup_rec: print(record) # 删除重复记录 delete_query = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name" cursorObj.execute(delete_query) print("Duplicate records deleted successfully") # 确认 display_table_after_delete = "SELECT * FROM DuplicateDeleteDemo;" cursorObj.execute(display_table_after_delete) results_after_delete = cursorObj.fetchall() print(" DuplicateDeleteDemo Table (After Delete):") for result in results_after_delete: print(result) # 关闭游标和连接 cursorObj.close() connection.close()
输出
获得的输出如下所示 -
Table 'DuplicateDeleteDemo' is created successfully! Values inserted successfully DuplicateDeleteDemo Table: (1, 'John') (2, 'Johnson') (3, 'John') (4, 'John') Duplicate records: ('John', 3) Duplicate records deleted successfully DuplicateDeleteDemo Table (After Delete): (2, 'Johnson') (4, 'John')