MySQL - 处理重复项
表或结果集数据库通常包含重复记录。虽然通常允许重复,但在某些情况下有必要避免重复。在这种情况下,识别并删除数据库表中的重复记录至关重要。
处理 MySQL 重复项的重要性
处理数据库中的重复项有多种必要性。其中一个主要原因是,组织数据库中存在重复项会导致逻辑错误。除此之外,我们还需要处理冗余数据,以防止以下后果:
- 重复数据会占用存储空间,降低数据库使用效率并增加存储成本。
- 处理重复记录会消耗额外的资源,从而增加数据库的总体维护成本。
- 数据库中的重复项会导致数据出现逻辑错误,影响存储信息的完整性和可靠性。
防止重复条目
您可以在表上使用相应字段的PRIMARY KEY或UNIQUE索引,以防止表中出现重复记录。
示例
下表不包含这样的索引或主键,因此它允许first_name和last_name.
CREATE TABLE CUSTOMERS ( first_name CHAR(20), last_name CHAR(20), sex CHAR(10) );
为防止此表中创建多个具有相同姓氏和名字值的记录,请在其定义中添加 PRIMARY KEY。执行此操作时,还需要将索引列声明为 NOT NULL,因为 PRIMARY KEY 不允许 NULL 值 -
CREATE TABLE CUSTOMERS ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name) );
使用 INSERT IGNORE 查询 -
如果表中存在唯一索引,则在尝试插入与索引列中现有记录重复的记录时通常会导致错误。
为了处理这种情况而不产生错误,可以使用"INSERT IGNORE"命令。如果记录不重复,MySQL 会照常插入。但是,如果记录重复,"IGNORE"关键字会指示 MySQL 丢弃该记录而不会产生错误。
提供的示例不会导致错误,并且还能确保不会插入重复的记录 -
INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES ('Jay', 'Thomas'), ('Jay', 'Thomas');
我们得到以下输出 -
Query OK, 1 row affected, 1 warning (0.01 sec) Records: 2 Duplicates: 1 Warnings: 1
使用 REPLACE 查询 -
与其使用 INSERT 命令,不如考虑使用 REPLACE 命令。处理新记录时,其插入方式与 INSERT 类似。但是,如果记录重复,新记录将替换旧记录。
REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES ( 'Ajay', 'Kumar'), ( 'Ajay', 'Kumar');
以下是上述代码的输出 -
Query OK, 2 rows affected (0.01 sec) Records: 2 Duplicates: 0 Warnings: 0
在 INSERT IGNORE 和 REPLACE 命令之间进行选择,取决于您希望实现的具体重复处理行为。INSERT IGNORE 命令会保留第一组重复记录并丢弃剩余记录。另一方面,REPLACE 命令会保留最后一组重复记录并删除任何更早的记录。
使用 UNIQUE 约束 −
另一种强制唯一性的方法是向表添加 UNIQUE 索引,而不是 PRIMARY KEY −
CREATE TABLE CUSTOMERS ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), UNIQUE (last_name, first_name) );
计数和识别重复项
您可以使用 COUNT 函数和 GROUP BY 子句根据特定列计数和识别重复记录。
示例
以下查询用于计数表中 first_name 和 last_name 的重复记录 -
SELECT COUNT(*) as repetitions, last_name, first_name FROM CUSTOMERS GROUP BY last_name, first_name HAVING repetitions > 1;
此查询将返回 CUSTOMERS 表中所有重复记录的列表。一般来说,要识别重复的值集,请遵循以下步骤。
确定哪些列可能包含重复值。
将这些列以及 COUNT(*) 添加到列选择列表中。
同时在 GROUP BY 子句中列出这些列。
应用 HAVING 子句,通过要求组计数大于 1 来过滤唯一值。
从查询结果中消除重复项
您可以将 DISTINCT 命令与 SELECT 语句结合使用,以找出表中可用的唯一记录。
SELECT DISTINCT last_name, first_name FROM CUSTOMERS ORDER BY last_name;
DISTINCT 命令的另一种选择是添加 GROUP BY 子句,用于指定要选择的列。这种方法可以消除重复项,并仅检索指定列中值的唯一组合。
SELECT last_name, first_name FROM CUSTOMERS GROUP BY (last_name, first_name);
使用表替换删除重复项
如果表中有重复记录,并且您想要从该表中删除所有重复记录,请按照以下步骤操作 -
CREATE TABLE tmp AS SELECT DISTINCT last_name, first_name, sex FROM CUSTOMERS; DROP TABLE CUSTOMERS; ALTER TABLE tmp RENAME TO CUSTOMERS;
使用客户端程序处理重复项
我们也可以使用客户端程序处理重复项。
语法
要通过 PHP 程序处理重复值,我们需要使用 mysqli 函数 query() 执行"INSERT IGNORE"语句,如下所示 -
$sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; $mysqli->query($sql);
要通过 JavaScript 程序处理重复值,我们需要使用 mysql2 库的 query() 函数执行"INSERT IGNORE"语句,如下所示 -
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; con.query(sql);
要通过 Java 程序处理重复值,我们需要使用 JDBC 函数 execute() 执行"INSERT IGNORE"语句,如下所示 -
String sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; statement.execute(sql);
要通过 Python 程序处理重复值,我们需要使用 MySQL Connector/Python 的 execute() 函数执行"INSERT IGNORE"语句,如下所示 -
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')" cursorObj.execute(sql)
示例
以下是程序 -
$dbhost = 'localhost'; $dbuser = 'root'; $dbpass = 'password'; $db = 'TUTORIALS'; $mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db); if ($mysqli->connect_errno) { printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); $sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; if($mysqli->query($sql)){ printf("Table created successfully...! "); } //let's insert some records $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("First record inserted successfully...! "); } $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...! "); } //now lets insert duplicate record with IGNORE keyword $sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("Duplicate record inserted successfully using IGNORE keyword...! "); } $sql = "SELECT * from person_tbl"; if($result = $mysqli->query($sql)){ printf("Table records: "); while($row = mysqli_fetch_array($result)){ printf("First Name: %s, Last name: %s, Sex: %s", $row['first_name'], $row['last_name'], $row['sex']); printf(" "); } } //lets insert a duplicate record $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if(!$mysqli->query($sql)){ printf("You can't insert any duplicate records...! "); } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();
输出
获得的输出如下所示 -
Table created successfully...! First record inserted successfully...! Second record inserted successfully...! Duplicate record inserted successfully using IGNORE keyword...! Table records: First Name: Thomas, Last name: Jay, Sex: First Name: Smith, Last name: John, Sex: PHP Fatal error: Uncaught mysqli_sql_exception: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' in D: est\handlingduplicates.php:48
var mysql = require('mysql2'); var con = mysql.createConnection({ host:"localhost", user:"root", password:"password" }); //连接到 MySQL con.connect(function(err) { if (err) throw err; // console.log("Connected successfully...!"); // console.log("--------------------------"); sql = "USE TUTORIALS"; con.query(sql); //create table sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; con.query(sql, function(err, result){ if (err) throw err; console.log("Table created successfully....!"); }); //现在让我们插入一些记录 sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; con.query(sql, function(err, result){ if (err) throw err; console.log("First record inserted successfully...!"); }); sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; con.query(sql, function(err, result){ if (err) throw err; console.log("Second record inserted successfully...!"); }); //现在让我们使用 IGNORE 关键字插入重复记录 sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; con.query(sql, function(err, result){ console.log("Insert duplicate record with IGNORE keyword") if (err) throw err; console.log("Duplicate record inserted successfully with the help of IGNORE keyword"); }); //插入一条重复记录 sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; con.query(sql, function(err, result){ console.log("Insert duplicate record"); if (err) throw err; console.log("You can't insert the duplicate record because columns are primary key"); }); sql = "SELECT * FROM person_tbl"; con.query(sql, function(err, result){ console.log("Table records(with ID auto_increment sequence)."); if (err) throw err; console.log(result); }); });
输出
获得的输出如下所示 -
Table created successfully....! First record inserted successfully...! Second record inserted successfully...! Insert duplicate record with IGNORE keyword Duplicate record inserted successfully with the help of IGNORE keyword Insert duplicate record Insert duplicate record D: est1\duplicate.js:43 if (err) throw err; ^ Error: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement; public class HandleDuplicates { public static void main(String[] args) { String url = "jdbc:mysql://localhost:3306/TUTORIALS"; String user = "root"; String password = "password"; ResultSet rs; try { Class.forName("com.mysql.cj.jdbc.Driver"); Connection con = DriverManager.getConnection(url, user, password); Statement st = con.createStatement(); //System.out.println("Database connected successfully...!"); //create table String sql = "CREATE TABLE person_tbl ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; st.execute(sql); System.out.println("Table created successfully....!"); //let's insert some records String sql1 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; st.execute(sql1); System.out.println("Record inserted successfully...!"); String sql2 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; st.execute(sql2); System.out.println("Duplicate record inserted successfully...!"); //lets print the table records String sql3 = "SELECT * FROM PERSON_TBL"; rs = st.executeQuery(sql3); System.out.println("Table records: "); while(rs.next()) { String fname = rs.getString("first_name"); String lname = rs.getString("last_name"); System.out.println("First name: " + fname + ", Last name: " + lname); } }catch(Exception e) { e.printStackTrace(); } } }
输出
获得的输出如下所示 -
Table created successfully....! Record inserted successfully...! Duplicate record inserted successfully...! Table records: First name: Thomas, Last name: Jay
import mysql.connector # 连接到 MySQL con = mysql.connector.connect( host="localhost", user="root", password="password", database="tut" ) # 创建游标对象 cursorObj = con.cursor() # 创建表 create_table_query = """ CREATE TABLE person_tbl ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name) ) """ cursorObj.execute(create_table_query) print("Table 'person_tbl' is created successfully!") # 插入一些记录 first_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')" print("First record inserted successfully!") cursorObj.execute(first_record) second_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')" print("Second record inserted successfully!") cursorObj.execute(second_record) # 使用 IGNORE 关键字插入重复记录 sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')" print("Duplicate record inserted successfully with the help of IGNORE keyword") cursorObj.execute(sql) # 插入重复记录(这将引发错误) try: cursorObj.execute("INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')") print("Insert duplicate record") except mysql.connector.Error as err: print("Insert duplicate record error:", err) con.commit() # 检索记录 cursorObj.execute("SELECT * FROM person_tbl") records = cursorObj.fetchall() # 打印记录 print("Table records.") for record in records: print(record) # 关闭连接 cursorObj.close() con.close()
输出
获得的输出如下所示 -
First record inserted successfully! Second record inserted successfully! Duplicate record inserted successfully with the help of IGNORE keyword Insert duplicate record error: 1062 (23000): Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' Table records. ('Thomas', 'Jay', None) ('Smith', 'John', None)