How to eliminate repeated lines in a Python function?


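In this article, we discuss how to remove repeated lines in Python. If a file is small, with only a few lines, the duplicates can be removed by hand. For large files, a short Python program does the job.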

Using file handling methods

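Python has built-in functions for creating, opening, and closing files, which makes file handling straightforward. While a file is open, Python also supports several operations on it, such as reading, writing, and appending data.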
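
As a rough illustration of the write, append, and read modes mentioned above, the sketch below works on a throwaway file. The file name demo.txt and the temporary directory are assumptions made for the demonstration and are not part of the original article.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # "w" creates the file (or truncates an existing one) for writing.
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("first line\n")

    # "a" appends to the end without discarding the existing content.
    with open(demo_path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    # "r" opens the file for reading only.
    with open(demo_path, "r", encoding="utf-8") as f:
        lines_read = f.readlines()

print(lines_read)  # ['first line\n', 'second line\n']
```

The with statement closes each file automatically, so no explicit close() call is needed.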

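To remove repeated lines from a text file or function in Python, we use these file handling methods. If the file is referenced by name only, Python looks for it in the current working directory, so either run the program from the directory that contains the file or pass a full path, as the examples in this article do.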

Algorithm

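The following steps eliminate repeated lines in a Python function:

Open the input file in read-only mode, since we only read its contents.
Open the output file in write mode, since the result is written to it.
Read the input file line by line and, for each line, check whether its hash value is already in a set of hash values seen so far.
If it is not, write the line to the output file and add its hash value to the set. Rather than storing whole lines, we store only the hash value of each line, which takes less space when the file is large and its lines are long.
If the hash value is already in the set, skip the line.
When all lines have been processed, the output file contains every distinct line of the input file exactly once, in order of first appearance.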
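
To make the space argument in the steps above concrete, the following sketch compares the size of an MD5 digest for a short line and for a very long one. The helper name line_digest and the 10,000-character line are assumptions chosen for illustration. Note that the saving only applies when lines are longer than the stored hash: a raw MD5 digest is 16 bytes, and the hexadecimal form used in the example in this article is 32 characters, so for short lines storing the hash saves nothing.

```python
import hashlib

short_line = "Skip the line.\n"
long_line = "x" * 10_000 + "\n"  # hypothetical very long line


def line_digest(line: str) -> bytes:
    """Return the raw MD5 digest of a line, ignoring trailing whitespace."""
    return hashlib.md5(line.rstrip().encode("utf-8")).digest()


short_digest = line_digest(short_line)
long_digest = line_digest(long_line)

# The digest has a fixed size no matter how long the line is.
print(len(short_digest), len(long_digest))  # 16 16
```

Two different lines could in principle share a hash value, in which case the second one would be dropped by mistake. With MD5 this is extremely unlikely for ordinary text, but it is a trade-off of the approach.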

Here, the input file "File.txt" contains the following data:

Welcome to TutorialsPoint.
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
eliminate repeated lines.
eliminate repeated lines.
Skip the line.

Example

The following example eliminates repeated lines by comparing the hash value of each line:

import hashlib

# Paths of the input and output files (raw strings keep the backslashes literal)
OutFile = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'

# Hash values of the lines already written to the output file
lines_present = set()

# Open the input file for reading and the output file for writing;
# the with statement closes both files even if an error occurs
with open(InFile, "r") as The_Input_File, open(OutFile, "w") as The_Output_File:
    for l in The_Input_File:
        # Strip trailing whitespace and the newline, then compute the MD5 hash of the line
        hash_value = hashlib.md5(l.rstrip().encode('utf-8')).hexdigest()
        # Write the line only the first time its hash value is seen
        if hash_value not in lines_present:
            The_Output_File.write(l)
            lines_present.add(hash_value)

Output

As the following output shows, all repeated lines of the input file have been eliminated, and the output file contains only unique lines:

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
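
One consequence of calling rstrip() before hashing is that lines differing only in trailing whitespace are treated as the same line. The sketch below illustrates this with a hypothetical helper, line_key, that computes the same key as the example above. The variant strings are made up for the demonstration.

```python
import hashlib


def line_key(line: str) -> str:
    # Same key as in the example: strip trailing whitespace, then MD5.
    return hashlib.md5(line.rstrip().encode("utf-8")).hexdigest()


variants = [
    "eliminate repeated lines.\n",     # ordinary line
    "eliminate repeated lines.   \n",  # trailing spaces
    "eliminate repeated lines.",       # last line of a file, no newline
]
keys = {line_key(v) for v in variants}
print(len(keys))  # 1: all three variants count as the same line

# Leading whitespace is not stripped, so this counts as a different line.
indented_is_duplicate = line_key("  eliminate repeated lines.\n") in keys
print(indented_is_duplicate)  # False
```

Whether trailing whitespace should be ignored depends on the data. If it is significant, hashing the line without rstrip() would keep such lines distinct.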

Example

The following is another example, which stores the lines themselves in the set instead of their hash values:

# Paths of the input and output files (raw strings keep the backslashes literal)
# Open the output file in write mode
OutFile = open(r'C:\Users\Lenovo\Downloads\Work TP\pre.txt', "w")
# Open the input file in read mode
InFile = open(r'C:\Users\Lenovo\Downloads\Work TP\File.txt', "r")
# Lines that have already been seen
lines_present = set()
# Iterate over every line in the input file
for l in InFile:
    # Check whether the line has been seen before
    if l not in lines_present:
        # Write the unique line to the output file
        OutFile.write(l)
        # Remember the line so that later copies are skipped
        lines_present.add(l)
# Close both files
OutFile.close()
InFile.close()

Output

As the following output shows, all repeated lines of the input file have been eliminated, and the output file contains only unique lines:

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
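
The examples above depend on files at fixed Windows paths. As a minimal sketch that can be run anywhere, the second approach can be wrapped in a function and tried on temporary files. The function name remove_duplicate_lines, its return value, and the use of a temporary directory are assumptions added here and are not part of the original examples.

```python
import os
import tempfile


def remove_duplicate_lines(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each line.

    Returns the number of lines skipped as duplicates.
    """
    seen = set()
    skipped = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line in seen:
                skipped += 1
                continue
            dst.write(line)
            seen.add(line)
    return skipped


# Sample data copied from the input file shown earlier in the article.
sample = (
    "Welcome to TutorialsPoint.\n"
    "Welcome to TutorialsPoint.\n"
    "Python programming language in this file.\n"
    "eliminate repeated lines.\n"
    "eliminate repeated lines.\n"
    "eliminate repeated lines.\n"
    "Skip the line.\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "File.txt")
    dst_path = os.path.join(tmp_dir, "pre.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write(sample)
    skipped = remove_duplicate_lines(src_path, dst_path)
    with open(dst_path, "r", encoding="utf-8") as f:
        result = f.read()

print(skipped)  # 3
print(result)
```

Because whole lines are compared, a final line without a trailing newline would not match an earlier copy that ends with one. Stripping the newline before the comparison would be one way to handle that case.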
