数据结构和算法

DSA - 主页 DSA - 概述 DSA - 环境设置 DSA - 算法基础 DSA - 渐近分析

数据结构

DSA - 数据结构基础 DSA - 数据结构和类型 DSA - 数组数据结构

链接列表

DSA - 链接列表数据结构 DSA - 双向链接列表数据结构 DSA - 循环链表数据结构

堆栈 &队列

DSA - 堆栈数据结构 DSA - 表达式解析 DSA - 队列数据结构

搜索算法

DSA - 搜索算法 DSA - 线性搜索算法 DSA - 二分搜索算法 DSA - 插值搜索 DSA - 跳跃搜索算法 DSA - 指数搜索 DSA - 斐波那契搜索 DSA - 子列表搜索 DSA - 哈希表

排序算法

DSA - 排序算法 DSA - 冒泡排序算法 DSA - 插入排序算法 DSA - 选择排序算法 DSA - 归并排序算法 DSA - 希尔排序算法 DSA - 堆排序 DSA - 桶排序算法 DSA - 计数排序算法 DSA - 基数排序算法 DSA - 快速排序算法

图形数据结构

DSA - 图形数据结构 DSA - 深度优先遍历 DSA - 广度优先遍历 DSA - 生成树

树数据结构

DSA - 树数据结构 DSA - 树遍历 DSA - 二叉搜索树 DSA - AVL 树 DSA - 红黑树 DSA - B树 DSA - B+ 树 DSA - 伸展树 DSA - 尝试 DSA - 堆数据结构

递归

DSA - 递归算法 DSA - 使用递归的汉诺塔 DSA - 使用递归的斐波那契数列

分而治之

DSA - 分而治之 DSA - 最大最小问题 DSA - 施特拉森矩阵乘法 DSA - Karatsuba 算法

贪婪算法

DSA - 贪婪算法 DSA - 旅行商问题(贪婪方法) DSA - Prim 最小生成树 DSA - Kruskal 最小生成树 DSA - Dijkstra 最短路径算法 DSA - 地图着色算法 DSA - 分数背包问题 DSA - 作业排序截止日期 DSA - 最佳合并模式算法

动态规划

DSA - 动态规划 DSA - 矩阵链乘法 DSA - Floyd Warshall 算法 DSA - 0-1 背包问题 DSA - 最长公共子序列算法 DSA - 旅行商问题(动态方法)

近似算法

DSA - 近似算法 DSA - 顶点覆盖算法 DSA - 集合覆盖问题 DSA - 旅行商问题(近似方法)

随机算法

DSA - 随机算法 DSA - 随机快速排序算法 DSA - Karger 最小割算法 DSA - Fisher-Yates 洗牌算法

DSA 有用资源

DSA - 问答 DSA - 快速指南 DSA - 有用资源 DSA - 讨论


Boyer Moore Algorithm

Boyer Moore Algorithm for Pattern Matching

The Boyer Moore Algorithm is used to determine whether a given pattern is present within a specified text or not. It follows a backward approach for pattern searching/matching. The task of searching a particular pattern within a given string is known as a pattern searching problem. For example, if the text is "THIS IS A SAMPLE TEXT" and the pattern is "TEXT", then the output should be 10, which is the index of the first occurrence of pattern in the given text.

This algorithm was developed by Robert Boyer and J Strother Moore in 1977. It is considered as the most efficient and widely used algorithm for pattern matching.

How does Boyer Moore Algorithm work?

In the previous chapters, we have seen the naive way to solve this problem which involves sliding the pattern over the text one by one and comparing each character. However, this approach is very slow, as it takes O(n*m) time, where 'n' is the length of the text and 'm' is the length of the pattern. The Boyer Moore algorithm improves this by preprocessing the pattern and using two heuristics to skip some comparisons that are not going to match.

The two heuristics are as follows −

  • Bad character heuristic − This heuristic uses a table that stores the last occurrence of each character in the pattern. When a mismatch occurs at some character(bad character) in the text, the algorithm checks if this character appears in the pattern. If it does, then it shifts the pattern such that the last occurrence of this character in the pattern aligns with the bad character in the text. If it does not, then it shifts the pattern past the bad character.

  • Good suffix heuristic − This heuristic uses another table that stores shift information when the bad heuristic fails. In this case, we look within the pattern till bad character become good suffix of the text. Then we shift onward to find the given pattern.

Boyer Solution

The Boyer-Moore algorithm combines these two heuristics by choosing the maximum shift suggested by them at each step. In this procedure, the substring or pattern is searched from the last character of the pattern. When a substring of the main string matches with a substring of the pattern, it moves to find other occurrences of the matched substring. If there is a mismatch, it applies the heuristics and shifts the pattern accordingly. The algorithm stops when it finds a complete match or when it reaches the end of the text.

The Boyer-Moore algorithm has a worst-case time complexity of O(nm), but, it can perform much better than that. In fact, in some cases, it can achieve a sublinear time complexity of O(n/m), which means that it can skip some characters in the text without comparing them. This happens when the pattern has no repeated characters or when it has a large alphabet size.

To illustrate how the Boyer-Moore algorithm works, let's consider an example −

Input:
main String: "AABAAABCEDBABCDDEBC" 
pattern: "ABC"
Output:
Pattern found at position: 5
Pattern found at position: 11

Example

In the following example, we are going to illustrate the working of Boyer-Moore Algorithm in various programming languages.

#include<stdio.h> 
#include<string.h> 
// Function for full suffix match
void computeFullShift(int shiftArr[], int longSuffArr[], char patrn[], int n) {
   int i = n;
   int j = n+1;
   longSuffArr[i] = j;
   while(i > 0) {
      // Searching right if (i-1)th and (j-1)th item are not the same
      while(j <= n && patrn[i-1] != patrn[j-1] ) {
          // to shift pattern from i to j
         if(shiftArr[j] == 0) {
            shiftArr[j] = j-i; 
         }
         // Updating long suffix value
         j = longSuffArr[j]; 
      }
      i--;
      j--;
      longSuffArr[i] = j;
   }  
}
// Function for good suffix match
void computeGoodSuffix(int shiftArr[], int longSuffArr[], char patrn[], int n) {
   int j;
   j = longSuffArr[0];
   // Looping through the pattern
   for(int i = 0; i<n; i++) {
      // setting shift to long suffix value
      if(shiftArr[i] == 0) {
         shiftArr[i] = j; 
         if(i == j) {
            // Updating long suffix value
            j = longSuffArr[j]; 
         }
      }
   }
}
// Function for searching the pattern
void searchPattern(char orgnStr[], char patrn[], int array[], int *index) {
   // length of the pattern
   int patLen = strlen(patrn); 
   // length of main string
   int strLen = strlen(orgnStr);
   int longerSuffArray[patLen+1];
   int shiftArr[patLen + 1];
   // Initializing shift array elements to 0
   for(int i = 0; i<=patLen; i++) {
      shiftArr[i] = 0;
   }
   // Calling computeFullShift function
   computeFullShift(shiftArr, longerSuffArray, patrn, patLen); 
   // Calling computeGoodSuffix function
   computeGoodSuffix(shiftArr, longerSuffArray, patrn, patLen); 
   int shift = 0;
   while(shift <= (strLen - patLen)) {
      int j = patLen - 1;
      // decrement j when pattern and main string character is matching
      while(j >= 0 && patrn[j] == orgnStr[shift+j]) {
         j--; 
      }
      if(j < 0) {
         (*index)++;
         // to store the position where pattern is found
         array[(*index)] = shift; 
         shift += shiftArr[0];
      }else {
          shift += shiftArr[j+1];
      }
   }
}
int main() {
   // original string    
   char orgnStr[] = "AABAAABCEDBABCDDEBC"; 
   // Pattern to be searched
   char patrn[] = "ABC"; 
   // Array to store the positions where pattern is found
   int locArray[strlen(orgnStr)]; 
   int index = -1;
   // Calling the searchPattern function
   searchPattern(orgnStr, patrn, locArray, &index); 
   // Printing the positions where pattern is found
   for(int i = 0; i <= index; i++) {
      printf("Pattern found at position: %d
", locArray[i]);
   }
   return 0;
}
#include<iostream> 
using namespace std; 
// Function for full suffix match
void computeFullShift(int shiftArr[], int longSuffArr[], string patrn) {
   // length of pattern
   int n = patrn.size(); 
   int i = n;
   int j = n+1;
   longSuffArr[i] = j;
   while(i > 0) {
      // Searching right if (i-1)th and (j-1)th item are not the same
      while(j <= n && patrn[i-1] != patrn[j-1] ) {
          // to shift pattern from i to j
         if(shiftArr[j] == 0) {
            shiftArr[j] = j-i; 
         }
         // Updating long suffix value
         j = longSuffArr[j]; 
      }
      i--;
      j--;
      longSuffArr[i] = j;
   }  
}
// Function for good suffix match
void computeGoodSuffix(int shiftArr[], int longSuffArr[], string patrn) {
   // length of the pattern
   int n = patrn.size(); 
   int j;
   j = longSuffArr[0];
   // Looping through the pattern
   for(int i = 0; i<n; i++) {
      // setting shift to long suffix value
      if(shiftArr[i] == 0) {
         shiftArr[i] = j; 
         if(i == j) {
            // Updating long suffix value
            j = longSuffArr[j]; 
         }
      }
   }
}
// Function for searching the pattern
void searchPattern(string orgnStr, string patrn, int array[], int *index) {
   // length of the pattern
   int patLen = patrn.size(); 
   // length of main string
   int strLen = orgnStr.size();
   int longerSuffArray[patLen+1];
   int shiftArr[patLen + 1];
   // Initializing shift array elements to 0
   for(int i = 0; i<=patLen; i++) {
      shiftArr[i] = 0;
   }
   // Calling computeFullShift function
   computeFullShift(shiftArr, longerSuffArray, patrn); 
   // Calling computeGoodSuffix function
   computeGoodSuffix(shiftArr, longerSuffArray, patrn); 
   int shift = 0;
   while(shift <= (strLen - patLen)) {
      int j = patLen - 1;
      // decrement j when pattern and main string character is matching
      while(j >= 0 && patrn[j] == orgnStr[shift+j]) {
         j--; 
      }
      if(j < 0) {
         (*index)++;
         // to store the position where pattern is found
         array[(*index)] = shift; 
         shift += shiftArr[0];
      }else {
          shift += shiftArr[j+1];
      }
   }
}
int main() {
   // original string    
   string orgnStr = "AABAAABCEDBABCDDEBC"; 
   // Pattern to be searched
   string patrn = "ABC"; 
   // Array to store the positions where pattern is found
   int locArray[orgnStr.size()]; 
   int index = -1;
   // Calling the searchPattern function
   searchPattern(orgnStr, patrn, locArray, &index); 
   // Printing the positions where pattern is found
   for(int i = 0; i <= index; i++) {
      cout << "Pattern found at position: " << locArray[i]<<endl;
   }
}
public class BMalgo {
   // method for full suffix match
   static void computeFullShift(int[] shiftArr, int[] longSuffArr, String patrn) {
      // length of pattern
      int n = patrn.length();
      int i = n;
      int j = n+1;
      longSuffArr[i] = j;
      while(i > 0) {
         // Searching right if (i-1)th and (j-1)th item are not the same
         while(j <= n && patrn.charAt(i-1) != patrn.charAt(j-1)) {
            // to shift pattern from i to j
            if(shiftArr[j] == 0) {
               shiftArr[j] = j-i;
            }
            // Updating long suffix value
            j = longSuffArr[j];
         }
         i--;
         j--;
         longSuffArr[i] = j;
      }
   }
   // method for good suffix match
   static void computeGoodSuffix(int[] shiftArr, int[] longSuffArr, String patrn) {
      // length of the pattern
      int n = patrn.length();
      int j;
      j = longSuffArr[0];
      // Looping through the pattern
      for(int i = 0; i<n; i++) {
         // setting shift to long suffix value
         if(shiftArr[i] == 0) {
            shiftArr[i] = j;
            if(i == j) {
               // Updating long suffix value
               j = longSuffArr[j];
            }
         }
      }
   }
   // method for searching the pattern
   static void searchPattern(String orgnStr, String patrn, int[] array, int[] index) {
      // length of the pattern
      int patLen = patrn.length();
      // length of main string
      int strLen = orgnStr.length();
      int[] longerSuffArray = new int[patLen+1];
      int[] shiftArr = new int[patLen + 1];
      // Initializing shift array elements to 0
      for(int i = 0; i<=patLen; i++) {
         shiftArr[i] = 0;
      }
      // Calling computeFullShift method
      computeFullShift(shiftArr, longerSuffArray, patrn);
      // Calling computeGoodSuffix method
      computeGoodSuffix(shiftArr, longerSuffArray, patrn);
      int shift = 0;
      while(shift <= (strLen - patLen)) {
         int j = patLen - 1;
         // decrement j when pattern and main string character is matching
         while(j >= 0 && patrn.charAt(j) == orgnStr.charAt(shift+j)) {
            j--;
         }
         if(j < 0) {
            index[0]++;
            // to store the position where pattern is found
            array[index[0]] = shift;
            shift += shiftArr[0];
         }else {
            shift += shiftArr[j+1];
         }
      }
   }
   public static void main(String[] args) {
      // original string
      String orgnStr = "AABAAABCEDBABCDDEBC";
      // Pattern to be searched
      String patrn = "ABC";
      // Array to store the positions where pattern is found
      int[] locArray = new int[orgnStr.length()];
      int[] index = {-1};
      // Calling the searchPattern method
      searchPattern(orgnStr, patrn, locArray, index);
      // Printing the positions where pattern is found
      for(int i = 0; i <= index[0]; i++) {
         System.out.println("Pattern found at position: " + locArray[i]);
      }
   }
}
# Function for full suffix match
def compute_full_shift(shift_arr, long_suff_arr, patrn):
    # length of pattern
    n = len(patrn)
    i = n
    j = n+1
    long_suff_arr[i] = j
    while i > 0:
        # Searching right if (i-1)th and (j-1)th item are not the same
        while j <= n and patrn[i-1] != patrn[j-1]:
            # to shift pattern from i to j
            if shift_arr[j] == 0:
                shift_arr[j] = j-i
            # Updating long suffix value
            j = long_suff_arr[j]
        i -= 1
        j -= 1
        long_suff_arr[i] = j
# Function for good suffix match
def compute_good_suffix(shift_arr, long_suff_arr, patrn):
    # length of the pattern
    n = len(patrn)
    j = long_suff_arr[0]
    # Looping through the pattern
    for i in range(n):
        # setting shift to long suffix value
        if shift_arr[i] == 0:
            shift_arr[i] = j
            if i == j:
                # Updating long suffix value
                j = long_suff_arr[j]

# Function for searching the pattern
def search_pattern(orgn_str, patrn, array, index):
    # length of the pattern
    pat_len = len(patrn)
    # length of main string
    str_len = len(orgn_str)
    longer_suff_array = [0]*(pat_len+1)
    shift_arr = [0]*(pat_len + 1)
    # Initializing shift array elements to 0
    for i in range(pat_len+1):
        shift_arr[i] = 0
    # Calling compute_full_shift function
    compute_full_shift(shift_arr, longer_suff_array, patrn)
    # Calling compute_good_suffix function
    compute_good_suffix(shift_arr, longer_suff_array, patrn)
    shift = 0
    while shift <= (str_len - pat_len):
        j = pat_len - 1
        # decrement j when pattern and main string character is matching
        while j >= 0 and patrn[j] == orgn_str[shift+j]:
            j -= 1
        if j < 0:
            index[0] += 1
            # to store the position where pattern is found
            array[index[0]] = shift
            shift += shift_arr[0]
        else:
            shift += shift_arr[j+1]

# original string
orgn_str = "AABAAABCEDBABCDDEBC"
# Pattern to be searched
patrn = "ABC"
# Array to store the positions where pattern is found
loc_array = [0]*len(orgn_str)
index = [-1]
# Calling the search_pattern function
search_pattern(orgn_str, patrn, loc_array, index)
# Printing the positions where pattern is found
for i in range(index[0]+1):
    print("Pattern found at position: ", loc_array[i])

Output

Pattern found at position: 5
Pattern found at position: 11