Boyer Moore Algorithm
Boyer Moore Algorithm for Pattern Matching
The Boyer Moore Algorithm is used to determine whether a given pattern is present within a specified text or not. It follows a backward approach for pattern searching/matching. The task of searching a particular pattern within a given string is known as a pattern searching problem. For example, if the text is "THIS IS A SAMPLE TEXT" and the pattern is "TEXT", then the output should be 10, which is the index of the first occurrence of pattern in the given text.
This algorithm was developed by Robert Boyer and J Strother Moore in 1977. It is considered as the most efficient and widely used algorithm for pattern matching.
How does Boyer Moore Algorithm work?
In the previous chapters, we have seen the naive way to solve this problem which involves sliding the pattern over the text one by one and comparing each character. However, this approach is very slow, as it takes O(n*m) time, where 'n' is the length of the text and 'm' is the length of the pattern. The Boyer Moore algorithm improves this by preprocessing the pattern and using two heuristics to skip some comparisons that are not going to match.
The two heuristics are as follows −
Bad character heuristic − This heuristic uses a table that stores the last occurrence of each character in the pattern. When a mismatch occurs at some character(bad character) in the text, the algorithm checks if this character appears in the pattern. If it does, then it shifts the pattern such that the last occurrence of this character in the pattern aligns with the bad character in the text. If it does not, then it shifts the pattern past the bad character.
Good suffix heuristic − This heuristic uses another table that stores shift information when the bad heuristic fails. In this case, we look within the pattern till bad character become good suffix of the text. Then we shift onward to find the given pattern.

The Boyer-Moore algorithm combines these two heuristics by choosing the maximum shift suggested by them at each step. In this procedure, the substring or pattern is searched from the last character of the pattern. When a substring of the main string matches with a substring of the pattern, it moves to find other occurrences of the matched substring. If there is a mismatch, it applies the heuristics and shifts the pattern accordingly. The algorithm stops when it finds a complete match or when it reaches the end of the text.
The Boyer-Moore algorithm has a worst-case time complexity of O(nm), but, it can perform much better than that. In fact, in some cases, it can achieve a sublinear time complexity of O(n/m), which means that it can skip some characters in the text without comparing them. This happens when the pattern has no repeated characters or when it has a large alphabet size.
To illustrate how the Boyer-Moore algorithm works, let's consider an example −
Input: main String: "AABAAABCEDBABCDDEBC" pattern: "ABC" Output: Pattern found at position: 5 Pattern found at position: 11
Example
In the following example, we are going to illustrate the working of Boyer-Moore Algorithm in various programming languages.
#include<stdio.h> #include<string.h> // Function for full suffix match void computeFullShift(int shiftArr[], int longSuffArr[], char patrn[], int n) { int i = n; int j = n+1; longSuffArr[i] = j; while(i > 0) { // Searching right if (i-1)th and (j-1)th item are not the same while(j <= n && patrn[i-1] != patrn[j-1] ) { // to shift pattern from i to j if(shiftArr[j] == 0) { shiftArr[j] = j-i; } // Updating long suffix value j = longSuffArr[j]; } i--; j--; longSuffArr[i] = j; } } // Function for good suffix match void computeGoodSuffix(int shiftArr[], int longSuffArr[], char patrn[], int n) { int j; j = longSuffArr[0]; // Looping through the pattern for(int i = 0; i<n; i++) { // setting shift to long suffix value if(shiftArr[i] == 0) { shiftArr[i] = j; if(i == j) { // Updating long suffix value j = longSuffArr[j]; } } } } // Function for searching the pattern void searchPattern(char orgnStr[], char patrn[], int array[], int *index) { // length of the pattern int patLen = strlen(patrn); // length of main string int strLen = strlen(orgnStr); int longerSuffArray[patLen+1]; int shiftArr[patLen + 1]; // Initializing shift array elements to 0 for(int i = 0; i<=patLen; i++) { shiftArr[i] = 0; } // Calling computeFullShift function computeFullShift(shiftArr, longerSuffArray, patrn, patLen); // Calling computeGoodSuffix function computeGoodSuffix(shiftArr, longerSuffArray, patrn, patLen); int shift = 0; while(shift <= (strLen - patLen)) { int j = patLen - 1; // decrement j when pattern and main string character is matching while(j >= 0 && patrn[j] == orgnStr[shift+j]) { j--; } if(j < 0) { (*index)++; // to store the position where pattern is found array[(*index)] = shift; shift += shiftArr[0]; }else { shift += shiftArr[j+1]; } } } int main() { // original string char orgnStr[] = "AABAAABCEDBABCDDEBC"; // Pattern to be searched char patrn[] = "ABC"; // Array to store the positions where pattern is found int locArray[strlen(orgnStr)]; int index = -1; // Calling the searchPattern function searchPattern(orgnStr, patrn, locArray, &index); // Printing the positions where pattern is found for(int i = 0; i <= index; i++) { printf("Pattern found at position: %d ", locArray[i]); } return 0; }
#include<iostream> using namespace std; // Function for full suffix match void computeFullShift(int shiftArr[], int longSuffArr[], string patrn) { // length of pattern int n = patrn.size(); int i = n; int j = n+1; longSuffArr[i] = j; while(i > 0) { // Searching right if (i-1)th and (j-1)th item are not the same while(j <= n && patrn[i-1] != patrn[j-1] ) { // to shift pattern from i to j if(shiftArr[j] == 0) { shiftArr[j] = j-i; } // Updating long suffix value j = longSuffArr[j]; } i--; j--; longSuffArr[i] = j; } } // Function for good suffix match void computeGoodSuffix(int shiftArr[], int longSuffArr[], string patrn) { // length of the pattern int n = patrn.size(); int j; j = longSuffArr[0]; // Looping through the pattern for(int i = 0; i<n; i++) { // setting shift to long suffix value if(shiftArr[i] == 0) { shiftArr[i] = j; if(i == j) { // Updating long suffix value j = longSuffArr[j]; } } } } // Function for searching the pattern void searchPattern(string orgnStr, string patrn, int array[], int *index) { // length of the pattern int patLen = patrn.size(); // length of main string int strLen = orgnStr.size(); int longerSuffArray[patLen+1]; int shiftArr[patLen + 1]; // Initializing shift array elements to 0 for(int i = 0; i<=patLen; i++) { shiftArr[i] = 0; } // Calling computeFullShift function computeFullShift(shiftArr, longerSuffArray, patrn); // Calling computeGoodSuffix function computeGoodSuffix(shiftArr, longerSuffArray, patrn); int shift = 0; while(shift <= (strLen - patLen)) { int j = patLen - 1; // decrement j when pattern and main string character is matching while(j >= 0 && patrn[j] == orgnStr[shift+j]) { j--; } if(j < 0) { (*index)++; // to store the position where pattern is found array[(*index)] = shift; shift += shiftArr[0]; }else { shift += shiftArr[j+1]; } } } int main() { // original string string orgnStr = "AABAAABCEDBABCDDEBC"; // Pattern to be searched string patrn = "ABC"; // Array to store the positions where pattern is found int locArray[orgnStr.size()]; int index = -1; // Calling the searchPattern function searchPattern(orgnStr, patrn, locArray, &index); // Printing the positions where pattern is found for(int i = 0; i <= index; i++) { cout << "Pattern found at position: " << locArray[i]<<endl; } }
public class BMalgo { // method for full suffix match static void computeFullShift(int[] shiftArr, int[] longSuffArr, String patrn) { // length of pattern int n = patrn.length(); int i = n; int j = n+1; longSuffArr[i] = j; while(i > 0) { // Searching right if (i-1)th and (j-1)th item are not the same while(j <= n && patrn.charAt(i-1) != patrn.charAt(j-1)) { // to shift pattern from i to j if(shiftArr[j] == 0) { shiftArr[j] = j-i; } // Updating long suffix value j = longSuffArr[j]; } i--; j--; longSuffArr[i] = j; } } // method for good suffix match static void computeGoodSuffix(int[] shiftArr, int[] longSuffArr, String patrn) { // length of the pattern int n = patrn.length(); int j; j = longSuffArr[0]; // Looping through the pattern for(int i = 0; i<n; i++) { // setting shift to long suffix value if(shiftArr[i] == 0) { shiftArr[i] = j; if(i == j) { // Updating long suffix value j = longSuffArr[j]; } } } } // method for searching the pattern static void searchPattern(String orgnStr, String patrn, int[] array, int[] index) { // length of the pattern int patLen = patrn.length(); // length of main string int strLen = orgnStr.length(); int[] longerSuffArray = new int[patLen+1]; int[] shiftArr = new int[patLen + 1]; // Initializing shift array elements to 0 for(int i = 0; i<=patLen; i++) { shiftArr[i] = 0; } // Calling computeFullShift method computeFullShift(shiftArr, longerSuffArray, patrn); // Calling computeGoodSuffix method computeGoodSuffix(shiftArr, longerSuffArray, patrn); int shift = 0; while(shift <= (strLen - patLen)) { int j = patLen - 1; // decrement j when pattern and main string character is matching while(j >= 0 && patrn.charAt(j) == orgnStr.charAt(shift+j)) { j--; } if(j < 0) { index[0]++; // to store the position where pattern is found array[index[0]] = shift; shift += shiftArr[0]; }else { shift += shiftArr[j+1]; } } } public static void main(String[] args) { // original string String orgnStr = "AABAAABCEDBABCDDEBC"; // Pattern to be searched String patrn = "ABC"; // Array to store the positions where pattern is found int[] locArray = new int[orgnStr.length()]; int[] index = {-1}; // Calling the searchPattern method searchPattern(orgnStr, patrn, locArray, index); // Printing the positions where pattern is found for(int i = 0; i <= index[0]; i++) { System.out.println("Pattern found at position: " + locArray[i]); } } }
# Function for full suffix match def compute_full_shift(shift_arr, long_suff_arr, patrn): # length of pattern n = len(patrn) i = n j = n+1 long_suff_arr[i] = j while i > 0: # Searching right if (i-1)th and (j-1)th item are not the same while j <= n and patrn[i-1] != patrn[j-1]: # to shift pattern from i to j if shift_arr[j] == 0: shift_arr[j] = j-i # Updating long suffix value j = long_suff_arr[j] i -= 1 j -= 1 long_suff_arr[i] = j # Function for good suffix match def compute_good_suffix(shift_arr, long_suff_arr, patrn): # length of the pattern n = len(patrn) j = long_suff_arr[0] # Looping through the pattern for i in range(n): # setting shift to long suffix value if shift_arr[i] == 0: shift_arr[i] = j if i == j: # Updating long suffix value j = long_suff_arr[j] # Function for searching the pattern def search_pattern(orgn_str, patrn, array, index): # length of the pattern pat_len = len(patrn) # length of main string str_len = len(orgn_str) longer_suff_array = [0]*(pat_len+1) shift_arr = [0]*(pat_len + 1) # Initializing shift array elements to 0 for i in range(pat_len+1): shift_arr[i] = 0 # Calling compute_full_shift function compute_full_shift(shift_arr, longer_suff_array, patrn) # Calling compute_good_suffix function compute_good_suffix(shift_arr, longer_suff_array, patrn) shift = 0 while shift <= (str_len - pat_len): j = pat_len - 1 # decrement j when pattern and main string character is matching while j >= 0 and patrn[j] == orgn_str[shift+j]: j -= 1 if j < 0: index[0] += 1 # to store the position where pattern is found array[index[0]] = shift shift += shift_arr[0] else: shift += shift_arr[j+1] # original string orgn_str = "AABAAABCEDBABCDDEBC" # Pattern to be searched patrn = "ABC" # Array to store the positions where pattern is found loc_array = [0]*len(orgn_str) index = [-1] # Calling the search_pattern function search_pattern(orgn_str, patrn, loc_array, index) # Printing the positions where pattern is found for i in range(index[0]+1): print("Pattern found at position: ", loc_array[i])
Output
Pattern found at position: 5 Pattern found at position: 11