Knuth-Morris-Pratt Algorithm
KMP Algorithm for Pattern Matching
The KMP algorithm is used to solve the pattern matching problem which is a task of finding all the occurrences of a given pattern in a text. It is very useful when it comes to finding multiple patterns. For instance, if the text is "aabbaaccaabbaadde" and the pattern is "aabaa", then the pattern occurs twice in the text, at indices 0 and 8.
The naive solution to this problem is to compare the pattern with every possible substring of the text, starting from the leftmost position and moving rightwards. This takes O(n*m) time, where 'n' is the length of the text and 'm' is the length of the pattern.
When we work with long text documents, the brute force and naive approaches may result in redundant comparisons. To avoid such redundancy, Knuth, Morris, and Pratt developed a linear sequence-matching algorithm named the KMP pattern matching algorithm. It is also referred to as Knuth Morris Pratt pattern matching algorithm.
How does KMP Algorithm work?
The KMP algorithm starts the search operation from left to right. It uses the prefix function to avoid unnecessary comparisons while searching for the pattern. This function stores the number of characters matched so far which is known as LPS value. The following steps are involved in KMP algorithm −
Define a prefix function.
Slide the pattern over the text for comparison.
If all the characters match, we have found a match.
If not, use the prefix function to skip the unnecessary comparisons. If the LPS value of previous character from the mismatched character is '0', then start comparison from index 0 of pattern with the next character in the text. However, if the LPS value is more than '0', start the comparison from index value equal to LPS value of the previously mismatched character.

The KMP algorithm takes O(n + m) time and O(m) space. It is faster than the naive solution because it skips the redundant comparisons, and only compares each character of the text at most once.
Let's understand the input-output scenario of a pattern matching problem with an example −
Input: main String: "AAAABCAAAABCBAAAABC" pattern: "AAABC" Output: Pattern found at position: 1 Pattern found at position: 7 Pattern found at position: 14
Example
The following example practically illustrates the KMP algorithm for pattern matching.
#include<stdio.h> #include<stdlib.h> #include<string.h> // function to find prefix void prefixSearch(char* pat, int m, int* pps) { int length = 0; // array to store prefix pps[0] = 0; int i = 1; while(i < m) { // to check if the current character matches the previous character if(pat[i] == pat[length]) { // increment the length length++; // store the length in the prefix array pps[i] = length; }else { if(length != 0) { // to update length of previous prefix length length = pps[length - 1]; i--; } else // if the length is 0, store 0 in the prefix array pps[i] = 0; } i++; // incrementing i } } // function to search for pattern void patrnSearch(char* orgnString, char* patt, int m, int *locArray, int *loc) { int n, i = 0, j = 0; n = strlen(orgnString); // array to store the prefix values int* prefixArray = (int*)malloc(m * sizeof(int)); // allocate memory for the prefix array // calling prefix function to fill the prefix array prefixSearch(patt, m, prefixArray); *loc = 0; // initialize the location index while(i < n) { // checking if main string character matches pattern string character if(orgnString[i] == patt[j]) { // increment both i and j i++; j++; } // if j and m are equal pattern is found if(j == m) { // store the location of the pattern locArray[*loc] = i-j; (*loc)++; // increment the location index // update j to the previous prefix value j = prefixArray[j-1]; // checking if i is less than n and the current characters do not match }else if(i < n && patt[j] != orgnString[i]) { if(j != 0) // update j to the previous prefix value j = prefixArray[j-1]; // if j is zero else i++; // increment i } } free(prefixArray); // free the memory of the prefix array } int main() { // declare the original text char* orgnStr = "AAAABCAEAAABCBDDAAAABC"; // pattern to be found char* patrn = "AAABC"; // get the size of the pattern int m = strlen(patrn); // array to store the locations of the pattern int locationArray[strlen(orgnStr)]; // to store the number of locations int index; // calling pattern search function patrnSearch(orgnStr, patrn, m, locationArray, &index); // to loop through location array for(int i = 0; i<index; i++) { // print the location of the pattern printf("Pattern found at location: %d ", locationArray[i]); } }
#include<iostream> using namespace std; // function to find prefix void prefixSearch(string pattern, int m, int storePrefx[]) { int length = 0; // array to store prefix storePrefx[0] = 0; int i = 1; while(i < m) { // to check if the current character matches the previous character if(pattern[i] == pattern[length]) { // increment the length length++; // store the length in the prefix array storePrefx[i] = length; }else { if(length != 0) { // to update length of previous prefix length length = storePrefx[length - 1]; i--; } else // if the length is 0, store 0 in the prefix array storePrefx[i] = 0; } i++; // incrementing i } } // function to search for pattern void patrnSearch(string orgnString, string patt, int *locArray, int &loc) { int n, m, i = 0, j = 0; n = orgnString.size(); m = patt.size(); // array to store the prefix values int prefixArray[m]; // calling prefix function to fill the prefix array prefixSearch(patt, m, prefixArray); loc = 0; // initialize the location index while(i < n) { // checking if main string character matches pattern string character if(orgnString[i] == patt[j]) { // increment both i and j i++; j++; } // if j and m are equal pattern is found if(j == m) { // store the location of the pattern locArray[loc] = i-j; loc++; // increment the location index // update j to the previous prefix value j = prefixArray[j-1]; // checking if i is less than n and the current characters do not match }else if(i < n && patt[j] != orgnString[i]) { if(j != 0) // update j to the previous prefix value j = prefixArray[j-1]; // if j is zero else i++; // increment i } } } int main() { // declare the original text string orgnStr = "AAAABCAEAAABCBDDAAAABC"; // pattern to be found string patrn = "AAABC"; // array to store the locations of the pattern int locationArray[orgnStr.size()]; // to store the number of locations int index; // calling pattern search function patrnSearch(orgnStr, patrn, locationArray, index); // to loop through location array for(int i = 0; i<index; i++) { // print the location of the pattern cout << "Pattern found at location: " <<locationArray[i] << endl; } }
import java.io.*; // class to implement the KMP algorithm public class KMPalgo { // function to find prefix public static void prefixSearch(String pat, int m, int[] storePrefx) { int length = 0; // array to store prefix storePrefx[0] = 0; int i = 1; while (i < m) { // to check if the current character matches the previous character if (pat.charAt(i) == pat.charAt(length)) { // increment the length length++; // store the length in the prefix array storePrefx[i] = length; } else { if (length != 0) { // to update length of previous prefix length length = storePrefx[length - 1]; i--; } else // if the length is 0, store 0 in the prefix array storePrefx[i] = 0; } i++; // incrementing i } } // function to search for pattern public static int patrnSearch(String orgnString, String patt, int[] locArray) { int n, m, i = 0, j = 0; n = orgnString.length(); m = patt.length(); // array to store the prefix values int[] prefixArray = new int[m]; // allocate memory for the prefix array // calling prefix function to fill the prefix array prefixSearch(patt, m, prefixArray); int loc = 0; // initialize the location index while (i < n) { // checking if main string character matches pattern string character if (orgnString.charAt(i) == patt.charAt(j)) { // increment both i and j i++; j++; } // if j and m are equal pattern is found if (j == m) { // store the location of the pattern locArray[loc] = i - j; loc++; // increment the location index // update j to the previous prefix value j = prefixArray[j - 1]; // checking if i is less than n and the current characters do not match } else if (i < n && patt.charAt(j) != orgnString.charAt(i)) { if (j != 0) // update j to the previous prefix value j = prefixArray[j - 1]; // if j is zero else i++; // increment i } } return loc; } public static void main(String[] args) throws IOException { // declare the original text String orgnStr = "AAAABCAEAAABCBDDAAAABC"; // pattern to be found String patrn = "AAABC"; // array to store the locations of the pattern int[] locationArray = new int[orgnStr.length()]; // calling pattern search function int index = patrnSearch(orgnStr, patrn, locationArray); // to loop through location array for (int i = 0; i < index; i++) { // print the location of pattern System.out.println("Pattern found at location: " + locationArray[i]); } } }
# function to find prefix def prefix_search(pattern, m, store_prefx): length = 0 # array to store prefix store_prefx[0] = 0 i = 1 while i < m: # to check if the current character matches the previous character if pattern[i] == pattern[length]: # increment the length length += 1 # store the length in the prefix array store_prefx[i] = length else: if length != 0: # to update length of previous prefix length length = store_prefx[length - 1] i -= 1 else: # if the length is 0, store 0 in the prefix array store_prefx[i] = 0 i += 1 # incrementing i # function to search for pattern def pattern_search(orgn_string, patt, loc_array): n = len(orgn_string) m = len(patt) i = j = loc = 0 # array to store the prefix values prefix_array = [0] * m # calling prefix function to fill the prefix array prefix_search(patt, m, prefix_array) while i < n: # checking if main string character matches pattern string character if orgn_string[i] == patt[j]: # increment both i and j i += 1 j += 1 # if j and m are equal pattern is found if j == m: # store the location of the pattern loc_array[loc] = i - j loc += 1 # increment the location index # update j to the previous prefix value j = prefix_array[j - 1] # checking if i is less than n and the current characters do not match elif i < n and patt[j] != orgn_string[i]: if j != 0: # update j to the previous prefix value j = prefix_array[j - 1] else: i += 1 # increment i return loc # main function def main(): # declare the original text orgn_str = "AAAABCAEAAABCBDDAAAABC" # pattern to be found patrn = "AAABC" # array to store the locations of the pattern location_array = [0] * len(orgn_str) # calling pattern search function index = pattern_search(orgn_str, patrn, location_array) # to loop through location array for i in range(index): # print the location of the pattern print("Pattern found at location:", location_array[i]) # call the main function if __name__ == "__main__": main()
Output
Pattern found at location: 1 Pattern found at location: 8 Pattern found at location: 17