Practice Problem Link: Substring Search - II
Please make sure to try solving the problem yourself before looking at the editorial.
Problem Statement
Given two strings s1 and s2, find the index of the first occurrence of s2 in s1 as a substring.
If no such occurence exists, return -1.
This problem is also known as finding needle in a haystack.
Use the Rabin-Karp algorithm to solve this problem.
Approach
We will use the Rabin - Karp algorithm to solve this problem. The Rabin Karp algorithm relies on the technique of String Hashing as a prerequisite. The Rabin Karp algorithm is an optimization over the standard naive algorithm of sliding over the substrings of the string to search for the pattern, in the way that first it compares the hash values of the 2 strings, and if they match then only it begins to do the character by character matching for confirmation. The hash values we need to know are:
- Hash Value of the pattern.
- Hash Values of all the substrings of the first string.
Finding Hash Values of Substrings Quickly:
The hash function should be computable from the current hash value and the character next in line in the string. By choosing such a hash function, we can compute the hash as:
hash[s[i]…..s[j]] = rehash(text[s[i]….s[j]], hash[s[i]….s[j - 1]])
The rehash function should be chosen so as to perform in constant time.
Analysis:
- Time Complexity: O(|s1| * |s2|) // Average Case is O(|s1| + |s2|)
- Space Complexity: O(1)
Implementation
C++
const int CHARSET = 26;
const int MOD = 29;
int rabinKarp(string pattern, string text, int m, int n) {
int hash = 1, patternHash = 0, textHash = 0;
for(int i = 1; i < m; i++) {
hash *= CHARSET;
hash %= MOD;
}
for(int i = 0; i < m; i++) {
patternHash = (patternHash * CHARSET + pattern[i]) % MOD;
textHash = (textHash * CHARSET + text[i]) % MOD;
}
for(int i = 0; i <= n - m; i++) {
if(patternHash == textHash) {
int index = 0;
for(int j = 0; j < m; j += 1, index += 1) {
if(text[i + j] != pattern[j]) {
break;
}
}
if(index == m) {
return i;
}
}
if(i + m < n) {
textHash = (text[i + m] + CHARSET * (textHash - text[i] * hash)) % MOD;
if(textHash < 0) {
textHash += MOD;
}
}
}
return -1;
}
int findStartIndexOfSubstring(string s1, string s2) {
int m = s2.length();
int n = s1.length();
if(m > n) {
return -1;
}
return rabinKarp(s2, s1, m, n);
}Java
class Solution {
int CHARSET = 26;
int MOD = 29;
int rabinKarp(char[] pattern, char[] text, int m, int n) {
int hash = 1, patternHash = 0, textHash = 0;
for(int i = 1; i < m; i++) {
hash *= CHARSET;
hash %= MOD;
}
for(int i = 0; i < m; i++) {
patternHash = (patternHash * CHARSET + pattern[i]) % MOD;
textHash = (textHash * CHARSET + text[i]) % MOD;
}
for(int i = 0; i <= n - m; i++) {
if(patternHash == textHash) {
int index = 0;
for(int j = 0; j < m; j += 1, index += 1) {
if(text[i + j] != pattern[j]) {
break;
}
}
if(index == m) {
return i;
}
}
if(i + m < n) {
textHash = (text[i + m] + CHARSET * (textHash - text[i] * hash)) % MOD;
if(textHash < 0) {
textHash += MOD;
}
}
}
return -1;
}
int findStartIndexOfSubstring(String s1, String s2) {
int m = s2.length();
int n = s1.length();
if(m > n) {
return -1;
}
return rabinKarp(s2.toCharArray(), s1.toCharArray(), m, n);
}
}