Practice Problem Link: Substring Search - IV
Please make sure to try solving the problem yourself before looking at the editorial.
Problem Statement
Given two strings s1 and s2, find the index of the first occurrence of s2 in s1 as a substring.
If no such occurrence exists, return -1. This problem is also known as finding a needle in a haystack. Use the Z-algorithm to solve this problem.
Approach
The Z function is an algorithm that takes a string as an input, and return a list of integers of size same as that of the string, such that Z[i] will represent the length of the longest matching prefix of the original string, which matches with the string if it had started at the ith position, and all of it is done in linear time.
The algorithm to solve this problem using the Z algorithm is as follows:
- Concatenate the two strings
patternandoriginalinto a 3rd string, separating them by a character not present in either of the strings. - Compute the Z function of this new string.
- Traverse through the Z array, and check if at any point
Z[i] = Length of the pattern, if yes, return the index asi - length of pattern - 1. - If such a point is not found, return -1.
Construct Z Array in Linear Time:
To compute the Z function in linear time, we try to maintain an interval marked by pointers left and right, so that the right end of the interval is maximum possible and the range [left, right] is a prefix of the original string.
The key idea is:
- If we are outside this interval, we reset the values of left and right and start computing the interval again, and set
Z[i] = right - left + 1directly. - Else we try to compute the Z function by trying to maximize the right end of the interval and setting
Z[i] = min(Z[i - left], right - i + 1).
Analysis:
- Time Complexity: O(n + m) // n = length of 1st string, m = length of 2nd string.
- Space Complexity: O(n + m)
Implementation
C++
vector<int> zFunction(string s, int n) {
vector<int> z(n);
int left = 0, right = 0;
for(int i = 1; i < n; i++) {
if(i <= right) {
z[i] = min(right - i + 1, z[i - left]);
}
while(i + z[i] < n && s[z[i]] == s[z[i] + i]) {
z[i]++;
}
if(i + z[i] - 1 > right) {
left = i;
right = i + z[i] - 1;
}
}
return z;
}
int findStartIndexOfSubstring(string s1, string s2) {
string mixed = s2 + "$" + s1;
vector<int> getZ = zFunction(mixed, mixed.length());
for(int i = 0; i < mixed.length(); i++) {
if(getZ[i] == s2.length()) {
return (i - s2.length() - 1);
}
}
return -1;
}Java
class Solution {
int[] zFunction(char[] s, int n) {
int[] z = new int[n];
int left = 0, right = 0;
for(int i = 1; i < n; i++) {
if(i <= right) {
z[i] = Math.min(right - i + 1, z[i - left]);
}
while(i + z[i] < n && s[z[i]] == s[z[i] + i]) {
z[i]++;
}
if(i + z[i] - 1 > right) {
left = i;
right = i + z[i] - 1;
}
}
return z;
}
int findStartIndexOfSubstring(String s1, String s2) {
String mixed = s2 + "$" + s1;
char[] aux = mixed.toCharArray();
int[] getZ = zFunction(aux, mixed.length());
for(int i = 0; i < mixed.length(); i++) {
if(getZ[i] == s2.length()) {
return (i - s2.length() - 1);
}
}
return -1;
}
}