Practice
Data Structures and Algorithms
Machine Coding Round (LLD)
System Design & Architecture (HLD)
Frontend UI Machine Coding
Resources
Career Advice and Roadmaps
Data Structures and Algorithms
Machine Coding Round (LLD)
System Design & Architecture (HLD)
Backend Development
Frontend Development
Project Ideas for Software Developers
Core Computer Science
Companies
SDE Jobs & Internships
Interview Questions
Compare Companies
IDE
Online IDE
Collaborative IDE

Substring Search - III Editorial

DSA Editorial, Solution and Code

Practice Problem Link: Substring Search - III

Please make sure to try solving the problem yourself before looking at the editorial.

Problem Statement

Given two strings s1 and s2, find the index of the first occurrence of s2 in s1 as a substring.
If no such occurrence exists, return -1.

This problem is also known as finding a needle in a haystack.
Use the KMP algorithm to solve this problem.

Approach

In a naive string search algorithm, we need to check every substring of length s2 until a matching substring is found. But in KMP the approach is optimized by not checking the sub-patterns occurring multiple times. While comparing a substring, if a mismatch occurs we move to the next substring and since the previous and current substring have some common characters which has been already visited, the idea is to skip some of the characters from these visited characters that will match anyway.

To do this efficiently, the concept of the LPS is used. LPS of a string is the longest proper prefix which is also a suffix.

Example:

String: AABCDAA

Proper prefixes: A,AA,AAB,AABC,AABCD and AABCDA.

LPS of the above-given string is AA.

To solve this problem we will create an lps array of the string s2 where lps[i] denotes the length of the lps of substring s2[0…i].

Creating LPS Array (Examples)

String S: AACA

LPS array: [0, 1, 0, 1]

String S: BACDABACD

LPS array: [0, 0, 0, 0, 0, 1, 2, 3, 4]

After creating the LPS array all that needs to be done is searching the occurrence of the string s2 in s1 with the help of the LPS array in linear time.

  • Start matching both the given strings s1 and s2 with initial indexes as i = 0 and j = 0 respectively.
  • While s1[i] == s2[j] simply increment i and j.
  • If j is equal to the length of the string s2 then the first occurrence of s2 is found so return the starting index of this substring.
  • In case of a mismatch (s1[i] != s2[j]), we will set j = lps[j - 1], if (j > 0) as we know that all the previous characters of s2 will match anyway.

Analysis

  • Time Complexity: O(n)
  • Auxiliary Space Complexity: O(n)

Implementation

C++
int findStartIndexOfSubstring(string s1, string s2) {
    vector<int> lps(s2.size(), 0);
	int i = 1, j = 0;
	while(i < s2.size()) {
		if(s2[i] == s2[j]) {
			j++;
			lps[i++] = j;
		} else {
			if(j == 0) {
				lps[i++] = 0;
			} else {
				j = lps[j - 1];
			}
		}
	}
	i = 0;
	j = 0;
	while(i < s1.size()) {
		if(s1[i] == s2[j]) {
			i++;
			j++;
		}
		if(j == s2.size()) {
			return i - j;
		} else if (i < s1.size() && s1[i] != s2[j]) {
			if (j == 0) {
				i++;
			} else {
				j = lps[j - 1];
			}
		}
	}
	return - 1;
}
Java
class Solution {
    int findStartIndexOfSubstring(String s1, String s2) {
        int[] lps = new int[s2.length()];
		int i = 1, j = 0;
		while(i < s2.length()) {
			if(s2.charAt(i) == s2.charAt(j)) {
				j++;
				lps[i++] = j;
			} else {
				if(j == 0) {
					lps[i++] = 0;
				} else {
					j = lps[j - 1];
				}
			}
		}
		i = 0;
		j = 0;
		while(i < s1.length()) {
			if(s1.charAt(i) == s2.charAt(j)) {
				i++;
				j++;
			}
			if(j == s2.length()) {
				return i - j;
			} else if (i < s1.length() && s1.charAt(i) != s2.charAt(j)) {
				if (j == 0) {
					i++;
				} else {
					j = lps[j - 1];
				}
			}
		}
		return - 1;
    }
}
Related Content
Anagrams
Compare Version Numbers
Count And Say
Integer to Roman Numeral
Longest Common Prefix
Longest Palindrome in String
Insert Minimum To Make Palindrome
Reverse Words in String
Roman Numeral to Integer
Substring Search - I
Substring Search - II
Substring Search - IV
SDE Bootcamp - Become a software engineer at a product-based company
Practice Data Structures & Algorithms
Learning Resources
Interview Prep Resources
Community
Join our community
Blog
  • Career Advice and Roadmaps
  • Data Structures & Algorithms
  • Machine Coding Round (LLD)
  • System Design & Architecture
  • Backend Development
  • Frontend Development
  • Awesome Project Ideas
  • Core Computer Science
Practice Questions
  • Machine Coding (LLD) Questions
  • System Design (HLD) Questions
  • Topic-wise DSA Questions
  • Company-wise DSA Questions
  • DSA Sheets (Curated Lists)
  • JavaScript Interview Questions
  • Frontend UI Machine Coding Questions
Online Compilers (IDE)
  • Online Java Compiler
  • Online C++ Compiler
  • Online C Compiler
  • Online Python Compiler
  • Online JavaScript Compiler
Topic-wise Problems
  • Dynamic Programming Interview Questions
  • Linked List Interview Questions
  • Graph Interview Questions
  • Backtracking Interview Questions
  • Arrays Interview Questions
  • Trees Interview Questions
Company-wise Problems
  • Amazon Interview Questions
  • Microsoft Interview Questions
  • Google Interview Questions
  • Flipkart Interview Questions
  • Adobe Interview Questions
  • Facebook Interview Questions
DSA Sheets (Curated Lists)
  • Top Interview Questions
  • FAANG Interview Questions
  • Most Asked Interview Questions
  • 6 month DSA Practice Sheet
  • 3 month DSA Practice Sheet
  • Last minute DSA Practice Sheet