Problem Statement
Pattern: Related: 1143. Longest Common Subsequence
Solution
Integer[][] cache;
int dp(char[] edit, char[] target, int n1, int n2) {
// base condition
if (n1 < 0 || n2 < 0) return (n1 >= n2) ? n1 + 1 : n2 + 1;
if(cache[n1][n2] != null) return cache[n1][n2];
// same character
if (edit[n1] == target[n2]) return cache[n1][n2] = dp(edit, target, n1 - 1, n2 - 1);
int replace = dp(edit, target, n1 - 1, n2 - 1) + 1;
int insert = dp(edit, target, n1, n2 - 1) + 1;
int delete = dp(edit, target, n1 - 1, n2) + 1;
return cache[n1][n2] = Math.min(replace, Math.min(insert, delete));
}
public int minDistance(String word1, String word2) {
char[] edit = word1.toCharArray();
char[] target = word2.toCharArray();
cache = new Integer[edit.length + 1][target.length + 1];
return dp(edit, target, edit.length - 1, target.length - 1);
}
TC : SC :
Notes
We have 2 strings, edit
, we pick anyone that we will transform, and target
, the other one we transfer into. If we,
- Shift pointers of both fwd
- add 1 to the solution → character has been replace
Relevant
Different types of edit distance allow different sets of string operations. For instance:
- The Levenshtein distance allows deletion, insertion and substitution.
- The longest common subsequence (LCS) distance allows only insertion and deletion, not substitution.
- The Hamming distance allows only substitution, hence, it only applies to strings of the same length.
- The Damerau–Levenshtein distance allows insertion, deletion, substitution, and the transposition of two adjacent characters.
- The Jaro distance allows only transposition.
The Levenshtein distance between “kitten” and “sitting” is 3. A minimal edit script that transforms the former into the latter is:
- kitten → sitten (substitute “s” for “k”)
- sitten → sittin (substitute “i” for “e”)
- sittin → sitting (insert “g” at the end)
LCS distance (insertions and deletions only) gives a different distance and minimal edit script:
- kitten → itten (delete “k” at 0)
- itten → sitten (insert “s” at 0)
- sitten → sittn (delete “e” at 4)
- sittn → sittin (insert “i” at 4)
- sittin → sitting (insert “g” at 6)
for a total cost/distance of 5 operations.