この投稿にあるレーベンシュタイン距離 VBA 関数を変更して、1 次元配列を使用しました。はるかに高速に実行されます。
'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)
Public Function LevenshteinDistance2(ByRef s1 As String, ByRef s2 As String) As Long
Dim L1 As Long, L2 As Long, D() As Long, LD As Long 'Length of input strings and distance matrix
Dim i As Long, j As Long, ss2 As Long, ssL As Long, cost As Long 'loop counters, loop step, loop start, and cost of substitution for current letter
Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
Dim L1p1 As Long, L1p2 As Long 'Length of S1 + 1, Length of S1 + 2
L1 = Len(s1): L2 = Len(s2)
L1p1 = L1 + 1
L1p2 = L1 + 2
LD = (((L1 + 1) * (L2 + 1))) - 1
ReDim D(0 To LD)
ss2 = L1 + 1
For i = 0 To L1 Step 1: D(i) = i: Next i 'setup array positions 0,1,2,3,4,...
For j = 0 To LD Step ss2: D(j) = j / ss2: Next j 'setup array positions 0,1,2,3,4,...
For j = 1 To L2
ssL = (L1 + 1) * j
For i = (ssL + 1) To (ssL + L1)
If Mid$(s1, i Mod ssL, 1) <> Mid$(s2, j, 1) Then cost = 1 Else cost = 0
cI = D(i - 1) + 1
cD = D(i - L1p1) + 1
cS = D(i - L1p2) + cost
If cI <= cD Then 'Insertion or Substitution
If cI <= cS Then D(i) = cI Else D(i) = cS
Else 'Deletion or Substitution
If cD <= cS Then D(i) = cD Else D(i) = cS
End If
Next i
Next j
LevenshteinDistance2 = D(LD)
End Function
この関数を、長さ 11,304 の文字列 's1' と長さ 5,665 の 's2' でテストしました (> 6,400 万文字の比較)。上記の関数の 1 次元バージョンでは、私のマシンでの実行時間は ~24 秒です。上記のリンクで参照した元の 2 次元関数では、同じ文字列に対して約 37 秒かかります。以下に示すように、1 次元関数をさらに最適化しました。同じ文字列に約 10 秒かかります。
'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)
Public Function LevenshteinDistance(ByRef s1 As String, ByRef s2 As String) As Long
Dim L1 As Long, L2 As Long, D() As Long, LD As Long 'Length of input strings and distance matrix
Dim i As Long, j As Long, ss2 As Long 'loop counters, loop step
Dim ssL As Long, cost As Long 'loop start, and cost of substitution for current letter
Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
Dim L1p1 As Long, L1p2 As Long 'Length of S1 + 1, Length of S1 + 2
Dim sss1() As String, sss2() As String 'Character arrays for string S1 & S2
L1 = Len(s1): L2 = Len(s2)
L1p1 = L1 + 1
L1p2 = L1 + 2
LD = (((L1 + 1) * (L2 + 1))) - 1
ReDim D(0 To LD)
ss2 = L1 + 1
For i = 0 To L1 Step 1: D(i) = i: Next i 'setup array positions 0,1,2,3,4,...
For j = 0 To LD Step ss2: D(j) = j / ss2: Next j 'setup array positions 0,1,2,3,4,...
ReDim sss1(1 To L1) 'Size character array S1
ReDim sss2(1 To L2) 'Size character array S2
For i = 1 To L1 Step 1: sss1(i) = Mid$(s1, i, 1): Next i 'Fill S1 character array
For i = 1 To L2 Step 1: sss2(i) = Mid$(s2, i, 1): Next i 'Fill S2 character array
For j = 1 To L2
ssL = (L1 + 1) * j
For i = (ssL + 1) To (ssL + L1)
If sss1(i Mod ssL) <> sss2(j) Then cost = 1 Else cost = 0
cI = D(i - 1) + 1
cD = D(i - L1p1) + 1
cS = D(i - L1p2) + cost
If cI <= cD Then 'Insertion or Substitution
If cI <= cS Then D(i) = cI Else D(i) = cS
Else 'Deletion or Substitution
If cD <= cS Then D(i) = cD Else D(i) = cS
End If
Next i
Next j
LevenshteinDistance = D(LD)
End Function