c# - Generate a list of all longest common substrings and a list of variations

Question

High Level

I'm trying to collapse the common substrings in a list of sentences so that only the regions where they differ are presented. So take this:

Please don't kick any of the cats
Please do kick any of the cats
Please don't kick any of the dogs
Please do kick any of the dogs
Please don't kick any of the garden snakes
Please do pet any of the garden snakes

and return this:

Please [don't|do] [kick|pet] any of the [cats|dogs|garden snakes]

Details

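I've looked at Longest Common Substring algorithms, but they only seem to compare two strings.
I'm only interested in comparing whole words within the strings.
The strings only need to be evaluated from left to right.
The uncommon substrings won't always have the same number of words ("cats" vs. "garden snakes").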

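I'm looking for help with the algorithm. I believe this is a variant of the LCS problem, probably involving some kind of suffix tree processing. An explanation plus pseudocode for a possible implementation would be ideal.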
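One way to stretch the two-string LCS to N sentences is to run it on word arrays instead of characters and fold it across the list: LCS(s1, s2), then LCS(that, s3), and so on. The words that survive are common to every sentence and act as anchors; whatever sits between two anchors in each sentence is a variation. Below is a minimal C# sketch of that idea, assuming sentences are split on single spaces. `LcsCollapser` and `Collapse` are hypothetical names, and the pairwise fold is a heuristic: it always yields a sequence common to every sentence, but not necessarily the longest one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LcsCollapser
{
    // Assumption: whole words separated by spaces (the question compares whole words only).
    static string[] Words(string s) => s.Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Standard dynamic-programming LCS over two word sequences.
    static List<string> Lcs(IReadOnlyList<string> a, IReadOnlyList<string> b)
    {
        // dp[i, j] = length of the LCS of a[i..] and b[j..]
        var dp = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; i--)
            for (int j = b.Count - 1; j >= 0; j--)
                dp[i, j] = a[i] == b[j] ? dp[i + 1, j + 1] + 1
                                        : Math.Max(dp[i + 1, j], dp[i, j + 1]);

        // Walk forward through the table to recover one LCS.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < a.Count && y < b.Count)
        {
            if (a[x] == b[y]) { result.Add(a[x]); x++; y++; }
            else if (dp[x + 1, y] >= dp[x, y + 1]) x++;
            else y++;
        }
        return result;
    }

    public static string Collapse(IReadOnlyList<string> sentences)
    {
        var words = sentences.Select(Words).ToList();

        // Fold the two-string LCS across all sentences to get the common anchor words.
        IReadOnlyList<string> common = words[0];
        foreach (var w in words.Skip(1)) common = Lcs(common, w);

        // gaps[k] holds, per sentence, the words just before anchor k;
        // gaps[common.Count] holds each sentence's tail after the last anchor.
        var gaps = Enumerable.Range(0, common.Count + 1).Select(_ => new List<string>()).ToArray();
        foreach (var w in words)
        {
            int pos = 0;
            for (int k = 0; k < common.Count; k++)
            {
                int hit = Array.IndexOf(w, common[k], pos); // leftmost match after the previous anchor
                gaps[k].Add(string.Join(" ", w[pos..hit]));
                pos = hit + 1;
            }
            gaps[common.Count].Add(string.Join(" ", w[pos..]));
        }

        // Emit anchors as-is and each differing gap as [a|b|...].
        var parts = new List<string>();
        for (int k = 0; k <= common.Count; k++)
        {
            var seen = new HashSet<string>();
            var variants = gaps[k].Where(seen.Add).ToList(); // distinct, first-seen order
            if (variants.Count > 1) parts.Add("[" + string.Join("|", variants) + "]");
            else if (variants[0] != "") parts.Add(variants[0]);
            if (k < common.Count) parts.Add(common[k]);
        }
        return string.Join(" ", parts);
    }
}
```

On the first example this returns `Please [don't kick|do kick|do pet] any of the [cats|dogs|garden snakes]`. Because "kick" is missing from the last sentence it never becomes an anchor, so `[don't|do]` and `[kick|pet]` come out as one group instead of the two columns in the desired output; separating them would need a true multiple-sequence alignment rather than a single common subsequence.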

Another Example

Please join thirteen of your friends at the Midnight Bash this Friday
Don't forget to join your friend John at the Midnight Bash tomorrow
Don't forget to join your friends John and Julie at the Midnight Bash tonight

becomes:

[Please|Don't forget to]
join
[thirteen of your friends|your friend John|your friends John and Julie]
at the Midnight Bash
[this Friday|tomorrow|tonight]

Maybe This Approach

How about this approach...

for an array of sentences
  loop while any sentence has words remaining
    find the "first common substring (FCS)"
    split the sentences on the FCS
    every unique phrase before the FCS is part of the set of uncommon phrases
    trim each sentence past its uncommon phrase and the FCS
  end loop
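A minimal C# sketch of that loop, assuming whole words split on spaces and taking the FCS to be the first remaining word of the first sentence that every other remaining sentence also contains. `GreedyCollapser` is a hypothetical name. Because it is greedy, a common word that shows up early (a stray "the", say) can become an anchor too soon; there is no backtracking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedyCollapser
{
    public static string Collapse(IEnumerable<string> sentences)
    {
        // Remaining (not yet consumed) words of each sentence.
        var rest = sentences
            .Select(s => s.Split(' ', StringSplitOptions.RemoveEmptyEntries).ToList())
            .ToList();
        var output = new List<string>();

        while (rest.Any(r => r.Count > 0))
        {
            // FCS, one word at a time: the first remaining word of sentence 0
            // that every remaining sentence still contains (null if there is none).
            var fcs = rest[0].FirstOrDefault(w => rest.All(r => r.Contains(w)));

            // Number of words before the FCS in each sentence (all of them if there is no FCS).
            var before = rest.Select(r => fcs == null ? r.Count : r.IndexOf(fcs)).ToList();

            // Unique uncommon phrases, in first-seen order.
            var seen = new HashSet<string>();
            var variants = rest.Select((r, i) => string.Join(" ", r.Take(before[i])))
                               .Where(seen.Add)
                               .ToList();
            if (variants.Count > 1) output.Add("[" + string.Join("|", variants) + "]");
            else if (variants[0] != "") output.Add(variants[0]);

            if (fcs == null) break;
            output.Add(fcs);

            // Trim each sentence past its uncommon phrase and the FCS.
            for (int i = 0; i < rest.Count; i++) rest[i].RemoveRange(0, before[i] + 1);
        }
        return string.Join(" ", output);
    }
}
```

On both examples in the question this gives the same result as a word-level LCS would; for the second one that is `[Please|Don't forget to] join [thirteen of|] your [friends|friend John|friends John and Julie] at the Midnight Bash [this Friday|tomorrow|tonight]`, since "your" is shared by all three sentences and is picked up as an anchor.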

score -1 · Accepted Answer

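Interestingly, I had been thinking about building something like yours for a long time, until I realized that it is actually a kind of AI. There are too many factors to consider: grammar, syntax, context, errors, and so on. If it is "…|C2|..]", you can probably do it with a simple Regex pattern: "^Please\s*(?(don't|do))\s*(?\w+)+\s*any of\s*(?.)*$".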
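The named groups in that pattern were lost, so here is a minimal C# sketch of the same fixed-template idea using `System.Text.RegularExpressions`. The group names `neg`, `verb`, and `target`, the exact pattern, and the `TemplateCollapser` class are all hypothetical; the sketch only works when every sentence fits the one hard-coded template.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TemplateCollapser
{
    // Hypothetical pattern and group names, written for the first example only.
    static readonly Regex Template = new Regex(
        @"^Please\s+(?<neg>don't|do)\s+(?<verb>\w+)\s+any of the\s+(?<target>.+)$");

    public static string Collapse(IEnumerable<string> sentences)
    {
        var groups = new[] { "neg", "verb", "target" };
        var values = groups.ToDictionary(g => g, g => new List<string>());

        foreach (var s in sentences)
        {
            var m = Template.Match(s);
            if (!m.Success) throw new ArgumentException("Sentence does not fit the template: " + s);
            foreach (var g in groups)
            {
                var v = m.Groups[g].Value;
                if (!values[g].Contains(v)) values[g].Add(v); // distinct, first-seen order
            }
        }

        string Alt(string g) => "[" + string.Join("|", values[g]) + "]";
        return "Please " + Alt("neg") + " " + Alt("verb") + " any of the " + Alt("target");
    }
}
```

For the six sentences of the first example this produces `Please [don't|do] [kick|pet] any of the [cats|dogs|garden snakes]`, but the second example would need an entirely different pattern, which is the limitation of a template-based approach.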
