c# - How to replace multiple words in a string using C#?

Question

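I'm trying to work out how to replace (remove) multiple words (500 or more) from a string. I know I can use the Replace function to replace a single word, but what should I do when I want to replace 500+ words? I want to remove all the common keywords (such as "and", "I", "you") from an article.

Here is the code for a single replacement. I'm looking for a way to do this for 500+ words.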

        string a = "why and you it";
        string b = a.Replace("why", "");
        MessageBox.Show(b);

Thanks

@Sergey Kucher The text size varies from a few hundred to a few thousand words. I am replacing these words in random articles.

score 0 · Accepted Answer

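A regular expression can do this better. You only need all the words to replace in a list, and then something like this:

var escapedStrings = yourReplaces.Select(PadAndEscape);
// Replace each padded match with a single space so the neighbouring words stay separated.
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings), " ");

This requires a function that pads the string with spaces before escaping it.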

public string PadAndEscape(string s)
{
    return Regex.Escape(" " + s + " ");
}
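
Padding with spaces only matches a word that has a literal space on both sides, so a word at the very start or end of the input, or next to punctuation, is left in place, and two listed words in a row cannot both match because they share one space. A possible variation, not part of the original answer, is to use the regex word-boundary anchor `\b` instead. The class and method names below (`StopWordRemover`, `RemoveWords`) are hypothetical, and the whitespace clean-up at the end is an assumption about the desired output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRemover
{
    // Hypothetical helper: removes every whole-word occurrence of the given words.
    public static string RemoveWords(string input, IEnumerable<string> words)
    {
        // Escape each word so regex metacharacters are matched literally.
        string alternation = string.Join("|", words.Select(Regex.Escape));

        // \b anchors the match at word boundaries, so "and" does not match inside "sand".
        string pattern = @"\b(?:" + alternation + @")\b";
        string stripped = Regex.Replace(input, pattern, "");

        // Assumption: collapse the whitespace runs left behind by the removed words.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Matching here is case-sensitive; passing `RegexOptions.IgnoreCase` to the first `Regex.Replace` call would also catch "And" or "YOU" if that is wanted.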

score 0 · Accepted Answer

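It depends on the situation, of course, but this applies if the text is long, there are many words, and you want to optimize performance.

You should build a trie from the words and search the trie for matches.

This does not lower the order of complexity, which is still O(nm), but for a large group of words it lets you check multiple words against each character instead of one word at a time.
I think a few hundred words should be enough for this to be faster.

In my opinion this is the fastest way, and I wrote a function to get you started: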

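// Requires: using System.Collections.Generic; using System.Linq;
public struct FindRecord
{
    public int WordIndex;        // index of the matched word in the words array
    public int PositionInString; // offset in input where the match starts
}

public static FindRecord[] FindAll(string input, string[] words)
{
    LinkedList<FindRecord> result = new LinkedList<FindRecord>();
    // matchs[j] = number of characters of words[j] matched so far
    int[] matchs = new int[words.Length];

    for (int i = 0; i < input.Length; i++)
    {
        for (int j = 0; j < words.Length; j++)
        {
            // On a mismatch, restart this word. The current character may still
            // begin a new match, so it is re-tested against words[j][0] below.
            // Note: this simple restart can still miss matches for words with a
            // repeated prefix (for example "aab" in "aaab").
            if (input[i] != words[j][matchs[j]])
                matchs[j] = 0;

            if (input[i] == words[j][matchs[j]])
            {
                matchs[j]++;
                if (matchs[j] == words[j].Length)
                {
                    FindRecord findRecord = new FindRecord { WordIndex = j, PositionInString = i - matchs[j] + 1 };
                    result.AddLast(findRecord);
                    matchs[j] = 0;
                }
            }
        }
    }
    return result.ToArray();
}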
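
FindAll above still tests every word against each character and does not build the trie the answer recommends. As a rough illustration of that idea, here is a minimal trie sketch. It is not the answer author's code: the `WordTrie` class, its nested `Node` type, and the tuple return shape are all hypothetical, and it assumes a compiler with C# 7 tuple support. From each start position it walks the trie for as long as the characters match, so words that share a prefix are checked together.

```csharp
using System;
using System.Collections.Generic;

public sealed class WordTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public int WordIndex = -1; // -1 means no word ends at this node
    }

    private readonly Node root = new Node();

    // Builds the trie once; words[w] ends at a node tagged with index w.
    public WordTrie(string[] words)
    {
        for (int w = 0; w < words.Length; w++)
        {
            Node node = root;
            foreach (char c in words[w])
            {
                if (!node.Children.TryGetValue(c, out Node next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.WordIndex = w;
        }
    }

    // Returns (word index, start position) for every occurrence, overlaps included.
    public List<(int WordIndex, int Position)> FindAll(string input)
    {
        var matches = new List<(int WordIndex, int Position)>();
        for (int start = 0; start < input.Length; start++)
        {
            Node node = root;
            for (int i = start; i < input.Length; i++)
            {
                // Stop as soon as no listed word continues with this character.
                if (!node.Children.TryGetValue(input[i], out Node next))
                    break;
                node = next;
                if (node.WordIndex >= 0)
                    matches.Add((node.WordIndex, start));
            }
        }
        return matches;
    }
}
```

For example, `new WordTrie(new[] { "and", "an", "you" }).FindAll("an and you")` reports "an" at 0, "an" and "and" at 3, and "you" at 7. This sketch matches substrings, not whole words, so a caller removing stop words would still need to check the characters around each match.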

Another option:
This may be one of the rare cases where a regular expression is faster than hand-built code.

Try using

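public static string ReplaceAll(string input, string[] words)
{
    // Escape each word so characters such as "." or "+" are matched literally.
    // Requires: using System.Linq; using System.Text.RegularExpressions;
    string wordlist = string.Join("|", words.Select(Regex.Escape));
    Regex rx = new Regex(wordlist, RegexOptions.Compiled);
    return rx.Replace(input, "");
}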
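
ReplaceAll constructs a new `Regex` with `RegexOptions.Compiled` on every call. Compilation makes matching faster but construction slower, so when the same word list is applied to many articles it may pay to build the regex once and reuse it. A minimal sketch under that assumption follows; the `WordStripper` class and its `Strip` method are hypothetical names, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class WordStripper
{
    private readonly Regex rx;

    // Build and compile the alternation once for a fixed word list.
    public WordStripper(string[] words)
    {
        string wordlist = string.Join("|", words.Select(Regex.Escape));
        rx = new Regex(wordlist, RegexOptions.Compiled);
    }

    // Reuse the compiled regex for every input; each match is replaced with "".
    public string Strip(string input)
    {
        return rx.Replace(input, "");
    }
}
```

Like ReplaceAll, this removes substrings, not whole words, and leaves the surrounding spaces in place: stripping "why" and "and" from "why and you it" yields "  you it" with two leading spaces.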
