c# - Efficient way to do both filtering and replacing with a regex

Question

I'm working with an array of strings and would like to do the following:

//Regex regex; List<string> strList; List<string> strList2; 
foreach (string str in strList){
    if (regex.IsMatch(str)) {      //only need in new array if matches...
        strList2.Add(regex.Replace(str, myMatchEvaluator));
                                   //but still have to apply transformation
    }
}

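I know this works, but it effectively means running the same regex twice on each string in the array. Is there a way to collapse both steps (filtering and transforming) into a single regex parsing call?

(One approach that works most of the time is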

string str2 = regex.Replace(str, myMatchEvaluator);
if (str2 != str)
    strList2.Add(str2);

but that often throws away valid matches that simply did not need any replacement.)

Edit: An example regex, roughly like mine, to show why this is tricky: imagine looking for words at the start of lines in a log file and wanting to capitalize them.

The regex would be new Regex("^[a-z]+", RegexOptions.IgnorePatternWhitespace), and the replacement function would be match => match.Value.ToUpper().

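Now, some of the first words are already capitalized, and I don't want to throw those lines away. On the other hand, I don't want to capitalize every instance of the word in the line, only the first one.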

score 2 · Accepted Answer

You could create your own match evaluator:

private class DetectEvaluator {
    // True once Evaluate has been called, i.e. the regex matched at least once.
    public bool HasBeenEvaluated { get; private set; }
    private MatchEvaluator evaluator;
    public DetectEvaluator(MatchEvaluator evaluator) { 
        HasBeenEvaluated = false;
        this.evaluator = evaluator;
    }
    public string Evaluate(Match m) {
        HasBeenEvaluated = true;
        return evaluator(m);
    }
}

Then create a new one for each check:

var de1 = new DetectEvaluator(myMatchEvaluator);
string str2 = regex.Replace(str, de1.Evaluate);
if( de1.HasBeenEvaluated ) strList2.Add(str2);

But I don't see any readability improvement here.
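For reference, here is a minimal sketch of how the wrapper might be applied to a whole list with one Replace call per string. The class name DetectEvaluatorSketch and the helper FilterAndReplace are assumptions made for illustration; they do not appear in the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DetectEvaluatorSketch
{
    // Wrapper from the answer above: remembers whether the evaluator was invoked.
    private sealed class DetectEvaluator
    {
        private readonly MatchEvaluator evaluator;
        public bool HasBeenEvaluated { get; private set; }

        public DetectEvaluator(MatchEvaluator evaluator)
        {
            this.evaluator = evaluator;
        }

        public string Evaluate(Match m)
        {
            HasBeenEvaluated = true;
            return evaluator(m);
        }
    }

    // Hypothetical helper: returns only the strings the regex matched,
    // each with the evaluator applied.
    public static List<string> FilterAndReplace(
        IEnumerable<string> inputs, Regex regex, MatchEvaluator evaluator)
    {
        var results = new List<string>();
        foreach (string input in inputs)
        {
            var detector = new DetectEvaluator(evaluator);   // fresh flag per string
            string replaced = regex.Replace(input, detector.Evaluate);
            if (detector.HasBeenEvaluated)                   // at least one match was found
                results.Add(replaced);
        }
        return results;
    }
}
```

Assuming the regex from the question is built with RegexOptions.IgnoreCase, a line such as "WARN low memory" is kept even though the replacement leaves it unchanged, while a line that does not start with a letter is dropped.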

score 1 · Accepted Answer

You can use a lambda function as the match evaluator and have it update the list of words:

IEnumerable<string> Replaces(string source)
{
    var rx = new Regex(@"\w+m", RegexOptions.IgnoreCase); // match words ending with 'm'
    var result = new List<string>(); 
    rx.Replace(source, m => { result.Add(m.ToString().ToUpper()); return m.ToString(); });
    return result;
}

    List<string> GetReplacements(List<string> sources) {
        var rx = new Regex(@"\w+m", RegexOptions.IgnoreCase); // match words ending with 'm'.
        var replacements = new List<string>(sources.Count);   // no need to allocate more space than needed.

        foreach(string source in sources) 
            // for each string in sources that matches 'rx', add the ToUpper() version to the result and replace 'source' with itself.
            rx.Replace(source, m  => {replacements.Add(m.ToString().ToUpper()); return m.ToString(); });

        return replacements;
    }

    List<string> GetReplacements2(List<string> sources) {
        var rx = new Regex(@"\w+m", RegexOptions.IgnoreCase); // match words ending with 'm'.
        var replacements = new List<string>(sources.Count);   // no need to allocate more space than needed.

        foreach(string source in sources) {
            var m = rx.Match(source);                         // do one rx match
            if (m.Success)                                    // if successful
                replacements.Add(m.ToString().ToUpper());     // add to result.
        }

        return replacements;
    }

If you need to modify the original source and collect the unchanged matches instead, swap the two parts of the lambda expression.
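A rough sketch of that swapped variant, assuming the same pattern as above. The class name SwappedLambdaSketch and the helper ReplaceAndCollect are hypothetical, and ToUpperInvariant is used here in place of ToUpper so the result does not depend on the current culture.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SwappedLambdaSketch
{
    // Hypothetical helper: returns the transformed source and appends each
    // match, exactly as it appeared before the change, to 'originals'.
    public static string ReplaceAndCollect(string source, List<string> originals)
    {
        var rx = new Regex(@"\w+m", RegexOptions.IgnoreCase); // same pattern as above
        return rx.Replace(source, m =>
        {
            originals.Add(m.Value);             // collect the unchanged match
            return m.Value.ToUpperInvariant();  // upper-cased text goes into the result
        });
    }
}
```

The list passed in ends up non-empty only when the regex matched, so it can double as the filter condition.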

score 0 · Accepted Answer

Based on all the answers I received, the following works:

void AddToIfMatch(List<string> list, string str, Regex regex, 
                                        MatchEvaluator evaluator)
{
    // Set by the lambda below, which only runs when the regex matches.
    bool hasBeenEvaluated = false;
    string str2 = regex.Replace(
        str, 
        m => {hasBeenEvaluated = true; return evaluator(m);}
    );
    if( hasBeenEvaluated ) {list.Add(str2);}
}
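If only the first match in each string matters, as in the log-file example from the question, a single Match call can serve as both the filter and the input to the evaluator. This is a different technique from the closure flag above, not the author's method: the string is rebuilt by hand around the first match. The class name FirstMatchSketch and the helper AddFirstMatchReplaced are hypothetical, and the sketch assumes the regex does not use RegexOptions.RightToLeft.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FirstMatchSketch
{
    // Hypothetical helper: one Match call decides whether to keep the string
    // and also supplies the match that is handed to the evaluator.
    public static void AddFirstMatchReplaced(
        List<string> list, string str, Regex regex, MatchEvaluator evaluator)
    {
        Match m = regex.Match(str);
        if (!m.Success)
            return;                                // no match: the string is filtered out

        // Rebuild the string around the first match only; later matches stay untouched.
        string replaced = str.Substring(0, m.Index)
                        + evaluator(m)
                        + str.Substring(m.Index + m.Length);
        list.Add(replaced);
    }
}
```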

score 0 · Accepted Answer

Would something like this work?

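foreach (string str in strList)
{
    // The foreach variable cannot be assigned to, so the result of Replace is discarded;
    // only the side effect inside the delegate is used.
    regex.Replace(str, delegate(Match thisMatch) {
        // only gets here if matched the regex already
        string str2 = yourReplacementFunction(thisMatch);  
        strList2.Add(str2);

        return thisMatch.Value;

    }); 
}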
