c# - Find all kinds of numbers in a string

Question

A string can contain, for example, ints, floats, and hexadecimal numbers:

"This is a string that can have -345 and 57 and could also have 35.4656 or a subtle 0xF46434 and more."

What can I use in C# to find these numbers?

score 3 · Accepted Answer

Use something along these lines (I wrote it myself, so I won't claim it covers every kind of number you might be looking for, but it works for your example):

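using System;
using System.Text.RegularExpressions;

var str = "123 This a string than can have -345 and 57 and could also have 35.4656 or a subtle 0XF46434 and more like -0xf46434";
// Lookbehind boundary, then a decimal or a 0x/0X hex number (either optionally negative), then lookahead boundary.
var a = Regex.Matches(str, @"(?<=(^|[^a-zA-Z0-9_^]))(-?\d+(\.\d+)?|-?0[xX][0-9A-Fa-f]+)(?=([^a-zA-Z0-9_]|$))");
foreach (Match match in a)
{
    Console.WriteLine(match.Value); // do something with each number
}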

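Regex looks like a write-only language (that is, very hard to read), so let's break it down.

(?<=(^|[^a-zA-Z0-9_^])) is a lookbehind acting as the start boundary. \b can't be used here because it treats - as a boundary character, so on -345 it would match only 345 instead of -345.

-?\d+(\.\d+)? matches a decimal number, optionally negative, with optional fractional digits.

-?0[xX][0-9A-Fa-f]+ matches a hexadecimal number, case-insensitively (both the x and the hex digits), optionally negative.

Finally, (?=([^a-zA-Z0-9_]|$)) is a lookahead acting as the end word boundary. Note that the first boundary allows the start of the string, and this one allows the end of the string.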
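A small check of the \b point, as a sketch: the input string is made up for illustration, and the lookbehind is the one from the answer. \b finds a boundary between - and 3 (non-word character followed by a word character), so it drops the sign, while the lookbehind only looks at the character before the -.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "have -345 and 57";

// \b matches between '-' and '3', so the minus sign is not part of the match.
Match withWordBoundary = Regex.Match(input, @"\b-?\d+");

// The lookbehind only requires the start of the string or a preceding character
// that is not a letter, digit, '_' or '^', so the match can begin at the '-'.
Match withLookbehind = Regex.Match(input, @"(?<=(^|[^a-zA-Z0-9_^]))-?\d+");

Console.WriteLine(withWordBoundary.Value); // 345
Console.WriteLine(withLookbehind.Value);   // -345
```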

score 2 · Accepted Answer

Try parsing each word to a double and return an array of double.

Here's how to get an array of double from a string:

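using System.Collections.Generic;
using System.Globalization;

double[] GetNumbers(string str)
{
    List<double> l = new List<double>();
    // Split on spaces and keep every token that parses as a double.
    // double.TryParse does not understand hex such as 0xF46434, and a token
    // with trailing punctuation (e.g. "57,") will not parse either.
    foreach (string s in str.Split(' '))
    {
        // InvariantCulture so "35.4656" parses the same way regardless of the machine's locale.
        if (double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out double num))
        {
            l.Add(num);
        }
    }
    return l.ToArray();
}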

See the double.TryParse() documentation for more information.
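double.TryParse does not accept hexadecimal text, so 0xF46434 from the question is skipped by this approach. Below is a sketch of one way to extend the split-and-parse idea, assuming tokens are separated by spaces and hex values are 0x/0X-prefixed and fit in a long. TryParseToken is a hypothetical helper name, not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

Console.WriteLine(string.Join(", ", GetNumbers("have -345 and 57 and 35.4656 or 0xF46434")));

// Variant of GetNumbers that also accepts 0x-prefixed hex tokens (sketch).
double[] GetNumbers(string str)
{
    var result = new List<double>();
    foreach (string s in str.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        if (TryParseToken(s, out double num))
            result.Add(num);
    }
    return result.ToArray();
}

// Hypothetical helper: decimal first, then an optional '-' followed by 0x/0X and hex digits.
bool TryParseToken(string token, out double value)
{
    if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
        return true;

    bool negative = token.Length > 0 && token[0] == '-';
    string body = negative ? token.Substring(1) : token;

    // long.TryParse with AllowHexSpecifier rejects the "0x" prefix, so strip it first.
    if (body.Length > 2 && body.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
        && long.TryParse(body.Substring(2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out long hex))
    {
        value = negative ? -hex : hex;
        return true;
    }

    value = 0;
    return false;
}
```

Trying the decimal parse first keeps the common case cheap; the hex branch only runs for tokens that double.TryParse rejects.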

score 1 · Accepted Answer

Besides regex, which tends to have its own problems, you can build a state machine to do the processing. You can decide on which inputs the machine would accept as 'numbers'. Unlike regex, a state machine will have predictably decent performance, and will also give you predictable results (whereas regex can sometimes match rather surprising things).

It's not really that difficult, when you think about it. There are rather few states, and you can define special cases explicitly.

EDIT: in response to the comment:
In .NET, Regex is implemented as a backtracking NFA (Nondeterministic Finite Automaton). On one hand, that makes it a very powerful parser; on the other, it can sometimes backtrack far more than it should. This is especially true when you're accepting unsafe input (input from the user, which can be just about anything). I'm not sure what Regex pattern you'll use to parse the result, but for almost any non-trivial pattern a suitably crafted input can cause a performance hit. In most cases performance is a non-issue, but with patterns that allow heavy backtracking, matching time can grow exponentially with the input length. That means that in some cases it really can be a bottleneck, and a rather unexpected one.

Another potential problem stemming from the greedy nature of Regex is that it can sometimes match unexpected things. You might use the same Regex pattern for days and it might work fine, until the right combination of overlooked characters comes along and you end up writing garbage into your database.

By state machine, I mean parsing the input using a deterministic finite automaton (DFA), or something like that. I'll show you what I mean. Here's a small DFA for parsing a positive decimal integer or float within a string. I'm pretty sure you can build a DFA using frameworks like ANTLR, though there are also less powerful ones around.
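A minimal C# sketch of such a DFA-style scanner, assuming only ASCII digits and '.' as the decimal separator. ScanNumbers and its state names are hypothetical. Like the DFA described above, it accepts only positive numbers, so the - in -345 is not part of the match, and it does not check word boundaries, so digits inside a word such as x86 are also picked up.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(string.Join(", ", ScanNumbers("have -345 and 57 and 35.4656 or 12. end")));
// prints: 345, 57, 35.4656, 12

// DFA-style scanner for positive decimal integers and floats (sketch).
// States: outside a number, in the integer part, just after '.', in the fraction.
List<string> ScanNumbers(string input)
{
    const int Outside = 0, Integer = 1, Dot = 2, Fraction = 3;
    var results = new List<string>();
    int state = Outside;
    int start = 0; // index where the current number began

    // Run one step past the end with a '\0' sentinel so a number at the very end is emitted.
    for (int i = 0; i <= input.Length; i++)
    {
        char c = i < input.Length ? input[i] : '\0';
        bool isDigit = c >= '0' && c <= '9';

        switch (state)
        {
            case Outside:
                if (isDigit) { start = i; state = Integer; }
                break;
            case Integer:
                if (c == '.') state = Dot;
                else if (!isDigit) { results.Add(input.Substring(start, i - start)); state = Outside; }
                break;
            case Dot:
                if (isDigit) state = Fraction;
                else
                {
                    // "12." without fraction digits: emit only "12" and drop the dot.
                    results.Add(input.Substring(start, i - 1 - start));
                    state = Outside;
                }
                break;
            case Fraction:
                if (!isDigit) { results.Add(input.Substring(start, i - start)); state = Outside; }
                break;
        }
    }
    return results;
}
```

Each state is one switch case, so every transition is explicit and each character is examined exactly once, which gives the predictable, linear running time the answer describes.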

score 1 · Accepted Answer

Given the input above, this expression matches every number in it:

string line = "This a string than can have " + 
                      "-345 and 57 and could also have 35.4656 " +
                      "or a subtle 0xF46434 and more";

Regex r = new Regex(@"(-?0[Xx][A-Fa-f0-9]+|-?\d+\.\d+|-?\d+)");
var m = r.Matches(line);
foreach(Match h in m)
    Console.WriteLine(h.ToString());

Edit: for replacement, use the Replace overload that takes a MatchEvaluator:

string result = r.Replace(line, new MatchEvaluator(replacementMethod));
public string replacementMethod(Match match)
{
   return "?????";
}
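A sketch of a more practical evaluator, assuming the goal is to rewrite hex literals as decimal and leave the other numbers alone. It passes a lambda instead of a named method (a lambda converts implicitly to MatchEvaluator); the input string is illustrative.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Same pattern as above: hex first, then decimals with a fraction, then plain integers.
var r = new Regex(@"(-?0[Xx][A-Fa-f0-9]+|-?\d+\.\d+|-?\d+)");
string line = "have -345 and 57 and 35.4656 or 0xF46434";

string result = r.Replace(line, m =>
{
    string v = m.Value;
    bool negative = v[0] == '-';
    string digits = negative ? v.Substring(1) : v;

    // Only the hex alternative has 'x' or 'X' as its second character.
    if (digits.Length > 2 && (digits[1] == 'x' || digits[1] == 'X'))
    {
        // Assumes short hex literals: more than 16 hex digits throw OverflowException.
        long n = long.Parse(digits.Substring(2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
        return (negative ? -n : n).ToString(CultureInfo.InvariantCulture);
    }
    return v; // decimal numbers are left unchanged
});

Console.WriteLine(result); // have -345 and 57 and 35.4656 or 16016436
```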

Explanation of the regex pattern

First, the sequence "(pattern1|pattern2|pattern3)" means that any of the three patterns can be matched in the string; matching one of them is enough.

The first pattern, -?0[Xx][A-Fa-f0-9]+, means an optional minus, followed by a zero, followed by an X or x, followed by one or more characters in the ranges A-F, a-f, or 0-9.

The second pattern, -?\d+\.\d+, means an optional minus, followed by one or more digits, followed by a decimal point, followed by one or more digits.

The third pattern, -?\d+, means an optional minus followed by one or more digits.

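The order of the patterns is crucial. At each position the alternatives are tried from left to right and the first one that succeeds wins, so if you reverse the order and put the integer pattern first, the results will be wrong: 35.4656 is matched as 35 and 4656, and 0xF46434 as 0 and 46434.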
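A quick way to see the effect of the order, using a shortened version of the question's string. Find is a hypothetical helper that returns the matched substrings; the two patterns differ only in the order of their alternatives.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string line = "have -345 and 57 and 35.4656 or 0xF46434";

// Returns every match of the pattern in line, in order.
string[] Find(string pattern) =>
    Regex.Matches(line, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Original order: hex, then decimal with a fraction, then integer.
string[] correct = Find(@"(-?0[Xx][A-Fa-f0-9]+|-?\d+\.\d+|-?\d+)");
// Reversed order: the integer alternative succeeds first and splits the other numbers.
string[] reversed = Find(@"(-?\d+|-?\d+\.\d+|-?0[Xx][A-Fa-f0-9]+)");

Console.WriteLine(string.Join(" | ", correct));  // -345 | 57 | 35.4656 | 0xF46434
Console.WriteLine(string.Join(" | ", reversed)); // -345 | 57 | 35 | 4656 | 0 | 46434
```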
