regex - Count substrings using GNU libc regexec()

Question

Is it possible to count how many times a substring appears in a string, using a regular expression matched with GNU libc regexec()?

score 2 · Accepted Answer

No, regexec() only finds one match per call. If you want to find the next match, you have to call it again further along the string.
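
As a rough sketch of that "call it again further along the string" loop, something like the following could count non-overlapping matches. count_matches is a hypothetical helper name, and stepping one character past an empty match is an assumption made here to keep the loop from spinning forever; neither comes from the question.

```c
#include <regex.h>
#include <stddef.h>

/* Count non-overlapping matches of `pattern` in `text`.
   `cflags` goes to regcomp(): 0 for BRE, REG_EXTENDED for ERE.
   Returns the number of matches, or -1 if the pattern does not compile.
   count_matches is a hypothetical helper name, not part of libc. */
long count_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    regmatch_t pm;
    const char *p = text;   /* where the next search starts */
    long count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        /* rm_so and rm_eo are offsets relative to p, not to text. */
        if (pm.rm_eo > pm.rm_so) {
            p += pm.rm_eo;
        } else if (p[pm.rm_eo] != '\0') {
            /* Empty match (assumption): step one character to make progress. */
            p += pm.rm_eo + 1;
        } else {
            break;          /* empty match at the end of the string */
        }
        /* p is no longer the start of the line, so "^" must not match there. */
        eflags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

With this sketch, count_matches("ab", "abcabab", 0) should give 3. Pass REG_EXTENDED as cflags if the pattern uses ERE syntax.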

If you only want to search for plain substrings, you are much better off using the standard C string.h function strstr(). Then you don't have to worry about escaping special regex characters.
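
For the plain-substring case, a minimal sketch with strstr() might look like this. count_substr is a hypothetical helper name, and returning 0 for an empty needle is an assumption made here, since an empty string would otherwise "occur" at every position.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in `haystack`.
   count_substr is a hypothetical helper name, not part of libc. */
size_t count_substr(const char *haystack, const char *needle)
{
    size_t len = strlen(needle);
    size_t count = 0;
    const char *p;

    if (len == 0)
        return 0;   /* assumption: an empty needle counts as no occurrences */

    /* Each search resumes just past the previous hit. */
    for (p = strstr(haystack, needle); p != NULL; p = strstr(p + len, needle))
        count++;

    return count;
}
```

Since nothing is interpreted as a pattern, count_substr("a.b.c", ".") counts the two literal dots and gives 2. To count overlapping occurrences instead, resume the search at p + 1 rather than p + len.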

score 0 · Accepted Answer

Sorry for posting a separate answer; I don't have 50 reputation, so I can't comment on @Oscar Raig Colon's answer.

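pmatch does not hold every matching substring. pmatch is used to store the offsets of subexpressions, so the key is understanding what a subexpression is: it is "\(\)" in BRE and "()" in ERE. If the regular expression has no subexpressions at all, regexec() only returns the offsets of the first matching string and puts them in pmatch[0].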
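
To make the subexpression point concrete, here is a small sketch with an ERE that has two parenthesised subexpressions. match_key_value and the pattern are made up for illustration only.

```c
#include <regex.h>

/* Match the ERE "([a-z]+)=([0-9]+)" against `text`.
   On a match, pm[0] holds the offsets of the whole match, pm[1] those of
   the first subexpression and pm[2] those of the second.
   Returns 0 on a match, REG_NOMATCH if nothing matched, or -1 if the
   pattern failed to compile.
   match_key_value is a hypothetical helper name, not part of libc. */
int match_key_value(const char *text, regmatch_t pm[3])
{
    regex_t re;
    int rc;

    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return -1;

    /* One call fills pm[] for ONE match: whole match plus its groups. */
    rc = regexec(&re, text, 3, pm, 0);
    regfree(&re);
    return rc;
}
```

For the text "x foo=42 y", pm[0] covers offsets 2 to 8 ("foo=42"), pm[1] covers 2 to 5 ("foo") and pm[2] covers 6 to 8 ("42"). A second "key=value" pair later in the string would not show up in pm at all; finding it takes another regexec() call.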

You can find an example at http://pubs.opengroup.org/onlinepubs/007908799/xsh/regcomp.html.

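The following demonstrates how the REG_NOTBOL flag could be used with regexec() to find all substrings in a line that match a pattern supplied by a user. (For simplicity of the example, very little error checking is done.)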

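const char *p = buffer;   /* start of the part still to be searched */

(void) regcomp (&re, pattern, 0);
/* this call to regexec() finds the first match on the line */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0) {    /* while matches found */
    /* substring found between p + pm.rm_so and p + pm.rm_eo */
    /* rm_so and rm_eo are relative to the string passed to regexec(),
       so advance p rather than adding rm_eo to buffer each time */
    p += pm.rm_eo;
    /* This call to regexec() finds the next match */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}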

score 0 · Accepted Answer

regexec returns, in its fourth parameter "pmatch", a structure with all the matches. "pmatch" is a fixed-size structure; if there are more matches, you call the function again.

I found this code with two nested loops and modified it. The original code is at http://www.lemoda.net/c/unix-regex/index.html:

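#include <regex.h>
#include <stdio.h>

static int match_regex (regex_t * r, const char * to_match)
{
    /* "P" is a pointer into the string which points to the end of the
       previous match. */
    const char * p = to_match;
    /* "N_matches" is the maximum number of pmatch entries filled per
       call: the whole match plus its subexpressions. */
    const int n_matches = 10;
    /* "M" contains the matches found. */
    regmatch_t m[n_matches];
    int number_of_matches = 0;
    int eflags = 0;
    while (1) {
        int i = 0;
        int nomatch = regexec (r, p, n_matches, m, eflags);
        if (nomatch) {
            printf ("No more matches.\n");
            /* Leave the loop so the count is returned, not the error code. */
            break;
        }
        /* m[0] is the whole match and m[1] onwards are subexpressions, so
           a pattern with groups adds more than one per match here. */
        for (i = 0; i < n_matches; i++) {
            if (m[i].rm_so == -1) {
                break;
            }
            number_of_matches ++;
        }
        if (m[0].rm_eo > m[0].rm_so) {
            p += m[0].rm_eo;
        } else if (p[m[0].rm_eo] != '\0') {
            /* Empty match: step one character so the loop makes progress. */
            p += m[0].rm_eo + 1;
        } else {
            break;
        }
        /* "P" no longer points at the start of the line. */
        eflags = REG_NOTBOL;
    }
    return number_of_matches ;
}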
