regex - Matching two strings where part of the text is an optional match?

Question

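I'm trying to write a simple Java function that takes a list of language inputs and checks whether the value returned by a database query matches it. All of the strings in my database have been normalized to make searching easier. Here is an example.

Lab A wants participants with any one of the following language inputs (separated by the pipe character, |):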

{English | English, Spanish | Spanish}

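That is, this lab can accept participants who hear only English, only Spanish, or who are bilingual in English and Spanish. This part is straightforward: if the database result is "English", "English, Spanish", or "Spanish", my function finds a match.

However, my database also marks when a participant has only minimal input in a given language (using the ~ character).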

"English, ~Spanish" = participant hears English and a little Spanish
"English, ~Spanish, Russian" = participant hears English, Russian, and a little Spanish

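This is where I'm running into trouble. I want something like "English, ~Spanish" to match both "English" and "English, Spanish".

I was thinking of simply removing/hiding the languages marked with ~, but then if a lab wants only {English, Spanish}, "English, ~Spanish" would not match even though it should.

I also can't think of a way to do this with regular expressions. Any help would be greatly appreciated!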

score 1 · Accepted Answer

Try this:

\b(English[, ~]+Spanish|Spanish|English)\b

Code

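import java.util.regex.PatternSyntaxException;

try {
    // String.matches() succeeds only if the whole string matches the pattern
    if (subjectString.matches("(?im)\\b(English[, ~]+Spanish|Spanish|English)\\b")) {
        // String matched entirely
    } else {
        // Match attempt failed
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}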

Explanation

"\\b" +               // Assert position at a word boundary
"(" +                // Match the regular expression below and capture its match into backreference number 1
                        // Match either the regular expression below (attempting the next alternative only if this one fails)
      "English" +          // Match the characters “English” literally
      "[, ~]" +            // Match a single character present in the list “, ~”
         "+" +                // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      "Spanish" +          // Match the characters “Spanish” literally
   "|" +                // Or match regular expression number 2 below (attempting the next alternative only if this one fails)
      "Spanish" +          // Match the characters “Spanish” literally
   "|" +                // Or match regular expression number 3 below (the entire group fails if this one fails to match)
      "English" +          // Match the characters “English” literally
")" +
"\\b"                 // Assert position at a word boundary
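
To see how the pattern behaves on the strings from the question, here is a minimal, self-contained sketch. The class name `LanguageMatchDemo`, the helper `acceptedByLabA`, and the sample list are made up for illustration; only the pattern itself comes from this answer. Keep in mind that a whole-string match is required, so a three-language value such as "English, ~Spanish, Russian" is not accepted by this two-language pattern.

```java
import java.util.List;
import java.util.regex.Pattern;

class LanguageMatchDemo {
    // Pattern from the answer; (?i) makes it case-insensitive.
    static final Pattern LAB_A =
        Pattern.compile("(?i)\\b(English[, ~]+Spanish|Spanish|English)\\b");

    // Hypothetical helper: true when the whole database value matches lab A's pattern.
    static boolean acceptedByLabA(String dbValue) {
        return LAB_A.matcher(dbValue).matches();
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
            "English", "Spanish", "English, Spanish",
            "English, ~Spanish", "English, ~Spanish, Russian", "Russian");
        for (String s : samples) {
            System.out.println(s + " -> " + acceptedByLabA(s));
        }
    }
}
```

Compiling the pattern once and reusing it avoids recompiling on every call, which `String.matches` does internally.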

Update

A more generalized form would look like this:

(?-i)\b([A-Z][a-z]+[, ~]+[A-Z][a-z]+|[A-Z][a-z]+)\b

By the way, doing it this way can get messy, because this pattern matches any capitalized word. A better option may be to generate the RegEx pattern using this syntax:

(A[, ~]+B|A|B)

where A and B are the names of the languages. I think this would be a better approach.
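
A minimal sketch of generating that pattern from two language names could look like the following. The class `LanguagePatternBuilder` and the methods `forPair` and `accepts` are hypothetical names, not part of any library; `Pattern.quote` is used on the assumption that a language name might contain regex metacharacters. As written, the pattern is case-sensitive and expects A before B, which relies on the database strings being normalized.

```java
import java.util.regex.Pattern;

class LanguagePatternBuilder {
    // Builds (A[, ~]+B|A|B) for two language names. Pattern.quote keeps
    // any regex metacharacters in a name from being interpreted.
    static Pattern forPair(String a, String b) {
        String qa = Pattern.quote(a);
        String qb = Pattern.quote(b);
        return Pattern.compile("(" + qa + "[, ~]+" + qb + "|" + qa + "|" + qb + ")");
    }

    // True when the whole database value matches the lab's pattern.
    static boolean accepts(Pattern lab, String dbValue) {
        return lab.matcher(dbValue).matches();
    }

    public static void main(String[] args) {
        Pattern lab = forPair("English", "Spanish");
        System.out.println(accepts(lab, "English, ~Spanish"));
        System.out.println(accepts(lab, "English, Russian"));
    }
}
```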
