javascript - Matching optional groups with a lookahead in a JavaScript regular expression

Question

I am trying to solve a string-matching problem with a regular expression. I need to match URLs of this form:

http://soundcloud.com/okapi23/dont-turn-your-back/

And I need to "reject" URLs of this form:

http://soundcloud.com/okapi23/sets/happily-reversed/

The trailing "/" is obviously optional.

So basically:

After the hostname there are 2 or 3 groups, and if the second group equals "sets", the regex must not match.
"sets" may appear anywhere else in the URL.
"sets" must be an exact match.

What I have come up with so far is http(s)?://(www\.)?soundcloud\.com/.+/(?!sets)\b(/.+)?, which fails.
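
To make the failure concrete, here is a minimal check of that pattern written as a JavaScript regex literal (the variable names are hypothetical, chosen only for illustration). Because .+ can run across slashes, the engine is free to apply the (?!sets) test in front of a later segment, so the URL that should be rejected is matched as well:

```javascript
// The attempted pattern, with slashes escaped for a regex literal.
// `attempt`, `wanted`, and `unwanted` are illustrative names.
const attempt = /http(s)?:\/\/(www\.)?soundcloud\.com\/.+\/(?!sets)\b(\/.+)?/;

const wanted = "http://soundcloud.com/okapi23/dont-turn-your-back/";
const unwanted = "http://soundcloud.com/okapi23/sets/happily-reversed/";

console.log(attempt.test(wanted));   // true
console.log(attempt.test(unwanted)); // true, although this URL should be rejected
```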

Any advice? Are there libraries that would simplify this task (for example, making trailing slashes optional)?

score 5 · Accepted Answer

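Assuming the OP needs to test whether a given string is a URL that meets the following requirements:

The URL scheme must be either http: or https:.
The URL authority must be either //soundcloud.com or //www.soundcloud.com.
The URL path must exist and must contain 2 or 3 path segments.
The second path segment must not be "sets".
Each path segment must consist of one or more "words" made up only of alphanumeric characters ([A-Za-z0-9]), with multiple words separated by exactly one dash or underscore.
The URL must not have a query or fragment component.
The URL path may end with an optional "/".
The URL must match case-insensitively.

Here is a tested JavaScript function (with a fully commented regex) that does the trick: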

function isValidCustomUrl(text) {
    /* Here is the regex commented in free-spacing mode:
    # Match specific URL having non-"sets" 2nd path segment.
    ^                          # Anchor to start of string.
    https?:                    # URL Scheme (http or https).
    //                         # Begin URL Authority.
    (?:www\.)?                 # Optional www subdomain.
    soundcloud\.com            # URL DNS domain.
    /                          # 1st path segment (can be: "sets").
    [A-Za-z0-9]+               # 1st word-portion (required).
    (?:                        # Zero or more extra word portions.
      [-_]                     # only if separated by one - or _.
      [A-Za-z0-9]+             # Additional word-portion.
    )*                         # Zero or more extra word portions.
    (?!/sets(?:/|$))           # Assert 2nd segment not "sets".
    (?:                        # 2nd and 3rd path segments.
      /                        # Additional path segment.
      [A-Za-z0-9]+             # 1st word-portion.
      (?:                      # Zero or more extra word portions.
        [-_]                   # only if separated by one - or _.
        [A-Za-z0-9]+           # Additional word-portion.
      )*                       # Zero or more extra word portions.
    ){1,2}                     # 2nd path segment required, 3rd optional.
    /?                         # URL may end with optional /.
    $                          # Anchor to end of string.
    */
    // Same regex in javascript syntax:
    var re = /^https?:\/\/(?:www\.)?soundcloud\.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?$/i;
    return re.test(text);
}
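
As a quick sanity check, the regex from the function above can be run against the two URLs from the question and a few edge cases. This is a minimal sketch: the constant name trackUrlRe and every sample URL marked "made-up" are assumptions added for illustration.

```javascript
// Same pattern as in isValidCustomUrl above; `trackUrlRe` is an illustrative name.
const trackUrlRe = /^https?:\/\/(?:www\.)?soundcloud\.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?$/i;

const samples = [
    "http://soundcloud.com/okapi23/dont-turn-your-back/",     // from the question: should match
    "http://soundcloud.com/okapi23/sets/happily-reversed/",   // from the question: should not match
    "https://www.soundcloud.com/okapi23/dont-turn-your-back", // made-up: https, www, no trailing slash
    "http://soundcloud.com/okapi23/sets",                     // made-up: 2nd segment is exactly "sets"
    "http://soundcloud.com/okapi23/setsuna",                  // made-up: 2nd segment only starts with "sets"
    "http://soundcloud.com/okapi23/track?x=1"                 // made-up: has a query component
];

// Expected, in order: true, false, true, false, true, false
const results = samples.map(function (url) { return trackUrlRe.test(url); });
console.log(results);
```

The "setsuna" case shows why the lookahead ends with (?:\/|$): without it, any second segment that merely begins with "sets" would be rejected too.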

score 4 · Accepted Answer

Instead of ., use [a-zA-Z][\w-]*, which means "match a letter followed by any number of letters, digits, underscores, or hyphens."

^https?://(www\.)?soundcloud\.com/[a-zA-Z][\w-]*/(?!sets(/|$))[a-zA-Z][\w-]*(/[a-zA-Z][\w-]*)?/?$

To allow the optional trailing slash, use /?$.

In a JavaScript regex literal, all the forward slashes must be escaped.
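
For instance, the pattern could be written in JavaScript in either of the following ways. This is a sketch rather than part of the original answer: the names reLiteral and reFromString are hypothetical, and String.raw is used here only so that the backslashes do not have to be doubled inside the string.

```javascript
// As a regex literal: every "/" inside the pattern is escaped as "\/".
const reLiteral = /^https?:\/\/(www\.)?soundcloud\.com\/[a-zA-Z][\w-]*\/(?!sets(\/|$))[a-zA-Z][\w-]*(\/[a-zA-Z][\w-]*)?\/?$/;

// Via the RegExp constructor: slashes need no escaping in a string.
// String.raw keeps "\." and "\w" intact without doubling the backslashes.
const reFromString = new RegExp(
    String.raw`^https?://(www\.)?soundcloud\.com/[a-zA-Z][\w-]*/(?!sets(/|$))[a-zA-Z][\w-]*(/[a-zA-Z][\w-]*)?/?$`
);

const url = "http://soundcloud.com/okapi23/dont-turn-your-back/";
const setUrl = "http://soundcloud.com/okapi23/sets/happily-reversed/";

console.log(reLiteral.test(url), reFromString.test(url));       // true true
console.log(reLiteral.test(setUrl), reFromString.test(setUrl)); // false false
```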

score 1 · Accepted Answer

I suggest using the regex pattern

^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$
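
A short usage sketch for this pattern follows; the constant name simpleRe and the last two sample URLs are assumptions added for illustration. Note that, as written, the pattern has no www. alternative, and because [^\/]+ matches any character other than a slash, it also accepts segments containing ? or #.

```javascript
// The suggested pattern as a JavaScript regex literal; `simpleRe` is an illustrative name.
const simpleRe = /^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$/;

// The two URLs from the question:
console.log(simpleRe.test("http://soundcloud.com/okapi23/dont-turn-your-back/"));   // true
console.log(simpleRe.test("http://soundcloud.com/okapi23/sets/happily-reversed/")); // false

// Made-up inputs showing two differences from a stricter pattern:
console.log(simpleRe.test("http://www.soundcloud.com/okapi23/dont-turn-your-back/")); // false (no www. alternative)
console.log(simpleRe.test("http://soundcloud.com/okapi23/track?x=1"));                // true ([^\/]+ allows "?")
```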
