html - How to skip the content of a tag during a regex search?

Question

Possible duplicate:
RegEx match open tags except XHTML self-contained tags

I have a string containing HTML like this:

<html>
  <div>
      <p>this is sample content</p>
  </div>
  <div>
      <p>this is another sample</p>
      <span class="test">this sample should not caught</span>
      <div>
       this is another sample
      </div>
  </div>
</html>

Now I want to search this string for the word "sample", but the search should not pick up the "sample" that is inside <span>...</span>.

I want to do this with a regular expression. I have tried a lot of things but could not get it to work. Any help would be great.

Thanks in advance.

score 4 · Accepted Answer

This is quite brittle and will fail if nested span tags can occur. If you don't have any of those, try:

(?s)sample(?!(?:(?!</?span).)*</span>)

This matches "sample" only if the next span tag after it (if there is one) is not a closing tag.

Explanation:

(?s)          # Switch on dot-matches-all mode
sample        # Match "sample".
(?!           # only if it's not followed by the following regex:
 (?:          #  Match...
  (?!</?span) #   (unless we're at the start of a span tag)
  .           #   any character
 )*           #  any number of times.
 </span>      #  Match a closing span tag.
)             # End of lookahead
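As a quick sanity check, here is a minimal sketch that runs the pattern against the HTML from the question. Python's re module is an assumption here (the question does not name a language), and the names HTML, PATTERN and match_offsets are only illustrative.

```python
import re

# The sample document from the question, copied verbatim.
HTML = """<html>
  <div>
      <p>this is sample content</p>
  </div>
  <div>
      <p>this is another sample</p>
      <span class="test">this sample should not caught</span>
      <div>
       this is another sample
      </div>
  </div>
</html>"""

# (?s) turns on dot-matches-all; the negative lookahead rejects a
# "sample" whose next span tag is a closing </span>.
PATTERN = re.compile(r"(?s)sample(?!(?:(?!</?span).)*</span>)")

# Offsets of every accepted occurrence of "sample".
match_offsets = [m.start() for m in PATTERN.finditer(HTML)]

# Offset of the occurrence inside the span, which should be skipped.
span_offset = HTML.index("sample should not")

print(len(match_offsets))            # 3 of the 4 occurrences match
print(span_offset in match_offsets)  # False
```

The document contains four occurrences of "sample"; only the one inside the span is rejected.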

To match "sample" only when it is inside neither a span nor a p tag, you can use:

(?s)sample(?!(?:(?!</?span).)*</span>)(?!(?:(?!</?p).)*</p>)
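A rough check of this combined pattern against the same HTML, again assuming Python's re module and using illustrative names (HTML, PATTERN, matches). Note that the fragment </?p only tests the first letter of the tag name, so it would also stop at tags such as <pre>; the sample document contains none.

```python
import re

# The sample document from the question, copied verbatim.
HTML = """<html>
  <div>
      <p>this is sample content</p>
  </div>
  <div>
      <p>this is another sample</p>
      <span class="test">this sample should not caught</span>
      <div>
       this is another sample
      </div>
  </div>
</html>"""

# Two independent negative lookaheads: one rules out a following
# </span>, the other rules out a following </p>.
PATTERN = re.compile(
    r"(?s)sample"
    r"(?!(?:(?!</?span).)*</span>)"
    r"(?!(?:(?!</?p).)*</p>)"
)

matches = list(PATTERN.finditer(HTML))

# Only the occurrence in the bare inner <div> survives both checks.
last_offset = HTML.rindex("sample")

print(len(matches))                      # 1
print(matches[0].start() == last_offset) # True
```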

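However, all of this relies entirely on the tags not being nested (that is, no two tags of the same kind nested inside each other) and being correctly balanced (which is often not the case with p tags).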
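To make that caveat concrete, the sketch below feeds the patterns two hypothetical inputs that break the assumptions: a span nested inside a span, and a p tag that is never closed. The inputs and variable names are invented for illustration, and Python's re module is again an assumption.

```python
import re

SPAN_PATTERN = re.compile(r"(?s)sample(?!(?:(?!</?span).)*</span>)")
P_PATTERN = re.compile(r"(?s)sample(?!(?:(?!</?p).)*</p>)")

# Well-formed, non-nested case: "sample" is inside the span and is skipped.
flat = "<span>outer sample tail</span>"

# Hypothetical nested case: the next span tag after "sample" is an
# opening tag, so the lookahead wrongly concludes we are outside a span.
nested = "<span>outer sample <span>inner</span> tail</span>"

# Hypothetical unbalanced case: the first <p> is never closed, so the
# next p tag after "sample" is an opening tag.
unclosed = "<p>a sample paragraph<p>next paragraph</p>"

flat_hits = SPAN_PATTERN.findall(flat)
nested_hits = SPAN_PATTERN.findall(nested)
unclosed_hits = P_PATTERN.findall(unclosed)

print(flat_hits)      # []
print(nested_hits)    # ['sample']  (false positive)
print(unclosed_hits)  # ['sample']  (false positive)
```

In both failing cases the word is reported even though it sits inside the tag, which is why an HTML parser is usually the safer choice when the markup is not under your control.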
