regex - Regex code in vb.net to extract the HTML between two comments does not work

Question

I am trying to extract the portion of HTML that sits between two comments.

Here is my test code:

Imports System
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"

        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        Dim regex_pattern As String = start_comment & ".*" & end_comment
        ' In-memory test input: no line breaks between the two comments.
        Dim input_text As String = start_comment & "some more html text" & end_comment

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If

        Console.ReadLine()

    End Sub

End Module

The code above works.

However, when I read the real data from disk, the following code fails:

Imports System
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"

        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        Dim regex_pattern As String = start_comment & ".*" & end_comment
        ' Strips CR+LF pairs only; any bare LF characters are left in place.
        Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "")

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If

        Console.ReadLine()

    End Sub

End Module

The HTML file contains the start and end comments, with a large amount of HTML between them. Some of the content in the file is in Arabic.

Thanks and regards.
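One plausible reason the file-based version fails even though vbCrLf is stripped (an assumption; the post does not say what line endings 72.htm uses): if the file contains bare LF line breaks, for example Unix-style or mixed endings, Replace(vbCrLf, "") removes nothing, and by default the .NET "." matches any character except LF. The sketch below uses a hypothetical in-memory string in place of the file to show this.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineEndingCheck
    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"
        Dim regex_pattern As String = start_comment & ".*" & end_comment

        ' Hypothetical file content with bare LF (Unix-style) line breaks.
        Dim input_text As String = start_comment & vbLf & "<p>some html</p>" & vbLf & end_comment

        ' Replace(vbCrLf, "") only removes CR+LF pairs, so the bare LFs survive.
        Dim stripped As String = input_text.Replace(vbCrLf, "")
        Console.WriteLine("still contains LF: {0}", stripped.Contains(vbLf))

        ' By default "." matches any character except LF, so this prints False.
        Console.WriteLine("match: {0}", Regex.IsMatch(stripped, regex_pattern))

        ' Once the bare LFs are gone as well, the same pattern matches.
        Dim fully_stripped As String = stripped.Replace(vbLf, "")
        Console.WriteLine("match after removing LF: {0}", Regex.IsMatch(fully_stripped, regex_pattern))
    End Sub
End Module
```

Running it prints True, False, True. Deleting line breaks is only a diagnostic here; the answers below avoid touching the input at all.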

score 2 · Accepted Answer

Try passing RegexOptions.Singleline to Regex.Match(...), like this:

Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)

This makes the dot (.) match newline characters as well.
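A fuller sketch of the Singleline fix, assuming the same two comment markers. Beyond the answer itself it makes three choices of its own: Regex.Escape guards against metacharacters in the markers, the lazy "(.*?)" stops at the first end marker, and the capture group returns only the HTML between the comments. ExtractBetween is a hypothetical helper name, and the mixed-line-ending string stands in for File.ReadAllText(test_file).

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ExtractBetweenComments
    ' Hypothetical helper: returns the text between the two markers, or Nothing.
    Function ExtractBetween(input_text As String, start_comment As String, end_comment As String) As String
        ' Escape the markers; "(.*?)" is lazy, so it stops at the first end marker.
        Dim regex_pattern As String = Regex.Escape(start_comment) & "(.*?)" & Regex.Escape(end_comment)

        ' Singleline lets "." match LF too, so no line-ending cleanup is needed.
        Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)
        If match.Success Then
            Return match.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Stand-in for File.ReadAllText(test_file), with mixed line endings.
        Dim input_text As String = "<html>" & vbCrLf & start_comment & vbLf &
            "<p>some more html text</p>" & vbLf & end_comment & vbCrLf & "</html>"

        Dim inner As String = ExtractBetween(input_text, start_comment, end_comment)
        If inner IsNot Nothing Then
            Console.WriteLine("found {0}", inner.Trim())
        Else
            Console.WriteLine("not found")
        End If
    End Sub
End Module
```

With this input it prints `found <p>some more html text</p>`. If the file has only one pair of markers, the greedy ".*" from the question gives the same result; the lazy form only matters when the end marker appears more than once.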

score 0 · Accepted Answer

I don't know vb.net, but does . match newlines, or is there an option you need to set for that? Consider using [\s\S] instead of ., since it also matches newlines.
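A sketch of the [\s\S] approach, assuming the same two comment markers and a hypothetical in-memory input. [\s\S] is a character class meaning "any whitespace or any non-whitespace character", so it matches every character including LF without any RegexOptions. The program contrasts it with a plain "." on the same input.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CharClassAlternative
    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' [\s\S] matches any character, including CR and LF, with default options.
        Dim regex_pattern As String = Regex.Escape(start_comment) & "([\s\S]*?)" & Regex.Escape(end_comment)
        Dim dot_pattern As String = Regex.Escape(start_comment) & "(.*?)" & Regex.Escape(end_comment)

        ' Hypothetical input with LF line breaks between the markers.
        Dim input_text As String = start_comment & vbLf & "<p>some more html text</p>" & vbLf & end_comment

        Dim with_dot As Boolean = Regex.IsMatch(input_text, dot_pattern)
        Dim match As Match = Regex.Match(input_text, regex_pattern)

        ' "." stops at LF without Singleline, so only the character class matches.
        Console.WriteLine("'.' without Singleline matches: {0}", with_dot)
        Console.WriteLine("[\s\S] matches: {0}", match.Success)
    End Sub
End Module
```

It prints False for the "." pattern and True for [\s\S]. Compared with RegexOptions.Singleline, the character class keeps the behavior inside the pattern itself, which can help if the pattern is stored or passed around without its options.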
