I'm trying to screen-scrape some HTML, but I'm having trouble matching across a line break (in .NET).

This is the text:

<td class=abc><span class=label>XXX</span></td>
<td class=def><span class=field>YYY</span></td>

I'm trying to match YYY with this expression:

<td class=abc><span class=label>XXX</span></td>\n<td class=def><span class=field>(.*)</span></td>

I'm using \n to match the line break between the two rows, but it doesn't match... any ideas?

[Edit]

Using \r\n instead of \n made it work.

2 Answers

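You can supply the multi-line modifier m for your regex; in VB.NET this is passed as a RegexOptions value. Be aware, though, that it only changes where ^ and $ match (at every line instead of only at the ends of the input), so on its own it does not make \n match a line break it would otherwise miss. You can also escape the forward slashes with a backslash, although .NET does not require it because patterns are plain strings without / delimiters: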

<td class=abc><span class=label>XXX<\/span><\/td>\n<td class=def><span class=field>(.*)<\/span><\/td>

Please note, though, that regex is a very poor way to parse HTML - there are HTML parsers in most languages that do a much better job.

And your regex is very detailed and, therefore, brittle; an additional space would cause it to fail.
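To illustrate the brittleness point, here is a minimal sketch of a more whitespace-tolerant pattern. The sample input (with an extra space and an indented second row) and the module name are assumptions made up for the demonstration; only `Regex.Match` and the standard `\s` class are relied on.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TolerantPatternDemo
    Sub Main()
        ' Hypothetical input: an extra space after the first <td>, a Windows
        ' line break, an indented second row, and a space before the last </td>.
        Dim html As String = "<td class=abc> <span class=label>XXX</span></td>" & vbCrLf &
            "    <td class=def><span class=field>YYY</span> </td>"

        ' \s* allows any run of whitespace (including \r and \n) between tags,
        ' and (.*?) is lazy so the capture stops at the first </span>.
        Dim pattern As String = "<td\s+class=abc>\s*<span\s+class=label>XXX</span>\s*</td>\s*" &
            "<td\s+class=def>\s*<span\s+class=field>(.*?)</span>\s*</td>"

        Dim m As Match = Regex.Match(html, pattern)
        If m.Success Then
            Console.WriteLine(m.Groups(1).Value) ' prints YYY
        End If
    End Sub
End Module
```

This is still a regex over HTML, so it only reduces the fragility; it does not handle reordered attributes, quoting, or nested markup.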

Note that in Windows newlines are typically created with a carriage-return and newline combination \r\n.
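As a small sketch of why the line ending matters, the snippet below compares a bare \n with \r?\n against input that uses Windows line endings. The input string is built by hand for the demonstration (an assumption, not the asker's real page), and `\r?\n` is one common way to accept both conventions.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NewlineMatchDemo
    Sub Main()
        ' Two rows separated by a Windows line break (vbCrLf is CR followed by LF).
        Dim html As String = "<td class=abc><span class=label>XXX</span></td>" & vbCrLf &
            "<td class=def><span class=field>YYY</span></td>"

        ' The original pattern: a bare \n cannot match because \r comes first.
        Dim bareLf As String = "<td class=abc><span class=label>XXX</span></td>\n" &
            "<td class=def><span class=field>(.*)</span></td>"

        ' \r?\n accepts both "\n" and "\r\n".
        Dim crOptional As String = "<td class=abc><span class=label>XXX</span></td>\r?\n" &
            "<td class=def><span class=field>(.*)</span></td>"

        Console.WriteLine(Regex.IsMatch(html, bareLf))                    ' False
        Console.WriteLine(Regex.Match(html, crOptional).Groups(1).Value)  ' YYY
    End Sub
End Module
```

Note that VB string literals do not treat the backslash as an escape character, so \r and \n reach the regex engine unchanged.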

Here is an example supplying the Multiline option:

Dim rex As New Regex("\bsomething\b", RegexOptions.Multiline)
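For reference, a short sketch of what the Multiline option actually changes: it affects only the ^ and $ anchors, which then match at the start and end of each line rather than of the whole input. The two-line input string is a made-up example.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MultilineDemo
    Sub Main()
        ' Hypothetical two-line input separated by a single line feed.
        Dim input As String = "first" & vbLf & "second"

        ' Without the option, ^ matches only at the very start of the input.
        Dim withoutOption As Integer = Regex.Matches(input, "^\w+").Count

        ' With RegexOptions.Multiline, ^ also matches after each line feed.
        Dim withOption As Integer = Regex.Matches(input, "^\w+", RegexOptions.Multiline).Count

        Console.WriteLine(withoutOption) ' 1
        Console.WriteLine(withOption)    ' 2
    End Sub
End Module
```

Since the pattern in the question uses neither ^ nor $, this option alone would not change whether it matches.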

Regex Options (MSDN)

answered 2013-09-23T23:27:54.860