regex - R capture everything from pattern until pattern

Question

I'm trying to extract the substring between two patterns, BB and </p>:

require("stringr")
str = "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"
str_extract(str, "BB.*?:</p>")

The extracted substring should be "word, otherword", but the pattern captures too much:

  [1] "BB: word, otherword</p>\n    <p>Number:</p>"

score 2 · Accepted Answer

Maybe something like this?

> gsub(".*BB: (.*?)</p>.*$", "\\1", str)
# [1] "word, otherword"
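
A possible variation on the same idea, as a minimal sketch in base R: rather than replacing the whole string with the captured group, `regexec()` combined with `regmatches()` returns the group directly. The `perl = TRUE` argument and the variable names `m` and `captured` are my own choices for illustration, not part of the answer above.

```r
str <- "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"

# regexec() reports the full match plus each parenthesised group
m <- regmatches(str, regexec("BB: (.*?)</p>", str, perl = TRUE))

# m[[1]][1] is the full match, m[[1]][2] is the first capture group
captured <- m[[1]][2]
print(captured)
```

This leaves the rest of the string out of the pattern, so it does not depend on `.*` matching across line breaks.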

score 2 · Accepted Answer

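This is a job for Perl regular expressions, specifically lookahead and lookbehind assertions. In stringr, you can wrap the regular expression in the perl function, like this: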

# Note: perl() was deprecated in stringr 1.0.0 in favour of regex()
str_extract(str, perl("(?<=BB: ).*?(?=</p>)"))
[1] "word, otherword"

You can also do this in base R:

regmatches(str, regexpr("(?<=BB: ).*?(?=</p>)", str, perl=TRUE))
[1] "word, otherword"
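
For anyone wondering why the pattern in the question overshoots, here is a small base-R comparison, offered as a sketch. It assumes the dot is allowed to match newlines, which I force with `(?s)` so the behaviour is explicit under `perl = TRUE`. The names `too_much` and `just_right` are only illustrative.

```r
str <- "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"

# (?s) lets "." match newlines, mimicking the result shown in the question.
# The lazy .*? stops at the first ":</p>" after "BB", and that literal
# sequence first occurs after "Number", not after "otherword".
too_much <- regmatches(str, regexpr("(?s)BB.*?:</p>", str, perl = TRUE))

# Lookbehind and lookahead anchor on "BB: " and "</p>" without consuming them,
# so only the text in between is returned.
just_right <- regmatches(str, regexpr("(?<=BB: ).*?(?=</p>)", str, perl = TRUE))

print(too_much)
print(just_right)
```

The lazy quantifier is minimal only with respect to the closing pattern it is given. Because the closing pattern in the question includes the colon, the nearest valid end point is on the "Number" line.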
