regex - Extract all words between two specific words in a character vector

Question

Is there a more efficient way to do this? And how can I do it without stringr?

txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"

library(stringr)
w_start <- "this"
w_end <- "that"
pattern <- paste0(w_start, "(.*?)", w_end)
wordsbetween <- unlist(str_extract_all(txt, pattern))
gsub("^\\s+|\\s+$", "", str_sub(wordsbetween, nchar(w_start)+1, -nchar(w_end)-1))
[1] "and"                "goes with"          "is a long way from"

score 12 · Accepted Answer

This is the approach I use in qdap.

Using qdap:

library(qdap)
genXtract(txt, "this", "that")

## > genXtract(txt, "this", "that")
##         this  :  that1         this  :  that2         this  :  that3 
##                " and "          " goes with " " is a long way from "

Without an add-on package:

regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE))

## > regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE))
## [[1]]
## [1] " and "                " goes with "          " is a long way from "
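
The base R call above keeps the spaces around each match and returns a list. As a minimal sketch, it could be wrapped in a small function that trims the whitespace and returns a plain character vector, as in the question's output. `extract_between` is a hypothetical helper name (not part of any package), and the sketch assumes the two words contain no regex metacharacters and that `trimws()` is available (R 3.2.0 or later).

```r
# Hypothetical helper (not from any package): extract the text between
# two words using only base R, then strip leading/trailing whitespace.
# Assumes w_start and w_end contain no regex metacharacters.
extract_between <- function(x, w_start, w_end) {
  # Lookbehind/lookahead keep the delimiter words out of the match;
  # perl = TRUE is required for lookaround support.
  pattern <- paste0("(?<=", w_start, ").*?(?=", w_end, ")")
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Flatten the per-string list and trim the surrounding spaces.
  trimws(unlist(matches))
}

txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
extract_between(txt, "this", "that")

## [1] "and"                "goes with"          "is a long way from"
```

Because the lookbehind is built from a literal word, it has a fixed length, which PCRE requires; a start word given as a variable-length pattern would likely need a capture-group approach instead.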

score 1 · Accepted Answer

Here is another rough attempt using strsplit, though it could probably be refined further.

txtspl <- unlist(strsplit(gsub("[[:punct:]]","",txt),"this|that"))
txtspl[txtspl!=" "][-1]

#[1] " and "                " goes with "          " is a long way from "
