Maybe something like this:
library(stringr)
body <- "Scene 6: Second Lord: Nay, good my lord, put him to't; let him have his way. First Lord: If your lordship find him not a hilding, hold me no more in your respect. Second Lord: On my life, my lord, a bubble. BERTRAM: Do you think I am so far deceived in him? Second Lord: Believe it, my lord, in mine own direct knowledge, without any malice, but to speak of him as my kinsman, he's a most notable coward, an infinite and endless liar, an hourly promise-breaker, the owner of no one good quality worthy your lordship's entertainment."
p <- str_extract_all(body, "[:.?] [A-z ]*:")
# remove the punctuation around each name
p <- str_replace_all(p[[1]], "[?:.]", "")
# strip leading and trailing whitespace
p <- str_trim(p)
p
[1] "Second Lord" "First Lord" "Second Lord" "BERTRAM" "Second Lord"
# unique players
unique(p)
[1] "Second Lord" "First Lord" "BERTRAM"
Explanation of the regex (it is not perfect):
str_extract_all(body, "[:.?] [A-z ]*:")
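The match starts with a colon, a period, or a question mark (the character class [:.?]), followed by a single space. After that, [A-z ]* matches any run of letters and spaces up to the next colon.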
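One reason the pattern is not perfect: the range A-z spans more than letters. In ASCII it also covers the six characters between Z and a ([, \, ], ^, _ and the backtick). The sketch below compares it with the stricter class [A-Za-z ] on a made-up string; x, loose and strict are hypothetical names used only for illustration, and the string is not taken from the play.

```r
library(stringr)

# Hypothetical input: "[Aside]" is a stage direction, not a speaker label.
x <- "He exits. [Aside]: Mark him. First Lord: Ay."

# Pattern from the answer: A-z also accepts "[" and "]".
loose <- str_extract_all(x, "[:.?] [A-z ]*:")[[1]]

# Stricter variant: only ASCII letters and spaces between the punctuation and the colon.
strict <- str_extract_all(x, "[:.?] [A-Za-z ]*:")[[1]]

loose
strict
```

With this input, loose should also pick up ". [Aside]:", while strict keeps only ". First Lord:". On the body text above both patterns give the same result, because no such characters occur before a colon there.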
To get the positions of the matches, str_locate_all can be used with the same regex:
str_locate_all(body, "[:.?] [A-z ]*:")
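As a rough sketch of what the positions can be used for, the start and end columns returned by str_locate_all can cut the text into one speech per speaker. This is only an illustration under some assumptions: the body here is a shortened excerpt of the text above, and pos, speakers, speech_start, speech_end and speeches are names introduced for this example.

```r
library(stringr)

# Shortened excerpt of the body above, to keep the example small.
body <- "Scene 6: Second Lord: On my life, my lord, a bubble. BERTRAM: Do you think I am so far deceived in him?"

# Integer matrix with one row per match and the columns "start" and "end".
pos <- str_locate_all(body, "[:.?] [A-z ]*:")[[1]]

# Speaker labels, cleaned the same way as in the answer.
labels <- str_sub(body, pos[, "start"], pos[, "end"])
speakers <- str_trim(str_replace_all(labels, "[?:.]", ""))

# Each speech runs from just after its label to the start of the next label.
# The next label begins with the punctuation that closes the previous speech,
# so that character is kept in the speech.
speech_start <- pos[, "end"] + 1
speech_end <- c(pos[-1, "start"], str_length(body))
speeches <- str_trim(str_sub(body, speech_start, speech_end))

data.frame(speaker = speakers, speech = speeches)
```

The design relies on every speech being introduced by a label that the regex finds; text before the first label (here "Scene 6") is simply ignored.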