r - R strsplit with multiple unordered split arguments?

Question

Given the strings

test_1<-"abc def,ghi klm"
test_2<-"abc, def ghi klm"

I would like to obtain

"abc"
"def"
"ghi"

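However, when using strsplit, one must know the order of the splitting values in the string, since strsplit uses the first value to do the first split, the second value to do the second split, and then recycles.

But this does not work: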

strsplit(test_1, c(",", " "))
strsplit(test_2, c(" ", ","))

strsplit(test_2, split=c("[:punct:]","[:space:]"))[[1]]

I am looking to split the string wherever any of my splitting values is found, in a single step.

score 70 · Accepted Answer

Actually, strsplit interprets its split argument as a regular expression by default, so several delimiters can be combined with "|". (A comma is only special to the regex engine inside a {m,n} quantifier, so the double-escaped "\\," used below is harmless but not strictly required. Likewise, using "\\s" instead of a literal space is more about readability than necessity):

> strsplit(test_1, "\\, |\\,| ")  # three possibilities OR'ed
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_2, "\\, |\\,| ")
[[1]]
[1] "abc" "def" "ghi" "klm"

Without the "\\, " alternative (a comma followed by a space; note the trailing space, which SO may not show) next to the plain "\\,", the ", " in test_2 would have left an empty string ("") in the result. It might have been clearer if I had written:

> strsplit(test_2, "\\,\\s|\\,|\\s")
[[1]]
[1] "abc" "def" "ghi" "klm"
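
To make the empty-string remark concrete, here is a small sketch (the variable names `with_gap` and `no_gap` are just for illustration). It compares an alternation that only knows single-character delimiters with one that also lists the two-character ", ":

```r
test_2 <- "abc, def ghi klm"

# Only single-character alternatives: the comma and the space after it
# are matched as two separate delimiters, leaving "" between them.
with_gap <- strsplit(test_2, ",| ")[[1]]
print(with_gap)
# [1] "abc" ""    "def" "ghi" "klm"

# Listing ", " as its own alternative consumes both characters at once.
no_gap <- strsplit(test_2, ", |,| ")[[1]]
print(no_gap)
# [1] "abc" "def" "ghi" "klm"
```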

@Fojtasek is so right: Using character classes often simplifies the task because it creates an implicit logical OR:

> strsplit(test_2, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_1, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"
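
As background on why the vector form in the question does not do this, here is a minimal sketch based on the documented behaviour that `split` is recycled along the elements of `x` rather than applied in sequence within one string. The vector `x` and the result names are made up for illustration:

```r
test_1 <- "abc def,ghi klm"

# With one input string, only the first element of `split` is ever used.
first_only <- strsplit(test_1, c(",", " "))[[1]]
print(first_only)
# [1] "abc def" "ghi klm"

# With several input strings, `split` is paired with them element by element:
# the first string is split on ",", the second on " ".
x <- c("abc def,ghi", "abc,def ghi")
paired <- strsplit(x, c(",", " "))
print(paired)
# paired[[1]] is "abc def" "ghi"; paired[[2]] is "abc,def" "ghi"
```

So a single pattern that covers all delimiters, such as the character class above, is the way to split on any of them in one pass.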

score 9 · Accepted Answer

If you don't like regular expressions, you can call strsplit() multiple times:

strsplits <- function(x, splits, ...)
{
    for (split in splits)
    {
        x <- unlist(strsplit(x, split, ...))
    }
    return(x[x != ""]) # Remove empty values
}

strsplits(test_1, c(" ", ","))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c(" ", ","))
# "abc" "def" "ghi" "klm"
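
One caveat worth sketching: each element of `splits` is still treated as a regular expression unless told otherwise. Because the helper forwards `...` to strsplit(), passing `fixed = TRUE` should make the delimiters literal. The input `test_3` below is a hypothetical string, and the helper is repeated only so the snippet runs on its own:

```r
# Same helper as above, repeated so this snippet is self-contained.
strsplits <- function(x, splits, ...) {
  for (split in splits) {
    x <- unlist(strsplit(x, split, ...))
  }
  x[x != ""]  # drop empty pieces
}

# Hypothetical input whose delimiters are regex metacharacters.
test_3 <- "abc.def|ghi klm"

# fixed = TRUE is forwarded to strsplit(), so "." and "|" are matched literally.
literal <- strsplits(test_3, c(".", "|", " "), fixed = TRUE)
print(literal)
# [1] "abc" "def" "ghi" "klm"
```

Without `fixed = TRUE`, "." would match any character and the result would not be the four words.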

Update for the added example:

strsplits(test_1, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"

However, if you are going to use regular expressions, I would recommend @DWin's approach:

strsplit(test_1, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"
strsplit(test_2, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"
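
Note that POSIX class names such as [:punct:] are only recognised inside a bracket expression, which is why they are written as "[[:punct:]]" here rather than the bare "[:punct:]" from the question. As a rough illustration on a made-up input (`test_3` is hypothetical) with a wider mix of delimiters:

```r
# Hypothetical input mixing several punctuation and whitespace delimiters.
test_3 <- "abc;def\tghi. klm"

# One bracket expression holding both classes; "+" merges runs such as ". ".
parts <- strsplit(test_3, "[[:punct:][:space:]]+")[[1]]
print(parts)
# [1] "abc" "def" "ghi" "klm"
```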

score 5 · Accepted Answer


You can go with strsplit(test_1, "\\W").
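
A small sketch of how this behaves on test_2, where two non-word characters sit next to each other. Adding a "+" quantifier is a suggested variation, not part of the answer as written, and the names `single` and `runs` are illustrative:

```r
test_2 <- "abc, def ghi klm"

# "\\W" matches exactly one non-word character, so ", " leaves an empty string.
single <- strsplit(test_2, "\\W")[[1]]
print(single)
# [1] "abc" ""    "def" "ghi" "klm"

# "\\W+" treats a run of non-word characters as a single delimiter.
runs <- strsplit(test_2, "\\W+")[[1]]
print(runs)
# [1] "abc" "def" "ghi" "klm"
```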

answered 2012-05-24T13:55:27.320

score 1 · Accepted Answer

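library(stringr)  # provides str_c() and str_extract_all()

test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"
key_words <- c("abc", "def", "ghi")
# Collapse the key words into one alternation pattern: "abc|def|ghi"
matches <- str_c(key_words, collapse = "|")
str_extract_all(test_1, matches)
str_extract_all(test_2, matches)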
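
The snippet above relies on the stringr package. If that is not available, a base R sketch of the same idea (extract the known key words instead of splitting on delimiters) could use gregexpr() with regmatches(). The names `pattern`, `found_1` and `found_2` are illustrative:

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"
key_words <- c("abc", "def", "ghi")

# Build the alternation "abc|def|ghi" from the key words.
pattern <- paste(key_words, collapse = "|")

# gregexpr() locates every match; regmatches() pulls out the matched text.
found_1 <- regmatches(test_1, gregexpr(pattern, test_1))[[1]]
found_2 <- regmatches(test_2, gregexpr(pattern, test_2))[[1]]
print(found_1)
print(found_2)
# Both are "abc" "def" "ghi"; "klm" is not a key word, so it is not returned.
```

This only works when the wanted words are known in advance, which matches the three-word output requested in the question.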
