regex - Regex-based list matching in R

Question

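I have two lists (more precisely, character atomic vectors) that I want to compare using regular expressions in order to produce a subset of one of them. I could use a 'for' loop for this, but is there simpler code? Below is an example of my case: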

# list of unique cities
city <- c('Berlin', 'Perth', 'Oslo')

# list of city-months, like 'New York-Dec'
temp <- c('Berlin-Jan', 'Delhi-Jan', 'Lima-Feb', 'Perth-Feb', 'Oslo-Jan')

# need sub-set of 'temp' for only 'Jan' month for only the items in 'city' list:
#   'Berlin-Jan', 'Oslo-Jan'
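For comparison, here is a sketch of the kind of `for` loop the question wants to avoid. It is a hypothetical baseline, since the asker's actual loop is not shown: it tests every element of `temp` against an anchored regex `^<city>-Jan$` built for each city. The names `keep`, `cty` and `result` are illustrative only.

```r
city <- c('Berlin', 'Perth', 'Oslo')
temp <- c('Berlin-Jan', 'Delhi-Jan', 'Lima-Feb', 'Perth-Feb', 'Oslo-Jan')

# Hypothetical loop baseline: mark each element of temp that matches
# "^<city>-Jan$" for at least one city
keep <- logical(length(temp))
for (i in seq_along(temp)) {
  for (cty in city) {
    if (grepl(paste0("^", cty, "-Jan$"), temp[i])) {
      keep[i] <- TRUE
      break  # one matching city is enough
    }
  }
}
result <- temp[keep]
print(result)
# [1] "Berlin-Jan" "Oslo-Jan"
```

The nested loop runs `length(temp) * length(city)` regex tests, which is what the vectorized answers avoid.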

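Added clarification: in the actual case I am writing code for, the value equivalent to the 'month' is more complex. It is a random alphanumeric value in which only the first 2 characters carry the information of interest (they must be '01').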

Adding the actual case:

# equivalent of 'city' in the first example
# values match pattern TCGA-[0-9A-Z]{2}-[0-9A-Z]{4}
patient <- c('TCGA-43-4897', 'TCGA-65-4897', 'TCGA-78-8904', 'TCGA-90-8984')

# equivalent of 'temp' in the first example
# values match pattern TCGA-[0-9A-Z]{2}-[0-9A-Z]{4}-[\d]{2}[0-9A-Z]+
sample <- c('TCGA-21-5732-01A333', 'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76', 'TCGA-78-8904-11A70')

# sub-set wanted (must have '01' after the 'patient' ID part)
#   'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76'
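A minimal sketch of how the stated ID pattern splits each sample into a patient part and a two-digit code, using capture groups with `sub()`. It assumes every sample follows `TCGA-[0-9A-Z]{2}-[0-9A-Z]{4}-` followed by two digits; `[0-9]` is used instead of `[\d]` for portability across R's regex engines. The names `id_re`, `sample_patient` and `sample_code` are hypothetical.

```r
patient <- c('TCGA-43-4897', 'TCGA-65-4897', 'TCGA-78-8904', 'TCGA-90-8984')
sample <- c('TCGA-21-5732-01A333', 'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76', 'TCGA-78-8904-11A70')

# Group 1: the patient ID; group 2: the two digits right after it
id_re <- "^(TCGA-[0-9A-Z]{2}-[0-9A-Z]{4})-([0-9]{2}).*$"
sample_patient <- sub(id_re, "\\1", sample)
sample_code <- sub(id_re, "\\2", sample)

# Keep samples whose patient part is listed AND whose code is "01"
wanted <- sample[sample_patient %in% patient & sample_code == "01"]
print(wanted)
# [1] "TCGA-43-4897-01A159" "TCGA-65-4897-01T76"
```

If an ID does not match `id_re`, `sub()` returns it unchanged, so it fails both tests and is dropped.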

score 4 · Accepted Answer

Something like this?

temp <- temp[grepl("Jan", temp)]
temp[sapply(strsplit(temp, "-"), "[[", 1) %in% city]
# [1] "Berlin-Jan" "Oslo-Jan"
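The two steps above (filter on "Jan", then on the city part) can also be combined into one logical index without overwriting `temp`. This sketch swaps `strsplit()` plus `sapply()` for a vectorized `sub()`. Like the original, it assumes city names contain no `-`. `city_part` and `month_part` are illustrative names.

```r
city <- c('Berlin', 'Perth', 'Oslo')
temp <- c('Berlin-Jan', 'Delhi-Jan', 'Lima-Feb', 'Perth-Feb', 'Oslo-Jan')

# City part: everything before the first "-"; month part: everything after it
city_part <- sub("-.*$", "", temp)
month_part <- sub("^[^-]*-", "", temp)

# Both conditions in a single logical index; temp itself is left unmodified
result <- temp[city_part %in% city & month_part == "Jan"]
print(result)
# [1] "Berlin-Jan" "Oslo-Jan"
```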

Even better, borrowing an idea from @agstudy:

temp[temp %in% paste0(city, "-Jan")]
# [1] "Berlin-Jan" "Oslo-Jan"

Edit: how about this?

sample[gsub("(.*-01).*$", "\\1", sample) %in% paste0(patient, "-01")]
# [1] "TCGA-43-4897-01A159" "TCGA-65-4897-01T76"
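In the `gsub()` call above, the greedy `(.*-01)` keeps everything up to the last "-01" in each ID; IDs with no "-01" come back unchanged and then fail the `%in%` test. Since the question states that patient IDs follow `TCGA-[0-9A-Z]{2}-[0-9A-Z]{4}` (always 12 characters), a regex-free alternative is a sketch that compares the first 15 characters (the patient ID plus "-01"). The fixed width is an assumption taken from that stated pattern; `prefix` is an illustrative name.

```r
patient <- c('TCGA-43-4897', 'TCGA-65-4897', 'TCGA-78-8904', 'TCGA-90-8984')
sample <- c('TCGA-21-5732-01A333', 'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76', 'TCGA-78-8904-11A70')

# Assumes a 12-character patient ID, so "<patient>-01" occupies characters 1 to 15
prefix <- substr(sample, 1, 15)
result <- sample[prefix %in% paste0(patient, "-01")]
print(result)
# [1] "TCGA-43-4897-01A159" "TCGA-65-4897-01T76"
```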

score 3 · Accepted Answer

Here is a solution for the new requirement, following on from the other solutions:

sample[na.omit(pmatch(paste0(patient, '-01'), sample))]
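One caveat with `pmatch()`: it returns at most one position per prefix, and a prefix that partially matches more than one element of `table` gets `NA`, so a patient with two "-01" samples would be dropped entirely. The sketch below adds a made-up second sample `'TCGA-43-4897-01B200'` (hypothetical data) to show this, and contrasts it with a swapped-in `startsWith()` approach that keeps every sample starting with any `"<patient>-01"` prefix.

```r
patient <- c('TCGA-43-4897', 'TCGA-65-4897', 'TCGA-78-8904', 'TCGA-90-8984')
# The last element is a hypothetical second "-01" sample for TCGA-43-4897
sample <- c('TCGA-21-5732-01A333', 'TCGA-43-4897-01A159', 'TCGA-65-4897-01T76',
            'TCGA-78-8904-11A70', 'TCGA-43-4897-01B200')

prefixes <- paste0(patient, "-01")

# pmatch: the ambiguous prefix "TCGA-43-4897-01" yields NA, so only
# TCGA-65-4897-01T76 survives here
via_pmatch <- sample[na.omit(pmatch(prefixes, sample))]

# startsWith: test every sample against every prefix, then OR the results together
hits <- Reduce(`|`, lapply(prefixes, startsWith, x = sample))
via_startswith <- sample[hits]

print(via_pmatch)
print(via_startswith)
```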

score 2 · Accepted Answer

You can use gsub:

x <- gsub(paste(paste(city, collapse = '-Jan|'), '-Jan', sep = ''), 1, temp)
temp[x == 1]
# [1] "Berlin-Jan" "Oslo-Jan"

The pattern here is:

 "Berlin-Jan|Perth-Jan|Oslo-Jan"

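The same alternation pattern can be used with `grepl()`, which returns the logical index directly instead of replacing matches with `1` and comparing afterwards. Because `grepl()` matches substrings, this sketch adds the anchors `^` and `$` and a group so that only whole elements such as "Berlin-Jan" qualify. It assumes the city names contain no regex metacharacters (a name with `.` or `(` would need escaping). `pattern` and `result` are illustrative names.

```r
city <- c('Berlin', 'Perth', 'Oslo')
temp <- c('Berlin-Jan', 'Delhi-Jan', 'Lima-Feb', 'Perth-Feb', 'Oslo-Jan')

# Build "^(Berlin|Perth|Oslo)-Jan$"; the anchors force a whole-string match
pattern <- paste0("^(", paste(city, collapse = "|"), ")-Jan$")
result <- temp[grepl(pattern, temp)]
print(pattern)
# [1] "^(Berlin|Perth|Oslo)-Jan$"
print(result)
# [1] "Berlin-Jan" "Oslo-Jan"
```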
score 1 · Accepted Answer

Here is a solution using two substring matches...

temp[agrep("Jan",temp)[which(agrep("Jan",temp) %in% sapply(city, agrep, x=temp))]]
# [1] "Berlin-Jan" "Oslo-Jan"
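Note that `agrep()` performs approximate (fuzzy) matching. With its default `max.distance = 0.1`, the number of allowed edits is rounded up from that fraction of the pattern length, so a three-letter pattern such as "Jan" tolerates one substitution, insertion or deletion; an element containing "Jun" could therefore be selected as well. If exact substring tests are what is intended, a sketch using `grepl(..., fixed = TRUE)` instead could look like this (`has_month` and `has_city` are illustrative names):

```r
city <- c('Berlin', 'Perth', 'Oslo')
temp <- c('Berlin-Jan', 'Delhi-Jan', 'Lima-Feb', 'Perth-Feb', 'Oslo-Jan')

# Exact, literal substring tests (fixed = TRUE disables regex interpretation)
has_month <- grepl("Jan", temp, fixed = TRUE)
# TRUE where any city name occurs in the element
has_city <- Reduce(`|`, lapply(city, grepl, x = temp, fixed = TRUE))

result <- temp[has_month & has_city]
print(result)
# [1] "Berlin-Jan" "Oslo-Jan"
```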

And as a function, just for fun...

fun <- function(x, y, pattern) y[agrep(pattern, y)[which(agrep(pattern, y) %in% sapply(x, agrep, x = y))]]
# x is the vector of values to filter by (e.g. city)
# y is the vector to be filtered (e.g. temp)
# pattern is the quoted pattern you're filtering on (e.g. "Jan")

fun(city, temp, "Jan")
# [1] "Berlin-Jan" "Oslo-Jan"
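The argument order of `fun()` matters: `x` holds the keys and `y` the vector being filtered. This sketch, which reuses the function exactly as defined in this answer, shows that calling it with the keys and data swapped silently returns an empty result, because none of the city names contains "Jan". `right_order` and `swapped` are illustrative names.

```r
fun <- function(x, y, pattern) y[agrep(pattern, y)[which(agrep(pattern, y) %in% sapply(x, agrep, x = y))]]

city <- c('Berlin', 'Perth', 'Oslo')
temp <- c('Berlin-Jan', 'Delhi-Jan', 'Lima-Feb', 'Perth-Feb', 'Oslo-Jan')

# x = keys (city), y = data to filter (temp)
right_order <- fun(city, temp, "Jan")
# Swapped: y = city has no element containing "Jan", so nothing is returned
swapped <- fun(temp, city, "Jan")

print(right_order)
# [1] "Berlin-Jan" "Oslo-Jan"
print(swapped)
# character(0)
```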
