r - Extract multiple types of patterns from a string

Question

I am extracting several types of patterns from a string. For example, given

"Listed 03/25/2013 for 25000 and sold for $10,250 on 4/5/2010"

I want to extract the dates "03/25/2013" and "4/5/2010" into a vector "dates", and "25000" and "$10,250" into a vector "amounts".

library(stringr)

text <- "Listed 03/25/2013 for 25000 and sold for $10,250 on 4/5/2010"
# extract dates
dates <- str_extract_all(text, "\\d{1,2}/\\d{1,2}/\\d{4}")[[1]]
# remove the dates, then extract the amounts that start with "$"
text2 <- gsub("\\d{1,2}/\\d{1,2}/\\d{4}", " ", text)
amountsdollar <- str_extract_all(text2, "\\$\\(?[0-9,.]+\\)?")[[1]]
# remove the dollar amounts, then extract the remaining plain numbers
text3 <- gsub("\\$\\(?[0-9,.]+\\)?", " ", text2)
amountsnum <- str_extract_all(text3, "\\(?[0-9,.]+\\)?")[[1]]
amounts <- c(amountsdollar, amountsnum)
list(dates, amounts)

However, the amounts do not come out in the order in which they appear in the string. Is there a better way to do this? Thanks.

score 6 · Accepted Answer

Base R handles this fine:

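x <- "Listed 03/25/2013 for 25000 and sold for $10,250, on 4/5/2010"
# a date: one or two digits, slash, one or two digits, slash, two to four digits
date.pat <- '\\d{1,2}/\\d{1,2}/\\d{2,4}'
# an amount: a run of "$", "," and digits that ends in a digit, preceded by
# start-of-string or a space and followed by ",", ".", a space or end-of-string
amount.pat <- '(?<=^| )[$,0-9]+[0-9](?=,|\\.|$| )'

dates <- regmatches(x, gregexpr(date.pat, x))
# the lookbehind and lookahead require perl=TRUE
amounts <- regmatches(x, gregexpr(amount.pat, x, perl=TRUE))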
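
Note that regmatches() returns a list with one element per input string, so the actual vectors are dates[[1]] and amounts[[1]]. If the relative order of dates and amounts within the sentence also matters, one possible approach (not part of the answer above) is to match both patterns in a single pass and classify the tokens afterwards. A minimal sketch, where both.pat, tokens and is.date are names made up for illustration:

```r
x <- "Listed 03/25/2013 for 25000 and sold for $10,250, on 4/5/2010"
date.pat <- '\\d{1,2}/\\d{1,2}/\\d{2,4}'
amount.pat <- '(?<=^| )[$,0-9]+[0-9](?=,|\\.|$| )'

# Hypothetical combined pattern: try the date first, then the amount
both.pat <- paste0('(?:', date.pat, ')|(?:', amount.pat, ')')

# All matches, in the order they occur in x
tokens <- regmatches(x, gregexpr(both.pat, x, perl = TRUE))[[1]]

# Label each token by checking it against the date pattern
is.date <- grepl(date.pat, tokens)

dates <- tokens[is.date]     # "03/25/2013" "4/5/2010"
amounts <- tokens[!is.date]  # "25000" "$10,250"
```

Here tokens keeps the overall order of appearance, so the position of each date relative to each amount can be recovered from is.date if needed.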
