I have a data frame that looks roughly like this:

March_created_at  March_email      March_type  April_created_at  April_email  April_type
3/11/12 7:28      jeremy@asynk.ch  PushEvent   4/1/12 4:03                    PushEvent
3/11/12 7:28      jeremy@asynk.ch  PushEvent   4/1/12 4:03                    PushEvent
3/11/12 7:28      jeremy@asynk.ch  PushEvent   4/1/12 4:03                    PushEvent
3/11/12 7:28      jeremy@asynk.ch  PushEvent   4/1/12 7:03       high         IssuesEvent
3/11/12 11:06     medium           PushEvent   4/1/12 13:57      medium       PushEvent
3/11/12 11:06     medium           PushEvent   4/1/12 13:57      medium       PushEvent
3/11/12 11:06     medium           PushEvent   4/1/12 13:57      medium       PushEvent
3/11/12 12:46                      PushEvent
3/11/12 12:46                      PushEvent
3/11/12 12:46                      PushEvent

The full dataset is available here as a CSV file.

I currently use a function (thanks to Hadley Wickham) to replace specific email addresses with strings such as "high" or "medium".

library(stringr)  # provides str_replace_all() and fixed()

# the find-and-replace function
replace_all <- function(df, pattern, replacement) {
  char <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  df[char] <- lapply(df[char], str_replace_all, pattern, replacement)  
  df
}

# the function call
df.new <- replace_all(df, fixed("bford@engineyard.com"), "core")

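However, some cells are empty (for example, rows 8 to 10 of the "March_email" column). I want to find all of these cells and replace them with the string "low" when the following condition holds:

* There is a date for the same month (for example, rows 8 to 10 have a date in the "March_created_at" column, so the empty cells in "March_email" indicate missing data that needs to be replaced).

This means that when a row with a blank email cell has no date alongside it (for example, rows 8 to 10 of the April columns), nothing should be replaced there. There is simply no data for that range.

How can I achieve this in R?

Appendix: here is the dput() of the head of the dataset: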

structure(list(March_created_at = c("2012-03-11 07:28:04", "2012-03-11 07:28:04", 
"2012-03-11 07:28:04", "2012-03-11 07:28:19", "2012-03-11 07:28:19", 
"2012-03-11 07:28:19"), March_actor_attributes_email = c("jeremy@asynk.ch", 
"jeremy@asynk.ch", "jeremy@asynk.ch", "jeremy@asynk.ch", "jeremy@asynk.ch", 
"jeremy@asynk.ch"), March_type = c("PushEvent", "PushEvent", 
"PushEvent", "PushEvent", "PushEvent", "PushEvent"), April_created_at = c("2012-04-01     04:03:13", 
"2012-04-01 04:03:13", "2012-04-01 04:03:13", "2012-04-01 07:03:11", 
"2012-04-01 07:03:11", "2012-04-01 07:03:11"), April_actor_attributes_email = c("", 
"", "", "high", "high", "high"), April_type = c("PushEvent", 
"PushEvent", "PushEvent", "IssuesEvent", "IssuesEvent", "IssuesEvent"
), May_created_at = c("2012-05-01 00:16:05", "2012-05-01 00:16:05", 
"2012-05-01 00:16:05", "2012-05-01 01:03:19", "2012-05-01 01:03:19", 
"2012-05-01 01:03:19"), May_actor_attributes_email = c("john.firebaugh@gmail.com", 
"john.firebaugh@gmail.com", "john.firebaugh@gmail.com", "mitch.tishmack@gmail.com", 
"mitch.tishmack@gmail.com", "mitch.tishmack@gmail.com"), May_type = c("PushEvent", 
"PushEvent", "PushEvent", "IssueCommentEvent", "IssueCommentEvent", 
"IssueCommentEvent"), June_created_at = c("2012-06-01 00:25:05", 
"2012-06-01 00:25:05", "2012-06-01 00:25:05", "2012-06-01 00:42:29", 
"2012-06-01 00:42:29", "2012-06-01 00:42:29"), June_actor_attributes_email =     c("michaelklishin@me.com", 
"michaelklishin@me.com", "michaelklishin@me.com", "", "", ""), 
    June_type = c("IssueCommentEvent", "IssueCommentEvent", "IssueCommentEvent", 
    "PushEvent", "PushEvent", "PushEvent"), July_created_at = c("2012-07-01 13:46:20", 
    "2012-07-01 13:46:20", "2012-07-02 11:53:37", "2012-07-02 11:53:37", 
    "2012-07-02 12:27:30", "2012-07-02 12:27:30"), July_actor_attributes_email = c("medium", 
    "medium", "ryoqun@gmail.com", "ryoqun@gmail.com", "ryoqun@gmail.com", 
    "ryoqun@gmail.com"), July_type = c("PushEvent", "PushEvent", 
    "CreateEvent", "CreateEvent", "PushEvent", "PushEvent"), 
    August_created_at = c("2012-08-01 00:04:09", "2012-08-01 00:04:09", 
    "2012-08-01 00:04:42", "2012-08-01 00:04:42", "2012-08-01 00:05:04", 
    "2012-08-01 00:05:04"), August_actor_attributes_email = c("jeremy@asynk.ch", 
    "jeremy@asynk.ch", "jeremy@asynk.ch", "jeremy@asynk.ch", 
    "jeremy@asynk.ch", "jeremy@asynk.ch"), August_type = c("IssueCommentEvent", 
    "IssueCommentEvent", "IssuesEvent", "IssuesEvent", "IssueCommentEvent", 
    "IssueCommentEvent"), September_created_at = c("2012-09-01 18:12:24", 
    "2012-09-01 18:12:24", "2012-09-01 23:51:18", "2012-09-01 23:51:18", 
    "2012-09-02 00:34:54", "2012-09-02 00:34:54"), September_actor_attributes_email = c("ryoqun@gmail.com", 
    "ryoqun@gmail.com", "ryoqun@gmail.com", "ryoqun@gmail.com", 
    "ryoqun@gmail.com", "ryoqun@gmail.com"), September_type = c("CommitCommentEvent", 
    "CommitCommentEvent", "CreateEvent", "CreateEvent", "PushEvent", 
    "PushEvent"), October_created_at = c("2012-10-01 07:48:38", 
    "2012-10-01 10:01:40", "2012-10-01 10:01:43", "2012-10-01 10:17:00", 
    "2012-10-01 16:08:29", "2012-10-01 18:06:46"), October_actor_attributes_email = c("medium", 
    "medium", "medium", "medium", "", "core"), October_type = c("PushEvent", 
    "IssuesEvent", "PushEvent", "PushEvent", "ForkEvent", "PullRequestEvent"
    )), .Names = c("March_created_at", "March_actor_attributes_email", 
"March_type", "April_created_at", "April_actor_attributes_email", 
"April_type", "May_created_at", "May_actor_attributes_email", 
"May_type", "June_created_at", "June_actor_attributes_email", 
"June_type", "July_created_at", "July_actor_attributes_email", 
"July_type", "August_created_at", "August_actor_attributes_email", 
"August_type", "September_created_at", "September_actor_attributes_email", 
"September_type", "October_created_at", "October_actor_attributes_email", 
"October_type"), row.names = c(NA, 6L), class = "data.frame") 

1 Answer

In principle, I think either of these would work.

1)
df2[df2$March_actor_attributes_email == "" & df2$March_created_at !="","March_actor_attributes_email"] <- "low"

2)
df2$March_actor_attributes_email <- ifelse(df2$March_actor_attributes_email == "" & df2$March_created_at !="", "low", df2$March_actor_attributes_email)
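Both lines above handle March only. To apply the same rule to every month, a loop over the month prefixes is one possibility. The following is a minimal sketch, not a tested solution for the full CSV: it uses a small made-up data frame, and it assumes that the columns follow the `<Month>_created_at` / `<Month>_actor_attributes_email` naming from the dput() and are character vectors rather than factors.

```r
# Toy data (made up for illustration) using the dput() column naming scheme.
df2 <- data.frame(
  March_created_at = c("2012-03-11 07:28:04", "2012-03-11 12:46:00", ""),
  March_actor_attributes_email = c("jeremy@asynk.ch", "", ""),
  April_created_at = c("2012-04-01 04:03:13", "", "2012-04-01 07:03:11"),
  April_actor_attributes_email = c("", "", "high"),
  stringsAsFactors = FALSE
)

# Derive the month prefixes from the "<Month>_created_at" column names.
months <- sub("_created_at$", "", grep("_created_at$", names(df2), value = TRUE))

for (m in months) {
  date_col  <- paste0(m, "_created_at")
  email_col <- paste0(m, "_actor_attributes_email")
  # Blank email plus a non-blank date in the same month means missing data.
  # which() drops NA comparisons, so rows with NA are never selected.
  rows <- which(df2[[email_col]] == "" & df2[[date_col]] != "")
  df2[rows, email_col] <- "low"
}

print(df2)
```

In this toy data, rows where both the date and the email are blank are left untouched, which matches the "no data for that range" case in the question.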

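The tricky part is the date column. You may want to check that the field actually contains a date, not merely that it is non-blank, but that depends on how your data is structured.

answered 2012-10-15T20:31:40.093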
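One way to make that stricter check is to parse the date column and treat anything that fails to parse as "no date". This is only a sketch under an assumption: the format string `"%Y-%m-%d %H:%M:%S"` matches the dput() sample, not the `3/11/12 7:28` style shown in the table, so adjust it to whatever the real file contains. The vectors `created_at` and `email` are made-up stand-ins for one month's columns.

```r
# Made-up stand-ins for one month's date and email columns.
created_at <- c("2012-03-11 07:28:04", "", "n/a", "2012-03-11 12:46:00")
email      <- c("jeremy@asynk.ch", "", "", "")

# as.POSIXct() returns NA for strings that do not match the format,
# including empty strings, so !is.na() marks real dates only.
parsed   <- as.POSIXct(created_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
has_date <- !is.na(parsed)

# Replace blank emails only where a real date was parsed.
email[email == "" & has_date] <- "low"

print(email)
```

Here the row with "n/a" in the date field is not replaced, whereas a plain `!= ""` test would have marked it as "low".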