
I have a data frame df containing "messages". Each row is a message. Each message has a timestamp, stored in df$message.date as POSIXct and displayed in the format %Y-%m-%d %H:%M:%S. For example:

> head(df)
messageid   user.id    message.date         
123         999       2011-07-17 17:54:27
456         888       2011-07-19 16:56:50

(Here is the dput() version of the above):

df <- structure(list(messageid = c(123L, 456L), user.id = c(999L, 888L), 
      message.date = structure(c(1310950467, 1311119810), class = c("POSIXct", 
      "POSIXt"), tzone = "")), .Names = c("messageid", "user.id", 
      "message.date"), row.names = c(NA, -2L), class = "data.frame")

How can I create a data frame with the total number of messages per day? For example:

day                   message.count 
2011-07-17             1
2011-07-18             0
2011-07-19             1

Rather than leaving out dates that have no messages, I want those dates to appear with message.count set to zero.

What I have done so far: I extracted the calendar-day portion of message.date like this:

df$calendar.day<-as.POSIXct(strptime(substr(df$message.date,1,10),"%Y-%m-%d",tz="CST6CDT"))
> head(df$calendar.day)
[1] "2011-07-17 CDT" "2011-07-18 CDT" "2011-07-19 CDT"

From there, I can generate a list of all calendar dates in the date range:

daterange <- seq(min(df$calendar.day), max(df$calendar.day), by="day")


2 Answers


Here is a fairly simple solution that uses sapply() to count the number of messages for each date the log spans.

countMessages <- function(timeStamps) {
    # Use the argument rather than the global df, so any POSIXct vector works
    Dates <- as.Date(strftime(timeStamps, "%Y-%m-%d"))
    allDates <- seq(from = min(Dates), to = max(Dates), by = "day")
    # For each day in the range, count the timestamps that fall on that day
    message.count <- sapply(allDates, FUN = function(X) sum(Dates == X))
    data.frame(day = allDates, message.count = message.count)
}

countMessages(df$message.date)
#          day message.count
# 1 2011-07-17             1
# 2 2011-07-18             0
# 3 2011-07-19             1
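
A variation on the same idea, sketched here rather than taken from the answer above: if the days are turned into a factor whose levels cover every day in the range, table() reports the empty days as 0 without an explicit loop. The names msgs, days, all_days and daily are illustrative, and the time zone is pinned to UTC as an assumption so that the result does not depend on the local setting.

```r
# Rebuild the two-row example from the question, pinned to UTC (an assumption
# made here so the result does not depend on the local time zone).
msgs <- data.frame(
  messageid = c(123L, 456L),
  message.date = as.POSIXct(c("2011-07-17 17:54:27", "2011-07-19 16:56:50"),
                            tz = "UTC")
)

# Calendar day of each message, and every day in the covered range.
days <- as.Date(msgs$message.date, tz = "UTC")
all_days <- seq(min(days), max(days), by = "day")

# A factor whose levels are all days makes table() count empty days as 0.
counts <- table(factor(as.character(days), levels = as.character(all_days)))
daily <- data.frame(day = all_days, message.count = as.vector(counts))
print(daily)
```

For long logs this should avoid comparing every timestamp against every day, which is what the sapply() version does.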
answered 2012-04-17T16:58:31.510

You should be able to call as.data.frame() on the result of table() to coerce the counts into a data frame. For instance:

test_data <- data.frame(date=c("March","April","April","May"),messageid=c(1,2,3,4),userid=c(55,33,1,56))
print(as.data.frame(table(test_data[1])))

Results in:

   Var1 Freq
1 April    2
2 March    1
3   May    1

To add the dates that have zero messages, you could generate a vector of all dates that apply to your project (for instance, every day of the year if the file covers an entire year) and merge it with the data frame created from table(). Dates that are missing from the counts come out of the merge as NA, and you then assign 0 to those.

For instance:

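counts <- as.data.frame(table(test_data[1]))  # assumed: the counts computed above
months <- c("January","February","March","April","May","June")
full <- merge(counts,months,by=1,all=TRUE)
full$Freq[is.na(full$Freq)] <- 0  # months with no messages get a count of 0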

Obviously, in this instance the merged data frame is ordered alphabetically rather than chronologically, but if you use a POSIX date vector instead of month names that shouldn't be an issue.
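
As a rough sketch of the same merge approach applied to real dates instead of month names (the sample vector msg_days is hypothetical, not data from the question):

```r
# Hypothetical sample: three messages on two days, with a gap on 2011-07-18.
msg_days <- as.Date(c("2011-07-17", "2011-07-19", "2011-07-19"))

# Count messages for each day that actually occurs in the data.
counts <- as.data.frame(table(day = msg_days), stringsAsFactors = FALSE)
counts$day <- as.Date(counts$day)

# Every calendar day in the range, including days with no messages.
all_days <- data.frame(day = seq(min(msg_days), max(msg_days), by = "day"))

# Left join onto the full calendar, then turn the missing counts into zeros.
full <- merge(all_days, counts, by = "day", all.x = TRUE)
full$Freq[is.na(full$Freq)] <- 0
print(full)
```

Because the key column is a Date, merge() sorts the rows chronologically, so the ordering problem seen with month names does not arise.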

answered 2012-04-17T15:02:23.860