r - 毎日の時系列を平均的な毎週に変換するにはどうすればよいですか？

Question

私は（算術的に）毎日のデータを平均して、毎日の時系列を毎週の時系列に変換したいと思っています。

このスレッドをたどる：Rを使用して列ごとの週次データの平均をどのように計算しますか？、xtsライブラリを使用しています。

# Averages daily time series into weekly time series
# where my source is a zoo object
source.w <- apply.weekly(source, colMeans)

私が抱えている問題は、火曜日から次の月曜日のデータまでのシリーズを平均化することです。

月曜日から金曜日までの毎日のデータを平均化するオプションを探しています。

ヒントはありますか？

ここにもう少しあります：

    # here is part of my data, from a "blé colza.txt" file


    24/07/2012  250.5   499
    23/07/2012  264.75  518.25
    20/07/2012  269.25  525.25
    19/07/2012  267 522.5
    18/07/2012  261.25  517
    17/07/2012  265.75  522.25
    16/07/2012  264.25  523.25
    13/07/2012  258.25  517
    12/07/2012  253.75  513
    11/07/2012  246.25  512.75
    10/07/2012  248 515
    09/07/2012  247 519.25
    06/07/2012  243.25  508.25
    05/07/2012  245 508.5
    04/07/2012  236 500.5
    03/07/2012  234 497.75
    02/07/2012  234.25  489.75
    29/06/2012  229 490.25
    28/06/2012  229.75  487.25
    27/06/2012  229.75  493
    26/06/2012  226.5   486
    25/06/2012  220 482.25
    22/06/2012  214.25  472.5
    21/06/2012  212 469.5
    20/06/2012  210.25  473.75
    19/06/2012  208 472.75
    18/06/2012  206.75  462.5
    15/06/2012  203 456.5
    14/06/2012  205.25  460.5
    13/06/2012  205.25  465.25
    12/06/2012  205.25  469
    11/06/2012  208 471.5
    08/06/2012  208 468.5
    07/06/2012  208 471.25
    06/06/2012  208 467
    05/06/2012  208 458.75
    04/06/2012  208 457.5
    01/06/2012  208 463.5
    31/05/2012  208 466.75
    30/05/2012  208 468
    29/05/2012  212.75  469.75
    28/05/2012  212.75  469.75
    25/05/2012  212.75  465.5



# Loads external libraries
library("zoo") # or require("zoo")
library("xts") # or require("xts")

# Loads data as a zoo object
source <- read.zoo("blé colza.txt", sep=",", dec=".", header=T, na.strings="NA",     format="%d/%m/%Y")

# Averages daily time series into weekly time series
# https://stackoverflow.com/questions/11129562/how-does-one-compute-the-mean-of-weekly-    data-by-column-using-r
source.w <- apply.weekly(source, colMeans)

score 6 · Accepted Answer

mrdwabの答えは、OPとタイムゾーン（またはその特性）を共有しているため、たまたまうまくいきます。説明する：

Lines <- 
    "24/07/2012  250.5   499
    23/07/2012  264.75  518.25
    20/07/2012  269.25  525.25
    19/07/2012  267 522.5
    18/07/2012  261.25  517
    17/07/2012  265.75  522.25
    16/07/2012  264.25  523.25
    13/07/2012  258.25  517
    12/07/2012  253.75  513
    11/07/2012  246.25  512.75
    10/07/2012  248 515
    09/07/2012  247 519.25
    06/07/2012  243.25  508.25
    05/07/2012  245 508.5
    04/07/2012  236 500.5
    03/07/2012  234 497.75
    02/07/2012  234.25  489.75
    29/06/2012  229 490.25
    28/06/2012  229.75  487.25
    27/06/2012  229.75  493
    26/06/2012  226.5   486
    25/06/2012  220 482.25
    22/06/2012  214.25  472.5
    21/06/2012  212 469.5
    20/06/2012  210.25  473.75
    19/06/2012  208 472.75
    18/06/2012  206.75  462.5
    15/06/2012  203 456.5
    14/06/2012  205.25  460.5
    13/06/2012  205.25  465.25
    12/06/2012  205.25  469
    11/06/2012  208 471.5
    08/06/2012  208 468.5
    07/06/2012  208 471.25
    06/06/2012  208 467
    05/06/2012  208 458.75
    04/06/2012  208 457.5
    01/06/2012  208 463.5
    31/05/2012  208 466.75
    30/05/2012  208 468
    29/05/2012  212.75  469.75
    28/05/2012  212.75  469.75
    25/05/2012  212.75  465.5"

# Get R's timezone information (from ?Sys.timezone)
tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
tzones <- read.delim(tzfile, row.names = NULL, header = FALSE,
  col.names = c("country", "coords", "name", "comments"),
  as.is = TRUE, fill = TRUE, comment.char = "#")

# Run the analysis on each timezone
out <- list()
library(xts)
for(i in seq_along(tzones$name)) {
  tzn <- tzones$name[i]
  Sys.setenv(TZ=tzn)
  con <- textConnection(Lines)
  Source <- read.zoo(con, format="%d/%m/%Y")
  out[[tzn]] <- apply.weekly(Source, colMeans)
}

head(out,5)これで実行して、使用するタイムゾーンによって出力の一部が異なることを確認できます。

head(out,5)
$`Europe/Andorra`
               V2      V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000

$`Asia/Dubai`
               V2      V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000

$`Asia/Kabul`
               V2      V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000

$`America/Antigua`
                V2      V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625

$`America/Anguilla`
                V2      V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625

Sys.setenv(TZ="<yourTZ>")より堅牢な解決策は、を使用してグローバルに設定するかindexTZ(Source) <- "<yourTZ>"、個々のオブジェクトごとに設定することにより、タイムゾーンが正しく表現されるようにすることです。

score 3 · Accepted Answer

私はあなたの問題を再現することができましたperiod.apply()。カスタムの「エンドポイント」を使用して解決できます。

まず、あなたが提供したデータを、他の人が簡単に読める形式で。

temp = structure(list(V1 = structure(c(33L, 32L, 29L, 27L, 25L, 23L, 
22L, 19L, 17L, 15L, 13L, 12L, 9L, 7L, 5L, 3L, 2L, 41L, 39L, 37L, 
36L, 35L, 31L, 30L, 28L, 26L, 24L, 21L, 20L, 18L, 16L, 14L, 11L, 
10L, 8L, 6L, 4L, 1L, 43L, 42L, 40L, 38L, 34L), .Label = c("01/06/2012", 
"02/07/2012", "03/07/2012", "04/06/2012", "04/07/2012", "05/06/2012", 
"05/07/2012", "06/06/2012", "06/07/2012", "07/06/2012", "08/06/2012", 
"09/07/2012", "10/07/2012", "11/06/2012", "11/07/2012", "12/06/2012", 
"12/07/2012", "13/06/2012", "13/07/2012", "14/06/2012", "15/06/2012", 
"16/07/2012", "17/07/2012", "18/06/2012", "18/07/2012", "19/06/2012", 
"19/07/2012", "20/06/2012", "20/07/2012", "21/06/2012", "22/06/2012", 
"23/07/2012", "24/07/2012", "25/05/2012", "25/06/2012", "26/06/2012", 
"27/06/2012", "28/05/2012", "28/06/2012", "29/05/2012", "29/06/2012", 
"30/05/2012", "31/05/2012"), class = "factor"), V2 = c(250.5, 
264.75, 269.25, 267, 261.25, 265.75, 264.25, 258.25, 253.75, 
246.25, 248, 247, 243.25, 245, 236, 234, 234.25, 229, 229.75, 
229.75, 226.5, 220, 214.25, 212, 210.25, 208, 206.75, 203, 205.25, 
205.25, 205.25, 208, 208, 208, 208, 208, 208, 208, 208, 208, 
212.75, 212.75, 212.75), V3 = c(499, 518.25, 525.25, 522.5, 517, 
522.25, 523.25, 517, 513, 512.75, 515, 519.25, 508.25, 508.5, 
500.5, 497.75, 489.75, 490.25, 487.25, 493, 486, 482.25, 472.5, 
469.5, 473.75, 472.75, 462.5, 456.5, 460.5, 465.25, 469, 471.5, 
468.5, 471.25, 467, 458.75, 457.5, 463.5, 466.75, 468, 469.75, 
469.75, 465.5)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-43L))

クリーンアップして、オブジェクトをオブジェクトに変換しxtsます。

temp$V1 = as.Date(temp$V1, format="%d/%m/%Y")
library(xts)
temp.x = xts(temp[-1], order.by=temp$V1)

今。関数を試してみapply.weekly()ますが、あなたが望むものは得られません。

apply.weekly(temp.x, colMeans)
#                V2      V3
# 2012-05-28 212.75 467.625
# 2012-06-04 208.95 465.100
# 2012-06-11 208.00 467.400
# 2012-06-18 205.10 462.750
# 2012-06-25 212.90 474.150
# 2012-07-02 229.85 489.250
# 2012-07-09 241.05 506.850
# 2012-07-16 254.10 516.200
# 2012-07-23 265.60 521.050
# 2012-07-24 250.50 499.000

使用period.apply()するには、生理の終点を指定する必要があります (不規則な場合があります)。ここでは、最初の期間は最初の日付であり、そこから 5 日ごとになります。数日残っているのでnrow(temp.x)、最終期間の終わりとして追加します。

ep = c(0, seq(1, nrow(temp.x), by = 5), nrow(temp.x))
period.apply(temp.x, INDEX = ep, FUN = colMeans)
#                 V2      V3
# 2012-05-25 212.750 465.500
# 2012-06-01 209.900 467.550
# 2012-06-08 208.000 464.600
# 2012-06-15 205.350 464.550
# 2012-06-22 210.250 470.200
# 2012-06-29 227.000 487.750
# 2012-07-06 238.500 500.950
# 2012-07-13 250.650 515.400
# 2012-07-20 265.500 522.050
# 2012-07-24 257.625 508.625

score 2 · Accepted Answer

私はあなたの例を実行しました。問題を正しく理解していれば、apply.weekly関数は最初の金曜日とデータの最初の月曜日を集計しています。私はパッケージを使用しないxtsので、他の誰かがそれについてより多くの洞察を提供する必要があります. 日付を日付のベクトルに変換し、各週の月曜日の日付がその週のすべての観測を表します。?strptime変換に使用するコードを要約します。

# Get the year of the first observation
start_year <- format(time(source)[1],"%Y")
# Convert this into a date for the 1st of Jan in that year.
start_date <- as.Date(strptime(paste(start_year, "1 1"), "%Y %d %m"))

# Using the difftime function determine the distance (days) since the first day of the first year.
jul_day <- as.numeric(difftime(time(source),start_date),units="days")
# Get the date of the Monday before each observation and add it to the start of the year. 
mondays <- start_date + (jul_day - (jul_day-1)%%7)
# the %% calculates the remainder.
# to check that it has worked convert the mondays vector into day names.
format(mondays, "%A")

# And now you can aggregate the observations using the mondays vector.
source.w <- aggregate(source[,1:2], mondays, "mean")

score 1 · Accepted Answer

目の前の問題をもう一度見てみましょう。

xtsライブラリを使用すると簡単です。

# say you have xts object name 'dat'
ep <- endpoints(dat, on = 'weeks')                          # 
period.apply(x = dat, INDEX = ep, FUN = mean)

score 1 · Accepted Answer

Joshua Ulrichの回答のフォローアップ。

私のシステム (kUbuntu 12) では、以下は zone.tab ファイルを取得しませんでした

tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")

しかし、私は zone.tab を見つけることができました

locate zone.tab

なんらかの理由 (おそらくファイルのアクセス許可) で、その zone.tab ファイルを直接指すことができませんでした。つまり、次のように記述します。

tzfile <- "usr/share/zoneinfo/zone.tab"

戻ってきた：

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'usr/share/zoneinfo/zone.tab': No such file or directory

zone.tab のローカルコピーを作成し、そのコピーを指定すると問題が解決しました。

tzfile <- "~/R/zone.tab"

これで、zone.tab を Google で検索すると、システムにない場合や破損している場合などに備えて、zone.tab のコピーがオンラインで見つかります。ここにそのような場所があります：

http://www.ietf.org/timezones/data/zone.tab

PS 私は 15 歳未満なので、コメントを投稿することはできません。

r - 毎日の時系列を平均的な毎週に変換するにはどうすればよいですか？

5 に答える 5

Related

Reference