r - How R formats POSIXct with fractional seconds

Question

I believe that R incorrectly formats POSIXct types with fractional seconds. I submitted this via R-bugs as an enhancement request and got brushed off with "we think the current behavior is correct -- bug deleted." While I am very appreciative of the work they have done and continue to do, I wanted to get other people's take on this particular issue, and perhaps advice on how to make the point more effectively.

Here is an example:

 > tt <- as.POSIXct('2011-10-11 07:49:36.3')
 > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
 [1] "2011-10-11 07:49:36.2"

That is, tt is created as a POSIXct time with fractional part .3 seconds. When it is printed with one decimal digit, the value shown is .2. I work a lot with timestamps of millisecond precision and it causes me a lot of headaches that times are often printed one notch lower than the actual value.

Here is what is happening: POSIXct is a floating-point number of seconds since the epoch. All integer values are handled precisely, but in base-2 floating point, the closest value to .3 is very slightly smaller than .3. The stated behavior of strftime() for format %OSn is to round down to the requested number of decimal digits, so the displayed result is .2. For other fractional parts the floating point value is slightly above the value entered and the display gives the expected result:

 > tt <- as.POSIXct('2011-10-11 07:49:36.4')
 > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
 [1] "2011-10-11 07:49:36.4"
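
To see the two cases side by side, here is a minimal sketch that extracts the fractional part actually stored in the double and truncates it to one digit, which is what the question describes %OS1 as doing. It assumes the timestamps are parsed in UTC (the question does not say which time zone was used), and the variable names are illustrative only.

```r
# Assumption: parse in UTC so the result does not depend on the local time zone.
x3 <- as.numeric(as.POSIXct("2011-10-11 07:49:36.3", tz = "UTC"))
x4 <- as.numeric(as.POSIXct("2011-10-11 07:49:36.4", tz = "UTC"))

# Fractional part actually stored in the double
frac3 <- x3 %% 1
frac4 <- x4 %% 1
print(sprintf("%.10f", c(frac3, frac4)))

# Truncating to one decimal digit
print(floor(frac3 * 10) / 10)   # stored value is below 0.3, so this gives 0.2
print(floor(frac4 * 10) / 10)   # stored value is above 0.4, so this gives 0.4
```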

The developers' argument is that for time types we should always round down to the requested precision. For example, if the time is 11:59:59.8 then printing it with format %H:%M should give "11:59" not "12:00", and %H:%M:%S should give "11:59:59" not "12:00:00". I agree with this for integer numbers of seconds and for format flag %S, but I think the behavior should be different for format flags that are designed for fractional parts of seconds. I would like to see %OSn use round-to-nearest behavior even for n = 0 while %S uses round-down, so that printing 11:59:59.8 with format %H:%M:%OS0 would give "12:00:00". This would not affect anything for integer numbers of seconds because those are always represented precisely, but it would more naturally handle round-off errors for fractional seconds.

This is how printing of fractional parts is handled in C, for example, where floating-point format specifiers round to nearest while integer casting rounds down:

 double x = 9.97;
 printf("%d\n",(int) x);   //  9
 printf("%.0f\n",x);       //  10
 printf("%.1f\n",x);       //  10.0
 printf("%.2f\n",x);       //  9.97
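
The same contrast can be reproduced in R itself. This is only a sketch of the general convention for printing floating-point numbers, not of anything specific to date-time formatting:

```r
x <- 9.97
print(trunc(x))              # 9: truncation toward zero, like the C cast
print(sprintf("%.0f", x))    # "10"
print(sprintf("%.1f", x))    # "10.0"
print(sprintf("%.2f", x))    # "9.97"
```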

I did a quick survey of how fractional seconds are handled in other languages and environments, and there really doesn't seem to be a consensus. Most constructs are designed for integer numbers of seconds and the fractional parts are an afterthought. It seems to me that in this case the R developers made a choice that is not completely unreasonable but is in fact not the best one, and is not consistent with the conventions elsewhere for displaying floating-point numbers.

What are people's thoughts? Is the R behavior correct? Is it the way you yourself would design it?

score 36 · Accepted Answer

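One underlying problem is that the POSIXct representation is less precise than the POSIXlt representation, and the POSIXct representation gets converted to the POSIXlt representation before formatting. Below we see that if the string is converted directly to the POSIXlt representation, it prints correctly.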

> as.POSIXct('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.2 CDT"
> as.POSIXlt('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.3"

We can also see this by looking at the difference between the binary representation of the two formats and the usual representation of 0.3.

> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> as.numeric(t1 - round(unclass(t1))) - 0.3
[1] -4.768372e-08

> t2 <- as.POSIXlt('2011-10-11 07:49:36.3')
> as.numeric(t2$sec - round(unclass(t2$sec))) - 0.3
[1] -2.831069e-15

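Interestingly, both representations appear to be slightly less than the usual representation of 0.3, but the second one is either close enough or is truncated in a way different from what I am imagining here. Given that, I am not going to worry about floating-point representation difficulties. They may still happen, but if we are careful about which representation we use, they will hopefully be minimized.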
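
The gap in precision between the two representations can be estimated from the spacing of adjacent doubles at each magnitude. The following is a rough sketch: `ulp` is a hypothetical helper (not a base R function), and the epoch value assumes the example timestamp in UTC. Any time zone gives a value of the same magnitude.

```r
# Hypothetical helper: spacing between adjacent doubles near a positive value x
ulp <- function(x) 2^(floor(log2(x)) - 52)

ct_value <- 1318319376.3   # assumed: seconds since the epoch for the example, in UTC
lt_sec   <- 36.3           # the seconds field on its own, as POSIXlt stores it

print(ulp(ct_value))  # 2^-22, about 2.4e-07
print(ulp(lt_sec))    # 2^-47, about 7.1e-15
```

The two errors printed above (about -4.8e-08 and -2.8e-15) are each within half of the corresponding spacing, which is what round-to-nearest storage would produce.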

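Robert's desire for rounded output is then simply an output problem, and it could be addressed in any number of ways. My suggestion would be something like this: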

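myformat.POSIXct <- function(x, digits=0) {
  # Round in the POSIXct representation first so minutes/hours/days roll over
  x2 <- round(unclass(x), digits)
  attributes(x2) <- attributes(x)
  # Convert to POSIXlt and round the more precise seconds field again
  x <- as.POSIXlt(x2)
  x$sec <- round(x$sec, digits)
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}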

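This starts with a POSIXct input and first rounds to the desired number of digits. It then converts to POSIXlt and rounds again. The first rounding makes sure that all of the units increase appropriately when we are on a minute/hour/day boundary. The second rounding rounds after converting to the more precise representation.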

> options(digits.secs=1)
> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"

> t2 <- as.POSIXct('2011-10-11 23:59:59.999')
> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"

As a final aside: did you know that the standard allows for up to two leap seconds?

> as.POSIXlt('2011-10-11 23:59:60.9')
[1] "2011-10-11 23:59:60.9"

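OK, one more thing. The behavior actually changed in May because of a bug filed by the OP (Bug 14579). Before that, fractional seconds were rounded. Unfortunately, that meant the output could sometimes be rounded up to a second that was not possible: in the bug report, it went up to 60 when it should have rolled over to the next minute. One reason truncation was chosen over rounding is that printing is done from the POSIXlt representation, where each unit is stored separately. Rolling over to the next minute/hour/etc. is therefore more difficult than a straightforward rounding operation. To round easily, it is necessary to round in the POSIXct representation and then convert back, as I suggest.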
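
The rollover problem can be sketched in a few lines. This is an illustration under the assumption that the timestamp is in UTC. It is not the code path R uses internally.

```r
ct <- as.POSIXct("2011-10-11 23:59:59.999", tz = "UTC")  # UTC is an assumption

# Rounding the seconds field of POSIXlt in isolation
lt <- as.POSIXlt(ct)
print(round(lt$sec, 1))   # 60, while minute, hour and day stay unchanged

# Rounding the POSIXct value first, then converting
rounded <- as.POSIXct(round(as.numeric(ct), 1), origin = "1970-01-01", tz = "UTC")
rolled <- as.POSIXlt(rounded)
print(c(rolled$mday, rolled$hour, rolled$min, rolled$sec))  # 12 0 0 0
```

Because POSIXct is a single number, one rounding step carries into every larger unit at once.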

score 20 · Accepted Answer

I ran into this problem and so started looking for a solution. @Aaron's answer is good, but it still breaks for large dates.

Here is code that rounds the seconds properly, according to format or options("digits.secs"):

form <- function(x, format = "", tz= "", ...) {
  # From format.POSIXct
  if (!inherits(x, "POSIXct")) 
    stop("wrong class")
  if (missing(tz) && !is.null(tzone <- attr(x, "tzone"))) 
    tz <- tzone

  # Find the number of digits required based on the format string
  if (length(format) > 1)
    stop("length(format) > 1 not supported")

  m <- gregexpr("%OS[[:digit:]]?", format)[[1]]
  l <- attr(m, "match.length")
  if (l == 4) {
    d <- as.integer(substring(format, l+m-1, l+m-1))
  } else {
    d <- unlist(options("digits.secs"))
    if (is.null(d)) {
      d <- 0
    }
  }  


  secs.since.origin <- unclass(x)            # Seconds since origin
  secs <- round(secs.since.origin %% 60, d)  # Seconds within the minute
  mins <- floor(secs.since.origin / 60)      # Minutes since origin
  # Fix up overflow on seconds
  if (secs >= 60) {
    secs <- secs - 60
    mins <- mins + 1
  }

  # Represents the prior minute
  lt <- as.POSIXlt(60 * mins, tz=tz, origin=ISOdatetime(1970,1,1,0,0,0,tz="GMT"))
  lt$sec <- secs + 10^(-d-1)  # Add in the seconds, plus a fudge factor.
  format.POSIXlt(as.POSIXlt(lt), format, ...)
}

The fudge factor of 10^(-d-1) comes from here: "Accurately converting from character->POSIXct->character with sub millisecond datetimes", by Aaron.
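
As a rough illustration of why such a fudge factor helps when the formatter truncates, the sketch below emulates truncation to d digits with `trunc()`. The value 0.29 is chosen only because its double lies just below the decimal value. This emulation is an assumption about the effect of truncation, not R's formatting code.

```r
d <- 2
s <- 0.29   # illustrative value; the stored double is slightly below 0.29

# Emulated truncation to d digits, without and with the fudge factor
print(trunc(s * 10^d) / 10^d)                   # 0.28
print(trunc((s + 10^(-d - 1)) * 10^d) / 10^d)   # 0.29
```

The added amount is a tenth of the last displayed digit, so it pushes a value sitting just below a boundary over it without changing a correctly rounded result.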

Some examples:

f  <- "%Y-%m-%d %H:%M:%OS"
f3 <- "%Y-%m-%d %H:%M:%OS3"
f6 <- "%Y-%m-%d %H:%M:%OS6"

From the nearly identical question:

x <- as.POSIXct("2012-12-14 15:42:04.577895")

> format(x, f6)
[1] "2012-12-14 15:42:04.577894"
> form(x, f6)
[1] "2012-12-14 15:42:04.577895"
> myformat.POSIXct(x, 6)
[1] "2012-12-14 15:42:04.577895"

From above:

> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"
> form(t1)
[1] "2011-10-11 07:49:36.3"

> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"

> form(t2)
[1] "2011-10-12"
> form(t2, f)
[1] "2011-10-12 00:00:00.0"

The real fun comes in 2038 for some dates. I assume this is because we lose one more bit of precision in the mantissa. Note the value of the seconds field.

> t3 <- as.POSIXct('2038-12-14 15:42:04.577895')
> format(t3)
[1] "2038-12-14 15:42:05.5"
> myformat.POSIXct(t3, 1)
[1] "2038-12-14 15:42:05.6"
> form(t3)
[1] "2038-12-14 15:42:04.6"
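
For context on the lost bit, the number of seconds since the epoch crosses 2^31 in January 2038, and at that point the spacing between adjacent doubles doubles. The sketch below (UTC assumed) only illustrates this change in resolution. It does not by itself explain a seconds field that is off by a whole second.

```r
cutoff <- 2^31
print(format(as.POSIXct(cutoff, origin = "1970-01-01", tz = "UTC"),
             "%Y-%m-%d %H:%M:%S"))   # "2038-01-19 03:14:08"

step <- 2^-22   # spacing of doubles just below 2^31
print((cutoff - 1) + step != (cutoff - 1))   # TRUE: the step is still representable
print(cutoff + step == cutoff)               # TRUE: the step is lost at and above 2^31
```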

This code seems to work for the other edge cases I have tried. What format.POSIXct and myformat.POSIXct in Aaron's answer have in common is the conversion from POSIXct to POSIXlt with the seconds field left intact.

This points to a bug in that conversion. I am not using any data that is not available to as.POSIXlt().

Update

The bug is in src/main/datetime.c:434, in the static function localtime0, but I am not yet sure what the correct fix is:

Lines 433-434:

day = (int) floor(d/86400.0);
left = (int) (d - day * 86400.0 + 0.5);

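The extra 0.5, which is there to round the value, is the cause. Note that the sub-second value of t3 above exceeds .5. localtime0 deals in whole seconds only, and the sub-seconds are added in after localtime0 returns.

localtime0 returns correct results if the double it is given is an integer value.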
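
The effect of the extra 0.5 can be emulated in R. This is only a rough sketch of the two quoted C lines, not the actual R source, and it assumes t3 is parsed in UTC. The names `left_rounded` and `left_truncated` are illustrative.

```r
# Assumption: seconds since the epoch for t3, parsed in UTC
d <- as.numeric(as.POSIXct("2038-12-14 15:42:04.577895", tz = "UTC"))

day <- floor(d / 86400)
left_rounded   <- as.integer(d - day * 86400 + 0.5)  # mirrors the quoted C line
left_truncated <- as.integer(d - day * 86400)        # same line without the 0.5

print(left_rounded %% 60)    # 5: the whole-second part has been rounded up
print(left_truncated %% 60)  # 4: the whole-second part of the original time
```

If the sub-second part (.577895) is then added on top of the rounded whole seconds, the result is 05.577895 rather than 04.577895, which matches the shifted seconds field shown for t3.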
