c++ - What is a good way to round double-precision values to a (somewhat) lower precision?

Question

My problem is that I have to use a third-party function/algorithm which takes an array of double-precision values as input, but apparently can be sensitive to very small changes in the input data. However, for my application I have to get identical results for inputs that are (almost) identical! In particular, I have two test input arrays which are identical up to the 5th position after the decimal point, and I still get different results. So whatever causes the "problem" must be after the 5th position after the decimal point.

Now my idea was to round the input to a slightly lower precision in order to get identical results from inputs that are very similar, yet not 100% identical. Therefore I am looking for a good/efficient way to round double-precision values to a slightly lower precision. So far I am using this code to round to the 9th position after the decimal point:

double x = original_input();
x = double(qRound(x * 1000000000.0)) / 1000000000.0;

Here qRound() is the normal double-to-integer rounding function from Qt. This code works, and it indeed resolved my problem with the two "problematic" test sets. But: is there a more efficient way to do this?
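A minimal sketch of the same fixed-decimal rounding using only the C++ standard library; the name round_to_decimals is hypothetical. One caveat worth checking in the snippet above: Qt's qRound(double) returns an int, so x * 1000000000.0 no longer fits once |x| exceeds about 2.1, which is well inside the -100.0 to 100.0 range the question mentions (Qt also provides qRound64). std::round keeps the scaled value as a double and avoids that limit.

```cpp
#include <cmath>

// Round x to `places` digits after the decimal point.
// std::round works on doubles, so large scaled values cannot overflow an int.
double round_to_decimals(double x, int places) {
    const double scale = std::pow(10.0, places);
    return std::round(x * scale) / scale;
}
```

For example, round_to_decimals(1.0000000000001, 9) and round_to_decimals(1.0000000000002, 9) both give 1.0. Like the original, it fixes the number of decimal places regardless of the magnitude of x.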

Also, what bothers me: rounding to the 9th position after the decimal point might be reasonable for input data in the -100.0 to 100.0 range (as is the case with my current input data). But it may be too much (i.e., too much precision loss) for input data in the -0.001 to 0.001 range, for example. Unfortunately, I don't know what range my input values will be in for other cases...

After all, I think what I need is a function that does the following: cut off, by proper rounding, a given double-precision value X to at most L-N positions after the decimal point, where L is the number of positions after the decimal point that double precision can store (represent) for the given value, and N is fixed, like 3. This means that for "small" values we would allow more positions after the decimal point than for "large" values. In other words, I would like to round the 64-bit floating-point value to a (somewhat) smaller precision, like 60 or 56 bits, and then store it back into a 64-bit double value.

Does this make sense to you? And if so, can you suggest a way to do this (efficiently) in C++?

Thanks in advance!

score 1 · Accepted Answer

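The business scenario isn't clear from the question. Still, it sounds like you are trying to make sure that values lie within an acceptable tolerance. Instead of ==, you can check whether the second value lies within a certain percentage range of the first (for example +/- 0.001%).

If you can't fix the percentage of the range (i.e., it depends on the length of the precision: for example, 0.001 percent is enough for 2 decimal places, but 4 decimal places need 0.000001 percent), you can derive it as 1/mantissa.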
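A minimal sketch of this suggestion in C++, assuming the tolerance is taken relative to the larger of the two magnitudes. The name nearly_equal and the default of 1e-5 (the 0.001% from the example) are illustrative choices, not part of the answer.

```cpp
#include <algorithm>
#include <cmath>

// Compare a and b with a relative tolerance instead of ==.
// rel_tol = 1e-5 corresponds to +/- 0.001%.
bool nearly_equal(double a, double b, double rel_tol = 1e-5) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

Keep in mind that this comparison is not transitive (a close to b and b close to c does not imply a close to c). It can tell you whether two inputs are close, but it cannot by itself make the third-party function receive identical inputs.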

score 1 · Accepted Answer

If you look at the double bit layout, you can see how to combine it with a bit of bitwise magic to implement fast (binary) rounding to arbitrary precision. You have the following bit layout:

SEEEEEEEEEEEFFFFFFFFFFF.......FFFFFFFFFF

where S is the sign bit, the Es are exponent bits, and the Fs are fraction bits. You can make a bitmask like this:

11111111111111111111111.......1111000000

and bitwise-and (&) the two together. The result is a rounded version of the original input:

SEEEEEEEEEEEFFFFFFFFFFF.......FFFF000000

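You can control how much data gets chopped off by changing the number of trailing zeros: more zeros = more rounding, fewer = less. You also get the other effect you want: because the "place" each bit corresponds to is determined by the exponent, small input values are affected less (in absolute terms) than large ones, while every value loses roughly the same relative precision.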
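A minimal sketch of the mask idea in C++, assuming IEEE 754 binary64 doubles (52 fraction bits). The name truncate_mantissa and the drop_bits parameter are hypothetical. std::memcpy is used to reinterpret the bits, which avoids the undefined behavior of a pointer cast (C++20 code could use std::bit_cast instead).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Zero the lowest `drop_bits` bits of the 52-bit fraction field.
// This truncates the magnitude toward zero; sign and exponent are untouched.
double truncate_mantissa(double value, int drop_bits) {
    if (drop_bits <= 0 || !std::isfinite(value)) return value;  // leave inf/NaN alone
    if (drop_bits > 52) drop_bits = 52;                         // never touch exponent bits
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint64_t mask = ~((std::uint64_t{1} << drop_bits) - 1);  // 111...1000...0
    bits &= mask;
    std::memcpy(&value, &bits, sizeof bits);
    return value;
}
```

For example, truncate_mantissa(1.0 + 1e-10, 40) returns exactly 1.0, and truncate_mantissa(0.1, 52) returns 0.0625 (0.1 is 1.6 x 2^-4, and only the implicit leading 1 survives).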

I hope that helps!

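Caveat: technically this is truncation rather than true rounding (every value moves toward zero, regardless of how close it is to the other possible result), but hopefully it is just as useful in your case.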
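If true rounding is wanted, one common variation (not part of the answer above) is to add half of the dropped unit to the bit pattern before masking. A carry out of the fraction field simply increments the exponent, which gives the correct result (for example, 1.75 with one fraction bit kept becomes 2.0). The sketch below, with the hypothetical name round_mantissa, rounds ties away from zero and again assumes IEEE 754 binary64; a value whose rounded mantissa would overflow the largest exponent (near DBL_MAX) becomes infinity.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Round the magnitude to the nearest value with `drop_bits` fewer fraction bits
// (ties away from zero) by adding half of the dropped unit before masking.
double round_mantissa(double value, int drop_bits) {
    if (drop_bits <= 0 || !std::isfinite(value)) return value;
    if (drop_bits > 52) drop_bits = 52;
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint64_t half = std::uint64_t{1} << (drop_bits - 1);
    const std::uint64_t mask = ~((std::uint64_t{1} << drop_bits) - 1);
    bits = (bits + half) & mask;  // a carry may propagate into the exponent field
    std::memcpy(&value, &bits, sizeof bits);
    return value;
}
```

For example, round_mantissa(1.0 - 1e-12, 40) returns exactly 1.0, because the carry rolls over into the exponent.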

score 1 · Accepted Answer

Thanks for the input so far.

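But after searching further, I came across the frexp() and ldexp() functions! These functions give access to the "mantissa" and "exponent" of a given double value, and they can also convert from mantissa + exponent back to a double. Now I just need to round the mantissa: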

double value = original_input();
static const double FACTOR = 32.0;
int exponent;
double temp = double(round(frexp(value, &exponent) * FACTOR));
value = ldexp(temp / FACTOR, exponent);

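I am not sure whether this is efficient at all, but it gives reasonable results (left column: input value, right column: rounded value):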

0.000010000000000   0.000009765625000
0.000010100000000   0.000010375976563
0.000010200000000   0.000010375976563
0.000010300000000   0.000010375976563
0.000010400000000   0.000010375976563
0.000010500000000   0.000010375976563
0.000010600000000   0.000010375976563
0.000010700000000   0.000010986328125
0.000010800000000   0.000010986328125
0.000010900000000   0.000010986328125
0.000011000000000   0.000010986328125
0.000011100000000   0.000010986328125
0.000011200000000   0.000010986328125
0.000011300000000   0.000011596679688
0.000011400000000   0.000011596679688
0.000011500000000   0.000011596679688
0.000011600000000   0.000011596679688
0.000011700000000   0.000011596679688
0.000011800000000   0.000011596679688
0.000011900000000   0.000011596679688
0.000012000000000   0.000012207031250
0.000012100000000   0.000012207031250
0.000012200000000   0.000012207031250
0.000012300000000   0.000012207031250
0.000012400000000   0.000012207031250
0.000012500000000   0.000012207031250
0.000012600000000   0.000012817382813
0.000012700000000   0.000012817382813
0.000012800000000   0.000012817382813
0.000012900000000   0.000012817382813
0.000013000000000   0.000012817382813
0.000013100000000   0.000012817382813
0.000013200000000   0.000013427734375
0.000013300000000   0.000013427734375
0.000013400000000   0.000013427734375
0.000013500000000   0.000013427734375
0.000013600000000   0.000013427734375
0.000013700000000   0.000013427734375
0.000013800000000   0.000014038085938
0.000013900000000   0.000014038085938
0.000014000000000   0.000014038085938
0.000014100000000   0.000014038085938
0.000014200000000   0.000014038085938
0.000014300000000   0.000014038085938
0.000014400000000   0.000014648437500
0.000014500000000   0.000014648437500
0.000014600000000   0.000014648437500
0.000014700000000   0.000014648437500
0.000014800000000   0.000014648437500
0.000014900000000   0.000014648437500
0.000015000000000   0.000015258789063
0.000015100000000   0.000015258789063
0.000015200000000   0.000015258789063
0.000015300000000   0.000015869140625
0.000015400000000   0.000015869140625
0.000015500000000   0.000015869140625
0.000015600000000   0.000015869140625
0.000015700000000   0.000015869140625
0.000015800000000   0.000015869140625
0.000015900000000   0.000015869140625
0.000016000000000   0.000015869140625
0.000016100000000   0.000015869140625
0.000016200000000   0.000015869140625
0.000016300000000   0.000015869140625
0.000016400000000   0.000015869140625
0.000016500000000   0.000017089843750
0.000016600000000   0.000017089843750
0.000016700000000   0.000017089843750
0.000016800000000   0.000017089843750
0.000016900000000   0.000017089843750
0.000017000000000   0.000017089843750
0.000017100000000   0.000017089843750
0.000017200000000   0.000017089843750
0.000017300000000   0.000017089843750
0.000017400000000   0.000017089843750
0.000017500000000   0.000017089843750
0.000017600000000   0.000017089843750
0.000017700000000   0.000017089843750
0.000017800000000   0.000018310546875
0.000017900000000   0.000018310546875
0.000018000000000   0.000018310546875
0.000018100000000   0.000018310546875
0.000018200000000   0.000018310546875
0.000018300000000   0.000018310546875
0.000018400000000   0.000018310546875
0.000018500000000   0.000018310546875
0.000018600000000   0.000018310546875
0.000018700000000   0.000018310546875
0.000018800000000   0.000018310546875
0.000018900000000   0.000018310546875
0.000019000000000   0.000019531250000
0.000019100000000   0.000019531250000
0.000019200000000   0.000019531250000
0.000019300000000   0.000019531250000
0.000019400000000   0.000019531250000
0.000019500000000   0.000019531250000
0.000019600000000   0.000019531250000
0.000019700000000   0.000019531250000
0.000019800000000   0.000019531250000
0.000019900000000   0.000019531250000
0.000020000000000   0.000019531250000
0.000020100000000   0.000019531250000

After all, this seems to be what I was looking for:

http://img833.imageshack.us/img833/9055/clipboard09.png

Now I need to find a suitable FACTOR value for my function...
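One way to think about FACTOR, offered as a sketch: frexp() returns a mantissa m with 0.5 <= |m| < 1, so FACTOR = 2^k keeps k significant bits of the value whatever its magnitude. A double carries 53 significant bits, so k = 53 - N drops N of them, which matches the question's "fixed N" idea. The hypothetical round_to_bits below performs the same computation as the code above with FACTOR = 2^bits. It uses ldexp() for the scaling so that the only rounding step is std::round, which rounds halfway cases away from zero.

```cpp
#include <cmath>

// Keep `bits` significant bits of value, rounding to nearest
// (same as the frexp/ldexp code above with FACTOR = 2^bits).
double round_to_bits(double value, int bits) {
    if (value == 0.0 || !std::isfinite(value)) return value;
    int exponent = 0;
    const double mantissa = std::frexp(value, &exponent);          // value = mantissa * 2^exponent
    const double scaled = std::round(std::ldexp(mantissa, bits));  // mantissa * 2^bits, rounded
    return std::ldexp(scaled, exponent - bits);                    // undo both scalings
}
```

With bits = 5 this corresponds to FACTOR = 32 in the original code; for example, round_to_bits(1e-5, 5) gives 21 x 2^-21.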

Any comments or suggestions?

score 0 · Accepted Answer

I know this question is quite old, but I was also looking for a way to round double values to a lower precision. Maybe this answer will help someone.

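Imagine a floating-point number in binary representation, for example 1101.101. The bits 1101 represent the integer part of the number and are weighted, from left to right, with 2^3, 2^2, 2^1, 2^0. The bits 101 of the fractional part are weighted with 2^-1, 2^-2, 2^-3, which equal 1/2, 1/4, 1/8.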

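So what decimal error is produced when you cut off the last 2 bits after the point? In this example it is 0.125, because the last bit (weight 2^-3) is set, while the bit weighted 2^-2 is 0. If neither bit were set, the error would be 0. So for this number the error is <= 0.125, and for any pattern in these two positions it is at most 1/4 + 1/8 = 0.375.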

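Now let's think about it more generally. If the mantissa were infinitely long, the fractional part with all bits set would converge to 1 (see here). In reality you only have 52 bits (see here), so the sum is "almost" 1, i.e., <= 1. (Note that the integer part also occupies mantissa space! But if you assume numbers like 1.5, which is 1.1 in binary representation, the mantissa stores only the part after the point.)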

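Since cutting off all fraction bits gives an error of <= 1, cutting off all but the first bit to the right of the point gives an error of <= 1/2, because that bit is weighted with 2^-1. Keeping one more bit reduces the error to <= 1/4.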

This can be described by the function f(x) = 1/2^(52-x), where x is the number of cut-off bits counted from the right, and y = f(x) is the upper bound of the resulting error.

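Rounding to 2 decimal places means "grouping" the numbers in hundredths. This can be expressed with the function above: 1/100 >= 1/2^(52-x), meaning that if you cut off x bits, the resulting error is bounded by one hundredth. Solving this inequality for x gives 52 - log2(100) >= x, where 52 - log2(100) is about 45.36. So by cutting off no more than 45 bits, a "precision" of two decimal (!) digits after the point is guaranteed.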

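In general, the mantissa consists of an integer part and a fractional part. Let's call their lengths i and f. A positive exponent gives i, and 52 = f + i holds. The solution of the inequality above then becomes 52 - i - log2(10^n) >= x, where n is the number of decimal digits of precision, because the fractional part is now only f = 52 - i bits long and we must stop truncating the mantissa where the fractional part ends!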
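The step from this inequality to a formula for the largest x can be written out with the same symbols (f fraction bits, x cut-off bits, n decimal digits). Since f is an integer, the largest integer x satisfying the bound is f minus the ceiling:

```latex
2^{-(f-x)} \le 10^{-n}
\;\Longleftrightarrow\;
x \le f - n\log_2 10 = f - \frac{n}{\log_{10} 2}
\;\Longrightarrow\;
x_{\max} = f - \left\lceil \frac{n}{\log_{10} 2} \right\rceil
```

For f = 52 and n = 2 this gives 52 - 7 = 45, which agrees with the 45.36 bound above.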

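Applying the logarithm rules (log2(10^n) = n / log10(2)), the maximum number of bits that can be cut off can be computed as:

x = f - (uint16_t) ceil(n / 0.3010299956639812);

where the constant is log10(2). The truncation itself can then be done with: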

mantissa >>= x; mantissa <<= x;

If x is greater than f, remember to shift only by f; otherwise you will affect the integer part of the mantissa.
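The following puts the steps of this answer together as a minimal sketch in C++, assuming IEEE 754 binary64 doubles; the function name truncate_to_decimals is hypothetical. It reads the exponent with std::frexp() and takes i as the unbiased exponent E (value = 1.F x 2^E). That matches "a positive exponent gives i" and extends it to negative exponents, an extension that is an assumption of this sketch: for values below 1, f becomes larger than 52. It then computes x as described, clamps it to the 52 stored fraction bits, and applies the shift pair to the raw bit pattern. Only the low x bits are cleared, so the sign and exponent are untouched.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Truncate value so that about n decimal digits after the point are kept
// (truncation error < 10^-n) by clearing low fraction bits of the bit pattern.
double truncate_to_decimals(double value, int n) {
    if (value == 0.0 || !std::isfinite(value)) return value;
    int frexp_exponent = 0;
    std::frexp(value, &frexp_exponent);        // |value| in [2^(e-1), 2^e)
    const int i = frexp_exponent - 1;          // unbiased exponent E of value = 1.F * 2^E
    const int f = std::min(52 - i, 1074);      // lowest stored bit weighs 2^-f (2^-1074 for subnormals)
    int x = f - static_cast<int>(std::ceil(n / 0.3010299956639812));  // constant = log10(2)
    if (x > 52) x = 52;                        // cannot clear more than the stored fraction field
    if (x <= 0) return value;                  // no fraction bit can be dropped
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits >>= x;                                // the answer's "mantissa >>= x; mantissa <<= x;"
    bits <<= x;
    std::memcpy(&value, &bits, sizeof bits);
    return value;
}
```

For example, truncate_to_decimals(3.14159, 2) keeps 7 bits after the binary point and returns 3.140625, which is within 0.01 of the input. Like the bitmask answer, this truncates toward zero rather than rounding to nearest.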
