erlang - Unexpected behavior of io:fread in Erlang

Question

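This is an Erlang question.

I have run into some unexpected behavior by io:fread.

I was wondering whether someone could check if there is something wrong with the way I use io:fread, or if there is a bug in io:fread.

I have a text file which contains a "triangle of numbers" as follows: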

59
73 41
52 40 09
26 53 06 34
10 51 87 86 81
61 95 66 57 25 68
90 81 80 38 92 67 73
30 28 51 76 81 18 75 44
...

There is a single space between each pair of numbers, and each line ends with a carriage-return new-line pair.

I use the following Erlang program to read this file into a list.

-module(euler67).
-author('Cayle Spandon').

-export([solve/0]).

solve() ->
    {ok, File} = file:open("triangle.txt", [read]),
    Data = read_file(File),
    ok = file:close(File),
    Data.

read_file(File) ->
    read_file(File, []).

read_file(File, Data) ->
    case io:fread(File, "", "~d") of
        {ok, [N]} ->
            read_file(File, [N | Data]);
        eof ->
            lists:reverse(Data)
    end.

The output of this program is:

(erlide@cayle-spandons-computer.local)30> euler67:solve().
[59,73,41,52,40,9,26,53,6,3410,51,87,86,8161,95,66,57,25,
 6890,81,80,38,92,67,7330,28,51,76,81|...]

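Note that the last number of the fourth line (34) and the first number of the fifth line (10) have been merged into a single number, 3410.

When I dump the text file using "od" there is nothing special about those lines; they end with cr-nl just like any other line:

> od -ta triangle.txt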
0000000 5 9 cr nl 7 3 sp 4 1 cr nl 5 2 sp 4 0
0000020 sp 0 9 cr nl 2 6 sp 5 3 sp 0 6 sp 3 4
0000040 cr nl 1 0 sp 5 1 sp 8 7 sp 8 6 sp 8 1
0000060 cr nl 6 1 sp 9 5 sp 6 6 sp 5 7 sp 2 5
0000100 sp 6 8 cr nl 9 0 sp 8 1 sp 8 0 sp 3 8
0000120 sp 9 2 sp 6 7 sp 7 3 cr nl 3 0 sp 2 8
0000140 sp 5 1 sp 7 6 sp 8 1 sp 1 8 sp 7 5 sp
0000160 4 4 cr nl 8 4 sp 1 4 sp 9 5 sp 8 7 sp

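One interesting observation is that some of the numbers for which the problem occurs happen to fall on a 16-byte boundary in the text file (but not all of them; 6890, for example, does not).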

score 2 · Accepted Answer

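Besides the fact that it looks like a bug in one of the erlang libs, I think you can (very) easily circumvent the problem.

Given that your file is line-oriented, I think best practice is to process it line by line as well.

Consider the following construction. It works nicely on an unpatched erlang, and because it uses lazy evaluation it can handle files of arbitrary length without having to read all of it into memory first. The module contains an example of a function to apply to each line: it turns a line of text representations of integers into a list of integers.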


-module(liner).
-author("Harro Verkouter").
-export([liner/2, integerize/0, lazyfile/1]).

% Applies a function to all lines of the file
% before reducing (foldl).
liner(File, Fun) ->
    lists:foldl(fun(X, Acc) -> Acc++Fun(X) end, [], lazyfile(File)).

% Reads the lines of a file in a lazy fashion
lazyfile(File) ->
    {ok, Fd} = file:open(File, [read]),
    lazylines(Fd).
% Actually, this one does the lazy read ;)
lazylines(Fd) ->
    case io:get_line(Fd, "") of
        eof -> file:close(Fd), [];
        {error, Reason} ->
            file:close(Fd), exit(Reason);
        L ->
            [L|lazylines(Fd)]
    end.

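% Take a line of space separated integers (string) and transform
% them into a list of integers. "\r" is treated as a separator too,
% so lines ending in CR-LF do not leave "\r" glued to the last token.
integerize() ->
    fun(X) ->
        lists:map(fun(Y) -> list_to_integer(Y) end,
                string:tokens(X, " \r\n")) end.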


Example usage:
Eshell V5.6.5  (abort with ^G)
1> c(liner).
{ok,liner}
2> liner:liner("triangle.txt", liner:integerize()).
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
 68,90,81,80,38,92,67,73,30|...]

And as a bonus, you can easily fold over the lines of any (line-oriented) file without running out of memory :)

6> lists:foldl( fun(X, Acc) -> 
6>                  io:format("~.2w: ~s", [Acc,X]), Acc+1
6>                  end,
6>              1,  
6>              liner:lazyfile("triangle.txt")).                                        
 1: 59
 2: 73 41
 3: 52 40 09
 4: 26 53 06 34
 5: 10 51 87 86 81
 6: 61 95 66 57 25 68
 7: 90 81 80 38 92 67 73
 8: 30 28 51 76 81 18 75 44
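
One caveat on the memory claim: Erlang evaluates eagerly, so as far as I can tell lazylines/1 as written reads every line into a list before lists:foldl/3 sees the first one. If constant memory matters, the fold can be done while reading. Below is a minimal sketch of that idea; the module name linefold and the functions fold_lines/3 and fold_loop/3 are hypothetical names chosen for illustration, not part of the module above.

```erlang
-module(linefold).
-export([fold_lines/3]).

%% Hypothetical helper: folds Fun over the lines of the file named Path,
%% holding only the current line and the accumulator in memory.
fold_lines(Fun, Acc0, Path) ->
    {ok, Fd} = file:open(Path, [read]),
    try
        fold_loop(Fd, Fun, Acc0)
    after
        %% Close the file whether the fold finished or raised.
        file:close(Fd)
    end.

%% Tail-recursive loop: read one line, fold it in, continue.
fold_loop(Fd, Fun, Acc) ->
    case io:get_line(Fd, "") of
        eof -> Acc;
        {error, Reason} -> exit(Reason);
        Line -> fold_loop(Fd, Fun, Fun(Line, Acc))
    end.
```

For example, linefold:fold_lines(fun(_Line, N) -> N + 1 end, 0, "triangle.txt") counts the lines of the file without building a list of them.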

Cheers, h.

score 0 · Accepted Answer

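I noticed that there are multiple instances where two numbers are merged, and it appears to happen at the line boundary of every line from the fourth line onward.

I found that if you add a whitespace character to the beginning of every line starting from the fifth, as in: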

59
73 41
52 40 09
26 53 06 34
 10 51 87 86 81
 61 95 66 57 25 68
 90 81 80 38 92 67 73
 30 28 51 76 81 18 75 44
...

the numbers get parsed properly:

39> euler67:solve().
[59,73,41,52,40,9,26,53,6,34,10,51,87,86,81,61,95,66,57,25,
 68,90,81,80,38,92,67,73,30|...]

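It also works if you add the whitespace to the beginning of the first four lines as well.

It's more of a workaround than an actual solution, but it works. I'd like to figure out how to set up the format string for io:fread so that we wouldn't have to do this.

UPDATE: Here is a workaround that won't force you to change the file. It assumes that all numbers are two characters wide (< 100):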

read_file(File, Data) ->
    case io:fread(File, "", "~d") of
        {ok, [N]} ->
            if
                N >= 100 ->
                    % Two two-digit numbers were merged across a line break.
                    First = N div 100,
                    Second = N rem 100,
                    % Data is built in reverse, so Second goes in front of First.
                    read_file(File, [Second, First | Data]);
                true ->
                    read_file(File, [N | Data])
            end;
        eof ->
            lists:reverse(Data)
    end.

Basically, the code catches any number that is the concatenation of two numbers across a newline and splits it back into two.
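
To make the arithmetic concrete, here is a small sketch of just the splitting step. The module split_demo and the function split_merged/1 are hypothetical names used only for this illustration.

```erlang
-module(split_demo).
-export([split_merged/1]).

%% Hypothetical helper: a value of 100 or more is assumed to be two
%% two-digit numbers fused together, e.g. 3410 is 34 followed by 10.
split_merged(N) when N >= 100 ->
    [N div 100, N rem 100];
%% Anything below 100 is taken to be a single number.
split_merged(N) ->
    [N].
```

Here split_merged(3410) gives [34,10] and split_merged(59) gives [59]. One limitation of the approach: if a merged pair starts with 00, the result is below 100 and cannot be told apart from a single number.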

Again, it's a hack that hints at a possible bug in io:fread, but it should do the job.

UPDATE AGAIN: The above only works for two-digit inputs, but since the example pads every number (even those < 10) to two digits, it works for this example.
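
If the two-digit assumption is a concern, one alternative is to avoid io:fread altogether and split the text on whitespace. The following is a rough sketch under the assumption that the file is small enough to read into memory in one go; the module triangle_tokens and its functions read_ints/1 and parse_ints/1 are hypothetical names for illustration.

```erlang
-module(triangle_tokens).
-export([read_ints/1, parse_ints/1]).

%% Hypothetical helper: reads the whole file and returns its integers.
read_ints(Path) ->
    {ok, Bin} = file:read_file(Path),
    parse_ints(binary_to_list(Bin)).

%% Splits on spaces, carriage returns and newlines, then converts each
%% token. Works for numbers of any width, not just two digits.
parse_ints(Text) ->
    [list_to_integer(T) || T <- string:tokens(Text, " \r\n")].
```

With the file from the question, triangle_tokens:read_ints("triangle.txt") should return the numbers in file order, with no post-processing of merged values needed.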
