erlang - How do I count unique users in a log file?

Question

Given a txt log file in the following format:

USER_A timestamp1 otherstuff
USER_B timestamp2 otherstuff
USER_C timestamp3 otherstuff
USER_A timestamp4 otherstuff
USER_A timestamp5 otherstuff
USER_C timestamp6 otherstuff
USER_B timestamp7 otherstuff

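How can I count the number of distinct users in Erlang? I was thinking of reading the file line by line and using the proplists module: each user would be a key, and the value would be the number of occurrences. Once the file has been read, I would call: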

length(proplists:get_keys(List)).

Is this the right way to achieve my result?
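For reference, the approach described in the question could be sketched roughly as follows. The module name `proplist_count` and the helpers `tally/1` and `bump/2` are hypothetical names chosen for illustration, and the input is assumed to be a list of usernames already extracted from the lines. Every update scans the list, so this is likely to get slow when there are many distinct users.

```erlang
-module(proplist_count).
-export([tally/1, count_unique/1]).

%% Build a proplist of {User, Occurrences} pairs from a list of usernames.
tally(Users) ->
    lists:foldl(fun bump/2, [], Users).

%% Increment the counter for User, starting from 0 if it is not present yet.
bump(User, Acc) ->
    Count = proplists:get_value(User, Acc, 0),
    [{User, Count + 1} | proplists:delete(User, Acc)].

%% The number of distinct users is the number of keys in the proplist.
count_unique(Users) ->
    length(proplists:get_keys(tally(Users))).
```

With the usernames from the sample log, `proplist_count:count_unique(["USER_A", "USER_B", "USER_C", "USER_A", "USER_A", "USER_C", "USER_B"])` returns 3.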

score 4 · Accepted Answer

I would also use the sets module for this, because it is fast and a set contains no duplicates.

The following code does the job:

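% Read the whole file into memory (fine for small logs).
{ok, Bin} = file:read_file("test"),
List = binary_to_list(Bin),
% The first space-separated token of each line is the username.
Usernames = [hd(string:tokens(X, " ")) || X <- string:tokens(List, [$\n])],
% A set keeps each username once, so its size is the distinct count.
sets:size(sets:from_list(Usernames)).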

Edit: removed the one-liner, since it added no value.

score 3 · Accepted Answer

It would probably be better to store the usernames in a set from the sets module and then use sets:size/1.

score 1 · Accepted Answer

Log files are often large, so consider processing the file one line at a time in a recursive function.

% Count the number of distinct users in the file named Filename
count_users(Filename) ->
    {ok, File} = file:open(Filename, [read, raw, read_ahead]),
    Usernames = usernames(File, sets:new()),
    file:close(File),
    sets:size(Usernames).

% Add all users in File, from the current file pointer position and forward,
% to Set.
% Side-effects: File is read and the file pointer is moved to the end.
usernames(File, Set) ->
    case file:read_line(File) of
        {ok, Line} ->
            Username = hd(string:tokens(Line, " ")),
            usernames(File, sets:add_element(Username, Set));
        eof ->
            Set
    end.

You just call it like this: count_users("logfile").

Note that usernames/2 must be tail recursive for this to work efficiently. Otherwise it would just consume even more memory.
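The version above assumes that every line starts with a username followed by a space. A slightly more defensive sketch, under the assumption that the log may also contain blank lines or lines with only a username, could look like this. The module name `user_count` is hypothetical, and the recursive calls are kept in tail position so the loop runs in constant stack space.

```erlang
-module(user_count).
-export([count_users/1]).

%% Count the number of distinct users in the file named Filename.
count_users(Filename) ->
    {ok, File} = file:open(Filename, [read, raw, read_ahead]),
    try
        sets:size(usernames(File, sets:new()))
    after
        %% Close the file even if reading fails.
        file:close(File)
    end.

%% Add the first token of every remaining line in File to Set.
usernames(File, Set) ->
    case file:read_line(File) of
        {ok, Line} ->
            %% read_line keeps the trailing newline, so treat "\r" and
            %% "\n" as separators along with the space.
            case string:tokens(Line, " \r\n") of
                [Username | _] ->
                    usernames(File, sets:add_element(Username, Set));
                [] ->
                    %% Blank line: nothing to add.
                    usernames(File, Set)
            end;
        eof ->
            Set
    end.
```

It is called the same way, for example `user_count:count_users("logfile")`. A read error still crashes with a `case_clause` error, which is often acceptable in Erlang, but the `after` clause makes sure the file handle is released first.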
