ruby - Ruby: What is an elegant way to pick a random line from a text file?

Question

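I've seen some really beautiful examples of Ruby, and I'm trying to shift my thinking so that I can produce them instead of just admiring them. Here's the best I could come up with for picking a random line out of a file: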

def pick_random_line
  random_line = nil
  File.open("data.txt") do |file|
    file_lines = file.readlines()
    random_line = file_lines[Random.rand(0...file_lines.size())]
  end 

  random_line
end

I feel like it must be possible to do this in a shorter, more elegant way without storing the entire file's contents in memory. Is there?

score 38 · Accepted Answer

There is already a random entry selector built into the Ruby Array class: sample().

def pick_random_line
  File.readlines("data.txt").sample
end
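
A small variation on the above, as a sketch only: the method name pick_random_line_from and the path and rng parameters are illustrative and not part of the original answer. Passing a seeded Random makes the pick reproducible, which helps in tests. Note that it still reads the whole file into memory, just like the version above.

```ruby
# Hypothetical helper: same idea as above, with the file name as a parameter.
# `rng` is an illustrative keyword so callers can pass a seeded generator.
def pick_random_line_from(path, rng: Random.new)
  # chomp: true strips the trailing newline from each line (Ruby 2.4+).
  # Array#sample returns nil when the file has no lines.
  File.readlines(path, chomp: true).sample(random: rng)
end
```

Calling it twice with Random.new(7) on the same file should return the same line both times.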

score 14 · Accepted Answer

You can do it without storing anything except the most recently read line and the current candidate for the random line to return.

def pick_random_line
  chosen_line = nil
  File.foreach("data.txt").each_with_index do |line, number|
    chosen_line = line if rand < 1.0/(number+1)
  end
  return chosen_line
end

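So the first line is picked with probability 1/1 = 1. The second line is picked with probability 1/2, so half the time it keeps the first line and half the time it switches to the second.

Then the third line is picked with probability 1/3. So 1/3 of the time it picks the third line, and the other 2/3 of the time it keeps whichever of the first two it had picked. Since each of those had a 50% chance of being the candidate as of line 2, each of them ends up with a 1/3 chance of being the candidate as of line 3.

And so on. At line N, every line from 1 to N has an equal 1/N chance of being the chosen one, and this holds all the way through the file (as long as the file isn't so large that 1/(number of lines in the file) is smaller than epsilon :)). You also make only one pass through the file and never store more than two lines at a time.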
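
To check the 1/N argument empirically, here is a minimal sketch that applies the same rule to an in-memory list instead of a file. The name pick_uniform and the rng keyword are assumptions made for illustration; they are not part of the original answer.

```ruby
# Pick one element uniformly at random from any enumerable in a single pass.
# `rng` is an illustrative parameter so the run can be seeded.
def pick_uniform(enum, rng: Random.new)
  chosen = nil
  enum.each_with_index do |item, index|
    # Item number index+1 replaces the current candidate with probability 1/(index+1).
    chosen = item if rng.rand < 1.0 / (index + 1)
  end
  chosen
end

# Empirical check: each of the 4 items should be picked about 25% of the time.
rng = Random.new(42)
counts = Hash.new(0)
20_000.times { counts[pick_uniform(%w[a b c d], rng: rng)] += 1 }
counts.sort.each { |item, n| puts format("%s %.3f", item, n / 20_000.0) }
```

The printed frequencies should all sit close to 0.25, whatever the position of the item in the list.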

EDIT: You're not going to get a really concise solution with this algorithm, but you can turn it into a one-liner if you want to:

def pick_random_line
  File.foreach("data.txt").each_with_index.reduce(nil) { |picked,pair| 
    rand < 1.0/(1+pair[1]) ? pair[0] : picked }
end

score 4 · Accepted Answer

This function does exactly what you need.

It's not a one-liner, but it works with text files of any size (except zero size, maybe :).

def random_line(filename)
  blocksize, line = 1024, ""
  File.open(filename) do |file|
    initial_position = rand(File.size(filename)-1)+1 # random pointer position. Not a line number!
    pos = Array.new(2).fill( initial_position ) # array [prev_position, current_position]
    # Find beginning of current line
    begin
      pos.push([pos[1]-blocksize, 0].max).shift # calc new position
      file.pos = pos[1] # move pointer backward within file
      offset = (n = file.read(pos[0] - pos[1]).rindex(/\n/) ) ? n+1 : nil
    end until pos[1] == 0 || offset
    file.pos = pos[1] + offset.to_i
    # Collect line text till the end
    begin
      data = file.read(blocksize)
      line.concat((p = data.index(/\n/)) ? data[0,p.to_i] : data)
    end until file.eof? or p
  end
  line
end

Try it:

filename = "huge_text_file.txt"
100.times { puts random_line(filename).force_encoding("UTF-8") }

Negligible (imho) drawbacks:

The longer a line is, the more likely it is to be picked.
The "\r" line separator is not taken into account (Windows-specific). Use files with Unix-style line endings!
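
To get a feel for how large the first drawback can be, here is a rough sketch. It assumes the chance of picking a line is simply proportional to its length in bytes, which only approximates what random_line does near the file boundaries. The helper name offset_pick_probabilities is made up for this illustration.

```ruby
# Approximate chance that a uniformly random byte offset lands inside each line.
# Assumption: selection probability is proportional to the line's byte length.
def offset_pick_probabilities(lines)
  total = lines.sum(&:bytesize).to_f
  lines.map { |line| line.bytesize / total }
end

# A 2-byte line next to a 9-byte line (newlines included).
lines = ["a\n", "bbbbbbbb\n"]
p offset_pick_probabilities(lines)
```

Under that assumption the short line is picked about 2 times out of 11 and the long one about 9 times out of 11, so the bias only matters when line lengths differ a lot.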

score 2 · Accepted Answer

This isn't much better than what you came up with, but at least it's shorter:

def pick_random_line
  lines = File.readlines("data.txt")
  lines[rand(lines.length)]
end

One thing you can do to make your code more Ruby-like is to omit the parentheses: use readlines and size instead of readlines() and size().

score 0 · Accepted Answer

Here is a shorter version of Mark's excellent answer, though not as short as Dave's:

def pick_random_line number=0, chosen_line=""
  # number starts at 0 so the first line is always chosen (probability 1/1)
  File.foreach("data.txt") {|line| chosen_line = line if rand < 1.0/(number+=1)}
  chosen_line
end

score 0 · Accepted Answer

One-liner:

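def pick_random_line(file)
  # $(...) replaces the nested backticks, which would end the Ruby literal early;
  # $RANDOM needs a shell that provides it (bash does, plain POSIX sh may not)
  `head -$((${RANDOM} % $(wc -l < #{file}) + 1)) #{file} | tail -1`
end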

If you protest that this isn't Ruby, go find the talk from this year's Euruko titled "Ruby is unlike a Banana".

PS: Ignore SO's poor syntax highlighting.

score -1 · Accepted Answer

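Stat the file, pick a random number between zero and the size of the file, and seek to that byte in the file. Scan to the next newline, then read and return the next line (assuming you're not at the end of the file).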
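
The steps above could be sketched roughly as follows. The method name pick_line_by_seek and the rng keyword are illustrative. The wrap-around to the first line when the seek lands in the last line is my own assumption about how to handle the "end of the file" case, which the answer leaves open.

```ruby
# Hypothetical sketch of the stat-and-seek idea: no full read of the file.
def pick_line_by_seek(path, rng: Random.new)
  size = File.size(path)
  return nil if size.zero?          # nothing to pick from

  File.open(path) do |file|
    file.seek(rng.rand(size))       # jump to a random byte offset
    file.gets                       # discard the rest of the line we landed in
    line = file.gets                # the next complete line
    if line.nil?                    # we landed in the last line:
      file.rewind                   # assumption: wrap around to the first line
      line = file.gets
    end
    line
  end
end
```

Like the other offset-based answer, this is not uniform: a line is picked whenever the random offset falls in the line before it, so lines that follow long lines come up more often.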
