ruby - Manipulating strings and arrays in a file with Ruby

Question

I have a text file ("dict.txt") containing more than 8K English words.

apple -- description text
angry -- description text
bear -- description text
...

I need to delete all the text after the "--" on each line of the file.

What is the easiest and fastest way to solve this problem?

score 1 · Accepted Answer

Starting with:

words = [
  'apple -- description text',
  'angry -- description text',
  'bear -- description text',
]

If you only want the word before the "--":

words.map{ |w| w.split(/\s-+\s/).first }  # => ["apple", "angry", "bear"]

Or:

words.map{ |w| w[/^(.+) --/, 1] } # => ["apple", "angry", "bear"]

If you want the word AND the "--":

words.map{ |w| w[/^(.+ --)/, 1] } # => ["apple --", "angry --", "bear --"]

If the goal is to create a version of the file without the descriptions:

File.open('new_dict.txt', 'w') do |fo|
  File.foreach('dict.txt') do |li|
    fo.puts li.split(/\s-+\s/).first
  end
end

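In general, to avoid scalability problems if the input file grows very large, use foreach to iterate over the input file and process it one line at a time. As far as processing speed goes, it's a wash whether you iterate line by line or slurp the whole file and process it as a buffer or an array. Slurping a huge file, however, can slow the machine to a crawl or crash your code, which makes it infinitely slower. Line-by-line IO is surprisingly fast and has none of those potential problems.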
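The line-by-line approach described above could be wrapped in a small helper. This is only a sketch: the method name strip_descriptions and the temporary files standing in for dict.txt are assumptions made for illustration, not part of the original answer.

```ruby
require 'tempfile'

# Hypothetical helper (name is an assumption): copy src to dest, keeping only
# the text before the first " -- " separator on each line.
def strip_descriptions(src, dest)
  File.open(dest, 'w') do |out|
    # File.foreach yields one line at a time, so the whole file is never
    # held in memory at once.
    File.foreach(src) do |line|
      # Limit of 2: split only at the first separator.
      out.puts line.split(/\s-+\s/, 2).first
    end
  end
end

# Small demonstration with temporary files standing in for dict.txt.
Tempfile.create('dict') do |src|
  src.write("apple -- description text\nangry -- description text\nbear -- description text\n")
  src.flush
  Tempfile.create('new_dict') do |dest|
    strip_descriptions(src.path, dest.path)
    puts File.read(dest.path)
  end
end
```

Because only one line is in memory at a time, the same code should behave the same way on an 8K-line file and on a much larger one.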

score 1 · Accepted Answer


File.read("dict.txt").gsub(/(?<=--).*/, "")

Output

apple --
angry --
bear --
...
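The dashes survive in this output because (?<=--) is a lookbehind: it anchors the match just after "--" without consuming it. A minimal sketch on an in-memory string shows this, along with a variant pattern that also drops the separator. The variant and the variable names are assumptions added for illustration, not part of the answer.

```ruby
text = "apple -- description text\nangry -- description text\nbear -- description text\n"

# Lookbehind: match starts right after "--", so only the description goes.
keep_dashes = text.gsub(/(?<=--).*/, "")

# Variant (an assumption, not from the answer): consume the separator and
# any whitespace before it, leaving only the word.
words_only = text.gsub(/\s*--.*/, "")

puts keep_dashes
puts words_only
```

Note that gsub returns a new string; nothing is written back to dict.txt unless the result is saved explicitly, for example with File.write.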

answered 2013-10-30T15:44:37.110

score 1 · Accepted Answer

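# Keep everything up to and including the first "--" on each line.
# Searching for "--" rather than "-" avoids cutting hyphenated words short.
lines_without_description = File.readlines('dict.txt').map { |line| line[0..line.index('--') + 1] }
File.open('dict2.txt', 'w') { |f| f.write(lines_without_description.join("\n")) }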

score 1 · Accepted Answer

If you need speed, consider running sed on the command line.

sed 's/ -- .*//' < dict.txt > new_dict.txt

This creates a new file, new_dict.txt, containing only the words.
