I have a file in which records are separated by a delimiter, say '.'. I want to read this file "line by line", where a line ends at each '.' instead of at a newline.

One way is:

with open('file', 'r') as f:
    # f.read() loads the entire file into memory before splitting
    for line in f.read().strip().split('.'):
        pass  # ... do some work

But this is not memory efficient if the file is very large. Instead of reading the whole file at once, I want to read it one delimited piece at a time.

open supports a 'newline' parameter, but according to the documentation it only accepts None, '', '\n', '\r', and '\r\n'.
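To illustrate the limitation, here is a small check. The temporary file and the helper name try_custom_newline are made up for this example; the point is only that open raises ValueError when newline is any other string.

```python
import os
import tempfile


def try_custom_newline(path, newline):
    """Return the error message if open() rejects `newline`, else None.

    Hypothetical helper, written only for this illustration.
    """
    try:
        with open(path, "r", newline=newline):
            return None
    except ValueError as exc:
        return str(exc)


# Create a small throwaway file to open.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as tmp:
    tmp.write("first.second.third")

rejected = try_custom_newline(path, ".")    # not one of the allowed values
accepted = try_custom_newline(path, "\n")   # one of the allowed values
os.remove(path)

print(rejected)
print(accepted)
```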

Is there any way to read a file piece by piece efficiently, splitting on a pre-specified delimiter instead of a newline?

3 Answers

You could use a generator:

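def myreadlines(f, newline):
  """Yield pieces of f separated by the string newline."""
  buf = ""
  while True:
    # Emit every complete piece currently in the buffer.
    while newline in buf:
      pos = buf.index(newline)
      yield buf[:pos]
      buf = buf[pos + len(newline):]
    # Read the next chunk; at end of file, emit whatever is left.
    chunk = f.read(4096)
    if not chunk:
      yield buf
      break
    buf += chunk

with open('file') as f:
  for line in myreadlines(f, "."):
    print(line)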
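A variant of the same buffering idea, sketched with str.split instead of repeated index calls. The name read_delimited and the chunk_size parameter are assumptions for this example, and unlike the generator above it does not yield a trailing empty string when the input ends with the delimiter.

```python
import io


def read_delimited(f, delimiter, chunk_size=4096):
    """Yield pieces of the text file `f` separated by `delimiter`.

    Hypothetical name and signature, for illustration only.
    """
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Everything before the last delimiter is complete; the remainder
        # may continue in the next chunk, so it stays in the buffer.
        *complete, buf = buf.split(delimiter)
        for piece in complete:
            yield piece
    if buf:
        # Trailing text after the final delimiter (skipped if empty).
        yield buf


# A tiny chunk size forces pieces to straddle chunk boundaries.
sample = io.StringIO("first.second.third.")
pieces = list(read_delimited(sample, ".", chunk_size=4))
print(pieces)
```

Because the unfinished remainder is carried over between reads, a multi-character delimiter that happens to be split across two chunks is still recognized.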
answered 2013-04-28T06:10:07.520

The simplest approach is to preprocess the file so that newlines appear wherever you need them.

Here is an example using perl (assuming you want the string 'abc' to become a newline):

perl -pe 's/abc/\n/g' text.txt > processed_text.txt

If you also want to ignore the original newlines, use this instead:

perl -ne 's/\n//; s/abc/\n/g; print' text.txt > processed_text.txt
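For readers who would rather stay in Python, a rough equivalent of the two commands might look like the sketch below. The function name preprocess and its keep_newlines flag are assumptions for this example; the file names and the 'abc' delimiter are reused from the commands above, with the files created in a temporary directory.

```python
import os
import tempfile


def preprocess(src_path, dst_path, delimiter, keep_newlines=True):
    """Copy src to dst, turning every `delimiter` into a newline.

    Hypothetical helper. With keep_newlines=False the original line
    endings are dropped first, like the second perl command.
    """
    # newline="\n" disables newline translation on both read and write.
    with open(src_path, "r", newline="\n") as src, \
            open(dst_path, "w", newline="\n") as dst:
        for line in src:
            if not keep_newlines:
                line = line.rstrip("\n")
            dst.write(line.replace(delimiter, "\n"))


# Small demonstration on a throwaway file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "text.txt")
dst = os.path.join(workdir, "processed_text.txt")
with open(src, "w", newline="\n") as f:
    f.write("oneabctwo\nthreeabcfour\n")

preprocess(src, dst, "abc")
with open(dst, newline="\n") as f:
    kept = f.read()

preprocess(src, dst, "abc", keep_newlines=False)
with open(dst, newline="\n") as f:
    dropped = f.read()
```

Like the perl commands, this works one input line at a time, so a delimiter that spans a line break is not matched, and an input file with no newlines at all is still read in one piece.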
answered 2013-05-07T23:15:45.763