python - Deleting lines from a file using Python

Question

Possible duplicate:
Python 3 regular expression to find multi-line comments

I need some input on how to do this and would really appreciate any suggestions. I looked at other posts, but none of them matched my requirement.

How to delete a line from a file in Python
Deleting a line from a text file in Python

I need to match a multi-line comment in a file based on an input string that is provided.

Example:

Suppose the file "test.txt" has the following comment; if inputstring="This is a test, script written", this comment needs to be deleted from the file.

import os
import sys

import re
import fnmatch

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                with open(fullname, "r") as f:
                    find_and_remove(f, r"This is a test, script written")

Error:

Traceback (most recent call last):
  File "comment.py", line 16, in <module>
    find_and_remove(f, r"This is a test, script written")
  File "comment.py", line 8, in find_and_remove
    return re.sub(pattern, "", haystack)
  File "/usr/lib/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
TypeError: expected string or buffer

score 3 · Accepted Answer

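The first thing that came to mind when I saw the question was "state machine", and whenever I think "state machine" in Python, the first thing that comes to mind is "generator", a.k.a. yield.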

def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    is_comment = False

    for line in f:
        if line.strip().startswith('/*'):
            is_comment = True

        if line.strip().endswith('*/'): 
            is_comment = False
        elif is_comment:
            pass
        else:
            yield line


def print_file(file_name):
    with file(file_name, 'r') as f:
        skipper = skip_comments(f)

        for line in skipper:
            print line,

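Edit: user1927396 upped the ante by specifying that only a particular block should be excluded, namely one that contains specific text. Because that text sits inside the comment block, we don't know up front whether the block needs to be rejected.

My first thought was a buffer. Ack. Pooh. My second thought was a haunting phrase that I have carried around in my head for 15 years and never used until now: "stack of state machines"...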

def squelch_comment(f, first_line, exclude_if):
    """
    Comment is a multi-line comment that we may want to suppress
    """
    comment = [first_line]

    if not first_line.strip().endswith('*/'):
        for line in f:

            if exclude_if in line:
                comment = None

            if comment and len(comment):
                comment.append(line)

            if line.strip().endswith('*/'):
                break

    if comment:
        for comment_line in comment:
            yield '...' + comment_line


def skip_comments(f):
    """
    Emit all the lines that are not part of a multi-line comment.
    """
    for line in f:
        if line.strip().startswith('/*'):
            # hand off to the nested, comment-handling, state machine
            for comment_line in squelch_comment(f, line, 'This is a test'):
                yield comment_line
        else:
            yield line


def print_file(file_name):
    with file(file_name, 'r') as f:
        for line in skip_comments(f):
            print line,
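
For readers on Python 3, the same idea can be sketched with the buffer after all. This is an adaptation rather than the code above: file() and the print statement are replaced, the '...' prefix is dropped, the opening line is also checked for the text, and the names drop_matching_comments and needle are made up for illustration. Like the code above, it assumes that /* starts a line and */ ends one.

```python
import io


def drop_matching_comments(lines, needle):
    """Yield every line, except /* ... */ blocks that contain `needle`."""
    lines = iter(lines)
    for line in lines:
        if not line.strip().startswith('/*'):
            yield line
            continue

        # Buffer the whole comment block before deciding whether to emit it.
        block = [line]
        while not block[-1].strip().endswith('*/'):
            nxt = next(lines, None)
            if nxt is None:  # unterminated comment at end of input
                break
            block.append(nxt)

        # Emit the block only if none of its lines contains the needle.
        if not any(needle in comment_line for comment_line in block):
            for comment_line in block:
                yield comment_line


sample = io.StringIO(
    "/*\n * This is a test, script written\n */\n"
    "some code\n"
    "/*\n * keep me\n */\n"
    "more code\n"
)
cleaned = "".join(drop_matching_comments(sample, "This is a test"))
print(cleaned, end="")
```

Buffering costs memory proportional to the longest comment, which is usually small; in exchange the decision to drop a block is made once, after the whole block has been seen.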

score 1 · Accepted Answer

This should work in principle:

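def skip(file, lines):
    # Return the file's text without the lines whose 0-based index is in `lines`.
    result = ""
    # Iterate over the file itself: file.read() would yield single characters.
    for cline, fileLine in enumerate(file):
        if cline not in lines:
            result += fileLine
    return result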

lines must be a list of (0-based) line numbers, and file must be an open file.
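
A quick way to try the idea, assuming Python 3 and 0-based line numbers. skip_lines is a hypothetical variant written for this illustration: it iterates over the file line by line and turns the list into a set so that membership checks stay cheap. io.StringIO stands in for an open file.

```python
import io


def skip_lines(file, lines):
    """Return the text of `file` without the lines whose 0-based index is in `lines`."""
    unwanted = set(lines)
    return "".join(
        text for index, text in enumerate(file) if index not in unwanted
    )


demo = io.StringIO("first\nsecond\nthird\nfourth\n")
remaining = skip_lines(demo, [1, 2])
print(remaining, end="")
```

Note that this removes lines by position only; finding which line numbers belong to the comment is still a separate step.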

score 1 · Accepted Answer

This does it as requested: it removes every multi-line comment that contains the desired string.

Put this in a file called program.txt:

/*
 * This is a test, script written
 * This is a comment line
 * Multi-line comment
 * Last comment
 *
 */

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

Then search and replace. Make sure that needle does not introduce any regex special characters.

import re

def find_and_remove(haystack, needle):
    pattern = re.compile(r'/\*.*?'+ needle + '.*?\*/', re.DOTALL)
    return re.sub(pattern, "", haystack)

# assuming your program is in a file called program.txt
program = open("program.txt", "r").read()

print find_and_remove(program, r"This is a test, script written")

Result:

some code

/*
 * This is a comment line
 * And should 
 *     not be removed
 *
 */

more code

Adapted from the regex in the related question.

Edit the last section of your code:
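
One caveat worth checking before relying on this: with re.DOTALL, .*? can run past a closing */, so the pattern may start at an earlier comment that does not contain the needle and swallow everything up to the comment that does. The sketch below (Python 3) is one possible hedge, not part of the adapted regex: it forbids */ inside the match with a negative lookahead. The name remove_comment_containing is hypothetical.

```python
import re


def remove_comment_containing(haystack, needle):
    """Remove only the /* ... */ blocks whose body contains `needle`."""
    # Any run of characters that never steps over a closing */
    inside = r'(?:(?!\*/).)*?'
    pattern = re.compile(
        r'/\*' + inside + re.escape(needle) + inside + r'\*/',
        re.DOTALL,
    )
    return pattern.sub("", haystack)


source = (
    "/* keep this comment */\n"
    "some code\n"
    "/* This is a test, script written */\n"
    "more code\n"
)
cleaned = remove_comment_containing(source, "This is a test, script written")
print(cleaned, end="")
```

In the program.txt sample the matching comment comes first, so the simpler pattern happens to behave; the difference only shows up when a non-matching comment precedes it.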

for path,dirs,files in os.walk(sys.argv[1]):
    for fname in files:
        for pat in ['*.cpp','*.c','*.h','*.txt']:
            if fnmatch.fnmatch(fname,pat):
                fullname = os.path.join(path,fname)
                # put all the text into f and read and replace...
                f = open(fullname).read()
                result = find_and_remove(f, r"This is a test, script written")

                new_name = fullname + ".new"
                # After testing, then replace newname with fullname in the 
                # next line in order to replace the original file.
                handle = open(new_name, 'w')
                handle.write(result)
                handle.close()

Make sure you escape all regex special characters in needle, such as (). For example, if your text contains parentheses, as in (any text), escape them in needle as \(any text\).
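
Instead of escaping by hand, the standard library's re.escape can do it. A minimal sketch, assuming Python 3; the sample strings and variable names are made up for illustration:

```python
import re

needle = "(any text)"

# re.escape backslash-escapes regex metacharacters so the needle matches literally.
escaped = re.escape(needle)
safe = re.compile(r'/\*.*?' + escaped + r'.*?\*/', re.DOTALL)

# Without escaping, the parentheses become a group and match plain "any text".
unsafe = re.compile(r'/\*.*?' + needle + r'.*?\*/', re.DOTALL)

target = "/* note (any text) here */\ncode\n"
other = "/* any text without parentheses */\ncode\n"

print(repr(safe.sub("", target)))    # comment with the literal needle is removed
print(repr(safe.sub("", other)))     # unrelated comment is kept
print(repr(unsafe.sub("", other)))   # unescaped needle removes it by mistake
```

The unescaped version does not raise an error here, which is what makes the mistake easy to miss: it silently removes a comment that never contained the parenthesized text.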
