python - Parsing user string data in files present in a directory

Question

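I am trying to do the following, but nothing matches, even in the good case. A sample input file and the complete code are given below. Why doesn't the code match the sample input file below, and how can I fix it?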

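1. Open each file in the directory and its subdirectories, based on the directory passed as an argument.

2. Check whether each file contains the copyright information below on exactly 3 lines. These 3 lines do not have to be the first 3 lines of the file.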

 Copyright (c) 2012 Company, Inc. 
 All Rights Reserved.
 Company Confidential and Proprietary.

Sample input file:

File1.txt

/*==========================================================================
 *
 *  @file:     Compiler.h
 *
 *  @brief:    This file 
 *
 *
 *  @author:   david
 *
 *  Copyright (c) 2012 Company, Inc. 
 *  All Rights Reserved.
 *  Company Confidential and Proprietary
 *
 *=========================================================================*/
#ifndef __COMPILER_ABSTRACT_H
#define __COMPILER_ABSTRACT_H

Code:

import os
import sys
userstring="Copyright (c) 2012 Company, Inc.\nAll Rights Reserved.\nCompany Confidential and Proprietary."
print len(sys.argv)
print sys.argv[1]
if len(sys.argv) < 2:
    sys.exit('Usage: python.py <build directory>')
for r,d,f in os.walk(sys.argv[1]):
    for files in f:
        with open(os.path.join(r, files), "r") as file:
            if ''.join(file.readlines()[:3]).strip() != userstring:
                print files

score 1 · Accepted Answer

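Check what ''.join(file.readlines()[:3]).strip() actually gives you. You'll notice that the * characters are still there at the start of each line, and that you get the first 3 lines of the file (that is what [:3] does), which are not the lines in userstring.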
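
To make this concrete, here is a small sketch that runs the same expression on a shortened copy of the sample header from the question. It uses io.StringIO as a stand-in for the real file, and it is written in Python 3 syntax (the question's code uses Python 2 print statements); both choices are assumptions made for illustration.

```python
import io

# Shortened copy of the sample header from the question; StringIO stands in for a file.
sample = io.StringIO(
    "/*====\n"
    " *\n"
    " *  @file:     Compiler.h\n"
    " *\n"
    " *  Copyright (c) 2012 Company, Inc. \n"
    " *  All Rights Reserved.\n"
    " *  Company Confidential and Proprietary\n"
    " */\n"
)

userstring = (
    "Copyright (c) 2012 Company, Inc.\n"
    "All Rights Reserved.\n"
    "Company Confidential and Proprietary."
)

# Same expression as in the question: only the first three lines are read.
first_three = "".join(sample.readlines()[:3]).strip()

print(repr(first_three))          # the comment banner, not the copyright notice
print(first_three == userstring)  # False
```

The first three lines are the opening of the comment block, so the comparison can never succeed for a file laid out like the sample.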

One possible solution is to check each line individually. Something like this:

userlines = userstring.split('\n') # Separate the string into lines
with open(os.path.join(r, files), "r") as file:
    match = 0
    for line in file:
        if userlines[match] in line: # Check if the expected line at index `match` appears in this line
            match += 1 # Next time check the following line
        elif match > 0: # If there was no match, reset the counter
            match = 0
        if match >= len(userlines): # If 3 consecutive lines match, then you found a match
            break
    if match == len(userlines): # You found a match
        print files

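The idea behind this is that what you are looking for will not match exactly, because of the blank lines, the *, the dots, the whitespace, and so on. I used the in operator to more or less account for that, but you can be much more flexible when you work line by line, even more so when you are working with files...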
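
As a self-contained illustration of the line-by-line idea, the sketch below wraps the loop in a function. The name has_consecutive_lines is hypothetical, the code is Python 3, and it differs from the snippet above in one detail: after a failed match it re-checks the current line against the first expected line, so a notice that starts right after a partial match is not missed.

```python
def has_consecutive_lines(lines, wanted):
    """Return True if each string in `wanted` appears, in order, on consecutive lines."""
    if not wanted:
        return True
    match = 0
    for line in lines:
        if wanted[match] in line:
            match += 1
            if match == len(wanted):
                return True
        else:
            # Restart, but let the current line begin a new match if it fits.
            match = 1 if wanted[0] in line else 0
    return False


header = [
    " *  @author:   david",
    " *",
    " *  Copyright (c) 2012 Company, Inc. ",
    " *  All Rights Reserved.",
    " *  Company Confidential and Proprietary",
    " *",
]

with_dot = [
    "Copyright (c) 2012 Company, Inc.",
    "All Rights Reserved.",
    "Company Confidential and Proprietary.",
]
without_dot = with_dot[:2] + ["Company Confidential and Proprietary"]

print(has_consecutive_lines(header, without_dot))  # True
print(has_consecutive_lines(header, with_dot))     # False
```

The second call fails only because the sample file has no trailing dot on its third line, which shows that a plain substring test is still sensitive to small differences in punctuation.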

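Update:

For more advanced parsing of each line you could use regular expressions through the re package, but that may not be practical for your use case, since you want to match strings rather than patterns. So, to ignore the leading or trailing characters (whitespace, dots, or stars), you can simply strip them.

For example: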
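
If a regular expression is preferred anyway, one possible sketch is shown below. The pattern DECORATION and the helper core_text are hypothetical names, not part of the question or of the re package, and the pattern is only one of several ways to peel off the comment decoration. The code is Python 3.

```python
import re

# Hypothetical pattern: leading whitespace/stars, the text of interest (captured),
# then trailing dots/whitespace. DOTALL guarantees the pattern matches any input.
DECORATION = re.compile(r"^[\s*]*(.*?)[\s.]*$", re.DOTALL)


def core_text(line):
    """Return `line` without leading stars/whitespace and trailing dots/whitespace."""
    return DECORATION.match(line).group(1)


print(core_text(" *  Copyright (c) 2012 Company, Inc. \n"))
print(core_text(" *  Company Confidential and Proprietary")
      == core_text("Company Confidential and Proprietary."))
```

For this particular task the regular expression does nothing that str.strip cannot do, which is why stripping is usually the simpler choice here.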

>>> a = '   This is a string.   '
>>> a.strip()
'This is a string.' # removes the whitespace by default
>>> a.strip('.')
'   This is a string.   ' # strips only dots from the ends; the ends here are spaces, so nothing changes
>>> a.strip('. ')
'This is a string' # removes dots and spaces

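To match your input against userstring, strip the same surrounding characters from both the file line and each line of userstring. With that change, you would have something like this: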

# '*' is included so that the leading comment stars in the file are stripped too
userlines = [s.strip('\n\r .*') for s in userstring.split('\n')]
# ...
        if userlines[match] == line.strip('\n\r .*'):
# ...

When you process a file line by line, you have many handy string methods available, such as startswith, endswith, strip, find, count, and so on. For the complete list, just type help(str) in the interpreter.
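
Putting the pieces together, a complete version of the check might look like the sketch below. It is written in Python 3, and the names NOTICE, STRIP_CHARS, normalize, has_notice and files_missing_notice are all hypothetical. It assumes that comment stars, dots and whitespace at either end of a line should be ignored, and that the goal is to list the files that do not contain the notice, as the question's code appears to intend.

```python
import os

NOTICE = (
    "Copyright (c) 2012 Company, Inc.\n"
    "All Rights Reserved.\n"
    "Company Confidential and Proprietary."
)
STRIP_CHARS = "\n\r .*"  # assumption: ignore whitespace, dots and comment stars


def normalize(line):
    """Strip decoration characters from both ends of a line."""
    return line.strip(STRIP_CHARS)


def has_notice(path, notice=NOTICE):
    """Return True if the notice lines appear on consecutive lines of the file."""
    wanted = [normalize(s) for s in notice.split("\n")]
    match = 0
    # errors="replace" keeps the scan going on files that are not valid text.
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            text = normalize(line)
            if text == wanted[match]:
                match += 1
                if match == len(wanted):
                    return True
            else:
                # Restart, letting the current line begin a new match.
                match = 1 if text == wanted[0] else 0
    return False


def files_missing_notice(root):
    """Walk `root` and return the paths of files without the notice."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not has_notice(path):
                missing.append(path)
    return missing
```

A script could then call files_missing_notice(sys.argv[1]) after checking len(sys.argv) and print each returned path.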
