python - How can I extract the exact indentation of a line in a Python file?

Question

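My goal is to determine the exact indentation of a line of code in a Python file. I am instrumenting statements at certain locations, so knowing the indentation a line requires is essential. The following examples illustrate the problem: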

First Scenario
# A.py

a=0                  # indentation: 0 spaces or 0 tabs
while a<5:           # indentation: 0 spaces or 0 tabs
    print(a)         # indentation: 4 spaces or 1 tab
    a=a+1            # indentation: 4 spaces or 1 tab

Second Scenario
# A.py

a=0                  # indentation: 0 spaces or 0 tabs
while a<5:           # indentation: 0 spaces or 0 tabs
        print(a)     # indentation: 8 spaces or 2 tabs
        a=a+1        # indentation: 8 spaces or 2 tabs

I am inspecting an application made up of many files, and I have come across files like both scenarios above. How can I determine the indentation of any given line in a Python file?

score 4 · Accepted Answer

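Note that the method you choose for determining indentation can have a substantial impact on performance. For example, you could use a regular expression to measure the leading whitespace, but there are simpler and far more efficient ways: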

import re

line = '            Then the result is even.'
r = re.compile(r"^ *")

%timeit len(line) - len(line.lstrip())    # 1000000 loops, best of 3: 0.387 µs per loop
%timeit len(re.findall(r"^ *", line)[0])  #  100000 loops, best of 3: 1.94 µs per loop
%timeit len(r.findall(line)[0])           # 1000000 loops, best of 3: 0.890 µs per loop

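The regexes in the other answers are slow because a regular expression is a state machine that is compiled when the regex is constructed. Python keeps an internal cache of compiled patterns, but it is still better to compile the regex yourself once and reuse it.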

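Note, however, that the regex solution runs at only about 20% of the speed of the first sample (which compares the string before and after stripping whitespace), or about 43% when using the precompiled expression.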
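A minimal sketch of the "compile once, reuse" advice, assuming you only want to count leading spaces (not tabs). The function names `indent_regex` and `indent_lstrip` are hypothetical. Both functions count exactly the same characters, so their results can be compared directly.

```python
import re

# Compile the pattern once at module level and reuse it for every line.
LEADING_SPACES = re.compile(r" *")

def indent_regex(line: str) -> int:
    # Pattern.match() is anchored at the start of the string, and " *" can
    # match the empty string, so match() always succeeds; end() is the count.
    return LEADING_SPACES.match(line).end()

def indent_lstrip(line: str) -> int:
    # Strip only spaces so this counts the same characters as the regex.
    return len(line) - len(line.lstrip(" "))

sample = " " * 12 + "Then the result is even."
print(indent_regex(sample), indent_lstrip(sample))
```

Note that the answer's own `line.lstrip()` (no argument) also strips tabs and other whitespace, so it can disagree with a spaces-only regex on tab-indented lines.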

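Important note: Python interprets a tab as indentation up to the next multiple of 8 columns (a single leading tab counts as 8 spaces), so before measuring you also need to .replace() literal tabs with the equivalent number of spaces.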
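One way to do that replacement is the built-in `str.expandtabs()`, which moves each tab to the next multiple of `tabsize` columns rather than substituting a fixed number of spaces. A minimal sketch; the helper name `indent_width` is hypothetical, and the default `tabsize=8` follows Python's own rule (the question assumes 4-column tabs, which you can get with `tabsize=4`):

```python
def indent_width(line: str, tabsize: int = 8) -> int:
    """Return the indentation width of `line` in columns.

    str.expandtabs() advances each tab to the next multiple of `tabsize`.
    Only spaces are stripped afterwards, so a blank line ("\\n") counts as 0.
    """
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(" "))

for sample in ["    print(a)\n", "\tprint(a)\n", "\t\tprint(a)\n", "\n"]:
    print(repr(sample), indent_width(sample))
```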

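Edited to add: the Python parser itself doesn't care about specific indentation levels, only that a given "block" is indented consistently. The size of each increase in indentation is effectively discarded and replaced by INDENT and DEDENT tokens. (16 spaces of indentation → just one INDENT token.) What really matters is how the indentation changes from line to line.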
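The standard library's `tokenize` module makes this visible. In the sketch below (the `SOURCE` string is a hypothetical Python 3 version of the question's loop, indented with 16 spaces), a single INDENT token opens the block and a DEDENT closes it. The INDENT token's text is the literal leading whitespace, so it also reveals the exact indentation at the start of each block:

```python
import io
import tokenize

# Hypothetical sample: the question's loop body indented with 16 spaces.
SOURCE = (
    "a = 0\n"
    "while a < 5:\n"
    + " " * 16 + "print(a)\n"
    + " " * 16 + "a = a + 1\n"
)

def indent_events(source: str):
    """Return (token name, line number, length of token text) for INDENT/DEDENT tokens."""
    events = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            events.append((tokenize.tok_name[tok.type], tok.start[0], len(tok.string)))
    return events

for event in indent_events(SOURCE):
    print(event)
```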

score 0 · Accepted Answer

How about:

import re

line = '    \t  asdf'
len(re.split(r'\w', line)[0].replace('\t', '    '))
# → 10

Note that none of the other proposed solutions count tabs correctly.
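Replacing every tab with a fixed number of spaces, as above, agrees with tab-stop expansion only in some cases. The following sketch compares the two on a few sample lines; the helper names are hypothetical, and "tab stops every 8 columns" is Python's own rule:

```python
def leading(line: str) -> str:
    """Return the leading run of spaces and tabs in `line`."""
    return line[: len(line) - len(line.lstrip(" \t"))]

def width_fixed(line: str, spaces_per_tab: int = 4) -> int:
    # Each tab becomes a fixed number of spaces, as in the answer above.
    return len(leading(line).replace("\t", " " * spaces_per_tab))

def width_tabstops(line: str, tabsize: int = 8) -> int:
    # Each tab advances to the next multiple of `tabsize` columns.
    return len(leading(line).expandtabs(tabsize))

for sample in ["    \t  asdf", "\t  asdf", "  \tasdf"]:
    print(repr(sample), width_fixed(sample), width_tabstops(sample))
```

The first sample happens to give 10 both ways, but a tab at the start of a line or after two spaces does not.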

score 0 · Accepted Answer

You can use a regular expression:

import re

with open("/path/to/file") as f:
    for lineno, line in enumerate(f, start=1):
        print(lineno, len(re.findall(r"^ *", line)[0]))

The first number is the line number and the second is the indentation.

Or, if you need a specific line:

import re

with open("/path/to/file") as f:
    print(len(re.findall(r"^ *", f.readlines()[3])[0]))

This returns the indentation of the 4th line (note that the index is the desired line number minus 1).
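If you prefer to pass 1-based line numbers directly, the standard library's `linecache.getline()` accepts them and returns `''` for a line past the end of the file instead of raising `IndexError`. It still reads and caches the whole file. A minimal sketch, counting spaces only; `line_indent` is a hypothetical name and the temporary file is just demo content:

```python
import linecache
import os
import tempfile

def line_indent(path: str, lineno: int) -> int:
    """Return the number of leading spaces on 1-based line `lineno` of `path`.

    linecache.getline() returns '' for a nonexistent line, reported here as 0.
    """
    line = linecache.getline(path, lineno)
    return len(line) - len(line.lstrip(" "))

# Demo only: write the question's second scenario to a temporary file.
source = "a=0\nwhile a<5:\n        print(a)\n        a=a+1\n"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write(source)
    demo_path = tmp.name

third = line_indent(demo_path, 3)     # 8 spaces
missing = line_indent(demo_path, 99)  # no such line
print(third, missing)
os.remove(demo_path)
```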

score 0 · Accepted Answer

The "I have minimal knowledge of other techniques" way:

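# Count the leading spaces on every line of the file
indent_space = []
with open('stringstuff.py') as f:
    for line in f:
        spaces = 0
        for char in line:
            if char != " ":
                break
            spaces += 1
        indent_space.append(spaces)

# Report each new indentation step found between consecutive lines;
# the last one found is used as the size of one indent.
indentation = None
for i in range(len(indent_space) - 1):
    new_indentation = abs(indent_space[i + 1] - indent_space[i])
    if new_indentation != 0 and new_indentation != indentation:
        print('Indentation:', new_indentation, "found")
        indentation = new_indentation

for spaces in indent_space:
    if indentation:
        print("Indentation of", spaces, "spaces or", spaces // indentation, "indents.")
    else:
        # The file never changes indentation, so no indent size can be inferred
        print("Indentation of", spaces, "spaces.")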
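The answer above infers the indent size from changes between consecutive lines. A different technique, sketched here as an alternative rather than the answer's method, is to take the greatest common divisor of all non-zero indentation widths. It assumes the widths were already computed (for example, the `indent_space` list above); `indent_unit` is a hypothetical name:

```python
from functools import reduce
from math import gcd

def indent_unit(indents):
    """Guess the indent size as the GCD of all non-zero indentation widths.

    Returns None when no line is indented.
    """
    nonzero = [n for n in indents if n > 0]
    if not nonzero:
        return None
    return reduce(gcd, nonzero)

widths = [0, 0, 8, 8, 16, 8, 0]
unit = indent_unit(widths)
print(unit)                         # 8
print([w // unit for w in widths])  # [0, 0, 1, 1, 2, 1, 0]
```

A caveat: continuation lines aligned to an arbitrary column (for example, under an opening bracket) can pull the GCD down to 1, so you may want to skip them.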
