python - Python regular expression with re.findall

Question

I am using findall to split up text.

I started with the expression re.findall(r'(.*?)(\$.*?\$)', but it does not return the data after the last match. I am missing the '6\n\n'.

How do I get that last piece of text?

Here is my Python code:

#!/usr/bin/env python

import re

allData = '''
1
2
3 here Some text in here 
$file1.txt$
4 Some text in here and more  $file2.txt$
5 Some text $file3.txt$ here  
$file3.txt$
6

'''

for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)',allData,flags=re.DOTALL) :
    print repr(record)

The output I get for this is:

('\n1\n2\n3 here Some text in here \n', '$file1.txt$', '')
('\n4 Some text in here and more  ', '$file2.txt$', '')
('\n5 Some text ', '$file3.txt$', '')
(' here  \n', '$file3.txt$', '')
('', '', '\n6\n')
('', '', '')
('', '', '')

What I really want is this output:

('\n1\n2\n3 here Some text in here \n', '$file1.txt$')
('\n4 Some text in here and more  ', '$file2.txt$')
('\n5 Some text ', '$file3.txt$')
(' here  \n', '$file3.txt$')
('\n6\n', '', )

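Background information, in case you need to see the big picture.

If you are interested, I am rewriting this in Python. I have the rest of the code under control. I am just getting too much back from findall.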

https://discussions.apple.com/message/21202021#21202021

score 2 · Accepted Answer

If I understand that Apple link correctly, you want to do something like this:

import re


allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more  $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6

'''


def read_file(m):
    # m.group(1) is the file name captured between the two $ signs.
    with open(m.group(1)) as f:
        return f.read()

# Sloppy matching :D
# print re.sub(r"\$(.*?)\$", read_file, allData)
# More precise.
print re.sub(r"\$(file\d+?\.txt)\$", read_file, allData)

Edit: made the matching more precise, as Oscar suggested.

That is, take the file name between the $ signs and read the data from that file, as shown above.

Example output:

1
2
3 here Some text in here

I'am file1.txt

4 Some text in here and more  
I'am file2.txt

5 Some text 
I'am file3.txt
 here

I'am file3.txt

6

Files:

==> file1.txt <==

I'am file1.txt

==> file2.txt <==

I'am file2.txt

==> file3.txt <==

I'am file3.txt
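
For anyone on Python 3, the same re.sub idea can be sketched roughly as below. This is only an illustration: the name expand_markers and the temporary directory are my own assumptions, not part of the answer above, and the files are created on the fly so the snippet runs anywhere.

```python
import re
import tempfile
from pathlib import Path

# Same "more precise" marker as above: $fileN.txt$
MARKER = re.compile(r"\$(file\d+\.txt)\$")

def expand_markers(text, directory):
    """Replace each $fileN.txt$ marker with the contents of that file in `directory`."""
    def read_file(match):
        # match.group(1) is the file name between the two $ signs.
        return (Path(directory) / match.group(1)).read_text()
    # When the replacement is a function, re.sub calls it once per match.
    return MARKER.sub(read_file, text)

# Hypothetical sample files, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "file1.txt").write_text("I'am file1.txt\n")
    Path(tmp, "file2.txt").write_text("I'am file2.txt\n")
    result = expand_markers("3 here\n$file1.txt$\n4 more $file2.txt$\n6\n", tmp)

print(result)
```

Passing a function to re.sub keeps the surrounding text untouched, so there is no need to collect the trailing text separately.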

score 1 · Accepted Answer

To get the output you want, restrict the pattern to two capture groups. (With three capture groups, every "record" has three elements.)

You can make the second group optional:

r'([^$]*)(\$.*?\$)?'
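
A minimal Python 3 sketch of how that pattern behaves with findall, on a shortened sample of my own (split_records is a hypothetical helper name). Because both groups may match nothing, findall also reports an empty match at the very end of the string, so the sketch drops all-empty tuples.

```python
import re

# Group 1: text up to the next $; group 2 (optional): a $...$ marker.
PATTERN = re.compile(r'([^$]*)(\$.*?\$)?')

def split_records(text):
    """Return (text, marker) tuples; marker is '' for trailing text."""
    # The pattern can match the empty string, so findall yields a final
    # ('', '') at the end of the input. Keep only tuples with some content.
    return [rec for rec in PATTERN.findall(text) if any(rec)]

sample = "1\n2 intro\n$file1.txt$\n3 more $file2.txt$ tail\n6\n"
records = split_records(sample)
for record in records:
    print(repr(record))
```

Each tuple has exactly two elements, and the text after the last marker shows up as a tuple whose second element is empty.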

score 1 · Accepted Answer

Here is one way to solve the replacement problem with findall:

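import re

def readfile(name):
    # Return the full contents of the named file.
    with open(name) as f:
        return f.read()

# Group 1 captures a file name between $ signs;
# group 2 captures a lone $ or a run of ordinary text.
r = re.compile(r"\$(.+?)\$|(\$|[^$]+)")

# allData is the string from the question.
print "".join(readfile(filename) if filename else text
    for filename, text in r.findall(allData))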
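
To see what the tokenizing pattern does without touching the disk, here is a rough Python 3 sketch in which a plain dict stands in for the files. The names expand and files are assumptions for illustration only.

```python
import re

# Either a $name$ marker (group 1) or literal text / a lone $ (group 2).
TOKEN = re.compile(r"\$(.+?)\$|(\$|[^$]+)")

def expand(text, files):
    """Rebuild `text`, replacing each $name$ marker with files[name]."""
    pieces = []
    for filename, literal in TOKEN.findall(text):
        # For every match exactly one of the two groups is non-empty.
        pieces.append(files[filename] if filename else literal)
    return "".join(pieces)

files = {"x.txt": "X"}  # hypothetical in-memory stand-in for real files
print(expand("a $x.txt$ b", files))
print(expand("cost $5", files))
```

The second alternative is what lets an unpaired $ pass through unchanged instead of being dropped.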

score 0 · Accepted Answer

This partially solves your problem:

import re

allData = '''
1
2
3 here Some text in here 
$file1.txt$
4 Some text in here and more  $file2.txt$
5 Some text $file3.txt$ here  
$file3.txt$
6

'''

for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)',allData.strip(),flags=re.DOTALL) :
    print  [ x for x in record if x]

Which produces the output:

['1\n2\n3 here Some text in here \n', '$file1.txt$']
['\n4 Some text in here and more  ', '$file2.txt$']
['\n5 Some text ', '$file3.txt$']
[' here  \n', '$file3.txt$']
['\n6']
[]

To avoid the trailing empty list:

for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)',allData.strip(),flags=re.DOTALL) :
    fields = [x for x in record if x]
    if fields:
        print fields
