python - Extracting exactly four-digit integers from strings with a regular expression

Question

list1 = ['Contact: Hamdan Z Hamdan, MBBS, Msc',
        '\r\n            ',
        '+249912468264',
        '\r\n                  ',
        'hamdanology@hotmail.com',
        '\r\n                ',
        'Contact: Maha I Mohammed, MBBS, PhD',
        '\r\n            ',
        '+249912230895',
        '\r\n                  ',
        '\r\n                ',
        'Sudan',
        'Jaber abo aliz',
        '\r\n                  ',
        'Recruiting',
        '\r\n          ',
        'Khartoum, Sudan, 1111  ',
        u'Contact: Khaled H Bakheet, MD,PhD \xa0 \xa0 +249912957764 \xa0 \xa0 ',
        'khalid2_3456@yahoo.com',
        u' \xa0 \xa0 ',
        u'Principal Investigator: Hamdan Z Hamdan, MBBS,MSc \xa0 \xa0  \xa0 \xa0  \xa0 \xa0 ',
       'Principal Investigator:',
       '\r\n      ',
       'Hamdan Z Hamdan, MBBS, MSc',
       '\r\n            ',
        'Al-Neelain University',
        '\r\n                '
    ]

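From this list of strings, I need to extract only the 4-digit integers that are not attached to any other characters.

Example: '1111' is the only desired output.

How should I write the regex in Python? Obviously, this does not work: *([\d]{4})*.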

score 6 · Accepted Answer

You can use \b in a regular expression to mark a word boundary, so the following works:

import re

# list1 is the list of strings from the question
for s in list1:
    m = re.search(r'\b\d{4}\b', s)  # first standalone 4-digit run, if any
    if m:
        print(m.group(0))

...which prints only 1111. The documentation explains \b further:

\b

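Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. [...]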
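As a rough illustration of that definition, the sketch below runs the same pattern over a few strings. Three come from the question's list; 'code-2024!' is a made-up sample added only to show a punctuation boundary. It assumes Python 3 str patterns.

```python
import re

pattern = re.compile(r'\b\d{4}\b')

samples = [
    'Khartoum, Sudan, 1111  ',   # from the question: spaces on both sides
    'khalid2_3456@yahoo.com',    # from the question: '_' is a word character
    '+249912468264',             # from the question: 12 digits in a row
    'code-2024!',                # hypothetical sample: '-' and '!' are non-word characters
]

# Map each sample to the list of standalone 4-digit runs found in it.
results = {s: pattern.findall(s) for s in samples}

for s, found in results.items():
    print(repr(s), '->', found)
```

Because the underscore counts as a word character, there is no boundary between '_' and '3456', which is why the e-mail address yields nothing here.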

score 3 · Accepted Answer

You can try the following:

>>> [l for l in (re.findall(r"[^\d](\d{4})[^\d]",s) for s in list1) if l]
[['1111'], ['3456']]

If you are interested only in 4-digit numbers delimited by word boundaries, use \b instead:

>>> [l for l in (re.findall(r"\b\d{4}\b",s) for s in list1) if l]
[['1111']]
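One caveat with the [^\d] version is that it needs an actual non-digit character on each side and consumes it. A possible alternative, not part of the original answers, is to use lookarounds, which only assert that no digit is adjacent. The sketch below compares the two; the strings '1111' and ' 1111 2222 ' are hypothetical inputs chosen to show the difference.

```python
import re

strings = [
    '1111',                      # hypothetical: nothing before or after the digits
    ' 1111 2222 ',               # hypothetical: two numbers sharing one separator
    'Khartoum, Sudan, 1111  ',   # from the question
    'khalid2_3456@yahoo.com',    # from the question
    '+249912468264',             # from the question
]

# Pattern from the answer: a non-digit must be present (and is consumed) on both sides.
consuming = re.compile(r'[^\d](\d{4})[^\d]')
# Lookaround variant: only asserts that the neighbours are not digits.
lookaround = re.compile(r'(?<!\d)\d{4}(?!\d)')

consuming_hits = [consuming.findall(s) for s in strings]
lookaround_hits = [lookaround.findall(s) for s in strings]

for s, a, b in zip(strings, consuming_hits, lookaround_hits):
    print(repr(s), a, b)
```

Like the [^\d] pattern, the lookaround version still accepts '3456' after an underscore, so \b remains the better fit when underscores should count as part of a word.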
