python - How to delete a specific part of an html file in Python

Question

I am working with an html file that contains Item 1, Item 2, and Item 3. I want to delete all the text that comes after Item 2. I can find Item 2 in the file like this:

import re

Item2 = re.compile(r'(Item&nbsp;2)', re.I | re.S)
Item2match = Item2.findall(file)

However, I don't know how to delete the text that follows it.

score 0 · Accepted Answer

Use a string method to split the html text and keep the first part. str.partition() does this far more simply:

file.partition('Item&nbsp;2')[0]

If you want to keep the Item 2 text as well, use:

''.join(file.partition('Item&nbsp;2')[:2])

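There is no need for a regular expression here; you are matching literal text. Regular expressions are a very expressive and powerful tool, but don't use them when a simpler alternative exists.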

Demo:

>>> 'Some text with Item&nbsp;2 in it'.partition('Item&nbsp;2')[0]
'Some text with '
>>> ''.join('Some text with Item&nbsp;2 in it'.partition('Item&nbsp;2')[:2])
'Some text with Item&nbsp;2'
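
The two partition() expressions above can be wrapped in one small helper. This is only a sketch: the name truncate_after, the keep_marker parameter, and the sample string are made up for illustration. It relies on the fact that str.partition() returns the whole string followed by two empty strings when the separator is not found, so text without the marker comes back unchanged.

```python
def truncate_after(html: str, marker: str, keep_marker: bool = True) -> str:
    """Return html cut off at the first occurrence of marker (hypothetical helper)."""
    head, sep, _tail = html.partition(marker)
    # If marker is absent, sep is '' and head is the whole string,
    # so the input is returned unchanged in either mode.
    return head + sep if keep_marker else head


# Illustrative input, not taken from the asker's file.
sample = "Item&nbsp;1 intro Item&nbsp;2 body Item&nbsp;3 rest"
result_keep = truncate_after(sample, "Item&nbsp;2")
result_drop = truncate_after(sample, "Item&nbsp;2", keep_marker=False)
```

Note that partition() is case-sensitive, unlike the re.I pattern in the question.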

score 0 · Accepted Answer

>>> re.sub(r'(?s)(?<=Item&nbsp;2)(.*)', '', file)

Example:

>>> s
'Item&nbsp;2...feiugeogherger\nfjweifjwef\nsfjioweiefjwe'
>>> re.sub(r'(?s)(?<=Item&nbsp;2)(.*)', '', s)
'Item&nbsp;2'
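
If the match needs to be case-insensitive, as the re.I flag in the question suggests, one possible variant is to locate the marker with re.search() and slice the string at the end of the match. This is a sketch under that assumption, not part of the answer above; the helper name truncate_after_ci and the sample string are hypothetical.

```python
import re


def truncate_after_ci(html: str, marker: str) -> str:
    """Keep everything up to and including the first case-insensitive match of marker."""
    # re.escape() makes the marker match as literal text, not as a pattern.
    match = re.search(re.escape(marker), html, flags=re.IGNORECASE)
    if match is None:
        # Marker not found: leave the text untouched.
        return html
    return html[:match.end()]


# Illustrative input with a different letter case and a newline after the marker.
sample_ci = "intro ITEM&nbsp;2 body\nmore lines"
result_ci = truncate_after_ci(sample_ci, "Item&nbsp;2")
```

Slicing at match.end() avoids both the lookbehind and the (?s) flag, because nothing after the marker has to be matched at all.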
