python - Removing xref tags when parsing XML in Python

Question

I'm trying to parse an .nxml file with a structure like this in Python, using ElementTree:

<body>
    <sec>
        <title>INTRODUCTION</title>
        <p>Experimentation with substances usually takes place during adolescence [<xref ref-type="bibr" rid="b1">1</xref>]. Adolescents are highly vulnerable to social influences [<xref ref-type="bibr" rid="b2">2</xref>], have lower tolerance levels and become dependent at lower doses than adults [<xref ref-type="bibr" rid="b3">3</xref>]. Adolescent-onset substance abuse is characterized by more rapid development of multiple drug dependencies and more severe psychopathology [<xref ref-type="bibr" rid="b4">4</xref>]. However, the majority of adolescents who experiment with substances do not become problem users. A better understanding is needed of the factors underlying initiation of substance use in adolescence versus heavy use and problem use. Specifically, if the liability to progress to heavier substance use is influenced by processes other than those that influence initiation, then primary prevention/intervention programmes can be only partly effective. It may be more successful, in terms of both cost and impact, to target those factors implicated in the progression to heavy/problem use. However, if the underlying liabilities to initiation and progression were strongly related, interventions could be tailored to both behaviours.</p>

Specifically, I'm trying to extract the text between the <p> </p> tags.

However, the [<xref> </xref>] elements inside the text are breaking the parsing.

I tried this:

for sec in body:
    for p in sec:
        for e in p:
           e.remove (xref)

But the element isn't recognized. Any ideas?

score 1 · Accepted Answer

This is more likely to work:

for xref in body.findall('xref'):
    body.remove(xref)

To stay closer to what you've been doing so far, try the following:

for sec in body.findall('sec'):
    for p in sec.findall('p'):
        for e in p.findall('xref'):
            p.remove(e)
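
A caveat on the loops above, with a sketch of one possible workaround. In ElementTree, the text that follows an element is stored on that element as its `tail`, so `p.remove(e)` also throws away the sentence text after each citation. In addition, `findall('xref')` with a bare tag name only matches direct children, so calling it on `body` will not reach `xref` elements nested inside `sec`/`p`. The snippet below moves each tail onto the preceding node before removing the element. The `SAMPLE` string and the `drop_keep_tail` helper are hypothetical, made up for illustration, and are not part of the original question or answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample modelled on the structure in the question.
SAMPLE = """<body>
    <sec>
        <title>INTRODUCTION</title>
        <p>Adolescence [<xref ref-type="bibr" rid="b1">1</xref>]. Social influences [<xref ref-type="bibr" rid="b2">2</xref>].</p>
    </sec>
</body>"""


def drop_keep_tail(parent, child):
    """Remove child from parent, moving child.tail onto the preceding node."""
    tail = child.tail or ""
    siblings = list(parent)
    index = siblings.index(child)
    if index == 0:
        # No previous sibling: the tail continues the parent's leading text.
        parent.text = (parent.text or "") + tail
    else:
        # Otherwise it continues the previous sibling's tail.
        prev = siblings[index - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


body = ET.fromstring(SAMPLE)

# iter() walks the whole subtree, so nested <p> elements are found.
for p in body.iter("p"):
    # findall() returns a list, so removing while looping is safe.
    for xref in p.findall("xref"):
        drop_keep_tail(p, xref)

# itertext() joins the paragraph text with any remaining child text.
paragraphs = ["".join(p.itertext()) for p in body.iter("p")]
print(paragraphs)
```

The surrounding brackets are plain text in the paragraph, so they are left behind as `[]`. If the citation numbers can stay, `"".join(p.itertext())` on the untouched tree is enough to get the paragraph text without removing anything.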

score 0 · Accepted Answer

Actually, I scrapped all of that and used BeautifulSoup to strip out all the tags. It worked a treat. I can't believe I was being so stupid.
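
The post does not show the BeautifulSoup code that was used, so the following is only a guess at what such an approach might look like. It assumes the third-party `bs4` package is installed and uses the built-in `html.parser` backend, which is my choice and not something the post states. The `SAMPLE` string is a hypothetical, shortened stand-in for the .nxml content.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample modelled on the structure in the question.
SAMPLE = (
    "<body><sec><title>INTRODUCTION</title>"
    '<p>Adolescence [<xref ref-type="bibr" rid="b1">1</xref>]. '
    'Social influences [<xref ref-type="bibr" rid="b2">2</xref>].</p>'
    "</sec></body>"
)

# "html.parser" is an assumption; it avoids needing an extra parser library.
soup = BeautifulSoup(SAMPLE, "html.parser")

# get_text() drops every tag but keeps the text inside it,
# so the citation numbers remain in the output.
with_citations = [p.get_text() for p in soup.find_all("p")]

# To drop the citation numbers as well, delete the <xref> elements first.
for xref in soup.find_all("xref"):
    xref.decompose()
without_citations = [p.get_text() for p in soup.find_all("p")]

print(with_citations)
print(without_citations)
```

Unlike `Element.remove()` in ElementTree, `decompose()` leaves the text after each `xref` in place, because BeautifulSoup stores that text as a separate sibling node.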
