python - Follow-up: iterating over a graph using XML minidom

Question

This is a follow-up to a previous question (link).

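What I'm trying to do is use the XML to build a graph with NetworkX. In the DOM structure below, all the author nodes inside the same `<paper>` node should have edges between them, and every author who attended a conference should have an edge to that conference's node. In short: all authors who worked together on a paper should be connected to each other, and all authors who attended a given conference should be connected to that conference.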

<conference name="CONF 2009">
<paper>
<author>Yih-Chun Hu(UIUC)</author>
<author>David McGrew(Cisco Systems)</author>
<author>Adrian Perrig(CMU)</author>
<author>Brian Weis(Cisco Systems)</author>
<author>Dan Wendlandt(CMU)</author>
</paper>
<paper>
<author>Dan Wendlandt(CMU)</author>
<author>Ioannis Avramopoulos(Princeton)</author>
<author>David G. Andersen(CMU)</author>
<author>Jennifer Rexford(Princeton)</author>
</paper>
</conference>
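A small sketch of how `xml.dom.minidom` exposes this structure. It uses a trimmed inline copy of the sample above, parsed with `parseString` instead of a file. The whitespace between tags is kept as text nodes, so `childNodes[0]` of a `<paper>` element is a newline, not the first `<author>`. By contrast, `getElementsByTagName('author')` returns only the author elements of that one paper.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the sample above (first paper, first two authors only)
XML = """<conference name="CONF 2009">
<paper>
<author>Yih-Chun Hu(UIUC)</author>
<author>David McGrew(Cisco Systems)</author>
</paper>
</conference>"""

paper = parseString(XML).getElementsByTagName('paper')[0]

# Whitespace between tags is kept as Text nodes, so the first child is "\n"
first = paper.childNodes[0]
print(first.nodeType == first.TEXT_NODE, repr(first.data))  # True '\n'

# getElementsByTagName skips those, returning only this paper's <author> elements
authors = [a.firstChild.data.split('(')[0]
           for a in paper.getElementsByTagName('author')]
print(authors)  # ['Yih-Chun Hu', 'David McGrew']
```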

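I've worked out how to connect authors to the conference, but not how to connect the authors to each other. What I'm struggling with is how to iterate over the authors who worked on the same paper and link them together.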

    from xml.dom.minidom import parse
    import networkx as nx

    G = nx.Graph()
    counts = {}  # author name -> number of times seen across conferences

    dom = parse(filepath)  # filepath: path to the XML file
    conference = dom.getElementsByTagName('conference')
    for node in conference:
        conf_name = node.getAttribute('name')
        print(conf_name)
        G.add_node(conf_name)

        # The nodeValue is split on "(" in order to get the name of the author
        # and to exclude the university they are part of
        plist = node.getElementsByTagName('paper')
        for p in plist:
            names = [a.firstChild.nodeValue.split("(")[0]
                     for a in p.getElementsByTagName('author')]
            # Figure out a way to create edges between authors in the same <paper> </paper>

        alist = node.getElementsByTagName('author')
        for a in alist:
            author_name = a.firstChild.nodeValue.split("(")[0]

            # If the author is already in the dictionary, increment their count;
            # otherwise, add them with a count of 1.
            counts[author_name] = counts.get(author_name, 0) + 1

            # In both cases, create an edge from the author to the conference
            G.add_edge(author_name, conf_name)

score 0 · Accepted Answer

I'm not entirely sure what you're looking for, but based on your description I put together a graph that I think captures the relationships you describe.

http://imgur.com/o2HvT.png

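I used openfst to do this. I find it much easier to lay out the graph relationships clearly before diving into code like this.

Also, do you actually need to generate explicit edges between the authors? This looks like a traversal problem.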
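A minimal sketch of the traversal idea. It assumes papers are modeled as their own nodes, giving a bipartite author/paper graph. No author-author edges are stored; co-authors are found by walking author → paper → author. The plain adjacency-dict representation and the `paper-0`-style labels are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Author lists per paper, copied from the sample XML (affiliations dropped)
PAPERS = [
    ["Yih-Chun Hu", "David McGrew", "Adrian Perrig", "Brian Weis", "Dan Wendlandt"],
    ["Dan Wendlandt", "Ioannis Avramopoulos", "David G. Andersen", "Jennifer Rexford"],
]

def build_bipartite(papers):
    """Undirected adjacency sets linking each author to hypothetical paper nodes."""
    adj = defaultdict(set)
    for i, authors in enumerate(papers):
        paper_node = "paper-%d" % i
        for author in authors:
            adj[author].add(paper_node)
            adj[paper_node].add(author)
    return adj

def coauthors(adj, author):
    """Co-authors are the nodes two hops away: author -> paper -> author."""
    result = set()
    for paper_node in adj[author]:
        result |= adj[paper_node]
    result.discard(author)  # an author is not their own co-author
    return result

adj = build_bipartite(PAPERS)
print(sorted(coauthors(adj, "Jennifer Rexford")))
```

The trade-off is that co-author queries cost a two-hop walk instead of a direct edge lookup. In exchange, the graph does not grow quadratically with the number of authors per paper.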

score 0 · Accepted Answer

I don't know how to connect the authors to each other.

You need to generate (author, other author) pairs so that you can add them as edges. The typical way to do this is nested iteration:

    for thing in things:
        for otherthing in things:
            add_edge(thing, otherthing)

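This is a naive implementation. It includes self-loops (an edge connecting an author to themselves), which you may or may not want. It also yields both (1,2) and (2,1), which is redundant if you are building an undirected graph. (From Python 2.6, `itertools.permutations` likewise yields both orderings, although it never pairs an element with itself.) Here is a generator that fixes these problems: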

    def pairs(l):
        for i in range(len(l)-1):
            for j in range(i+1, len(l)):
                yield l[i], l[j]
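To make the difference concrete, here is a small check using three hypothetical author names. The naive nested loop yields 9 pairs, including self-loops and both orderings. `itertools.permutations` drops the self-loops but keeps both orderings. `pairs()` yields each unordered pair once. The standard library's `itertools.combinations(l, 2)` (available since Python 2.6) produces the same sequence as `pairs()`, so it can be used instead of a hand-written generator.

```python
from itertools import combinations, permutations

def pairs(l):
    # Each unordered pair exactly once, with no self-pairs
    for i in range(len(l) - 1):
        for j in range(i + 1, len(l)):
            yield l[i], l[j]

authors = ["A", "B", "C"]  # hypothetical names

naive = [(a, b) for a in authors for b in authors]
print(len(naive))                      # 9: includes (A, A) and both (A, B), (B, A)
print(list(permutations(authors, 2)))  # no self-pairs, but both orderings
print(list(pairs(authors)))            # each pair once
print(list(combinations(authors, 2)))  # same as pairs()
```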

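I don't use NetworkX, but from the documentation it looks like you can call `add_node` twice on the same node (the second call does nothing). If so, you can drop the dictionary you were using to keep track of inserted nodes. Adding an edge to an unknown node also appears to add that node automatically. So you should be able to make the code much shorter: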
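A quick way to check both NetworkX behaviors described above, assuming NetworkX is installed. Re-adding an existing node leaves the graph unchanged, and `add_edge` creates any endpoint that does not exist yet. The node names are taken from the sample XML.

```python
import networkx as nx

G = nx.Graph()
G.add_node("CONF 2009")
G.add_node("CONF 2009")        # second call: the node already exists, nothing changes
print(G.number_of_nodes())     # 1

G.add_edge("Dan Wendlandt", "CONF 2009")  # "Dan Wendlandt" is created automatically
print(G.number_of_nodes())     # 2
print(G.has_edge("CONF 2009", "Dan Wendlandt"))  # True: nx.Graph is undirected
```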

    from xml.dom.minidom import parse
    import networkx as nx

    G = nx.Graph()
    dom = parse(filepath)  # filepath: path to the XML file, as in the question

    for conference in dom.getElementsByTagName('conference'):
        conf_name = conference.getAttribute('name')
        for paper in conference.getElementsByTagName('paper'):
            authors = paper.getElementsByTagName('author')
            auth_names = [author.firstChild.data.split('(')[0] for author in authors]

            # Note author's conference attendance
            #
            for auth_name in auth_names:
                G.add_edge(auth_name, conf_name)

            # Note combinations of authors working on same paper
            # (pairs() is the generator defined above)
            for auth_name, other_name in pairs(auth_names):
                G.add_edge(auth_name, other_name)
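A stdlib-only sketch of the same loop, useful for checking the logic without NetworkX. It collects the edges into a set of sorted tuples. The function name `collect_edges` is hypothetical, and so is parsing an inline copy of the sample XML with `parseString` instead of `parse(filepath)`. It uses `itertools.combinations(names, 2)`, which yields the same pairs as `pairs()`.

```python
from itertools import combinations
from xml.dom.minidom import parseString

XML = """<conference name="CONF 2009">
<paper>
<author>Yih-Chun Hu(UIUC)</author>
<author>David McGrew(Cisco Systems)</author>
<author>Adrian Perrig(CMU)</author>
<author>Brian Weis(Cisco Systems)</author>
<author>Dan Wendlandt(CMU)</author>
</paper>
<paper>
<author>Dan Wendlandt(CMU)</author>
<author>Ioannis Avramopoulos(Princeton)</author>
<author>David G. Andersen(CMU)</author>
<author>Jennifer Rexford(Princeton)</author>
</paper>
</conference>"""

def collect_edges(xml_text):
    """Undirected edges as sorted tuples: author-conference and author-author."""
    edges = set()
    dom = parseString(xml_text)
    for conference in dom.getElementsByTagName('conference'):
        conf_name = conference.getAttribute('name')
        for paper in conference.getElementsByTagName('paper'):
            names = [a.firstChild.data.split('(')[0].strip()
                     for a in paper.getElementsByTagName('author')]
            # Author attended this conference
            for name in names:
                edges.add(tuple(sorted((name, conf_name))))
            # Authors of the same paper are connected pairwise
            for a, b in combinations(names, 2):
                edges.add(tuple(sorted((a, b))))
    return edges

edges = collect_edges(XML)
print(len(edges))  # 24: 8 author-conference edges + 10 + 6 co-author edges
```

Sorting each tuple makes (a, b) and (b, a) collapse into one entry, mirroring an undirected graph. With NetworkX, `G.add_edges_from(edges)` would then build the graph in a single call.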
