path - Networkx - How to get the shortest path length between nodes, showing node IDs instead of labels

Question

I am new to the NetworkX library in Python.

Suppose I import a file in Pajek format:

import networkx as nx
G=nx.read_pajek("pajek_network_file.net")
G=nx.Graph(G)

The content of my file is as follows (in Pajek, nodes are called "vertices"):

*Network
*Vertices 6
123 Author1
456 Author2
789 Author3
111 Author4
222 Author5
333 Author6
*Edges 
123 333
333 789
789 222
222 111
111 456
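
For reference, the mapping between vertex IDs and labels in a file like this can be recovered with a few lines of plain Python. This is only a sketch: it assumes the simple two-column vertex layout shown above (no coordinates, no quoted labels containing spaces), and `parse_pajek_vertices` is a hypothetical helper, not part of NetworkX.

```python
def parse_pajek_vertices(text):
    """Return {label: vertex_id} from the *Vertices section of a Pajek file."""
    id_by_label = {}
    in_vertices = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section headers start with '*'; only *Vertices matters here.
            in_vertices = line.lower().startswith("*vertices")
            continue
        if in_vertices:
            # Assumed layout: "<id> <label>", separated by whitespace.
            vertex_id, label = line.split(None, 1)
            id_by_label[label.strip('"')] = int(vertex_id)
    return id_by_label


sample = """*Network
*Vertices 6
123 Author1
456 Author2
789 Author3
111 Author4
222 Author5
333 Author6
*Edges
123 333
333 789
789 222
222 111
111 456
"""

id_by_label = parse_pajek_vertices(sample)
print(id_by_label["Author1"])  # 123
```

With such a dictionary, any result keyed by label can be translated back to IDs after the fact.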

Now I want to compute the lengths of all shortest paths between the nodes in my network. Following the library documentation, I am using this function:

path = nx.all_pairs_shortest_path_length(G)

Returns: lengths – dictionary of shortest path lengths keyed by source and target.

The return value I am getting:

print path
{u'Author4': {u'Author4': 0, u'Author5': 1, u'Author6': 3, u'Author1': 4, u'Author2': 1, u'Author3': 2}, u'Author5': {u'Author4': 1, u'Author5': 0, u'Author6': 2, u'Author1': 3, u'Author2': 2, u'Author3': 1}, u'Author6': {u'Author4': 3, u'Author5': 2, u'Author6': 0, u'Author1': 1, u'Author2': 4, u'Author3': 1}, u'Author1': {u'Author4': 4, u'Author5': 3, u'Author6': 1, u'Author1': 0, u'Author2': 5, u'Author3': 2}, u'Author2': {u'Author4': 1, u'Author5': 2, u'Author6': 4, u'Author1': 5, u'Author2': 0, u'Author3': 3}, u'Author3': {u'Author4': 2, u'Author5': 1, u'Author6': 1, u'Author1': 2, u'Author2': 3, u'Author3': 0}}

As you can see, it is very hard to read and to use later on...

Ideally, I would like the result in a format like this:

source_node_id, target_node_id, path_length
123, 456, 5
123, 789, 2
123, 111, 4

In other words, I need the result to use only node IDs (or at least to include them) rather than showing just the node labels. I would also like every possible pair, together with its shortest path length, on a single line...

Is this possible in NetworkX?

Function reference: https://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.shortest_paths.unweighted.all_pairs_shortest_path_length.html

score 1 · Accepted Answer

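In the end, I only needed to compute shortest paths for a subset of the whole network (the real network is huge, with 600K nodes and 6M edges). So I wrote a script that reads pairs of source and target nodes from a CSV file, stores them in a numpy array, passes each pair as parameters to nx.shortest_path_length, and finally saves the results to a CSV file.

The code is below; I am posting it in case it is useful to someone.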

print "Importing libraries..."

import networkx as nx
import csv
import numpy as np

#Import network in Pajek format .net
myG=nx.read_pajek("MyNetwork_0711_onlylabel.net")

print "Finished importing Network Pajek file"

#Simplify graph into networkx format
G=nx.Graph(myG)

print "Finished converting to Networkx format"

#Network info
print "Nodes found: ",G.number_of_nodes()
print "Edges found: ",G.number_of_edges()


#Reading file and storing to array
with open('paired_nodes.csv','rb') as csvfile:
    reader = csv.reader(csvfile, delimiter = ',', quoting=csv.QUOTE_MINIMAL)#, quotechar = '"')
    data = [data for data in reader]
paired_nodes = np.asarray(data)
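# Note: astype() returns a new array and the result is discarded here,
# so the values stay strings when passed to nx.shortest_path_length below.
paired_nodes.astype(int)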

print "Finished reading paired nodes file"

#Add extra column in array to store shortest path value
paired_nodes = np.append(paired_nodes,np.zeros([len(paired_nodes),1],dtype=np.int),1)

print "Just appended new column to paired nodes array"

#Get shortest path for every pair of nodes

for index in range(len(paired_nodes)):
    try:
        shortest=nx.shortest_path_length(G,paired_nodes[index,0],paired_nodes[index,1])
        #print shortest
        paired_nodes[index,2] = shortest
    except nx.NetworkXNoPath:
        #print '99999'  #Value to print when no path is found
        paired_nodes[index,2] = 99999

print "Finished calculating shortest path for paired nodes"

#Store results to csv file      
f = open('shortest_path_results.csv','w')

for item in paired_nodes:
    f.write(','.join(map(str,item)))
    f.write('\n')
f.close()

print "Done writing file with results, bye!"
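
The same read-pairs, compute, write-results flow can also be sketched in Python 3 with only the standard library. To keep the example free of dependencies, a hand-written breadth-first search stands in for nx.shortest_path_length; this is a substitution for illustration, not what the script above does. The names `bfs_length`, `pair_lengths` and the isolated node "999" are hypothetical, and the in-memory `io.StringIO` objects stand in for the CSV files.

```python
import csv
import io
from collections import deque

NO_PATH = 99999  # same sentinel the script above writes when no path exists


def bfs_length(adjacency, source, target):
    """Number of edges on a shortest path, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def pair_lengths(adjacency, pairs_file, results_file):
    """Read 'source,target' rows and write 'source,target,length' rows."""
    writer = csv.writer(results_file, lineterminator="\n")
    for source, target in csv.reader(pairs_file):
        length = bfs_length(adjacency, source, target)
        writer.writerow([source, target, NO_PATH if length is None else length])


# Toy graph from the question, keyed by vertex ID, plus an isolated node "999".
edges = [("123", "333"), ("333", "789"), ("789", "222"),
         ("222", "111"), ("111", "456")]
adjacency = {"999": set()}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

pairs = io.StringIO("123,456\n123,789\n123,999\n")
results = io.StringIO()
pair_lengths(adjacency, pairs, results)
print(results.getvalue())
```

Writing each row as soon as it is computed avoids holding an extra result column in memory, which may matter when the list of pairs is long.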

score 0 · Accepted Answer

How about something like this?

import networkx as nx                                                            
G=nx.read_pajek("pajek_network_file.net")                                        
G=nx.Graph(G)
# first get all the lengths      
path_lengths = nx.all_pairs_shortest_path_length(G)                              

# now iterate over all pairs of nodes      
for src in G.nodes():
    # look up the id as desired                           
    id_src = G.node[src].get('id')
    for dest in G.nodes():                                                       
        if src != dest: # ignore self-self paths
            id_dest =  G.node[dest].get('id')                                    
            l = path_lengths.get(src).get(dest)                                  
            print "{}, {}, {}".format(id_src, id_dest, l)

This gives the output:

111, 222, 1
111, 333, 3
111, 123, 4
111, 456, 1
111, 789, 2
...

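If you need further processing (e.g. sorting), store the l values rather than just printing them.

(You can loop over the pairs more cleanly with something like itertools.combinations(G.nodes(), 2), but the approach above is a bit more explicit in case you are not familiar with it.)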
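
A minimal sketch of that variant, storing the rows instead of printing them right away. The two dictionaries are hypothetical stand-ins: `path_lengths` mimics a small part of the nested dictionary from the question, and `id_by_label` is an assumed label-to-ID lookup. Note that in NetworkX 2.x and later, nx.all_pairs_shortest_path_length returns an iterator of (node, dict) pairs, so it would need to be wrapped in dict() to get a nested dictionary like this one.

```python
from itertools import combinations

# Hypothetical stand-ins: path lengths keyed by label (values taken from the
# question's output) and the numeric ID of each label.
path_lengths = {
    "Author1": {"Author1": 0, "Author6": 1, "Author3": 2},
    "Author6": {"Author1": 1, "Author6": 0, "Author3": 1},
    "Author3": {"Author1": 2, "Author6": 1, "Author3": 0},
}
id_by_label = {"Author1": 123, "Author6": 333, "Author3": 789}

rows = []
for src, dest in combinations(sorted(path_lengths), 2):
    # Each unordered pair appears exactly once; the graph is undirected,
    # so the length from src to dest equals the length from dest to src.
    rows.append((id_by_label[src], id_by_label[dest], path_lengths[src][dest]))

rows.sort(key=lambda row: row[2])  # for example, shortest pairs first
for row in rows:
    print("{}, {}, {}".format(*row))
```

Compared with the nested loop, combinations skips self-pairs and visits each pair once, which halves the output for an undirected graph.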
