
Given a network, I want to plot the probability that two nodes are connected as a function of the distance between them.

I have two pandas Series: one (distance) holds the distance between each pair of nodes, and the other (adjacency) is filled with zeros and ones and tells whether the nodes in each pair are connected.

My idea was to use cut and value_counts to first compute the number of pairs whose distance falls inside each bin, which works fine:

import pandas

factor = pandas.cut(distance, 100)
num_bin = factor.value_counts()

Now, if I had a vector of the same size as num_bin holding the number of connected pairs inside each bin, I would have my probability. But how do I compute this vector?

My problem is: among, let's say, the 3 pairs of nodes inside the second bin, how do I know how many are connected?

Thanks.


1 Answer


You can use crosstab for this:

import numpy as np
import pandas as pd

factor = pd.cut(distance, 100)

# the crosstab dataframe with the value counts in each bucket
ct = pd.crosstab(factor, adjacency, margins=True,
                 rownames=['distance'], colnames=['adjacency'])

# from here computing the probability of nodes being adjacent is straightforward
ct['prob'] = np.true_divide(ct[1], ct['All'])

This gives a dataframe of the following form:

>>> ct

adjacency           0    1  All      prob
distance
(0.00685, 0.107]    7    4   11  0.363636
(0.107, 0.205]      6    9   15  0.600000
(0.205, 0.304]      6    6   12  0.500000
(0.304, 0.403]      5    2    7  0.285714
(0.403, 0.502]      4    6   10  0.600000
(0.502, 0.6]        8    3   11  0.272727
(0.6, 0.699]        6    2    8  0.250000
(0.699, 0.798]      4    6   10  0.600000
(0.798, 0.896]      4    5    9  0.555556
(0.896, 0.995]      5    2    7  0.285714
All                55   45  100  0.450000
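To get from this table to the plot asked for in the question, only the per-bin rows are needed, with a numeric x value for each bin. The following is a minimal sketch under stated assumptions, not part of the original answer: the `distance` and `adjacency` series are synthetic stand-ins, 10 bins are used instead of 100, and the names `prob_crosstab`, `prob_groupby` and `midpoints` are introduced here for illustration. It also shows that the mean of the 0/1 series within each bin gives the same probability as the crosstab ratio.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the asker's data (assumption: made-up values).
rng = np.random.default_rng(0)
distance = pd.Series(rng.random(100), name="distance")
adjacency = pd.Series(rng.integers(0, 2, size=100), name="adjacency")

factor = pd.cut(distance, 10)

# Crosstab without margins, so there is no 'All' row to drop before plotting.
ct = pd.crosstab(factor, adjacency)
prob_crosstab = ct[1] / ct.sum(axis=1)

# Equivalent shortcut: the mean of a 0/1 series in a bin is the fraction
# of connected pairs in that bin.
prob_groupby = adjacency.groupby(factor, observed=False).mean()

# Bin midpoints can serve as x coordinates for the plot.
midpoints = np.array([interval.mid for interval in prob_groupby.index])
```

The `midpoints` and `prob_groupby` values can then be handed to any plotting call, for example matplotlib's `plt.plot(midpoints, prob_groupby)`. With many narrow bins some may be empty; the groupby version reports those as NaN rather than dropping them.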
answered 2013-03-21T13:12:00.667