python - 値が正の値 k 離れている場合、ランキングシステムから値を削除するにはどうすればよいですか?

Question

次のコードがあるとします。

def compute_ranks(graph, k):
    d = .8 #dampening factor
    loops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for c in range(0, loops):
        newranks = {}
        for page in graph:
            newrank = (1-d) / npages
            for node in graph:
                if page in graph[node]:
                    newrank = newrank + d * (ranks[node]/len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
     return ranks

では、互いに共謀する可能性のあるアイテムを許可したくないとします。アイテム辞書があれば

g = {'a': ['a', 'b', 'c'], 'b':['a'], 'c':['d'], 'd':['a']}

任意のパス A==>B について、B==>A からのパスが私の番号 k 以下の距離にあることを許可したくありません。

たとえば、k = 0 の場合、許可しない唯一のパスは A==>A です。

ただし、k = 2 の場合、以前のようにリンク A==>A は許可されませんが、D==>A、B==>A、または A==>C などのリンクも許可されます。

私はこれが非常に紛らわしいことを知っており、私の問題の大部分はこれが何を意味するのかを正確に理解していないことに起因しています.

質問の書き起こしは次のとおりです。

# Question 2: Combatting Link Spam

# One of the problems with our page ranking system is pages can 
# collude with each other to improve their page ranks.  We consider 
# A->B a reciprocal link if there is a link path from B to A of length 
# equal to or below the collusion level, k.  The length of a link path 
# is the number of links which are taken to travel from one page to the 
# other.

# If k = 0, then a link from A to A is a reciprocal link for node A, 
# since no links needs to be taken to get from A to A.

# If k=1, B->A would count as a reciprocal link  if there is a link 
# A->B, which includes one link and so is of length 1. (it requires 
# two parties, A and B, to collude to increase each others page rank).

# If k=2, B->A would count as a reciprocal link for node A if there is
# a path A->C->B, for some page C, (link path of length 2),
# or a direct link A-> B (link path of length 1).

# Modify the compute_ranks code to 
#   - take an extra input k, which is a non-negative integer, and 
#   - exclude reciprocal links of length up to and including k from 
#     helping the page rank.

score 0 · Accepted Answer

私はそれを考え出した：

def Colluding(p1,p2,itemDict,k, time):
if k >= len(itemDict)/2:
    return True
if k == 0:
    if time == 0:
        if p1 == p2 and p1 in itemDict[p2]:
            return True
        else:
            return False
    if time == 1:
        if p1 in itemDict[p2]:
            return True
        else:
            return False
for p in itemDict[p1]:
    if Colluding(p1,p,itemDict,k-1,1) and p == p2:
        return True
    else:
        return False
return False

def compute_ranks(graph, k):
    d = 0.8 # damping factor
    numloops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for i in range(0, numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            for node in graph:
                if page in graph[node] and not Colluding(page, node, graph, k,0):
                    newrank = newrank + d * (ranks[node]/len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
    return ranks

正しい方向に向けてくれた@Kenに感謝します。

score 0 · Accepted Answer

私もこのコースを受講しました..そして、次の解決策を提案しました。基本的に必要なのは「全相互リンク」です。次の手順では、グラフとレベル k を指定して、すべての相互リンクを計算しました。相互リンク (node1、node2) を示すタプルのリストを返します。あなたがする必要があるのは、（ノード、ページ）がこのリストにないかどうかをチェックして、ランクを計算することです.

def get_reciprocal_links(graph, k):
reci=[]
for node in g:
    for ele in g[node]:
        got = False
        work = []
        i=0
        if node == ele and i == k:
            reci.append((node,ele))
            break
        if ele in g:
            for e in g[ele]:
                work.append(e)

        while i < k:
            if node in work:
                reci.append((node,ele))
                got = True
                break
            else:
                i=i+1
                check=work.pop()
                if check in g:
                    for e in g[check]:
                        work.append(e)
        if got == True:
            continue
return reci

score 0 · Accepted Answer

可能な解決策は、共謀を検出する再帰的な方法を導入することです。何かのようなもの：

def Colluding(p1,p2,itemDict,k):
    if p1 == p2:
        return True
    elif k == 0:
        return False
    else p2 in itemDict[p1]:
        return True
    for p in itemDict[p1]:
        if Colluding(p1,p,itemDict,k-1):
            return True
    return False

次に、それがif item in itemDict[node]あなたが持っていると言っている場所if item in itemDict[node] and not Colluding(item,node,itemDict,k)またはそれに似たもの。

これは深さ優先の検索を行いますが、これは、数回の完全な深さの検索の後にしか見つからない可能性があるため、小さな深さ (A->B->A など) に多くの共謀リンクがある場合、最良の選択ではない可能性があります。その場合、幅優先検索を行う別の方法を見つけることができるかもしれません。また、リンクが多い場合は、再帰を使用すると Python でスタックオーバーフローの問題が発生する可能性があるため、代わりにループに変換してみる価値があるかもしれません。再帰アルゴリズムは、再帰を使用してツリーをトラバースするのが自然に思えるため、最初に頭に浮かんだものです。

注: これは宿題の助けになるので、私はテストしていません。一部の比較は正しくない可能性があり、何らかの方法で調整する必要があります。

python - 値が正の値 k 離れている場合、ランキング システムから値を削除するにはどうすればよいですか?

3 に答える 3

Related

Reference

python - 値が正の値 k 離れている場合、ランキングシステムから値を削除するにはどうすればよいですか?