python - Finding the most relevant children in a dictionary

Question

Take this dictionary:

{'local': {'count': 7,
    'dining-and-nightlife': {'count': 1,
        'bar-clubs': {'count': 1}
    },
    'activities-events': {'count': 6,
        'outdoor-adventures': {'count': 4},
        'life-skill-classes': {'count': 2}
    }
}}

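How can I determine the most relevant matches (within a 30% leeway)? For example, activities-events has a count of 6, so 6/7 ≈ 86%, and its child outdoor-adventures has a count of 4 out of 6 (≈ 67%). So the most relevant category here is outdoor-adventures.

In this example: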

{'local': {'count': 11,
    'dining-and-nightlife': {'count': 4,
        'bar-clubs': {'count': 4}
    },
    'activities-events': {'count': 6,
        'outdoor-adventures': {'count': 4},
        'life-skill-classes': {'count': 2}
    }
}}

Return both dining-and-nightlife (4/11 ≈ 36%) with bar-clubs (100%), and activities-events (6/11 ≈ 55%) with outdoor-adventures (≈ 67%).

I was hoping the percentage cutoff would be determined by:

cutoff = 0.3

The idea here is to remove the smaller matches (those under 30% of their parent's count) in order to determine which categories are most relevant.
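
To make the arithmetic concrete, here is a minimal sketch for a single level of the tree. It assumes Python 3 (true division), and `child_shares` is a hypothetical helper name used only for illustration, not something from the question.

```python
def child_shares(node):
    """Map each child category to its share of the parent's count."""
    total = node['count']
    return {name: child['count'] / total
            for name, child in node.items()
            if name != 'count'}

# The 'local' node from the first example in the question.
local = {'count': 7,
         'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1}},
         'activities-events': {'count': 6,
                               'outdoor-adventures': {'count': 4},
                               'life-skill-classes': {'count': 2}}}

cutoff = 0.3
shares = child_shares(local)
# Keep only the children that reach the cutoff share of the parent.
relevant = {name: share for name, share in shares.items() if share >= cutoff}
print(relevant)  # {'activities-events': 0.8571428571428571}
```

Here dining-and-nightlife has a share of 1/7 (about 14%), so it falls below the 30% cutoff and is dropped.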

@FJ answered the question below, but I would also like to update the counts in the tree.

Initial output:

{'local': {'activities-events': {'count': 6,
                             'life-skill-classes': {'count': 2},
                             'outdoor-adventures': {'count': 4}},
       'count': 11,
       'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}

Output after the update:

{'local': {'activities-events': {'count': 6,
                             'life-skill-classes': {'count': 2},
                             'outdoor-adventures': {'count': 4}},
       'count': 10,
       'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}

score 1 · Accepted Answer

The following should work. Note that it modifies the input dictionary in place.

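def keep_most_relevant(d, cutoff=0.3):
    # Iterate over a copy of the items so keys can be deleted inside the loop.
    for k, v in list(d.items()):
        if k == 'count':
            continue
        # Drop a child whose count is below the cutoff share of its parent's count.
        if 'count' in d and v['count'] < d['count'] * cutoff:
            del d[k]
        else:
            # Pass cutoff along so a non-default value applies at every level.
            keep_most_relevant(v, cutoff)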

Example:

>>> import pprint
>>> d1 = {'local': {'count': 7, 'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d1)
>>> pprint.pprint(d1)
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 7}}

>>> d2 = {'local': {'count': 11, 'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d2)
>>> pprint.pprint(d2)
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 11,
           'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
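
The follow-up in the question asks for the counts to be updated as well. The desired rule is not spelled out there, so the sketch below rests on an assumption: after pruning, each parent's count becomes the sum of its surviving children's counts, which is one reading consistent with the root count going from 11 to 10 in the question's second example. `prune_and_recount` is a hypothetical name, and pruning decisions are made against the original counts.

```python
def prune_and_recount(d, cutoff=0.3):
    """Prune children below the cutoff in place, then recompute parent counts.

    Assumption: a parent's new count is the sum of its surviving children's
    counts; leaf nodes keep their own count unchanged.
    """
    parent_count = d.get('count')  # original count, used for the cutoff test
    for key, child in list(d.items()):
        if key == 'count':
            continue
        if parent_count is not None and child['count'] < parent_count * cutoff:
            del d[key]
        else:
            prune_and_recount(child, cutoff)
    # Recompute this node's count from whatever children are left.
    child_counts = [v['count'] for k, v in d.items() if k != 'count']
    if 'count' in d and child_counts:
        d['count'] = sum(child_counts)

d2 = {'local': {'count': 11,
                'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4}},
                'activities-events': {'count': 6,
                                      'outdoor-adventures': {'count': 4},
                                      'life-skill-classes': {'count': 2}}}}
prune_and_recount(d2)
print(d2['local']['count'])  # 10
```

If the counts are meant to follow a different rule (for example, subtracting only the pruned children), the last three lines of the function are the part to change.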

score 0 · Accepted Answer

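def matches(match, cutoff):
    # Use a float total so the division below is not truncated on Python 2.
    total = float(match['count'])

    for k in match:
        if k == 'count':
            continue

        # Share of the parent's count held by this child.
        score = match[k]['count'] / total

        if score >= cutoff:
            yield (k, score)

            # Among this child's own matches, yield only the highest-scoring one.
            m = list(matches(match[k], cutoff))
            if m: yield max(m, key=lambda pair: pair[1])

def best_matches(d, cutoff):
    # The top level has no 'count' of its own, so start from each root node.
    for k in d:
        for m in matches(d[k], cutoff):
            yield m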

Test 1

>>> d = {'local': {'count': 7,
    'dining-and-nightlife': {'count': 1,
        'bar-clubs': {'count': 1}
    },
    'activities-events': {'count': 6,
        'outdoor-adventures': {'count': 4},
        'life-skill-classes': {'count': 2}
    }
}}
>>> print(list(best_matches(d, 0.3)))
[('activities-events', 0.8571428571428571), ('outdoor-adventures', 0.6666666666666666)]

Test 2

>>> d = {'local': {'count': 11,
    'dining-and-nightlife': {'count': 4,
        'bar-clubs': {'count': 4}
    },
    'activities-events': {'count': 6,
        'outdoor-adventures': {'count': 4},
        'life-skill-classes': {'count': 2}
    }
}}
>>> print(list(best_matches(d, 0.3)))
[('dining-and-nightlife', 0.36363636363636365), ('bar-clubs', 1.0), ('activities-events', 0.5454545454545454), ('outdoor-adventures', 0.6666666666666666)]
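
The generator above returns a flat list of (name, score) pairs, while the question describes the result as parent/child pairings. As an alternative sketch, not part of either answer, the function below groups the names into chains. It assumes Python 3, and `relevant_chains` is a hypothetical name; at each level it follows only the largest child, and only while that child passes the cutoff.

```python
def relevant_chains(node, cutoff=0.3):
    """Return one list of category names per child of `node` that passes the cutoff.

    Each list starts at the child and keeps following the largest descendant
    for as long as that descendant also passes the cutoff.
    """
    chains = []
    for name, child in node.items():
        if name == 'count':
            continue
        if child['count'] / node['count'] < cutoff:
            continue  # too small a share of its parent
        chain = [name]
        current = child
        while True:
            kids = {k: v for k, v in current.items() if k != 'count'}
            if not kids:
                break  # reached a leaf
            best = max(kids, key=lambda k: kids[k]['count'])
            if kids[best]['count'] / current['count'] < cutoff:
                break
            chain.append(best)
            current = kids[best]
        chains.append(chain)
    return chains

tree = {'local': {'count': 11,
                  'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4}},
                  'activities-events': {'count': 6,
                                        'outdoor-adventures': {'count': 4},
                                        'life-skill-classes': {'count': 2}}}}
print(relevant_chains(tree['local']))
# [['dining-and-nightlife', 'bar-clubs'], ['activities-events', 'outdoor-adventures']]
```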
