python - fuzzywuzzy ratio on two columns: take the best match on one column only when the other column matches 100%

Question

My dataframe is

Matcher = df2['Account Name']

match = if df1['Billing Country'] == df2['Billing Country'] (process.extractOne(df1['Account Name'], Matcher))

The code above does not work, but I want to run the fuzzy match on the account name only when the countries match.

score 1 · Accepted Answer

Here is what I propose. First, do a full Cartesian join of the two dfs.

df1.loc[:, 'MergeKey'] = 1 #create a mergekey
df2.loc[:, 'MergeKey'] = 1 #it is the same for both so that when you merge you get the cartesian product
#merge them to get the cartesian product (all possible combos)
merged = df1.merge(df2, on = 'MergeKey', suffixes = ['_1', '_2'])
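
To make the shape of the result concrete, here is a minimal sketch of the same constant-key merge on small made-up frames. The sample rows are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df1 = pd.DataFrame({
    "Account Name": ["Acme Corp", "Globex"],
    "Billing Country": ["US", "DE"],
})
df2 = pd.DataFrame({
    "Account Name": ["Acme Corporation", "Globex GmbH", "Initech"],
    "Billing Country": ["US", "DE", "US"],
})

# A constant key on both sides pairs every df1 row with every df2 row.
merged = df1.assign(MergeKey=1).merge(
    df2.assign(MergeKey=1), on="MergeKey", suffixes=("_1", "_2")
)

print(len(merged))            # number of df1 rows times number of df2 rows
print(list(merged.columns))   # overlapping columns get the _1 / _2 suffixes
```

On pandas 1.2 or later, `df1.merge(df2, how="cross", suffixes=("_1", "_2"))` should give the same pairs without the helper column. Note that the row count grows as the product of the two frame sizes, so this can get large quickly.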

Next, compute the fuzz ratio for each combination.

from fuzzywuzzy import fuzz

def fuzzratio(row):
    try:  # avoid errors, for example on NaNs
        return fuzz.ratio(row['Billing Country_1'], row['Billing Country_2'])
    except Exception:
        return 0.  # you'll want to experiment without the try/except too
merged.loc[:, 'Ratio'] = merged.apply(fuzzratio, axis = 1)  # create the ratio column by applying the function

You should now have a df with the ratio for every possible combination of df1['Billing Country'] and df2['Billing Country']. Once you are there, simply filter to keep the rows where the ratio is 100%.

result = merged[merged.Ratio == 100]  # fuzz.ratio returns an integer from 0 to 100, not a fraction
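
The question also asks for the best account-name match within the rows whose countries agree. One way to continue from the country filter is sketched below. The sample rows, the `name_ratio` helper, and the `NameRatio` column are assumptions made for illustration, and the `difflib` fallback is only a stand-in for environments without fuzzywuzzy, so its scores may differ slightly.

```python
import pandas as pd

try:
    from fuzzywuzzy import fuzz
    name_ratio = fuzz.ratio
except ImportError:
    # Stand-in (an assumption): a 0-100 similarity score built on the standard library.
    from difflib import SequenceMatcher

    def name_ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# Hypothetical sample data; "Globex" in the US is a decoy with the wrong country.
df1 = pd.DataFrame({
    "Account Name": ["Acme Corp", "Globex"],
    "Billing Country": ["US", "DE"],
})
df2 = pd.DataFrame({
    "Account Name": ["Acme Corporation", "Globex", "Globex GmbH"],
    "Billing Country": ["US", "US", "DE"],
})

# Cartesian product, then keep only the pairs whose countries are identical.
merged = df1.assign(MergeKey=1).merge(
    df2.assign(MergeKey=1), on="MergeKey", suffixes=("_1", "_2")
)
same_country = merged[merged["Billing Country_1"] == merged["Billing Country_2"]].copy()

# Score the account names pair by pair.
same_country["NameRatio"] = [
    name_ratio(a, b)
    for a, b in zip(same_country["Account Name_1"], same_country["Account Name_2"])
]

# For each df1 account, keep the candidate with the highest name score.
best = same_country.loc[same_country.groupby("Account Name_1")["NameRatio"].idxmax()]
print(best[["Account Name_1", "Account Name_2", "Billing Country_1", "NameRatio"]])
```

Here the exact-name "Globex" row in the US is never considered for the German "Globex" account, because the country filter removes it before any name scoring happens.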

score 0 · Accepted Answer

I figured it out in a slightly different way.

First I merged using

merged_file = pd.merge(df2, df1, on='Billing Country', how = 'left')

Then, once I had all the possible matches, I applied fuzzywuzzy:

`Reference_data= df2['Account Name']`

`Result = process.extractOne(df1, choices)`

The lines above gave me the closest match for each value I wanted to look up. Later, I added one more line to compute the ratio.

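# fuzz.ratio compares two strings, so apply it row by row rather than to whole columns
Result['ratio'] = Result.apply(lambda row: fuzz.ratio(row['Account Name_x'], row['Account Name_y']), axis=1)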
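
As a rough illustration of where the `Account Name_x` and `Account Name_y` columns come from, the sketch below runs the same left merge on made-up rows. The data and the `scorable` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df1 = pd.DataFrame({
    "Account Name": ["Acme Corp", "Acme Inc", "Globex"],
    "Billing Country": ["US", "US", "DE"],
})
df2 = pd.DataFrame({
    "Account Name": ["Acme Corporation", "Globex GmbH", "Initech"],
    "Billing Country": ["US", "DE", "FR"],
})

# Same call as in the answer: df2 is the left side, so its names become Account Name_x.
merged_file = pd.merge(df2, df1, on="Billing Country", how="left")

# A left merge keeps df2 rows with no country match and fills Account Name_y with NaN.
scorable = merged_file.dropna(subset=["Account Name_y"])

print(merged_file)
print(len(scorable))  # rows that have a candidate name to compare against
```

Rows with a missing `Account Name_y` have nothing to compare against, so they would likely need to be dropped or skipped before any string scoring. Using `how='inner'` instead would leave them out from the start.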
