
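I have two data frames.

The first is structured as follows:

id,value

The second:

id, neighbor_1, neighbor_2, neighbor_3, neighbor_4, neighbor_5, ...

Now I want to find the neighborhood in which the sum of value is maximal. That is, for each neighbor column, look up the neighbor's value in the first data frame, sum the values over all neighbors of an id, and find the id whose total sum over its neighbors is largest.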

import pandas as pd
from h3 import h3
k=1

df = pd.DataFrame({'x': {0: 16,
  1: 17,
  2: 18,
  3: 19,
  4: 20},
 'y': {0: 48,
  1: 49,
  2: 50,
  3: 51,
  4: 52},
 'value': {0: 2.0, 1: 4.0, 2: 100.0, 3: 40.0, 4: 500.0},
 'id': {0: '891e15b706bffff',
  1: '891e15b738fffff',
  2: '891e15b714fffff',
  3: '891e15b44c3ffff',
  4: '891e15b448bffff'}})

display(df)

df_neighbors = df[['id']]
df_neighbors.index = df_neighbors['id']
df_neighbors = df_neighbors['id'].apply(lambda x: pd.Series(list(h3.k_ring(x,k))))
display(df_neighbors)

What is an efficient way to compute this kind of problem (iterated join and aggregation) in pandas?

A naive solution:

import pandas as pd
from h3 import h3
import numpy as np
k=2

df = pd.DataFrame({'x': {0: 16,
  1: 17,
  2: 18,
  3: 19,
  4: 20},
 'y': {0: 48,
  1: 49,
  2: 50,
  3: 51,
  4: 52},
 'value': {0: 2.0, 1: 4.0, 2: 100.0, 3: 40.0, 4: 500.0},
 'id': {0: '891e15b706bffff',
  1: '891e15b738fffff',
  2: '891e15b714fffff',
  3: '891e15b44c3ffff',
  4: '891e15b448bffff'}})

display(df)

df_neighbors = df[['id']]
df_neighbors.index = df_neighbors['id']
df_neighbors = df_neighbors['id'].apply(lambda x: pd.Series(list(h3.k_ring(x,k))))
display(df_neighbors)

joined = df.merge(df_neighbors.reset_index(), left_on='id', right_on='id', how='left')#.drop(['id_neighbors'], axis=1)
# display(joined)

for c in joined[df_neighbors.columns].columns:
    joined[f'sum_of_{c}'] = joined.groupby([c]).value.transform(pd.Series.sum)

xx = [f'sum_of_{c}' for c in joined[df_neighbors.columns].columns]
joined['total_value_sum'] = joined[xx].sum(axis=1)
display(joined)

maximal_neighborhood = joined[df_neighbors.columns].iloc[joined.total_value_sum.argmax()]
display(maximal_neighborhood)

max_neighborhood_raw_elements = df[df['id'].isin(maximal_neighborhood)]
display(max_neighborhood_raw_elements)

avg_y_lat = np.average(max_neighborhood_raw_elements.y, weights=max_neighborhood_raw_elements.value)
avg_x_long = np.average(max_neighborhood_raw_elements.x, weights=max_neighborhood_raw_elements.value)

print(f'(x,y): ({avg_x_long},{avg_y_lat})')
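
For comparison, here is a minimal sketch of one way the per-column loop could be avoided: reshape the neighbor table to long format, join the values once, and aggregate per id. The ids and the neighbor table below are hand-written toy data (assumptions for illustration, not real H3 indexes and not the output of h3.k_ring), and `neighbor_id` is a column name introduced only for this example.

```python
import pandas as pd

# Values per cell id (hypothetical toy data).
df = pd.DataFrame({"id": ["a", "b", "c", "d"],
                   "value": [2.0, 4.0, 100.0, 40.0]})

# Wide neighbor table: one row per id, one column per neighbor slot.
# Written by hand here; "x" and "y" are neighbors that have no value row.
df_neighbors = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "neighbor_1": ["a", "b", "c", "d"],
    "neighbor_2": ["b", "a", "b", "c"],
    "neighbor_3": ["x", "c", "d", "y"],
})

# 1. Reshape wide -> long: one (id, neighbor_id) pair per row.
pairs = df_neighbors.melt(id_vars="id", value_name="neighbor_id")
pairs = pairs[["id", "neighbor_id"]].dropna().drop_duplicates()

# 2. One join to attach each neighbor's value (NaN if the neighbor is unknown).
values = df.rename(columns={"id": "neighbor_id"})
pairs = pairs.merge(values, on="neighbor_id", how="left")

# 3. Sum the values per neighborhood; sum() skips NaN by default.
totals = pairs.groupby("id")["value"].sum()
best_id = totals.idxmax()

print(totals)
print("id with maximal neighborhood sum:", best_id)
```

With this layout the number of joins no longer grows with the number of neighbor columns, at the cost of a longer intermediate table (one row per id and neighbor pair).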