numpy - Vectorized way of syncing two arrays

Question

I have two Pandas TimeSeries, x and y, which I would like to sync "as of". For every element in x, I would like to find the latest (by index) element in y that precedes it (by index value). For example, I would like to compute this new_x:

x       new_x
----    -----
13:01   13:00  
14:02   14:00

y
----
13:00
13:01
13:30
14:00

I am looking for a vectorized solution, not a Python loop. The time values are based on Numpy datetime64. The length of the y array is on the order of millions, so O(n^2) solutions are probably not practical.

score 2 · Accepted Answer

In some circles this operation is known as the "asof" join. Here is an implementation:

def diffCols(df1, df2):
    """ Find columns in df1 not present in df2
    Return df1.columns  - df2.columns maintaining the order which the resulting
    columns appears in df1.

    Parameters:
    ----------
    df1 : pandas dataframe object
    df2 : pandas dataframe object
    Pandas already offers df1.columns - df2.columns, but unfortunately
    the original order of the resulting columns is not maintained.
    """
    return [i for i in df1.columns if i not in df2.columns]


def aj(df1, df2, overwriteColumns=True, inplace=False):
    """ KDB+ like asof join.
    Finds prevailing values of df2 asof df1's index. The resulting dataframe
    will have same number of rows as df1.

    Parameters
    ----------
    df1 : Pandas dataframe
    df2 : Pandas dataframe
    overwriteColumns : boolean, default True
         The columns of df2 will overwrite the columns of df1 if they have the same
         name unless overwriteColumns is set to False. In that case, this function
         will only join columns of df2 which are not present in df1.
    inplace : boolean, default False.
        If True, adds columns of df2 to df1. Otherwise, create a new dataframe with
        columns of both df1 and df2.

    *Assumes both df1 and df2 have datetime64 index. """
    joiner = lambda x : x.asof(df1.index)
    if not overwriteColumns:
        # Get columns of df2 not present in df1
        cols = diffCols(df2, df1)
        if len(cols) > 0:
            # .ix has been removed from pandas; .loc does the same label-based selection
            df2 = df2.loc[:, cols]
    result = df2.apply(joiner)
    if inplace:
        for i in result.columns:
            df1[i] = result[i]
        return df1
    else:
        return result

Internally, this uses pandas.Series.asof().
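For reference, here is a minimal sketch of calling the asof machinery directly on data shaped like the question's example. The dates and the float values are made up for illustration. Note that Series.asof() treats an exact index match as a hit, so 13:01 maps to the 13:01 row rather than 13:00. If you need strictly earlier rows, as in the question's table, one option (assuming a pandas version that provides it) is pandas.merge_asof() with allow_exact_matches=False.

```python
import pandas as pd

# Hypothetical timestamps mirroring the question's example (the date is arbitrary).
y_index = pd.to_datetime([
    "2013-01-01 13:00", "2013-01-01 13:01",
    "2013-01-01 13:30", "2013-01-01 14:00",
])
x_index = pd.to_datetime(["2013-01-01 13:01", "2013-01-01 14:02"])

# Made-up payload values so we can see which row of y was picked.
y = pd.Series([10.0, 11.0, 12.0, 13.0], index=y_index)

# Inclusive lookup: last row of y whose index is <= each element of x_index.
inclusive = y.asof(x_index)

# Strict lookup: last row of y whose timestamp is < each element of x.
# merge_asof needs both frames sorted on their key columns.
left = pd.DataFrame({"x": x_index})
right = pd.DataFrame({"y_time": y_index})
strict = pd.merge_asof(
    left, right,
    left_on="x", right_on="y_time",
    direction="backward",
    allow_exact_matches=False,
)
```

Here inclusive picks the 13:01 and 14:00 rows, while strict["y_time"] picks 13:00 and 14:00, which is the new_x column from the question.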

score 1 · Accepted Answer

How about using Series.searchsorted() to return the index of y where you would insert x? You can then subtract one from that value and use it to index y.

In [1]: x
Out[1]:
0    1301
1    1402

In [2]: y
Out[2]:
0    1300
1    1301
2    1330
3    1400

In [3]: y[y.searchsorted(x)-1]
Out[3]:
0    1300
3    1400

Note: the above example uses int64 Series.
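The same idea should carry over to datetime64 data. Below is a rough sketch with plain NumPy arrays. The sample timestamps, including the extra 12:00 element in x, are made up to show one edge case: when an element of x is earlier than everything in y, subtracting one gives -1, which would silently wrap around to the last element of y. The sketch masks those positions out and leaves NaT instead. It assumes y is already sorted.

```python
import numpy as np

# Hypothetical sample data; y must be sorted ascending for searchsorted to work.
y = np.array(
    ["2013-01-01T13:00", "2013-01-01T13:01",
     "2013-01-01T13:30", "2013-01-01T14:00"],
    dtype="datetime64[m]",
)
x = np.array(
    ["2013-01-01T12:00", "2013-01-01T13:01", "2013-01-01T14:02"],
    dtype="datetime64[m]",
)

# side="left" gives the insertion point before any equal element, so
# subtracting one lands on the last element of y strictly less than x.
pos = np.searchsorted(y, x, side="left") - 1

# pos == -1 means no element of y precedes that x; do not index with it.
has_match = pos >= 0
new_x = np.full(x.shape, np.datetime64("NaT"), dtype=y.dtype)
new_x[has_match] = y[pos[has_match]]
```

Each lookup is a binary search, so the whole thing is O(m log n) for m elements in x and n in y, which should be fine for a y with millions of entries.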
