r - Applying a function to subsets of data in a data frame using a moving window in R

Question

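I would like to write a script that uses a moving window to apply a function to subsets of spatial points across a whole data frame.

Given a data matrix with a column of latitude positions and a column of longitude positions, I would like to obtain a measure of sinuosity for every five consecutive locations across the whole data set (that is, apply the function to every set of five locations, from start to finish). Sinuosity is the ratio of the actual distance travelled along a series of points to the straight-line distance between the start point and the end point.

Sample data: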

df <- structure(list(IndexNo = 1:13, Latitude = c(52.363205, 52.640715, 
52.940366, 53.267749, 53.512608, 53.53215, 53.536443, 53.553523, 
53.546862, 53.55095, 53.571766, 53.587558, 53.592084), Longitude = c(3.433247, 
3.305727, 3.103194, 2.973257, 2.966621, 3.013587, 3.002674, 3.004011, 
2.98778, 2.995589, 3.004867, 3.003511, 2.999092)), .Names = c("IndexNo", "Latitude", "Longitude"), class = "data.frame", row.names=c(NA,-13L))

Desired output:

IndexNo       Latitude  Longitude   Sinuosity
1             52.36321  3.433247    NA
2             52.64072  3.305727    1.0085
3             52.94037  3.103194    1.0085
4             53.26775  2.973257    1.0085
5             53.51261  2.966621    1.0085
6             53.53215  3.013587    1.9392
7             53.53644  3.002674    1.9392
8             53.55352  3.004011    1.9392
9             53.54686  2.987780    1.9392
10            53.55095  2.995589    1.0669
11            53.57177  3.004867    1.0669
12            53.58756  3.003511    1.0669
13            53.59208  2.999092    1.0669

My first attempt (with code to calculate the sinuosity of a single section of five locations):

library(trip)

# The sample data above is called `df`; the rest of this code refers to it as `bird`
bird <- df

# To create a subset of the first 5 locations in the data frame
subset <- bird[1:5, c("Latitude", "Longitude", "IndexNo")]

# To calculate the straight-line distance between the beginning and end point of a 5-point sequence
straightd<- trackDistance(subset[1,2], subset[1,1], subset[5,2], subset[5,1], longlat=TRUE)

# To calculate the distance between each pair of consecutive points (for a 5-point sequence)
d1<- trackDistance(subset[1,2], subset[1,1], subset[2,2], subset[2,1], longlat=TRUE)
d2<- trackDistance(subset[2,2], subset[2,1], subset[3,2], subset[3,1], longlat=TRUE)
d3<- trackDistance(subset[3,2], subset[3,1], subset[4,2], subset[4,1], longlat=TRUE)
d4<- trackDistance(subset[4,2], subset[4,1], subset[5,2], subset[5,1], longlat=TRUE)
# To return the actual distance between the beginning and end point of a 5-point sequence
actd<- sum(d1,d2,d3,d4)

# Function to calculate the sinuosity (ratio between the actual distance and the straight-line distance)
sinuosity <- function (x, y) {
  x/y
}
new <- sinuosity(actd, straightd)

# To add a sinuosity column to the 5 rows of locations on which the sinuosity index was measured
subset$Sinuosity <- rep(new, nrow(subset))

score 2 · Accepted Answer

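You could set up a loop along the following lines:

for (i in seq(1, nrow(bird) - 4, by = 4))
{
# Rows i to i+4 become rows 1 to 5 of the subset; columns are Latitude (1) and Longitude (2)
subset <- bird[i:(i+4), c("Latitude", "Longitude", "IndexNo")]
straightd <- trackDistance(subset[1,2], subset[1,1], subset[5,2], subset[5,1], longlat=TRUE)
# etc.
}

Compare this with the code you posted and you will see what is going on. It is only a guide, so that you can extrapolate the logic to the rest of your function.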
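Filling in the `# etc.` part, the whole loop might look like the sketch below. To keep it runnable without the trip package it uses `haversine_km()`, a hypothetical base-R great-circle helper standing in for `trackDistance()` (so the values may differ slightly from trip's), and it rebuilds the question's sample data under the name `bird`. Treat it as one possible way to finish the loop, not the only one.

```r
# Sample data from the question, stored under the name used in the loop.
bird <- data.frame(
  IndexNo = 1:13,
  Latitude = c(52.363205, 52.640715, 52.940366, 53.267749, 53.512608,
               53.53215, 53.536443, 53.553523, 53.546862, 53.55095,
               53.571766, 53.587558, 53.592084),
  Longitude = c(3.433247, 3.305727, 3.103194, 2.973257, 2.966621,
                3.013587, 3.002674, 3.004011, 2.98778, 2.995589,
                3.004867, 3.003511, 2.999092)
)

# Hypothetical stand-in for trip::trackDistance(): haversine distance in km
# on a sphere of radius 6371 km (an assumption, not trip's implementation).
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

bird$Sinuosity <- NA_real_

# Windows start at rows 1, 5, 9, ... and share their end point with the next window.
for (i in seq(1, nrow(bird) - 4, by = 4)) {
  win <- bird[i:(i + 4), ]
  # Straight-line distance between the first and last point of the window
  straightd <- haversine_km(win$Longitude[1], win$Latitude[1],
                            win$Longitude[5], win$Latitude[5])
  # Path length: sum of the four consecutive step distances
  actd <- sum(haversine_km(win$Longitude[-5], win$Latitude[-5],
                           win$Longitude[-1], win$Latitude[-1]))
  # Rows i+1 to i+4 receive the value, so row 1 stays NA as in the desired output
  bird$Sinuosity[(i + 1):(i + 4)] <- actd / straightd
}
```

The sketch assumes at least five rows; rows left over after the last complete window keep their initial NA.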

score 1 · Accepted Answer

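As you can see, there are many ways to do this. You could use a series of loops, as @Codoremifa showed, or a convenient add-on package such as data.table, as @RInatM described. I worked up an example that uses the sapply function to loop through the data.

First, based on your code, I calculated the distance between each consecutive pair of points across the whole data set. I used with so that I didn't need dollar-sign notation or the extract function [. Note that the output vector pairdist is one element shorter than the number of rows in the data set.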

pairdist = sapply(2:nrow(bird), function(x) with(bird,
    trackDistance(Longitude[x-1], Latitude[x-1],
                  Longitude[x], Latitude[x], longlat=TRUE)))

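Next, I follow a similar procedure to sum each group of four pairwise distances and get a measure of total distance. You can see that this gives only three values for the sample data set.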

totdist= sapply(seq(1,length(pairdist)-3, by = 4), function(x) sum(pairdist[x:(x+3)]))

Then I calculate the straight-line distance between the 1st and 5th points, the 5th and 9th points, and so on.

straight = sapply(seq(1, nrow(bird)-4, by = 4), function(x) with(bird,
    trackDistance(Longitude[x], Latitude[x],
                  Longitude[x+4], Latitude[x+4], longlat=TRUE)))

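Finally, I calculate the ratio and add it to the original data set, using NA for the first point and the same value for each set of four points after that. To generalize this to data sets of different lengths, I pad the end with NA where needed. The code for that might look confusing, but it is just arithmetic to work out how much padding is needed, given how the points are grouped.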

bird$Sinuosity = c(NA, rep(totdist/straight, each = 4), 
                rep(NA, length(pairdist)-4*floor(length(pairdist)/4)))
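
To see that the padding arithmetic always yields exactly one value per row, here is a small check. `sinuosity_layout()` is a hypothetical helper written only for this illustration; it counts entries and does not compute any distances.

```r
# Hypothetical helper: for a data set with n_rows rows, count how the
# Sinuosity column is filled when the groups are built as above.
sinuosity_layout <- function(n_rows) {
  n_pairs  <- n_rows - 1               # length(pairdist)
  n_groups <- floor(n_pairs / 4)       # complete groups of four step distances
  n_pad    <- n_pairs - 4 * n_groups   # trailing rows without a complete group
  c(leading_na = 1, filled = 4 * n_groups, trailing_na = n_pad)
}

sinuosity_layout(13)  # sample data: 1 leading NA, 12 filled rows, no padding
sinuosity_layout(15)  # the two leftover rows are padded with NA
```

The three counts always sum to the number of rows, which is why the assignment to bird$Sinuosity has the right length.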

score 1 · Accepted Answer

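You chose a fitting title and have an interesting problem, but you loaded the question with too much detail (try to keep a question useful to other people). As I understand it, you need to:

perform a pairwise operation between table rows (in your case, distance)
collapse the results of that operation using some condition (adjacent points)
repeat that for many elements (for each point)

I enjoy the data.table package a lot, so here is my (somewhat general and not optimal) solution.

0) Merge the data table with itself, so that the distance can be computed for each pair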

library(data.table)
dt <- as.data.table(df)
setkey(dt[, k := 1], k)
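# suffixes are set explicitly so the duplicated columns are named IndexNo.1, Latitude.1, ... as used below
dt2 <- merge(dt, dt, by = "k", allow.cartesian = TRUE, suffixes = c("", ".1"))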

k is an artificial index used to get a full cross join (overkill in this case, but simple).

1) Compute the distances

dt2[IndexNo != IndexNo.1
   , dist := trackDistance(Longitude, Latitude, Longitude.1, Latitude.1
   , longlat = T) ]

2) Apply the condition (sum the distances between adjacent points)

sinuosity <- function(start, end) {
  long.dist <- dt2[IndexNo %in% c(start:end) & IndexNo.1 %in% c(start:end) 
                                             & IndexNo == IndexNo.1 - 1
                  , sum(dist, na.rm = T) ]
  short.dist <- dt2[IndexNo == start & IndexNo.1 == end, dist]
  res <- long.dist/short.dist
  return(res)
}

3) Repeat for each point

dt2[IndexNo > IndexNo.1 - 5 & IndexNo <= IndexNo.1
    ,  list(Latitude, Longitude, sinuosity(IndexNo, IndexNo + 4))
    , by = c("IndexNo", "IndexNo.1")]

which gives what (I guess) you wanted:

    IndexNo IndexNo.1 Latitude Longitude       V3
 1:       1         1 52.36321  3.433247 1.008512
 2:       1         2 52.36321  3.433247 1.008512
 3:       1         3 52.36321  3.433247 1.008512
 4:       1         4 52.36321  3.433247 1.008512
 5:       1         5 52.36321  3.433247 1.008512
 6:       2         2 52.64072  3.305727 1.033964
 7:       2         3 52.64072  3.305727 1.033964
 8:       2         4 52.64072  3.305727 1.033964
 ......

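I recommend spending some time getting familiar with data.table; it will save you a lot of time later. Also, in your particular case, if you have a large table (> 1000 rows), you should avoid the full cross join and instead merge dt with itself on IndexNo == IndexNo - 1 (that is, match each row only with its neighbour).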
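
A rough sketch of that neighbour-only merge, assuming the sample data from the question: the table name `nxt` and the `.1` column names are chosen here for illustration, and the distance call is left as a comment so that the block runs without trip.

```r
library(data.table)

# Sample data from the question as a data.table.
dt <- data.table(
  IndexNo = 1:13,
  Latitude = c(52.363205, 52.640715, 52.940366, 53.267749, 53.512608,
               53.53215, 53.536443, 53.553523, 53.546862, 53.55095,
               53.571766, 53.587558, 53.592084),
  Longitude = c(3.433247, 3.305727, 3.103194, 2.973257, 2.966621,
                3.013587, 3.002674, 3.004011, 2.98778, 2.995589,
                3.004867, 3.003511, 2.999092)
)

# Copy of the table shifted by one index: the row for point i + 1 is
# labelled i, so a plain equi-join pairs every point with its successor.
nxt <- dt[, .(IndexNo = IndexNo - 1L,
              Latitude.1 = Latitude,
              Longitude.1 = Longitude)]

adj <- merge(dt, nxt, by = "IndexNo")  # n - 1 rows instead of n^2

# With trip loaded, the step distances could then be added like this:
# adj[, dist := trackDistance(Longitude, Latitude, Longitude.1, Latitude.1,
#                             longlat = TRUE)]
```

The start-to-end distance of each window would still need a second merge of the same kind, with an offset of 4 instead of 1.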
