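I have a large data set in which each station always has the same latitude and longitude. In some rows the latitude and longitude are missing and show "unknown" instead. I need to fill in those unknowns with the latitude and longitude from other rows of the same station where the data is not missing.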

In this example, I want to insert 3 and 8 as the latitude and longitude in row 5.

> station <- c("a","b","c","c","c")
> lat <- c("1","2","3","3","unknown")
> lon <- c("6","7","8","8","unknown")
> df <- data.frame(station,lat,lon)
> df
  station     lat     lon
1       a       1       6
2       b       2       7
3       c       3       8
4       c       3       8
5       c unknown unknown

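My data set has 1 million rows, but this only needs to run once before the analysis starts, so it is fine if it takes a few minutes to complete. I would rather not install another package unless it is really necessary.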

3 Answers

Something like this, perhaps:

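# Assumes df is the data frame from the question; work on character columns
df$station <- as.character(df$station)
df$lat <- as.character(df$lat)
df$lon <- as.character(df$lon)

# Stations that have at least one "unknown" latitude
unknownstations <- unique(df$station[df$lat == "unknown"])
# Known coordinates recorded for those stations
unknownstationscoords <- unique(df[df$station %in% unknownstations & df$lat != "unknown", ])

for (i in unknownstations) {
  known <- unknownstationscoords[unknownstationscoords$station == i, ]
  if (nrow(known) == 0) next  # nothing to copy from for this station
  # Fill only the missing rows, using the first known pair
  missing <- df$station == i & df$lat == "unknown"
  df[missing, "lat"] <- known$lat[1]
  df[missing, "lon"] <- known$lon[1]
}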
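The loop above can also be written as a vectorized lookup. This is only a minimal sketch in base R, assuming character columns and assuming each station has at most one distinct known coordinate pair; the names known, idx and fill are introduced here for illustration.

```r
# Example data from the question, kept as character columns
df <- data.frame(
  station = c("a", "b", "c", "c", "c"),
  lat = c("1", "2", "3", "3", "unknown"),
  lon = c("6", "7", "8", "8", "unknown"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per station, built from rows with known coordinates
known <- df[df$lat != "unknown" & df$lon != "unknown", ]
known <- known[!duplicated(known$station), ]

# For every row, position of its station in the lookup table (NA if absent)
idx <- match(df$station, known$station)

# Overwrite only rows that are missing a value and have a lookup entry
fill <- (df$lat == "unknown" | df$lon == "unknown") & !is.na(idx)
df$lat[fill] <- known$lat[idx[fill]]
df$lon[fill] <- known$lon[idx[fill]]

df
```

Stations that never appear with known coordinates are left as "unknown". Since match works on whole vectors, this should avoid one pass over the data per station.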
answered 2013-11-03T07:37:27.020

I would use na.locf from the zoo package. First change "unknown" to NA, then apply na.locf.

> library(zoo)
> df[ df=="unknown"] <- NA
> df2 <- do.call(rbind, lapply(split(df, df$station), na.locf))
> df2[, -1] <- lapply(df2[, -1], function(x) as.numeric(as.character(x)))  # go via character so factor columns do not become level codes
> df2
    station lat lon
a         a   1   6
b         b   2   7
c.3       c   3   8
c.4       c   3   8
c.5       c   3   8

If you want to change the row names, assign new ones with rownames.

> rownames(df2) <- 1:nrow(df2)
> df2
  station lat lon
1       a   1   6
2       b   2   7
3       c   3   8
4       c   3   8
5       c   3   8
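Since the question prefers not to install a package, a similar per-station fill can be approximated in base R with ave. This is a rough sketch, not a replacement for na.locf: fill_first is a hypothetical helper introduced here, and unlike last-observation-carried-forward it copies the first known value of the station to every missing row regardless of row order.

```r
# Example data from the question, kept as character columns
df <- data.frame(
  station = c("a", "b", "c", "c", "c"),
  lat = c("1", "2", "3", "3", "unknown"),
  lon = c("6", "7", "8", "8", "unknown"),
  stringsAsFactors = FALSE
)
df[df == "unknown"] <- NA

# Hypothetical helper: replace NAs with the first non-NA value in the group
fill_first <- function(x) {
  v <- x[!is.na(x)]
  if (length(v) > 0) x[is.na(x)] <- v[1]
  x
}

# Apply the helper within each station; columns become numeric
df$lat <- ave(as.numeric(df$lat), df$station, FUN = fill_first)
df$lon <- ave(as.numeric(df$lon), df$station, FUN = fill_first)

df
```

A station with no known value at all keeps its NAs, which makes the remaining gaps easy to find with is.na.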
answered 2013-11-03T10:27:41.880
# Fill "unknown" lat/lon values from another row of the same station.
# Returns a character matrix with columns station, lat, lon.
y <- function(station, lat, lon) {
  temp <- cbind(station, lat, lon)
  for (col in c("lat", "lon")) {
    known <- temp[, col] != "unknown"
    # For each row, position of the first known row of the same station
    src <- match(station, station[known])
    # Rows that are missing this column and have a known row to copy from
    fill <- !known & !is.na(src)
    temp[fill, col] <- temp[known, col][src[fill]]
  }
  temp
}




##case1

station <- c("a","b","c","c","c")
lat <- c("1","2","3","3","unknown")
lon <- c("6","7","8","8","unknown")

y(station,lat,lon)
# station lat lon
# [1,] "a"     "1" "6"
# [2,] "b"     "2" "7"
# [3,] "c"     "3" "8"
# [4,] "c"     "3" "8"
# [5,] "c"     "3" "8"


##case2

station <- c("a","b","c","c","c")
lat <- c("1","2","3","3","3")
lon <- c("6","7","8","8","unknown")
y(station,lat,lon)
# station lat lon
# [1,] "a"     "1" "6"
# [2,] "b"     "2" "7"
# [3,] "c"     "3" "8"
# [4,] "c"     "3" "8"
# [5,] "c"     "3" "8"


##case3

station <- c("a","b","c","c","c")
lat <- c("1","2","3","3","unknown")
lon <- c("6","7","8","8","8")
y(station,lat,lon)
# station lat lon
# [1,] "a"     "1" "6"
# [2,] "b"     "2" "7"
# [3,] "c"     "3" "8"
# [4,] "c"     "3" "8"
# [5,] "c"     "3" "8"
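Filling like this relies on the question's premise that a station always has the same coordinates. Before filling, it may be worth checking that premise on the real data. A rough sketch in base R, where known, pairs, n_pairs and conflicting are illustrative names:

```r
station <- c("a", "b", "c", "c", "c")
lat <- c("1", "2", "3", "3", "unknown")
lon <- c("6", "7", "8", "8", "unknown")

# Rows where both coordinates are present
known <- lat != "unknown" & lon != "unknown"

# Distinct known (station, lat, lon) combinations
pairs <- unique(data.frame(
  station = station[known],
  lat = lat[known],
  lon = lon[known],
  stringsAsFactors = FALSE
))

# Number of distinct coordinate pairs per station
n_pairs <- table(pairs$station)

# Stations with more than one pair would make the fill ambiguous
conflicting <- names(n_pairs)[n_pairs > 1]
conflicting
```

If conflicting is not empty, those stations need a decision about which coordinates to trust before any automatic fill.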
answered 2013-11-03T07:45:57.677