r - Extracting values from an R table within grouped values

Question

The following table is grouped by first, second, and Name.

    myData <- structure(list(first = c(120L, 120L, 126L, 126L, 126L, 132L, 132L), second = c(1.33, 1.33, 0.36, 0.37, 0.34, 0.46, 0.53), 
    Name = structure(c(5L, 5L, 3L, 3L, 4L, 1L, 2L), .Label = c("Benzene", 
    "Ethene._trichloro-", "Heptene", "Methylamine", "Pentanone"
    ), class = "factor"), Area = c(699468L, 153744L, 32913L, 
    4948619L, 83528L, 536339L, 105598L), Sample = structure(c(3L, 
    2L, 3L, 3L, 3L, 1L, 1L), .Label = c("PO1:1", "PO2:1", "PO4:1"
    ), class = "factor")), .Names = c("first", "second", "Name", 
    "Area", "Sample"), class = "data.frame", row.names = c(NA, -7L))

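Within each group, I want to extract the Area that corresponds to a particular sample. Some groups have no Area for a given sample, so NA should be returned when the sample was not detected. Ideally, the final output would have one column per sample.

I tried the ifelse function to create one column per sample: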

PO1<-ifelse(myData$Sample=="PO1:1",myData$Area, "NA")

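However, this does not take the grouping into account. I want to do the same thing, but within each group, where a group is the set of rows with equal values in the first, second, and Name columns: return the Area where Sample is PO1:1, and NA otherwise.

For the first group: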

structure(list(first = c(120L, 120L), second = c(1.33, 1.33), 
Name = structure(c(1L, 1L), .Label = "Pentanone", class = "factor"), 
Area = c(699468L, 153744L), Sample = structure(c(2L, 1L), .Label = c("PO2:1", 
"PO4:1"), class = "factor")), .Names = c("first", "second", "Name", 
"Area", "Sample"), class = "data.frame", row.names = c(NA, -2L))

The output should look like this:

structure(list(PO1.1 = NA, PO2.1 = 153744L, PO3.1 = NA, PO4.1 = 699468L), .Names =c("PO1.1", "PO2.1", "PO3.1", "PO4.1"), class = "data.frame", row.names = c(NA, -1L))

Any suggestions?

score 1 · Accepted Answer

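I am assuming that Sample is a factor, as in the example in the question. If it is not, consider converting it to one.

First, clean up the `Sample` column so that its levels are syntactically valid names; otherwise you may run into errors.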

levels(myData$Sample)  <-  make.names(levels(myData$Sample))
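As a small illustration of what that line does, the sketch below applies `make.names` to the three sample labels from the question. The vector name `sample_levels` is only for illustration and is not part of the original answer.

```r
# Sample labels as they appear in the question's data
sample_levels <- c("PO1:1", "PO2:1", "PO4:1")

# make.names() replaces characters that are invalid in R names (here ":") with "."
clean_levels <- make.names(sample_levels)
print(clean_levels)
# "PO1.1" "PO2.1" "PO4.1"
```

These cleaned labels are what later appear as the column names of the result.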


## DEFINE THE CUTS##

# Adjust these as necessary
#--------------------------
  max.second <- 3  #  max & min range of myData$second 
  min.second <- 0  #
  sprd <- 0.15     # with spread for each group
#--------------------------

# we will cut the myData$second according to intervals,   cut(myData$second, intervals)
intervals <- seq(min.second, max.second, sprd*2)

# Next, let's create a group column to split our data frame by 
myData$group <- paste(myData$first, cut(myData$second, intervals), myData$Name, sep='-') 
groups <- split(myData, myData$group)

samples <- levels(myData$Sample)   ## I'm assuming not all samples are present in the example.  Manually adjusting with: samples <- sort(c(samples,  "PO3.1"))


# Apply over each group, then apply over each sample    
myOutput <- 
  t(sapply(groups, function(g) {

      #-------------------------------
      # NOTE: If it's possible that within a group there is more than one Area per Sample, then we have to somehow allow for this. Hence the "paste(...)"
      res <- sapply(samples, function(s) paste0(g$Area[g$Sample==s], collapse=" - "))  # allowing for multiple values
      unlist(ifelse(res=="", NA, res))

      ## If there is (or should be) only one Area per Sample, then remove the two lines above and uncomment the two below:
      # res <- sapply(samples, function(s) g$Area[g$Sample==s])  # <~~ This line will work when only one value per sample
      # unlist(ifelse(lengths(res)==0, NA, res))  # samples absent from the group give zero-length results
      #-------------------------------

  }))

# Cleanup names
rownames(myOutput) <- paste("Group", 1:nrow(myOutput), sep="-")  ## or whichever proper group name

# remove dummy column 
myData$group <- NULL

Result

myOutput

        PO1.1    PO2.1    PO3.1 PO4.1            
Group-1 NA       "153744" NA    "699468"         
Group-2 NA       NA       NA    "32913 - 4948619"
Group-3 NA       NA       NA    "83528"          
Group-4 "536339" NA       NA    NA               
Group-5 "105598" NA       NA    NA
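To see how the `cut()` step assigns the `second` values to bins, here is a minimal sketch using the values from the question and the same break points as above. The names `second` and `bins` are introduced only for this illustration.

```r
# Values of `second` from the question's data
second <- c(1.33, 1.33, 0.36, 0.37, 0.34, 0.46, 0.53)

# Same break points as in the answer: 0, 0.3, 0.6, ..., 3
sprd <- 0.15
intervals <- seq(0, 3, sprd * 2)

# Each value is labelled with the interval it falls into
bins <- cut(second, intervals)
print(data.frame(second, bin = bins))
```

With these settings every value between 0.3 and 0.6 lands in the same bin, so those rows are separated into groups only by `first` and `Name`. Note that two values sitting just either side of a break point would be placed in different bins even if they are very close, so the break points may need adjusting for other data.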

score 1 · Accepted Answer

You cannot expect R to intuit that there is a fourth factor level between PO2 and PO4.

> reshape(inp, direction="wide", idvar=c('first','second','Name'), timevar="Sample")
  first second               Name Area.PO4:1 Area.PO2:1 Area.PO1:1
1   120    1.3          Pentanone     699468     153744         NA
3   126    0.4            Heptene      32913         NA         NA
4   126    0.4            Heptene    4948619         NA         NA
5   126    0.3        Methylamine      83528         NA         NA
6   132    0.5            Benzene         NA         NA     536339
7   132    0.5 Ethene._trichloro-         NA         NA     105598
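If a column for the undetected sample is wanted, one option is to declare the level explicitly. The sketch below is not the `reshape` call above; it uses `tapply` on a hypothetical three-row miniature of the question's data (`d`, `all_samples`, and `wide` are illustrative names) to show that a declared but unused level still gets its own column.

```r
# Hypothetical miniature of the question's data
d <- data.frame(
  Name   = c("Pentanone", "Pentanone", "Benzene"),
  Area   = c(699468L, 153744L, 536339L),
  Sample = c("PO4:1", "PO2:1", "PO1:1")
)

# Declare every expected sample, including the undetected "PO3:1"
all_samples <- c("PO1:1", "PO2:1", "PO3:1", "PO4:1")
d$Sample <- factor(d$Sample, levels = all_samples)

# tapply() keeps unused factor levels, so PO3:1 shows up as an all-NA column
wide <- tapply(d$Area, list(d$Name, d$Sample), FUN = sum)
print(wide)
```

Here `sum` is an arbitrary choice of aggregate for cells that hold more than one Area; another function may suit the real data better.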
