r - データフレームでキャストが機能しない

Question

大きなデータフレームを扱っており、ピボットテーブルタイプの関数を実行したいと考えています。reshape2 パッケージを使用しようとしましたが、何らかの理由で溶融データフレームが再形成されません。

私はこのようなフレームを取りたいです:

County      Industry   Type    Variable   Value 
LA          Plumbing     Tax       Rev       1000 
LA          Plumbing     No tax    Emp       100 
LA          Plumbing     Tax       Pay       500

そして、それを（Typeで集計）にします：

        Plumbing       Tailors
County  Rev   Emp  Pay Rev   Emp  Pay
LA      1000  100  500 1000  50   65

次のコードを実行しています。

dcast(m.data, county ~ variable + industry)

しかし、データフレームをまったく変更していません。私はどこを台無しにしていますか？

編集：

この問題についてもう少し情報を含めています。溶けたデータフレームに到達する前に、必要な場所にデータを移動するために非常に悪いクリーンアップを行っています。以下のコードは理想的ではなく、実際に修正する必要があることはわかっていますが、基本的には複数の CSV ファイル (同じ列名を持つ) をアップロードし、それらを結合し、いくつかの値を再コーディングし、いくつかの列を削除し、データのサブセットを選択し、それを溶融フレームに入れ、dcast を使用して再形成しようとしています。特定の値を再コーディングするコードを削除しましたが、その部分は問題なく動作しているようです。コードの一部を次に示します。

data1 <- read.table("census_data_r_1.csv",header=TRUE,sep=",",stringsAsFactors=FALSE) 
data2 <- read.table("census_data_r_2.csv",header=TRUE,sep=",", stringsAsFactors=FALSE)
fulldata <- rbind(data1,data2)
delete <- c("GEO.id","GEO.id2","NAICS.id","OPTAX.id","YEAR.id")
fulldata <- fulldata[, !(names(fulldata) %in% delete)]
colnames(fulldata) <- c("county","industry","tax_type","firms","revenue","payroll","num_employees","non_emp_firms","non_emp_firms_rev")
fulldata[c("firms","revenue","payroll","num_employees","non_emp_firms","non_emp_firms_rev")] <- recode.variables(fulldata[c("firms","revenue","payroll","num_employees","non_emp_firms","non_emp_firms_rev")],"'N' -> 'Nothing';'D' -> 'Withheld';'b' -> 20;'c' -> 100;'e' -> 250;'a' -> 10;'g' -> 1000;'f' -> 500;'Q' -> 'No Rev Collected';'h' -> 2500;'i' -> 5000;'j' -> 10000;'l' -> 50000;'k' -> 25000;'S' -> 'Bad Data';'m' -> 100000;")
fulldata.sub <- subset(fulldata, subset = (tax_type %in% c('Total', 'All establishments')) & (!(revenue %in% c('Nothing', 'Withheld','No Rev Collected'))) & (!(non_emp_firms %in% c('Nothing','Withheld'))))
m.data <- melt(fulldata.sub, id.vars = 1:3)
dcast(m.data, county ~ variable, sum)

今、私は次のエラーが発生しています:

構造のエラー (順序付け、dim = ns) : ディム [製品 18300] がオブジェクト [0] の長さと一致しません

からの出力dput(head(fulldata.sub,40)):

structure(list(county = c("Autauga County, Alabama", "Autauga County, Alabama", 
"Autauga County, Alabama", "Autauga County, Alabama", "Autauga County, Alabama", 
"Autauga County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Baldwin County, Alabama", "Baldwin County, Alabama", "Baldwin County, Alabama", 
"Barbour County, Alabama", "Barbour County, Alabama"), industry = c("Rental and leasing services", 
"Professional, scientific, and technical services", "Professional, scientific, and technical services", 
"Accounting, tax preparation, bookkeeping, and payroll services", 
"Accounting, tax preparation, bookkeeping, and payroll services", 
"Architectural, engineering, and related services", "Real estate and rental and leasing", 
"Real estate", "Lessors of real estate", "Offices of real estate agents and brokers", 
"Offices of real estate agents and brokers", "Activities related to real estate", 
"Real estate property managers", "Offices of real estate appraisers", 
"Consumer goods rental", "Accounting, tax preparation, bookkeeping, and payroll services", 
"Accounting, tax preparation, bookkeeping, and payroll services", 
"Offices of certified public accountants", "Tax preparation services", 
"Architectural, engineering, and related services", "Architectural services", 
"Engineering services", "Specialized design services", "Computer systems design and related services", 
"Computer systems design and related services", "Management, scientific, and technical consulting services", 
"Advertising, public relations, and related services", "Veterinary services", 
"Administrative and support and waste management and remediation services", 
"Administrative and support services", "Employment services", 
"Business support services", "Investigation and security services", 
"Services to buildings and dwellings", "Exterminating and pest control services", 
"Janitorial services", "Landscaping services", "Waste management and remediation services", 
"Lessors of real estate", "Legal services"), tax_type = c("Total", 
"All establishments", "All establishments", "All establishments", 
"All establishments", "All establishments", "Total", "Total", 
"Total", "Total", "Total", "Total", "Total", "Total", "Total", 
"All establishments", "All establishments", "All establishments", 
"All establishments", "All establishments", "All establishments", 
"All establishments", "All establishments", "All establishments", 
"All establishments", "All establishments", "All establishments", 
"All establishments", "Total", "Total", "Total", "Total", "Total", 
"Total", "Total", "Total", "Total", "Total", "Total", "All establishments"
), firms = c("10", "61", "61", "14", "14", "10", "358", "312", 
"77", "161", "161", "74", "52", "16", "28", "79", "79", "36", 
"20", "77", "13", "37", "19", "27", "27", "63", "17", "26", "250", 
"238", "26", "14", "17", "157", "16", "29", "96", "12", "11", 
"19"), revenue = c("8433", "42285", "42285", "8581", "8581", 
"5571", "266692", "201777", "59742", "104768", "104768", "37267", 
"32141", "4615", "20691", "33203", "33203", "19805", "3160", 
"39318", "10494", "21167", "6833", "12391", "12391", "21496", 
"11097", "18388", "163661", "145935", "30746", "4048", "13849", 
"77076", "9934", "15832", "47411", "17726", "1585", "6439"), 
    payroll = c("1641", "15473", "15473", "3506", "3506", "2229", 
    "59476", "47937", "4053", "30180", "30180", "13704", "11902", 
    "1674", "4854", "17298", "17298", "9718", "1122", "15263", 
    "3688", "8649", "908", "4429", "4429", "7335", "2634", "6073", 
    "67526", "62354", "19529", "1002", "6824", "27688", "3181", 
    "8632", "14434", "5172", "265", "1431"), num_employees = c("56", 
    "386", "386", "127", "127", "41", "1987", "1643", "160", 
    "1030", "1030", "453", "406", "42", "217", "491", "491", 
    "217", "138", "356", "69", "204", "45", "111", "111", "165", 
    "101", "282", "2807", "2686", "806", "53", "399", "1241", 
    "110", "399", "675", "121", "23", "36"), non_emp_firms = c("8", 
    "330", "330", "49", "49", "35", "2358", "2289", "648", "840", 
    "840", "801", "186", "32", "19", "208", "208", "20", "40", 
    "203", "21", "74", "107", "99", "99", "356", "82", "10", 
    "1452", "1435", "25", "153", "61", "982", "12", "526", "350", 
    "17", "40", "16"), non_emp_firms_rev = c("882", "10111", 
    "10111", "493", "493", "1280", "164778", "160968", "55888", 
    "33321", "33321", "71759", "25870", "1504", "692", "2961", 
    "2961", "533", "466", "9220", "889", "5387", "4448", "3235", 
    "3235", "14395", "10337", "602", "35998", "33953", "708", 
    "3991", "806", "18726", "329", "6246", "9974", "2045", "1978", 
    "488")), .Names = c("county", "industry", "tax_type", "firms", 
"revenue", "payroll", "num_employees", "non_emp_firms", "non_emp_firms_rev"
), row.names = c(6L, 7L, 9L, 19L, 21L, 25L, 54L, 55L, 56L, 65L, 
66L, 70L, 71L, 74L, 77L, 99L, 101L, 103L, 105L, 109L, 111L, 115L, 
119L, 125L, 127L, 131L, 139L, 143L, 147L, 148L, 152L, 155L, 159L, 
162L, 163L, 165L, 167L, 169L, 174L, 180L), class = "data.frame")

編集

>str(fulldata.sub) および str(m.data) からの出力を含むもう 1 つの編集

data.frame': 130098 obs. $ county :
Factor w/ 3237 level "Abbeville County, South Carolina",..: 121 121 121 121 121 121 121 121 131 131 ...
$ industry : Factor w/ 369 level "Accounting, tax prepare,簿記、給与サービス",..: 283 239 239 1 1 33 358 358 274 273 ...
$ tax_type : 4 レベルの係数 "すべての事業所",..: 4 1 1 1 1 1 1 1 4 4 . ..
$ 企業: num 10 61 61 14 14 10 4 4 358 312 ...
$ 収益: num 31466 21347 21347 31717 31717 ...
$ 給与: num 5521 4863 4863 13729 13729 ...
$ num_employees: 4 num_6 256 571 571 ...
$ non_emp_firms : num 3122 1887 1887 2486 2486 ...
$ non_emp_firms_rev: num 17550 96 96 12669 12669 ...
'data.frame': 780588 obs. $ county :
Factor w/ 3237 level "Abbeville County, South Carolina",..: 121 121 121 121 121 121 121 121 131 131 ...
$ industry: Factor w/ 369 level "Accounting, tax prepare,簿記、給与サービス",..: 283 239 239 1 1 33 358 358 274 273 ...
$ tax_type: 4 レベルの係数 "すべての事業所",..: 4 1 1 1 1 1 1 1 4 4 . ..
$ 変数: 6 レベルの因子 "会社","収益",..: 1 1 1 1 1 1 1 1 1 ...
$ 値: num 10 61 61 14 14 10 4 4 358 312 .. .

score 2 · Accepted Answer

見るstr(fulldata.sub)と、4 列目から 9 列目の数字が文字として扱われていることがわかります。そのためmelt()、文字列を因数に変換すると。次に、タイプ factor の変数で sum() 評価を実行しようとしていますが、これは計算されません。

次のように数値に変換できます。

...    
fulldata.sub[4:9] <- sapply(fulldata.sub[4:9],as.numeric)
# Then run your melt/cast sequence
m.data <- melt(fulldata.sub, id.vars = 1:3)
dcast(m.data, county ~ variable, sum)

または、データのインポートを修正します。これは、""、"-"、"、"、"n/a"、"na"、" " などの文字列があるために発生する可能性があります。を使用すると、引数 read.csvを設定することでこれを修正できます。na.strings=c("erroneous_string","other_erroneous_string",...)

r - データフレームでキャストが機能しない

1 に答える 1

Related

Reference