r - How do I change the contents of a column in a data.frame?

Question

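I'm using World Development Indicators (WDI) data and want to merge it with some other data. My problem is that the country names are spelled differently in the two datasets. How do I change the country variable?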

library('WDI')
df <- WDI(country="all", indicator= c("NY.GDP.MKTP.CD", "EN.ATM.CO2E.KD.GD", 'SE.TER.ENRR'), start=1998, end=2011, extra=FALSE)

head(df)
      country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD SE.TER.ENRR
99  ArabWorld    1A 1998   575369488074          1.365953          NA
100 ArabWorld    1A 1999   627550544566          1.355583    19.54259
101 ArabWorld    1A 2000   723111925659          1.476619          NA
102 ArabWorld    1A 2001   703688747656          1.412750          NA
103 ArabWorld    1A 2002   713021728054          1.413733          NA
104 ArabWorld    1A 2003   803017236111          1.469197          NA

How do I change ArabWorld to Arab World?

There are a lot of names I need to change, so doing this by row numbers doesn't give me enough flexibility. I'd like something similar to the replace function in Stata.

score 5 · Accepted Answer

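This works for either character or factor columns (sub() returns a character vector, so a factor column becomes character).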

df$country <- sub("ArabWorld", "Arab World", df$country)

This is equivalent:

> df[,1] <- sub("ArabWorld", "Arab World", df[,1] )
> head(df)
       country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD
99  Arab World    1A 1998   575369488074          1.365953
100 Arab World    1A 1999   627550544566          1.355583
101 Arab World    1A 2000   723111925659          1.476619
102 Arab World    1A 2001   703688747656          1.412750
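
To illustrate the point above about character versus factor input, here is a minimal sketch on made-up vectors (chr, fct, and the out_* names are hypothetical, not part of the WDI data). It also shows fixed = TRUE, which makes sub() treat the pattern as a literal string instead of a regular expression.

```r
# Toy inputs standing in for df$country (not real WDI data)
chr <- c("ArabWorld", "France")
fct <- factor(chr)

# sub() replaces the first match in each element
out_chr <- sub("ArabWorld", "Arab World", chr)

# A factor is coerced with as.character(), so the result is a character vector
out_fct <- sub("ArabWorld", "Arab World", fct)

# fixed = TRUE matches the pattern literally, so "(" and ")" need no escaping
out_fixed <- sub("Pacific(developingonly)", "Pacific (developing only)",
                 "EastAsia&Pacific(developingonly)", fixed = TRUE)
```

If the column has to stay a factor, wrap the result in factor() afterwards.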

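You can create a data frame with the required changes and then loop over it to apply them. Note that I've updated this to show how to enter parentheses in that column so that they are passed correctly to sub.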

# stringsAsFactors = FALSE keeps the patterns and replacements as character
name.cng <- data.frame(orig = c("AntiguaandBarbuda", "AmericanSamoa", 
                                    "EastAsia&Pacific\\(developingonly\\)",
                                    "Europe&CentralAsia\\(developingonly\\)", 
                                    "UnitedArabEmirates"), 
                           spaced=c("Antigua and Barbuda", "American Samoa",
                                    "East Asia & Pacific (developing only)",
                                    "Europe & Central Asia (developing only)", 
                                    "United Arab Emirates"),
                           stringsAsFactors = FALSE)
# Apply each pattern/replacement pair in turn
for (i in seq_len(nrow(name.cng))) { 
      df$country <- sub(name.cng[i,1], name.cng[i,2], df$country) }
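
As an alternative to looping with sub(), whole-name replacement can be sketched with a named lookup vector and match(). This is only an illustration on a toy data frame: df_demo, lookup, and idx are hypothetical names and the rows are made up. Because match() compares complete strings literally, the parentheses need no escaping.

```r
# Toy data standing in for the WDI data frame (rows are made up)
df_demo <- data.frame(
  country = c("ArabWorld", "AmericanSamoa",
              "EastAsia&Pacific(developingonly)", "France"),
  year = c(1998, 1998, 1998, 1998),
  stringsAsFactors = FALSE
)

# Names are the old spellings, values are the new spellings
lookup <- c(
  "ArabWorld"                        = "Arab World",
  "AmericanSamoa"                    = "American Samoa",
  "EastAsia&Pacific(developingonly)" = "East Asia & Pacific (developing only)"
)

# Position of each country in the lookup table, NA if it is not listed
idx <- match(df_demo$country, names(lookup))

# Keep the original name where there is no entry, otherwise use the new one
df_demo$country <- ifelse(is.na(idx), df_demo$country, unname(lookup[idx]))
```

Names that are not in the lookup vector, such as France here, are left unchanged.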

score 1 · Accepted Answer

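The easiest way, especially if you have a large number of names to change, is probably to put your correspondence table in a data.frame and join it with the data using the merge command. For instance, if you want to change the names of the two Koreas:

# Correspondence table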
countries <- data.frame(
  iso2c = c("KR", "KP"),
  country = c("South Korea", "North Korea")
)

# Join the data.frames
d <- merge( df, countries, by="iso2c", all.x=TRUE )
# Compute the new country name
d$country <- ifelse(is.na(d$country.y), as.character(d$country.x), as.character(d$country.y))
# Remove the columns we no longer need
d <- d[, setdiff(names(d), c("country.x", "country.y"))]

# Check that the result looks correct
head(d)
head(d[ d$iso2c %in% c("KR", "KP"), ])

However, it may be safer to join your two datasets on the more standard ISO country codes rather than on country names.
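
A minimal sketch of joining on the ISO code instead of the name, assuming the second dataset also carries an iso2c column. The data frames wdi_demo and other_demo, and all values in them, are made up for illustration.

```r
# Toy stand-in for the WDI data (gdp values are placeholders)
wdi_demo <- data.frame(
  iso2c   = c("KR", "KP", "FR"),
  country = c("Korea, Rep.", "Korea, Dem. Rep.", "France"),
  gdp     = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Toy stand-in for the other dataset, which spells the names differently
other_demo <- data.frame(
  iso2c      = c("FR", "KR", "KP"),
  country    = c("France", "South Korea", "North Korea"),
  population = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Join on the code only; drop the second country column to avoid a name clash
merged <- merge(wdi_demo, other_demo[, c("iso2c", "population")], by = "iso2c")
```

The differing spellings never take part in the join, so no renaming is needed.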

score 0 · Accepted Answer

Using subsetting:

df[df[, "country"] == "ArabWorld", "country"] <- "Arab World"

head(df)
       country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD SE.TER.ENRR
99  Arab World    1A 1998   575369488074          1.365953          NA
100 Arab World    1A 1999   627550544566          1.355583    19.54259
101 Arab World    1A 2000   723111925659          1.476619          NA
102 Arab World    1A 2001   703688747656          1.412750          NA
103 Arab World    1A 2002   713021728054          1.413733          NA
104 Arab World    1A 2003   803017236111          1.469197          NA
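
One caveat: the subsetting assignment above assumes country is a character column. If it is a factor, assigning a value that is not among its levels produces NA with a warning. A minimal sketch of renaming the level instead, using a made-up data frame (df_demo is a hypothetical name):

```r
# Toy data frame with a factor column (not real WDI data)
df_demo <- data.frame(country = factor(c("ArabWorld", "ArabWorld", "France")))

# Renaming the level updates every row that uses it
levels(df_demo$country)[levels(df_demo$country) == "ArabWorld"] <- "Arab World"
```

Alternatively, convert the column with as.character() first and then use the subsetting assignment as shown.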
