regex - How to convert a vector of strings to title case

Question

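I have a vector of lowercase strings that I'd like to change to title case, meaning the first letter of every word is capitalized. I've managed to do it with a double loop, but I'm hoping there's a more efficient and elegant way, perhaps a one-liner using gsub and a regex.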

Here's some sample data, along with the double loop that works.

strings = c("first phrase", "another phrase to convert",
            "and here's another one", "last-one")

# For each string in the strings vector, find the position of each 
#  instance of a space followed by a letter
matches = gregexpr("\\b[a-z]+", strings) 

# For each string in the strings vector, convert the first letter 
#  of each word to upper case
for (i in 1:length(strings)) {

  # Extract the position of each regex match for the string in row i
  #  of the strings vector.
  match.positions = matches[[i]][1:length(matches[[i]])] 

  # Convert the letter in each match position to upper case
  for (j in 1:length(match.positions)) {

    substr(strings[i], match.positions[j], match.positions[j]) = 
      toupper(substr(strings[i], match.positions[j], match.positions[j]))
  }
}

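This worked, but it seems needlessly complicated. I resorted to it only after experiments with simpler approaches failed. Here are some of the things I tried, along with their output: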

# Google search suggested \\U might work, but evidently not in R
gsub("(\\b[a-z]+)", "\\U\\1" ,strings)
[1] "Ufirst Uphrase"                "Uanother Uphrase Uto Uconvert"
[3] "Uand Uhere'Us Uanother Uone"   "Ulast-Uone"                   

# I tried this on a lark, but to no avail
gsub("(\\b[a-z]+)", toupper("\\1"), strings)
[1] "first phrase"              "another phrase to convert"
[3] "and here's another one"    "last-one"

As the call to gregexpr shows, the regex captures the correct positions in each string, but the replacement string clearly isn't working as expected.

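In case it isn't already obvious, I'm relatively new to regex and would appreciate help getting the replacement to work correctly. I'd also like to learn how to structure the regex so that it doesn't capture a letter after an apostrophe, since I don't want to change the case of those letters.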

score 21 · Accepted Answer

The main problem is that you're missing perl=TRUE (and your regex is slightly wrong, although that may be a result of trial and error while trying to fix the first problem).
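To make the two failed attempts from the question concrete, here is a minimal sketch (the shortened sample vector and the variable names are chosen only for illustration). It shows that toupper("\\1") is evaluated before gsub ever sees it, so the replacement is still a plain backreference, and that "\\U" only acts as a case-conversion escape when perl = TRUE is set.

```r
strings <- c("first phrase", "last-one")

# toupper() runs before gsub() is called; "\\1" contains no letters,
# so the replacement string reaches gsub() unchanged.
same_replacement <- identical(toupper("\\1"), "\\1")

# Without perl = TRUE, "\\U" is not treated as a case-conversion escape
# (the question's output shows a literal "U" being inserted instead).
no_perl <- gsub("(\\b[a-z]+)", "\\U\\1", strings)

# With perl = TRUE, "\\U" upper-cases everything that follows it in the
# replacement, which here is the whole matched word.
with_perl <- gsub("(\\b[a-z]+)", "\\U\\1", strings, perl = TRUE)
with_perl
```

Because "\\U" affects the rest of the replacement, the pattern needs a separate group for the first letter to get title case rather than all caps.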

It's slightly safer to use [:lower:] instead of [a-z], in case your code ends up being run in a weird (sorry, Estonians) locale where z is not the last letter of the alphabet...

re_from <- "\\b([[:lower:]])([[:lower:]]+)"
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")
gsub(re_from, "\\U\\1\\L\\2" ,strings, perl=TRUE)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-One"

Depending on the rules you want to follow, you may prefer to use \\E (stop capitalizing) rather than \\L (start lowercasing).

string2 <- "using AIC for model selection"
gsub(re_from, "\\U\\1\\E\\2" ,string2, perl=TRUE)
## [1] "Using AIC For Model Selection"
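The question also asks how to avoid capitalizing a letter that follows an apostrophe. The pattern above leaves "here's" alone only because its second group requires at least one more lowercase letter, so a lone "s" never matches; a contraction such as "we're" would still become "We'Re". One possible approach, offered as a sketch rather than as part of the original answer, is a negative lookbehind, which is available with perl = TRUE. The name re_no_apos and the extra sample string "we're done" are made up for illustration.

```r
strings <- c("and here's another one", "we're done", "last-one")

# (?<!') : the word must not be immediately preceded by an apostrophe
# \\b    : start of a word
# group 1: first lowercase letter; group 2: the rest (possibly empty)
re_no_apos <- "(?<!')\\b([[:lower:]])([[:lower:]]*)"

result <- gsub(re_no_apos, "\\U\\1\\E\\2", strings, perl = TRUE)
result
## [1] "And Here's Another One" "We're Done"             "Last-One"
```

Using * instead of + in the second group also capitalizes one-letter words such as "a"; whether that is desirable depends on the title-case rules you want to follow.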

score 6 · Accepted Answer

Already excellent answers here. Here's one using a convenience function from the reports package:

strings <- c("first phrase", "another phrase to convert",
    "and here's another one", "last-one")

library(reports)  # CA() comes from the reports package
CA(strings)

## > CA(strings)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-one"

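Note that it doesn't capitalize the "one" in "last-one", because doing so didn't make sense for my purposes.

Update: I maintain the qdapRegex package, which has the TC (title case) function that does true title case:

library(qdapRegex)  # TC() comes from the qdapRegex package
TC(strings)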

## [[1]]
## [1] "First Phrase"
## 
## [[2]]
## [1] "Another Phrase to Convert"
## 
## [[3]]
## [1] "And Here's Another One"
## 
## [[4]]
## [1] "Last-One"

score 4 · Accepted Answer

Adding one more to the mix, for fun:

topropper <- function(x) {
  # Capitalize the first letter of each whitespace-separated word and
  # lower-case the rest, for a single string or a vector of strings.
  sapply(x, function(strn) {
    s <- strsplit(strn, "\\s")[[1]]
    paste0(toupper(substring(s, 1, 1)),
           tolower(substring(s, 2)),
           collapse = " ")
  }, USE.NAMES = FALSE)
}

topropper(strings)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-one"

score 1 · Accepted Answer

Here is another one-liner, based on the stringr package:

library(stringr)
str_to_title(strings, locale = "en")

where strings is your vector of strings.


score 0 · Accepted Answer

The best way to convert any case to any other case is to use the snakecase package in R.

Simply use the package:

library(snakecase)
strings = c("first phrase", "another phrase to convert",
        "and here's another one", "last-one")

to_title_case(strings)

## [1] "First Phrase"              "Another Phrase to Convert" 
## [3] "And Here s Another One"    "Last One"

Keep on coding!
