r - Analyzing strings character by character to calculate the number of possible words in R

Question

Given a list of strings of syllable combinations, I am calculating the number of possible words. The list of syllable combinations looks like this:

syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV", "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C", "CCCV-CV", "CV-C-CCCV")

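Based on this list, I would like to calculate the number of words that are possible in English under given phonological rules. To do this, I need to go through the individual items in the syllable combination list and calculate the number of words possible for each syllable combination.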

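To generate the number of possible words for a particular syllable combination, I need to go through the combination and examine each character in turn, in relation to its environment. For example, for the syllable combination CV-CV, I would need to: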

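Identify that this word begins with a single consonant C (rather than two or three consonants).
Identify that this initial single consonant is followed by a vowel V.
Identify that the word continues into a next syllable (indicated by a hyphen).
Confirm that this second syllable also begins with a single consonant C.
Confirm that it ends with another vowel V.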
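The steps above amount to a left-to-right scan that remembers whether the current syllable has already had its vowel. A minimal sketch, assuming that a C before the V of its syllable is an initial consonant, a C after it is a final consonant, and a hyphen starts a new syllable; `label_positions` is a hypothetical helper, not something from the question:

```r
# Hypothetical helper (not from the question): label each character of a
# combination such as "CV-CV", scanning left to right and resetting at hyphens.
# Assumption: a C before the syllable's V is initial, a C after it is final.
label_positions <- function(combination) {
  chars <- strsplit(combination, "")[[1]]  # one element per character
  labels <- character(0)
  seen_vowel <- FALSE                      # has the current syllable had its V yet?
  for (ch in chars) {
    if (ch == "-") {
      seen_vowel <- FALSE                  # hyphen: a new syllable starts
    } else if (ch == "V") {
      labels <- c(labels, "vowel")
      seen_vowel <- TRUE
    } else if (ch == "C") {
      # consonant before the vowel is initial, after it is final
      labels <- c(labels, if (seen_vowel) "final_consonant" else "initial_consonant")
    } else {
      stop("unexpected character: ", ch)
    }
  }
  labels
}

label_positions("CV-CV")
# "initial_consonant" "vowel" "initial_consonant" "vowel"
```

Each label can then be matched to the count of sounds allowed in that position. Vowel-less syllables such as "C" or "CC" simply come out as initial consonants under this assumption.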

This information then needs to be combined with information about which sounds can appear in these positions:

number_of_vowels <- 20
number_of_initial_consonants_length_1 <- 22
number_of_initial_consonants_length_2 <- 47
number_of_final_consonants_length_1 <- 24

To calculate the number of possible words in English with the "CVCV" syllable structure, I would do the following:

number_of_CVCV_words <- number_of_initial_consonants_length_1*number_of_vowels*number_of_initial_consonants_length_1*number_of_vowels

number_of_CVCV_words
193600
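The hand-written CVCV product can be generalized by deriving each syllable's count from its shape. A sketch under stated assumptions: every syllable has at most one V, consonants before it are an initial cluster and after it a final cluster, and only the cluster lengths given above are known (other lengths yield NA). `onset_counts`, `coda_counts`, `count_syllable` and `count_words` are hypothetical names:

```r
number_of_vowels <- 20
number_of_initial_consonants_length_1 <- 22
number_of_initial_consonants_length_2 <- 47
number_of_final_consonants_length_1 <- 24

# Lookup vectors indexed by cluster length; lengths with no figure give NA
onset_counts <- c(number_of_initial_consonants_length_1,
                  number_of_initial_consonants_length_2)
coda_counts <- c(number_of_final_consonants_length_1)

# Hypothetical helper: count for one syllable containing a single V
count_syllable <- function(syllable) {
  chars <- strsplit(syllable, "")[[1]]
  v_pos <- match("V", chars)            # position of the vowel, NA if absent
  if (is.na(v_pos)) return(NA_real_)    # vowel-less syllables are not covered
  onset <- v_pos - 1                    # consonants before the vowel
  coda <- length(chars) - v_pos         # consonants after the vowel
  n <- number_of_vowels
  if (onset > 0) n <- n * onset_counts[onset]   # NA for lengths not given
  if (coda > 0) n <- n * coda_counts[coda]
  n
}

# Multiply the per-syllable counts across a hyphenated combination
count_words <- function(combination) {
  syllables <- strsplit(combination, "-")[[1]]
  prod(vapply(syllables, count_syllable, numeric(1)))
}

count_words("CV-CV")   # 193600, same as number_of_CVCV_words
```

With these figures CV gives 440, CCV gives 940 and CVC gives 10560, the same values used in `detect_syllables` further down; CCCV and vowel-less syllables return NA because no counts for them are given here.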

Any advice on how to do this?

I've gotten a bit further with this, but I've run into some problems.

First, I split the syllable combinations into individual syllables:

split_syllables <- c()

for(i in 1:length(syllable_combinations)){
strsplit(as.character(syllable_combinations[i]), split = "-") -> split_syllable
split_syllables <- append(split_syllables, split_syllable)
}
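For comparison, `strsplit()` is already vectorized, so the loop with `append()` can likely be replaced by a single call. This sketch also lists the distinct syllables, which supports the point below that only a limited number of syllable shapes need a count:

```r
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV",
                           "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C",
                           "CCCV-CV", "CV-C-CCCV")

# One call returns a list with one character vector per combination
split_syllables <- strsplit(syllable_combinations, "-")

lengths(split_syllables)          # syllables per combination
# 1 1 2 3 3 4 3 3 2 3
unique(unlist(split_syllables))   # the distinct syllables that need a count
# "C" "CC" "CCCV" "CCV" "CV"
```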

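Next, a function that matches each syllable (this is feasible because there is only a limited number of unique syllables). The counter1 variable gives the number of sound combinations possible in English for a given syllable structure.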

detect_syllables <- function(syllable){
if(syllable == "C") {
counter1 <- 25
} else if(syllable == "CC") {
counter1 <- 528
} else if(syllable == "CCCV") {
counter1 <- 200 
} else if(syllable == "CCV") {
counter1 <- 940
} else if(syllable == "CV") {
counter1 <- 440
} else if(syllable == "CVC") {
counter1 <- 10560
} else 
print(paste(syllable, "syllable not matched"))
}
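Because the mapping is a fixed table from syllable shape to count, one alternative to the `if`/`else if` chain is a named numeric vector indexed by name. A sketch, assuming the counts in `detect_syllables` are the intended ones; `syllable_counts` is a hypothetical name:

```r
# Counts per syllable shape, copied from detect_syllables in the question
syllable_counts <- c(C = 25, CC = 528, CCCV = 200, CCV = 940, CV = 440, CVC = 10560)

# Indexing by name looks up several syllables at once; unknown names give NA
unname(syllable_counts[c("CCCV", "CCV", "CV")])   # 200 940 440
unname(syllable_counts["VC"])                     # NA
```

An NA for an unknown shape propagates through any later multiplication, which makes a missing entry easy to spot.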

Next, functions that run detect_syllables on each syllable of the original syllable combination:

one_syllable <- function(first_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
first_syl -> number1
print(number1)
}

two_syllables <- function(first_syllable, second_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
lapply(split_syllables[[i]][2], FUN = detect_syllables)
counter1 -> second_syl
first_syl*second_syl -> number2
print(number2) 
}

three_syllables <- function(first_syllable, second_syllable, third_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
lapply(split_syllables[[i]][2], FUN = detect_syllables)
counter1 -> second_syl
lapply(split_syllables[[i]][3], FUN = detect_syllables)
counter1 -> third_syl
first_syl*second_syl*third_syl -> number3
print(number3)
}

four_syllables <- function(first_syllable, second_syllable, third_syllable, fourth_syllable){
lapply(split_syllables[[i]][1], FUN = detect_syllables)
counter1 -> first_syl
lapply(split_syllables[[i]][2], FUN = detect_syllables)
counter1 -> second_syl
lapply(split_syllables[[i]][3], FUN = detect_syllables)
counter1 -> third_syl
lapply(split_syllables[[i]][4], FUN = detect_syllables)
counter1 -> fourth_syl
first_syl*second_syl*third_syl*fourth_syl -> number4
print(number4)
}
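The four functions above differ only in how many syllables they handle. A sketch of a single replacement that relies on the value `detect_syllables` returns instead of a `counter1` variable; here `detect_syllables` is rewritten with `switch()` using the question's counts, and `count_combination` is a hypothetical name:

```r
# Same counts as the question's detect_syllables, written with switch();
# the function's value is returned to the caller, so no counter1 is needed
detect_syllables <- function(syllable) {
  switch(syllable,
         C = 25, CC = 528, CCCV = 200, CCV = 940, CV = 440, CVC = 10560,
         {
           # unnamed last argument: the default when nothing matches
           warning(syllable, ": syllable not matched")
           NA_real_
         })
}

# Hypothetical replacement for one_/two_/three_/four_syllables:
# works for any number of syllables by multiplying the returned counts
count_combination <- function(syllables) {
  prod(vapply(syllables, detect_syllables, numeric(1)))
}

count_combination(c("CCCV", "CCV"))   # 188000
```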

And a for loop to make sure the detect_syllables function is applied to the right number of syllables:

for(i in 1:10){
if(length(split_syllables[[i]]) == 1) { 
lapply(split_syllables[[i]][1], FUN = one_syllable)
} else if(length(split_syllables[[i]]) == 2) {
lapply(split_syllables[[i]][1], split_syllables[[i]][2], FUN = two_syllables)
} else if(length(split_syllables[[i]]) == 3) {
lapply(split_syllables[[i]][1], split_syllables[[i]][2], split_syllables[[i]][3], FUN = three_syllables)
} else if(length(split_syllables[[i]]) == 4) {
lapply(split_syllables[[i]][1], split_syllables[[i]][2], split_syllables[[i]][3], split_syllables[[i]][4], FUN = four_syllables)
} else 
print("number of syllables is bigger than 4")
}

However, when I try to use the for loop, I get the following error message:

Error in four_syllables(split_syllables[[1]]) : object 'counter1' not found

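As mentioned here, I realized that this has to do with the environment in which 'counter1' is evaluated: Using get inside lapply, inside a function. However, I don't know how to solve it. When I try to point them to the appropriate environment, neither of the lapply calls seems to like it (Error in FUN("C"[[1L]], ...) : unused argument).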
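Both errors can likely be traced to two general R rules, shown here in isolation with hypothetical functions `make_count` and `paste_two`. First, `counter1` is created inside `detect_syllables`, so it only exists in that call's local environment; the caller has to capture the returned value (the global `<<-` operator would also work but makes the code fragile). Second, `lapply(X, FUN, ...)` forwards any extra arguments to `FUN`, so a call like `lapply(a, b, FUN = f)` calls `f(a[[1]], b)`:

```r
# 1. Variables created inside a function are local to that call
make_count <- function() {
  counter1 <- 25   # exists only while make_count() runs
  counter1         # the last value is returned to the caller
}
first_syl <- make_count()   # capture the returned value instead of reading counter1
first_syl                   # 25

# 2. lapply(X, FUN, ...) passes anything in ... on to FUN as extra arguments,
#    so lapply("C", "CV", FUN = f) calls f("C", "CV")
paste_two <- function(first, second) paste(first, second)
lapply("C", "CV", FUN = paste_two)   # list("C CV")
# With a one-argument FUN, the same call fails with "unused argument"
```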

I can get the desired result without lapply(), but only in a very inelegant way. If anyone has another solution, I would be glad to learn about it.

for(i in 1:10){
if(length(split_syllables[[i]]) == 1) { 
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
first_syl -> number1
print(number1)
} else if(length(split_syllables[[i]]) == 2) {
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
detect_syllables(split_syllables[[i]][2]) -> counter1
counter1 -> second_syl
first_syl*second_syl -> number2
print(number2)
} else if(length(split_syllables[[i]]) == 3) {
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
detect_syllables(split_syllables[[i]][2]) -> counter1
counter1 -> second_syl
detect_syllables(split_syllables[[i]][3]) -> counter1
counter1 -> third_syl
first_syl*second_syl*third_syl -> number3
print(number3)
} else if(length(split_syllables[[i]]) == 4) {
detect_syllables(split_syllables[[i]][1]) -> counter1
counter1 -> first_syl
detect_syllables(split_syllables[[i]][2]) -> counter1
counter1 -> second_syl
detect_syllables(split_syllables[[i]][3]) -> counter1
counter1 -> third_syl
detect_syllables(split_syllables[[i]][4]) -> counter1
counter1 -> fourth_syl
first_syl*second_syl*third_syl*fourth_syl -> number4
print(number4)
} else 
print("number of syllables is bigger than 4")
}
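The branching loop can probably be collapsed into one `vapply()` over the split list, looking up each syllable's count and multiplying. A sketch, assuming the per-syllable counts from `detect_syllables` and that the goal is one product per combination; `syllable_counts` and `number_of_words` are hypothetical names:

```r
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV",
                           "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C",
                           "CCCV-CV", "CV-C-CCCV")
# Per-syllable counts from the question's detect_syllables
syllable_counts <- c(C = 25, CC = 528, CCCV = 200, CCV = 940, CV = 440, CVC = 10560)

split_syllables <- strsplit(syllable_combinations, "-")

# For each combination: look up every syllable's count and multiply them
number_of_words <- vapply(split_syllables,
                          function(syllables) prod(syllable_counts[syllables]),
                          numeric(1))
names(number_of_words) <- syllable_combinations
number_of_words
```

This handles any number of syllables, so the "bigger than 4" branch is no longer needed. If a single overall total is wanted, `sum(number_of_words)` would give it.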

score 0 · Accepted Answer

I'm not sure I follow everything you want to do, but here is some code that should help you get started.

# save first two syllables
split_combs <- strsplit(syllable_combinations, "-")
syl1 <- sapply(split_combs, "[", 1)
syl2 <- sapply(split_combs, "[", 2)

# function to look at how a string starts
check.start <- function(string, start) {
    # does the string start with this?
    tfn <- substring(string, 1, nchar(start))==start
    tfn[is.na(tfn)] <- FALSE
    tfn
    }

# show all syllable combinations with the first two syllables starting with CV
syllable_combinations[check.start(syl1, "CV") & check.start(syl2, "CV")]
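Base R (3.3.0 and later) has `startsWith()`, which can stand in for `check.start` once NA (from combinations with fewer syllables) is treated as FALSE. A sketch with a hypothetical helper `has_prefix`; note that with this particular list the CV/CV filter returns `character(0)`, since the only combination whose first syllable starts with CV has "C" as its second syllable:

```r
syllable_combinations <- c("C", "CC", "CCCV-CCV", "CCCV-CCV-CV", "CCCV-CV-CCV",
                           "CCCV-CCV-CCV-CV", "CCCV-CC-CV", "CCCV-CCV-C",
                           "CCCV-CV", "CV-C-CCCV")
split_combs <- strsplit(syllable_combinations, "-")
syl1 <- sapply(split_combs, "[", 1)
syl2 <- sapply(split_combs, "[", 2)   # NA where there is no second syllable

# Hypothetical helper: startsWith() gives NA for NA input; treat that as FALSE
has_prefix <- function(x, prefix) !is.na(x) & startsWith(x, prefix)

syllable_combinations[has_prefix(syl1, "CV") & has_prefix(syl2, "CV")]
# character(0) for this list
syllable_combinations[has_prefix(syl1, "CCC")]   # first syllable starts with CCC
```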
