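I'm looking for an efficient way in R to determine whether one string is an abbreviation of another. The basic approach I'm taking is to check whether the characters of the shorter string appear in the same order in the longer string. For example, if the short string is "abv" and the long string is "abbreviation", I want a positive result, but if the short string is "avb", I want a negative result. I've put together a function that works, but it seems like a rather inelegant solution, and I wondered whether I'm missing some regex magic. I also looked at R's 'stringdist' functions, but didn't find anything that seems to do this specifically. Here's my function: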

    # This function computes whether one of the input strings (x or y) could be an abbreviation of the other.
    # Input strings should be all the same case, and probably devoid of anything but letters and numbers.
    abbrevFind = function(x, y) {
        # Find out which string is shorter, and therefore a possible abbreviation,
        # and split each string into its component characters
        if (nchar(x) < nchar(y)) {
            # Designate the abbreviation and the full string
            abv = strsplit(x, "")[[1]]
            full = strsplit(y, "")[[1]]
        } else {
            abv = strsplit(y, "")[[1]]
            full = strsplit(x, "")[[1]]
        }
        # Set up old position, which will be a comparison criterion
        pos.old = 0
        # Assume success until a letter cannot be matched (an empty abbreviation returns TRUE)
        abbreviation = TRUE
        # Loop through each letter in the abbreviation
        for (i in seq_along(abv)) {
            # Get the positions in the full string of the ith letter in the abbreviation
            # (fixed = TRUE matches the letter literally rather than as a regex)
            pos = grep(abv[i], full, fixed = TRUE)
            # Keep only positions after the previous match, so letter order is respected
            pos = pos[pos > pos.old]
            # If no such position exists, it is not a possible abbreviation:
            # the loop breaks, and the function returns FALSE
            if (length(pos) == 0) {
                abbreviation = FALSE
                break
            }
            # Set old position equal to the earliest remaining position
            pos.old = min(pos)
        }
        # If the loop ran all the way through without breaking, this is still TRUE
        return(abbreviation)
    }

Thanks for your help!
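
On the regex question, one possible sketch is to join the abbreviation's characters with `.*`, so that "abv" becomes the pattern `a.*b.*v`, and let `grepl` do the ordered search. The helper name `is_abbrev` and its argument order are hypothetical, and the sketch assumes both strings contain only letters and digits (as the comments on the function above suggest), because nothing is escaped before building the pattern.

```r
# Hypothetical helper, not part of the original function: TRUE if the characters
# of `abv` appear in `full` in the same order (not necessarily adjacent).
# Assumption: inputs hold only letters and digits, so no regex escaping is done.
is_abbrev <- function(abv, full) {
  # Split the abbreviation into single characters, e.g. "abv" -> "a" "b" "v"
  chars <- strsplit(abv, "")[[1]]
  # Join with ".*" so any run of characters may sit between them: "a.*b.*v"
  pattern <- paste(chars, collapse = ".*")
  # grepl() reports whether the pattern matches anywhere in `full`
  grepl(pattern, full)
}

is_abbrev("abv", "abbreviation")  # TRUE
is_abbrev("avb", "abbreviation")  # FALSE
```

Unlike the function above, this sketch does not decide which argument is the shorter one; the caller passes the candidate abbreviation first. Since `grepl` is vectorized over its second argument, the same call can test one abbreviation against a whole vector of full strings.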