ruby - Detect common words in strings in Ruby

Question

How can I detect a common substring shared by two or more sentences in Ruby?

I have a lot of strings like these:

John D
Paul John
John

I need to get the substring John. How can I implement this?

Thanks

score 2 · Accepted Answer

Solving the general case:

def count_tokens(*args)
  tokens = args.join(" ").split(/\s/)
  tokens.inject(Hash.new(0)) {|counts, token| counts[token] += 1; counts }
end

counts = count_tokens("John D", "Paul John", "John")
# => {"John"=>3, "D"=>1, "Paul"=>1}

This splits each string into tokens and counts the number of occurrences of each token. From there, it is trivial to sort the hash to get the most commonly used tokens.
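The ranking step mentioned above could look something like the following. This is only a sketch: `count_tokens` is repeated from the answer so the snippet runs on its own, while the variable names `ranked`, `most_common_token` and `occurrences` are made up for illustration.

```ruby
# Repeated from the answer above so this snippet is self-contained.
def count_tokens(*args)
  tokens = args.join(" ").split(/\s/)
  tokens.inject(Hash.new(0)) { |counts, token| counts[token] += 1; counts }
end

counts = count_tokens("John D", "Paul John", "John")

# Sort the [token, count] pairs by descending count.
# Note: the relative order of tokens with equal counts is not guaranteed.
ranked = counts.sort_by { |_token, count| -count }

# The first pair is the most frequent token and its count.
most_common_token, occurrences = ranked.first

puts "#{most_common_token} (#{occurrences})"
```

If only the single top token is needed, `counts.max_by { |_token, count| count }` avoids sorting the whole hash.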

score 1 · Accepted Answer

Find the most common element and compare against it.

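list_of_strings = ["some", "random", "strings"]

# Returns an element that occurs most often in the array.
def most_common_value(a)
  a.group_by do |array_element|
    array_element
  end.values.max_by(&:size).first
end

# Compute the most common value once, and escape it so that any
# regex metacharacters in it are matched literally.
pattern = /^#{Regexp.escape(most_common_value(list_of_strings))}$/

list_of_strings.each do |array_element|
  if pattern =~ array_element
    puts array_element
  end
end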

score 1 · Accepted Answer

def string_count(sentence)
  counts = Hash.new(0)
  # Lowercase the sentence and count each run of word characters.
  sentence.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  counts
end

Passing a sentence to it, as in string_count("John D John Paul John"), produces the output:

# => {"john"=>3, "d"=>1, "paul"=>1}
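Since the question has several separate strings rather than one sentence, one possible way to adapt this is to count each string on its own and keep only the words that appear in all of them. The sketch below assumes that "common" means "present in every string"; `words_in_all` is a hypothetical helper, not part of the answer, and it uses plain array intersection rather than a true longest-common-substring search.

```ruby
# Same counting helper as in the answer, repeated so the snippet runs on its own.
def string_count(sentence)
  counts = Hash.new(0)
  sentence.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  counts
end

# Hypothetical helper: returns the words that occur in every string.
def words_in_all(strings)
  # Collect the distinct words of each string, then intersect the lists.
  # reduce returns nil for an empty input, so fall back to an empty array.
  strings.map { |s| string_count(s).keys }.reduce(:&) || []
end

shared = words_in_all(["John D", "Paul John", "John"])
p shared
# => ["john"]
```

Because `string_count` lowercases its input, the result is lowercased as well.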

Hope this helps!
