I have a list of bigrams for each sentence and a separate list of relevant bigrams. If any of the relevant bigrams occurs in a sentence, I want to return that sentence. I was thinking of implementing it as follows: map each bigram to the sentence it comes from, then look up each relevant bigram as a key and return the value.

Example:

relevantbigrams = List("This is", "is not", "not what")
bigrams = List(List("This of", "of no", "no the"), List("not what", "what is"))

Each inner list holds the bigrams of one sentence. Here "not what" from the second sentence matches, so I would like to return the second sentence. I am planning to build a map such as Map("This of" -> "This of no the", "of no" -> "This of no the", "not what" -> "not what is"), and so on, and return the sentences that match a relevant bigram, so here I would return "not what is".

This is my code:

val bigram = usableTweets.map(x =>Tokenize(x).sliding(2).flatMap{case Vector(x,y) => List(x+" "+y)}.map(z => z, x))
for(i<- 0 to relevantbigram.length)
    if(bigram.contains(relevantbigram(i)))) bigram.get(relevantbigram(i))
    else useableTweets.head

1 Answer

You have the order of flatMap and map the wrong way around:

// Pair every bigram with the sentence it came from, then collect the pairs into a Map.
val bigramMap = usableTweets.flatMap { x =>
    x.split(" ").sliding(2).
      map(bg => bg.mkString(" ") -> x)
}.toMap
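
One thing to be aware of: toMap keeps a single value per key, so if the same bigram occurs in more than one tweet, only the last tweet survives. If you need every matching sentence, a rough sketch would group the pairs instead. The names BigramIndex and index are mine, not from your code, and splitting on a single space is an assumption standing in for your tokenizer:

```scala
object BigramIndex {
  // Maps each bigram to ALL sentences that contain it, in input order.
  def index(sentences: List[String]): Map[String, List[String]] =
    sentences
      .flatMap { sentence =>
        // Windows shorter than two tokens (one-word sentences) are skipped.
        sentence.split(" ").sliding(2).collect {
          case Array(a, b) => s"$a $b" -> sentence
        }
      }
      .groupBy(_._1)
      .map { case (bigram, pairs) => bigram -> pairs.map(_._2) }
}
```

For example, BigramIndex.index(List("a b c", "x a b"))("a b") gives List("a b c", "x a b"), whereas the toMap version would keep only "x a b".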

You can then run the lookup like this:

relevantbigrams collect { case rb if bigramMap contains rb => bigramMap(rb) }

or

val found = 
  for { 
    rb <- relevantbigrams
    sentence <- bigramMap get rb
  } yield sentence

Both should give you a list, but from your code it looks like you want to default to the first sentence when the lookup finds nothing:

found.headOption.getOrElse(usableTweets.head)
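
Putting the pieces together, here is a minimal self-contained sketch using the example data from the question. The object and method names (BigramLookup, bigrams, firstMatch) are made up for illustration, and splitting on a single space is an assumption standing in for your Tokenize:

```scala
object BigramLookup {
  // Bigrams of one sentence; sentences with fewer than two tokens yield none.
  def bigrams(sentence: String): List[String] =
    sentence.split(" ").toList.sliding(2).collect {
      case List(a, b) => s"$a $b"
    }.toList

  // First sentence (in order of the relevant bigrams) that contains a relevant bigram.
  def firstMatch(sentences: List[String], relevant: List[String]): Option[String] = {
    // bigram -> sentence; a bigram shared by several sentences keeps the last one.
    val bigramMap: Map[String, String] =
      sentences.flatMap(s => bigrams(s).map(bg => bg -> s)).toMap

    val found =
      for {
        rb <- relevant
        sentence <- bigramMap.get(rb)
      } yield sentence

    found.headOption
  }

  def main(args: Array[String]): Unit = {
    val sentences = List("This of no the", "not what is")
    val relevant  = List("This is", "is not", "not what")
    // Fall back to the first sentence when nothing matches.
    println(firstMatch(sentences, relevant).getOrElse(sentences.head))
  }
}
```

Run as is, this prints "not what is", since "not what" is the only relevant bigram that occurs in either sentence.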
answered 2013-03-08T00:37:51.347