2

私は yebrahim からこの tfidf を取得しましたがどういうわけか私の出力ドキュメントは結果に対してすべて 0 を生成します。これに問題はありますか?出力の例は、hippo 0.0 hipper 0.0 hip 0.0 Hint 0.0hindsight 0.0 hill 0.0 hilarious 0.0 です。

助けてくれてありがとう

    # increment local count
    for word in doc_words:
        if word in terms_in_doc:
            terms_in_doc[word] += 1
        else:
            terms_in_doc[word]  = 1

    # increment global frequency
     for (word,freq) in terms_in_doc.items():
        if word in global_term_freq:
            global_term_freq[word] += 1
        else:
            global_term_freq[word]  = 1

     global_terms_in_doc[f] = terms_in_doc

print('working through documents.. ')
for f in all_files:

    writer = open(f + '_final', 'w')
    result = []
    # iterate over terms in f, calculate their tf-idf, put in new list
    max_freq = 0;
    for (term,freq) in global_terms_in_doc[f].items():
        if freq > max_freq:
            max_freq = freq
    for (term,freq) in global_terms_in_doc[f].items():
        idf = math.log(float(1 + num_docs) / float(1 + global_term_freq[term]))
        tfidf = float(freq) / float(max_freq) * float(idf)
        result.append([tfidf, term])

    # sort result on tfidf and write them in descending order
    result = sorted(result, reverse=True)
    for (tfidf, term) in result[:top_k]:
        if display_mode == 'both':
            writer.write(term + '\t' + str(tfidf) + '\n')
        else:
            writer.write(term + '\n')
4

1 に答える 1