37

I'm a Literature grad student, and I've been going through the O'Reilly book on Natural Language Processing (nltk.org/book). It looks incredibly useful. I've played around with all the example texts and example tasks in Chapter 1, like concordances. I now know how many times Moby Dick uses the word "whale." The problem is, I can't figure out how to do these calculations on one of my own texts. I've found information on how to create my own corpora (Ch. 2 of the O'Reilly book), but I don't think that's exactly what I want to do. In other words, I want to be able to do:

import nltk 
text1.concordance('yellow')

and get the places where the word 'yellow' is used in my text. At the moment I can do this with the example texts, but not my own.

I'm very new to Python and programming, so this stuff is very exciting, but very confusing.

4

3 Answers

73

Found the answer myself. That's embarrassing. Or awesome.

From Ch. 3:

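import nltk

# 'rU' mode was removed in Python 3.11; plain text mode already handles newlines
with open('my-file.txt') as f:
    raw = f.read()
tokens = nltk.word_tokenize(raw)  # needs NLTK's Punkt tokenizer data installed
text = nltk.Text(tokens)  # gives you concordance(), count(), etc.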

does the trick.
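For anyone who wants to reuse this on several files, here is a minimal sketch of the same steps wrapped in a function. The name `load_text` and the `encoding` parameter are my own choices for illustration, not anything from the book. I've also swapped in `wordpunct_tokenize`, which is a simple regex-based tokenizer that works without downloading extra data; it splits text a bit differently from `word_tokenize`, so treat that as an assumption you may want to change.

```python
import nltk
from nltk.tokenize import wordpunct_tokenize


def load_text(path, encoding="utf-8"):
    """Read a plain-text file and wrap its tokens in an nltk.Text.

    `load_text` is a hypothetical helper name; `encoding` is assumed to be
    UTF-8 and should be changed if your file is saved differently.
    """
    with open(path, encoding=encoding) as f:
        raw = f.read()
    # wordpunct_tokenize is regex-based and needs no downloaded models.
    # nltk.word_tokenize(raw) could be used here instead once the
    # Punkt tokenizer data is installed.
    tokens = wordpunct_tokenize(raw)
    return nltk.Text(tokens)
```

With that in place, `text = load_text('my-file.txt')` followed by `text.concordance('yellow')` should print the occurrences in context, just like `text1` in Chapter 1, and `text.count('yellow')` gives the number of times that exact token appears.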

answered 2012-05-06T00:22:14.393