python - How do I split a string into a list?

Question

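I want my Python function to split a sentence (the input) and store each word in a list. My current code splits the sentence, but it does not store the words as a list. How do I do that?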

def split_line(text):

    # split the text
    words = text.split()

    # for each word in the line:
    for word in words:

        # print the word
        print(words)

score 520 · Accepted Answer

text.split()

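This should be enough to store each word in a list. words is already a list of the words in the sentence, so the loop is not needed.

Second, it may just be a typo, but your loop is a little mixed up. If you really did want to use append, it would be: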

words.append(word)

not

word.append(words)
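
As a minimal sketch of the point above, the function can simply return the list that text.split() already produces. The name split_line comes from the question; the sample sentence is made up for illustration.

```python
def split_line(text):
    """Return the whitespace-separated words of text as a list."""
    # str.split() with no arguments already returns a list of words
    return text.split()


words = split_line("the quick brown fox")
print(words)        # ['the', 'quick', 'brown', 'fox']
print(type(words))  # <class 'list'> on Python 3
```

Returning the list instead of printing inside the loop lets the caller decide what to do with the words.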

score 467 · Accepted Answer

Split the string in text on any consecutive runs of whitespace:

words = text.split()

Split the string in text on the delimiter ",":

words = text.split(",")

The words variable will be a list containing the words from text, split on the delimiter.
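
One detail worth seeing in practice: splitting on an explicit delimiter keeps surrounding spaces and empty fields. The sketch below uses a made-up sample string, and the cleanup step is only an assumption about what a caller might want.

```python
text = "apple, banana,,cherry"

# Splitting on "," keeps leading spaces and empty fields as they are
raw = text.split(",")
print(raw)  # ['apple', ' banana', '', 'cherry']

# Optional cleanup (an assumption, not required): trim whitespace, drop empty fields
cleaned = [field.strip() for field in raw if field.strip()]
print(cleaned)  # ['apple', 'banana', 'cherry']
```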

score 91 · Accepted Answer

str.split()

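Return a list of the words in the string, using sep as the delimiter string ... If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.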

>>> line="a sentence with a few words"
>>> line.split()
['a', 'sentence', 'with', 'a', 'few', 'words']
>>>
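
To make the difference described in the documentation concrete, here is a small sketch comparing the default behaviour with an explicit single-space separator. The sample string is invented for illustration.

```python
line = "  a  b "

# sep omitted: runs of whitespace collapse, no empty strings at the ends
default_split = line.split()
print(default_split)  # ['a', 'b']

# sep=" ": every single space is a separator, so empty strings appear
space_split = line.split(" ")
print(space_split)  # ['', '', 'a', '', 'b', '']
```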

score 58 · Accepted Answer

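Depending on what you plan to do with your sentence-as-a-list, you may want to look at the Natural Language Toolkit (NLTK). It deals heavily with text processing and evaluation. You can also use it to solve your problem: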

import nltk
words = nltk.word_tokenize(raw_sentence)

This has the added benefit of splitting out punctuation.

Example:

>>> import nltk
>>> s = "The fox's foot grazed the sleeping dog, waking it."
>>> words = nltk.word_tokenize(s)
>>> words
['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',', 
'waking', 'it', '.']

This lets you filter out any punctuation you don't want and use only the words.
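
A rough sketch of that filtering step, using only the standard library. The token list is copied from the word_tokenize output shown above so the snippet runs without NLTK installed; the rule "keep a token if it contains at least one non-punctuation character" is an assumption about what counts as a word.

```python
import string

# Token list copied from the word_tokenize output shown above
tokens = ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',',
          'waking', 'it', '.']

# Keep tokens that contain at least one non-punctuation character
words_only = [tok for tok in tokens
              if any(ch not in string.punctuation for ch in tok)]
print(words_only)
# ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', 'waking', 'it']
```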

Note that the other solutions, which use str.split(), are the better choice if you don't plan to do any complex manipulation of the sentence.

[Edited]

score 36 · Accepted Answer

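How about this algorithm? Split the text on whitespace, then trim the punctuation. This carefully removes punctuation from the edges of words without harming the apostrophes inside words such as we're.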

>>> text
"'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'"

>>> text.split()
["'Oh,", 'you', "can't", 'help', "that,'", 'said', 'the', 'Cat:', "'we're", 'all', 'mad', 'here.', "I'm", 'mad.', "You're", "mad.'"]

>>> import string
>>> [word.strip(string.punctuation) for word in text.split()]
['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here', "I'm", 'mad', "You're", 'mad']
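
The same idea can be wrapped in a small function. This is only a sketch: the name split_words is hypothetical, and dropping tokens that consist solely of punctuation is an extra step not present in the snippet above.

```python
import string


def split_words(text):
    """Split on whitespace and trim punctuation from each word's edges."""
    stripped = (word.strip(string.punctuation) for word in text.split())
    # A token made only of punctuation (such as "-") strips to "", so drop it
    return [word for word in stripped if word]


print(split_words("Well - that's odd, isn't it?"))
# ['Well', "that's", 'odd', "isn't", 'it']
```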

score 17 · Accepted Answer

I want my Python function to split a sentence (the input) and store each word in a list

The str.split() method does this: it takes a string and splits it into a list:

>>> the_string = "this is a sentence"
>>> words = the_string.split(" ")
>>> print(words)
['this', 'is', 'a', 'sentence']
>>> type(words)
<type 'list'> # or <class 'list'> in Python 3.0

The problem you're having is caused by a typo: you wrote print(words) instead of print(word).

With the word variable renamed to current_word, this is what you had:

def split_line(text):
    words = text.split()
    for current_word in words:
        print(words)

...when it should have been:

def split_line(text):
    words = text.split()
    for current_word in words:
        print(current_word)

If for some reason you want to build the list manually in the for loop, use the list's append() method, perhaps because you want to lower-case all the words (for example):

my_list = [] # make empty list
for current_word in words:
    my_list.append(current_word.lower())

Or, a bit more neatly, using a list comprehension:

my_list = [current_word.lower() for current_word in words]

score 14 · Accepted Answer

shlex has a .split() function. It differs from str.split() in that it does not preserve quotes and treats a quoted phrase as a single word:

>>> import shlex
>>> shlex.split("sudo echo 'foo && bar'")
['sudo', 'echo', 'foo && bar']

NB: it works well for Unix-like command-line strings. It does not work for natural-language processing.
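
For comparison, a short sketch running both splitters on the same command string from the example above. The variable names are only illustrative.

```python
import shlex

command = "sudo echo 'foo && bar'"

# str.split() breaks the quoted phrase apart and keeps the quote characters
plain = command.split()
print(plain)  # ['sudo', 'echo', "'foo", '&&', "bar'"]

# shlex.split() treats the quoted phrase as one word and removes the quotes
shell_like = shlex.split(command)
print(shell_like)  # ['sudo', 'echo', 'foo && bar']
```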

score 4 · Accepted Answer

I think you are confused because of a typo.

Replace print(words) with print(word) inside your loop to print each word on a separate line.
