python - Converting a string to a list of words?

Question

I'm trying to convert a string to a list of words using Python. I want to take something like the following:

string = 'This is a string, with words!'

and convert it to something like this:

list = ['This', 'is', 'a', 'string', 'with', 'words']

Note that the punctuation and spaces are dropped. What would be the fastest way of going about this?

score 105 · Accepted Answer

Try this:

import re

mystr = 'This is a string, with words!'
wordList = re.sub(r"[^\w]", " ", mystr).split()

How it works:

From the docs:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function.

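So in our case:

pattern is any non-alphanumeric character.

[\w] means any alphanumeric character and, for ASCII text, is equivalent to the character set [a-zA-Z0-9_] (in Python 3, \w in a str pattern also matches non-ASCII letters and digits unless re.ASCII is used).

That is a to z, A to Z, 0 to 9 and underscore.

So we match every non-alphanumeric character and replace it with a space.

Then we call split(), which splits the string on whitespace and converts it to a list.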

So 'hello-world' becomes 'hello world' with re.sub, and then ['hello', 'world'] after split().

Let me know if you have any doubts.
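The two steps described above can be sketched as follows. This is only an illustration; the names `spaced` and `words` are made up for this example and do not appear in the answer.

```python
import re

mystr = 'This is a string, with words!'

# Step 1: replace every character that \w does not match with a space.
spaced = re.sub(r"[^\w]", " ", mystr)
print(repr(spaced))

# Step 2: split() with no argument splits on runs of whitespace,
# so the doubled and trailing spaces from step 1 produce no empty strings.
words = spaced.split()
print(words)
```

Printing the intermediate string with repr() makes the extra spaces left behind by the punctuation visible.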

score 104 · Accepted Answer

Given the late response, I think this is the simplest way for anyone else stumbling on this post:

>>> string = 'This is a string, with words!'
>>> string.split()
['This', 'is', 'a', 'string,', 'with', 'words!']
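Plain split() keeps punctuation attached to the words, as the output above shows. If that matters, one possible follow-up (my assumption, not part of this answer) is to trim punctuation from both ends of each token with str.strip and string.punctuation:

```python
import string

text = 'This is a string, with words!'

# Split on whitespace, then trim leading/trailing ASCII punctuation from each token.
tokens = [w.strip(string.punctuation) for w in text.split()]

# Drop tokens that consisted of punctuation only and are now empty.
tokens = [w for w in tokens if w]
print(tokens)
```

This only touches the ends of each token, so punctuation inside a word (for example the apostrophe in "I'm") is left alone.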

score 36 · Accepted Answer

Doing this properly is quite complex. For your research, it is known as word tokenization. If you want to see what others have done rather than starting from scratch, you should look at NLTK:

>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> for sentence in sentences:
...     nltk.word_tokenize(sentence)
[u'Hi', u',', u'this', u'is', u'my', u'first', u'sentence', u'.']
[u'And', u'this', u'is', u'my', u'second', u'.']

score 20 · Accepted Answer

The simplest way:

>>> import re
>>> string = 'This is a string, with words!'
>>> re.findall(r'\w+', string)
['This', 'is', 'a', 'string', 'with', 'words']

score 15 · Accepted Answer

For completeness, using string.punctuation:

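import re
import string

s = 'This is a string, with words!'
# re.escape keeps characters such as ] and \ from being misread inside the character class
x = re.sub('[' + re.escape(string.punctuation) + ']', '', s).split()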

This handles newlines as well.
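A regex-free variant of the same idea can be sketched with str.translate. This is a swapped-in technique rather than what the answer uses, and the sample string and the name `table` are assumptions for illustration:

```python
import string

s = 'This is a string,\nwith words!'

# Build a translation table that deletes every ASCII punctuation character.
table = str.maketrans('', '', string.punctuation)

# Remove punctuation, then split on any whitespace (including the newline).
words = s.translate(table).split()
print(words)
```

As with the re.sub version above, punctuation is deleted rather than replaced by a space, so a hyphenated word such as 'hello-world' would come out as 'helloworld'.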

score 8 · Accepted Answer

Well, you could use

import re
list = re.sub(r'[.!,;?]', ' ', string).split()

Note that list is the name of a built-in type and string is the name of a standard library module, so you probably don't want to use them as variable names.

score 3 · Accepted Answer

Using a regular expression for words gives you the most control. You need to think carefully about how to handle words containing dashes or apostrophes, like "I'm".
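As a rough sketch of that idea, the pattern below treats an apostrophe or hyphen as part of a word only when it sits between letters or digits. The pattern, the name `WORD_RE`, and the helper function are assumptions for illustration, and the pattern handles ASCII letters only:

```python
import re

# One or more letters/digits, optionally followed by groups made of
# a single apostrophe or hyphen plus more letters/digits.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")

def words_with_inner_marks(text):
    """Return words, keeping apostrophes and hyphens that occur inside a word."""
    return WORD_RE.findall(text)

result = words_with_inner_marks("I'm sure it's a well-known trick - isn't it?")
print(result)
```

A free-standing dash or a quote mark at the edge of a word is dropped, while "I'm" and "well-known" stay whole. Whether that is the right rule depends on the text being processed.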

score 3 · Accepted Answer

Personally, I think this is slightly cleaner than the answers provided:

import re

def split_to_words(sentence):
    return list(filter(lambda w: len(w) > 0, re.split(r'\W+', sentence)))  # Use sentence.lower(), if needed

score 1 · Accepted Answer

list=mystr.split(" ",mystr.count(" "))

answered 2015-08-11T15:14:35.827

score 1 · Accepted Answer

def split_string(string):
    return string.split()

This function returns the list of words in the given string. In this case, if we call the function as follows,

string = 'This is a string, with words!'
split_string(string)

the function's return value would be:

['This', 'is', 'a', 'string,', 'with', 'words!']

score 0 · Accepted Answer

This is from my attempt at a coding challenge where regex could not be used:

# split() with no argument also discards the empty strings produced by consecutive spaces
outputList = "".join((c if c.isalnum() or c == "'" else ' ') for c in inputStr).split()

The role of the apostrophe seems interesting.
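To see what keeping the apostrophe does in practice, here is a small sketch on a made-up sentence. The sample text and variable names are assumptions for illustration:

```python
input_str = "Don't panic, it's 'quoted' text!"

# Keep alphanumerics and apostrophes; turn every other character into a space.
cleaned = "".join(c if c.isalnum() or c == "'" else " " for c in input_str)
words = cleaned.split()
print(words)

# Optional extra step: trim apostrophes that were acting as quote marks.
trimmed = [w.strip("'") for w in words]
print(trimmed)
```

Contractions such as "Don't" survive intact, but apostrophes used as quotation marks stay attached to the word unless they are trimmed afterwards.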
