python - String splitting problem

Question

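Problem: split a string into a list of words, using the delimiters passed in as a list.

String: "After the flood ... all the colors came out."

Desired output: ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

I wrote the following function. I realize there are better ways to split a string using some of Python's built-in functions, but for the sake of learning I thought I'd do it this way: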

def split_string(source,splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            tmp = source[end+1:]
            for f in tmp:
                if f not in splitlist:
                    start = tmp.find(f)
                    break
            source = tmp[start:]
    return result

out = split_string("After  the flood   ...  all the colors came out.", " .")

print(out)

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out', '', '', '', '', '', '', '', '', '']

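I can't figure out why 'came out' is not split into 'came' and 'out'. It looks as if the whitespace character between the two words is being ignored. I think the rest of the output is junk that stems from the same problem as the 'came out' issue.

EDIT:

Following @Ivc's suggestion, I came up with the following code: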

def split_string(source,splitlist):
    result = []
    lasti = -1
    for i, e in enumerate(source):
        if e in splitlist:
            tmp = source[lasti+1:i]
            if tmp not in splitlist:
                result.append(tmp)
            lasti = i
        if e not in splitlist and i == len(source) - 1:
            tmp = source[lasti+1:i+1]
            result.append(tmp)
    return result

out = split_string("This is a test-of the,string separation-code!"," ,!-")
print(out)
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']

out = split_string("After  the flood   ...  all the colors came out.", " .")
print(out)
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
print(out)
#>>> ['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

out = split_string(" After  the flood   ...  all the colors came out...............", " .")
print(out)
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

score 3 · Accepted Answer

You don't need the inner loop. Just this is enough:

def split_string(source,splitlist):
    result = []
    for e in source:
        if e in splitlist:
            end = source.find(e)
            result.append(source[0:end])
            source = source[end+1:]
    return result

You can get rid of the "junk" (that is, the empty strings) by checking whether source[:end] is an empty string before appending it to the list.
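A minimal sketch of that idea, written for Python 3. The `remaining` name and the final check for a trailing word are my own additions, not part of the answer above; they are there so the input does not have to end with a delimiter.

```python
def split_string(source, splitlist):
    """Split source on any character in splitlist, skipping empty pieces."""
    result = []
    remaining = source  # the part of the string not yet consumed
    for ch in source:
        if ch in splitlist:
            # ch is the first delimiter left in `remaining`, so find() locates it.
            end = remaining.find(ch)
            piece = remaining[:end]
            if piece:  # skip the empty strings between adjacent delimiters
                result.append(piece)
            remaining = remaining[end + 1:]
    if remaining:  # assumption: keep a trailing word that has no delimiter after it
        result.append(remaining)
    return result


print(split_string("After  the flood   ...  all the colors came out.", " ."))
```

The loop still walks the original string while `remaining` shrinks, which works here only because the delimiters are met in the same order in both.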

score 2 · Accepted Answer

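You seem to be expecting:

source = tmp[start:]

to change the source that the outer for loop is iterating over. It won't: that loop keeps processing the string you originally gave it, not whatever object currently goes by that name. This means the current character may not be in the remaining part of source.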
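A small illustration of that point (the names `seen` and `ch` are just for this demo). The for loop obtains its iterator from the original string object once, so rebinding the name inside the loop does not change what is iterated:

```python
source = "ab cd"
seen = []
for ch in source:
    seen.append(ch)
    # Rebinding the name does not touch the string the loop is iterating over.
    source = "X"

print(seen)    # ['a', 'b', ' ', 'c', 'd']
print(source)  # X
```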

Instead of trying to do that, keep track of the current index in the string, like this:

for i, e in enumerate(source):
   ...

The thing you append is then always source[lasti+1:i], and all you need to keep track of is lasti.

score 2 · Accepted Answer

If you only want the words in the string above, I think a regular expression gets them easily.

>>> import re
>>> string="After the flood ... all the colors came out."
>>> re.findall(r'\w+', string)
['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
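Note that `\w+` keeps only letters, digits and underscores, so it does not use the delimiters from the question at all. If the goal is to split on an arbitrary `splitlist`, one possible sketch (Python 3, assuming `splitlist` is a non-empty string of single-character delimiters) builds a character class with `re.escape` and splits on it:

```python
import re


def split_string(source, splitlist):
    """Split source on runs of any character in splitlist (assumed non-empty)."""
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    pattern = "[" + re.escape(splitlist) + "]+"
    # re.split leaves empty strings at the edges when the text starts or
    # ends with a delimiter, so filter them out.
    return [piece for piece in re.split(pattern, source) if piece]


print(split_string("This is a test-of the,string separation-code!", " ,!-"))
print(split_string("First Name,Last Name,Street Address", ","))
```

Unlike `\w+`, this keeps 'First Name' as one item when the only delimiter is a comma.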

score 0 · Accepted Answer

[x for x in a.replace('.', '').split(' ') if len(x)>0]

where 'a' is the input string.

score 0 · Accepted Answer

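A simpler way, or at least one that looks simpler:

import string

def split_string(source, splitlist):
    # Python 2 only: map every delimiter to a space, then split on whitespace.
    table = string.maketrans(splitlist, ' ' * len(splitlist))
    return string.translate(source, table).split()

You can check out string.maketrans and string.translate.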
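Those two module-level functions are not available in Python 3's `string` module. A rough Python 3 equivalent, as a sketch, uses `str.maketrans` and `str.translate` instead:

```python
def split_string(source, splitlist):
    """Replace every delimiter with a space, then split on whitespace."""
    table = str.maketrans(splitlist, " " * len(splitlist))
    return source.translate(table).split()


print(split_string("After  the flood   ...  all the colors came out.", " ."))
print(split_string("First Name,Last Name", ","))
```

One caveat with this approach: `split()` always splits on whitespace, so the second call returns four words rather than 'First Name' and 'Last Name', even though a space is not in `splitlist`.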

score 0 · Accepted Answer

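Why do so much work? It's simple, try this:
str.split(strSplitter, intMaxSplitCount), where intMaxSplitCount is optional.
In your case you also need some housekeeping: str.replace(".", "", 3) replaces only the first 3 dots.

So, in short, you need to do the following (where str is your string):
print(str.replace(".", "", 3).split(" "))
and it will print what you need.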
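A quick check of that expression on the single-spaced sample string from the question (Python 3; the names `text`, `as_posted` and `variant` are mine). It seems to leave an empty string and the final dot behind, so the variant below drops every dot and calls `split()` with no argument, which collapses runs of whitespace:

```python
text = "After the flood ... all the colors came out."

# The expression from the answer: drop the first three dots, split on single spaces.
as_posted = text.replace(".", "", 3).split(" ")
print(as_posted)
# ['After', 'the', 'flood', '', 'all', 'the', 'colors', 'came', 'out.']

# Assumed variant: drop every dot and let split() collapse runs of whitespace.
variant = text.replace(".", "").split()
print(variant)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```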

I have run it, check here...
