python - Python, working with strings

Question

For class I need to build a program that reads messy text from a file and gives this text a book-like format, so from the input:

This    is programing   story , for programmers  . One day    a variable
called
v  comes    to a   bar    and ordred   some whiskey,   when suddenly 
      a      new variable was declared .
a new variable asked : "    What did you ordered? "

to the output:

This is programing story,
for programmers. One day 
a variable called v comes
to a bar and ordred some 
whiskey, when suddenly a 
new variable was 
declared. A new variable
asked: "what did you 
ordered?"

I am a beginner at programming, and my code is here:

   def vypis(t):
    cely_text = ''
    for riadok in t:
        cely_text += riadok.strip()
    a = 0     
    for i in range(0,80):
        if cely_text[0+a] == " " and cely_text[a+1] == " ":
            cely_text = cely_text.replace ("  ", " ")
        a+=1
    d=0    
    for c in range(0,80):
        if cely_text[0+d] == " " and (cely_text[a+1] == "," or cely_text[a+1] == "." or cely_text[a+1] == "!" or cely_text[a+1] == "?"):
            cely_text = cely_text.replace (" ", "")
        d+=1   
def vymen(riadok):
    for ch in riadok:
        if ch in '.,":':
            riadok = riadok[ch-1].replace(" ", "")
x = int(input("Zadaj x"))
t = open("text.txt", "r")
v = open("prazdny.txt", "w")
print(vypis(t))

This code deletes some of the spaces, and I tried to make it delete the spaces before signs like ".,_?", but that does not work. Why? Thanks for the help :)

score 3 · Accepted Answer

There is a lot you want to do, so let's take it in order.

First, let's get the text into a convenient form (a list of strings):

>>> with open('text.txt', 'r') as f:
...     lines = f.readlines()

>>> lines
['This    is programing   story , for programmers  . One day    a variable', 
 'called', 'v  comes    to a   bar    and ordred   some whiskey,   when suddenly ',
 '      a      new variable was declared .', 
 'a new variable asked : "    What did you ordered? "']

There are line breaks all over the place. Let's replace them with spaces and join everything into one big string:

>>> text = ' '.join(line.replace('\n', ' ') for line in lines)

>>> text
'This    is programing   story , for programmers  . One day    a variable called v  comes    to a   bar    and ordred   some whiskey,   when suddenly        a      new variable was declared . a new variable asked : "    What did you ordered? "'

Next, we want to remove the repeated spaces. We split on whitespace (spaces, tabs, and so on) and keep only the non-empty words:

>>> words = [word for word in text.split() if word]
>>> words
['This', 'is', 'programing', 'story', ',', 'for', 'programmers', '.', 'One', 'day', 'a', 'variable', 'called', 'v', 'comes', 'to', 'a', 'bar', 'and', 'ordred', 'some', 'whiskey,', 'when', 'suddenly', 'a', 'new', 'variable', 'was', 'declared', '.', 'a', 'new', 'variable', 'asked', ':', '"', 'What', 'did', 'you', 'ordered?', '"']

Now let's join the words with a space... (only one this time)

>>> text = ' '.join(words)
>>> text
'This is programing story , for programmers . One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared . a new variable asked : " What did you ordered? "'

We now want to remove every <SPACE>. , <SPACE>, and so on:

>>> for char in (',', '.', ':', '"', '?', '!'):
...     text = text.replace(' ' + char, char)
>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. a new variable asked:" What did you ordered?"'

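OK, the work is not finished yet: the " characters are still messed up, the capital letters are not set, and so on. You can keep updating the text incrementally. For the capital letters, consider for instance: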

>>> sentences = text.split('.')
>>> sentences
['This is programing story, for programmers', ' One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared', ' a new variable asked:" What did you ordered?"']

See how you can fix it? The trick is to use only string transformations such that:

A correct sentence is left UNCHANGED by the transformation
An incorrect sentence is IMPROVED by the transformation

This way you can compose them and improve your text incrementally.
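
As a minimal sketch of this idea, each step can be written as a small function that leaves already-correct text alone, so the steps can be chained and re-applied safely. The function names below are made up for illustration and are not part of the answer above; the sentence splitting only looks at '. ', which is an assumption that ignores '?' and '!'.

```python
def collapse_spaces(text):
    # Split on any whitespace and re-join with single spaces
    return ' '.join(text.split())

def strip_space_before_punctuation(text):
    # Same kind of loop as earlier in this answer: drop the space
    # in front of each punctuation sign
    for char in (',', '.', ':', '?', '!'):
        text = text.replace(' ' + char, char)
    return text

def capitalize_sentences(text):
    # Upper-case the first letter of every '. '-separated sentence;
    # s[:1] is used so that an empty string does not raise an error
    return '. '.join(s[:1].upper() + s[1:] for s in text.split('. '))

def clean(text):
    # Apply the steps one after the other (the order matters here)
    for step in (collapse_spaces, strip_space_before_punctuation, capitalize_sentences):
        text = step(text)
    return text

messy = 'this   is a   story , for programmers  .  one day a variable was declared .'
tidy = clean(messy)
print(tidy)
# Running the pipeline again on its own output changes nothing
print(clean(tidy) == tidy)
```

Because every step leaves a correct text untouched, a new rule (for the quotes, say) can be appended to the tuple in clean without revisiting the earlier ones.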

Once you have a nicely formatted text, such as:

>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. A new variable asked: "what did you ordered?"'

you need to define similar syntactic rules to print it in book format. Consider for instance the function:

>>> def prettyprint(text):
...     return '\n'.join(text[i:i+50] for i in range(0, len(text), 50))

It cuts the text into lines of exactly 50 characters (only the last line may be shorter):

>>> print(prettyprint(text))
This is programing story, for programmers. One day
 a variable called v comes to a bar and ordred som
e whiskey, when suddenly a new variable was declar
ed. A new variable asked: "what did you ordered?"

Not bad, but it could be better. Just as we juggled text, lines, sentences and words to match the syntactic rules of English, we want to do exactly the same to match the syntactic rules of a printed book.

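In this case, both English and printed books work on the same units: words arranged in sentences. This suggests that you may want to work on these directly. A simple way to do that is to define your own objects: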

>>> class Sentence(object):
...     def __init__(self, content, punctuation):
...         self.content = content
...         self.endby = punctuation
...     def pretty(self):
...         nice = []
...         content = self.content.pretty()
...         # A sentence starts with a capital letter
...         nice.append(content[0].upper())
...         # The rest has already been prettified by the content
...         nice.extend(content[1:])
...         # Do not forget the punctuation sign
...         nice.append(self.endby)
...         return ''.join(nice)

>>> class Paragraph(object):
...     def __init__(self, sentences):
...         self.sentences = sentences
...     def pretty(self):
...         # Separating our sentences by a single space
...         return ' '.join(sentence.pretty() for sentence in self.sentences)

and so on... This way you can represent your text as:

>>> Paragraph(
...   Sentence(
...     Propositions([Proposition(['this', 
...                                'is', 
...                                'programming', 
...                                'story']),
...                   Proposition(['for',
...                                'programmers'])],
...                   ','),
...     '.'),
...   Sentence(...

etc...
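
To make the tree concrete, here is a small runnable sketch. Proposition and Propositions are not defined in the answer, so the versions below (their constructors, attribute names and pretty methods) are assumptions made for illustration; Sentence and Paragraph follow the classes shown above.

```python
class Proposition(object):
    """A run of words with no punctuation inside (hypothetical helper)."""
    def __init__(self, words):
        self.words = words
    def pretty(self):
        return ' '.join(self.words)

class Propositions(object):
    """Several propositions separated by a punctuation sign (hypothetical helper)."""
    def __init__(self, propositions, punctuation):
        self.propositions = propositions
        self.punctuation = punctuation
    def pretty(self):
        separator = self.punctuation + ' '
        return separator.join(p.pretty() for p in self.propositions)

class Sentence(object):
    def __init__(self, content, punctuation):
        self.content = content
        self.endby = punctuation
    def pretty(self):
        content = self.content.pretty()
        # Capitalize the first letter and close with the punctuation sign
        return content[0].upper() + content[1:] + self.endby

class Paragraph(object):
    def __init__(self, sentences):
        self.sentences = sentences
    def pretty(self):
        # Separate the sentences with a single space
        return ' '.join(s.pretty() for s in self.sentences)

paragraph = Paragraph([
    Sentence(Propositions([Proposition(['this', 'is', 'programing', 'story']),
                           Proposition(['for', 'programmers'])], ','), '.'),
    Sentence(Proposition(['one', 'day', 'a', 'variable', 'was', 'declared']), '.'),
])
print(paragraph.pretty())
```

Each node only knows how to format itself and delegates the rest to its children, which is what keeps the rules (capital letter, punctuation, spacing) separate from one another.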

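Converting a string (even a messed-up one) into such a tree is relatively straightforward, since you only break it down into the smallest possible elements. When you want to print it in book format, you can define your own book method on each element of the tree, for instance like this, passing the current line, the output lines, and the current offset on the current line: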

 class Proposition(object):
      ...
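      def book(self, line, lines, offset, line_length):
          for word in self.words:
              # A word needs one extra character for the separating
              # space, unless it starts the line
              needed = len(word) + (1 if line else 0)
              if line and offset + needed > line_length:
                  lines.append(' '.join(line))
                  line = []
                  offset = 0
                  needed = len(word)
              line.append(word)
              offset += needed
          return line, lines, offset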

 ...

 class Propositions(object):
     ...
     def book(self, line, lines, offset, line_length):
         line, lines, offset = self.Proposition1.book(line, lines, offset, line_length)
         if offset + len(self.punctuation) > line_length:
              # The punctuation sign does not fit on the current line:
              # move it, together with the last word, to a new line
              word = line.pop()
              lines.append(' '.join(line))
              line = [word + self.punctuation]
              offset = len(word + self.punctuation)
         else:
              # Otherwise attach the punctuation sign to the last word
              line[-1] += self.punctuation
              offset += len(self.punctuation)
         line, lines, offset = self.Proposition2.book(line, lines, offset, line_length)
         return line, lines, offset

And so on up to Sentence, Paragraph, Chapter...

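This is a very simplistic implementation (of what is actually a non-trivial problem), and it does not take hyphenation or justification into account (which you would probably want), but this is the way to go.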
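
The line-filling logic can be tried in isolation before wiring it into a tree. The sketch below is a simplification that works on a flat list of words rather than on tree nodes; fill_lines is a hypothetical name, and the width of 25 is only a guess based on the expected output in the question.

```python
def fill_lines(words, line_length):
    """Greedily pack words into lines of at most line_length characters."""
    lines, line, offset = [], [], 0
    for word in words:
        # One extra character for the separating space, unless the line is empty
        needed = len(word) + (1 if line else 0)
        if line and offset + needed > line_length:
            # The word does not fit: close the current line, start a new one
            lines.append(' '.join(line))
            line, offset, needed = [], 0, len(word)
        line.append(word)
        offset += needed
    if line:
        # Do not forget the last, possibly shorter, line
        lines.append(' '.join(line))
    return lines

text = ('This is programing story, for programmers. One day a variable '
        'called v comes to a bar and ordred some whiskey.')
lines = fill_lines(text.split(), 25)
print('\n'.join(lines))
```

A word longer than line_length is placed on a line of its own here rather than being split. For plain word wrapping of this kind, the standard library's textwrap.wrap(text, width=25) is also worth a look.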

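Note that I did not mention the string module, string formatting, or regular expressions, which are the tools to use once you are able to define your syntactic rules or transformations. These are very powerful tools, but what matters most here is knowing exactly which algorithm turns an invalid string into a valid one. Once you have working pseudocode, regular expressions and format strings can help you get there with less pain than plain character iteration. (In the previous example of a tree of words, for instance, regular expressions would greatly ease the construction of the tree, and Python's powerful string formatting functions would make the book or pretty methods much easier to write.)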
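
As a hedged illustration of how a regular expression can ease the construction of the tree, the sketch below first cuts the messy string into word and punctuation tokens, then groups the words between two signs. The helper names tokenize and split_on are assumptions, not something from the answer.

```python
import re

def tokenize(text):
    # A token is either a run of word characters or a single
    # character that is neither a word character nor whitespace
    return re.findall(r"\w+|[^\w\s]", text)

def split_on(tokens, sign):
    # Group the tokens found between two occurrences of sign
    groups, current = [], []
    for token in tokens:
        if token == sign:
            groups.append(current)
            current = []
        else:
            current.append(token)
    groups.append(current)
    return groups

tokens = tokenize('This    is programing   story , for programmers  .')
print(tokens)
# Drop the final '.' and split the rest of the sentence on ','
print(split_on(tokens[:-1], ','))
```

Each inner list could then be handed to a Proposition, and the sign that separated them to the enclosing node.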

score 1 · Accepted Answer

To remove the repeated spaces, you can use a simple regular expression substitution:

import re
cely_text = re.sub(' +',' ', cely_text)

Then, for the punctuation, you can run a similar sub:

cely_text = re.sub(' +([,.:])', r'\g<1>', cely_text)
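
The pattern ' +' only matches spaces, while text read from a file also contains line breaks. A possible variant, shown here as a sketch rather than as part of the answer, uses \s+ so that tabs and newlines are collapsed as well; the function name tidy and the extra signs ? and ! are assumptions.

```python
import re

def tidy(text):
    # \s+ matches spaces, tabs and newlines, so line breaks are collapsed too
    text = re.sub(r'\s+', ' ', text).strip()
    # Drop the single remaining space in front of , . : ? !
    return re.sub(r' ([,.:?!])', r'\1', text)

messy = 'some whiskey,   when suddenly \n      a      new variable was declared .\n'
print(tidy(messy))
```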
