c++ - Reading from a text file word by word or character by character in C++

Question

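I have been googling and reading through my book, trying to write code that reads a text file and processes the words in it one by one, so that I can put them in alphabetical order and keep count of how many words were used and how often each word was used. I can't seem to get my GetNextWord() function to work properly, and it's driving me crazy.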

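I need to read the words in one at a time and convert each letter to lowercase if it is uppercase. I know how to do that part and have done it successfully. What is holding me up is getting the word character by character and putting it into a string.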

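This is my most recent attempt. Any help, or a link to a tutorial on reading from an input file word by word, would be great. (A word here means the letters a-z plus ' (as in don't), ended by whitespace, a comma, a period, ;, :, and so on.)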

void GetNextWord()
{
    string word = "";
    char c;

    while(inFile.get(c))
    {
        while( c > 64 && c < 123 || c == 39)
        {
            if((isupper(c)))
            {
                c = (tolower(c));
            }
            word = word + c;
        }
        outFile << word;
    }
}

score 8 · Accepted Answer

You can use the >> operator to read a file word by word. For example, see this link: http://www.daniweb.com/forums/thread30942.html .

Here is an excerpt of their example:

ifstream in ( "somefile" );
vector<string> words;
string word;

if ( !in )
  return;

while ( in>> word )
  words.push_back ( word );

score 3 · Accepted Answer

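Your logic is wrong. The inner loop runs for as long as c doesn't change, and nothing inside it changes c.

Why do you have two loops anyway? I think you may be confused about whether that function is supposed to read the next word or all of the words. Try separating those concerns into different functions (one of which calls the other). I find it easiest to approach problems like this in top-down order: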

while(inFile.good()) {
  std::string word = GetNextWord(inFile);
  if(!word.empty())
    std::cout << word << std::endl;
}

Then fill in the gaps by defining GetNextWord() to read everything up to the next word boundary.

score 0 · Accepted Answer

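Personally, I like to read input with std::getline(std::istream&, std::string&) (it is declared in the <string> header, but of course you will also need to #include a stream header).

This function breaks on newlines, which count as whitespace under your problem's definition. That is not a complete answer to your question, though. After reading a line of text, you still need to split the string into words using string operations or the standard algorithms. Alternatively, you could loop over the string by hand.

The guts would look something like this: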

std::string buffer;
while (std::getline(std::cin, buffer)) {
// break each line into words, according to problem spec
}
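
The "break each line into words" step could be filled in by looping over the line by hand. This is only a sketch under the question's word definition (runs of letters and apostrophes), and WordsInLine is a hypothetical name.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split one line into lowercase words, where a word
// is a run of letters and apostrophes and anything else is a delimiter.
std::vector<std::string> WordsInLine(const std::string& line)
{
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : line) {
        if (std::isalpha(ch) || ch == '\'') {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // A delimiter ends the word collected so far.
            words.push_back(current);
            current.clear();
        }
    }
    // The line may end in the middle of a word.
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

Inside the getline loop, each buffer would be passed to WordsInLine and the returned words processed or stored.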

score 0 · Accepted Answer

I use

// str is a string that holds the line of data from ifs- the text file.
// str holds the words to be split, res the vector to store them in.
while( getline( ifs, str ) ) 
    split(str, res);


void split(const string& str, vector<string>& vec)
{
    const string::size_type size(str.size());
    string::size_type start(0);
    string::size_type range(0);

 /* Explanation:
  * range - Length of the word to be extracted, without spaces.
  * start - Start of the next word. Initially index 0.
  *
  * Runs until it encounters whitespace, then extracts the word with substr()
  * and makes sure all characters are lower-case (without wasting time
  * checking whether they already are, as I feel a char-by-char check for
  * upper-case takes just as much time as lowering them all anyway).
  * Empty words, produced by consecutive whitespace, are skipped.
  */
    for( string::size_type i(0); i < size; ++i )
    {
        if( isspace(static_cast<unsigned char>(str[i])) )
        {
            // substr(start, range) excludes the whitespace at index i.
            if( range > 0 )
                vec.push_back( toLower(str.substr(start, range)) );
            start = i + 1;
            range = 0;
        } else
            ++range;
    }
    if( range > 0 )
        vec.push_back( toLower(str.substr(start, range)) );
}

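I'm not sure whether this is particularly helpful, but I'll give it a try. The toLower function is a quick function that simply uses ::tolower(). The code reads each character up to a space and then stuffs the word into a vector. I don't quite understand what you mean by char by char.

Do you want to extract the characters of a word one at a time? Or check each character as you go? Or do you mean extracting one word, finishing, and then coming back for the next? If so, I would 1) recommend a vector anyway, and 2) ask you to let me know so I can refactor the code.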
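
The toLower helper itself is not shown in this answer. A minimal sketch of what it might look like, assuming it takes a string by value and returns a lowercase copy:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Assumed shape of the answer's toLower helper; the original is not shown.
// Takes the string by value and returns a lowercase copy.
std::string toLower(std::string s)
{
    // The lambda takes unsigned char because passing a negative char
    // value to std::tolower is undefined behavior.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return s;
}
```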

score 0 · Accepted Answer

If c == 'a', what terminates the inner loop? The ASCII value of 'a' is 97.
