c++ - Counting the occurrences of each character in a file

Question

I am trying to count how many times each character appears in a file. When I run the code below, 'Z' is counted twice. Can anyone explain why?

The test data is as follows:

abcdefghijklmnopqrstuvwxyz

ABCDEFGHIJKLMNOPQRSTUVWXYZ

#include <iostream>                 //Required if your program does any I/O
#include <iomanip>                  //Required for output formatting
#include <fstream>                  //Required for file I/O
#include <string>                   //Required if your program uses C++ strings
#include <cmath>                    //Required for complex math functions
#include <cctype>                   //Required for letter case conversion

using namespace std;                //Required for ANSI C++ 1998 standard.

int main ()
{
string reply;
string inputFileName;
ifstream inputFile;
char character;
int letterCount[127] = {};

cout << "Input file name: ";
getline(cin, inputFileName);

// Open the input file.
inputFile.open(inputFileName.c_str());      // Need .c_str() to convert a C++ string to a C-style string
// Check the file opened successfully.
if ( ! inputFile.is_open())
{
    cout << "Unable to open input file." << endl;
    cout << "Press enter to continue...";
    getline(cin, reply);
    exit(1);
}

while ( inputFile.peek() != EOF )
{
      inputFile >> character;
      //toupper(character);

      letterCount[static_cast<int>(character)]++;
}

for (int iteration = 0; iteration <= 127; iteration++)
{
    if ( letterCount[iteration] > 0 )
    {
         cout << static_cast<char>(iteration) << " " << letterCount[iteration] << endl;
    }
}

system("pause");
exit(0);
}

score 4 · Accepted Answer

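As others have pointed out, there are two Qs in the input. The reason there are two Zs is that the last

inputFile >> character;

(probably because a newline character is still left in the stream, so peek() does not yet return EOF) fails to extract anything, which leaves the 'Z' from the previous iteration in the variable character. Check inputFile.fail() right after the extraction to confirm this: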

while (inputFile.peek() != EOF)
{
    inputFile >> character;

    if (!inputFile.fail())
    {
        letterCount[static_cast<int>(character)]++;
    }
}

The idiomatic way to write the loop, which also fixes the 'Z' problem, is:

while (inputFile >> character)
{
      letterCount[static_cast<int>(character)]++;
}
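To see the difference without a file on disk, here is a minimal sketch that runs both loop styles over any std::istream. The function names countWithPeek and countIdiomatic and the 256-entry Counts table are my own choices for illustration, not part of the original code.

```cpp
#include <array>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>   // std::istringstream, used when trying the functions out

using Counts = std::array<int, 256>;   // one slot per possible byte value

// Loop from the question: peek() sees the trailing '\n', so the body runs one
// more time, the extraction fails, and the stale value of `character`
// (the last letter read) is counted a second time.
Counts countWithPeek(std::istream& in)
{
    Counts counts{};
    char character = '\0';
    while (in.peek() != EOF)
    {
        in >> character;
        ++counts[static_cast<unsigned char>(character)];
    }
    return counts;
}

// Idiomatic loop: the body only runs when the extraction succeeded.
Counts countIdiomatic(std::istream& in)
{
    Counts counts{};
    char character;
    while (in >> character)
    {
        ++counts[static_cast<unsigned char>(character)];
    }
    return counts;
}
```

Feeding each function a std::istringstream holding "XYZ" followed by a newline should give a count of 2 for 'Z' from the first function and 1 from the second.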

score 2 · Accepted Answer

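There are two Qs in the uppercase string. I believe the reason you get a count of two for Z is that you need to check for EOF after reading a character rather than before, but I am not sure about that.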
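A rough sketch of the "check after reading" idea, assuming unformatted input with get() is acceptable. The name countAfterRead is made up for this example. Unlike operator>>, get() does not skip whitespace, so the sketch filters it out explicitly.

```cpp
#include <array>
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>   // std::istringstream, used when trying the function out

// Reads one character at a time and tests the result of the read itself,
// so a failed read is never counted.
std::array<int, 256> countAfterRead(std::istream& in)
{
    std::array<int, 256> counts{};
    int c;
    while ((c = in.get()) != EOF)      // the EOF test happens after the read
    {
        if (!std::isspace(c))          // get() returns whitespace too; skip it
        {
            ++counts[c];               // c is in 0..255 here
        }
    }
    return counts;
}
```

With a stream containing "XYZ" and a trailing newline, each letter should be counted exactly once and the newline not at all.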

score 2 · Accepted Answer

Well, others have already pointed out the errors in your code.

But here is one elegant way to read the file and count the letters in it:

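#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <map>
#include <vector>

// ctype facet that classifies everything except ASCII letters as whitespace,
// so formatted extraction (operator>>) skips all other characters.
struct letter_only : std::ctype<char>
{
    letter_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);

        // Mark A-Z and a-z separately; a single fill from 'A' to 'z' would
        // also mark the punctuation that sits between 'Z' and 'a' in ASCII.
        std::fill(rc.begin() + 'A', rc.begin() + 'Z' + 1, std::ctype_base::alpha);
        std::fill(rc.begin() + 'a', rc.begin() + 'z' + 1, std::ctype_base::alpha);
        return &rc[0];
    }
};

struct Counter
{
    std::map<char, int> letterCount;
    void operator()(char item)
    {
        // Remove tolower if you want a case-sensitive solution.
        ++letterCount[static_cast<char>(std::tolower(static_cast<unsigned char>(item)))];
    }
    operator std::map<char, int>() { return letterCount; }
};

int main()
{
    std::ifstream input;
    input.imbue(std::locale(std::locale(), new letter_only())); // read letters only
    input.open("filename.txt");
    std::istream_iterator<char> start(input);
    std::istream_iterator<char> end;
    std::map<char, int> letterCount = std::for_each(start, end, Counter());
    for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
    {
        std::cout << it->first << " : " << it->second << std::endl;
    }
}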

This is a modified (untested) version of this solution:

Elegant ways to count the frequency of words in a file

score 1 · Accepted Answer

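For one thing, there are two Qs in the input.

As for Z, @Jeremiah is probably right that it is counted twice because it is the last character and your code does not detect EOF correctly. This is easy to verify, for example by changing the order of the input characters.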

By the way, here

for (int iteration = 0; iteration <= 127; iteration++)

the index goes out of bounds. Either the loop condition should be iteration < 127, or the array should be declared as int letterCount[128].
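A small sketch of the second option, assuming you want all 128 ASCII codes covered. The constant kAsciiSize and the function printAsciiCounts are names I introduced for the example. Tying both the array size and the loop bound to one constant makes it harder for them to drift apart.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>   // string streams, used when trying the function out

const std::size_t kAsciiSize = 128;    // valid indices are 0..127

// Counts 7-bit ASCII characters from `in` and prints the non-zero counts.
void printAsciiCounts(std::istream& in, std::ostream& out)
{
    int letterCount[kAsciiSize] = {};
    char character;
    while (in >> character)
    {
        unsigned char index = static_cast<unsigned char>(character);
        if (index < kAsciiSize)        // ignore bytes outside 7-bit ASCII
        {
            ++letterCount[index];
        }
    }

    // Strictly less than the array size, so the last index used is 127.
    for (std::size_t i = 0; i < kAsciiSize; ++i)
    {
        if (letterCount[i] > 0)
        {
            out << static_cast<char>(i) << " " << letterCount[i] << "\n";
        }
    }
}
```

The cast to unsigned char also guards against negative indices on platforms where plain char is signed.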

score 1 · Accepted Answer

Given that you apparently only want to count English letters, it seems you should be able to simplify your code considerably:

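#include <cctype>
#include <fstream>
#include <iostream>

int main(int argc, char **argv) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <file>\n";
        return 1;
    }
    std::ifstream infile(argv[1]);

    char ch;
    static int counts[26];   // static storage, so zero-initialized

    while (infile >> ch) {
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc))
            ++counts[std::tolower(uc) - 'a'];
    }

    // Cast back to char; 'A' + i on its own is an int and would print as a number.
    for (int i = 0; i < 26; i++)
        std::cout << static_cast<char>('A' + i) << ": " << counts[i] << "\n";
    return 0;
}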

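Of course, there are quite a few more possibilities. Compared to @Nawaz's code (for example), this is obviously quite a bit shorter and simpler, but it is also more limited: as it stands, it only works with unaccented English letters. It is pretty much restricted to basic ASCII letters; EBCDIC encoding, ISO 8859-x, or Unicode will break it completely.

His version also makes it easy to apply the "letters only" filtering to any file. Which one to choose depends on whether you want, need, or can use that flexibility. If you only care about the letters mentioned in the question, and only on typical machines that use some superset of ASCII, this code handles the job more simply, but if you need more than that, it is not suitable at all.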
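As a middle ground, here is a hedged sketch that drops the assumption that 'a' to 'z' are contiguous by keying a std::map on the character itself. The function name countLetters is mine. It still relies on std::isalpha in the current C locale and works on single bytes, so it does not handle multi-byte encodings such as UTF-8.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>   // std::istringstream, used when trying the function out

// Counts letters case-insensitively. No arithmetic on character codes is
// needed, because each letter is used directly as the map key.
std::map<char, int> countLetters(std::istream& in)
{
    std::map<char, int> counts;
    char ch;
    while (in >> ch)
    {
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc))
        {
            ++counts[static_cast<char>(std::tolower(uc))];
        }
    }
    return counts;
}
```

For the input "Hello, World!" this should report 3 for 'l' and 2 for 'o', with the punctuation ignored.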
