c++ - C++ reading from file puts three weird characters

Question

When I read from a file string by string, the >> operation gets the first string, but it starts with "ï»¿i". Assume that the first string is "street"; then it is read as "ï»¿istreet".

The other strings are okay. I tried different txt files and the result is the same: the first string starts with "ï»¿i". What is the problem?

Here is my code:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>   // for exit()
using namespace std;

int cube(int x){ return (x*x*x);}

int main(){

int maxChar;
int lineLength=0;
int cost=0;

cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;

fstream inFile("bla.txt",ios::in);

if (!inFile) {
    cerr << "Unable to open file bla.txt";
    exit(1);   // call system to stop
}

string word;
while (inFile >> word) {   // loop ends when extraction fails, so no empty word is processed at EOF
    cout<<word<<endl;
    cout<<word.length()<<endl;
    if(word.length()+lineLength<=maxChar){
        lineLength +=(word.length()+1);
    }
    else {
        cost+=cube(maxChar-(lineLength-1));
        lineLength=(word.length()+1);
    }   
}

}

score 9 · Accepted Answer

You are seeing the UTF-8 byte order mark (BOM). It was added by the application that created the file.

To detect and ignore the marker, you could try this (untested) function:

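#include <cstring>   // strcmp
#include <istream>

// Returns true and leaves the stream just past the BOM if one is present;
// otherwise rewinds to the start and returns false.
bool SkipBOM(std::istream & in)
{
    char test[4] = {0};
    in.read(test, 3);
    if (strcmp(test, "\xEF\xBB\xBF") == 0)
        return true;
    in.clear();    // a short read sets failbit; clear it so seekg can rewind
    in.seekg(0);
    return false;
}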
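
To confirm that the stray characters really are a BOM, it can help to look at the raw bytes at the start of the file: a UTF-8 BOM is the three bytes EF BB BF, which show up as "ï»¿" when displayed in a Latin-1 or Windows-1252 code page. The sketch below is only an illustration; the helper name leadingBytesHex is made up for this example and is not part of the answer above.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical diagnostic helper: returns the first `count` bytes of `in`
// as space-separated uppercase hex. Fewer bytes are returned if the input is shorter.
std::string leadingBytesHex(std::istream& in, std::size_t count = 3)
{
    std::string out;
    char c;
    char buf[4];
    while (count-- > 0 && in.get(c)) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(c)));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

Calling it on a freshly opened stream (before any other read) and seeing "EF BB BF" would indicate that the file starts with a UTF-8 BOM.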

score 2 · Accepted Answer

With reference to the excellent answer by Mark Ransom above, adding this code skips the BOM (Byte Order Mark) on an existing stream. Call it after opening a file.

// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
    char test[3] = {0};
    in.read(test, 3);
    if ((unsigned char)test[0] == 0xEF && 
        (unsigned char)test[1] == 0xBB && 
        (unsigned char)test[2] == 0xBF)
    {
        return;
    }
    in.clear();   // a short read (file under 3 bytes) sets failbit; reset before rewinding
    in.seekg(0);
}

To use:

ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
    // Process lines of input here.
}
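
If the stream cannot be rewound (for example input piped through std::cin), one alternative, not taken from the answers here, is to read normally and strip the BOM from the first string afterwards. A minimal sketch, assuming the BOM can only appear at the very start of the first line or word; the name stripUtf8Bom is hypothetical:

```cpp
#include <string>

// Removes a leading UTF-8 BOM (EF BB BF) from `s` in place.
// Returns true if a BOM was found and removed, false otherwise.
bool stripUtf8Bom(std::string& s)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (s.compare(0, bom.size(), bom) == 0) {
        s.erase(0, bom.size());
        return true;
    }
    return false;
}
```

It would be called once, on the first line returned by getline or the first word returned by >>, before any further processing.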

score 0 · Accepted Answer

Here are two more ideas.

1. If you are the one who creates the files, save their length along with them. When reading a file, cut the prefix using this simple calculation: trueFileLength - savedFileLength = numOfBytesToCut
2. Write your own prefix when saving the files; when reading, search for it and delete everything found before it.
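
The second idea could be sketched roughly as follows. The marker text "#BEGIN#" and the function names are assumptions made for this illustration; any byte sequence that cannot occur before the real content would serve.

```cpp
#include <string>

// Hypothetical marker written at the start of every file you save.
const std::string kMarker = "#BEGIN#";

// Prepends the marker to the payload before it is written to disk.
std::string withMarker(const std::string& payload)
{
    return kMarker + payload;
}

// Returns everything after the first marker, discarding whatever an editor
// or other tool inserted before it (such as a BOM).
// If no marker is found, the input is returned unchanged.
std::string afterMarker(const std::string& raw)
{
    const std::string::size_type pos = raw.find(kMarker);
    if (pos == std::string::npos)
        return raw;
    return raw.substr(pos + kMarker.size());
}
```

This only works for files you write yourself, and it assumes the whole file has been read into a string first.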
