c++ - Counting UTF-8 characters in lines from a Local<String>

Question

score 0 · Accepted Answer

Solved it.

UTF-8 code points: https://en.wikipedia.org/wiki/UTF-8

The basic idea is to mask each byte and work out how many of the following bytes must be skipped to read the whole multibyte character.
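To make the masking step concrete, here is a minimal sketch of the lead-byte check on its own, outside V8. The function name `continuationBytes` is hypothetical. It uses the same five masks as the answer and tests the longest pattern first, so that a byte such as 0xF0 matches 11110xxx before it can match 110xxxxx. It does not validate the input.

```cpp
// Hypothetical helper: number of continuation bytes (10xxxxxx) that follow
// a UTF-8 lead byte, using the answer's masks 192, 224, 240, 248, 252, which
// match lead bytes 110xxxxx, 1110xxxx, 11110xxx, 111110xx, 1111110x.
// The 5- and 6-byte forms come from the original UTF-8 design; RFC 3629
// limits valid UTF-8 to 4 bytes. No validation is performed.
int continuationBytes(unsigned char lead) {
    static const unsigned char masks[5] = { 192, 224, 240, 248, 252 };
    if (lead < 0x80) return 0;  // ASCII: a one-byte character
    // Longest pattern first: all bits set in the mask must be set in the byte.
    for (int k = 4; k >= 0; k--) {
        if ((lead & masks[k]) == masks[k]) return k + 1;
    }
    return 0;  // a continuation byte (10xxxxxx) matches no mask
}
```

For example, 0xC3 (the lead byte of "é") yields 1, 0xE3 (the lead byte of "あ") yields 2, and 0xF0 yields 3.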

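// Lead-byte masks: 110xxxxx, 1110xxxx, 11110xxx, 111110xx, 1111110x
unsigned char masks[5] = { 192, 224, 240, 248, 252 };

Local<String> str = ...
// Older V8 signature; newer V8 versions take an Isolate* as the first argument
String::Utf8Value s (str->ToString ());
unsigned char c;
int utf8Bytes = 0;  // continuation bytes still to skip

for (int i=0; (c = (*s)[i]) != 0; i++){
    //Ignore the utf8 check for one-byte (ASCII) chars
    if (c > 127){
        //Skip the continuation bytes of the current multibyte char
        if (utf8Bytes){
            utf8Bytes--;
            continue;
        }

        //Check whether this is a utf8 multibyte lead byte (longest pattern first)
        for (int k=4; k>=0; k--){
            if ((c & masks[k]) == masks[k]){
                utf8Bytes = k + 1;
                break;
            }
        }

        if (utf8Bytes){
            //Do something if it's a multibyte char
        }

        continue;
    }

    //Do something to check lines, chars, etc
}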
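The V8 parts aside, the loop can be tried on a plain `std::string`. Here is a minimal sketch that fills in the "Do something" placeholders. It assumes the goal, per the title, is counting characters on each `'\n'`-separated line of a UTF-8 buffer. `countCharsPerLine` is a hypothetical name, and the input is a `std::string` rather than a `String::Utf8Value`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: count UTF-8 characters on each '\n'-separated line,
// using the answer's lead-byte masks to skip continuation bytes.
// No validation: malformed sequences are not detected.
std::vector<int> countCharsPerLine(const std::string& text) {
    static const unsigned char masks[5] = { 192, 224, 240, 248, 252 };
    std::vector<int> counts(1, 0);  // one entry per line, starting with line 0
    int utf8Bytes = 0;              // continuation bytes still to skip

    for (unsigned char c : text) {
        if (c > 127) {
            if (utf8Bytes) {        // inside a multibyte character
                utf8Bytes--;
                continue;
            }
            for (int k = 4; k >= 0; k--) {  // longest pattern first
                if ((c & masks[k]) == masks[k]) {
                    utf8Bytes = k + 1;
                    break;
                }
            }
            if (utf8Bytes) {
                counts.back()++;    // a lead byte starts one character
            }                       // a stray continuation byte is ignored
            continue;
        }
        if (c == '\n') {
            counts.push_back(0);    // start counting the next line
        } else {
            counts.back()++;        // one ASCII character
        }
    }
    return counts;
}
```

For `"h\xC3\xA9llo\n\xE3\x81\x82\xE3\x81\x84"` ("héllo", then "あい") this returns `{5, 2}`. A trailing newline produces a final entry of 0 for the empty last line.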

score 0 · Accepted Answer

Assuming you are using this implementation of String::AsciiValue, it appears to have a length() method.
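Before relying on a `length()` method, check whether it counts bytes or characters. For UTF-8 data the two differ as soon as the text contains non-ASCII characters. Here is a minimal standard-C++ sketch that uses no V8. It contrasts `std::string::size()`, which counts bytes, with a code-point count. This uses a different technique from the accepted answer's mask loop: it counts every byte that is not a continuation byte (10xxxxxx). `countCodePoints` is a hypothetical name, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in valid UTF-8 by counting every
// byte that is not a continuation byte (bit pattern 10xxxxxx).
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For `"h\xC3\xA9llo"` ("héllo"), `size()` is 6 bytes while `countCodePoints` returns 5.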
