In Objective-C...

If I have a character like "Δ", how can I get its Unicode value and determine whether it falls within a particular range of values?

For example, suppose I want to know whether a given character is in the Unicode range U+1F300 to U+1F6FF.

NSString uses UTF-16 internally, so codepoints in the range you're looking for (U+1F300 to U+1F6FF) are stored as a surrogate pair: two 16-bit code units (four bytes). Despite its name, characterAtIndex: (and unichar) knows nothing about codepoints; it returns the single 16-bit code unit at the index you give it. The 55357 (0xD83D) you're seeing is the lead surrogate of the codepoint in UTF-16.
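To see where a value like 55357 comes from, here is a minimal sketch in plain C (which is also valid Objective-C) of the standard UTF-16 encoding of a codepoint above U+FFFF. The function name `utf16_encode_supplementary` is hypothetical, not an Apple API.

```c
#include <assert.h>
#include <stdint.h>

/* Split a supplementary-plane codepoint (U+10000..U+10FFFF) into the two
   UTF-16 code units (lead and trail surrogates) that NSString stores. */
static void utf16_encode_supplementary(uint32_t codepoint,
                                       uint16_t *lead, uint16_t *trail)
{
    assert(codepoint >= 0x10000 && codepoint <= 0x10FFFF);
    uint32_t offset = codepoint - 0x10000;            /* 20-bit value */
    *lead  = (uint16_t)(0xD800 + (offset >> 10));     /* high 10 bits */
    *trail = (uint16_t)(0xDC00 + (offset & 0x3FF));   /* low 10 bits  */
}
```

For U+1F600 this yields a lead surrogate of 0xD83D (55357) and a trail surrogate of 0xDE00, which is exactly what characterAtIndex: returns at index 0 and 1 of a string containing that one character.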

To examine the actual codepoints, you'll want to convert the string or characters to UTF-32, which stores each codepoint directly as a single 32-bit value. You have a few options:

  1. Get both UTF-16 code units that make up the codepoint, and use either the standard surrogate-pair decoding algorithm or CFStringGetLongCharacterForSurrogatePair to combine them into a UTF-32 value.

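  2. Use either dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32, and interpret each group of four bytes as a uint32_t. Picking an explicit byte order such as NSUTF32LittleEndianStringEncoding makes the layout of those bytes unambiguous.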

  3. Use a library like ICU.
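Options 1 and 2, plus the range test from the question, can be sketched in plain C as follows. This is an illustration under assumptions, not Apple's implementation: all function names are hypothetical, the UTF-16 buffer is assumed to come from something like `[string getCharacters:buffer range:NSMakeRange(0, len)]`, and the UTF-32 bytes are assumed to be little-endian (e.g. from `dataUsingEncoding:NSUTF32LittleEndianStringEncoding`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_lead_surrogate(uint16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static bool is_trail_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Option 1: combine a valid surrogate pair into one codepoint (the same
   value CFStringGetLongCharacterForSurrogatePair gives for a valid pair). */
static uint32_t codepoint_from_surrogates(uint16_t lead, uint16_t trail)
{
    assert(is_lead_surrogate(lead) && is_trail_surrogate(trail));
    return 0x10000 + (((uint32_t)(lead - 0xD800) << 10) |
                      (uint32_t)(trail - 0xDC00));
}

/* Read the codepoint starting at index i of a UTF-16 buffer, storing in
   *next the index just past it (i + 1 or i + 2). An unpaired surrogate
   is returned unchanged as a single unit. */
static uint32_t codepoint_at(const uint16_t *units, size_t len,
                             size_t i, size_t *next)
{
    uint16_t u = units[i];
    if (is_lead_surrogate(u) && i + 1 < len && is_trail_surrogate(units[i + 1])) {
        *next = i + 2;
        return codepoint_from_surrogates(u, units[i + 1]);
    }
    *next = i + 1;
    return u;
}

/* Option 2: assemble 4 bytes of UTF-32LE data into a codepoint,
   independent of the host's byte order. */
static uint32_t codepoint_from_utf32le(const uint8_t *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* The check from the question: U+1F300..U+1F6FF inclusive. */
static bool in_question_range(uint32_t cp)
{
    return cp >= 0x1F300 && cp <= 0x1F6FF;
}
```

Walking a buffer with `codepoint_at` visits each codepoint once, so "Δ" (U+0394, a single code unit) and U+1F600 (two code units) are both handled correctly before the range test is applied.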

answered 2013-02-14T04:52:09.433