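In Objective-C...
Given a character such as "Δ", how can I get its Unicode value and determine whether it falls within a particular range of values?
For example, I want to know whether a given character is in the Unicode range U+1F300 to U+1F6FF.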
NSString uses UTF-16 to store codepoints internally, so those in the range you're looking for (U+1F300 to U+1F6FF) will be stored as a surrogate pair (four bytes). Despite its name, characterAtIndex: (and unichar) doesn't know about codepoints and will give you the two bytes that it sees at the index you give it (the 55357 you're seeing is the lead surrogate of the codepoint in UTF-16).
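To make the surrogate-pair storage concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of how a codepoint above U+FFFF is split into two 16-bit units. The function name utf16_encode_pair is hypothetical, chosen for illustration; it is not part of Foundation or CoreFoundation.

```c
#include <stdint.h>

/* Hypothetical helper: split a supplementary codepoint (U+10000..U+10FFFF)
   into its UTF-16 lead and trail surrogates.
   Returns 1 on success, 0 if cp is outside that range. */
static int utf16_encode_pair(uint32_t cp, uint16_t *lead, uint16_t *trail)
{
    if (cp < 0x10000 || cp > 0x10FFFF) {
        return 0;
    }
    uint32_t v = cp - 0x10000;                 /* 20-bit value */
    *lead  = (uint16_t)(0xD800 + (v >> 10));   /* top 10 bits */
    *trail = (uint16_t)(0xDC00 + (v & 0x3FF)); /* bottom 10 bits */
    return 1;
}
```

For U+1F600, for example, this yields a lead surrogate of 0xD83D, which is 55357 in decimal, followed by the trail surrogate 0xDE00.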
To examine the raw codepoints, you'll want to convert the string/characters into UTF-32 (which encodes them directly). To do this, you have a few options:
Get all UTF-16 bytes that make up the codepoint, and use either this algorithm or CFStringGetLongCharacterForSurrogatePair to convert the surrogate pairs to UTF-32.
Use either dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32, and interpret the raw bytes as a uint32_t.
Use a library like ICU.
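The first option above can be sketched in plain C, which compiles unchanged inside an Objective-C file. This is a minimal sketch, assuming the string's UTF-16 units have already been copied into a buffer (for instance with NSString's getCharacters:range:). The names code_point_at and in_target_range are hypothetical helpers, not Apple APIs; the arithmetic is the standard UTF-16 surrogate decoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the codepoint that starts at units[index].
   A valid lead/trail surrogate pair is combined into one value; anything
   else (including an unpaired surrogate) is returned as-is. */
static uint32_t code_point_at(const uint16_t *units, size_t length, size_t index)
{
    uint16_t lead = units[index];
    if (lead >= 0xD800 && lead <= 0xDBFF && index + 1 < length) {
        uint16_t trail = units[index + 1];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            return 0x10000
                 + (((uint32_t)lead - 0xD800) << 10)
                 + ((uint32_t)trail - 0xDC00);
        }
    }
    return lead;
}

/* Hypothetical helper: true if cp lies in U+1F300..U+1F6FF,
   the range asked about in the question. */
static bool in_target_range(uint32_t cp)
{
    return cp >= 0x1F300 && cp <= 0x1F6FF;
}
```

With the units 0xD83D 0xDE00, code_point_at returns 0x1F600, which in_target_range accepts; for "Δ" (the single unit 0x0394) it returns 0x0394, which falls outside the range. When walking a whole string, advance the index by two after a combined pair and by one otherwise.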