Given this string, which also contains a 4-byte (variable-length) character that may not display in this post:
HELLO水
Legend (byte sizes are for UTF-16, see http://en.wikipedia.org/wiki/UTF-16):
that character is 4 bytes
水 is 2 bytes
A PostgreSQL database (UTF-8 encoded) returns the correct length of 7:
select length('HELLO水');
I noticed that both .NET and Java return 8:
Console.WriteLine("HELLO水".Length);
System.out.println("HELLO水".length());
And SQL Server returns 8 too:
SELECT LEN('HELLO水');
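Since the 4-byte character may not survive posting here, here is a minimal Java sketch that reproduces the count of 8; it uses U+10437 (𐐷, one of the surrogate-pair examples on the Wikipedia page linked above) purely as a stand-in for my character:
String s = "HELLO" + "\uD801\uDC37" + "水"; // \uD801\uDC37 is U+10437, a stand-in for my 4-byte character
System.out.println(s.length());             // prints 8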
.NET, Java, and SQL Server return the correct string length when none of the characters is variable-length; for this string they all return 6:
HELLO水
But they return 7 for a string that does contain a variable-length character, which is incorrect. The string below is HELLO followed by the 4-byte character (6 characters in total), though that last character may again not display:
HELLO
.NET, Java, and SQL Server all use UTF-16. It seems their way of counting the length of a UTF-16 string is broken. Or is this mandated by UTF-16? UTF-16 is variable-length capable, just like its UTF-8 cousin. So why is UTF-16 (or is it the fault of .NET, Java, SQL Server, and whatnot?) not able to count the length of a string correctly, the way UTF-8 can?
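To see what UTF-16 actually stores for that character, here is a quick Java sketch (again with U+10437 standing in for it) that walks the string one char at a time:
String s = "HELLO" + "\uD801\uDC37" + "水"; // U+10437 stands in for my 4-byte character
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    System.out.printf("index %d: U+%04X (high surrogate? %b)%n", i, (int) c, Character.isHighSurrogate(c));
}
// indices 5 and 6 show U+D801 and U+DC37: one character stored as two UTF-16 code units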
Python returns a length of 12. I don't know how to interpret why it returns 12 (perhaps it is counting the UTF-8 bytes: 5 + 4 + 3?); that might be another topic entirely, so I digress.
len("HELLO水")
The question is: how do I get the correct count of characters in .NET, Java, and SQL Server? It will be difficult to implement the next Twitter if a function returns an incorrect character count.
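For Java at least, one approach that looks promising is String.codePointCount, which counts code points rather than char values; a minimal sketch, once more with U+10437 standing in for my character (I have not yet found the equivalents for .NET and SQL Server):
String s = "HELLO" + "\uD801\uDC37" + "水";          // U+10437 stands in for my 4-byte character
System.out.println(s.length());                      // 8 -- counts UTF-16 code units
System.out.println(s.codePointCount(0, s.length())); // 7 -- counts code points, the number I am after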
If I may add, I was not able to post this using Firefox; I posted this question from Google Chrome. Firefox cannot display variable-length Unicode characters.