By writing $a = "大"; into a PHP file, the variable $a contains the byte sequence of whatever was between the quotes in your source code file. If that source code file was saved in UTF-8, the string is the UTF-8 byte sequence representing the character "大". If the source code file was saved in GB2312, it is the GB2312 byte sequence representing "大". PHP itself only needs the source file to be in an ASCII-compatible encoding, which both UTF-8 and GB2312 (in its usual EUC-CN form) are; it neither knows nor records which one you used.
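To make the difference visible, here is a minimal sketch that dumps the bytes for "大" in both encodings. It assumes the mbstring extension is loaded, and it writes the UTF-8 bytes as escape sequences so that the encoding of the example file itself cannot change the result.

```php
<?php
// Assumption: the mbstring extension is available.
// "大" (U+5927) as UTF-8 bytes, written as escapes so the file encoding is irrelevant.
$utf8 = "\xE5\xA4\xA7";

// The same character re-encoded as GB2312.
$gb2312 = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');

echo bin2hex($utf8), "\n";   // e5a4a7
echo bin2hex($gb2312), "\n"; // b4f3
```

Same character, two different byte sequences; which one ends up in $a depends only on how the file was saved.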
mb_strlen is supposed to give you the number of characters in the given string in the specified encoding. That is, mb_strlen('大', 'gb2312') expects the string to be a GB2312 byte sequence and is supposed to return 1. You're wrong in expecting it to return 2, even though GB2312 is a double-byte encoding: mb_strlen returns the number of characters, not the number of bytes.
strlen('大') would give you the number of bytes, because it's a naïve old-style function which doesn't know anything about encodings and only counts bytes.
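A short sketch of the bytes-versus-characters distinction, again assuming mbstring is loaded and using escape sequences for the two byte sequences of "大":

```php
<?php
// Assumption: the mbstring extension is available.
$utf8   = "\xE5\xA4\xA7"; // "大" as UTF-8 (3 bytes)
$gb2312 = "\xB4\xF3";     // "大" as GB2312 (2 bytes)

// strlen() counts bytes and ignores encodings entirely.
var_dump(strlen($utf8));   // int(3)
var_dump(strlen($gb2312)); // int(2)

// mb_strlen() counts characters, provided the named encoding matches the bytes.
var_dump(mb_strlen($utf8, 'UTF-8'));    // int(1)
var_dump(mb_strlen($gb2312, 'GB2312')); // int(1)
```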
The bottom line: your expectation was wrong, and you have a mismatch between what the "大" is actually encoded in (whatever you saved your source code as) and what you tell mb_strlen it is encoded in (gb2312). Therefore mb_strlen cannot do its job correctly, and the number it returns depends on how the actual bytes happen to be misread as GB2312.
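As a rough illustration of the mismatch and two ways out of it, assuming the source file was saved as UTF-8 and mbstring is loaded (the variable names are only for this example):

```php
<?php
// Assumption: the mbstring extension is available.
$a = "\xE5\xA4\xA7"; // "大" as UTF-8 bytes, i.e. what a UTF-8 source file would contain

// Mismatch: the bytes are UTF-8 but we claim GB2312, so the count is not meaningful.
$wrong = mb_strlen($a, 'GB2312');

// Fix 1: name the encoding the bytes are actually in.
$right = mb_strlen($a, 'UTF-8'); // 1

// Fix 2: if GB2312 is really what you want, convert the bytes first.
$converted = mb_convert_encoding($a, 'GB2312', 'UTF-8');
$alsoRight = mb_strlen($converted, 'GB2312'); // 1

// mb_check_encoding() reports whether bytes are valid in a claimed encoding.
var_dump(mb_check_encoding($a, 'UTF-8')); // bool(true)
var_dump($wrong, $right, $alsoRight);
```

Either fix works; what matters is that the encoding you pass to mb_strlen is the one the bytes are really in.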