The ISO-8859-1 encoding is, by definition, identical to the first 256 code points of the Unicode table, so a simple numeric cast is enough. Note, however, that Unicode code points need at least 32 bits (actually just 21 bits, but... uint21_t does not usually exist):
char ch_iso88591 = 'a';
uint32_t ch_unicode = (uint32_t)(unsigned char)ch_iso88591;
And as you correctly noted in your question, you have to cast through unsigned char because of the possibility of char being signed.
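
For a whole string the same cast just applies byte by byte. A minimal sketch (latin1_to_unicode is a hypothetical helper name, not a standard function):

#include <stddef.h>
#include <stdint.h>

/* Converts an ISO-8859-1 string into an array of Unicode code points,
   one per input byte. Returns the number of code points written. */
size_t latin1_to_unicode(const char *src, uint32_t *dst, size_t max)
{
    size_t i;
    for (i = 0; i < max && src[i] != '\0'; i++)
        dst[i] = (uint32_t)(unsigned char)src[i]; /* the same cast as above */
    return i;
}
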
If the original charset were anything other than ISO-8859-1 (or ASCII, of course), you would need a table. For example, Windows-1252 is often confused with ISO-8859-1, but the two are somewhat different (see your € example). If you have Windows-1252, then you do need a table. This table is actually quite simple to build: you can copy the values yourself from the Wikipedia article (only the values from 0x80 to 0xFF are needed, because the 0x00-0x7F range is exactly the same).
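
For illustration, here is a minimal sketch of such a conversion. Only the 0x80-0x9F range actually differs from ISO-8859-1 (0xA0-0xFF happens to coincide as well), so the table below covers just those 32 bytes; the function name cp1252_to_unicode and the fallback value for undefined bytes are my own choices, not anything from the question:

#include <stdint.h>

/* Windows-1252 -> Unicode for the 0x80-0x9F range, per the Wikipedia
   code chart. Bytes left undefined by Windows-1252 (0x81, 0x8D, 0x8F,
   0x90, 0x9D) map to U+FFFD, the Unicode replacement character. */
static const uint32_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

uint32_t cp1252_to_unicode(char ch)
{
    unsigned char b = (unsigned char)ch;
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_high[b - 0x80]; /* the range that differs */
    return (uint32_t)b;               /* 0x00-0x7F and 0xA0-0xFF match Latin-1 */
}

Whether the undefined bytes should become U+FFFD, an error return, or something else is up to you; replacing them is just one reasonable option.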