- not supported by X, and because of this, very few Linux applications support them.
- Unicode
- UTF-8
Before the advent of Unicode, each character was typically represented by a single byte, which limited a charset to 256 characters. The same byte value could therefore mean different things in different charsets: hex code 0xe2 maps to "â" (a with circumflex) in the Latin-1 charset, but to "β" (beta) in the ISO-8859-7 (Greek) charset. Unicode was introduced with the objective of giving every character of every culture and civilization on earth its own unique code, which may need more than one byte to store. So in our example "â" is 0x00e2 and "β" is 0x03b2.
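The ambiguity of byte 0xe2 can be checked with a short Python sketch. It is only an illustration: the variable names are made up for this example, and it assumes a Python 3 interpreter with the standard "iso-8859-1" and "iso-8859-7" codecs.

```python
# One byte, two meanings: decode 0xE2 under two single-byte charsets.
raw = bytes([0xE2])

latin1_char = raw.decode("iso-8859-1")  # Latin-1 interpretation
greek_char = raw.decode("iso-8859-7")   # Greek interpretation

# Unicode gives each of the two characters its own code.
latin1_code = hex(ord(latin1_char))
greek_code = hex(ord(greek_char))

print(latin1_char, latin1_code)
print(greek_char, greek_code)
```

Without knowing which charset a file was written in, a program cannot tell which of the two characters was meant. The Unicode codes are unambiguous.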
UTF-8 is a Unicode encoding that uses only one byte for the ASCII characters, two bytes for the Latin-1 (ISO-8859-1) characters with values of 128 and above (along with other characters up to code 0x07ff, such as Greek), and three or four bytes in all other cases. A UTF-8 file that contains only ASCII text, such as plain English, is byte-identical to its ASCII and Latin-1 versions. If other characters are used in this same file, each of them is stored as a multibyte sequence: a lead byte that announces the length of the sequence, followed by continuation bytes. Modern applications such as OpenOffice.org produce UTF-8 documents. UTF-8 should be the charset of choice when you create plain text, HTML, and similar files. Modern Linux installations use UTF-8 for their environment in any country and with any language, and it is currently the de facto standard for representing text. A system administrator must have very good reasons not to use UTF-8.
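The byte counts described above can be observed with a minimal Python sketch. The helper `utf8_length` and the sample characters are chosen only for this illustration. It assumes Python 3, where strings are sequences of Unicode characters.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to store one character."""
    return len(ch.encode("utf-8"))

# One sample per length class: ASCII, Latin-1 upper half, Greek,
# the euro sign, and a character outside the 16-bit range.
samples = ["a", "\u00e2", "\u03b2", "\u20ac", "\U0001F600"]
lengths = [utf8_length(ch) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")

# Pure ASCII text yields the same bytes in all three encodings.
text = "plain English text"
same_bytes = (
    text.encode("ascii") == text.encode("iso-8859-1") == text.encode("utf-8")
)
print(same_bytes)
```

The last check only holds while the text stays within ASCII. As soon as a character such as "â" appears, the Latin-1 and UTF-8 versions of the file differ.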

