8.5 Character-Sets
A character-set is normally represented by a list (or table or chart) of characters along with the byte code assigned to each character. The codes for a byte range from 0 to 255 (00 to FF in hexadecimal). In MS-DOS, character-set tables are called "code-pages". If you're not familiar with such tables, you should look at one. They are sometimes included in printer and terminal manuals and can also be found on the Internet.
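If Python is at hand, it can print part of a code-page table for you, since it ships codecs for many of these character sets. This is a minimal sketch: the function name `code_page_rows` is hypothetical, and CP-437 is used only as an example.

```python
from __future__ import annotations

# Print part of a code-page table: byte code (decimal and hex) next to the
# character that the code page assigns to it.
def code_page_rows(codec: str, start: int, stop: int) -> list[str]:
    """Return one 'dec  hex  char' line per byte code in [start, stop)."""
    rows = []
    for code in range(start, stop):
        char = bytes([code]).decode(codec)  # one byte -> one character
        rows.append(f"{code:3d}  {code:02X}  {char}")
    return rows

if __name__ == "__main__":
    # In CP-437, codes 0xB0-0xDF are mostly shading and box-drawing characters.
    for row in code_page_rows("cp437", 0xB0, 0xC0):
        print(row)
```

Changing the codec name (for example to `latin_1` or `cp1252`) shows how a different character set fills the same byte codes.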
Many character sets include letters from foreign languages. They may also include special characters, such as those used to draw boxes.
ASCII was the traditional English character set used on text terminals. It is a 7-bit code but will usually work fine even if your terminal is set to 8-bit mode. In 8-bit mode with ASCII, the high-order bit is always set to zero. Other character-sets are usually available and usually use 8-bit codes (except on very old terminals where the only choice is ASCII). The first half of most character-sets is the conventional 128 ASCII characters, while the second half (with the high-order bit set to 1) differs widely from one character-set to another. Character sets are often ISO standards. To get specialized character sets on a terminal, you may need to download a soft-font for that character-set into the memory of the terminal. Many terminals have a number of built-in character sets (but perhaps not the one you need).
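One way to check the claim about the shared lower half is to decode codes 0-127 under several character sets and compare each result with ASCII. This sketch uses Python's codec spellings (`latin_1`, `cp437`, `koi8_r`, and so on); the helper names are hypothetical.

```python
# Most 8-bit character sets share the ASCII lower half (codes 0-127,
# high-order bit 0) and differ only in the upper half (codes 128-255).
CODECS = ["ascii", "latin_1", "cp437", "cp850", "cp1252", "iso8859_7", "koi8_r"]

LOWER = bytes(range(0x00, 0x80))   # high-order bit is 0
UPPER = bytes(range(0x80, 0x100))  # high-order bit is 1

def lower_half_matches_ascii(codec: str) -> bool:
    """True if codes 0-127 decode exactly as they do in ASCII."""
    return LOWER.decode(codec) == LOWER.decode("ascii")

def high_bit_clear(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. the data is plain ASCII."""
    return all(b < 0x80 for b in data)

if __name__ == "__main__":
    for name in CODECS:
        print(f"{name:10s} lower half == ASCII: {lower_half_matches_ascii(name)}")
    print(high_bit_clear("plain text".encode("ascii")))
    # ASCII assigns nothing to codes 128-255, so decoding them fails.
    try:
        UPPER.decode("ascii")
    except UnicodeDecodeError:
        print("ascii: codes 128-255 are undefined")
```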
Here are some common 8-bit character sets (CP stands for "Code Page", IBM's term for a character set): CP-437 (DOS ECS), ISO-8859-1 (Latin-1), CP-850 (Multilingual Latin 1, not the same as ISO Latin-1), and CP-1252 (WinLatin1 = MS-ANSI). MS Windows uses CP-1252 (WinLatin1) while the Internet often uses Latin-1. There are several ISO-8859 character sets in addition to Latin-1. These include Greek (-7), Arabic (-6), Eastern European (-2), and a replacement for Latin-1 (-15) called Latin-9. There are many others. For example, KOI8-R is more commonly used for Russian than ISO-8859-5. Unicode is a very large character-set in which a character may need more than one byte (2 bytes in the original 16-bit encoding, UCS-2; 1 to 4 bytes in UTF-8).
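To make the differences concrete, the following sketch decodes one byte under several of the character sets above and counts how many bytes a few characters need in Unicode encodings. It assumes Python's codec names; `decode_everywhere` and `utf_sizes` are hypothetical helpers.

```python
from __future__ import annotations

# The same upper-half byte means different things in different character sets.
EIGHT_BIT = ["latin_1", "iso8859_15", "cp1252", "cp437", "cp850", "iso8859_7", "koi8_r"]

def decode_everywhere(code: int, codecs: list[str]) -> dict[str, str]:
    """Map each codec name to the character it assigns to one byte code."""
    return {name: bytes([code]).decode(name) for name in codecs}

def utf_sizes(ch: str) -> tuple[int, int]:
    """Bytes needed for one character in UTF-8 and in UTF-16."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

if __name__ == "__main__":
    # 0xE9 is an accented Latin letter, a Greek letter, or a Cyrillic
    # letter depending on which character set is in use.
    for name, ch in decode_everywhere(0xE9, EIGHT_BIT).items():
        print(f"{name:10s} 0xE9 -> {ch!r}")
    # Latin-9 (ISO-8859-15) put the euro sign at 0xA4 (Latin-1's currency
    # sign); CP-1252 puts it at 0x80, which is a control code in Latin-1.
    print(decode_everywhere(0xA4, ["latin_1", "iso8859_15"]))
    print(decode_everywhere(0x80, ["latin_1", "cp1252"]))
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        utf8, utf16 = utf_sizes(ch)
        print(f"U+{ord(ch):04X}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

The 0x80 case is one concrete way CP-1252 differs from Latin-1, even though the two agree on most printable codes.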
More information on character-sets:
- Manual pages: charsets, iso_8859-1 or latin1 (covers 8859 series), ascii
- HOWTOs for various languages (often written in that language).
- ISO-8859 Alphabet Soup More than just iso8859. Extensive.
- A tutorial on character code issues Clearly written.
- Languages, Countries and Character Sets
- Languages of the World by Computers ...
- Links re Internationalization A long list of links (in Russian but most words in English).
- ... International Character Sets
Once you've found the character set name (or alpha-numeric designation) you are interested in, you may search for more info about it on the Internet.
Graphics (Line Drawing, etc.)
There are special characters for drawing boxes, etc., as well as numerous non-ASCII symbols such as bullets. These may be part of an 8-bit character set (such as WinLatin1 = CP-1252) or provided as a separate font (as in vt100 terminals). Your terminfo may be set up to use them, but if you see a row of letters where there should be a line, terminfo may not have implemented them.
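The "row of letters" symptom is easy to reproduce. On a vt100-compatible terminal (xterm and most emulators included), the sequence ESC ( 0 selects the DEC Special Graphics set, in which lowercase letters such as `q` and `x` are drawn as lines, and ESC ( B switches back to ASCII. The sketch below hard-codes those two sequences for illustration only; a real program should take them from terminfo (the `smacs`, `rmacs`, and `acsc` capabilities) rather than assume a vt100. The `box` function is hypothetical.

```python
# Hard-coded vt100 escapes (an assumption; terminfo normally supplies these).
ENTER_GRAPHICS = "\x1b(0"  # designate DEC Special Graphics into G0
EXIT_GRAPHICS = "\x1b(B"   # designate US ASCII into G0

# vt100 line-drawing letters (a subset).
UL, UR, LL, LR = "l", "k", "m", "j"  # corners
HLINE, VLINE = "q", "x"              # horizontal and vertical lines

def box(width: int, height: int, graphics: bool = True) -> str:
    """Build a width x height box from vt100 line-drawing letters."""
    if width < 2 or height < 2:
        raise ValueError("a box needs width and height of at least 2")
    rows = [UL + HLINE * (width - 2) + UR]
    rows += [VLINE + " " * (width - 2) + VLINE for _ in range(height - 2)]
    rows.append(LL + HLINE * (width - 2) + LR)
    body = "\n".join(rows)
    # Without the switch into graphics mode the letters print as letters.
    return ENTER_GRAPHICS + body + EXIT_GRAPHICS if graphics else body

if __name__ == "__main__":
    print(box(10, 4))                  # a box on a vt100-like terminal
    print(box(10, 4, graphics=False))  # the "row of letters" symptom
```

The second call prints `lqqqqqqqqk` and so on: exactly what appears when the terminal is never switched into its graphics set.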
You need to know the following if your graphics don't work right. The default graphic character set is the vt-100 ANSI graphics; otherwise, the string capability acsc must be defined in your terminfo. It establishes a map between the vt-100 graphic character codes and the actual codes used on your terminal. So even if your terminal doesn't have the vt-100 graphics, it can likely still
