International Chars - GNU Emacs Manual


27.1 Introduction to International Character Sets

Users of international character sets and scripts have established many more-or-less standard coding systems for storing files. Emacs internally uses a single multibyte character encoding, so that it can intermix characters from all these scripts in a single buffer or string. This encoding represents each non-ASCII character as a sequence of bytes in the range 0200 through 0377 (octal). Emacs translates between the multibyte character encoding and various other coding systems when reading and writing files, when exchanging data with subprocesses, and (in some cases) in the C-q command (see Multibyte Conversion).
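The byte-range property described above can be illustrated outside Emacs. The following is a minimal sketch in Python, not Emacs's own code, and it uses UTF-8 as a stand-in: UTF-8 is an assumption made for illustration, not necessarily the internal encoding of the Emacs version this manual describes. What it shares with the description above is that every byte of a non-ASCII character falls in the octal range 0200 through 0377, while ASCII characters stay as single bytes below 0200. The function names are hypothetical.

```python
def encode_octal(text, encoding="utf-8"):
    """Return (character, [bytes as 4-digit octal strings]) for each character.

    UTF-8 is only a stand-in here; it is not claimed to be Emacs's
    internal multibyte encoding.
    """
    return [(ch, [format(b, "04o") for b in ch.encode(encoding)]) for ch in text]


def non_ascii_bytes_in_high_range(text, encoding="utf-8"):
    """Check that every byte of every non-ASCII character is in 0200..0377."""
    for ch in text:
        if ord(ch) > 0o177:  # character is outside the ASCII range
            if not all(0o200 <= b <= 0o377 for b in ch.encode(encoding)):
                return False
    return True


if __name__ == "__main__":
    # Print each character of a mixed-script string with its octal bytes.
    for ch, octal_bytes in encode_octal("aé世"):
        print(ch, " ".join(octal_bytes))
    print(non_ascii_bytes_in_high_range("héllo, 世界"))
```

Because ASCII bytes and the bytes of multibyte sequences never overlap in such an encoding, text from several scripts can be mixed in one string without ambiguity, which is the property the paragraph above relies on.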

The command C-h h (view-hello-file) displays the file etc/HELLO, which shows how to say “hello” in many languages and so illustrates various scripts. If some characters can't be displayed on your terminal, they appear as ‘?’ or as hollow boxes (see Undisplayable Characters).

Keyboards, even in the countries where these character sets are used, generally don't have keys for all the characters in them. So Emacs supports various input methods, typically one for each script or language, to make it convenient to type these characters.

The prefix key C-x <RET> is used for commands that pertain to multibyte characters, coding systems, and input methods.