

5.1.10 Input Encodings

Recall from Groff Options that the groff command’s -k option runs the preconv preprocessor, which converts the input character encoding to satisfy GNU troff’s requirement of a single-byte encoding compatible with ISO 646:1991 IRV (US-ASCII).
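
As a minimal sketch, consider a hypothetical file notes.roff saved in UTF-8. The file name, the sample text, and the choice of the utf8 output device are illustrative assumptions.

```roff
.\" notes.roff: hypothetical input, assumed to be saved as UTF-8.
.\" Format it with:
.\"   groff -k -Tutf8 notes.roff
.\" The -k option runs preconv first, so GNU troff itself never
.\" reads the raw multibyte sequences.
Crème brûlée, jalapeño, façade.
```

If preconv guesses the input encoding wrongly, groff’s -K option (for instance ‘-K utf8’) names the encoding explicitly and implies -k.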

Localization influences automatic hyphenation in two distinct but related respects. A macro file specific to a character coding identifies which character codes correspond to letters expected in the language’s hyphenation pattern files and sets up case equivalences for those letters. A language’s macro file determines which of these letters are equivalent to other letters for hyphenation purposes.

For example, in English, the letter ‘ñ’ occurs in loan words. The latin1.tmac and latin9.tmac macro files define a hyphenation code for ‘ñ’ and make ‘Ñ’ equivalent to it. The English localization file en.tmac furthermore makes ‘ñ’ equivalent to ‘n’. In Spanish (es.tmac), however, ‘ñ’ and ‘n’ are not equivalent. The language localization file (see Manipulating Hyphenation) loads an appropriate encoding localization file; a document need not do so directly.
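
The following sketch shows a document selecting only its language and leaving the encoding file to the localization macros. The sample sentence and the assumption that the file is saved in ISO Latin-9 are illustrative.

```roff
.\" Hypothetical Spanish document, assumed to be saved in ISO Latin-9.
.\" Loading the language localization file is enough; es.tmac loads
.\" the encoding localization file it needs.
.mso es.tmac
La cigüeña sueña con la montaña.
```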

koi8-r

To use KOI8-R, an encoding for the Russian language, either place ‘.mso koi8-r.tmac’ at the very beginning of your document or supply ‘-m koi8-r’ as a command-line argument to groff. The ru.tmac localization file loads koi8-r.tmac automatically.

latin1

ISO Latin-1 is an encoding for Western European languages. The de.tmac, en.tmac, it.tmac, and sv.tmac localization files load latin1.tmac automatically.

latin2

To use ISO Latin-2, an encoding for Central and Eastern European languages, invoke ‘.mso latin2.tmac’ at the beginning of your document or supply ‘-m latin2’ as a command-line argument to groff. The cs.tmac and pl.tmac localization files load latin2.tmac automatically.

latin5

To use ISO Latin-5, an encoding for the Turkish language, invoke ‘.mso latin5.tmac’ at the beginning of your document or supply ‘-m latin5’ as a command-line argument to groff.

latin9

ISO Latin-9 succeeds Latin-1; it includes a Euro sign and better coverage for French. To use this encoding, invoke ‘.mso latin9.tmac’ at the beginning of your document or supply ‘-m latin9’ as a command-line argument to groff. The es.tmac and fr.tmac localization files load latin9.tmac automatically.
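
As an illustration of loading an encoding file directly, here is a sketch of a Turkish document; no localization file listed above loads latin5.tmac, so the document does so itself. The sample text and the assumption that the file is saved in ISO Latin-5 are hypothetical.

```roff
.\" Hypothetical Turkish document, assumed to be saved in ISO Latin-5.
.\" The mso request comes first, before any text that uses
.\" Latin-5 characters.  Supplying "-m latin5" on the groff command
.\" line instead would make this request unnecessary.
.mso latin5.tmac
İstanbul'da güneşli bir gün.
```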

Some characters from an input encoding may not be available with a particular output driver, or their glyphs may not have representation in the font used. For terminal devices, fallbacks are defined, like ‘EUR’ for the Euro sign and ‘(C)’ for the copyright sign. For typesetter devices, you may need to “mount” fonts that support glyphs required by the document. See Font Positions.
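
A document can also test for a glyph itself before using it. The sketch below uses the ‘c’ conditional operator with the special character ‘\[Eu]’; the sentence and the choice of ‘EUR’ as replacement text are illustrative assumptions. On terminal devices the fallbacks described above may already make such a test unnecessary.

```roff
.\" Use the Euro glyph if the output device can supply one
.\" (possibly through a fallback definition); otherwise spell it out.
.ie c\[Eu] The fee is 5\~\[Eu].
.el        The fee is 5\~EUR.
```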

Because a Euro glyph was not historically defined in PostScript fonts, groff comes with a font called freeeuro.pfa that provides the Euro in several styles. Standard PostScript fonts contain the glyphs from Latin-5 and Latin-9 that Latin-1 lacks, so these encodings are supported for the ps and pdf output devices as groff ships, while Latin-2 is not.

Unicode supports characters from all other input encodings; the utf8 output driver for terminals therefore does as well. The DVI output driver supports the Latin-2 and Latin-9 encodings if the command-line option ‘-m ec’ is used as well.
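
A sketch of the DVI case follows, assuming a hypothetical file tekst.roff saved in ISO Latin-2. The file name, the sample text, and the order of the ‘-m’ options are assumptions.

```roff
.\" tekst.roff: hypothetical Polish input, assumed to be saved
.\" in ISO Latin-2.  Format it for the dvi device with:
.\"   groff -Tdvi -m ec -m latin2 tekst.roff
.\" Here "-m latin2" stands in for a ".mso latin2.tmac" request
.\" at the top of the document.
Zażółć gęślą jaźń.
```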

