SICStus Prolog supports character codes up to 31 bits wide. Codes in the range shared with Unicode are interpreted as Unicode code points.
When a character code (a “code point” in Unicode terminology) is written to a stream, it must be encoded into a byte sequence; when it is read from a stream, the byte sequence must be decoded back into a character code. The method by which each character code is encoded to or decoded from a byte sequence is called a “character encoding”.
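As an illustration of what a character encoding does, the following sketch maps a character code to its UTF-8 byte sequence. The predicate name utf8_bytes/2 is hypothetical. The code shows the standard UTF-8 scheme only; it is not a description of how SICStus Prolog implements encoding internally, and it does not reject surrogate codes.

```prolog
% utf8_bytes(+Code, -Bytes): Bytes is the UTF-8 encoding of Code.
% Illustrative only; covers the Unicode range 0..0x10FFFF.
utf8_bytes(C, [C]) :-
    C >= 0, C < 0x80.                       % 1 byte: 0xxxxxxx
utf8_bytes(C, [B1,B2]) :-
    C >= 0x80, C < 0x800,                   % 2 bytes: 110xxxxx 10xxxxxx
    B1 is 0xC0 \/ (C >> 6),
    B2 is 0x80 \/ (C /\ 0x3F).
utf8_bytes(C, [B1,B2,B3]) :-
    C >= 0x800, C < 0x10000,                % 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    B1 is 0xE0 \/ (C >> 12),
    B2 is 0x80 \/ ((C >> 6) /\ 0x3F),
    B3 is 0x80 \/ (C /\ 0x3F).
utf8_bytes(C, [B1,B2,B3,B4]) :-
    C >= 0x10000, C =< 0x10FFFF,            % 4 bytes: 11110xxx followed by 3 x 10xxxxxx
    B1 is 0xF0 \/ (C >> 18),
    B2 is 0x80 \/ ((C >> 12) /\ 0x3F),
    B3 is 0x80 \/ ((C >> 6) /\ 0x3F),
    B4 is 0x80 \/ (C /\ 0x3F).
```

For example, the goal utf8_bytes(0x20AC, Bytes) gives Bytes = [226,130,172], that is, the bytes E2 82 AC of the euro sign.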
The following character encodings are currently supported by SICStus Prolog.
ANSI_X3.4-1968
    The 7-bit subset of Unicode, commonly referred to as ASCII.
ISO-8859-1
    The 8-bit subset of Unicode, commonly referred to as Latin 1.
ISO-8859-2
    A variant of ISO-8859-1, commonly referred to as Latin 2.
ISO-8859-15
    A variant of ISO-8859-1, commonly referred to as Latin 9.
windows 1252
    The Microsoft Windows code page 1252.
UTF-8
UTF-16
UTF-16LE
UTF-16BE
UTF-32
UTF-32LE
UTF-32BE
    The suffixes LE and BE denote little endian and big endian, respectively.
The UTF encodings can be auto-detected if a Unicode signature is present in a file opened for reading. A Unicode signature is also known as a byte order mark (BOM).
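To make the idea of a Unicode signature concrete, the sketch below maps the leading bytes of a file to the encoding that the corresponding standard BOM indicates. The predicate name bom_encoding/2 and the result atom none are hypothetical; this is an illustration of the well-known BOM byte patterns, not the detection code used by SICStus Prolog.

```prolog
% bom_encoding(+Bytes, -Encoding): Encoding is the encoding named by the
% byte order mark at the start of Bytes, or none if no BOM is present.
% The UTF-32LE pattern is tested before UTF-16LE because FF FE 00 00
% also begins with the UTF-16LE pattern FF FE.
bom_encoding([0xEF,0xBB,0xBF|_],      'UTF-8')    :- !.
bom_encoding([0xFF,0xFE,0x00,0x00|_], 'UTF-32LE') :- !.
bom_encoding([0x00,0x00,0xFE,0xFF|_], 'UTF-32BE') :- !.
bom_encoding([0xFF,0xFE|_],           'UTF-16LE') :- !.
bom_encoding([0xFE,0xFF|_],           'UTF-16BE') :- !.
bom_encoding(_,                       none).
```

The first argument is expected to be a list of the first few bytes of the file, for instance as read with get_byte/2 from a stream opened with the type(binary) option.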
In addition, any of the alternative names defined by the IANA registry http://www.iana.org/assignments/character-sets may be used.
All encodings listed above, except the UTF-XXX encodings, support
the reposition(true) option to open/4
(see mpg-ref-open).
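A minimal sketch of repositioning a text stream follows. It assumes that the position/1 stream property and set_stream_position/2 are used to save and restore the position; the predicate name reread_first_code/2 is hypothetical, and File is supplied by the caller.

```prolog
% reread_first_code(+File, -Code): read the first character code of File,
% rewind to the start, and read it again.
reread_first_code(File, Code) :-
    % ISO-8859-1 is one of the encodings that supports reposition(true).
    open(File, read, S, [encoding('ISO-8859-1'), reposition(true)]),
    stream_property(S, position(Start)),   % remember the start position
    get_code(S, First),
    set_stream_position(S, Start),         % go back to the remembered position
    get_code(S, Code),                     % reads the same code again
    close(S),
    Code == First.
```

Requesting reposition(true) together with one of the UTF-XXX encodings is not expected to work, as stated above.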
The encoding to use can be specified with the encoding/1 option to
open/4 and similar predicates. When a file is opened for input, the
encoding can often be determined automatically. If no encoding is
specified and none can be detected from the file contents, the
default is ISO-8859-1.
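The following sketch shows the encoding/1 option in use: it writes three character codes to a file as UTF-8 and reads them back with the same encoding. The predicate name roundtrip/2 is hypothetical, and the encoding name is assumed to be passed as an atom.

```prolog
% roundtrip(+File, -Codes): write three codes to File as UTF-8,
% then reopen File with the same encoding and read them back.
roundtrip(File, Codes) :-
    open(File, write, Out, [encoding('UTF-8')]),
    put_code(Out, 0'a),        % a, one byte in UTF-8
    put_code(Out, 0xE5),       % a with ring above, two bytes in UTF-8
    put_code(Out, 0x20AC),     % euro sign, three bytes in UTF-8
    close(Out),
    open(File, read, In, [encoding('UTF-8')]),
    get_code(In, C1),
    get_code(In, C2),
    get_code(In, C3),
    close(In),
    Codes = [C1,C2,C3].
```

Reading the same file with a different encoding, for instance the ISO-8859-1 default, would yield one character code per byte instead of the three codes that were written.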
The encoding used by a text stream can be queried using
stream_property/2.
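As a sketch, the query could look as follows. It assumes that the stream property is named encoding/1, mirroring the open/4 option; the predicate name file_encoding/2 is hypothetical.

```prolog
% file_encoding(+File, -Encoding): Encoding is the encoding selected
% when File is opened for input without an explicit encoding/1 option.
file_encoding(File, Encoding) :-
    open(File, read, S),
    (   stream_property(S, encoding(Enc))
    ->  close(S),
        Encoding = Enc
    ;   close(S),              % do not leak the stream if the query fails
        fail
    ).
```

For a file with no detectable encoding, the result is expected to be the default described above.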
See mpg-ref-open for details on how character encoding is auto-detected when opening text files.