4.6.7.5 Text Stream Encodings

SICStus Prolog supports character codes up to 31 bits wide. Codes in the range that Unicode also covers are interpreted as Unicode code points.

When a character code (a “code point” in Unicode terminology) is written to a stream, it must be encoded into a byte sequence; when it is read from a stream, it must be decoded from one. The method by which each character code is encoded to or decoded from a byte sequence is called a “character encoding”.
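As an illustration of the difference between a character code and its byte sequence, the following minimal sketch writes one code to a file as UTF-8 and reads the raw bytes back. It relies on the encoding/1 option of open/4 described later in this section. The predicate names utf8_bytes/2 and read_bytes/2 and the file name scratch.txt are hypothetical, and the encoding_signature(false) option is assumed to be available (it suppresses any signature so that only the encoded character is written).

```prolog
% utf8_bytes(+Code, -Bytes): encode one character code as UTF-8 by
% writing it to a scratch file and reading the raw bytes back.
% 'scratch.txt' is a hypothetical file name used only for illustration.
utf8_bytes(Code, Bytes) :-
    File = 'scratch.txt',
    % Assumption: encoding_signature(false) prevents a signature (BOM)
    % from being written before the character.
    open(File, write, Out, [encoding('UTF-8'), encoding_signature(false)]),
    call_cleanup(put_code(Out, Code), close(Out)),
    % Reopen as a binary stream so that no decoding takes place.
    open(File, read, In, [type(binary)]),
    call_cleanup(read_bytes(In, Bytes), close(In)).

% read_bytes(+In, -Bytes): collect all remaining bytes of a binary stream.
read_bytes(In, Bytes) :-
    get_byte(In, Byte),
    (   Byte =:= -1 ->              % -1 signals end of file
        Bytes = []
    ;   Bytes = [Byte|Rest],
        read_bytes(In, Rest)
    ).
```

Under these assumptions, the goal utf8_bytes(0xE5, Bytes) should bind Bytes to [195,165]: the single character code 0xE5 becomes two bytes in UTF-8.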

The following character encodings are currently supported by SICStus Prolog.

ANSI_X3.4-1968
The 7-bit subset of Unicode, commonly referred to as ASCII.
ISO-8859-1
The 8-bit subset of Unicode, commonly referred to as Latin 1.
ISO-8859-15
A variant of ISO-8859-1, commonly referred to as Latin 9.
windows-1252
The Microsoft Windows code page 1252.
UTF-8
UTF-16
UTF-16LE
UTF-16BE
UTF-32
UTF-32LE
UTF-32BE
The suffixes LE and BE denote little-endian and big-endian byte order, respectively.

These encodings can be auto-detected if a Unicode signature is present in a file opened for reading. A Unicode signature is also known as a byte order mark (BOM).

In addition, any alternative name defined by the IANA registry http://www.iana.org/assignments/character-sets/ can be used for these encodings.

The encoding can be specified with the encoding/1 option of open/4 and similar predicates. When opening a file for input, the encoding can often be determined automatically. If no encoding is specified and none can be detected from the file contents, the default is ISO-8859-1.
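The following is a minimal sketch of passing the encoding/1 option to open/4, once for output and once for input. The predicate names write_utf8/2 and read_utf8_line/2 are hypothetical helpers introduced only for this example.

```prolog
% write_utf8(+File, +Atom): write Atom and a newline to File as UTF-8.
write_utf8(File, Atom) :-
    open(File, write, Stream, [encoding('UTF-8')]),
    call_cleanup(( write(Stream, Atom), nl(Stream) ),
                 close(Stream)).

% read_utf8_line(+File, -Codes): read the first line of File, decoding
% it as UTF-8. Codes is the list of character codes without the
% line terminator.
read_utf8_line(File, Codes) :-
    open(File, read, Stream, [encoding('UTF-8')]),
    call_cleanup(read_line(Stream, Codes),
                 close(Stream)).
```

For example, after write_utf8('example.txt', hello), the goal read_utf8_line('example.txt', Codes) should give the same codes as atom_codes(hello, Codes). Naming the encoding explicitly on both sides avoids relying on auto-detection or on the ISO-8859-1 default.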

The encoding used by a text stream can be queried using stream_property/2.
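A minimal sketch of such a query, assuming the stream property is named encoding/1 like the corresponding open/4 option. The helper name file_encoding/2 is hypothetical.

```prolog
% file_encoding(+File, -Encoding): open File for reading without an
% encoding/1 option and report the encoding the stream ended up with,
% whether detected from the file contents or taken from the default.
file_encoding(File, Encoding) :-
    open(File, read, Stream, []),
    call_cleanup(once(stream_property(Stream, encoding(Encoding))),
                 close(Stream)).
```

Calling file_encoding/2 on a file that starts with a Unicode signature is one way to check which encoding was auto-detected.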

