Text Stream Encodings

SICStus Prolog supports character codes up to 31 bits wide. Codes that fall within the Unicode range are interpreted as Unicode.

When a character code (a “code point” in Unicode terminology) is written to a stream, it must be encoded into a byte sequence; when it is read from a stream, it must be decoded from one. The method by which each character code is encoded to or decoded from a byte sequence is called a “character encoding”.
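As an illustration of what encoding a character code into a byte sequence means, here is a minimal sketch that computes the UTF-8 byte sequence of a code point. UTF-8 is chosen here only as a familiar example, and the predicate names utf8_bytes/2 and utf8_encode/2 are hypothetical; this is not how SICStus Prolog implements its encodings, and it does not reject surrogate code points.

```prolog
% utf8_bytes(+Code, -Bytes)
% Bytes is the UTF-8 byte sequence for the code point Code
% (0..0x10FFFF). Illustrative helper, not part of SICStus Prolog.
utf8_bytes(Code, Bytes) :-
    integer(Code),
    Code >= 0,
    Code =< 0x10FFFF,
    utf8_encode(Code, Bytes).

% One byte: 0xxxxxxx
utf8_encode(C, [C]) :-
    C < 0x80, !.
% Two bytes: 110xxxxx 10xxxxxx
utf8_encode(C, [B1,B2]) :-
    C < 0x800, !,
    B1 is 0xC0 \/ (C >> 6),
    B2 is 0x80 \/ (C /\ 0x3F).
% Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
utf8_encode(C, [B1,B2,B3]) :-
    C < 0x10000, !,
    B1 is 0xE0 \/ (C >> 12),
    B2 is 0x80 \/ ((C >> 6) /\ 0x3F),
    B3 is 0x80 \/ (C /\ 0x3F).
% Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
utf8_encode(C, [B1,B2,B3,B4]) :-
    B1 is 0xF0 \/ (C >> 18),
    B2 is 0x80 \/ ((C >> 12) /\ 0x3F),
    B3 is 0x80 \/ ((C >> 6) /\ 0x3F),
    B4 is 0x80 \/ (C /\ 0x3F).
```

For example, the goal utf8_bytes(0xE5, Bytes) binds Bytes to [195,165], that is, the two bytes 0xC3 0xA5.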

The following character encodings are currently supported by SICStus Prolog.

The 7-bit subset of Unicode, commonly referred to as ASCII.
The 8-bit subset of Unicode, commonly referred to as Latin 1.
A variant of ISO-8859-1, commonly referred to as Latin 9.
windows 1252
The Microsoft Windows code page 1252.
The suffixes LE and BE denote little endian and big endian, respectively.

These encodings can be auto-detected if a Unicode signature is present in a file opened for reading. A Unicode signature is also known as a byte order mark (BOM).
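The idea behind signature-based detection can be sketched as a lookup on the first bytes of a file. The byte patterns below are the standard Unicode signatures; the predicate names bom_encoding/2 and bom_prefix/2 and the result atoms are assumptions made for illustration and do not describe the internals of SICStus Prolog.

```prolog
% bom_encoding(+Bytes, -Encoding)
% Encoding names the encoding indicated by a Unicode signature at the
% start of the byte list Bytes, or is the atom none if no signature is
% found. Hypothetical helper for illustration only.
bom_encoding(Bytes, Encoding) :-
    bom_prefix(Bytes, Found), !,
    Encoding = Found.

% The 4-byte UTF-32 signatures are listed before the 2-byte UTF-16
% ones, because FF FE is a prefix of FF FE 00 00.
bom_prefix([0xEF,0xBB,0xBF|_],      'UTF-8').
bom_prefix([0xFF,0xFE,0x00,0x00|_], 'UTF-32LE').
bom_prefix([0x00,0x00,0xFE,0xFF|_], 'UTF-32BE').
bom_prefix([0xFF,0xFE|_],           'UTF-16LE').
bom_prefix([0xFE,0xFF|_],           'UTF-16BE').
bom_prefix(_,                       none).
```

A byte list that starts with 0xFE, 0xFF is classified as big endian UTF-16, while one with no recognizable signature yields none, which corresponds to the case where no encoding can be detected from the signature.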

In addition, it is possible to use all alternative names defined by the IANA registry http://www.iana.org/assignments/character-sets.

The encoding to use can be specified with the encoding/1 option of open/4 and similar predicates. When opening a file for input, the encoding can often be determined automatically. If no encoding is specified and none can be detected from the file contents, the default is ISO-8859-1.
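A minimal sketch of passing encoding/1 to open/4 follows. The predicate names write_line_utf8/2 and read_line_utf8/2 are hypothetical, and the encoding name 'UTF-8' is assumed to be accepted by the system; any other supported name could be substituted.

```prolog
% write_line_utf8(+File, +Codes)
% Writes the code list Codes followed by a newline to File, encoding
% the text as UTF-8. Hypothetical helper.
write_line_utf8(File, Codes) :-
    open(File, write, Out, [encoding('UTF-8')]),
    call_cleanup(format(Out, '~s~n', [Codes]),
                 close(Out)).

% read_line_utf8(+File, -Codes)
% Reads the first line of File as a code list, decoding the bytes
% as UTF-8. Hypothetical helper.
read_line_utf8(File, Codes) :-
    open(File, read, In, [encoding('UTF-8')]),
    call_cleanup(read_line(In, Codes),
                 close(In)).
```

Writing a code list with write_line_utf8/2 and reading it back with read_line_utf8/2 should return the same codes, since both sides agree on the encoding. Opening the same file with a different encoding would generally yield different codes for any non-ASCII characters.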

The encoding used by a text stream can be queried using stream_property/2.
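A minimal sketch of such a query, assuming the stream property has the form encoding(Encoding); the predicate name file_encoding/2 is hypothetical.

```prolog
% file_encoding(+File, -Encoding)
% Opens File for reading without naming an encoding and reports the
% encoding the resulting stream uses. Hypothetical helper.
file_encoding(File, Encoding) :-
    open(File, read, Stream, []),
    call_cleanup(once(stream_property(Stream, encoding(Encoding))),
                 close(Stream)).
```

Because no encoding is passed to open/4 here, the reported value is whatever was detected from the file contents or, failing that, the default.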
