3.8 WCX - SICStus Prolog Release Notes

10.7.1 Wide Character Support

Wide character handling is introduced, with the following highlights:

character code sets up to 31 bits wide;
three built-in wide character modes (ISO_8859_1, UTF8, EUC), selectable via environment flags;
complete control over the external encoding via hook functions.

For programs using the default ISO_8859_1 character set, the introduction of wide characters is transparent, except for the string format change in the foreign language interface; see below.

In programs using the EUC character set, each multibyte EUC character is now input as a single character code, up to 23 bits wide. This character code can easily be decomposed into its constituent bytes if needed. The encoding function is described in detail in the SICStus manual.

To support wide characters, the foreign language interface now uses UTF-8 encoding for strings containing non-ASCII characters (codes >= 128). This affects programs that transfer strings containing, for example, accented characters between Prolog and C. A string created on the C side should be converted to UTF-8 before it is passed to Prolog. Conversely, a string passed from Prolog to C arrives in UTF-8, so the inverse transformation has to be applied if the string is to be decomposed into characters on the C side.
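As an illustration of the C-side conversion, the following is a minimal sketch that converts an ISO 8859-1 string to UTF-8 before it is handed to Prolog. The function name latin1_to_utf8 is hypothetical and is not part of the SICStus API; the sketch assumes the input is NUL-terminated ISO 8859-1 and that the caller supplies a sufficiently large output buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the SICStus API.
 * Converts a NUL-terminated ISO 8859-1 string to UTF-8.
 * dst must have room for 2 * strlen(src) + 1 bytes, since each
 * non-ASCII ISO 8859-1 character becomes two UTF-8 bytes.
 * Returns the number of bytes written, excluding the final NUL. */
size_t latin1_to_utf8(const unsigned char *src, unsigned char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        if (*src < 0x80) {
            /* ASCII is unchanged in UTF-8. */
            dst[n++] = *src;
        } else {
            /* Codes 128..255 need a lead byte and one continuation byte. */
            dst[n++] = (unsigned char)(0xC0 | (*src >> 6));
            dst[n++] = (unsigned char)(0x80 | (*src & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}
```

Strings consisting only of ASCII characters are copied unchanged, so code that never handles codes >= 128 does not need such a conversion.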

The utility functions SP_code_wci() and SP_wci_code() are provided to support conversion between strings in WCI format (Wide Character Internal encoding, i.e. UTF-8) and wide character codes.
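The exact signatures of these functions are given in the SICStus manual and are not reproduced here. To show the kind of transformation involved, the following is a minimal standalone sketch that encodes and decodes a single character code of up to 31 bits, using the original UTF-8 scheme with sequences of up to six bytes. The names utf8_encode and utf8_decode are hypothetical and do not call or mirror the SICStus functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the SICStus API.
 * Encodes code (0 .. 0x7FFFFFFF) into buf, which must hold 6 bytes.
 * Returns the number of bytes written, or 0 if code is out of range. */
int utf8_encode(unsigned long code, unsigned char *buf)
{
    int len, i;

    assert(buf != NULL);
    if (code < 0x80UL) { buf[0] = (unsigned char)code; return 1; }
    else if (code < 0x800UL)       len = 2;
    else if (code < 0x10000UL)     len = 3;
    else if (code < 0x200000UL)    len = 4;
    else if (code < 0x4000000UL)   len = 5;
    else if (code <= 0x7FFFFFFFUL) len = 6;
    else return 0;

    /* Fill continuation bytes from last to first, 6 bits each. */
    for (i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (code & 0x3F));
        code >>= 6;
    }
    /* Lead byte: len one-bits, a zero bit, then the remaining bits. */
    buf[0] = (unsigned char)(((0xFFu << (8 - len)) & 0xFFu) | code);
    return len;
}

/* Hypothetical helper, not part of the SICStus API.
 * Decodes one character code starting at s and stores it in *code.
 * Returns the number of bytes consumed, or 0 on a malformed sequence.
 * Overlong encodings are not rejected in this sketch. */
int utf8_decode(const unsigned char *s, unsigned long *code)
{
    int len, i;
    unsigned long c;

    assert(s != NULL && code != NULL);
    if (s[0] < 0x80) { *code = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else if ((s[0] & 0xFC) == 0xF8) { len = 5; c = s[0] & 0x03; }
    else if ((s[0] & 0xFE) == 0xFC) { len = 6; c = s[0] & 0x01; }
    else return 0; /* stray continuation byte, or 0xFE/0xFF */

    for (i = 1; i < len; i++) {
        /* A NUL or any non-continuation byte means a truncated sequence. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *code = c;
    return len;
}
```

A string can be decomposed into character codes by calling the decoder repeatedly and advancing the pointer by the returned length until the terminating NUL is reached.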