Summary of Prolog level WCX features

SICStus Prolog has a Prolog flag called wcx, whose value can be an arbitrary atom and which is initialized to []. This flag is consulted when a stream is opened: its value is normally passed to a user-defined hook function, so it can be used to pass information from Prolog to the hook. In the example of A Sample WCX Box, which supports the selection of external encodings on a stream-by-stream basis, the value of the wcx flag specifies the encoding to be used for the newly opened stream.

The value of the wcx flag can be overridden by supplying a wcx(Value) option to open/4 and load_files/2. If such an option is present, Value is passed to the hook function instead of the value of the flag.
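
The two ways of supplying a value to the hook function can be sketched as follows. This is only an illustration: the atom my_encoding and the predicate names are hypothetical, and what the value means depends entirely on the hook functions installed. The sketch assumes that the flag can be changed with prolog_flag/3.

```prolog
% Sketch only: my_encoding is a hypothetical value; its meaning is
% defined by the user-supplied hook functions, not by SICStus Prolog.

% Set the flag for one open/3 call, then restore the old value.
open_with_flag(File, Stream) :-
    prolog_flag(wcx, Old, my_encoding),   % set flag, remember old value
    open(File, read, Stream),             % the hook is passed my_encoding
    prolog_flag(wcx, _, Old).             % restore the previous value

% Override the flag for a single stream without changing it.
open_with_option(File, Stream) :-
    open(File, read, Stream, [wcx(my_encoding)]).

% load_files/2 accepts the same option.
load_with_option(File) :-
    load_files(File, [wcx(my_encoding)]).
```

Note that open_with_flag/2 leaves the flag changed if open/3 raises an exception. The wcx(Value) option avoids this problem, because it never modifies the flag.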

The wcx flag has a reserved value. The value wci (wide character internal encoding) signifies that the stream should use the SICStus Prolog internal encoding (UTF-8), bypassing the hook functions supplied by the user. This is appropriate, for example, when producing a file containing wide characters that has to be readable irrespective of the (possibly user-supplied) encoding scheme.
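
A minimal sketch of writing such a file is shown below. The predicate names write_internal/2 and put_codes/2 are hypothetical helpers, and the sketch assumes that the ISO predicate put_code/2 is available.

```prolog
% Write a list of character codes to File in the internal encoding
% (UTF-8), bypassing any user-supplied hook functions.
write_internal(File, Codes) :-
    open(File, write, Stream, [wcx(wci)]),
    put_codes(Codes, Stream),
    close(Stream).

% Hypothetical helper: output each code in turn.
put_codes([], _).
put_codes([C|Cs], Stream) :-
    put_code(Stream, C),
    put_codes(Cs, Stream).
```

For instance, a goal such as write_internal('wide.txt', [0x151, 0x171]) would write two wide characters, where 'wide.txt' is a hypothetical file name.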

Wide characters generally require several bytes to be input or output. Therefore, for each stream, SICStus Prolog keeps track of the number of bytes input or output, in addition to the number of (wide) characters. Accordingly, there is a built-in predicate byte_count(+Stream,?N) for accessing the number of bytes read or written on a stream.

Note that the predicate character_count/2 returns the number of characters read or written, which may be less than the number of bytes if some of the characters are multibyte. (On output streams the byte count can also be less than the character count, if some codes that do not belong to the code-set handled are not written out.)
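
The difference between the two counts can be inspected with a sketch like the following. The predicate names report_counts/1 and count_demo/1 are hypothetical, and put_code/2 is assumed to be available.

```prolog
% Print the character and byte counts of Stream.
report_counts(Stream) :-
    character_count(Stream, Chars),
    byte_count(Stream, Bytes),
    format('~d characters, ~d bytes~n', [Chars, Bytes]).

% Write one wide character in the internal encoding and report the
% counts before closing the stream.
count_demo(File) :-
    open(File, write, Stream, [wcx(wci)]),
    put_code(Stream, 0x151),      % a code above 255
    report_counts(Stream),
    close(Stream).
```

Since UTF-8 encodes the code 0x151 in two bytes, the byte count reported here should exceed the character count.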

Note that if a stream is opened as a binary stream:

open(..., ..., ..., [type(binary)])

then no wide character handling will take place; every character output will produce a single byte on the stream, and every byte input will be considered a separate character.
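
As an illustration, the following sketch counts the bytes in a file by reading it through a binary stream. The predicate names are hypothetical, and the sketch assumes that the ISO byte input predicate get_byte/2 is available, returning -1 at end of file.

```prolog
% Count the bytes in File; no wide character handling takes place,
% so each byte is read as a separate item.
count_bytes(File, N) :-
    open(File, read, Stream, [type(binary)]),
    count_loop(Stream, 0, N),
    close(Stream).

% Hypothetical helper: read until end of file (-1).
count_loop(Stream, N0, N) :-
    get_byte(Stream, Byte),
    (   Byte =:= -1 ->
        N = N0
    ;   N1 is N0+1,
        count_loop(Stream, N1, N)
    ).
```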