Users can have complete control over the way wide characters are handled by SICStus Prolog if they supply their own definitions of appropriate hook functions. A set of such functions, implementing a specific environment for handling wide characters, is called a WCX box. A sample WCX box is described below (see A Sample WCX Box).
The WCX hook functions are plugged in by calling:
void SP_set_wcx_hooks(int usage,
                      SP_WcxOpenHook *wcx_open,
                      SP_WcxCloseHook *wcx_close,
                      SP_WcxCharTypeHook *wcx_chartype,
                      SP_WcxConvHook *wcx_from_os,
                      SP_WcxConvHook *wcx_to_os);
The effect of SP_set_wcx_hooks() is controlled by the value of usage. The remaining arguments are pointers to appropriate hook functions or NULL values, the latter implying that the hook should take some default value.
There are three independent aspects to be controlled, and usage should be supplied as a bitwise OR of the chosen constant names for each aspect. The defaults have value 0, so they need not be included. The aspects are the following:
The first aspect decides the default behavior of the wcx_open and wcx_chartype hook functions (if both are supplied by the user, the choice of the default is irrelevant). The possible values are:
WCX_USE_LATIN1 (default)
WCX_USE_UTF8
WCX_USE_EUC
These correspond to the names iso_8859_1, utf8, and euc, respectively; see WCX Environment Variables.
For the second aspect, the flags below determine which function to use for conversion from/to the operating system encoding, if such functions are not supplied by the user through the wcx_from_os and wcx_to_os arguments (if both are supplied by the user, the choice of default is irrelevant).
WCX_OS_8BIT
WCX_OS_UTF8
The third aspect is important if some of the conversion functions (wcx_from_os, wcx_to_os, wcx_getc, and wcx_putc; see later) are user-defined. In such cases it may be beneficial for the user to inform SICStus Prolog whether the supplied encoding functions preserve ASCII characters. (The default encodings do preserve ASCII.)
WCX_PRESERVES_ASCII
WCX_CHANGES_ASCII
We now describe the role of the arguments following usage in the argument list of SP_set_wcx_hooks().
SP_WcxOpenHook *wcx_open
typedef void (SP_WcxOpenHook) (SP_stream *s, SP_atom option, int context);
This function is called by SICStus Prolog for each stream s opened, except when the encoding to be used for the stream is pre-specified (binary files, files opened using the wci option, and the C streams created with contexts SP_STREAMHOOK_WCI and SP_STREAMHOOK_BIN).
The main task of the wcx_open hook is to associate the two WCX-processing functions with the stream, by storing them in the appropriate fields of the SP_stream data structure:
SP_WcxGetcHook *wcx_getc;
SP_WcxPutcHook *wcx_putc;
These fields are pointers to the functions performing the external decoding and encoding as described below. They are initialized to functions that truncate to 8 bits on output and zero-extend to 31 bits on input.
SP_WcxGetcHook *wcx_getc
typedef int (SP_WcxGetcHook) (int first_byte, SP_stream *s, long *pbyte_count);
This function is generally invoked whenever a character has to be read from a stream. Before invoking this function, however, a byte is read from the stream by SICStus Prolog itself. If the byte read is an ASCII character (its value is < 128), and WCX_PRESERVES_ASCII is in force, then the byte read is deemed to be the next character code, and wcx_getc is not invoked. Otherwise, wcx_getc is invoked with the byte and stream in question and is expected to return the next character code.
The wcx_getc function may need to read additional bytes from the stream, if first_byte signifies the start of a multi-byte character. A byte may be read from the stream s in the following way:
byte = s->sgetc(s->user_handle);
The wcx_getc function is expected to increment its *pbyte_count argument by 1 for each such byte read.
The default wcx_open hook will install a wcx_getc function according to the usage argument.
The three default external decoding functions are also available to users through the SP_wcx_getc() function (see WCX Utility Functions).
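As an illustration of the protocol described above, the following is a minimal sketch of a user-defined wcx_getc function that decodes UTF-8. It is not the SICStus implementation. The SP_stream structure here is a cut-down stand-in containing only the sgetc and user_handle fields used above (the real definition comes from the SICStus headers), and the names MemSource, mem_getc, and utf8_getc are hypothetical. The sketch also assumes that sgetc returns a negative value at end of file and that -1 is an acceptable way to report a malformed sequence; neither is stated above.

```c
#include <stddef.h>

/* Stand-in for the two SP_stream fields used here (assumption:
   the real structure in the SICStus headers has many more). */
typedef struct SP_stream {
    int (*sgetc)(void *handle);   /* reads one byte from the stream */
    void *user_handle;
} SP_stream;

/* Hypothetical in-memory byte source, used only to exercise the sketch. */
typedef struct { const unsigned char *p; } MemSource;

static int mem_getc(void *handle)
{
    MemSource *m = handle;
    return *m->p ? *m->p++ : -1;  /* -1 at end of data (assumed convention) */
}

/* Sketch of a UTF-8 decoding wcx_getc (up to 4-byte sequences only).
   first_byte has already been read by the caller. */
static int utf8_getc(int first_byte, SP_stream *s, long *pbyte_count)
{
    int code, extra;

    if (first_byte < 0x80) {
        return first_byte;                    /* ASCII, or a negative EOF value */
    } else if ((first_byte & 0xE0) == 0xC0) {
        code = first_byte & 0x1F; extra = 1;
    } else if ((first_byte & 0xF0) == 0xE0) {
        code = first_byte & 0x0F; extra = 2;
    } else if ((first_byte & 0xF8) == 0xF0) {
        code = first_byte & 0x07; extra = 3;
    } else {
        return -1;                            /* invalid lead byte */
    }

    while (extra-- > 0) {
        int byte = s->sgetc(s->user_handle);
        if (byte < 0 || (byte & 0xC0) != 0x80)
            return -1;                        /* truncated or malformed */
        ++*pbyte_count;                       /* count each additional byte */
        code = (code << 6) | (byte & 0x3F);
    }
    return code;
}
```

Note that only the additional bytes are counted: the first byte was read by the caller, which is taken here to account for it.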
SP_WcxPutcHook *wcx_putc
typedef int (SP_WcxPutcHook) (int char_code, SP_stream *s, long *pbyte_count);
This function is generally invoked whenever a character has to be written to a stream. However, if the character code to be written is an ASCII character (its value is < 128), and WCX_PRESERVES_ASCII is in force, then the code is written directly on the stream, and wcx_putc is not invoked. Otherwise, wcx_putc is invoked with the character code and stream in question and is expected to do whatever is needed to output the character code to the stream.
This will require outputting one or more bytes to the stream. A byte can be written to the stream s in the following way:
return_code = s->sputc(byte, s->user_handle);
The wcx_putc function is expected to return the return value of the last invocation of s->sputc, or -1 as an error code, if incapable of outputting the character code. The latter may be the case, for example, if the code to be output does not belong to the character code set in force. It is also expected to increment its *pbyte_count argument by 1 for each byte written.
The default wcx_open hook function will install a wcx_putc function according to the usage argument. The three default external encoding functions are also available to users through the SP_wcx_putc() function (see WCX Utility Functions).
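The output side can be sketched in the same spirit. The following user-defined wcx_putc encodes a character code as UTF-8. It is not the SICStus implementation. SP_stream is again a cut-down stand-in with only the sputc and user_handle fields used above, and MemSink, mem_putc, and utf8_putc are hypothetical names. Limiting the code range to 0x10FFFF is a choice of this sketch; the surrounding text allows codes of up to 31 bits.

```c
#include <stddef.h>

/* Stand-in for the two SP_stream fields used here (assumption:
   the real structure in the SICStus headers has many more). */
typedef struct SP_stream {
    int (*sputc)(int byte, void *handle);  /* writes one byte to the stream */
    void *user_handle;
} SP_stream;

/* Hypothetical in-memory byte sink, used only to exercise the sketch. */
typedef struct { unsigned char data[16]; int len; } MemSink;

static int mem_putc(int byte, void *handle)
{
    MemSink *m = handle;
    if (m->len >= (int)sizeof m->data)
        return -1;                         /* sink full */
    m->data[m->len++] = (unsigned char)byte;
    return byte;
}

/* Sketch of a UTF-8 encoding wcx_putc. Returns the result of the last
   sputc call, or -1 if the code cannot be output by this sketch. */
static int utf8_putc(int char_code, SP_stream *s, long *pbyte_count)
{
    unsigned char buf[4];
    int len, i, rc = -1;

    if (char_code < 0 || char_code > 0x10FFFF) {
        return -1;                         /* outside the range handled here */
    } else if (char_code < 0x80) {
        buf[0] = (unsigned char)char_code;
        len = 1;
    } else if (char_code < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (char_code >> 6));
        buf[1] = (unsigned char)(0x80 | (char_code & 0x3F));
        len = 2;
    } else if (char_code < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (char_code >> 12));
        buf[1] = (unsigned char)(0x80 | ((char_code >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (char_code & 0x3F));
        len = 3;
    } else {
        buf[0] = (unsigned char)(0xF0 | (char_code >> 18));
        buf[1] = (unsigned char)(0x80 | ((char_code >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((char_code >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (char_code & 0x3F));
        len = 4;
    }

    for (i = 0; i < len; i++) {
        rc = s->sputc(buf[i], s->user_handle);
        ++*pbyte_count;                    /* one increment per byte written */
    }
    return rc;                             /* result of the last sputc call */
}
```

The sketch does not inspect intermediate sputc results, because the failure convention of sputc is not described here; a real hook would probably stop at the first failed write.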
In making a decision regarding the selection of these WCX-processing functions, the context and option arguments of the wcx_open hook can be used. The option argument is an atom. The context argument encodes the context of invocation. It is one of the following values:
SP_STREAMHOOK_STDIN
SP_STREAMHOOK_STDOUT
SP_STREAMHOOK_STDERR
SP_STREAMHOOK_OPEN: open/[3,4]
SP_STREAMHOOK_NULL: open_null_stream/1
SP_STREAMHOOK_LIB
SP_STREAMHOOK_C, SP_STREAMHOOK_C+1, ...: SP_make_stream()
The option argument comes from the user and it can carry some WCX-related information to be associated with the stream opened. For example, this can be used to implement a scheme supporting multiple encodings, supplied on a stream-by-stream basis, as shown in the sample WCX box (see A Sample WCX Box).
If the stream is opened from Prolog code, the option argument for this hook function is derived from the wcx(Option) option of open/4 and load_files/2. If this option is not present, or the stream is opened using some other built-in predicate, then the value of the wcx Prolog flag will be passed on to the open hook.
If the stream is opened from C, via SP_make_stream(), then the option argument will be the value of the Prolog flag wcx.
There is also a variant of SP_make_stream(), called SP_make_stream_context(), which takes two additional arguments, the option and the context, to be passed on to the wcx_open hook (see WCX Foreign Interface).
The wcx_open hook can associate the information derived from option with the stream in question using a new field in the SP_stream data structure: void *wcx_info, initialized to NULL. If there is more information than can be stored in this field, or if the encoding to be implemented requires keeping track of a state, then the wcx_open hook should allocate a sufficient amount of memory for storing the information and/or the state, using SP_malloc(), and deposit a pointer to that piece of memory in wcx_info.
The default wcx_open hook function sets the wcx_getc and wcx_putc stream fields to functions performing the external decoding and encoding according to option. Permitted values for option are the same as for the SP_CTYPE environment variable; see WCX Environment Variables. If the option argument is not supported, then the usage argument of SP_set_wcx_hooks() will be used instead.
Note that if option or usage is euc, then there will be no attempt to translate between Unicode code points and EUC code points. For this reason it is probably not meaningful to mix EUC with any of the other supported encodings. You should not rely on this behavior; future versions of SICStus may do a proper translation of EUC to and from Unicode.
As an example, if SP_CTYPE is utf8, you can load an ISO 8859/1 encoded Prolog file using load_files('file.pl', [wcx(iso_8859_1)]).
SP_WcxCloseHook *wcx_close
typedef void (SP_WcxCloseHook) (SP_stream *s);
This hook function is called whenever a stream is closed for which the wcx_open hook was invoked at its creation. The argument s points to the stream being closed. It can be used to implement the closing activities related to external encoding, e.g. freeing any memory allocated in the wcx_open hook.
The default wcx_close hook function does nothing.
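The pairing of wcx_open and wcx_close around the wcx_info field can be sketched as follows. This is only an illustration under several assumptions: SP_stream is a stand-in holding just the wcx_info field, SP_atom is replaced by a placeholder typedef, MyWcxState and the two hook names are hypothetical, and the standard malloc() and free() stand in for SP_malloc() and its matching deallocator so that the sketch compiles without the SICStus headers. A real wcx_open hook would also store its decoding and encoding functions in the wcx_getc and wcx_putc fields.

```c
#include <stdlib.h>

/* Placeholder for the SICStus atom type (assumption). */
typedef unsigned long SP_atom;

/* Stand-in holding only the field discussed here (assumption:
   the real SP_stream in the SICStus headers has many more). */
typedef struct SP_stream {
    void *wcx_info;               /* initialized to NULL by the system */
} SP_stream;

/* Hypothetical per-stream state for a stateful encoding. */
typedef struct {
    int  shift_state;             /* current shift state of the decoder */
    long chars_seen;              /* illustrative bookkeeping only */
} MyWcxState;

/* Sketch of a wcx_open hook: allocate the state and hang it off the stream. */
static void my_wcx_open(SP_stream *s, SP_atom option, int context)
{
    MyWcxState *st = malloc(sizeof *st);  /* SP_malloc() in a real hook */

    (void)option;                 /* a real hook would choose an encoding here */
    (void)context;
    if (st != NULL) {
        st->shift_state = 0;
        st->chars_seen = 0;
    }
    s->wcx_info = st;             /* remains NULL if the allocation failed */
}

/* Sketch of the matching wcx_close hook: release what wcx_open allocated. */
static void my_wcx_close(SP_stream *s)
{
    free(s->wcx_info);            /* free(NULL) is a no-op */
    s->wcx_info = NULL;
}
```

Keeping all per-stream data behind wcx_info means the close hook needs no global bookkeeping: whatever the open hook stored there is exactly what must be released.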
SP_WcxCharTypeHook *wcx_chartype
typedef int (SP_WcxCharTypeHook) (int char_code);
This function should be prepared to take any char_code >= 128 and return one of the following constants:
CHT_LAYOUT_CHAR
CHT_SMALL_LETTER
CHT_CAPITAL_LETTER
CHT_SYMBOL_CHAR
CHT_SOLO_CHAR
Regarding the meaning of these syntactic categories, see Token String.
The value returned by this function is not expected to change over time; therefore, for efficiency reasons, its behavior is cached. The cache is cleared by SP_set_wcx_hooks().
As a help in implementing this function, SICStus Prolog provides the function SP_latin1_chartype(), which returns the character type category for the codes 1..255 according to the ISO 8859/1 standard.
Note that if a character code >= 512 is categorized as a layout-char, and a character with this code occurs within an atom being written out in quoted form (e.g. using writeq) in native sicstus mode (as opposed to iso mode), then this code will be output as itself, rather than as an octal escape sequence. This is because in sicstus mode escape sequences consist of at most 3 octal digits, and the largest such value, octal 777, is 511.
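A wcx_chartype hook is a pure function from character codes to categories, which is what makes caching its results safe. The sketch below shows one possible shape for such a function. The numeric values given to the CHT_ constants are placeholders so that the sketch compiles on its own (the real values come from the SICStus headers), the name my_chartype is hypothetical, and the classification policy (Greek letters, a few Unicode separators, the mathematical operators block, and a catch-all default) is purely illustrative. A real hook would likely delegate codes below 256 to SP_latin1_chartype().

```c
/* Placeholder values (assumption): the real constants are defined
   by the SICStus headers. */
enum {
    CHT_LAYOUT_CHAR = 1,
    CHT_SMALL_LETTER,
    CHT_CAPITAL_LETTER,
    CHT_SYMBOL_CHAR,
    CHT_SOLO_CHAR
};

/* Sketch of a wcx_chartype hook for codes >= 128. The result depends
   only on char_code, so it never changes over time. */
static int my_chartype(int char_code)
{
    /* No-break space, line separator, paragraph separator. */
    if (char_code == 0x00A0 || char_code == 0x2028 || char_code == 0x2029)
        return CHT_LAYOUT_CHAR;

    /* Greek capital letters, alpha through omega. */
    if (char_code >= 0x0391 && char_code <= 0x03A9)
        return CHT_CAPITAL_LETTER;

    /* Greek small letters, alpha through omega. */
    if (char_code >= 0x03B1 && char_code <= 0x03C9)
        return CHT_SMALL_LETTER;

    /* Mathematical operators block. */
    if (char_code >= 0x2200 && char_code <= 0x22FF)
        return CHT_SYMBOL_CHAR;

    /* Illustrative default: treat everything else as a small letter. */
    return CHT_SMALL_LETTER;
}
```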
SP_WcxConvHook *wcx_to_os
typedef char* (SP_WcxConvHook) (char *string, int context);
This function is normally called each time SICStus Prolog wishes to communicate a string of possibly wide characters to the operating system. However, if the string in question consists of ASCII characters only, and WCX_PRESERVES_ASCII is in force, then wcx_to_os may not be called, and the original string may be passed to the operating system.
The first argument of wcx_to_os is a zero-terminated string, using the internal encoding of SICStus Prolog, namely UTF-8. The function is expected to convert the string to a form required by the operating system, in the context described by the second, context argument, and to return the converted string. If no conversion is needed, it should simply return its first argument. Otherwise, the conversion should be done in a memory area controlled by this function (preferably a static buffer, reused each time the function is called).
The second argument specifies the context of conversion. It can be one of the following integer values:
WCX_FILE
WCX_OPTION
WCX_WINDOW_TITLE
WCX_C_CODE
SICStus Prolog provides a utility function SP_wci_code(), see below, for obtaining a wide character code from a UTF-8 encoded string, which can be used to implement the wcx_to_os hook function.
The default of the wcx_to_os function depends on the usage argument of SP_set_wcx_hooks(). If the value of usage includes WCX_OS_UTF8, then the function does no conversion, as the operating system uses the same encoding as SICStus Prolog. If the value of usage includes WCX_OS_8BIT, then the function decodes the UTF-8 encoded string and converts this sequence of codes into a sequence of bytes by truncating each code to 8 bits.
Note that the default wcx_to_os functions ignore their context argument.
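The WCX_OS_8BIT behavior described above (decode UTF-8, then truncate each code to 8 bits) can be sketched as a user-defined wcx_to_os function. This is an illustration, not the default implementation. The names to_os_8bit and OS_BUF_SIZE are hypothetical, the buffer size is arbitrary, and the UTF-8 decoding is written by hand for sequences of up to 4 bytes instead of calling SP_wci_code(), whose signature is not given in this section. The sketch assumes well-formed input and silently truncates strings that do not fit in the buffer.

```c
#include <stddef.h>

#define OS_BUF_SIZE 1024          /* arbitrary size chosen for this sketch */

/* Sketch of a wcx_to_os hook: UTF-8 in, one byte per character out. */
static char *to_os_8bit(char *string, int context)
{
    static char buf[OS_BUF_SIZE]; /* reused on every call, as suggested above */
    const unsigned char *p = (const unsigned char *)string;
    size_t n = 0;

    (void)context;                /* ignored, like the default functions */

    while (*p != 0 && n + 1 < sizeof buf) {
        unsigned long code;
        int extra;

        /* Decode the lead byte (assumes well-formed UTF-8). */
        if (*p < 0x80)      { code = *p;        extra = 0; }
        else if (*p < 0xE0) { code = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { code = *p & 0x0F; extra = 2; }
        else                { code = *p & 0x07; extra = 3; }
        p++;

        /* Fold in the continuation bytes. */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            code = (code << 6) | (unsigned long)(*p++ & 0x3F);

        buf[n++] = (char)(code & 0xFF);   /* truncate the code to 8 bits */
    }
    buf[n] = '\0';
    return buf;
}
```

One consequence of truncation is worth keeping in mind: a code whose low 8 bits are zero (for example 0x100) becomes a zero byte and therefore ends the converted string early.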
SP_WcxConvHook *wcx_from_os
typedef char* (SP_WcxConvHook) (char *string, int context);
This function is called each time SICStus Prolog receives from the operating system a zero-terminated sequence of bytes possibly encoding a wide character string. The function is expected to convert the byte sequence, if needed, to a string in the internal encoding of SICStus Prolog (UTF-8), and return the converted string. The conversion should be done in a memory area controlled by this function (preferably a static buffer, reused each time the function is called, but different from the buffer used in wcx_to_os).
The second argument specifies the context of conversion, as in the case of wcx_to_os.
SICStus Prolog provides a utility function SP_code_wci(), see below, for converting a character code (up to 31 bits) into UTF-8 encoding, which can be used to implement the wcx_from_os hook function.
The default of the wcx_from_os function depends on the usage argument of SP_set_wcx_hooks(). If the value of usage includes WCX_OS_UTF8, then the function does no conversion. If the value of usage includes WCX_OS_8BIT, then the function transforms the string of 8-bit codes into a UTF-8 encoded string.
Note that the default wcx_from_os functions ignore their context argument.
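The WCX_OS_8BIT direction described above, from a string of 8-bit codes to UTF-8, can be sketched as a user-defined wcx_from_os function. Again this is an illustration and not the default implementation. The names from_os_8bit and FROM_OS_BUF_SIZE are hypothetical, the buffer size is arbitrary, and the encoding is written by hand instead of calling SP_code_wci(), whose signature is not given in this section. Each input byte is treated as a character code in the range 0..255, so it needs at most two bytes of UTF-8.

```c
#include <stddef.h>

#define FROM_OS_BUF_SIZE 1024     /* arbitrary size chosen for this sketch */

/* Sketch of a wcx_from_os hook: 8-bit codes in, UTF-8 out. */
static char *from_os_8bit(char *string, int context)
{
    /* A separate static buffer, distinct from any used by wcx_to_os. */
    static char buf[FROM_OS_BUF_SIZE];
    const unsigned char *p = (const unsigned char *)string;
    size_t n = 0;

    (void)context;                /* ignored, like the default functions */

    /* Keep room for a 2-byte sequence plus the terminating zero. */
    while (*p != 0 && n + 2 < sizeof buf) {
        if (*p < 0x80) {
            buf[n++] = (char)*p;                      /* ASCII is unchanged */
        } else {
            buf[n++] = (char)(0xC0 | (*p >> 6));      /* lead byte */
            buf[n++] = (char)(0x80 | (*p & 0x3F));    /* continuation byte */
        }
        p++;
    }
    buf[n] = '\0';
    return buf;
}
```

Because the result lives in a static buffer, it is overwritten by the next call; a caller that needs to keep the converted string must copy it first.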